A functional analysis of omic network embedding spaces reveals key altered functions in cancer
Abstract
Motivation: Advances in omics technologies have revolutionized cancer research by producing massive datasets. A common approach to deciphering these complex data is to apply embedding algorithms to molecular interaction networks. These algorithms find a low-dimensional space in which similarities between the network nodes are best preserved. Currently available embedding approaches mine the gene embeddings directly to uncover new cancer-related knowledge. However, these gene-centric approaches produce incomplete knowledge, since they do not account for the functional implications of genomic alterations. We propose a new, function-centric perspective and approach to complement the knowledge obtained from omic data.
Results: We introduce our Functional Mapping Matrix (FMM) to explore the functional organization of different tissue-specific and species-specific embedding spaces generated by a Non-negative Matrix Tri-Factorization algorithm. We also use our FMM to define the optimal dimensionality of these molecular interaction network embedding spaces. For this optimal dimensionality, we compare the FMMs of the most prevalent cancers in human to the FMMs of their corresponding control tissues. We find that cancer alters the positions in the embedding space of cancer-related functions, while it keeps the positions of the non-cancer-related ones. We exploit this spatial "movement" to predict novel cancer-related functions. Finally, we predict novel cancer-related genes that the currently available methods for gene-centric analyses cannot identify; we validate these predictions by literature curation and retrospective analyses of patient survival data.
Availability: Data and source code can be accessed at https://github.com/gaiac/FMM
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Predicting the pathogenicity of missense variants using features derived from AlphaFold2
Abstract
Motivation: Missense variants are a frequent class of variation within the coding genome, and some of them cause Mendelian diseases. Despite advances in computational prediction, classifying missense variants into pathogenic or benign remains a major challenge in the context of personalized medicine. Recently, the structure of the human proteome was derived with unprecedented accuracy using the artificial intelligence system AlphaFold2. This raises the question of whether AlphaFold2 wild-type structures can improve the accuracy of computational pathogenicity prediction for missense variants.
Implementation: To address this, we first engineered a set of features for each amino acid from these structures. We then trained a random forest to distinguish between relatively common (proxy-benign) and singleton (proxy-pathogenic) missense variants from gnomAD v3.1. This yielded a novel AlphaFold2-based pathogenicity prediction score, termed AlphScore.
Results: Important feature classes used by AlphScore are solvent accessibility, amino acid network related features, features describing the physicochemical environment, and AlphaFold2’s quality parameter (pLDDT). AlphScore alone showed lower performance than existing in silico scores used for missense prediction, such as CADD or REVEL. However, when AlphScore was added to those scores, the performance increased, as measured by the approximation of deep mutational scan data, as well as the prediction of expert-curated missense variants from the ClinVar database. Overall, our data indicate that the integration of AlphaFold2 predicted structures can improve pathogenicity prediction of missense variants.
Availability: AlphScore, combinations of AlphScore with existing scores, as well as variants used for training and testing are publicly available.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Effective design and inference for cell sorting and sequencing based massively parallel reporter assays
Abstract
Motivation: The ability to measure the phenotype of millions of different genetic designs using Massively Parallel Reporter Assays (MPRAs) has revolutionised our understanding of genotype-to-phenotype relationships and opened avenues for data-centric approaches to biological design. However, our knowledge of how best to design these costly experiments and the effect that our choices have on the quality of the data produced is lacking.
Results: In this article, we tackle the issues of data quality and experimental design by developing FORECAST, a Python package that supports the accurate simulation of cell-sorting and sequencing based MPRAs and robust maximum likelihood based inference of genetic design function from MPRA data. We use FORECAST's capabilities to reveal rules for MPRA experimental design that help ensure accurate genotype-to-phenotype links and show how the simulation of MPRA experiments can help us better understand the limits of prediction accuracy when these data are used for training deep learning based classifiers. As the scale and scope of MPRAs grow, tools like FORECAST will help ensure we make informed decisions during their development and make the most of the data produced.
Availability and implementation: The FORECAST package is available at: https://gitlab.com/Pierre-Aurelien/forecast. Code for the deep learning analysis performed in this study is available at: https://gitlab.com/Pierre-Aurelien/rebeca.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
spongEffects: ceRNA modules offer patient-specific insights into the miRNA regulatory landscape
Abstract
Motivation: Cancer is one of the leading causes of death worldwide. Despite significant improvements in prevention and treatment, mortality remains high for many cancer types. Hence, innovative methods that use molecular data to stratify patients and identify biomarkers are needed. Promising biomarkers can also be inferred from competing endogenous RNA (ceRNA) networks that capture the gene-miRNA gene regulatory landscape. Thus far, the role of these biomarkers could only be studied globally but not in a sample-specific manner. To mitigate this, we introduce spongEffects, a novel method that infers subnetworks (or modules) from ceRNA networks and calculates patient- or sample-specific scores related to their regulatory activity.
Results: We show how spongEffects can be used for downstream interpretation and machine learning tasks such as tumor classification and for identifying subtype-specific regulatory interactions. In a concrete example of breast cancer subtype classification, we prioritize modules impacting the biology of the different subtypes. In summary, spongEffects prioritizes ceRNA modules as biomarkers and offers insights into the miRNA regulatory landscape. Notably, these module scores can be inferred from gene expression data alone and can thus be applied to cohorts where miRNA expression information is lacking.
Availability: https://bioconductor.org/packages/devel/bioc/html/SPONGE.html
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
AcrNET: Predicting anti-CRISPR with Deep Learning
Abstract
Motivation: As an important group of proteins discovered in phages, anti-CRISPR proteins inhibit the activity of the bacterial immune system (i.e., CRISPR-Cas), offering promise for gene editing and phage therapy. However, the prediction and discovery of anti-CRISPR proteins are challenging due to their high variability and fast evolution. Existing biological studies rely on known CRISPR and anti-CRISPR pairs, which may not be practical considering the huge number of candidates. Computational methods struggle with prediction performance. To address these issues, we propose a novel deep neural network for anti-CRISPR analysis (AcrNET), which substantially improves prediction performance.
Results: On both the cross-fold and cross-dataset validation, our method outperforms the state-of-the-art methods. Notably, AcrNET improves the prediction performance by at least 15% in F1 score on the cross-dataset test problem compared with the state-of-the-art deep learning method. Moreover, AcrNET is the first computational method to predict the detailed anti-CRISPR classes, which may help illustrate the anti-CRISPR mechanism. Taking advantage of a Transformer protein language model, ESM-1b, which was pre-trained on 250 million protein sequences, AcrNET overcomes the data scarcity problem. Extensive experiments and analysis suggest that the Transformer model feature, evolutionary feature, and local structure feature complement each other, which indicates the critical properties of anti-CRISPR proteins. AlphaFold prediction, motif analysis, and docking experiments further demonstrate that AcrNET can implicitly capture the evolutionarily conserved pattern and the interaction between anti-CRISPR and the target.
Availability and implementation: Web server: https://proj.cse.cuhk.edu.hk/aihlab/AcrNET/. Training code and pre-trained model are available at https://github.com/banma12956/AcrNET.
Categories: Bioinformatics Trends
FAS: Assessing the similarity between proteins using multi-layered feature architectures
Abstract
Motivation: Protein sequence comparison is a fundamental element in the bioinformatics toolkit. When sequences are annotated with features such as functional domains, transmembrane domains, low complexity regions or secondary structure elements, the resulting feature architectures allow better informed comparisons. However, many existing schemes for scoring architecture similarities cannot cope with features arising from multiple annotation sources. Those that do fall short in the resolution of overlapping and redundant feature annotations.
Results: Here, we introduce FAS, a scoring method that integrates features from multiple annotation sources in a directed acyclic architecture graph. Redundancies are resolved as part of the architecture comparison by finding the paths through the graphs that maximise the pair-wise architecture similarity. In a large-scale evaluation on more than 10,000 human-yeast ortholog pairs, architecture similarities assessed with FAS are consistently more plausible than those obtained using e-values to resolve overlaps or leaving overlaps unresolved. Three case studies demonstrate the utility of FAS on architecture comparison tasks: benchmarking of orthology assignment software, identification of functionally diverged orthologs, and diagnosing protein architecture changes stemming from faulty gene predictions. With the help of FAS, feature architecture comparisons can now be routinely integrated into these and many other applications.
Availability: FAS is available as a Python package: https://pypi.org/project/greedyFAS/
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Deciphering associations between gut microbiota and clinical factors using microbial modules
Abstract
Motivation: Human gut microbiota plays a vital role in maintaining body health. The dysbiosis of gut microbiota is associated with a variety of diseases. It is critical to uncover the associations between gut microbiota and disease states as well as other intrinsic or environmental factors. However, inferring alterations of individual microbial taxa based on relative abundance data likely leads to false associations and conflicting discoveries in different studies. Moreover, the effects of underlying factors and microbe-microbe interactions could lead to the alteration of larger sets of taxa. It might be more robust to investigate gut microbiota using groups of related taxa instead of the composition of individual taxa.
Results: We proposed a novel method to identify underlying microbial modules, i.e., groups of taxa with similar abundance patterns affected by a common latent factor, from longitudinal gut microbiota and applied it to inflammatory bowel disease (IBD). The identified modules demonstrated closer intra-group relationships, indicating potential microbe-microbe interactions and influences of underlying factors. Associations between the modules and several clinical factors were investigated, especially disease states. The IBD-associated modules performed better in stratifying the subjects compared to the relative abundance of individual taxa. The modules were further validated in external cohorts, demonstrating the efficacy of the proposed method in identifying general and robust microbial modules. The study reveals the benefit of considering the ecological effects in gut microbiota analysis and the great promise of linking clinical factors with underlying microbial modules.
Availability: https://github.com/rwang-z/microbial_module.git.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
DFHiC: A dilated full convolution model to enhance the resolution of Hi-C data
Abstract
Motivation: Hi-C is the most widely used chromosome conformation capture (3C) experiment; it measures the frequency of all pairwise interactions across the entire genome, making it a powerful tool for studying the 3D structure of the genome. The fineness of the constructed genome structure depends on the resolution of the Hi-C data. However, because high-resolution Hi-C data require deep sequencing and thus high experimental cost, most available Hi-C data are low-resolution. Hence, it is essential to enhance the quality of Hi-C data by developing effective computational methods.
Results: In this work, we propose a novel method, DFHiC, which generates a high-resolution Hi-C matrix from a low-resolution Hi-C matrix using a dilated convolutional neural network. The dilated convolution can effectively explore global patterns in the overall Hi-C matrix by exploiting information from longer genomic distances. Consequently, DFHiC can improve the resolution of the Hi-C matrix reliably and accurately. More importantly, the super-resolution Hi-C data enhanced by DFHiC are more in line with real high-resolution Hi-C data than those produced by other existing methods, in terms of both significant chromatin interactions and the identification of topologically associating domains (TADs).
Availability: https://github.com/BinWangCSU/DFHiC
Supplementary information: Supplementary data are available at Bioinformatics online.
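To make the role of dilation concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the DFHiC architecture: the function names, the all-ones kernel, and the toy input row are assumptions chosen for illustration. It only shows how spacing the kernel taps apart lets the same three weights cover a longer stretch of a contact-matrix row.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D cross-correlation with `dilation` steps between taps."""
    span = (len(kernel) - 1) * dilation  # distance covered by one kernel placement
    out = []
    for start in range(len(signal) - span):
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 layers that share one kernel size."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical toy "row" of a contact matrix (values are arbitrary).
row = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(dilated_conv1d(row, [1, 1, 1], dilation=1))  # each output sees 3 adjacent bins
print(dilated_conv1d(row, [1, 1, 1], dilation=3))  # each output spans 7 bins
print(receptive_field(3, [1, 2, 4]))               # three stacked layers
```

With a kernel of size 3, stacking layers with dilations 1, 2 and 4 already covers 15 bins, which is the sense in which dilation reaches longer genomic distances without adding weights.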
Categories: Bioinformatics Trends
matchRanges: Generating null hypothesis genomic ranges via covariate-matched sampling
Abstract
Motivation: Deriving biological insights from genomic data commonly requires comparing attributes of selected genomic loci to a null set of loci. The selection of this null set is nontrivial, as it requires careful consideration of potential covariates, a problem that is exacerbated by the non-uniform distribution of genomic features including genes, enhancers, and transcription factor binding sites. Propensity score-based covariate matching methods allow selection of null sets from a pool of possible items while controlling for multiple covariates; however, existing packages do not operate on genomic data classes and can be slow for large data sets, making them difficult to integrate into genomic workflows.
Results: To address this, we developed matchRanges, a propensity score-based covariate matching method for the efficient and convenient generation of matched null ranges from a set of background ranges within the Bioconductor framework.
Availability and implementation: Package: https://bioconductor.org/packages/nullranges, Code: https://github.com/nullranges, Documentation: https://nullranges.github.io/nullranges.
Supplementary information: Supplementary data are available at Bioinformatics online.
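The general idea of propensity score matching described above can be sketched in a few lines of Python. This is not the matchRanges API (which is an R/Bioconductor package): `match_nearest` is a hypothetical helper, the propensity scores are assumed to have been estimated already, and greedy nearest-neighbour selection without replacement is just one possible matching strategy.

```python
def match_nearest(focal_scores, pool_scores):
    """Greedy nearest-neighbour matching without replacement.

    Both arguments are lists of precomputed propensity scores.
    Returns, for each focal item, the index of the pool item chosen as its match.
    """
    if len(pool_scores) < len(focal_scores):
        raise ValueError("pool must be at least as large as the focal set")
    available = set(range(len(pool_scores)))
    matches = []
    for score in focal_scores:
        # Closest remaining pool item; ties are broken by the lower index.
        best = min(available, key=lambda j: (abs(pool_scores[j] - score), j))
        matches.append(best)
        available.remove(best)
    return matches


# Hypothetical scores: three focal loci and a background pool of six loci.
focal = [0.80, 0.75, 0.20]
pool = [0.10, 0.22, 0.50, 0.78, 0.90, 0.70]

print(match_nearest(focal, pool))
```

The selected pool items form a null set whose score distribution resembles that of the focal set, which is what makes downstream comparisons between the two sets fairer.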
Categories: Bioinformatics Trends
DEGGs: an R package with shiny app for the identification of Differentially Expressed Gene-Gene interactions in high-throughput sequencing data
Abstract
Summary: The discovery of differential gene-gene correlations across phenotypical groups can help identify the activation/deactivation of critical biological processes underlying specific conditions. The presented R package, provided with a count and design matrix, extracts networks of group-specific interactions that can be interactively explored through a user-friendly shiny interface. For each gene-gene link, differential statistical significance is provided through robust linear regression with an interaction term.
Availability: DEGGs is implemented in R and available on GitHub at https://github.com/elisabettasciacca/DEGGs. The package is also under submission on Bioconductor.
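As an illustration of what an interaction term measures, the sketch below fits the model `gene_b = b0 + b1*gene_a + b2*group + b3*(gene_a*group)` for a binary group. It is not the DEGGs implementation: ordinary least squares is swapped in for the robust regression named in the abstract, no significance test is computed, and the function names and expression values are made up. For a binary group, the OLS estimate of `b3` equals the difference between the two per-group slopes, which is what the code computes.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def interaction_coefficient(gene_a, gene_b, group):
    """OLS estimate of b3 for a 0/1 group label: slope in group 1 minus slope in group 0."""
    slopes = {}
    for label in (0, 1):
        xs = [a for a, g in zip(gene_a, group) if g == label]
        ys = [b for b, g in zip(gene_b, group) if g == label]
        slopes[label] = ols_slope(xs, ys)
    return slopes[1] - slopes[0]


# Hypothetical expression values: the A-B relationship flips sign between groups.
gene_a = [1, 2, 3, 4, 1, 2, 3, 4]
gene_b = [2, 4, 6, 8, 5, 4, 3, 2]
group = [0, 0, 0, 0, 1, 1, 1, 1]

print(interaction_coefficient(gene_a, gene_b, group))
```

A coefficient far from zero indicates that the relationship between the two genes differs between the groups, which is the kind of link a differential interaction analysis looks for.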
Categories: Bioinformatics Trends
Graph Convolutional Network-based Feature Selection for High-dimensional and Low-sample Size Data
Abstract
Motivation: Feature selection is a powerful dimension reduction technique which selects a subset of relevant features for model construction. Numerous feature selection methods have been proposed, but most of them fail under the high-dimensional and low-sample size (HDLSS) setting due to the challenge of overfitting.
Results: We present a deep learning-based method, GRAph Convolutional nEtwork feature Selector (GRACES), to select important features for HDLSS data. GRACES exploits latent relations between samples with various overfitting-reducing techniques to iteratively find a set of optimal features which gives rise to the greatest decreases in the optimization loss. We demonstrate that GRACES significantly outperforms other feature selection methods on both synthetic and real-world datasets.
Availability and implementation: The source code is publicly available at https://github.com/canc1993/graces.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
RING-PyMOL: residue interaction networks of structural ensembles and molecular dynamics
Abstract
RING-PyMOL is a plugin for PyMOL providing a set of analysis tools for structural ensembles and molecular dynamics (MD) simulations. RING-PyMOL combines residue interaction networks, as provided by the RING software, with structural clustering to enhance the analysis and visualization of conformational complexity. It combines precise calculation of non-covalent interactions with the power of PyMOL to manipulate and visualize protein structures. The plugin identifies and highlights correlating contacts and interaction patterns that can explain structural allostery, active sites and structural heterogeneity connected with molecular function. It is easy to use and extremely fast, processing and rendering hundreds of models and long trajectories in seconds. RING-PyMOL generates a number of interactive plots and output files for use with external tools. The underlying RING software has been improved extensively: it is ten times faster, can process mmCIF files, and also identifies typed interactions for nucleic acids.
Availability and implementation: https://github.com/BioComputingUP/ring-pymol
Categories: Bioinformatics Trends
LogBTF: Gene regulatory network inference using Boolean threshold network model from single-cell gene expression data
Abstract
Motivation: From a systematic perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing (scRNA-seq) data. However, most existing GRN inference methods focus mainly on the network topology, and only a few of them consider how to explicitly describe the logic update rules of regulation in GRNs to obtain their dynamics. Moreover, some inference methods also fail to deal with the over-fitting problem caused by the noise in time series data.
Results: In this paper, we propose a novel embedded Boolean threshold network method called LogBTF, which effectively infers GRNs by integrating regularized logistic regression and a Boolean threshold function. First, the continuous gene expression values are converted into Boolean values and the elastic net regression model is adopted to fit the binarized time series data. Then, the estimated regression coefficients are applied to represent the unknown Boolean threshold function of the candidate Boolean threshold network as the dynamical equations. To overcome the multi-collinearity and over-fitting problems, a new and effective approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and thereafter setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is built into the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean value dataset, dozens of simulation datasets and three real scRNA-seq datasets demonstrate that the LogBTF method can infer GRNs from time series data more accurately than some other alternative methods for GRN inference.
Availability and implementation: The source data and code are available at https://github.com/zpliulab/LogBTF.
Supplementary information: Supplementary data are available at Bioinformatics online.
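The two building blocks named above, binarization and a Boolean threshold update, can be sketched as follows. This is not the LogBTF code: the mean-based binarization rule, the hand-picked weights and thresholds, and the function names are assumptions for illustration. In the method described, the weights would come from the fitted regularized logistic regression rather than being set by hand.

```python
def binarize(series):
    """Assumed rule: a value is 'on' (1) when it exceeds the mean of its series."""
    mean = sum(series) / len(series)
    return [1 if v > mean else 0 for v in series]


def threshold_step(state, weights, thresholds):
    """Synchronous update: gene i switches on when its weighted input exceeds its threshold."""
    return [
        1 if sum(w * s for w, s in zip(row, state)) > theta else 0
        for row, theta in zip(weights, thresholds)
    ]


# Hypothetical 3-gene network: gene 1 activates gene 0, gene 2 represses gene 1,
# and gene 0 activates gene 2. weights[i][j] is the effect of gene j on gene i.
weights = [
    [0, 1, 0],
    [0, 0, -1],
    [1, 0, 0],
]
thresholds = [0, -0.5, 0]

print(binarize([0.1, 2.0, 3.0, 0.2]))

state = [1, 0, 0]
for _ in range(2):
    state = threshold_step(state, weights, thresholds)
    print(state)
```

Iterating `threshold_step` traces the network's dynamics; here the toy network alternates between two states, an example of the kind of behaviour that topology alone does not reveal.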
Categories: Bioinformatics Trends
The DynaSig-ML Python package: automated learning of biomolecular dynamics-function relationships
Abstract
The DynaSig-ML ("Dynamical Signatures - Machine Learning") Python package allows the efficient, user-friendly exploration of 3D dynamics-function relationships in biomolecules, using datasets of experimental measures from large numbers of sequence variants. It does so by predicting 3D structural dynamics for every variant using the Elastic Network Contact Model (ENCoM), a sequence-sensitive coarse-grained normal mode analysis model. Dynamical Signatures represent the fluctuation at every position in the biomolecule and are used as features fed into machine learning models of the user's choice. Once trained, these models can be used to predict experimental outcomes for theoretical variants. The whole pipeline can be run with just a few lines of Python and modest computational resources. The compute-intensive steps are easily parallelized in the case of either large biomolecules or vast amounts of sequence variants. As an example application, we use the DynaSig-ML package to predict the maturation efficiency of human microRNA miR-125a variants from high-throughput enzymatic assays.
Availability: DynaSig-ML is open-source software available at https://github.com/gregorpatof/dynasigml_package
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Molecular Property Prediction by Contrastive Learning with Attention-Guided Positive Sample Selection
Abstract
Motivation: Predicting molecular properties is one of the fundamental problems in drug design and discovery. In recent years, self-supervised learning has shown its promising performance in image recognition, natural language processing, and single-cell data analysis. Contrastive learning is a typical self-supervised learning method used to learn the features of data so that the trained model can more effectively distinguish the data. One important issue of contrastive learning is how to select positive samples for each training example, which will significantly impact the performance of contrastive learning.
Results: In this paper, we propose a new method for molecular property prediction by Contrastive Learning with Attention-guided Positive-sample Selection (CLAPS). Firstly, we generate positive samples for each training example based on an attention-guided selection scheme. Secondly, we employ a Transformer encoder to extract latent feature vectors and compute the contrastive loss aiming to distinguish positive and negative sample pairs. Finally, we use the trained encoder for predicting molecular properties. Experiments on various benchmark datasets show that our approach outperforms the state-of-the-art (SOTA) methods in most cases.
Availability: The code is publicly available at https://github.com/wangjx22/CLAPS.
Supplementary information: Supplementary data are available at Bioinformatics online.
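To show why the choice of positive sample matters, here is a minimal sketch of a contrastive loss in plain Python. The abstract does not say which loss CLAPS uses; the InfoNCE form, the temperature value, the function names and the toy vectors below are all assumptions for illustration, and the vectors stand in for encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: low when the positive is closer than the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D embeddings.
anchor = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
good_positive = [0.9, 0.1]  # nearly aligned with the anchor
poor_positive = [0.1, 0.9]  # nearly orthogonal to the anchor

print(info_nce(anchor, good_positive, negatives))
print(info_nce(anchor, poor_positive, negatives))
```

The loss is much larger for the poorly chosen positive, so a positive that does not truly resemble the anchor pushes the encoder in an unhelpful direction; a positive-sample selection scheme is meant to avoid that situation.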
Categories: Bioinformatics Trends
CONNECTOR, fitting and clustering of longitudinal data to reveal a new risk stratification system
Abstract
Motivation: The transition from evaluating a single time point to examining the entire dynamic evolution of a system is possible only in the presence of the proper framework. The strong variability of dynamic evolution makes the definition of an explanatory procedure for data fitting and clustering challenging.
Results: We developed CONNECTOR, a data-driven framework able to analyze and inspect longitudinal data in a straightforward and revealing way. When used to analyze tumor growth kinetics over time in 1599 patient-derived xenograft growth curves from ovarian and colorectal cancers, CONNECTOR allowed the aggregation of time-series data through an unsupervised approach into informative clusters. We give a new perspective on mechanism interpretation: specifically, we define novel model aggregations and identify unanticipated molecular associations with response to clinically approved therapies.
Availability: CONNECTOR is freely available under GNU GPL license at https://qbioturin.github.io/connector and dx.doi.org/10.17504/protocols.io.8epv56e74g1b/v1.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
ConsAlign: simultaneous RNA structural aligner based on rich transfer learning and thermodynamic ensemble model of alignment scoring
Abstract
Motivation: To capture structural homology in RNAs, alignment and folding (= AF) of RNA homologs has been a fundamental framework in RNA science. Learning sufficient scoring parameters for simultaneous AF (= SAF) is an undeveloped subject because evaluating them is computationally expensive.
Results: We developed ConsTrain, a gradient-based machine learning method for rich SAF scoring. We also implemented ConsAlign, a SAF tool composed of ConsTrain’s learned scoring parameters. To aim for better AF quality, ConsAlign employs (1) transfer learning from well-defined scoring models and (2) the ensemble model between the ConsTrain model and a well-established thermodynamic scoring model. Keeping comparable running time, ConsAlign demonstrated competitive AF prediction quality among current AF tools.
Availability and implementation: Our code and our data are freely available at https://github.com/heartsh/consalign and https://github.com/heartsh/consprob-trained.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
PyHMMER: A Python library binding to HMMER for efficient sequence analysis
Abstract
Summary: PyHMMER provides Python integration of the popular profile Hidden Markov Model software HMMER via Cython bindings. This allows annotation of protein sequences with profile HMMs and building new ones directly with Python. PyHMMER increases flexibility of use, allowing creating queries directly from Python code, launching searches and obtaining results without I/O, or accessing previously unavailable statistics like uncorrected p-values. A new parallelization model greatly improves performance when running multithreaded searches, while producing the exact same results as HMMER.
Availability and implementation: PyHMMER supports all modern Python versions (Python 3.6+) and similar platforms as HMMER (x86 or PowerPC UNIX systems). Pre-compiled packages are released via PyPI (https://pypi.org/project/pyhmmer/) and Bioconda (https://anaconda.org/bioconda/pyhmmer). The PyHMMER source code is available under the terms of the open-source MIT licence and hosted on GitHub (https://github.com/althonos/pyhmmer); its documentation is available on ReadTheDocs (https://pyhmmer.readthedocs.io).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Evolink: A Phylogenetic Approach for Rapid Identification of Genotype-Phenotype Associations in Large-scale Microbial Multi-Species Data
Abstract
Motivation: The discovery of the genetic features that underlie a phenotype is a fundamental task in microbial genomics. With the growing number of microbial genomes that are paired with phenotypic data, new challenges and opportunities are arising for genotype-phenotype inference. Phylogenetic approaches are frequently used to adjust for the population structure of microbes, but scaling them to trees with thousands of leaves representing heterogeneous populations is highly challenging. This greatly hinders the identification of prevalent genetic features that contribute to phenotypes that are observed in a wide diversity of species.
Results: In this study, Evolink was developed as an approach to rapidly identify genotypes associated with phenotypes in large-scale multi-species microbial datasets. Compared to other similar tools, Evolink was consistently among the top-performing methods in terms of precision and sensitivity when applied to simulated and real-world flagella datasets. In addition, Evolink significantly outperformed all other approaches in terms of computation time. Application of Evolink on flagella and gram-staining datasets revealed findings that are consistent with known markers and supported by the literature. In conclusion, Evolink can rapidly detect phenotype-associated genotypes across multiple species, demonstrating its potential to be broadly utilized to identify gene families associated with traits of interest.
Availability and implementation: The source code, docker container and web server for Evolink are freely available at https://github.com/nlm-irp-jianglab/Evolink.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
3D-MSNet: A point cloud based deep learning model for untargeted feature detection and quantification in profile LC-HRMS data
Abstract
Motivation: Liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) is widely used in composition profiling in untargeted metabolomics research. While retaining complete sample information, mass spectrometry (MS) data naturally have the characteristics of high dimensionality, high complexity, and huge data volume. None of the existing mainstream quantification methods can perform direct three-dimensional analysis on lossless profile MS signals. All software simplifies calculations by dimensionality reduction or lossy grid transformation, ignoring the full three-dimensional signal distribution of mass spectrometry data and resulting in inaccurate feature detection and quantification.
Results: Because neural networks are effective for high-dimensional data analysis and can discover implicit features from large amounts of complex data, in this work we propose 3D-MSNet, a novel deep-learning-based model for untargeted feature extraction. 3D-MSNet performs direct feature detection on three-dimensional MS point clouds as an instance segmentation task. After training on a self-annotated 3D feature dataset, we compared our model with nine popular software tools (MS-DIAL, MZmine 2, XCMS Online, MarkerView, Compound Discoverer, MaxQuant, Dinosaur, DeepIso, PointIso) on two metabolomics and one proteomics public benchmark datasets. Our 3D-MSNet model outperformed other software with significant improvement in feature detection and quantification accuracy on all evaluation datasets. Furthermore, 3D-MSNet has high feature extraction robustness and can be widely applied to profile MS data acquired with various high-resolution mass spectrometers with various resolutions.
Availability: 3D-MSNet is open-source and freely available at https://github.com/CSi-Studio/3D-MSNet under a permissive license. Benchmark datasets, training dataset, evaluation methods and results are available at https://doi.org/10.5281/zenodo.6582912
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
