DrForna: visualization of cotranscriptional folding
Abstract
Motivation: Understanding RNA folding at the level of secondary structures can give important insights concerning the function of a molecule. We are interested to learn how secondary structures change dynamically during transcription, as well as whether particular secondary structures form already during or only after transcription. While different approaches exist to simulate cotranscriptional folding, the current strategies for visualization are lagging behind. New, more suitable approaches are necessary to help explore the data generated by cotranscriptional folding simulations.
Results: We present DrForna, an interactive visualization app for viewing the time course of a cotranscriptional RNA folding simulation. Specifically, users can scroll along the time axis and see the population of structures that are present at any particular time point.
Availability: DrForna is a JavaScript project available on GitHub at https://github.com/ViennaRNA/drforna and deployed at https://viennarna.github.io/drforna.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
An Extensive Benchmark Study on Biomedical Text Generation and Mining with ChatGPT
Abstract
Motivation: In recent years, the development of natural language processing (NLP) technologies and deep learning hardware has led to significant improvements in large language models (LLMs). ChatGPT, the state-of-the-art LLM built on GPT-3.5 and GPT-4, shows excellent capabilities in general language understanding and reasoning. Researchers have also tested the GPT models on a variety of NLP-related tasks and benchmarks and obtained excellent results. Given its strong performance in everyday conversation, researchers have begun to explore ChatGPT's capacity in fields that require professional training; here we focus on the biomedical domain.
Results: To evaluate the performance of ChatGPT on biomedical tasks, this paper presents a comprehensive benchmark study on the use of ChatGPT for biomedical corpora, including article abstracts, clinical trial descriptions, biomedical questions and so on. Typical NLP tasks such as named entity recognition, relation extraction, sentence similarity, question answering, and document classification are included. Overall, ChatGPT got a BLURB score of 58.50 while the state-of-the-art model had a score of 84.30. Through a series of experiments, we demonstrated the effectiveness and versatility of ChatGPT in biomedical text understanding, reasoning and generation, as well as the limitations of ChatGPT built on GPT-3.5.
Availability and Implementation: All the datasets are available from the BLURB benchmark at https://microsoft.github.io/BLURB/index.html. The prompts are described in the article.
Categories: Bioinformatics Trends
FunTaxIS-lite: a simple and light solution to investigate protein functions in all living organisms
Abstract
Motivation: Defining the full domain of protein functions belonging to an organism is a complex challenge that is due to the huge heterogeneity of the taxonomy, where single or small groups of species can bear unique functional characteristics. FunTaxIS-lite provides a solution to this challenge by determining taxon-based constraints on Gene Ontology (GO) terms, which specify the functions that an organism can or cannot perform. The tool employs a set of rules to generate and spread the constraints across both the taxon hierarchy and the GO graph.
Results: The taxon-based constraints produced by FunTaxIS-lite extend those provided by the Gene Ontology Consortium by an average of 300%. The implementation of these rules significantly reduces errors in function predictions made by automatic algorithms and can assist in correcting inconsistent protein annotations in databases.
Availability: FunTaxIS-lite is available on https://www.medcomp.medicina.unipd.it/funtaxis-lite and from https://github.com/MedCompUnipd/FunTaxIS-lite.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
scAce: an adaptive embedding and clustering method for single-cell gene expression data
Abstract
Motivation: Since the development of single-cell RNA sequencing (scRNA-seq) technologies, clustering analysis of single-cell gene expression data has been an essential tool for distinguishing cell types and identifying novel cell types. Although many methods are available for scRNA-seq clustering analysis, the majority of them are constrained by the requirement for predetermined cluster numbers or the dependence on a selected initial cluster assignment.
Results: In this article, we propose an adaptive embedding and clustering method named scAce, which constructs a variational autoencoder to simultaneously learn cell embeddings and cluster assignments. In the scAce method, we develop an adaptive cluster merging approach which achieves improved clustering results without the need to estimate the number of clusters in advance. Additionally, scAce provides an option to perform clustering enhancement, which can update and enhance cluster assignments based on previous clustering results from other methods. Based on computational analysis of both simulated and real datasets, we demonstrate that scAce outperforms state-of-the-art clustering methods for scRNA-seq data, and achieves better clustering accuracy and robustness.
Availability and implementation: The scAce package is implemented in Python 3.8 and is freely available from https://github.com/sldyns/scAce.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Automated machine learning for Genome Wide Association Studies
Abstract
Motivation: Genome Wide Association Studies (GWAS) present several computational and statistical challenges for their data analysis, including knowledge discovery, interpretability, and translation to clinical practice.
Results: We develop, apply, and comparatively evaluate an Automated Machine Learning (AutoML) approach, customized for genomic data, that delivers reliable predictive and diagnostic models, the set of genetic variants that are important for predictions (called a biosignature), and an estimate of the out-of-sample predictive power. This AutoML approach discovers variants with higher predictive performance compared to standard GWAS methods, computes an individual risk prediction score, generalizes to new, unseen data, is shown to better differentiate causal variants from other highly correlated variants, and enhances knowledge discovery and interpretability by reporting multiple equivalent biosignatures.
Availability: Code for this paper is available at: https://github.com/mensxmachina/autoML-GWAS. JADBio offers a free version at: https://jadbio.com/sign-up/. SNP data can be downloaded from the EGA repository (https://ega-archive.org/). PRS data are found at: https://www.aicrowd.com/challenges/opensnp-height-prediction. Simulation data to study population structure can be found at: https://easygwas.ethz.ch/data/public/dataset/view/1/
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
DeepMHCI: An anchor position-aware deep interaction model for accurate MHC-I peptide binding affinity prediction
Abstract
Motivation: Computationally predicting MHC-I peptide binding affinity is an important problem in immunological bioinformatics, which is also crucial for the identification of neoantigens for personalized therapeutic cancer vaccines. Recent cutting-edge deep learning-based methods for this problem cannot achieve satisfactory performance, especially for non-9-mer peptides. This is because such methods generate the input by simply concatenating the two given sequences: a peptide and (the pseudo sequence of) an MHC class I molecule, which cannot precisely capture the anchor positions of the MHC binding motif for peptides of variable length. We thus developed an anchor position-aware and high-performance deep model, DeepMHCI, with a position-wise gated layer and a residual binding interaction convolution layer. This allows the model to control the information flow in peptides to be aware of anchor positions and to model the interactions between peptides and the MHC pseudo (binding) sequence directly with multiple convolutional kernels.
Results: The performance of DeepMHCI has been thoroughly validated by extensive experiments on four benchmark datasets under various settings, such as five-fold cross-validation, validation with the independent testing set, external HPV vaccine identification and external CD8+ epitope identification. Experimental results with visualization of binding motifs demonstrate that DeepMHCI outperformed all competing methods, especially on non-9-mer peptide binding prediction.
Availability: DeepMHCI is publicly available at https://github.com/ZhuLab-Fudan/DeepMHCI.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Reducing Cost in DNA-based Data Storage by Sequence Analysis-aided Soft Information Decoding of Variable-Length Reads
Abstract
Motivation: DNA-based data storage is one of the most attractive research areas for future archival storage. However, it faces the problems of high writing and reading costs for practical use. There have been many efforts to resolve this problem, but existing schemes are not fully suitable for DNA-based data storage, and more cost reduction is needed.
Results: We propose whole encoding and decoding procedures for DNA storage. The encoding procedure consists of a carefully designed single low-density parity-check code as an inter-oligo code, which corrects errors and dropouts efficiently. We apply new clustering and alignment methods that operate on variable-length reads to aid the decoding performance. We use edit distance and quality scores during the sequence analysis-aided decoding procedure, which can discard abnormal reads and utilize high-quality soft information. We store 548.83 KB of an image file in DNA oligos and achieve a writing cost reduction of 7.46% and a significant reading cost reduction of 26.57% and 19.41% compared to the two previous works.
Availability and implementation: Data and codes for all the algorithms proposed in this study are available at: https://github.com/sjpark0905/DNA-LDPC-codes.
Supplementary information: Supplementary data are available at Bioinformatics online.
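The DNA storage abstract above mentions using edit distance to discard abnormal reads. As a rough illustration of that one idea only, the sketch below computes the Levenshtein distance with the standard dynamic programme and drops reads that are too far from a reference sequence. The function names, the choice of reference, and the threshold are all assumptions made for this example; the paper's actual clustering, alignment, and quality-score handling are not reproduced here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions cost 1 each)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def filter_reads(reads, reference, max_distance):
    """Keep reads within max_distance edits of the reference (hypothetical rule)."""
    return [r for r in reads if edit_distance(r, reference) <= max_distance]


# Toy data: the reference and threshold are illustrative, not from the paper.
reads = ["ACGT", "ACGA", "TTTTTTTT"]
kept = filter_reads(reads, reference="ACGT", max_distance=1)
print(kept)
```

Because reads in DNA storage vary in length, an edit-based distance is a natural fit: unlike Hamming distance, it is defined for sequences of different lengths.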
Categories: Bioinformatics Trends
PyDESeq2: a python package for bulk RNA-seq differential expression analysis
Abstract
Summary: We present PyDESeq2, a Python implementation of the DESeq2 workflow for differential expression analysis on bulk RNA-seq data. This re-implementation yields similar, but not identical, results: it achieves higher model likelihood, allows speed improvements on large datasets, as shown in experiments on TCGA data, and can be more easily interfaced with modern Python-based data science tools.
Availability and Implementation: PyDESeq2 is released as open-source software under the MIT license. The source code is available on GitHub at https://github.com/owkin/PyDESeq2 and documented at https://pydeseq2.readthedocs.io. PyDESeq2 is part of the scverse ecosystem.
Categories: Bioinformatics Trends
Multimodal learning of noncoding variant effects using genome sequence and chromatin structure
Abstract
Motivation: A growing number of noncoding genetic variants, including single-nucleotide polymorphisms (SNPs), are found to be associated with complex human traits and diseases. Their mechanistic interpretation is relatively limited and could benefit from computational prediction of their effects on epigenetic profiles. However, current models often focus on local, 1D genome sequence determinants and disregard global, 3D chromatin structure that critically affects epigenetic events.
Results: We find that noncoding variants with unexpectedly high similarity in epigenetic profiles, given their relatively low similarity in local sequences, can be largely attributed to their proximity in chromatin structure. Accordingly, we have developed a multimodal deep learning scheme that incorporates both 1D genome sequence data and 3D chromatin structure data for predicting noncoding variant effects. Specifically, we have integrated convolutional and recurrent neural networks for sequence embedding and graph neural networks for structure embedding despite the resolution gap between the two types of data, while utilizing recent DNA language models. Numerical results show that our models outperform competing sequence-only models in predicting epigenetic profiles and that their use of long-range interactions complements sequence-only models in extracting regulatory motifs. They prove to be excellent predictors for noncoding variant effects in gene expression and pathogenicity, whether in unsupervised “zero-shot” learning or supervised “few-shot” learning.
Availability: Codes and data can be accessed at https://github.com/Shen-Lab/ncVarPred-1D3D and https://zenodo.org/record/7975777.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Phylogenetic inference using Generative Adversarial Networks
Abstract
Motivation: The application of machine learning approaches in phylogenetics has been impeded by the vast model space associated with inference. Supervised machine learning approaches require data from across this space to train models. Because of this, previous approaches have typically been limited to inferring relationships among unrooted quartets of taxa, where there are only three possible topologies. Here, we explore the potential of generative adversarial networks (GANs) to address this limitation. GANs consist of a generator and a discriminator: at each step, the generator aims to create data that is similar to real data, while the discriminator attempts to distinguish generated and real data. By using an evolutionary model as the generator, we use GANs to make evolutionary inferences. Since a new model can be considered at each iteration, heuristic searches of complex model spaces are possible. Thus, GANs offer a potential solution to the challenges of applying machine learning in phylogenetics.
Results: We developed phyloGAN, a GAN that infers phylogenetic relationships among species. phyloGAN takes as input a concatenated alignment, or a set of gene alignments, and infers a phylogenetic tree either considering or ignoring gene tree heterogeneity. We explored the performance of phyloGAN for up to fifteen taxa in the concatenation case and six taxa when considering gene tree heterogeneity. Error rates are relatively low in these simple cases. However, run times are slow and performance metrics suggest issues during training. Future work should explore novel architectures that may result in more stable and efficient GANs for phylogenetics.
Availability: phyloGAN is available on GitHub: https://github.com/meganlsmith/phyloGAN/.
Categories: Bioinformatics Trends
Towards precise PICO extraction from abstracts of randomized controlled trials using a section-specific learning approach
Abstract
Motivation: Automated extraction of participants, intervention, comparison/control, and outcome (PICO) from randomized controlled trial (RCT) abstracts is important for evidence synthesis. Previous studies have demonstrated the feasibility of applying natural language processing (NLP) for PICO extraction. However, the performance is not optimal due to the complexity of PICO information in RCT abstracts and the challenges involved in their annotation.
Results: We propose a two-step NLP pipeline to extract PICO elements from RCT abstracts: (i) sentence classification using a prompt-based learning model and (ii) PICO extraction using a named entity recognition (NER) model. First, the sentences in abstracts were categorized into four sections, namely background, methods, results, and conclusions. Next, the NER model was applied to extract the PICO elements from the sentences within the title and methods sections, which include >96% of PICO information. We evaluated our proposed NLP pipeline on three datasets: the EBM-NLPmod dataset (a randomly selected and reannotated dataset of 500 RCT abstracts from the EBM-NLP corpus), a dataset of 150 COVID-19 RCT abstracts, and a dataset of 150 Alzheimer’s disease (AD) RCT abstracts. The end-to-end evaluation reveals that our proposed approach achieved an overall micro F1 score of 0.833 on the EBM-NLPmod dataset, 0.928 on the COVID-19 dataset, and 0.899 on the AD dataset when measured at the token level, and an overall micro F1 score of 0.712 on the EBM-NLPmod dataset, 0.850 on the COVID-19 dataset, and 0.805 on the AD dataset when measured at the entity level.
Availability: Our codes and datasets are publicly available at https://github.com/BIDS-Xu-Lab/section_specific_annotation_of_PICO
Supplementary information: Supplementary data are available at Bioinformatics online.
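The PICO abstract above reports micro F1 at both the token level and the entity level, and the entity-level scores are consistently lower. The sketch below shows, on invented toy annotations, why that gap arises: a predicted span with a slightly wrong boundary earns partial credit per token but no credit as an entity. The span format `(label, start, end)` with an exclusive end, and the exact-match rule for entities, are assumptions for this illustration; the paper's evaluation script may differ in detail.

```python
def micro_f1(gold: set, pred: set) -> float:
    """Micro-averaged F1 over pooled items (all labels counted together)."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def spans_to_tokens(spans: set) -> set:
    """Expand (label, start, end) spans into (label, token_index) pairs."""
    return {(label, i) for (label, start, end) in spans for i in range(start, end)}


# Toy annotations (hypothetical): "P" = participants, "I" = intervention.
gold_spans = {("P", 0, 3), ("I", 5, 7)}
pred_spans = {("P", 0, 2), ("I", 5, 7)}  # the P span is one token short

# Entity level: a span counts only if label and both boundaries match exactly.
entity_f1 = micro_f1(gold_spans, pred_spans)

# Token level: each labelled token is scored on its own.
token_f1 = micro_f1(spans_to_tokens(gold_spans), spans_to_tokens(pred_spans))

print(f"entity-level F1 = {entity_f1:.3f}")  # 0.500
print(f"token-level F1  = {token_f1:.3f}")   # 0.889
```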
Categories: Bioinformatics Trends
Genome-wide multi-mediator analyses using the generalized Berk–Jones statistics with the composite test
Abstract
Motivation: Mediation analysis is performed to evaluate the effects of a hypothetical causal mechanism that marks the progression from an exposure, through mediators, to an outcome. In the age of high-throughput technologies, it has become routine to assess numerous potential mechanisms at the genome or proteome scales. Alongside this, the necessity to address issues related to multiple testing has also arisen. In a sparse scenario where only a few genes or proteins are causally involved, conventional methods for assessing mediation effects lose statistical power because the composite null distribution behind this experiment cannot be attained. The power loss hence reduces the number of true mechanisms identified after multiple testing corrections. To fairly delineate a uniform distribution under the composite null, Huang (2019, AoAS) proposed the composite test to provide adjusted p-values for single-mediator analyses.
Results: Our contribution is to extend the method to multi-mediator analyses, which are commonly encountered in genomic studies and are also flexible to various biological interests. Using the generalized Berk–Jones statistics with the composite test, we proposed a multivariate approach that favors dense and diverse mediation effects, a decorrelation approach that favors sparse and consistent effects, and a hybrid approach that captures the advantages of both approaches. Our analysis suite has been implemented as an R package, MACtest. The utility is demonstrated by analyzing the lung adenocarcinoma datasets from The Cancer Genome Atlas and Clinical Proteomic Tumor Analysis Consortium. We further investigate the genes and networks whose expression may be regulated by smoking-induced epigenetic aberrations.
Availability and Implementation: An R package, MACtest, is available on https://github.com/roqe/MACtest
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
Machine learning-based quantification for disease uncertainty increases the statistical power of genetic association studies
Abstract
Motivation: Allowing for increasingly large samples is key to identifying the association of genetic variants with Alzheimer’s disease (AD) in genome-wide association studies (GWAS). Accordingly, we aimed to develop a method that incorporates patients with mild cognitive impairment (MCI) and unknown cognitive status in GWAS using a machine learning-based AD prediction model.
Results: Simulation analyses showed that the weighting imputed phenotypes (WIP) method increased the statistical power compared to ordinary logistic regression using only AD cases and controls. Applied to real-world data, the penalized logistic method had the highest AUC (0.96) for AD prediction and the WIP method performed well in terms of power. We identified an association (p < 5.0×10⁻⁸) of AD with several variants in the APOE region and rs143625563 in LMX1A. Our method, which allows the inclusion of individuals with MCI, improves the statistical power of GWAS for AD. We discovered a novel association with LMX1A.
Availability and implementation: Simulation codes can be accessed at https://github.com/Junkkkk/wGEE_GWAS.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
AliSim-HPC: parallel sequence simulator for phylogenetics
Abstract
Motivation: Sequence simulation plays a vital role in phylogenetics with many applications, such as evaluating phylogenetic methods, testing hypotheses, and generating training data for machine-learning applications. We recently introduced a new simulator for multiple sequence alignments called AliSim, which outperformed existing tools. However, with the increasing demands of simulating large data sets, AliSim is still slow due to its sequential implementation; for example, to simulate millions of sequence alignments, AliSim took several days or weeks. Parallelization has been used for many phylogenetic inference methods but not yet for sequence simulation.
Results: This paper introduces AliSim-HPC, which, for the first time, employs high-performance computing for phylogenetic simulations. AliSim-HPC parallelizes the simulation process at both multi-core and multi-CPU levels using the OpenMP and MPI libraries, respectively. AliSim-HPC is highly efficient and scalable: it reduces the runtime to simulate 100 large gap-free alignments (30,000 sequences of one million sites) from over one day to 11 minutes using 256 CPU cores from a cluster with 6 computing nodes, a 153-fold speedup. While the OpenMP version can only simulate gap-free alignments, the MPI version supports insertion-deletion models like the sequential AliSim.
Availability: AliSim-HPC is open source and available as part of the new IQ-TREE version v2.2.3 at https://github.com/iqtree/iqtree2/releases with a user manual at http://www.iqtree.org/doc/AliSim.
Supplementary information: Supplementary data are available at Bioinformatics online.
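The figures quoted in the AliSim-HPC abstract above (11 minutes, 256 cores, 153-fold speedup) can be checked against each other with the usual definitions of speedup and parallel efficiency. The short calculation below is a consistency check only; the implied sequential runtime and the efficiency are derived here and are not numbers reported in the abstract.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Speedup = sequential runtime divided by parallel runtime."""
    return serial_time / parallel_time


def efficiency(speedup_factor: float, cores: int) -> float:
    """Parallel efficiency = speedup divided by the number of cores used."""
    return speedup_factor / cores


# Values taken from the abstract.
parallel_minutes = 11
reported_speedup = 153
cores = 256

# Derived values (not stated in the abstract).
implied_serial_minutes = reported_speedup * parallel_minutes
implied_serial_hours = implied_serial_minutes / 60
parallel_efficiency = efficiency(reported_speedup, cores)

print(implied_serial_minutes)          # 1683
print(round(implied_serial_hours, 2))  # 28.05, consistent with "over one day"
print(round(parallel_efficiency, 2))   # 0.6
```

An efficiency near 0.6 on 256 cores means that each core contributes roughly 60% of what it would under ideal linear scaling.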
Categories: Bioinformatics Trends
Tracking and curating putative SARS-CoV-2 recombinants with RIVET
Abstract
Motivation: Identifying and tracking recombinant strains of SARS-CoV-2 is critical to understanding the evolution of the virus and controlling its spread. But confidently identifying SARS-CoV-2 recombinants from the thousands of new genome sequences that are shared online every day is quite challenging, causing many recombinants to be missed or to suffer weeks of delay in being formally identified while undergoing expert curation.
Results: We present RIVET, a software pipeline and visual platform that takes advantage of recent algorithmic advances in recombination inference to comprehensively and sensitively search for potential SARS-CoV-2 recombinants and organize the relevant information in a web interface that would greatly help accelerate the process of identifying and tracking recombinants.
Availability: A RIVET-based web interface displaying the most updated analysis of potential SARS-CoV-2 recombinants is available at https://rivet.ucsd.edu/. RIVET’s frontend and backend code is freely available under the MIT license at https://github.com/TurakhiaLab/rivet and the documentation for RIVET is available at https://turakhialab.github.io/rivet/. The inputs necessary for running RIVET’s backend workflow for SARS-CoV-2 are available through a public database maintained and updated daily by UCSC (https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
DecentTree: Scalable Neighbour-Joining for the Genomic Era
Abstract
Motivation: Neighbour-Joining (NJ) is one of the most widely used distance-based phylogenetic inference methods. However, current implementations do not scale well for datasets with more than 10,000 sequences. Given the increasing pace of generating new sequence data, particularly in outbreaks of emerging diseases, and the already enormous existing databases of sequence data for which NJ is a useful approach, new implementations of existing methods are warranted.
Results: Here we present DecentTree, which provides highly optimised and parallel implementations of Neighbour-Joining and several of its variants. DecentTree is designed as a stand-alone application and a header-only library easily integrated with other phylogenetic software (e.g., it is integral in the popular IQ-TREE software). We show that DecentTree achieves similar or improved performance over existing software (BIONJ, Quicktree, FastME, and RapidNJ), especially for handling very large alignments. For example, DecentTree is up to 6-fold faster than the fastest existing Neighbour-Joining software (e.g., RapidNJ) when generating a tree of 64,000 SARS-CoV-2 genomes.
Availability: DecentTree is open source and freely available at https://github.com/iqtree/decenttree. All code and data used in this analysis are available on GitHub (https://github.com/asdcid/Comparison-of-neighbour-joining-software).
Supplementary information: Supplementary data are available at Bioinformatics online.
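For readers unfamiliar with the method that DecentTree accelerates, the following is a minimal sketch of the textbook Neighbour-Joining algorithm in plain Python. It is the unoptimised cubic-time version and is not DecentTree's implementation; the function name, the `node0`, `node1`, ... labels for internal nodes, and the five-taxon distance matrix are illustrative assumptions.

```python
def neighbour_joining(names, matrix):
    """Return the edges (node, node, branch_length) of an unrooted NJ tree."""
    # Working copy of the distances as a dict of dicts keyed by node name.
    d = {a: {b: matrix[i][j] for j, b in enumerate(names) if i != j}
         for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        # r[a] = total distance from a to every other active node.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimises Q(a, b) = (n - 2) d(a, b) - r[a] - r[b].
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{len(edges) // 2}"  # label for the new internal node
        # Branch lengths from a and b to the new node.
        length_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        length_b = d[a][b] - length_a
        edges += [(a, u, length_a), (b, u, length_b)]
        # Distances from the new node to every remaining node.
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Connect the last two nodes directly.
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges


# Small additive example with five taxa (illustrative data).
taxa = ["a", "b", "c", "d", "e"]
distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbour_joining(taxa, distances)
for edge in tree:
    print(edge)
```

Each round scans all pairs of active nodes, so the sketch takes time cubic in the number of sequences; this cost is the bottleneck that scalable implementations aim to reduce for tens of thousands of genomes.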
Categories: Bioinformatics Trends
GraphCpG: Imputation of Single-cell Methylomes Based on Locus-aware Neighboring Subgraphs
Abstract
Motivation: Single-cell DNA methylation sequencing can assay DNA methylation at single-cell resolution. However, incomplete coverage compromises related downstream analyses, underlining the importance of imputation techniques. With a rising number of cell samples in recent large datasets, scalable and efficient imputation models are critical to addressing the sparsity for genome-wide analyses.
Results: We proposed a novel graph-based deep learning approach to impute methylation matrices based on locus-aware neighboring subgraphs with locus-aware encoding orienting on one cell type. Using only the CpG methylation matrix, the resulting GraphCpG outperforms previous methods on datasets containing more than hundreds of cells and achieves competitive performance on smaller datasets, with subgraphs of predicted sites visualized by retrievable bipartite graphs. Besides better imputation performance with increasing cell numbers, it significantly reduces computation time and demonstrates improvement in downstream analysis.
Availability: The source code is freely available at https://github.com/yuzhong-deng/graphcpg.git.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
DCAlign v1.0: Aligning biological sequences using co-evolution models and informed priors
Abstract
Summary: DCAlign is a new alignment method able to cope with the conservation and the co-evolution signals that characterize the columns of multiple sequence alignments of homologous sequences. However, the pre-processing steps required to align a candidate sequence are computationally demanding. We show in v1.0 how to dramatically reduce the overall computing time by including an empirical prior over an informative set of variables mirroring the presence of insertions and deletions.
Availability and implementation: DCAlign v1.0 is implemented in Julia and is fully available at https://github.com/infernet-h2020/DCAlign
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
HAPNEST: efficient, large-scale generation and evaluation of synthetic datasets for genotypes and phenotypes
Abstract
Motivation: Existing methods for simulating synthetic genotype and phenotype datasets have limited scalability, constraining their usability for large-scale analyses. Moreover, a systematic approach for evaluating synthetic data quality and a benchmark synthetic dataset for developing and evaluating methods for polygenic risk scores are lacking.
Results: We present HAPNEST, a novel approach for efficiently generating diverse individual-level genotypic and phenotypic data. In comparison to alternative methods, HAPNEST shows faster computational speed and a lower degree of relatedness with reference panels, while generating datasets that preserve key statistical properties of real data. These desirable synthetic data properties enabled us to generate 6.8 million common variants and nine phenotypes with varying degrees of heritability and polygenicity across 1 million individuals. We demonstrate how HAPNEST can facilitate biobank-scale analyses through the comparison of seven methods for generating polygenic risk scores across multiple ancestry groups and different genetic architectures.
Availability and Implementation: A synthetic dataset of 1,008,000 individuals and 9 traits for 6.8 million common variants is available at https://www.ebi.ac.uk/biostudies/studies/S-BSST936. The HAPNEST software for generating synthetic datasets is available as Docker/Singularity containers and open source Julia and C code at https://github.com/intervene-EU-H2020/synthetic_data.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
A neighborhood-regularization method leveraging multi-view data for predicting the frequency of drug side effects
Abstract
Motivation: A critical issue in drug benefit-risk assessment is to determine the frequency of side effects, which is performed by randomized controlled trials. Computationally predicted frequencies of drug side effects can be used to effectively guide the randomized controlled trials. However, it is more challenging to predict drug side effect frequencies, and thus only a few studies cope with this problem.
Results: In this work, we propose a neighborhood-regularization method (NRFSE) that leverages multi-view data on drugs and side effects to predict the frequency of side effects. First, we adopt a class-weighted non-negative matrix factorization to decompose the drug-side effect frequency matrix, in which Gaussian likelihood is used to model unknown drug-side effect pairs. Second, we design a multi-view neighborhood regularization to integrate three drug attributes and two side effect attributes, respectively, which makes the most similar drugs and the most similar side effects have similar latent signatures. The regularization can adaptively determine the weights of different attributes. We conduct extensive experiments on one benchmark dataset, and NRFSE improves the prediction performance compared with five state-of-the-art approaches. An independent test set of post-marketing side effects further validates the effectiveness of NRFSE.
Availability: Source code and datasets are available at https://github.com/linwang1982/NRFSE or https://codeocean.com/capsule/4741497/tree/v1.
Supplementary information: Supplementary data are available at Bioinformatics online.
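The first step described in the NRFSE abstract above, a weighted non-negative matrix factorization of the drug-side effect frequency matrix, can be sketched with the well-known multiplicative update rules for weighted NMF. This is a simplified illustration under stated assumptions, not the authors' method: it omits the Gaussian likelihood model for unknown pairs and the multi-view neighborhood regularization, and the toy matrix, the weight of 0.05 for unknown entries, and all function names are invented for this example.

```python
import random


def weighted_error(X, M, W, H):
    """Weighted squared reconstruction error: sum of M * (X - W H)^2."""
    rank = len(H)
    return sum(
        M[i][j] * (X[i][j] - sum(W[i][k] * H[k][j] for k in range(rank))) ** 2
        for i in range(len(X)) for j in range(len(X[0]))
    )


def weighted_nmf(X, M, rank, iters=200, seed=0, eps=1e-9):
    """Factorise X ~ W H with W, H >= 0 using weighted multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]

    def product():
        return [[sum(W[i][k] * H[k][j] for k in range(rank)) for j in range(m)]
                for i in range(n)]

    for _ in range(iters):
        P = product()  # current reconstruction, used for the W update
        for i in range(n):
            for k in range(rank):
                num = sum(M[i][j] * X[i][j] * H[k][j] for j in range(m))
                den = sum(M[i][j] * P[i][j] * H[k][j] for j in range(m)) + eps
                W[i][k] *= num / den
        P = product()  # refreshed reconstruction, used for the H update
        for k in range(rank):
            for j in range(m):
                num = sum(M[i][j] * X[i][j] * W[i][k] for i in range(n))
                den = sum(M[i][j] * P[i][j] * W[i][k] for i in range(n)) + eps
                H[k][j] *= num / den
    return W, H


# Toy frequency matrix (rows: drugs, columns: side effects); 0 = unknown pair.
X = [[5, 3, 0, 1],
     [4, 0, 0, 1],
     [1, 1, 0, 5],
     [0, 1, 5, 4]]
# Hypothetical weights: observed entries count fully, unknown ones only slightly.
M = [[1.0 if x > 0 else 0.05 for x in row] for row in X]

W0, H0 = weighted_nmf(X, M, rank=2, iters=0)   # random initial factors
W, H = weighted_nmf(X, M, rank=2, iters=200)   # fitted factors
print(weighted_error(X, M, W0, H0), weighted_error(X, M, W, H))
```

The product of the fitted factors gives a predicted frequency score for every drug-side effect pair, including those marked unknown in `X`.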
Categories: Bioinformatics Trends