VirBot: an RNA viral contig detector for metagenomic data

Bioinformatics Oxford Journals - Thu, 16/02/2023 - 5:30am
Abstract
Summary: Because it does not rely on cultivation, metagenomic sequencing has greatly accelerated the detection of novel RNA viruses. However, it is not trivial to accurately identify RNA viral contigs from a mixture of species. The low content of RNA viruses in metagenomic data requires a highly specific detector, while new RNA viruses can exhibit high genetic diversity, posing a challenge for alignment-based tools. In this work, we developed VirBot, a simple yet effective RNA virus identification tool based on protein families and their corresponding adaptive score cutoffs. We benchmarked it against seven popular tools for virus identification on both simulated and real sequencing data. VirBot shows high specificity in metagenomic datasets and superior sensitivity in detecting novel RNA viruses.
Availability and implementation: https://github.com/GreyGuoweiChen/RNA_virus_detector
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
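
The VirBot abstract above describes classification based on protein families with adaptive, family-specific score cutoffs. As a rough illustration of that idea only, the sketch below keeps a contig when at least one of its protein-family hits scores at or above that family's own cutoff. The family names, cutoff values, hit format and function name are all hypothetical and are not taken from VirBot.

```python
from collections import defaultdict

# Hypothetical per-family bit-score cutoffs (illustrative values only).
FAMILY_CUTOFFS = {"RdRp_fam1": 50.0, "capsid_fam7": 80.0}


def call_rna_viral_contigs(hits, cutoffs):
    """Map each contig to the families whose hit passes that family's cutoff.

    `hits` is an iterable of (contig_id, family_id, bit_score) tuples.
    Hits to families without a known cutoff are ignored.
    """
    passing = defaultdict(list)
    for contig, family, score in hits:
        cutoff = cutoffs.get(family)
        if cutoff is not None and score >= cutoff:
            passing[contig].append(family)
    return dict(passing)


hits = [
    ("contig_1", "RdRp_fam1", 62.3),    # above its family's cutoff
    ("contig_2", "capsid_fam7", 61.0),  # below its family's stricter cutoff
    ("contig_3", "unknown_fam", 200.0), # family has no cutoff, so it is skipped
]
calls = call_rna_viral_contigs(hits, FAMILY_CUTOFFS)
```

A single global threshold would have accepted `contig_2` at 61.0 if set to 50, or rejected nothing useful if set to 80; per-family cutoffs let conserved and divergent families be treated differently.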

BioPlexR and BioPlexPy: integrated data products for the analysis of human protein interactions

Bioinformatics Oxford Journals - Thu, 16/02/2023 - 5:30am
Abstract
Summary: The BioPlex project has created two proteome-scale, cell-line-specific protein-protein interaction (PPI) networks: the first in 293T cells, including 120k interactions among 15k proteins; and the second in HCT116 cells, including 70k interactions between 10k proteins. Here, we describe programmatic access to the BioPlex PPI networks and integration with related resources from within R and Python. Besides PPI networks for 293T and HCT116 cells, this includes access to CORUM protein complex data, PFAM protein domain data, PDB protein structures, and transcriptome and proteome data for the two cell lines. The implemented functionality serves as a basis for integrative downstream analysis of BioPlex PPI data with domain-specific R and Python packages, including efficient execution of maximum scoring subnetwork analysis, protein domain-domain association analysis, mapping of PPIs onto 3D protein structures, and analysis of BioPlex PPIs at the interface of transcriptomic and proteomic data.
Availability: The BioPlex R package is available from Bioconductor (bioconductor.org/packages/BioPlex) and the BioPlex Python package is available from PyPI (pypi.org/project/bioplexpy). Applications and downstream analyses are available from GitHub (github.com/ccb-hms/BioPlexAnalysis).
Categories: Bioinformatics Trends

Using Graph Neural Networks for Site-of-Metabolism Prediction and its Applications to Ranking Promiscuous Enzymatic Products

Bioinformatics Oxford Journals - Wed, 15/02/2023 - 5:30am
Abstract
Motivation: Site-of-Metabolism (SOM) prediction has traditionally been used to identify site-specific metabolic activity within a compound so that its interaction with a metabolizing enzyme can be altered; it is also essential for analyzing the promiscuity of enzymes on substrates. The successful prediction of SOMs and the relevant promiscuous products has a wide range of applications, including the creation of extended metabolic models that account for enzyme promiscuity and the construction of novel heterologous synthesis pathways. There is therefore a need to develop generalized methods that can predict molecular SOMs for a wide range of metabolizing enzymes.
Results: This paper develops a Graph Neural Network (GNN) model that classifies whether an atom (or a bond) is an SOM. Our model, GNN-SOM, is trained on enzymatic interactions, available in the KEGG database, that span all enzyme commission numbers. We demonstrate that GNN-SOM consistently outperforms baseline Machine Learning (ML) models when trained on all enzymes, on Cytochrome P450 (CYP) enzymes, or on non-CYP enzymes. We showcase the utility of GNN-SOM in prioritizing predicted enzymatic products due to enzyme promiscuity for two biological applications: the construction of Extended Metabolic Models (EMMs) and the construction of synthesis pathways.
Availability: A Python implementation of the trained SOM predictor model can be found at https://github.com/HassounLab/GNN-SOM
Supplementary information: Not applicable
Categories: Bioinformatics Trends
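
The GNN-SOM abstract above frames SOM prediction as per-atom classification on a molecular graph. The toy sketch below shows only the basic building block of such a classifier: one round of mean-aggregation message passing followed by a logistic readout per atom. It is not the GNN-SOM architecture; the feature encoding, the weights (hand-picked, not trained) and the function names are assumptions made for illustration.

```python
import math


def message_pass(features, neighbors):
    """One round of message passing.

    Each atom's new vector is its own features concatenated with the
    mean of its neighbors' features.
    """
    updated = []
    for atom, own in enumerate(features):
        nbrs = neighbors[atom]
        if nbrs:
            mean = [sum(features[n][k] for n in nbrs) / len(nbrs)
                    for k in range(len(own))]
        else:
            mean = [0.0] * len(own)
        updated.append(list(own) + mean)
    return updated


def som_probabilities(features, neighbors, weights, bias):
    """Logistic readout: one score in (0, 1) per atom."""
    hidden = message_pass(features, neighbors)
    return [
        1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(weights, vec)) + bias)))
        for vec in hidden
    ]


# Heavy atoms of ethanol (C-C-O) with hypothetical one-hot features [is_C, is_O].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

# Hand-picked weights that reward having an oxygen neighbor (illustrative only).
weights = [0.0, 0.0, 0.0, 2.0]
bias = -1.0

probs = som_probabilities(features, neighbors, weights, bias)
```

With these weights the middle carbon, the only atom bonded to the oxygen, receives the highest score. A trained model would learn its weights from labeled reactions and stack several such rounds with nonlinearities.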

wpLogicNet: logic gate and structure inference in gene regulatory networks

Bioinformatics Oxford Journals - Wed, 15/02/2023 - 5:30am
Abstract
Motivation: The gene regulatory process resembles a logic system in which a target gene is regulated by a logic gate among its regulators. While various computational techniques have been developed for gene regulatory network (GRN) reconstruction, the study of logical relationships has received little attention. Here, we propose a novel tool called wpLogicNet that simultaneously infers both the directed GRN structures and the logic gates among genes or transcription factors (TFs) that regulate their target genes, based on continuous steady-state gene expression.
Results: wpLogicNet provides a framework to infer the logic gates among any number of regulators, with low time complexity. This distinguishes wpLogicNet from existing logic-based models, which are limited to inferring the gate between two genes or TFs. Our method applies a Bayesian mixture model to estimate the likelihood of the target gene profile and to infer the logic gate a posteriori. Furthermore, in structure-aware mode, wpLogicNet reconstructs the logic gates in TF-gene or gene-gene interaction networks with known structures. The predicted logic gates are validated on simulated datasets of TF-gene interaction networks from Escherichia coli (E. coli). For directed-edge inference, the method is validated on datasets from E. coli and the DREAM project. The results show that, compared to other well-known methods, wpLogicNet is more precise in reconstructing the network and the logical relationships among genes.
Availability and implementation: The datasets and R package of wpLogicNet are available in the GitHub repository, https://github.com/CompBioIPM/wpLogicNet.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
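
To make concrete what "inferring a logic gate among regulators" means in the wpLogicNet abstract above, the sketch below enumerates every Boolean function of the regulators and picks the one that agrees with the target on the most samples. This brute-force agreement count is a deliberately simpler stand-in, not the Bayesian mixture model the abstract describes, and it assumes already-binarized expression states, whereas wpLogicNet works on continuous steady-state data. All names and data are hypothetical.

```python
from itertools import product


def gate_truth_tables(n_regulators):
    """Yield every Boolean function of n inputs as {input tuple: output bit}."""
    inputs = list(product((0, 1), repeat=n_regulators))
    for outputs in product((0, 1), repeat=len(inputs)):
        yield dict(zip(inputs, outputs))


def best_gate(regulator_states, target_states):
    """Return the truth table matching the most samples, plus its match count."""
    n = len(regulator_states[0])
    best, best_score = None, -1
    for table in gate_truth_tables(n):
        score = sum(
            table[tuple(r)] == t
            for r, t in zip(regulator_states, target_states)
        )
        if score > best_score:
            best, best_score = table, score
    return best, best_score


# Six hypothetical samples: the target is ON only when both regulators are ON.
regulators = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (0, 1)]
target = [0, 0, 0, 1, 1, 0]

gate, matches = best_gate(regulators, target)
```

Here the recovered table is the AND gate. Exhaustive enumeration visits 2^(2^n) functions for n regulators, so it is only feasible for very small n.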

Generalizations of the Genomic Rank Distance to Indels

Bioinformatics Oxford Journals - Wed, 15/02/2023 - 5:30am
Abstract
Motivation: The rank distance model, introduced by Zanetti et al. (2016), represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications.
Results: We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions, and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the E. coli strains, a feature not seen in the reference tree.
Availability: Code and instructions available at https://github.com/meidanis-lab/rank-indel.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
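
The rank distance abstract above refers to a matrix representation of genomes but does not spell it out. The sketch below covers only the original equal-gene-content setting, under two assumptions not stated in the abstract: that a genome over n gene extremities is encoded as a symmetric 0/1 matrix (an adjacency between extremities i and j sets entries [i][j] and [j][i]; a telomere sets a 1 on the diagonal), and that the distance is d(A, B) = rank(A - B), as in Zanetti et al. (2016). The indel generalizations from the paper are not implemented, and the function names are illustrative.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank by Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def genome_matrix(n_extremities, adjacencies):
    """Build the genome matrix; unpaired extremities become telomeres."""
    a = [[0] * n_extremities for _ in range(n_extremities)]
    paired = set()
    for i, j in adjacencies:
        a[i][j] = a[j][i] = 1
        paired.update((i, j))
    for i in range(n_extremities):
        if i not in paired:
            a[i][i] = 1
    return a


def rank_distance(a, b):
    """d(A, B) = rank(A - B), assumed from Zanetti et al. (2016)."""
    diff = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
    return matrix_rank(diff)


# Two genes; extremities indexed 0 = 1t, 1 = 1h, 2 = 2t, 3 = 2h.
linear = genome_matrix(4, [(1, 2)])            # one linear chromosome: 1 2
split = genome_matrix(4, [])                   # two single-gene linear chromosomes
circular = genome_matrix(4, [(1, 2), (3, 0)])  # one circular chromosome

d_cut = rank_distance(linear, split)
d_join = rank_distance(linear, circular)
d_both = rank_distance(split, circular)
```

In this example a single cut or join changes the distance by one, and the split and circular genomes, which differ by two joins, are at distance two.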

