Deep learning models for RNA secondary structure prediction (probably) do not generalise across families

Bioinformatics Oxford Journals - Fri, 24/06/2022 - 5:30am
Motivation: The secondary structure of RNA is important to its function. Over the last few years, several papers have attempted to use machine learning to improve de novo RNA secondary structure prediction. Many of these papers report impressive results for intra-family predictions, but seldom address the much more difficult (and practical) inter-family problem.
Results: We demonstrate that it is nearly trivial with convolutional neural networks to generate pseudo-free energy changes, modeled after structure mapping data, that improve the accuracy of structure prediction for intra-family cases. We propose a more rigorous method for inter-family cross-validation that can be used to assess the performance of learning-based models. Using this method, we further demonstrate that intra-family performance is insufficient proof of generalisation, despite the widespread assumption in the literature, and provide strong evidence that many existing learning-based models have not generalised inter-family.
Availability: Source code and data are available at https://github.com/marcellszi/dl-rna.
Supplementary information: Supplementary data are available at Bioinformatics online.
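The core requirement of inter-family evaluation is that no RNA family appears in both the training and the test set. The sketch below shows one generic way to build such family-held-out folds; it is not the authors' cross-validation protocol, and the (sequence_id, family) input format is a hypothetical assumption.

```python
from collections import defaultdict


def family_folds(records, n_folds):
    """Partition sequence IDs into n_folds so each RNA family lands in exactly one fold.

    records: list of (sequence_id, family) pairs (hypothetical input format).
    Families are assigned greedily, largest first, to the currently smallest fold.
    """
    by_family = defaultdict(list)
    for seq_id, family in records:
        by_family[family].append(seq_id)
    folds = [[] for _ in range(n_folds)]
    for family in sorted(by_family, key=lambda f: (-len(by_family[f]), f)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_family[family])
    return folds


def train_test_splits(records, n_folds):
    """Yield (train_ids, test_ids) pairs that never share a family."""
    folds = family_folds(records, n_folds)
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, test
```

By contrast, a random split of sequences would usually place members of the same family on both sides, which only measures intra-family performance.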
Categories: Bioinformatics Trends

MIAMI: Mutual Information-based Analysis of Multiplex Imaging data

Bioinformatics Oxford Journals - Fri, 24/06/2022 - 5:30am
Motivation: Studying the interaction or co-expression of proteins or markers in the tumor microenvironment (TME) of cancer subjects can be crucial in assessing risks such as death or recurrence. In the conventional approach, each cell must be declared positive or negative for a marker based on its intensity. With multiple markers, a manual threshold is required for each marker, which can become cumbersome. The performance of the subsequent analysis relies heavily on this step, so it suffers from subjectivity and lacks robustness.
Results: We present a new method in which different marker intensities are viewed as dependent random variables, and the mutual information (MI) between them is used as a metric of co-expression. Estimating the joint density, as required by the traditional form of MI, becomes increasingly challenging as the number of markers increases. We consider an alternative formulation of MI that is conceptually similar but has an efficient estimation technique, for which we develop a new generalization. With the proposed method, we analyzed a lung cancer dataset and found the co-expression of the markers HLA-DR and CK to be associated with survival. We also analyzed a triple-negative breast cancer dataset and found the co-expression of the immuno-regulatory proteins PD1, PD-L1, Lag3 and IDO to be associated with disease recurrence. We demonstrated the robustness of our method through several simulation studies.
Availability: The associated R package is available at https://github.com/sealx017/MIAMI.
Supplementary information: Supplementary material is attached.
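To make "MI as a co-expression metric" concrete, here is a minimal sketch of the traditional plug-in estimate for two markers: bin both intensity vectors and compare the joint histogram with the product of the marginals. This is exactly the joint-density approach the abstract says scales poorly with many markers; MIAMI's alternative formulation is not reproduced, and the equal-width binning with 8 bins is an arbitrary assumption.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, n_bins=8):
    """Plug-in MI estimate (in nats) between two marker intensity vectors.

    Both vectors are discretised into n_bins equal-width bins; MI is then
    sum over cells of p(i, j) * log(p(i, j) / (p(i) * p(j))).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # a constant marker falls into a single bin
        return [min(int((v - lo) / width), n_bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # With counts: (c / n) * log(c * n / (count_x * count_y))
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi
```

A marker compared with itself yields high MI, whereas unrelated markers yield values near zero (plus a small positive bias from the finite sample).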
Categories: Bioinformatics Trends

XSI - A genotype compression tool for compressive genomics in large biobanks

Bioinformatics Oxford Journals - Fri, 24/06/2022 - 5:30am
Motivation: The generation of genotype data has grown exponentially over the last decade. The large size of recent datasets brings a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data, and we show how this can improve future analyses.
Results: We show that XSI allows a file size reduction of 4-20x compared to compressed BCF, and we demonstrate its potential for “compressive genomics” on the UK Biobank whole-genome sequencing genotypes, with 8x faster loading times, 5x faster runs of homozygosity computation, 30x faster dot-product computation, and 280x faster allele counts.
Availability: The xSqueezeIt file format (XSI) specifications, API, and command-line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt.
Supplementary information: Supplementary materials are available at Bioinformatics online.
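One general idea behind computing on compressed genotypes is that rare variants are cheap to store as the list of haplotypes carrying the ALT allele, and some statistics can be read directly from that list. The sketch below illustrates this with allele counts and dot products on 0/1 haplotype vectors; it is an assumption-laden illustration, not the XSI format or its API.

```python
def to_sparse(haplotypes_by_variant):
    """Encode each variant as the indices of haplotypes carrying the ALT allele (1)."""
    return [[i for i, allele in enumerate(row) if allele == 1] for row in haplotypes_by_variant]


def allele_counts(sparse):
    """ALT allele count per variant, read directly from the sparse encoding."""
    return [len(carriers) for carriers in sparse]


def sparse_dot(carriers_a, carriers_b):
    """Dot product of two 0/1 vectors equals the size of the intersection of their carrier sets."""
    return len(set(carriers_a) & set(carriers_b))
```

For a variant carried by 3 out of 500,000 haplotypes, the allele count costs three list entries instead of a scan over half a million values.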
Categories: Bioinformatics Trends

EasyGDB, a low-maintenance and highly customizable system to develop genomics portals

Bioinformatics Oxford Journals - Fri, 24/06/2022 - 5:30am
Summary: EasyGDB is an easy-to-implement, low-maintenance tool developed to create genomic data management web platforms. It can be used for any species, group of species, or multiple genome or annotation versions. EasyGDB provides a framework to develop a web portal that includes general information about species, projects and members, and bioinformatics tools such as file downloads, BLAST, a genome browser, annotation search, gene expression visualization, annotation and sequence download, and gene ID and ortholog lookup. The code of EasyGDB facilitates data maintenance and updates for inexperienced bioinformaticians, using BLAST databases to store and retrieve sequence data in gene annotation pages and bioinformatics tools, and JSON files to customize metadata. EasyGDB is highly customizable: any section or tool can be enabled or disabled like a switch through a single configuration file. This tool aims to simplify the development of genomics portals for non-model species, providing a modern web style with embedded interactive bioinformatics tools that cover all the common needs of genomics projects.
Availability and Implementation: https://github.com/noefp/easy_gdb.
Categories: Bioinformatics Trends

The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation

Bioinformatics Oxford Journals - Fri, 24/06/2022 - 5:30am
Motivation: Pangenomes provide novel insights for population and quantitative genetics, genomics, and breeding that are not available from studying a single reference genome; a species is better represented by a pangenome, or collection of genomes. Unfortunately, managing and using pangenomes for genomically diverse species is computationally and practically challenging. We developed a trellis graph representation anchored to the reference genome that represents most pangenomes well and can be used to impute complete genomes from low-density sequence or variant data.
Results: The Practical Haplotype Graph (PHG) is a pangenome pipeline, database (PostgreSQL & SQLite), data model (Java, Kotlin, or R), and Breeding API (BrAPI) web service. The PHG has already accurately represented diversity in four major crops, including maize, one of the most genomically diverse species, with up to 1000-fold data compression. Using simulated data, we show that even at 0.1X coverage, with appropriate reads and sequence alignment, imputation results in extremely accurate haplotype reconstruction. The PHG is a platform and environment for understanding and applying genomic diversity.
Availability: All resources listed here are freely available. The PHG Docker image used to generate the simulation results is available at https://hub.docker.com/ as maizegenetics/phg:0.0.27. The PHG source code is at https://bitbucket.org/bucklerlab/practicalhaplotypegraph/src/master/. The code used for the analysis of simulated data is at https://bitbucket.org/bucklerlab/phg-manuscript/src/master/. The PHG database of NAM parent haplotypes is in the CyVerse data store (https://de.cyverse.org/de/), named /iplant/home/shared/panzea/panGenome/PHG_db_maize/phg_v5Assemblies_20200608.db.
Supplementary information: Supplementary data are available at Bioinformatics online.
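Imputation over a trellis of reference ranges is commonly framed as finding the most likely path of haplotypes with the Viterbi algorithm. The sketch below shows that idea on a toy scale under simple assumptions: every range offers the same haplotypes, each read is summarised as the set of haplotypes it is consistent with, and the switch and error probabilities are arbitrary. The PHG's actual model and parameters are not reproduced.

```python
import math


def impute_path(reads_per_range, n_haplotypes, switch_prob=0.01, error_prob=0.05):
    """Viterbi path of haplotype IDs, one per reference range.

    reads_per_range[r] is a list of sets; each set holds the haplotype IDs that
    one read in range r is consistent with (hypothetical input format).
    """
    H = n_haplotypes
    stay = math.log(1 - switch_prob)
    switch = math.log(switch_prob / (H - 1)) if H > 1 else float("-inf")

    def emission(r, h):
        # Each read adds log(1 - e) if consistent with haplotype h, else log(e).
        return sum(math.log(1 - error_prob) if h in compat else math.log(error_prob)
                   for compat in reads_per_range[r])

    score = [emission(0, h) for h in range(H)]  # uniform start prior (a constant, omitted)
    back = []
    for r in range(1, len(reads_per_range)):
        new_score, pointers = [], []
        for h in range(H):
            # Best predecessor: stay on h or switch from another haplotype.
            prev = max(range(H), key=lambda g: score[g] + (stay if g == h else switch))
            pointers.append(prev)
            new_score.append(score[prev] + (stay if prev == h else switch) + emission(r, h))
        score = new_score
        back.append(pointers)
    h = max(range(H), key=lambda g: score[g])  # best final haplotype
    path = [h]
    for pointers in reversed(back):
        h = pointers[h]
        path.append(h)
    return path[::-1]
```

The switch penalty encodes the expectation that a sample follows the same parental haplotype across neighbouring ranges, which is what allows very sparse reads to be extended into complete genomes.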
Categories: Bioinformatics Trends

Batch alignment via retention orders for preprocessing large-scale multi-batch LC-MS experiments

Bioinformatics Oxford Journals - Fri, 24/06/2022 - 5:30am
Motivation: Meticulous selection of chromatographic peak detection parameters and algorithms is a crucial step in preprocessing LC-MS data. However, because mass-to-charge ratio (m/z) and retention time shifts are larger between batches than within batches, finding apt parameters for all samples of a large-scale multi-batch experiment while minimizing information loss becomes a challenging task. Preprocessing batches independently can curtail these problems, but requires a method for aligning and combining them for downstream analysis.
Results: We present two methods for aligning and combining individually preprocessed batches in multi-batch LC-MS experiments. Our methods were tested on six simulated and six real datasets. Furthermore, by estimating the probabilities of peak insertion, deletion, and swap between batches in authentic datasets, we demonstrate that retention order swaps are not rare in untargeted LC-MS data.
Availability: The kmersAlignment and rtcorrectedAlignment algorithms are available as an R package, together with raw data, at https://metabocombiner.img.cas.cz.
Supplementary information: Supplementary data are available at Bioinformatics online.
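A retention order swap means two features elute in one order in one batch and in the reverse order in another. The sketch below counts such discordant pairs for features that have already been matched across two batches; it is a generic Kendall-style count, assuming the matching is given, and not the paper's probability estimation procedure.

```python
def retention_order_swaps(rt_batch_a, rt_batch_b):
    """Count pairs of matched features whose retention order differs between two batches.

    rt_batch_a[i] and rt_batch_b[i] are retention times of the same matched feature i.
    A pair (i, j) is a swap if the signs of the RT differences disagree. O(n^2).
    """
    if len(rt_batch_a) != len(rt_batch_b):
        raise ValueError("both batches must list the same matched features")
    n = len(rt_batch_a)
    swaps = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (rt_batch_a[i] - rt_batch_a[j]) * (rt_batch_b[i] - rt_batch_b[j]) < 0:
                swaps += 1
    return swaps
```

A constant or monotone retention time drift produces zero swaps, so any nonzero count points to reordering that a monotone RT correction alone cannot fix.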
Categories: Bioinformatics Trends

Functional Characterization of Co-Phosphorylation Networks

Bioinformatics Oxford Journals - Wed, 22/06/2022 - 5:30am
Motivation: Protein phosphorylation is a ubiquitous regulatory mechanism that plays a central role in cellular signaling. According to recent estimates, up to 70% of human proteins can be phosphorylated. Characterizing phosphorylation dynamics is therefore critical for understanding a broad range of biological and biochemical processes. Mass spectrometry-based technologies are rapidly advancing to meet the need for high-throughput screening of phosphorylation; they enable untargeted quantification of thousands of phosphorylation sites in a given sample. Many labs already use these technologies to comprehensively characterize signaling landscapes, by examining perturbations with drugs and knockdown approaches or by assessing diverse phenotypes in cancers, neurodegenerative diseases, infectious diseases, and normal development.
Results: We comprehensively investigate the concept of “co-phosphorylation”, defined as the correlated phosphorylation of a pair of phosphosites across various biological states. We integrate nine publicly available phosphoproteomics datasets for various diseases (including breast cancer, ovarian cancer and Alzheimer’s disease) and use functional data related to sequence, evolutionary histories, kinase annotations, and pathway annotations to investigate the functional relevance of co-phosphorylation. Our results across a broad range of studies consistently show that functionally associated sites tend to exhibit significant positive or negative co-phosphorylation. Specifically, we show that co-phosphorylation can be used to predict with high precision the sites that are on the same pathway or that are targeted by the same kinase. Overall, these results establish co-phosphorylation as a useful resource for analyzing phosphoproteins in a network context, which can help extend our knowledge of cellular signaling and its dysregulation.
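Since co-phosphorylation is defined as correlated phosphorylation of two sites across biological states, its simplest form is a Pearson correlation. The sketch below assumes Pearson correlation with pairwise deletion of missing values (frequent in MS data); the correlation measure, normalisation and significance testing used in the paper may differ, and the minimum of 3 shared states is an arbitrary choice.

```python
import math


def co_phosphorylation(site_a, site_b):
    """Pearson correlation of two phosphosites across biological states.

    site_a[k], site_b[k]: phosphorylation levels in state k; None marks a missing
    measurement, and states missing in either site are skipped (pairwise deletion).
    Returns None when fewer than 3 shared states remain or a site is constant.
    """
    pairs = [(a, b) for a, b in zip(site_a, site_b) if a is not None and b is not None]
    if len(pairs) < 3:
        return None
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return None
    return cov / math.sqrt(var_a * var_b)
```

Both signs are informative here: the abstract notes that functionally associated sites show significant positive or negative co-phosphorylation.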
Categories: Bioinformatics Trends

Identifying interactions in omics data for clinical biomarker discovery using symbolic regression

Bioinformatics Oxford Journals - Wed, 22/06/2022 - 5:30am
Motivation: The identification of predictive biomarker signatures from omics and multi-omics data for clinical applications is an active area of research. Recent developments in assay technologies and machine learning (ML) methods have led to significant improvements in predictive performance. However, most high-performing ML methods have complex architectures and lack interpretability.
Results: We apply a novel symbolic-regression-based algorithm, the QLattice, to a selection of clinical omics datasets. This approach generates parsimonious, high-performing models that can both predict disease outcomes and reveal putative disease mechanisms, demonstrating the importance of selecting maximally relevant and minimally redundant features in omics-based machine-learning applications. The simplicity and high predictive power of these biomarker signatures make them attractive tools for high-stakes applications in areas such as primary care, clinical decision making and patient stratification.
Availability: The QLattice is available as part of a Python package (feyn), which is available on the Python Package Index (https://pypi.org/project/feyn/) and can be installed via pip. The documentation (https://docs.abzu.ai/) provides guides, tutorials, and the API reference. All code and data used to generate the models and plots discussed in this work can be found at https://github.com/abzu-ai/QLattice-clinical-omics.
Supplementary information: Supplementary material is available at Bioinformatics online.
Categories: Bioinformatics Trends

Figbird: A probabilistic method for filling gaps in genome assemblies

Bioinformatics Oxford Journals - Wed, 22/06/2022 - 5:30am
Motivation: Advances in sequencing technologies have led to the sequencing of the genomes of a multitude of organisms. However, the draft genomes of many of these organisms contain a large number of gaps due to genomic repeats, low sequencing coverage and limitations of sequencing technologies. Although several gap-filling tools exist, many of them do not use all the information relevant to gap filling.
Results: Here, we present a probabilistic method for filling gaps in draft genome assemblies using second-generation reads, based on a generative model of sequencing that takes into account insert sizes and sequencing errors. Unlike the graph-based methods adopted in the literature, our method is based on the expectation-maximization (EM) algorithm. Experiments on real biological datasets show that this novel approach can fill large portions of gaps with a small number of errors and misassemblies compared to other state-of-the-art gap-filling tools.
Availability and Implementation: The method is implemented in C++ in a software tool named “Filling Gaps by Iterative Read Distribution (Figbird)”, available at https://github.com/SumitTarafder/Figbird.
Supplementary information: Supplementary data are available at Bioinformatics online.
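The EM pattern behind "iterative read distribution" can be illustrated with a generic mixture problem: reads that are compatible with several candidates are distributed among them in proportion to current estimates, and the estimates are then updated from those expected counts. This is a minimal sketch assuming per-read likelihoods are already given; Figbird's generative model (insert sizes, sequencing errors, gap sequence reconstruction) is not reproduced.

```python
def em_read_distribution(read_likelihoods, n_candidates, n_iter=100, tol=1e-10):
    """EM estimate of mixing weights for reads compatible with several candidates.

    read_likelihoods[r] maps candidate index -> P(read r | candidate)
    (hypothetical input). Returns weights pi that sum to 1.
    """
    pi = [1.0 / n_candidates] * n_candidates
    for _ in range(n_iter):
        expected = [0.0] * n_candidates
        for lik in read_likelihoods:
            # E-step: responsibility of each candidate for this read.
            total = sum(pi[c] * p for c, p in lik.items())
            if total == 0:
                continue
            for c, p in lik.items():
                expected[c] += pi[c] * p / total
        # M-step: normalised expected read counts become the new weights.
        s = sum(expected)
        new_pi = [e / s for e in expected]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi
```

Reads that are unique to one candidate pull ambiguous reads toward that candidate over the iterations, which is the key advantage over assigning ambiguous reads at random.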
Categories: Bioinformatics Trends

riboCleaner: a pipeline to identify and quantify rRNA read contamination from RNA-seq data in plants

Bioinformatics Oxford Journals - Wed, 22/06/2022 - 5:30am
Motivation: Analysis of gene expression data can be crucial for elucidating biological relationships within living organisms. However, accurate quantification of gene expression relies directly on the accuracy of the reference genome or transcriptome to which the expression data are mapped. Errors in gene annotation can lead to errors in the quantification of gene expression. One source of gene annotation error in eukaryotes is the incorrect prediction of mRNA gene models within ribosomal DNA (rDNA) regions.
Results: Here, we provide examples of how false gene models in rDNA regions can cause a handful of genes to appear to contribute > 50% of the total transcripts per million (TPM) of entire RNA-seq datasets. To address this, we have created riboCleaner, a bioinformatics pipeline designed to identify misannotated gene models in rDNA regions and quantify rRNA-derived reads in RNA-seq data. We also show the applicability of riboCleaner to several plant genome assemblies.
Availability: We have implemented riboCleaner as a containerized Snakemake workflow. The workflow, instructions for building the container, and other documentation are available at https://github.com/basf. For convenience, a prebuilt Docker image containing riboCleaner is available at https://hub.docker.com/u/basfcontainers.
Supplementary information: Supplementary data are available at Bioinformatics online.
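The symptom described above, a few genes holding most of a sample's TPM, is easy to check once TPM values are computed. The sketch below uses the standard TPM definition (reads per kilobase, rescaled to sum to one million) and reports the share of the top k genes; the function names are illustrative and not part of riboCleaner.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and transcript lengths in bp."""
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]  # reads per kb
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def top_share(tpm_values, k):
    """Fraction of total TPM contributed by the k highest-TPM genes."""
    top = sorted(tpm_values, reverse=True)[:k]
    return sum(top) / sum(tpm_values)
```

For example, if one gene model overlapping an rDNA locus collects 9,000 of 9,400 reads among five equal-length genes, `top_share(tpm(...), 1)` is about 0.96, well above the 50% level mentioned in the abstract.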
Categories: Bioinformatics Trends

matOptimize: A parallel tree optimization method enables online phylogenetics for SARS-CoV-2

Bioinformatics Oxford Journals - Wed, 22/06/2022 - 5:30am
Motivation: Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the COVID-19 pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but no previously existing approach can efficiently optimize this vast phylogeny under the time constraints of the pandemic.
Results: Here, we present matOptimize, a fast and memory-efficient parsimony-based phylogenetic tree optimization tool that can be parallelized across multiple CPU threads and nodes, and that provides orders-of-magnitude improvements in runtime and peak memory usage compared to existing state-of-the-art methods. We developed this method specifically to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize is currently helping to refine, on a daily basis, possibly the largest phylogenetic tree ever built, containing millions of SARS-CoV-2 sequences.
Availability: The matOptimize code is freely available as part of the UShER package (https://github.com/yatisht/usher) and can also be installed via bioconda (https://bioconda.github.io/recipes/usher/README.html). All scripts used to perform the experiments in this manuscript are available at https://github.com/yceh/matOptimize-experiments.
Supplementary information: Supplementary data are available at Bioinformatics online.
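Parsimony-based optimization compares candidate topologies by their minimum number of mutations. The sketch below computes that score for one alignment site with the classic Fitch algorithm on a small rooted binary tree written as nested tuples; matOptimize's own data structures, tree moves and parallelisation are not shown.

```python
def fitch_score(tree, leaf_states):
    """Fitch small-parsimony score for one alignment site on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("a", "b"), ("c", "d")).
    leaf_states: dict leaf name -> nucleotide at this site.
    Returns (candidate state set at the root, minimum number of mutations).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    s_left, c_left = fitch_score(left, leaf_states)
    s_right, c_right = fitch_score(right, leaf_states)
    common = s_left & s_right
    if common:
        return common, c_left + c_right
    return s_left | s_right, c_left + c_right + 1  # disjoint sets force one mutation
```

Summing this score over all sites gives the tree's parsimony score; an optimizer keeps a rearrangement only if it lowers that total. Here, grouping a with b and c with d needs one mutation, whereas grouping a with c needs two.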
Categories: Bioinformatics Trends

PrISM: Precision for Integrative Structural Models

Bioinformatics Oxford Journals - Mon, 20/06/2022 - 5:30am
Motivation: Currently, a single precision value is reported for an integrative model. However, precision may vary across regions of an integrative model owing to varying amounts of input information.
Results: We developed PrISM (Precision for Integrative Structural Models) to efficiently identify high- and low-precision regions of integrative models.
Availability: PrISM is written in Python and available under the GNU General Public License v3.0 at https://github.com/isblab/prism; the benchmark data used in this paper are available at doi:10.5281/zenodo.6241200.
Supplementary information: Supplementary data are available at Bioinformatics online.
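The contrast between one global precision value and region-level precision can be illustrated with a simple per-bead spread across an ensemble of models. This is a minimal sketch assuming the models are already superposed and list beads in the same order; PrISM's own algorithm for identifying high- and low-precision regions is not reproduced here.

```python
import math


def per_bead_spread(models):
    """RMS deviation of each bead from its mean position across superposed models.

    models: list of models; each model is a list of (x, y, z) bead coordinates
    in the same bead order. Small values mark high-precision regions.
    """
    n_models = len(models)
    n_beads = len(models[0])
    spreads = []
    for b in range(n_beads):
        mean = [sum(m[b][k] for m in models) / n_models for k in range(3)]
        msd = sum(sum((m[b][k] - mean[k]) ** 2 for k in range(3)) for m in models) / n_models
        spreads.append(math.sqrt(msd))
    return spreads
```

Averaging these values would recover a single, global precision number and hide exactly the regional variation the abstract is concerned with.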
Categories: Bioinformatics Trends

A rarefaction-without-resampling extension of PERMANOVA for testing presence-absence associations in the microbiome

Bioinformatics Oxford Journals - Mon, 20/06/2022 - 5:30am
Motivation: PERMANOVA (McArdle and Anderson, 2001) is currently the most commonly used method for testing community-level hypotheses about microbiome associations with covariates of interest. PERMANOVA can test for associations that result from changes in which taxa are present or absent by using the Jaccard or unweighted UniFrac distance. However, such presence-absence analyses face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. Rarefaction (subsampling to a common library size) is known to control this bias, but at the potential cost of information loss and of introducing a stochastic component into the analysis.
Results: Here we develop a non-stochastic approach to PERMANOVA presence-absence analyses that aggregates information over all potential rarefaction replicates without actual resampling, when the Jaccard or unweighted UniFrac distance is used. We compare this new approach to three possible ways of aggregating PERMANOVA over multiple rarefactions obtained from resampling: averaging the distance matrix, averaging the (element-wise) squared distance matrix, and averaging the F-statistic. Our simulations indicate that our non-stochastic approach is robust to confounding by library size and outperforms each of the stochastic resampling approaches. We also show that, when overdispersion is low, averaging the (element-wise) squared distance outperforms averaging the unsquared distance, which is the approach currently implemented in the R package vegan. We illustrate our methods using an analysis of data on inflammatory bowel disease (IBD) in which samples from case participants have systematically smaller library sizes than samples from control participants.
Availability and Implementation: We have implemented all the approaches described above, including the function for calculating the analytical average of the squared or unsquared distance matrix, in our R package LDM, which is available on GitHub at https://github.com/yijuanhu/LDM.
Supplementary information: Supplementary data are available at Bioinformatics online.
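Averaging over all rarefaction replicates without resampling is possible because, for each taxon, the probability of remaining present after rarefying has a closed form. Rarefying a sample of N reads to depth d draws d reads without replacement, so a taxon with c reads is absent only if all d draws miss it, giving the hypergeometric probability 1 - C(N - c, d) / C(N, d). The sketch below computes this building block; how LDM combines such probabilities into averaged Jaccard or UniFrac distances is not reproduced.

```python
from math import comb


def presence_probability(count, library_size, depth):
    """P(taxon still observed) after rarefying to `depth` reads without replacement."""
    if depth > library_size:
        raise ValueError("rarefaction depth cannot exceed library size")
    # All `depth` draws must come from the library_size - count reads of other taxa.
    p_absent = comb(library_size - count, depth) / comb(library_size, depth)
    return 1.0 - p_absent


def expected_presence(counts, depth):
    """Expected presence (between 0 and 1) of every taxon in one sample at a common depth."""
    n = sum(counts)
    return [presence_probability(c, n, depth) for c in counts]
```

`math.comb` returns 0 when the lower argument exceeds the upper one, so a taxon with more than N - d reads is present with probability exactly 1, as it should be.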
Categories: Bioinformatics Trends

NetRAX: Accurate and Fast Maximum Likelihood Phylogenetic Network Inference

Bioinformatics Oxford Journals - Fri, 17/06/2022 - 5:30am
Motivation: Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting. Unfortunately, this induces a very high computational complexity, and current tools can only analyze small datasets.
Results: We present NetRAX, a tool for maximum likelihood inference of phylogenetic networks in the absence of incomplete lineage sorting. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees and extends them to phylogenetic networks via the notion of “displayed trees”. NetRAX can infer maximum likelihood phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in BIC score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8,000 sites, 30 taxa, and 3 reticulations completes within a few minutes on a standard laptop.
Availability and Implementation: Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX.
Supplementary information: Supplementary data are available at Bioinformatics online.
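With r binary reticulations, a network displays up to 2^r trees, and one common way to extend tree likelihoods to networks is to take, per site, a weighted average of the displayed trees' likelihoods, weighted by the product of the chosen reticulation probabilities. The sketch below assumes that per-site formulation and that per-site tree log-likelihoods are already computed (hypothetical input); it also shows the BIC used to compare networks with different numbers of reticulations. NetRAX's exact likelihood variants and parameter counting are not reproduced.

```python
import math
from itertools import product


def network_log_likelihood(reticulation_probs, site_logliks):
    """Network log-likelihood as a per-site weighted average over displayed trees.

    reticulation_probs[i] in (0, 1): probability of taking the first parent at reticulation i.
    site_logliks maps a choice tuple (0 = first parent, 1 = second parent per reticulation)
    to the list of per-site log-likelihoods of that displayed tree.
    """
    choices = list(product((0, 1), repeat=len(reticulation_probs)))
    log_w = {ch: sum(math.log(p if c == 0 else 1.0 - p)
                     for p, c in zip(reticulation_probs, ch)) for ch in choices}
    n_sites = len(site_logliks[choices[0]])
    total = 0.0
    for s in range(n_sites):
        terms = [log_w[ch] + site_logliks[ch][s] for ch in choices]
        top = max(terms)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion; lower values are better."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)
```

Because each extra reticulation adds parameters, the BIC penalty prevents the search from adding reticulations that only marginally improve the likelihood.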
Categories: Bioinformatics Trends

MarkerMAG: linking metagenome-assembled genomes (MAGs) with 16S rRNA marker genes using paired-end short reads

Bioinformatics Oxford Journals - Fri, 17/06/2022 - 5:30am
Motivation: Metagenome-assembled genomes (MAGs) have substantially extended our understanding of microbial functionality. However, 16S rRNA genes, which are commonly used in phylogenetic analysis and environmental surveys, are often missing from MAGs. Here, we developed MarkerMAG, a pipeline that links 16S rRNA genes to MAGs using paired-end sequencing reads.
Results: Assessment of MarkerMAG on three benchmarking metagenomic datasets of varying complexity shows substantial increases in the number of MAGs with 16S rRNA genes and 100% assignment accuracy. MarkerMAG also estimates the copy number of 16S rRNA genes in MAGs with high accuracy. Assessments on three real metagenomic datasets demonstrate 1.1- to 14.2-fold increases in the number of MAGs with 16S rRNA genes. We also show that MarkerMAG-improved MAGs increase the accuracy of functional prediction from 16S rRNA gene amplicon data. MarkerMAG helps connect information in MAG databases with that in 16S rRNA databases and surveys, and hence contributes to our understanding of microbial diversity, function, and phylogeny.
Availability: MarkerMAG is implemented in Python 3 and freely available at https://github.com/songweizhi/MarkerMAG.
Supplementary information: Supplementary data are available at Bioinformatics online.
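The basic signal for linking a 16S rRNA gene to a MAG is a read pair with one mate on the 16S sequence and the other on a contig binned into that MAG. The sketch below tallies such links and keeps pairs above a support threshold; the input format, the dictionary names and `min_links=2` are assumptions for illustration, and MarkerMAG's actual linking and filtering steps are more involved.

```python
from collections import Counter


def count_links(read_pairs, marker_of, mag_of, min_links=2):
    """Tally paired-end links between 16S sequences and MAGs.

    read_pairs: iterable of (reference hit of mate 1, reference hit of mate 2).
    marker_of: dict reference name -> 16S ID (for 16S sequences).
    mag_of: dict contig name -> MAG ID.
    Returns {(16S ID, MAG ID): n} for pairs supported by at least min_links read pairs.
    """
    links = Counter()
    for hit1, hit2 in read_pairs:
        # Either mate may be the one on the 16S sequence.
        for a, b in ((hit1, hit2), (hit2, hit1)):
            if a in marker_of and b in mag_of:
                links[(marker_of[a], mag_of[b])] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}
```

Requiring several independent read pairs guards against spurious links from a single chimeric or mis-mapped pair.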
Categories: Bioinformatics Trends

Sequence Tagging For Biomedical Extractive Question Answering

Bioinformatics Oxford Journals - Fri, 17/06/2022 - 5:30am
Motivation: Current studies in extractive question answering (EQA) have modeled the single-span extraction setting, where a single answer span is the label to predict for a given question-passage pair. This setting is natural for general-domain EQA, as the majority of questions in the general domain can be answered with a single span. Following general-domain EQA models, current biomedical EQA (BioEQA) models use the single-span extraction setting with post-processing steps.
Results: In this paper, we investigate the question distribution across the general and biomedical domains and find that biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (single answer). This calls for models capable of producing multiple answers for a question. Based on this preliminary study, we propose a sequence tagging approach for BioEQA, which is a multi-span extraction setting. Our approach directly tackles questions with a variable number of phrases as their answer and can learn to decide the number of answers for a question from training data. On the BioASQ 7b and 8b list-type questions, our approach outperformed the best-performing existing models without requiring post-processing steps.
Availability: Source code and resources are freely available for download at https://github.com/dmis-lab/SeqTagQA.
Supplementary information: Supplementary data are available at Bioinformatics online.
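In a sequence tagging formulation, the model labels every passage token, and the number of answers simply falls out of how many spans are tagged. The sketch below decodes predicted tags into answer strings, assuming a BIO scheme (B = begin, I = inside, O = outside); the exact tag set used by the authors is an assumption here.

```python
def decode_bio(tokens, tags):
    """Collect answer spans from BIO tags; any number of spans (including zero) may result."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:  # "O", or a stray "I" with no open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans
```

A list-type question such as "Which genes are involved in DNA repair?" can therefore yield two, three or more answers from one pass, without a separate post-processing step to choose how many spans to return.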
Categories: Bioinformatics Trends

Shepherd: Accurate Clustering for Correcting DNA Barcode Errors

Bioinformatics Oxford Journals - Thu, 16/06/2022 - 5:30am
Motivation: DNA barcodes are short, random nucleotide sequences introduced into cell populations to track the relative counts of hundreds of thousands of individual lineages over time. Lineage tracking is widely applied, e.g. to understand evolutionary dynamics in microbial populations and the progression of breast cancer in humans. Barcode sequences are unknown upon insertion and must be identified using next-generation sequencing technology, which is error-prone. In this study, we frame barcode error correction as a clustering problem, with the aim of identifying true barcode sequences from noisy sequencing data. We present Shepherd, a novel clustering method based on a k-mer indexing system for barcode sequences and a Bayesian statistical test that incorporates a substitution error rate to distinguish true sequences from erroneous ones.
Results: When benchmarked on synthetic data, Shepherd provides barcode count estimates that are significantly more accurate than state-of-the-art methods, producing 10-150 times fewer spurious lineages. On empirical data, Shepherd produces results consistent with the improvements seen on synthetic data. These improvements enable higher-resolution lineage tracking and more accurate estimates of biologically relevant quantities, e.g. the detection of small-effect mutations.
Availability: A Python implementation of Shepherd is freely available at https://www.github.com/Nik-Tavakolian/Shepherd.
Supplementary information: Supplementary data are available at Bioinformatics online.
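The two ingredients named above, an index that avoids all-against-all comparisons and a test that uses the substitution error rate, can be illustrated on a toy scale. In the sketch below, each barcode is split into two halves (any two sequences one substitution apart must share one half exactly), and a simple Poisson tail test stands in for Shepherd's Bayesian test: a low-count barcode is merged into a more abundant one-mismatch neighbour if its count is plausible as sequencing errors alone. The error rate, alpha and the Poisson model are assumptions, not Shepherd's method.

```python
import math
from collections import defaultdict


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), built up term by term to avoid overflow."""
    term, cdf = math.exp(-lam), 0.0  # term starts at P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def cluster_barcodes(counts, error_rate=0.01, alpha=0.001):
    """Map each barcode to itself (true) or to a more abundant one-mismatch parent (error).

    counts: dict barcode -> read count; all barcodes have the same length (>= 2).
    """
    half = len(next(iter(counts))) // 2
    index = defaultdict(set)
    for bc in counts:
        index[(0, bc[:half])].add(bc)
        index[(1, bc[half:])].add(bc)
    assignment = {}
    for bc in sorted(counts, key=counts.get, reverse=True):  # most abundant first
        candidates = index[(0, bc[:half])] | index[(1, bc[half:])]
        parents = [p for p in candidates if counts[p] > counts[bc]
                   and hamming(p, bc) == 1 and assignment.get(p) == p]
        assignment[bc] = bc
        for p in sorted(parents, key=counts.get, reverse=True):
            lam = counts[p] * error_rate / 3  # reads expected to mutate into this exact sequence
            if poisson_tail(counts[bc], lam) > alpha:  # count is plausible as errors alone
                assignment[bc] = p
                break
    return assignment
```

A 2-read neighbour of a 1,000-read barcode is absorbed as an error, while a 400-read neighbour of a 500-read barcode is kept as a separate lineage, since errors alone would not produce that many reads.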
Categories: Bioinformatics Trends

DrawTetrado to create layer diagrams of G4 structures

Bioinformatics Oxford Journals - Wed, 15/06/2022 - 5:30am
Motivation: Quadruplexes are specific 3D structures found in nucleic acids. Due to the exceptional properties of these motifs, exploring them with general-purpose bioinformatics methods can be problematic or insufficient. The same applies to visualizing their structure. A hand-drawn layer diagram is the most common way to represent quadruplex anatomy, yet no molecular visualization software generates such a structural model from atomic coordinates.
Results: DrawTetrado is an open-source Python program for automated visualization of quadruplex and G4-helix structures. It generates static layer diagrams that represent structural data in a pseudo-3D perspective. Options to set color schemes, nucleotide labels, inter-element distances, or the angle of view allow easy customization of the output drawing.
Availability: The program is available under the MIT license at https://github.com/RNApolis/drawtetrado.
Categories: Bioinformatics Trends

scCNC: A method based on Capsule Network for Clustering scRNA-seq Data

Bioinformatics Oxford Journals - Tue, 14/06/2022 - 5:30am
Motivation: Many studies have shown that clustering is a crucial step in scRNA-seq analysis. Most existing methods are based on unsupervised learning without exploiting any prior domain knowledge, and therefore do not use available gold-standard labels. When confronted with the high dimensionality and widespread dropout events of scRNA-seq data, purely unsupervised clustering methods may not produce biologically interpretable clusters, which complicates cell type assignment.
Results: In this paper, we propose scCNC, a semi-supervised clustering method based on a capsule network that integrates domain knowledge into the clustering step. We also propose a Semi-supervised Greedy Iterative Training (SGIT) method to train the whole network. Experiments on several real scRNA-seq datasets show that scCNC can significantly improve clustering performance and facilitate downstream analyses.
Availability: The source code of scCNC is freely available at https://github.com/WHY-17/scCNC.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

MetBP: A Software Tool for Detection of Interaction between Metal Ion-RNA Base Pairs

Bioinformatics Oxford Journals - Mon, 13/06/2022 - 5:30am
Motivation: The role of metals in the shaping and functioning of RNA is well established, and understanding it through the analysis of structural data has biological relevance. Metal ions often bind to one or more atoms of an RNA nucleobase. This becomes more interesting when such bases form a base pair with another base. Furthermore, when metal ions bind to any residue of an RNA, the secondary structural features of that residue (helix, loop, unpaired, etc.) are also biologically important. The available metal-binding software tools cannot address such type-specific queries.
Results: To address this limitation, we have designed a software tool called MetBP. It is a stand-alone command-line tool with no dependency on other existing software. It accepts a structure file in mmCIF or PDB format, computes the base pairs, and then reports all metals that bind to one or more nucleotides that pair with another nucleotide. It reports binding distances and angles along with base pair stability. It also reports several other important aspects, e.g. the secondary structure of the residue in the RNA. MetBP can also be used as a generalized metal-binding site detection tool for proteins and DNA.
Availability: https://github.com/computational-biology/metbp
Supplementary information: Supplementary data are available at Bioinformatics online.
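The query MetBP answers can be pictured as two filters: metal-to-base-atom distances within a cutoff, restricted to residues that take part in a base pair. The sketch below assumes coordinates have already been parsed from the structure file and base pairs already computed (neither step is shown); the 3.0 Å cutoff and the (residue, atom) key format are illustrative assumptions, not MetBP's criteria.

```python
import math


def metal_contacts(metals, base_atoms, cutoff=3.0):
    """List (metal_id, (residue, atom), distance) for base atoms within `cutoff` Å of a metal.

    metals: dict metal ID -> (x, y, z); base_atoms: dict (residue, atom name) -> (x, y, z).
    """
    hits = []
    for m_id, m_xyz in metals.items():
        for a_id, a_xyz in base_atoms.items():
            d = math.dist(m_xyz, a_xyz)
            if d <= cutoff:
                hits.append((m_id, a_id, round(d, 3)))
    return sorted(hits)


def metals_at_base_pairs(metals, base_atoms, base_pairs, cutoff=3.0):
    """Keep only contacts whose residue belongs to one of the given base pairs."""
    paired = {res for pair in base_pairs for res in pair}
    return [hit for hit in metal_contacts(metals, base_atoms, cutoff) if hit[1][0] in paired]
```

Here a Mg2+ ion 2.0 Å from N7 of a paired guanine is reported, while an equally close contact to an unpaired adenine is filtered out.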
Categories: Bioinformatics Trends
