
Improved genomic island predictions with IslandPath-DIMOB

Bioinformatics Oxford Journals - Fri, 23/02/2018 - 5:30am
Abstract
Motivation: Genomic islands (GIs) are clusters of genes of probable horizontal origin that play a major role in bacterial and archaeal genome evolution and microbial adaptability. They are of high medical and industrial interest because they are enriched in virulence factors, some antimicrobial resistance genes and adaptive metabolic pathways. As large-scale analyses of microbial genomes increase, more sensitive yet precise prediction tools are needed, whether they use sequence composition-based methods or comparative genomics.
Results: IslandPath-DIMOB, a leading GI prediction tool in the IslandViewer webserver, has now been significantly improved by modifying both the decision algorithm used to determine sequence composition biases and the underlying database of HMM profiles for associated mobility genes. The accuracy of IslandPath-DIMOB and other major software was assessed using a reference GI dataset predicted by comparative genomics, plus a manually curated dataset from a literature review. Compared with the previous version (v0.2.0), IslandPath-DIMOB v1.0.0 achieves an 11.7% increase in recall and a 5.3% increase in precision. IslandPath-DIMOB has the highest Matthews correlation coefficient among the individual prediction methods tested, combining one of the highest recall values (46.9%) with high precision (87.4%). The only method with higher recall had notably lower precision (55.1%). IslandPath-DIMOB v1.0.0 will facilitate more accurate studies of GIs, including their key roles in microbial adaptability of medical, environmental and industrial interest.
Availability: IslandPath-DIMOB v1.0.0 is freely available through the IslandViewer webserver (http://www.pathogenomics.sfu.ca/islandviewer/) and as standalone software (https://github.com/brinkmanlab/islandpath/) under the GNU GPLv3.
Contact: brinkman@sfu.ca
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
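IslandPath-DIMOB is compared with other tools by recall, precision and the Matthews correlation coefficient (MCC). A minimal Python sketch of how these three figures follow from confusion-matrix counts; the counts in the usage line are hypothetical and are not the paper's data.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Recall, precision and Matthews correlation coefficient from confusion counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts, e.g. genes inside/outside reference GIs
    print(classification_metrics(tp=47, fp=7, tn=900, fn=53))
```

Unlike recall or precision alone, MCC uses all four cells of the table, which is why it rewards a tool that keeps precision high while raising recall.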

IDP-denovo: de novo transcriptome assembly and isoform annotation by hybrid sequencing

Bioinformatics Oxford Journals - Fri, 23/02/2018 - 5:30am
Abstract
Motivation: In recent years, long-read (LR) sequencing technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have been shown to substantially improve the quality of genome assembly and transcriptome characterization. Compared with the high cost of genome assembly by LR sequencing, generating LRs for transcriptome characterization is more affordable. Therefore, when informative transcriptome LR data are available but a high-quality genome is not, a method for de novo transcriptome assembly and annotation is in high demand.
Results: Without a reference genome, IDP-denovo performs de novo transcriptome assembly, isoform annotation and quantification by integrating the strengths of LRs and short reads (SRs). Using the GM12878 human data as a gold standard, we demonstrated that IDP-denovo had superior sensitivity of transcript assembly and high accuracy of isoform annotation. In addition, IDP-denovo outputs two abundance indices to provide a comprehensive expression profile of genes/isoforms. IDP-denovo represents a robust approach for transcriptome assembly, isoform annotation and quantification in non-model organism studies. Applying IDP-denovo to a non-model organism, Dendrobium officinale, we discovered a number of novel genes and novel isoforms that were not reported in the existing annotation library, revealing a high diversity of gene isoforms in D. officinale.
Availability and Implementation: IDP-denovo is freely available at www.healthcare.uiowa.edu/labs/au/IDP-denovo/
Contact: kinfai-au@uiowa.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Integrative DNA copy number detection and genotyping from sequencing and array-based platforms

Bioinformatics Oxford Journals - Fri, 23/02/2018 - 5:30am
Abstract
Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies perform CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, earlier SNP-array data are also available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. CNV detection from sequencing data alone can also be improved by using allele-specific reads.
Results: We propose a statistical framework, the integrated Copy Number Variation detection algorithm (iCNV), which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, uses allele-specific reads from sequencing and integrates matched NGS and SNP-array data through a Hidden Markov Model (HMM). We compare integrated two-platform CNV detection using iCNV with naïve intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data only, and show that using allele-specific reads improves CNV detection accuracy compared with existing methods.
Availability: https://github.com/zhouzilu/iCNV
Contact: nzh@wharton.upenn.edu, zhouzilu@mail.med.upenn.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
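iCNV segments copy number with a Hidden Markov Model. The toy Viterbi decoder below illustrates that idea on a single signal only: three hypothetical states (deletion, normal, duplication) with Gaussian emissions on a per-bin log2 copy ratio. The state means, standard deviation and transition probability are assumptions chosen for illustration; this is not iCNV's model, which also integrates SNP-array and allele-specific data.

```python
import math

STATES = ("del", "normal", "dup")
# Hypothetical emission means for a per-bin log2 copy-ratio signal
MEANS = {"del": -1.0, "normal": 0.0, "dup": 0.58}
SD = 0.3
STAY = 0.98  # assumed probability of staying in the same state between adjacent bins


def log_gauss(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def log_trans(a, b):
    """Log transition probability; leaving a state is split evenly among the others."""
    return math.log(STAY if a == b else (1 - STAY) / (len(STATES) - 1))


def viterbi(obs):
    """Most probable copy-number state path for a sequence of log-ratios."""
    # Uniform initial distribution over states
    score = {s: math.log(1 / len(STATES)) + log_gauss(obs[0], MEANS[s], SD) for s in STATES}
    back = []
    for x in obs[1:]:
        prev_best, new_score = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_trans(p, s))
            prev_best[s] = best_prev
            new_score[s] = score[best_prev] + log_trans(best_prev, s) + log_gauss(x, MEANS[s], SD)
        back.append(prev_best)
        score = new_score
    # Trace back from the best final state
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

Usage: `viterbi([0.0, 0.05, -1.0, -1.1, -0.9, 0.0, 0.02])` calls the three low bins as a deletion, because their emission gain outweighs the cost of two state switches.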

PartsGenie: an integrated tool for optimising and sharing synthetic biology parts

Bioinformatics Oxford Journals - Fri, 23/02/2018 - 5:30am
Abstract
Motivation: Synthetic biology is typified by the development of novel genetic constructs from the assembly of reusable synthetic DNA parts, which contain one or more features such as promoters, ribosome binding sites, coding sequences and terminators. PartsGenie is introduced to facilitate the computational design of such synthetic biology parts, bridging the gap between optimisation tools for the design of novel parts, the representation of such parts in community-developed data standards such as the Synthetic Biology Open Language (SBOL), and their sharing in journal-recommended data repositories. Consisting of a drag-and-drop web interface, a number of DNA optimisation algorithms, and an interface to the widely used data repository JBEI ICE, PartsGenie facilitates the design, optimisation and dissemination of reusable synthetic biology parts through an integrated application.
Availability: PartsGenie is freely available at https://parts.synbiochem.co.uk.
Contact: neil.swainston@manchester.ac.uk
Categories: Bioinformatics Trends

MARSI: metabolite analogues for rational strain improvement

Bioinformatics Oxford Journals - Fri, 23/02/2018 - 5:30am
Abstract
Summary: Metabolite analogues (MAs) mimic the structure of native metabolites, can competitively inhibit their utilization in enzymatic reactions, and are commonly used as selection tools for isolating desirable mutants of industrial microorganisms. Genome-scale metabolic models representing all biochemical reactions in an organism can be used to predict the effects of MAs on cellular phenotypes. Here, we present the Metabolite Analogues for Rational Strain Improvement (MARSI) framework. MARSI provides a rational approach to strain improvement by searching for metabolites as targets instead of genes or reactions. The designs found by MARSI can be implemented by supplying MAs in the culture media, enabling metabolic rewiring without recombinant DNA technologies, which regulations do not always permit. To facilitate experimental implementation, MARSI provides tools to identify candidate MAs for a target metabolite from a database of known drugs and analogues.
Availability and Implementation: The code is freely available at https://github.com/biosustain/marsi under the Apache License V2. MARSI is implemented in Python.
Contact: DKAHZE@chr-hansen.com, herrgard@biosustain.dtu.dk
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
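MARSI searches a database of known drugs and analogues for candidate MAs of a target metabolite. The abstract does not say how structural similarity is scored; the Tanimoto coefficient on molecular fingerprint bits used below is a common choice and is an assumption here, as are the compound names and bit sets. Fingerprints are represented simply as sets of "on" bit indices so the sketch needs no cheminformatics library.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of fingerprint on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_analogues(target, library, min_sim=0.5):
    """Return (name, similarity) for library compounds at or above min_sim, best first."""
    hits = [(name, tanimoto(target, fp)) for name, fp in library.items()]
    return sorted((h for h in hits if h[1] >= min_sim), key=lambda h: h[1], reverse=True)


if __name__ == "__main__":
    target = frozenset({1, 4, 7, 9, 12})
    library = {
        "analogue_A": frozenset({1, 4, 7, 9, 13}),  # hypothetical compounds
        "analogue_B": frozenset({1, 4, 20, 21}),
        "unrelated_C": frozenset({30, 31, 32}),
    }
    print(rank_analogues(target, library))
```

In practice the fingerprints would come from a cheminformatics toolkit; the ranking and threshold logic stays the same.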

nVenn: Generalized, quasi-proportional Venn and Euler diagrams

Bioinformatics Oxford Journals - Fri, 23/02/2018 - 5:30am
Abstract
Motivation: Venn and Euler diagrams are extensively used to visualize relationships between experiments and data sets. However, representing more than three data sets while keeping the proportions of each region is still not feasible with existing tools.
Results: We present an algorithm to render all the regions of a generalized n-dimensional Venn diagram while keeping the area of each region approximately proportional to the number of elements it contains. In addition, missing regions in Euler diagrams lead to simplified representations. The algorithm generates an n-dimensional Venn diagram and inserts circles of given areas in each region. Then, the diagram is rearranged with a dynamic, self-correcting simulation in which each set border is contracted until it contacts the circles inside. This algorithm is implemented in a C++ tool (nVenn) with or without a web interface. The web interface also provides the ability to analyze the regions of the diagram.
Availability: The source code and pre-compiled binaries of nVenn are available at https://github.com/vqf/nVenn. A web interface for up to six sets can be accessed at http://degradome.uniovi.es/cgi-bin/nVenn/nvenn.cgi.
Contact: quesadavictor@uniovi.es
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
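Before any layout, nVenn needs the number of elements in each of the 2^n − 1 regions of an n-set diagram, and the area of the circle placed in each region follows from that count. A minimal sketch of that bookkeeping step, keying each region by a membership bitmask; the layout simulation itself is not reproduced, and the `scale` parameter is a hypothetical drawing factor.

```python
import math
from collections import Counter


def venn_region_counts(sets):
    """Count elements in each of the 2**n - 1 regions of an n-set Venn diagram.

    A region is keyed by a bitmask: bit i is set when the element belongs to sets[i].
    """
    membership = Counter()
    universe = set().union(*sets)
    for item in universe:
        mask = 0
        for i, s in enumerate(sets):
            if item in s:
                mask |= 1 << i
        membership[mask] += 1
    # Include empty regions explicitly; an Euler diagram may simply omit them
    return {mask: membership.get(mask, 0) for mask in range(1, 2 ** len(sets))}


def circle_radius(count, scale=1.0):
    """Radius of a circle whose area is proportional to the region's element count."""
    return scale * math.sqrt(count / math.pi)
```

Usage: for `[{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]`, element `c` lands in region `0b111` (all three sets) and regions `0b101` and `0b110` are empty, which is exactly what an Euler diagram would drop.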

TAPAS: Tool for Alternative Polyadenylation Site Analysis

Bioinformatics Oxford Journals - Fri, 23/02/2018 - 5:30am
Abstract
Motivation: The length of the 3′ untranslated region (3′ UTR) of an mRNA is essential for many biological activities such as mRNA stability, sub-cellular localization, protein translation, protein binding and translation efficiency. Moreover, correlations between diseases and the shortening (or lengthening) of 3′ UTRs have been reported in the literature. This length is largely determined by the polyadenylation cleavage site in the mRNA. Because alternative polyadenylation (APA) sites are common in mammalian genes, several tools have recently been published for detecting APA sites from RNA-Seq data or performing shortening/lengthening analysis. These tools consider either at most two APA sites in a gene or only APA sites in the last exon of a gene, although a gene may have more than two APA sites and an APA site may occur before the last exon. Furthermore, these tools cannot integrate the analysis of shortening/lengthening events with APA site detection.
Results: We propose a new tool, called TAPAS, for detecting novel APA sites from RNA-Seq data. It can handle more than two APA sites in a gene as well as APA sites that occur before the last exon. The tool is based on an existing method for finding change points in time series data, with additional filtration techniques to remove change points that are likely to be false APA sites. It is then extended to identify APA sites that are differentially expressed between two biological samples and genes whose 3′ UTRs show shortening/lengthening events. Our extensive experiments on simulated and real RNA-Seq data demonstrate that TAPAS significantly outperforms existing tools for APA site detection and shortening/lengthening analysis.
Availability: https://github.com/arefeen/TAPAS
Contact: jiang@cs.ucr.edu, gxxiao@ucla.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
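TAPAS treats read coverage along a transcript like a time series and looks for change points, where a drop in coverage can mark a candidate APA site. The abstract does not name the underlying change-point method, so the sketch below swaps in the simplest version of the idea: a single split that minimizes the total within-segment squared error (one step of binary segmentation). The `min_seg` parameter is an assumption, and TAPAS's filtration of false sites is not included.

```python
def best_change_point(coverage, min_seg=3):
    """Index splitting coverage into two segments with minimal total squared error.

    Returns (index, cost) where coverage[:index] and coverage[index:] are the segments,
    or None if the series is too short to split.
    """
    n = len(coverage)
    if n < 2 * min_seg:
        return None
    # Prefix sums of x and x^2 let each segment's error be computed in O(1)
    s, s2 = [0.0], [0.0]
    for x in coverage:
        s.append(s[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared deviations from the mean on coverage[i:j]."""
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    best = min(range(min_seg, n - min_seg + 1), key=lambda k: sse(0, k) + sse(k, n))
    return best, sse(0, best) + sse(best, n)
```

Applying the split recursively to each segment would allow more than two sites per gene, which is the limitation of earlier tools that the abstract highlights.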

PhyloMAd: Efficient assessment of phylogenomic model adequacy

Bioinformatics Oxford Journals - Wed, 21/02/2018 - 5:30am
Abstract
Summary: Statistical phylogenetic inference plays an important role in evolutionary biology. The accuracy of phylogenetic methods relies on suitable models of the evolutionary process. Various tools allow comparisons of candidate phylogenetic models, but assessing the absolute performance of models remains a considerable challenge. We introduce PhyloMAd, a user-friendly application for assessing the adequacy of commonly used models of nucleotide substitution and among-lineage rate variation. Our software implements a fast, likelihood-based method of model assessment that is tractable for analyses of large multi-locus data sets. PhyloMAd provides a means of informing model improvement, or of selecting data to enhance the evolutionary signal in phylogenomic analyses.
Availability: PhyloMAd, together with a manual, a tutorial and the source code, is freely available from the GitHub repository github.com/duchene/phylomad
Contact: david.duchene@sydney.edu.au
Categories: Bioinformatics Trends
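Absolute model adequacy is often assessed by simulating data sets under the fitted model, computing a test statistic on each, and asking whether the statistic from the empirical data looks typical of those simulations. The sketch below shows only that final comparison step; the statistic, the simulation procedure and the numbers are hypothetical, and PhyloMAd's own test statistics are not reproduced here.

```python
from statistics import mean, stdev


def adequacy_summary(observed, simulated):
    """Compare an observed test statistic with values from data simulated under the model.

    Returns an upper-tail p-value (with the +1 correction used for Monte Carlo tests)
    and a z-score for how far the observation lies from the simulated mean.
    Requires at least two simulated values.
    """
    n = len(simulated)
    exceed = sum(1 for v in simulated if v >= observed)
    p_value = (exceed + 1) / (n + 1)
    z = (observed - mean(simulated)) / stdev(simulated)
    return p_value, z
```

A small p-value or a large z-score suggests that the model cannot reproduce that feature of the data, i.e. that it is inadequate for the analysis.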

Isoform specific gene expression analysis of KRAS in the prognosis of lung adenocarcinoma patients

BMC Bioinformatics - Mon, 19/02/2018 - 5:30am
Aberrant mutations in KRAS play a critical role in tumor initiation and progression, and are a negative prognosis factor in lung adenocarcinoma (LUAD).
Categories: Bioinformatics Trends

Improving prediction of heterodimeric protein complexes using combination with pairwise kernel

BMC Bioinformatics - Mon, 19/02/2018 - 5:30am
Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several c...
Categories: Bioinformatics Trends

Introducing difference recurrence relations for faster semi-global alignment of long sequences

BMC Bioinformatics - Mon, 19/02/2018 - 5:30am
The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) op...
Categories: Bioinformatics Trends

BRCA-Pathway: a structural integration and visualization system of TCGA breast cancer data on KEGG pathways

BMC Bioinformatics - Mon, 19/02/2018 - 5:30am
Bioinformatics research on biological mechanisms can be done by analyzing transcriptome data with pathway-based interpretation. Therefore, researchers have tried to develop tools to analyze transcri...
Categories: Bioinformatics Trends

Closha: bioinformatics workflow system for the analysis of massive sequencing data

BMC Bioinformatics - Mon, 19/02/2018 - 5:30am
While next-generation sequencing (NGS) costs have fallen in recent years, the cost and complexity of computation remain substantial obstacles to the use of NGS in biomedical care and genomic research. The rap...
Categories: Bioinformatics Trends

Multiobjective multifactor dimensionality reduction to detect SNP–SNP interactions

Bioinformatics Oxford Journals - Mon, 19/02/2018 - 5:30am
Abstract
Motivation: Single-nucleotide polymorphism (SNP)–SNP interactions (SSIs) are popular markers for understanding disease susceptibility. Multifactor dimensionality reduction (MDR) can successfully detect a considerable number of SSIs. Currently, MDR-based methods mainly adopt a single-objective function (a single measure based on contingency tables) to detect SSIs. However, a single-measure function may not yield favorable results because of potential model preferences and disease complexity.
Approach: This study proposes a multiobjective MDR (MOMDR) method whose objective functions are based on the MDR contingency table. MOMDR incorporates measures including the correct classification rate and the likelihood rate to detect SSIs, and adopts set theory to predict the most favorable SSIs with cross-validation consistency. MOMDR thus enables multiple measures to be used simultaneously to determine potential SSIs.
Results: Three simulation studies were conducted to compare the detection success rates of MOMDR and single-objective MDR (SOMDR), revealing that MOMDR had higher detection success rates than SOMDR. Furthermore, the Wellcome Trust Case Control Consortium data set was analyzed with MOMDR to detect SSIs associated with coronary artery disease.
Categories: Bioinformatics Trends
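MDR labels each genotype combination of a SNP pair as high- or low-risk from its case/control counts and scores the resulting classifier; MOMDR scores each candidate pair by several such measures at once. The sketch below computes one measure (a correct classification rate, taken here as balanced accuracy, which is an assumption) and then keeps the Pareto non-dominated pairs across any set of objectives. The abstract only says MOMDR "adopts set theory", so the Pareto filter is a common multiobjective choice swapped in for illustration, and the pair names and scores are hypothetical.

```python
def mdr_ccr(cells):
    """Correct classification rate for one SNP pair under MDR's high/low-risk rule.

    cells maps a genotype combination to (cases, controls). A cell is labelled
    high-risk when its case/control ratio exceeds the overall ratio; CCR is taken
    here as balanced accuracy, (sensitivity + specificity) / 2.
    """
    total_cases = sum(c for c, _ in cells.values())
    total_controls = sum(k for _, k in cells.values())
    threshold = total_cases / total_controls
    tp = fp = 0
    for cases, controls in cells.values():
        # Written without division so cells with zero controls are handled
        if cases > threshold * controls:
            tp += cases
            fp += controls
    sensitivity = tp / total_cases
    specificity = (total_controls - fp) / total_controls
    return (sensitivity + specificity) / 2


def pareto_front(scores):
    """Keep candidates not dominated on every objective (higher is better)."""

    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]
```

Usage: with `{"pair1": (0.70, 10.0), "pair2": (0.65, 15.0), "pair3": (0.60, 8.0)}`, `pair3` is dropped because `pair1` beats it on both measures, while `pair1` and `pair2` each win on one measure and are both kept.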

StructRNAfinder: an automated pipeline and web server for RNA families prediction

BMC Bioinformatics - Sat, 17/02/2018 - 5:30am
The function of many noncoding RNAs (ncRNAs) depends on their secondary structures. Over the last decades, several methodologies have been developed to predict such structures or to use them to functionally a...
Categories: Bioinformatics Trends

SecretSanta: flexible pipelines for functional secretome prediction

Bioinformatics Oxford Journals - Fri, 16/02/2018 - 5:30am
Abstract
Motivation: The secretome denotes the collection of secreted proteins exported outside the cell. The functional roles of secreted proteins include the maintenance and remodelling of the extracellular matrix as well as signalling between host and non-host cells. These features make secretomes rich reservoirs of biomarkers for disease classification and host-pathogen interaction studies. Common biomarkers are extracellular proteins secreted via classical pathways, which can be predicted from sequence by annotating the presence or absence of N-terminal signal peptides. Several heterogeneous command-line tools and web interfaces exist to identify individual motifs, signal sequences and domains that are either characteristic of, or strictly excluded from, secreted proteins. However, a single flexible secretome-prediction workflow that combines all analytic steps is still missing.
Results: To bridge this gap, the SecretSanta package implements wrapper and parser functions around established command-line tools for the integrative prediction of extracellular proteins that are secreted via classical pathways. The modularity of SecretSanta enables users to create tailored pipelines and apply them across the whole tree of life, facilitating comparison of secretomes across multiple species or under various conditions.
Availability: SecretSanta is implemented in the R programming language and is released under the GPL-3 license. All functions have been optimized and parallelized to allow large-scale processing of sequences. The open-source code, installation instructions and a vignette with use-case scenarios can be downloaded from https://github.com/gogleva/SecretSanta.
Contact: anna.gogleva@slcu.cam.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
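SecretSanta's modularity comes from chaining independent filtering steps. The Python sketch below mimics that design with composable predicates over pre-computed annotations; it is not SecretSanta's R API. The `Protein` fields are hypothetical stand-ins for outputs of external predictors, and the three rules shown (an N-terminal signal peptide, no transmembrane helix beyond it, no C-terminal KDEL endoplasmic-reticulum retention signal) are typical criteria for classically secreted proteins, chosen here as assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Protein:
    protein_id: str
    sequence: str
    signal_peptide_end: int  # hypothetical: 0 when no N-terminal signal peptide was predicted
    tm_starts: List[int] = field(default_factory=list)  # hypothetical TM helix start positions


Step = Callable[[Protein], bool]


def has_signal_peptide(p: Protein) -> bool:
    return p.signal_peptide_end > 0


def no_tm_after_signal(p: Protein) -> bool:
    # A helix starting inside the signal peptide is usually the signal peptide itself
    return all(pos <= p.signal_peptide_end for pos in p.tm_starts)


def no_er_retention(p: Protein) -> bool:
    return not p.sequence.endswith("KDEL")


def run_pipeline(proteins: Iterable[Protein], steps: List[Step]) -> List[str]:
    """Apply filters in order; a protein is kept only if it passes every step."""
    kept = list(proteins)
    for step in steps:
        kept = [p for p in kept if step(p)]
    return [p.protein_id for p in kept]
```

Because each step is just a predicate, users can reorder, drop or add steps to build a tailored pipeline, which mirrors the modularity the abstract describes.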

A new approach for interpreting random forest models and its application to the biology of ageing

Bioinformatics Oxford Journals - Fri, 16/02/2018 - 5:30am
Abstract
Motivation: This work uses the Random Forest (RF) classification algorithm to predict whether a gene is overexpressed, underexpressed or unchanged in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and for the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model.
Results: The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure.
Availability: The dataset and source code used in this paper are available as supplementary material, and a description of the data can be found at https://fabiofabris.github.io/bioinfo2018/web/.
Categories: Bioinformatics Trends

flowLearn: Fast and precise identification and quality checking of cell populations in flow cytometry

Bioinformatics Oxford Journals - Thu, 15/02/2018 - 5:30am
Abstract
Motivation: Identification of cell populations in flow cytometry is a critical part of the analysis and lays the groundwork for many applications and research discoveries. The current paradigm of manual analysis is time-consuming and subjective. A common goal of users is to replace manual analysis with automated methods that replicate their results. Supervised tools provide the best performance in such a use case; however, they require fine parameterization to obtain the best results. Hence, there is a strong need for methods that are fast to set up, accurate and interpretable.
Results: flowLearn is a semi-supervised approach for the quality-checked identification of cell populations. Using a very small number of manually gated samples and density alignments, it predicts gates on other samples with high accuracy and speed. On two state-of-the-art data sets, our tool achieves median F1-measures exceeding 0.99 for 31%, and exceeding 0.90 for 80%, of all analyzed populations. Furthermore, users can directly interpret and adjust automated gates on new sample files to iteratively improve the initial training.
Availability: flowLearn is available as an R package at https://github.com/mlux86/flowLearn. Evaluation data are publicly available online; details can be found in the supplementary material.
Contact: mlux|bhammer@techfak.uni-bielefeld.de
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
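flowLearn's accuracy is reported as median F1-measures per population. A minimal sketch, assuming each gate is represented as the set of event (cell) indices it contains and the F1 compares a predicted gate with the manual one; the event IDs are hypothetical, and treating two empty gates as a perfect match is a convention chosen here.

```python
from statistics import median


def gate_f1(predicted: set, manual: set) -> float:
    """F1 between the events inside a predicted gate and those in the manual gate."""
    if not predicted and not manual:
        return 1.0  # convention: both gates empty counts as perfect agreement
    tp = len(predicted & manual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(manual) if manual else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def median_f1(pairs):
    """Median F1 over (predicted, manual) gate pairs, e.g. one pair per sample."""
    return median(gate_f1(p, m) for p, m in pairs)
```

Usage: `gate_f1({1, 2, 3, 4}, {2, 3, 4, 5})` shares three of four events in each direction, giving precision = recall = F1 = 0.75.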

LS-align: an atom-level, flexible ligand structural alignment algorithm for high-throughput virtual screening

Bioinformatics Oxford Journals - Thu, 15/02/2018 - 5:30am
Abstract
Motivation: Sequence-order-independent structural comparison, also called structural alignment, of small ligand molecules is often needed for computer-aided virtual drug screening. Although many ligand structure alignment programs have been proposed, most of them build alignments based on rigid-body shape comparison, which can neither provide atom-specific alignment information nor allow structural variation; both abilities are critical to efficient high-throughput virtual screening.
Results: We propose a novel ligand comparison algorithm, LS-align, to generate fast and accurate atom-level structural alignments of ligand molecules through an iterative heuristic search of a target function that combines inter-atom distance with mass and chemical bond comparisons. LS-align contains two modules, Rigid-LS-align and Flexi-LS-align, designed for rigid-body and flexible alignments respectively, and uses a ligand-size-independent, statistics-based scoring function to evaluate the similarity of ligand molecules relative to random ligand pairs. Large-scale benchmark tests were performed on prioritizing chemical ligands of 102 protein targets involving 1,415,871 candidate compounds from the DUD-E (Database of Useful Decoys: Enhanced) database, where LS-align achieves an average enrichment factor (EF) of 22.0 at the 1% cutoff and an AUC score of 0.75, significantly higher than other state-of-the-art methods. Detailed data analyses show that this performance is mainly attributable to the design of the target function, which combines structural and chemical information to enhance sensitivity to subtle differences between ligand molecules, and to the introduction of structural flexibility, which helps capture the conformational changes induced by ligand-receptor binding interactions.
These data demonstrate a new avenue for improving virtual screening efficiency through the development of sensitive ligand structural alignments.
Availability: http://zhanglab.ccmb.med.umich.edu/LS-align/
Contact: njyudj@njust.edu.cn or zhng@umich.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
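LS-align is benchmarked by the enrichment factor at the 1% cutoff, i.e. how much more often known actives appear in the top 1% of the ranked library than they would at random. A minimal sketch of the standard EF formula; the scores and activity labels are hypothetical, and ties are broken by original library order because Python's sort is stable.

```python
import math


def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at the top `fraction` of a ranked screening library.

    EF = (actives found in the top fraction / size of that fraction)
         / (total actives / library size)
    """
    n = len(scores)
    n_top = max(1, math.ceil(n * fraction))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(1 for i in ranked[:n_top] if is_active[i])
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("library contains no actives")
    return (hits_top / n_top) / (total_actives / n)
```

An EF of 1 means random ranking; the maximum possible EF at 1% is 100, reached when the top 1% consists entirely of actives, so an average of 22.0 across 102 targets indicates strong early enrichment.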

Combining co-evolution and secondary structure prediction to improve fragment library generation

Bioinformatics Oxford Journals - Thu, 15/02/2018 - 5:30am
Abstract
Motivation: Recent advances in co-evolution techniques have made accurate prediction of protein structures possible in the absence of a template. Here, we provide a general approach that further uses co-evolution constraints to generate better fragment libraries for fragment-based protein structure prediction.
Results: We compared five different fragment library generation programmes on three different data sets encompassing over 400 unique protein folds. We show that considering the secondary structure of the fragments when assembling these libraries provides a critical way to assess their usefulness for structure prediction. We then use co-evolution constraints to improve the fragment libraries by enriching them with fragments that satisfy the constraints and discarding those that do not. These improved libraries have better precision and lead to consistently better modelling results.
Availability: Data are available for download from http://opig.stats.ox.ac.uk/resources. Flib-Coevo is available for download from https://github.com/sauloho/Flib-Coevo
Contact: saulo.deoliveira@dtc.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
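The last abstract enriches fragment libraries with fragments that satisfy co-evolution constraints and discards those that do not. A minimal sketch of that filtering step under stated assumptions: predicted contacts are residue pairs, a contact is "satisfied" when the two residues lie closer than 8 Å in the fragment's coordinates (a common contact cutoff, assumed here), and fragments with no contact inside their window are kept. The tuple layout, `min_fraction` threshold and coordinates are hypothetical; this is not Flib-Coevo's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def fraction_satisfied(start: int, coords: Sequence[Coord], contacts: List[Tuple[int, int]],
                       cutoff: float = 8.0) -> Optional[float]:
    """Fraction of predicted contacts inside the fragment window that the fragment satisfies.

    Returns None when no predicted contact falls inside the window.
    """
    end = start + len(coords)  # fragment covers target residues start .. end - 1
    inside = [(i, j) for i, j in contacts if start <= i < end and start <= j < end]
    if not inside:
        return None
    ok = sum(1 for i, j in inside if math.dist(coords[i - start], coords[j - start]) < cutoff)
    return ok / len(inside)


def filter_fragments(fragments, contacts, min_fraction=0.5):
    """Keep (id, start, coords) fragments that satisfy enough constraints; unconstrained ones are kept."""
    kept = []
    for frag_id, start, coords in fragments:
        f = fraction_satisfied(start, coords, contacts)
        if f is None or f >= min_fraction:
            kept.append(frag_id)
    return kept
```

Usage: for a predicted contact between residues 10 and 18, an extended 9-residue fragment starting at residue 10 (about 30 Å end to end) is discarded, while a compact fragment over the same window is kept.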
