Interpretable factor models of single-cell RNA-seq via variational autoencoders

Bioinformatics Oxford Journals - Mon, 16/03/2020 - 5:30am
Motivation: Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and the dependence of that variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable.
Results: We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications.
Availability: The factor model is available in the scVI package hosted on https://github.com/YosefLab/scVI/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

How to Get Your Goat: Automated Identification of Species from MALDI-ToF Spectra

Bioinformatics Oxford Journals - Mon, 16/03/2020 - 5:30am
Motivation: Classification of archaeological animal samples is commonly achieved via manual examination of MALDI-ToF spectra. This is a time-consuming process which requires significant training and which does not produce a measure of confidence in the classification. We present a new, automated method for arriving at a classification of a MALDI-ToF sample, provided the collagen sequences for each candidate species are available. The approach derives a set of peptide masses from the sequence data for comparison with the sample data, which is carried out by cross-correlation. A novel way of combining evidence from multiple marker peptides is used to interpret the raw alignments and arrive at a classification with an associated confidence measure.
Results: To illustrate the efficacy of the approach, we tested the new method with a previously published classification of parchment folia from a copy of the Gospel of Luke, produced around 1120 C.E. by scribes at St. Augustine’s Abbey in Canterbury, U.K. Of the 81 samples, 80 were given identical classifications by both methods. In addition, the new method gives a quantifiable level of confidence in each classification.
Availability: The software can be found at https://github.com/bioarch-sjh/bacollite, and can be installed in R using devtools.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
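
The comparison step described in the abstract above (theoretical peptide masses matched against a sample spectrum by cross-correlation) can be illustrated with a minimal Python sketch. This is not the bacollite API: the function names, the m/z grid, the peak masses and the lag window are all hypothetical choices made for illustration, and the paper's method for combining evidence across marker peptides is not reproduced here.

```python
def bin_spectrum(peaks, lo, hi, step):
    """Place (mass, intensity) peaks onto a regular m/z grid."""
    n = int(round((hi - lo) / step)) + 1
    grid = [0.0] * n
    for mass, intensity in peaks:
        idx = int(round((mass - lo) / step))
        if 0 <= idx < n:
            grid[idx] += intensity
    return grid


def cross_correlate(theoretical, observed, max_lag):
    """Return {lag: normalised correlation} for lags in [-max_lag, max_lag]."""
    norm = (sum(t * t for t in theoretical) * sum(o * o for o in observed)) ** 0.5
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, t in enumerate(theoretical):
            j = i + lag
            if 0 <= j < len(observed):
                total += t * observed[j]
        scores[lag] = total / norm if norm else 0.0
    return scores


def best_alignment(theoretical, observed, max_lag):
    """Return the lag (in grid bins) with the highest correlation, and its score."""
    scores = cross_correlate(theoretical, observed, max_lag)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Hypothetical isotope envelope predicted from a sequence (masses are made up).
predicted = bin_spectrum([(1105.0, 1.0), (1106.0, 0.6), (1107.0, 0.3)], 1100.0, 1110.0, 0.1)
# Hypothetical sample: same envelope, shifted by 0.2 Da and twice as intense.
sample = bin_spectrum([(1105.2, 2.0), (1106.2, 1.2), (1107.2, 0.6)], 1100.0, 1110.0, 0.1)

lag, score = best_alignment(predicted, sample, max_lag=5)
print(lag, round(score, 3))
```

Because the score is normalised by the energy of both spectra, a sample that matches the predicted envelope up to a small mass shift and an overall intensity scale scores close to 1, while unrelated peaks score lower.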

Ultraplexing: increasing the efficiency of long-read sequencing for hybrid assembly with k-mer-based multiplexing

Genome Biology - BiomedCentral - Sat, 14/03/2020 - 5:30am
Hybrid genome assembly has emerged as an important technique in bacterial genomics, but cost and labor requirements limit large-scale application. We present Ultraplexing, a method to improve per-sample sequen...
Categories: Bioinformatics Trends

GlycoGlyph: A glycan visualizing, drawing and naming application

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Motivation: Glycan structures are commonly represented using symbols or linear nomenclature such as that from the Consortium for Functional Glycomics (CFG) (also known as modified IUPAC condensed nomenclature). No current tool allows for writing the name in such a format using a graphical user interface (GUI); thus, names are prone to errors or non-standardized representations.
Results: Here we present GlycoGlyph, a web application built using JavaScript, which is capable of drawing glycan structures using a GUI and providing the linear nomenclature as an output or using it as an input in a dynamic manner. GlycoGlyph also allows users to save the structures as an SVG vector graphic and to export the structure as condensed GlycoCT.
Availability: The application can be used at https://glycotoolkit.com/Tools/GlycoGlyph/. The application is tested to work in modern web browsers such as Firefox or Chrome.
Supplementary information: Code and instructions, along with tutorials, are available at https://github.com/akulmehta/GlycoGlyphPublic/
Categories: Bioinformatics Trends

Joining Illumina paired-end reads for classifying phylogenetic marker sequences

BMC Bioinformatics - Sat, 14/03/2020 - 5:30am
Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are...
Categories: Bioinformatics Trends

Machine learning prediction of oncology drug targets based on protein and network properties

BMC Bioinformatics - Sat, 14/03/2020 - 5:30am
The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing number of large-scale human genomics and proteomics data to make in-si...
Categories: Bioinformatics Trends

MoAIMS: efficient software for detection of enriched regions of MeRIP-Seq

BMC Bioinformatics - Sat, 14/03/2020 - 5:30am
Methylated RNA immunoprecipitation sequencing (MeRIP-Seq) is a popular sequencing method for studying RNA modifications and, in particular, for N6-methyladenosine (m6A), the most abundant RNA methylation modif...
Categories: Bioinformatics Trends

tugHall: a simulator of cancer-cell evolution based on the hallmarks of cancer and tumor-related genes

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Summary: The flood of recent cancer genomic data requires a coherent model that can sort out the findings to systematically explain clonal evolution and the resultant intra-tumor heterogeneity (ITH). Here, we present a new mathematical model designed to computationally simulate the evolution of cancer cells. The model connects the well-known hallmarks of cancer with the specific mutational states of tumor-related genes. The cell behavior phenotypes are stochastically determined and the hallmarks probabilistically interfere with the phenotypic probabilities. In turn, the hallmark variables depend on the mutational states of tumor-related genes. Thus, our software can deepen our understanding of cancer-cell evolution and generation of ITH.
Availability and implementation: The open-source code is available in the repository https://github.com/nagornovys/Cancer_cell_evolution.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

VISAR: an interactive tool for dissecting chemical features learned by deep neural network QSAR models

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Abstract: While many quantitative structure-activity relationship (QSAR) models are trained and evaluated for their predictive merits, understanding what models have been learning is of critical importance. However, the interpretation and visualization of QSAR model results remain challenging, especially for ‘black box’ models such as deep neural networks (DNNs). Here we take a step forward to interpret the learned chemical features from DNN QSAR models, and present VISAR, an interactive tool for visualizing the structure-activity relationship (SAR). VISAR first provides functions to construct and train DNN models. VISAR then builds the activity landscapes based on a series of compounds using the trained model, showing the correlation between the chemical feature space and the experimental activity space after model training, and allowing for knowledge mining from a global perspective. VISAR also maps the gradients of the chemical features to the corresponding compounds as contribution weights for each atom, and visualizes the positive and negative contributor substructures suggested by the models from a local perspective. Using the web application of VISAR, users can interactively explore the activity landscape and the color-coded atom contributions. We propose that VISAR could serve as a helpful tool for training and interactive analysis of DNN QSAR models, providing insights for drug design and an additional level of model validation.
Availability and implementation: The source code and usage instructions for VISAR are available on GitHub at https://github.com/Svvord/visar.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

FAME: Fast And Memory Efficient multiple sequences alignment tool through compatible chain of roots

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Motivation: Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Most existing methods can only produce short multiple alignments in an acceptable time. When researchers attempt genome-size multiple alignments, the process requires a huge amount of processing space and time. Accordingly, a method that can align genome-size sequences rapidly and precisely would have a great effect, especially on the analysis of very long alignments. Here we propose an efficient method, called FAME, which vertically divides sequences at the places where they share common areas and arranges these areas in consecutive order. The common areas are then shifted and placed under each other, and the subsequences between them are aligned using any existing MSA tool.
Results: The results demonstrate that the combination of FAME with MSA methods, deploying minimizers, can be executed on a personal computer and finely aligns long sequences with a much higher sum-of-pairs (SP) score than the standalone MSA tools. As we select genomic datasets of greater length, the SP score of the combined methods gradually improves. The calculated computational complexity of the methods supports these results: combining FAME with the MSA tools leads to at least four times faster execution on the datasets.
Availability: The source code and all datasets and run parameters are freely accessible at http://github.com/naznoosh/msa.
Categories: Bioinformatics Trends

κ-helix and the helical lock and key model: A pivotal way of looking at polyproline II

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Motivation: Polyproline II (PPII) is a common conformation, comparable to α-helix and β-sheet, and is a candidate for being the most prevalent secondary structure. PPII, recently given the more generic name κ-helix, adopts a left-handed structure with 3-fold rotational symmetry. Lately, a new type of binding mechanism, the helical lock and key model, was introduced in SH3-domain complexes, where the interaction is characterized by a sliding helical pattern. However, whether this binding mechanism is unique to SH3 domains is unreported.
Results: Here, we show that the helical binding pattern is a universal feature of the κ-helix conformation, present within all the major target families: SH3, WW, profilin, MHC-II, EVH1, and GYF domains. Based on a geometric analysis of 255 experimentally solved structures, we found that they are characterized by a distinctive rotational angle along the helical axis. Furthermore, we found that the range of helical pitch varies between different protein domains or peptide orientations and that the interaction is also represented by a rotational displacement mimicking helical motion. The discovery of rotational interactions as a mechanism reveals a new dimension in the realm of protein-protein interactions, which introduces a new layer of information encoded by the helical conformation. Due to the extensive involvement of the conformation in functional interactions, we anticipate our model to expand the current molecular understanding of the relationship between protein structure and function.
Availability: We have implemented the proposed methods in an R package freely available at https://github.com/Grantlab/bio3d
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

primirTSS: An R package for identifying cell-specific microRNA transcription start sites

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Summary: The R/Bioconductor package primirTSS is a fast and convenient tool that implements an analytical method to identify transcription start sites of microRNAs by integrating ChIP-seq data of H3K4me3 and Pol II. It further ensures precision by employing the conservation score and sequence features. The tool showed good performance when using H3K4me3 or Pol II ChIP-seq data alone as input, which brings convenience to applications where multiple data sets are hard to acquire. This flexible package is provided with both R programming interfaces and graphical web interfaces.
Availability: primirTSS is available at http://bioconductor.org/packages/primirTSS. The documentation of the package, including an accompanying tutorial, was deposited at https://bioconductor.org/packages/release/bioc/vignettes/primirTSS/inst/doc/primirTSS.html
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Reconstructing ribosomal genes from large scale total RNA meta-transcriptomic data

Bioinformatics Oxford Journals - Fri, 13/03/2020 - 5:30am
Motivation: Technological advances in metatranscriptomics have enabled a deeper understanding of the structure and function of microbial communities. “Total RNA” metatranscriptomics, sequencing of total reverse transcribed RNA, provides a unique opportunity to investigate both the structure and function of active microbial communities from all three domains of life simultaneously. A major step of this approach is the reconstruction of full-length taxonomic marker genes such as the small subunit ribosomal RNA (SSU rRNA). However, current tools for this purpose are mainly targeted towards analysis of amplicon and metagenomic data and thus lack the ability to handle the massive and complex datasets typically resulting from total RNA experiments.
Results: In this work we introduce MetaRib, a new tool for reconstructing ribosomal gene sequences from total RNA meta-transcriptomic data. MetaRib is based on the popular rRNA assembly program EMIRGE (Miller et al., 2013), together with several improvements. We address the challenge posed by large complex datasets by integrating sub-assembly, dereplication and mapping in an iterative approach, with additional post-processing steps. We applied the method to both simulated and real-world datasets. Our results show that MetaRib can deal with larger data sets and recover more rRNA genes, achieving around a 60-fold speedup and a higher F1 score than EMIRGE on simulated datasets. On the real-world dataset, it shows similar trends but recovers more contigs than a previous analysis based on random sub-sampling, while enabling the comparison of individual contig abundances across samples for the first time.
Availability: The source code of MetaRib is freely available at https://github.com/yxxue/MetaRib
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

NIHBA: A Network Interdiction Approach for Metabolic Engineering Design

Bioinformatics Oxford Journals - Fri, 13/03/2020 - 5:30am
Motivation: Flux balance analysis (FBA) based bilevel optimisation has been a great success in redesigning metabolic networks for biochemical overproduction. To date, many computational approaches have been developed to solve the resulting bilevel optimisation problems. However, most of them are of limited use due to a biased optimality principle, poor scalability with the size of metabolic networks, potential numeric issues, or a low quantity of design solutions in a single run.
Results: Here, we have employed a network interdiction (NI) model free of growth optimality assumptions, a special case of bilevel optimisation, for computational strain design and have developed a hybrid Benders algorithm (HBA) that deals with complicating binary variables in the model, thereby achieving high efficiency without numeric issues in search of best design strategies. More importantly, HBA can list solutions that meet users’ production requirements during the search, making it possible to obtain numerous design strategies at a small runtime overhead (typically ∼1 hour for examples studied in this paper).
Availability: Source code implemented in the MATLAB COBRA Toolbox is freely available at https://github.com/chang88ye/NIHBA.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

GraphBin: Refined binning of metagenomic contigs using assembly graphs

Bioinformatics Oxford Journals - Fri, 13/03/2020 - 5:30am
Motivation: The field of metagenomics has provided valuable insights into the structure, diversity and ecology within microbial communities. One key step in metagenomics analysis is to assemble reads into longer contigs which are then binned into groups of contigs that belong to different species present in the metagenomic sample. Binning of contigs plays an important role in metagenomics and most available binning algorithms bin contigs using genomic features such as oligonucleotide/k-mer composition and contig coverage. As metagenomic contigs are derived from the assembly process, they are output from the underlying assembly graph which contains valuable connectivity information between contigs that can be used for binning.
Results: We propose GraphBin, a new binning method that makes use of the assembly graph and applies a label propagation algorithm to refine the binning result of existing tools. We show that GraphBin can make use of the assembly graphs constructed from both the de Bruijn graph and the overlap-layout-consensus approach. Moreover, we demonstrate improved experimental results from GraphBin in terms of identifying mis-binned contigs and binning of contigs discarded by existing binning tools. To the best of our knowledge, this is the first time that the information from the assembly graph has been used in a tool for the binning of metagenomic contigs.
Availability: The source code of GraphBin is available at https://github.com/Vini2/GraphBin
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
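
As a rough illustration of the idea in the GraphBin abstract above, the following Python sketch spreads bin labels along assembly-graph edges to contigs that an initial binning tool left unlabelled. It is a simplified majority-vote propagation written for this digest, not GraphBin's actual algorithm or interface; the contig names, the edge list and the tie-handling rule are assumptions.

```python
from collections import Counter


def propagate_labels(edges, initial_bins, max_iters=100):
    """Assign unlabelled contigs the strict-majority bin of their labelled neighbours.

    edges        -- iterable of (contig_a, contig_b) pairs from the assembly graph
    initial_bins -- {contig: bin} produced by an existing binning tool
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    bins = dict(initial_bins)
    for _ in range(max_iters):
        updates = {}
        for contig in sorted(neighbours):
            if contig in bins:
                continue  # labels from the initial binning are kept as they are
            votes = Counter(bins[n] for n in neighbours[contig] if n in bins)
            if not votes:
                continue
            ranked = votes.most_common(2)
            # Only accept a label when one bin strictly wins; ties stay unlabelled.
            if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                updates[contig] = ranked[0][0]
        if not updates:
            break  # nothing changed in this round, so the labelling has converged
        bins.update(updates)
    return bins


# Hypothetical chain of six contigs; only the two ends were binned initially.
graph_edges = [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c5"), ("c5", "c6")]
refined = propagate_labels(graph_edges, {"c1": "A", "c6": "B"})
print(refined)
```

Updates are collected per round and applied together, so the result does not depend on the order in which contigs are visited within a round.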

projectR: An R/Bioconductor package for transfer learning via PCA, NMF, correlation, and clustering

Bioinformatics Oxford Journals - Fri, 13/03/2020 - 5:30am
Motivation: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important to large single-cell datasets, where lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset to perform biologically driven validation by evaluating their use in or association with additional sample annotations in that independent target dataset.
Results: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation, and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis.
Availability: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
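
One simple form of the transfer learning described in the projectR abstract above is to apply gene loadings learned on a source dataset to a target dataset, restricted to the genes the two share. The Python sketch below shows that linear projection only; it is not the projectR interface (projectR is an R package), and the function name, gene names and values are hypothetical.

```python
def project(target, loadings):
    """Project target expression onto features learned from a source dataset.

    target   -- {gene: [expression per target sample]}
    loadings -- {gene: [weight per learned feature]}
    Returns a features x samples matrix (list of lists) of projected scores.
    """
    shared = sorted(set(target) & set(loadings))
    if not shared:
        raise ValueError("source and target datasets share no genes")
    n_samples = len(target[shared[0]])
    n_features = len(loadings[shared[0]])
    scores = [[0.0] * n_samples for _ in range(n_features)]
    for gene in shared:
        for k, weight in enumerate(loadings[gene]):
            for j, value in enumerate(target[gene]):
                scores[k][j] += weight * value
    return scores


# Hypothetical loadings for two features, and a target dataset with two samples.
source_loadings = {"g1": [1.0, 0.0], "g2": [0.0, 1.0], "g3": [1.0, 1.0]}
target_expression = {"g1": [2.0, 0.0], "g2": [0.0, 3.0], "g4": [5.0, 5.0]}

projected = project(target_expression, source_loadings)
print(projected)  # [[2.0, 0.0], [0.0, 3.0]]
```

Genes present in only one dataset (g3 and g4 here) are ignored, which is why matching gene identifiers between source and target matters in practice.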

Apollo: A Sequencing-Technology-Independent, Scalable, and Accurate Assembly Polishing Algorithm

Bioinformatics Oxford Journals - Fri, 13/03/2020 - 5:30am
Motivation: Third-generation sequencing technologies can sequence long reads that contain as many as 2 million base pairs (bp). These long reads are used to construct an assembly (i.e., the subject’s genome), which is further used in downstream genome analysis. Unfortunately, third-generation sequencing technologies have high sequencing error rates and a large proportion of bps in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly by using information from alignments between reads and the assembly (i.e., read-to-assembly alignment information). However, current assembly polishing algorithms can only polish an assembly using reads either from a certain sequencing technology or from a small assembly. Such technology dependency and assembly-size dependency require researchers to 1) run multiple polishing algorithms and 2) use small chunks of a large genome to use all available read sets and polish large genomes, respectively.
Results: We introduce Apollo, a universal assembly polishing algorithm that scales well to polish an assembly of any size (i.e., both large and small genomes) using reads from all sequencing technologies (i.e., second- and third-generation). Our goal is to provide a single algorithm that uses read sets from all available sequencing technologies to improve the accuracy of assembly polishing and that can polish large genomes. Apollo 1) models an assembly as a profile hidden Markov model (pHMM), 2) uses read-to-assembly alignment to train the pHMM with the Forward-Backward algorithm, and 3) decodes the trained model with the Viterbi algorithm to produce a polished assembly. Our experiments with real read sets demonstrate that Apollo is the only algorithm that 1) uses reads from any sequencing technology within a single run and 2) scales well to polish large assemblies without splitting the assembly into multiple parts.
Supplementary information: Supplementary data are available at Bioinformatics online.
Availability: Source code is available at https://github.com/CMU-SAFARI/Apollo
Categories: Bioinformatics Trends
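
The decoding step named in the Apollo abstract above (Viterbi decoding of a trained hidden Markov model) can be sketched in Python on a toy model. This is a generic two-state HMM, not Apollo's profile HMM, and it skips the Forward-Backward training entirely; the states, probabilities and observed bases below are assumptions chosen so that an isolated miscalled base is smoothed away.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a sequence of observations.

    Works in log space to avoid underflow; all probabilities must be non-zero.
    """
    log = math.log
    score = {s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log(trans_p[p][s]))
            score[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s][obs])
            pointers[s] = best
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy model: the hidden state is the "true" base, the observation is a noisy read base.
states = ["A", "C"]
start_p = {"A": 0.5, "C": 0.5}
trans_p = {"A": {"A": 0.9, "C": 0.1}, "C": {"A": 0.1, "C": 0.9}}
emit_p = {"A": {"A": 0.8, "C": 0.2}, "C": {"A": 0.2, "C": 0.8}}

polished = "".join(viterbi("AACAA", states, start_p, trans_p, emit_p))
print(polished)  # AAAAA
```

With these made-up parameters, a single discordant C is cheaper to explain as an emission error than as two state changes, whereas a sustained run of C (for example "AACCC") is decoded unchanged.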
