TCRBuilder: Multi-state T-cell receptor structure prediction

Bioinformatics Oxford Journals - Tue, 17/03/2020 - 5:30am
Motivation: T-cell receptors (TCRs) are immune proteins that primarily target peptide antigens presented by the major histocompatibility complex. They tend to have lower specificity and affinity than their antibody counterparts, and their binding sites have been shown to adopt multiple conformations, which is potentially an important factor for their polyspecificity. None of the current TCR modelling tools predict this variability, which limits our ability to accurately predict TCR binding.
Results: We present TCRBuilder, a multi-state TCR structure prediction tool. Given a paired αβ TCR sequence, TCRBuilder returns a model or an ensemble of models covering the potential conformations of the binding site. This enables the analysis of structurally-driven polyspecificity in TCRs, which is not possible with existing tools.
Availability: http://opig.stats.ox.ac.uk/resources
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

BANDITS: Bayesian differential splicing accounting for sample-to-sample variability and mapping uncertainty

Genome Biology - BiomedCentral - Mon, 16/03/2020 - 5:30am
Alternative splicing is a biological process during gene expression that allows a single gene to code for multiple proteins. However, splicing patterns can be altered in some conditions or diseases. Here, we p...
Categories: Bioinformatics Trends

M2IA: a Web Server for Microbiome and Metabolome Integrative Analysis

Bioinformatics Oxford Journals - Mon, 16/03/2020 - 5:30am
Motivation: Over the last decade, microbiome-metabolome association studies have grown exponentially, aiming at an in-depth understanding of the impact of microbiota on human health. However, analyzing the resulting multi-omics data and their correlations remains a significant challenge due to the lack of a comprehensive computational tool that can facilitate data integration and interpretation. In this study, an automated microbiome and metabolome integrative analysis pipeline (M2IA) has been developed to meet the urgent need for tools that can effectively integrate microbiome and metabolome data to derive biological insights.
Results: M2IA streamlines the integrative data analysis between metabolome and microbiome, from data preprocessing, univariate and multivariate statistical analyses, and advanced functional analysis for biological interpretation, to a summary report. The functionality of M2IA was demonstrated using TwinsUK cohort datasets consisting of 1116 fecal metabolites and 16S rRNA microbiome data from 786 individuals. Moreover, two important metabolic pathways, i.e., benzoate degradation and the phosphotransferase system, were identified as closely associated with obesity.
Availability: M2IA is publicly available at http://m2ia.met-bioinformatics.cn
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Projected t-SNE for batch correction

Bioinformatics Oxford Journals - Mon, 16/03/2020 - 5:30am
Motivation: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly estimate t-SNE embeddings that are not driven by batch effects. Without correction, interesting structure in the data can be obscured by batch effects. The proposed algorithm can therefore significantly aid visualization of high-dimensional data.
Results: The proposed methods are based on linear algebra and constrained optimization, leading to efficient algorithms and fast computation in many high-dimensional settings. Results on artificial single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings, while retaining fundamental information on cell types. When applied to single-cell gene expression data to investigate mouse medulloblastoma, the proposed method successfully removes batch effects related to mouse identifiers and the date of the experiment, while preserving clusters of oligodendrocytes, astrocytes, and endothelial cells and microglia, which are expected to lie in the stroma within or adjacent to the tumors.
Availability: Source code implementing the proposed approach is available as an R package at https://github.com/emanuelealiverti/BC_tSNE, including a tutorial to reproduce the simulation studies.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

DeCoDe: degenerate codon design for complete protein-coding DNA libraries

Bioinformatics Oxford Journals - Mon, 16/03/2020 - 5:30am
Motivation: High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more non-functional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity.
Results: We introduce a novel algorithm for total DC library optimization, DeCoDe, based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g., the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states.
Availability: github.com/OrensteinLab/DeCoDe
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
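
To make the notion of a degenerate codon in the DeCoDe abstract above concrete, here is a minimal Python sketch that expands a degenerate codon into the concrete codons and amino acids it encodes. It uses the standard IUPAC nucleotide ambiguity codes and the standard genetic code. The function names are hypothetical and this is not DeCoDe's API or its integer linear programming formulation; it only illustrates why a single degenerate codon can introduce undesired variants.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes: each symbol stands for a mixture of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def expand_degenerate_codon(dc):
    """Return every concrete codon encoded by a three-letter degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in dc.upper()))]


def encoded_residues(dc):
    """Return the set of residues ('*' = stop) a degenerate codon can yield."""
    return {CODON_TABLE[c] for c in expand_degenerate_codon(dc)}


if __name__ == "__main__":
    print(len(expand_degenerate_codon("NNK")))
    print(sorted(encoded_residues("NNK")))
```

For example, the commonly used degenerate codon NNK expands to 32 concrete codons that together cover all 20 amino acids plus one stop codon (TAG), so a library that needs only a few residues at a position also receives many it did not ask for.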

Exploring High-Dimensional Biological Data with Sparse Contrastive Principal Component Analysis

Bioinformatics Oxford Journals - Mon, 16/03/2020 - 5:30am
Motivation: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances; however, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.
Results: Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis, sparse contrastive principal component analysis, that extracts sparse, stable, interpretable, and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study as well as via analyses of several publicly available protein expression, microarray gene expression, and single-cell transcriptome sequencing datasets.
Availability: A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in the paper is also available via GitHub.
Categories: Bioinformatics Trends

Interpretable factor models of single-cell RNA-seq via variational autoencoders

Bioinformatics Oxford Journals - Mon, 16/03/2020 - 5:30am
Motivation: Single-cell RNA-seq makes possible the investigation of variability in gene expression among cells, and dependence of variation on cell type. Statistical inference methods for such analyses must be scalable, and ideally interpretable.
Results: We present an approach based on a modification of a recently published highly scalable variational autoencoder framework that provides interpretability without sacrificing much accuracy. We demonstrate that our approach enables identification of gene programs in massive datasets. Our strategy, namely the learning of factor models with the auto-encoding variational Bayes framework, is not domain specific and may be useful for other applications.
Availability: The factor model is available in the scVI package hosted on https://github.com/YosefLab/scVI/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

How to Get Your Goat: Automated Identification of Species from MALDI-ToF Spectra

Bioinformatics Oxford Journals - Mon, 16/03/2020 - 5:30am
Motivation: Classification of archaeological animal samples is commonly achieved via manual examination of MALDI-ToF spectra. This is a time-consuming process which requires significant training and which does not produce a measure of confidence in the classification. We present a new, automated method for arriving at a classification of a MALDI-ToF sample, provided the collagen sequences for each candidate species are available. The approach derives a set of peptide masses from the sequence data for comparison with the sample data, which is carried out by cross-correlation. A novel way of combining evidence from multiple marker peptides is used to interpret the raw alignments and arrive at a classification with an associated confidence measure.
Results: To illustrate the efficacy of the approach, we tested the new method with a previously published classification of parchment folia from a copy of the Gospel of Luke, produced around 1120 C.E. by scribes at St. Augustine’s Abbey in Canterbury, U.K. 80 of the 81 samples were given identical classifications by both methods. In addition, the new method gives a quantifiable level of confidence in each classification.
Availability: The software can be found at https://github.com/bioarch-sjh/bacollite, and can be installed in R using devtools.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
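
The comparison by cross-correlation mentioned in the MALDI-ToF abstract above can be illustrated with a small Python sketch. It bins a theoretical peak list and a sample peak list onto the same mass grid and scores their overlap at a range of small offsets. All peak masses, the bin width and the function names are hypothetical, and this is not the bacollite package's implementation or its method for combining evidence across marker peptides.

```python
def bin_peaks(peaks, lo, hi, step):
    """Turn (m/z, intensity) pairs into a fixed-width intensity vector."""
    n = int(round((hi - lo) / step)) + 1
    vec = [0.0] * n
    for mz, intensity in peaks:
        idx = int(round((mz - lo) / step))
        if 0 <= idx < n:
            vec[idx] += intensity
    return vec


def cross_correlate(a, b, max_lag):
    """Return {lag: sum_i a[i] * b[i + lag]} for lags in [-max_lag, max_lag]."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                total += x * b[j]
        scores[lag] = total
    return scores


# Hypothetical theoretical peptide masses and a sample shifted by +0.2 Da.
theoretical = [(1105.6, 1.0), (1427.7, 1.0)]
sample = [(1105.8, 0.8), (1427.9, 0.5)]

step = 0.1  # bin width in Da (assumed)
a = bin_peaks(theoretical, 1000.0, 1500.0, step)
b = bin_peaks(sample, 1000.0, 1500.0, step)

scores = cross_correlate(a, b, max_lag=5)
best_lag = max(scores, key=scores.get)
print(best_lag, scores[best_lag])
```

With these made-up peaks, the score is zero at every offset except a lag of two bins, which corresponds to the 0.2 Da shift between the theoretical and sample peaks.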

Ultraplexing: increasing the efficiency of long-read sequencing for hybrid assembly with k-mer-based multiplexing

Genome Biology - BiomedCentral - Sat, 14/03/2020 - 5:30am
Hybrid genome assembly has emerged as an important technique in bacterial genomics, but cost and labor requirements limit large-scale application. We present Ultraplexing, a method to improve per-sample sequen...
Categories: Bioinformatics Trends

GlycoGlyph: A glycan visualizing, drawing and naming application

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Motivation: Glycan structures are commonly represented using symbols or linear nomenclature such as that from the Consortium for Functional Glycomics (CFG) (also known as modified IUPAC condensed nomenclature). No current tool allows for writing the name in such a format using a graphical user interface (GUI); thus, names are prone to errors or non-standardized representations.
Results: Here we present GlycoGlyph, a web application built using JavaScript, which is capable of drawing glycan structures using a GUI and providing the linear nomenclature as an output or using it as an input in a dynamic manner. GlycoGlyph also allows users to save the structures as an SVG vector graphic and to export the structure as condensed GlycoCT.
Availability: The application can be used at: https://glycotoolkit.com/Tools/GlycoGlyph/. The application is tested to work in modern web browsers such as Firefox or Chrome.
Supplementary information: Code and instructions, along with tutorials, are available at https://github.com/akulmehta/GlycoGlyphPublic/
Categories: Bioinformatics Trends

Joining Illumina paired-end reads for classifying phylogenetic marker sequences

BMC Bioinformatics - Sat, 14/03/2020 - 5:30am
Illumina sequencing of a marker gene is popular in metagenomic studies. However, Illumina paired-end (PE) reads sometimes cannot be merged into single reads for subsequent analysis. When mergeable PE reads are...
Categories: Bioinformatics Trends

Machine learning prediction of oncology drug targets based on protein and network properties

BMC Bioinformatics - Sat, 14/03/2020 - 5:30am
The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing number of large-scale human genomics and proteomics data to make in-si...
Categories: Bioinformatics Trends

MoAIMS: efficient software for detection of enriched regions of MeRIP-Seq

BMC Bioinformatics - Sat, 14/03/2020 - 5:30am
Methylated RNA immunoprecipitation sequencing (MeRIP-Seq) is a popular sequencing method for studying RNA modifications and, in particular, for N6-methyladenosine (m6A), the most abundant RNA methylation modif...
Categories: Bioinformatics Trends

tugHall: a simulator of cancer-cell evolution based on the hallmarks of cancer and tumor-related genes

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Summary: The flood of recent cancer genomic data requires a coherent model that can sort out the findings to systematically explain clonal evolution and the resultant intra-tumor heterogeneity (ITH). Here, we present a new mathematical model designed to computationally simulate the evolution of cancer cells. The model connects the well-known hallmarks of cancer with the specific mutational states of tumor-related genes. The cell behavior phenotypes are stochastically determined and the hallmarks probabilistically interfere with the phenotypic probabilities. In turn, the hallmark variables depend on the mutational states of tumor-related genes. Thus, our software can deepen our understanding of cancer-cell evolution and generation of ITH.
Availability and implementation: The open-source code is available in the repository https://github.com/nagornovys/Cancer_cell_evolution.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

VISAR: an interactive tool for dissecting chemical features learned by deep neural network QSAR models

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Abstract: While many quantitative structure-activity relationship (QSAR) models are trained and evaluated for their predictive merits, understanding what models have been learning is of critical importance. However, the interpretation and visualization of QSAR model results remain challenging, especially for ‘black box’ models such as deep neural networks (DNNs). Here we take a step forward to interpret the learned chemical features from DNN QSAR models, and present VISAR, an interactive tool for visualizing the structure-activity relationship (SAR). VISAR first provides functions to construct and train DNN models. Then VISAR builds the activity landscapes based on a series of compounds using the trained model, showing the correlation between the chemical feature space and the experimental activity space after model training, and allowing for knowledge mining from a global perspective. VISAR also maps the gradients of the chemical features to the corresponding compounds as contribution weights for each atom, and visualizes the positive and negative contributor substructures suggested by the models from a local perspective. Using the web application of VISAR, users can interactively explore the activity landscape and the color-coded atom contributions. We propose that VISAR could serve as a helpful tool for training and interactive analysis of DNN QSAR models, providing insights for drug design and an additional level of model validation.
Availability and implementation: The source code and usage instructions for VISAR are available on GitHub at https://github.com/Svvord/visar.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

FAME: Fast And Memory Efficient multiple sequences alignment tool through compatible chain of roots

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Motivation: Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Most existing methods can only produce short multiple alignments in an acceptable time. When researchers face genome-sized sequences, the alignment process requires a huge amount of memory and time. A method that can align genome-sized sequences rapidly and precisely would therefore be of great value, especially for the analysis of very long alignments. Herein, we propose an efficient method, called FAME, which vertically divides the sequences at the places where they share common areas and arranges these in consecutive order. The common areas are then shifted and placed under each other, and the subsequences between them are aligned using any existing MSA tool.
Results: The results demonstrate that the combination of FAME with MSA methods, together with the use of minimizers, can be executed on a personal computer and finely aligns long sequences with a much higher sum-of-pairs (SP) score than the standalone MSA tools. As genomic datasets of greater length are selected, the SP score of the combined methods gradually improves. The calculated computational complexity of the methods supports these results: combining FAME with the MSA tools leads to at least four times faster execution on the datasets.
Availability: The source code, all datasets and run parameters are freely accessible at http://github.com/naznoosh/msa.
Categories: Bioinformatics Trends

κ-helix and the helical lock and key model: A pivotal way of looking at polyproline II

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Motivation: Polyproline II (PPII) is a common conformation, comparable to α-helix and β-sheet, and is a candidate for being the most prevalent secondary structure. PPII, recently given the more generic name κ-helix, adopts a left-handed structure with 3-fold rotational symmetry. Lately, a new type of binding mechanism, the helical lock and key model, was introduced in SH3-domain complexes, where the interaction is characterized by a sliding helical pattern. However, whether this binding mechanism is unique to SH3 domains is unreported.
Results: Here, we show that the helical binding pattern is a universal feature of the κ-helix conformation, present within all the major target families: SH3, WW, profilin, MHC-II, EVH1, and GYF domains. Based on a geometric analysis of 255 experimentally solved structures, we found that they are characterized by a distinctive rotational angle along the helical axis. Furthermore, we found that the range of helical pitch varies between different protein domains or peptide orientations and that the interaction is also represented by a rotational displacement mimicking helical motion. The discovery of rotational interactions as a mechanism reveals a new dimension in the realm of protein-protein interactions, which introduces a new layer of information encoded by the helical conformation. Due to the extensive involvement of the conformation in functional interactions, we anticipate our model to expand the current molecular understanding of the relationship between protein structure and function.
Availability: We have implemented the proposed methods in an R package freely available at https://github.com/Grantlab/bio3d
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

primirTSS: An R package for identifying cell-specific microRNA transcription start sites

Bioinformatics Oxford Journals - Sat, 14/03/2020 - 5:30am
Summary: The R/Bioconductor package primirTSS is a fast and convenient tool that implements an analytical method to identify transcription start sites of microRNAs by integrating ChIP-seq data of H3K4me3 and Pol II. It further ensures precision by employing the conservation score and sequence features. The tool showed good performance when using H3K4me3 or Pol II ChIP-seq data alone as input, which brings convenience to applications where multiple data sets are hard to acquire. This flexible package is provided with both R programming interfaces and graphical web interfaces.
Availability: primirTSS is available at: http://bioconductor.org/packages/primirTSS. The documentation of the package, including an accompanying tutorial, was deposited at: https://bioconductor.org/packages/release/bioc/vignettes/primirTSS/inst/doc/primirTSS.html
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
