High-resolution Repli-Seq defines the temporal choreography of initiation, elongation and termination of replication in mammalian cells

Genome Biology - Tue, 24/03/2020 - 5:30am
DNA replication in mammalian cells occurs in a defined temporal order during S phase, known as the replication timing (RT) programme. Replication timing is developmentally regulated and correlated with chromat...
Categories: Bioinformatics Trends

HiChIP-Peaks: A HiChIP peak calling algorithm

Bioinformatics Oxford Journals - Tue, 24/03/2020 - 5:30am
Motivation: HiChIP is a powerful tool to interrogate 3D chromatin organization. Current tools to analyse chromatin looping mechanisms using HiChIP data require the identification of loop anchors to work properly. However, current approaches to discover these anchors from HiChIP data are not satisfactory, having either a very high false discovery rate or strong dependence on sequencing depth. Moreover, these tools do not allow quantitative comparison of peaks across different samples, failing to fully exploit the information available from HiChIP datasets.
Results: We develop a new tool based on a representation of HiChIP data centred on the re-ligation sites to identify peaks from HiChIP datasets, which can subsequently be used in other tools for loop discovery. This increases the reliability of these tools and improves recall rate as sequencing depth is reduced. We also provide a method to count reads mapping to peaks across samples, which can be used for differential peak analysis using HiChIP data.
Availability: HiChIP-Peaks is freely available at https://github.com/ChenfuShi/HiChIP_peaks.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Cancer subtype classification and modeling by pathway attention and propagation

Bioinformatics Oxford Journals - Tue, 24/03/2020 - 5:30am
Motivation: Biological pathways are important curated knowledge of biological processes. Thus, cancer subtype classification based on pathways would be very useful for understanding differences in biological mechanisms among cancer subtypes. However, pathways include only a fraction of the entire gene set (only one third of human genes in KEGG), and pathways are fragmented. For this reason, there are few computational methods that use pathways for cancer subtype classification.
Results: We present an explainable deep learning model with an attention mechanism and network propagation for cancer subtype classification. Each pathway is modeled by a graph convolutional network. Then, a multi-attention-based ensemble model combines several hundred pathways in an explainable manner. Lastly, network propagation on a pathway-gene network explains why gene expression profiles differ among subtypes. In experiments with five TCGA cancer data sets, our method achieved high classification accuracy and, additionally, identified subtype-specific pathways and biological functions.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
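The abstract does not say which propagation scheme is used. As a hedged illustration of the general idea only, random walk with restart is one common form of network propagation: seed scores repeatedly diffuse to neighbours while a fixed fraction jumps back to the seeds. The sketch below runs it on a toy adjacency matrix; it is not the authors' pathway-gene model, and `random_walk_with_restart` and its `restart` parameter are hypothetical names.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed scores over a graph by random walk with restart.

    adj: symmetric (n, n) adjacency matrix with no isolated nodes (assumed).
    seeds: length-n nonnegative vector of initial scores.
    restart: probability of jumping back to the seed distribution each step.
    """
    adj = np.asarray(adj, dtype=float)
    # Column-normalize so each column sums to 1 (a column-stochastic transition matrix).
    w = adj / adj.sum(axis=0, keepdims=True)
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()
    p = p0.copy()
    for _ in range(max_iter):
        # Diffuse to neighbours, then mix back in the seed distribution.
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

The result stays a probability vector; a larger `restart` keeps scores concentrated near the seeds, while a smaller one spreads them further through the network.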

Brewery: Deep Learning and deeper profiles for the prediction of 1D protein structure annotations

Bioinformatics Oxford Journals - Tue, 24/03/2020 - 5:30am
Motivation: Protein structural annotations are essential abstractions for dealing with the prediction of protein structures. Many increasingly sophisticated protein structural annotations have been devised over the last few decades. However, the need for annotations that are easy to compute, process and predict has not diminished. This is especially true for the protein structures that are hardest to predict, such as novel folds.
Results: We propose Brewery, a suite of ab initio predictors of 1D protein structural annotations. Brewery uses multiple sources of evolutionary information to achieve state-of-the-art predictions of secondary structure, structural motifs, relative solvent accessibility and contact density.
Availability: The web server, standalone program, Docker image and training sets of Brewery are available at http://distilldeep.ucd.ie/brewery/.
Categories: Bioinformatics Trends

Annotation of tandem mass spectrometry data using stochastic neural networks in shotgun proteomics

Bioinformatics Oxford Journals - Tue, 24/03/2020 - 5:30am
Motivation: The ability of score functions to separate correct from incorrect peptide-spectrum matches in database-search-based spectrum identification is hindered by many superfluous peaks belonging to unexpected fragmentation ions, or by missing peaks of anticipated fragmentation ions.
Results: Here, we present a new method, called BoltzMatch, that learns score functions using a particular type of stochastic neural network, the restricted Boltzmann machine, in order to enhance their discrimination ability. BoltzMatch learns chemically explainable patterns among peak pairs in the spectrum data, and during its internal scoring it can augment peaks depending on their semantic context or even reconstruct missing peaks of expected ions. As a result, BoltzMatch achieved 50% and 33% more annotations than XCorr on high- and low-resolution MS2 data, respectively, at a 0.1% false discovery rate (FDR) in our benchmark; conversely, XCorr yielded the same number of spectrum annotations as BoltzMatch, albeit with 4-6 times more errors. In addition, BoltzMatch alone yields 14% more annotations than Prosit (run with Percolator), and BoltzMatch with Percolator yields 32% more annotations than Prosit at the 0.1% FDR level in our benchmark.
Availability: BoltzMatch is freely available at https://github.com/kfattila/BoltzMatch.
Supporting information: Supplementary materials are available at Bioinformatics online.
Categories: Bioinformatics Trends
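The abstract names restricted Boltzmann machines but gives no details of BoltzMatch's architecture or spectrum encoding. As a hedged illustration of the generic technique only, the sketch below trains a small Bernoulli restricted Boltzmann machine with one-step contrastive divergence (CD-1) on binary vectors, which one might think of as binned peak-presence vectors (an assumption), and reconstructs inputs through the hidden layer. The class `TinyRBM` and all its parameters are hypothetical and are not BoltzMatch's code.

```python
import numpy as np


class TinyRBM:
    """Minimal Bernoulli-Bernoulli restricted Boltzmann machine (illustrative)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.w = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def hidden_probs(self, v):
        # P(h_j = 1 | v) for every hidden unit, row-wise over the batch
        return self._sigmoid(v @ self.w + self.c)

    def visible_probs(self, h):
        # P(v_i = 1 | h) for every visible unit, row-wise over the batch
        return self._sigmoid(h @ self.w.T + self.b)

    def cd1_step(self, v0, lr=0.1):
        """One contrastive-divergence (CD-1) update on a batch of binary rows."""
        ph0 = self.hidden_probs(v0)
        # Sample binary hidden states from the positive phase.
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: use visible probabilities rather than binary samples.
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        self.w += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += lr * (v0 - pv1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)

    def reconstruct(self, v):
        """Mean-field pass visible -> hidden -> visible, returning probabilities."""
        return self.visible_probs(self.hidden_probs(v))
```

Because `reconstruct` returns per-unit probabilities, a unit that is absent from an input row can still receive a high value if the learned correlations support it, which is the generic sense in which an RBM can "fill in" missing features.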

Automatic identification of relevant genes from low-dimensional embeddings of single cell RNAseq data

Bioinformatics Oxford Journals - Tue, 24/03/2020 - 5:30am
Dimensionality reduction is a key step in the analysis of single-cell RNA sequencing data. It produces a low-dimensional embedding for visualization and as a basis for downstream analysis. Nonlinear techniques are best suited to handling the intrinsic complexity of large, heterogeneous single-cell data. However, because there is no linear relation between genes and embedding coordinates, there is no way to extract the identity of the genes driving any cell’s position in the low-dimensional embedding, which makes it more difficult to characterize the underlying biological processes.
In this paper, we introduce the concepts of local and global gene relevance to compute an equivalent of principal component analysis loadings for nonlinear low-dimensional embeddings. Global gene relevance identifies drivers of the overall embedding, while local gene relevance identifies those of a defined subregion. We apply our method to single-cell RNA-seq datasets from different experimental protocols and to different low-dimensional embedding techniques. This shows our method’s versatility in identifying key genes for a variety of biological processes.
To ensure reproducibility and ease of use, our method is released as part of destiny 3.0, a popular R package for building diffusion maps from single-cell transcriptomic data. It is readily available through Bioconductor.
Categories: Bioinformatics Trends

Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF

Bioinformatics Oxford Journals - Tue, 24/03/2020 - 5:30am
Motivation: The rapid proliferation of single-cell RNA-Sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user-tuning, are heavily reliant on dimension reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface.
Results: We describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse-NMF, cluster “fitness”, SVM) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively down-samples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar yet distinct cell-types and while recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets.
Availability and implementation: ICGS2 is implemented in Python. The source code and documentation are available at http://altanalyze.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Tandem CTCF sites function as insulators to balance spatial chromatin contacts and topological enhancer-promoter selection

Genome Biology - BiomedCentral - Mon, 23/03/2020 - 5:30am
CTCF is a key insulator-binding protein, and mammalian genomes contain numerous CTCF sites, many of which are organized in tandem.
Categories: Bioinformatics Trends

Obstacles to detecting isoforms using full-length scRNA-seq data

Genome Biology - BiomedCentral - Mon, 23/03/2020 - 5:30am
Early single-cell RNA-seq (scRNA-seq) studies suggested that it was unusual to see more than one isoform being produced from a gene in a single cell, even when multiple isoforms were detected in matched bulk R...
Categories: Bioinformatics Trends

HiNT: a computational method for detecting copy number variations and translocations from Hi-C data

Genome Biology - BiomedCentral - Mon, 23/03/2020 - 5:30am
The three-dimensional conformation of a genome can be profiled using Hi-C, a technique that combines chromatin conformation capture with high-throughput sequencing. However, structural variations often yield f...
Categories: Bioinformatics Trends

Decode-seq: a practical approach to improve differential gene expression analysis

Genome Biology - BiomedCentral - Mon, 23/03/2020 - 5:30am
Many differential gene expression analyses are conducted with an inadequate number of biological replicates. We describe an easy and effective RNA-seq approach using molecular barcoding to enable profiling of ...
Categories: Bioinformatics Trends

QuaDMutNetEx: a method for detecting cancer driver genes with low mutation frequency

BMC Bioinformatics - Mon, 23/03/2020 - 5:30am
Cancer is caused by genetic mutations, but not all somatic mutations in human DNA drive the emergence or growth of cancers. While many frequently-mutated cancer driver genes have already been identified and ar...
Categories: Bioinformatics Trends

Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data

BMC Bioinformatics - Mon, 23/03/2020 - 5:30am
Feature selection in class-imbalance learning has gained increasing attention in recent years due to the massive growth of high-dimensional class-imbalanced data across many scientific fields. In addition to r...
Categories: Bioinformatics Trends
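The Hellinger distance between two discrete distributions P and Q is H(P, Q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which lies in [0, 1]. The truncated abstract does not show how the paper uses it, so the sketch below illustrates one common use under that assumption: scoring each feature by the distance between its class-conditional histograms. It does not implement the paper's stable sparse selection procedure, and `hellinger_feature_scores` and `n_bins` are hypothetical names; both classes are assumed to be present.

```python
import numpy as np


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))


def hellinger_feature_scores(x, y, n_bins=10):
    """Score each column of x by the Hellinger distance between its
    class-conditional histograms, for binary labels y in {0, 1}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    scores = []
    for j in range(x.shape[1]):
        # Shared bin edges so both class histograms are comparable.
        edges = np.histogram_bin_edges(x[:, j], bins=n_bins)
        h0, _ = np.histogram(x[y == 0, j], bins=edges)
        h1, _ = np.histogram(x[y == 1, j], bins=edges)
        # Normalizing each class separately removes the class ratio from
        # the score, one reason this distance is favoured for imbalanced data.
        scores.append(hellinger(h0 / h0.sum(), h1 / h1.sum()))
    return np.array(scores)
```

Features with scores near 1 have nearly non-overlapping class-conditional distributions, while scores near 0 indicate a feature that looks the same in both classes, regardless of how rare the minority class is.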

Fast tree aggregation for consensus hierarchical clustering

BMC Bioinformatics - Fri, 20/03/2020 - 5:30am
In unsupervised learning and clustering, data integration from different sources and types is a difficult question discussed in several research areas. For instance, in omics analysis, dozens of clustering metho...
Categories: Bioinformatics Trends

Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data

BMC Bioinformatics - Fri, 20/03/2020 - 5:30am
The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet, the goal of developing actionable, robust, and reproducible predictive signatu...
Categories: Bioinformatics Trends

Efficient identification of multiple pathways: RNA-Seq analysis of livers from 56Fe ion irradiated mice

BMC Bioinformatics - Fri, 20/03/2020 - 5:30am
mRNA interactions with other mRNAs and other signaling molecules determine different biological pathways and functions. Gene co-expression network analysis methods have been widely used to identify correlation ...
Categories: Bioinformatics Trends

Subscribe to Centre for Bioinformatics aggregator - Bioinformatics Trends
