Robust partial reference-free cell composition estimation from tissue expression

Bioinformatics Oxford Journals - Fri, 13/03/2020 - 5:30am
Motivation: In the analysis of high throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder the cell composition quantification using cell sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy.
Results: We introduce TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell type specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods.
Availability: The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST.
Supplementary Information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

COMER2: GPU-accelerated sensitive and specific homology searches

Bioinformatics Oxford Journals - Fri, 13/03/2020 - 5:30am
Summary: Homology searches across the vast amount of available sequence data place particular emphasis on speed. We present a completely rewritten version of the sensitive homology search method COMER based on alignment of protein sequence profiles, which is capable of searching big databases even on a lightweight laptop. By harnessing the power of CUDA-enabled GPUs, it is up to 20 times faster than HHsearch, a state-of-the-art method using vectorized instructions on modern CPUs.
Availability and implementation: COMER2 is cross-platform open-source software available at https://sourceforge.net/projects/comer2 and https://github.com/minmarg/comer2. It can be easily installed from source code or using stand-alone installers.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

A big data approach to metagenomics for all-food-sequencing

BMC Bioinformatics - Thu, 12/03/2020 - 5:30am
All-Food-Sequencing (AFS) is an untargeted metagenomic sequencing method that allows for the detection and quantification of food ingredients including animals, plants, and microbiota. While this approach avoi...
Categories: Bioinformatics Trends

Network hub-node prioritization of gene regulation with intra-network association

BMC Bioinformatics - Thu, 12/03/2020 - 5:30am
To identify and prioritize the influential hub genes in a gene-set or biological pathway, most analyses rely on calculation of marginal effects or tests of statistical significance. These procedures may be ina...
Categories: Bioinformatics Trends

m7GHub: deciphering the location, regulation and pathogenesis of internal mRNA N7-methylguanosine (m7G) sites in human

Bioinformatics Oxford Journals - Thu, 12/03/2020 - 5:30am
Motivation: Recent progress in m7G RNA methylation studies has focused on its internal (rather than capped) presence within mRNAs. Tens of thousands of internal mRNA m7G sites have been identified within mammalian transcriptomes, and a single resource to best share, annotate and analyze the massive m7G data generated recently is sorely needed.
Results: We report here m7GHub, a comprehensive online platform for deciphering the location, regulation and pathogenesis of internal mRNA N7-methylguanosine. The m7GHub consists of four main components, including: the first internal mRNA m7G database containing 44,058 experimentally-validated internal mRNA m7G sites, a sequence-based high-accuracy predictor, the first web server for assessing the impact of mutations on m7G status, and the first database recording 1,218 disease-associated genetic mutations that may function through regulation of m7G methylation. Together, m7GHub will serve as a useful resource for research on internal mRNA m7G modification.
Availability: m7GHub is freely accessible online at: www.xjtlu.edu.cn/biologicalsciences/m7ghub.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

IgGeneUsage: differential gene usage in immune repertoires

Bioinformatics Oxford Journals - Thu, 12/03/2020 - 5:30am
Summary: Decoding the properties of immune repertoires is key to understanding the adaptive immune response to challenges such as viral infection. One important quantitative property is differential usage of Ig genes between biological conditions. Yet, most analyses for differential Ig gene usage are performed qualitatively or with inadequate statistical methods. Here we introduce IgGeneUsage, a computational tool for the analysis of differential Ig gene usage. IgGeneUsage employs Bayesian inference with hierarchical models to analyze complex gene usage data from high-throughput sequencing experiments of immune repertoires. It quantifies differential Ig gene usage probabilistically and avoids some common problems related to the current practice of null-hypothesis significance testing.
Availability and Implementation: IgGeneUsage is an R-package freely available as part of Bioconductor at: https://bioconductor.org/packages/IgGeneUsage/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

forgeNet: a graph deep neural network model using tree-based ensemble classifiers for feature graph construction

Bioinformatics Oxford Journals - Thu, 12/03/2020 - 5:30am
Motivation: A unique challenge in predictive model building for omics data has been the small number of samples (n) versus the large number of features (p). This “n ≪ p” property creates difficulties for disease outcome classification using deep learning techniques. Sparse learning that incorporates known functional relations between the biological units, such as the graph-embedded deep feedforward network (GEDFN) model, has been a solution to this issue. However, such methods require an existing feature graph, and potential mis-specification of the feature graph can be harmful to classification and feature selection.
Results: To address this limitation and develop a robust classification model without relying on external knowledge, we propose a forest graph-embedded deep feedforward network (forgeNet) model, which integrates the GEDFN architecture with a forest feature graph extractor, so that the feature graph can be learned in a supervised manner and specifically constructed for a given prediction task. To validate the method’s capability, we tested the forgeNet model on both synthetic and real datasets. The resulting high classification accuracy suggests that the method is a valuable addition to sparse deep learning models for omics data.
Availability: The method is available at https://github.com/yunchuankong/forgeNet.
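One way to picture a "forest feature graph extractor" is sketched below. This is an illustration only, not the forgeNet code: it assumes (the abstract does not say so) that two features are linked whenever one is the split feature of a tree node and the other is the split feature of one of its children, and that edges are pooled over all trees. The tuple-based tree encoding and the names `tree_edges` and `forest_feature_graph` are hypothetical.

```python
from collections import Counter

# Hypothetical tree encoding: a node is (feature_name, left_child, right_child);
# a leaf is represented by None.

def tree_edges(node):
    """Yield (parent_feature, child_feature) pairs for one decision tree."""
    if node is None:
        return
    feature, left, right = node
    for child in (left, right):
        if child is not None:
            yield (feature, child[0])
            yield from tree_edges(child)

def forest_feature_graph(forest):
    """Pool parent-child feature pairs over all trees; values count occurrences."""
    edges = Counter()
    for tree in forest:
        edges.update(tree_edges(tree))
    return edges

if __name__ == "__main__":
    # Two toy trees over three made-up gene features.
    toy_forest = [
        ("g1", ("g2", None, None), ("g3", ("g2", None, None), None)),
        ("g1", ("g2", None, None), None),
    ]
    for (parent, child), count in sorted(forest_feature_graph(toy_forest).items()):
        print(parent, "->", child, count)
```

In a real setting the trees would come from a fitted ensemble classifier, so the resulting graph depends on the prediction task rather than on external annotation.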
Categories: Bioinformatics Trends

The Glycine Receptor Allosteric Ligands Library (GRALL)

Bioinformatics Oxford Journals - Thu, 12/03/2020 - 5:30am
Motivation: Glycine receptors (GlyR) mediate fast inhibitory neurotransmission in the brain and have been recognized as key pharmacological targets for pain. A large number of chemically diverse compounds that are able to modulate GlyR function both positively and negatively have been reported, which provides useful information for the development of pharmacological strategies and models for the allosteric modulation of these ion channels.
Results: Based on existing literature, we have collected 218 unique chemical entities with documented modulatory activities at homomeric GlyR-α1 and -α3 and built a database named GRALL. This collection includes agonists, antagonists, positive and negative allosteric modulators, and a number of experimentally inactive compounds. Most importantly, for a large fraction of them a structural annotation based on their putative binding site on the receptor is provided. This type of annotation, which is currently missing in other drug banks, along with the availability of cooperativity factors from radioligand displacement experiments, is expected to improve the predictivity of in silico methodologies for allosteric drug discovery and boost the development of conformation-based pharmacological approaches.
Availability: The GRALL library is distributed as a web-accessible database at the following link: https://ifm.chimie.unistra.fr/grall. For each molecular entry, it provides information on the chemical structure, the ligand-binding site, the direction of modulation, the potency, the 3D molecular structure and quantum mechanical charges as determined by our in-house pipeline.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

A multitask multiple kernel learning formulation for discriminating early- and late-stage cancers

Bioinformatics Oxford Journals - Thu, 12/03/2020 - 5:30am
Motivation: Genomic information is increasingly being used in diagnosis, prognosis and treatment of cancer. The severity of the disease is usually measured by the tumor stage. Therefore, identifying pathways playing an important role in progression of the disease stage is of great interest. Given that there are similarities in the underlying mechanisms of different cancers, in addition to the considerable correlation in the genomic data, there is a need for machine learning methods that can take these aspects of genomic data into account. Furthermore, using machine learning for studying multiple cancer cohorts together with a collection of molecular pathways creates an opportunity for knowledge extraction.
Results: We studied the problem of discriminating early- and late-stage tumors of several cancers using genomic information while enforcing interpretability on the solutions. To this end, we developed a multitask multiple kernel learning (MTMKL) method with a co-clustering step based on a cutting-plane algorithm to identify the relationships between the input tasks and kernels. We tested our algorithm on 15 cancer cohorts and observed that, in most cases, MTMKL outperforms other algorithms (including random forests, support vector machine and single-task multiple kernel learning) in terms of predictive power. Using the aggregate results from multiple replications, we also derived similarity matrices between cancer cohorts, which are, in many cases, in agreement with available relationships reported in the relevant literature.
Availability: Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/arezourahimi/mtgsbc together with the scripts that replicate the reported experiments.
Categories: Bioinformatics Trends

Sarcopenia negatively affects hip structure analysis variables in a group of Lebanese postmenopausal women

BMC Bioinformatics - Wed, 11/03/2020 - 5:30am
The current study’s purpose is to compare hip structural analysis variables in a group of postmenopausal women with sarcopenia and another group of postmenopausal women with normal skeletal muscle mass index. ...
Categories: Bioinformatics Trends

GSP4PDB: a web tool to visualize, search and explore protein-ligand structural patterns

BMC Bioinformatics - Wed, 11/03/2020 - 5:30am
In the field of protein engineering and biotechnology, the discovery and characterization of structural patterns is highly relevant as these patterns can give fundamental insights into protein-ligand interacti...
Categories: Bioinformatics Trends

Accurately estimating the length distributions of genomic micro-satellites by tumor purity deconvolution

BMC Bioinformatics - Wed, 11/03/2020 - 5:30am
Genomic micro-satellites are the genomic regions that consist of short and repetitive DNA motifs. Estimating the length distribution and state of a micro-satellite region is an important computational step in ...
Categories: Bioinformatics Trends

CHOP: haplotype-aware path indexing in population graphs

Genome Biology - BiomedCentral - Wed, 11/03/2020 - 5:30am
The practical use of graph-based reference genomes depends on the ability to align reads to them. Performing substring queries to paths through these graphs lies at the core of this task. The combination of in...
Categories: Bioinformatics Trends

Smarcad1 mediates microbiota-induced inflammation in mouse and coordinates gene expression in the intestinal epithelium

Genome Biology - BiomedCentral - Wed, 11/03/2020 - 5:30am
How intestinal epithelial cells interact with the microbiota and how this is regulated at the gene expression level are critical questions. Smarcad1 is a conserved chromatin remodeling factor with a poorly und...
Categories: Bioinformatics Trends

RobustClone: A robust PCA method for tumor clone and evolution inference from single-cell sequencing data

Bioinformatics Oxford Journals - Wed, 11/03/2020 - 5:30am
Motivation: Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and reconstruct phylogenetic relationships of tumor cells/clones. However, SCS data are often error-prone, making their computational analysis challenging.
Results: To infer the clonal evolution in tumor from the error-prone SCS data, we developed an efficient computational framework, termed RobustClone. It recovers the true genotypes of subclones based on the low-rank matrix decomposition method with extended robust principal component analysis (RPCA), and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method, which can be applied to both scSNV and scCNV data. It is efficient and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods in large-scale data both in accuracy and efficiency. We further validated RobustClone on 2 single-cell SNV and 2 single-cell CNV datasets and demonstrated that RobustClone could recover genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset.
Availability: RobustClone software is available at https://github.com/ucasdp/RobustClone.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Ktrim: an extra-fast and accurate adapter- and quality-trimmer for sequencing data

Bioinformatics Oxford Journals - Wed, 11/03/2020 - 5:30am
Motivation: Next-generation sequencing (NGS) data frequently suffer from poor-quality cycles and adapter contamination, and therefore need to be preprocessed before downstream analyses. With the ever-growing throughput and read length of modern sequencers, the preprocessing step has become a bottleneck in data analysis because current tools do not meet the required performance. Extra-fast and accurate adapter- and quality-trimming tools for sequencing data preprocessing are therefore still in urgent demand.
Results: Ktrim was developed in this work. Key features of Ktrim include: built-in support for adapters of common library preparation kits; support for user-supplied, customized adapter sequences; support for both paired-end and single-end data; and support for parallelization to accelerate the analysis. Ktrim was ∼2-18 times faster than current tools and also showed high accuracy when applied to the testing datasets. Ktrim could thus serve as a valuable and efficient tool for short-read NGS data preprocessing.
Availability: Source codes and scripts to reproduce the results described in this paper are freely available at https://github.com/hellosunking/Ktrim/, distributed under the GPL v3 license.
Supplementary information: Supplementary data are available at Bioinformatics online.
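For readers unfamiliar with the preprocessing step the abstract refers to, the sketch below shows what adapter- and quality-trimming of a single read means in the simplest case. It is not Ktrim's algorithm and makes no claim about its speed or accuracy: it assumes Phred+33 quality encoding, trims trailing low-quality bases, and then cuts the read at the first exact match to the start of the adapter. The function name `trim_read` and its parameters are hypothetical.

```python
def trim_read(seq, qual, adapter, min_qual=20, seed_len=8, phred_offset=33):
    """Return (seq, qual) after naive 3' quality trimming and adapter trimming.

    Assumptions (illustrative only): qualities are Phred+33 encoded, and the
    adapter is detected by an exact match to its first `seed_len` bases.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must have equal length")

    # 1. Quality trimming: drop trailing bases whose score is below min_qual.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_qual:
        end -= 1
    seq, qual = seq[:end], qual[:end]

    # 2. Adapter trimming: cut at the first occurrence of the adapter seed.
    pos = seq.find(adapter[:seed_len])
    if pos != -1:
        seq, qual = seq[:pos], qual[:pos]
    return seq, qual

if __name__ == "__main__":
    insert = "ACGTACGTAC"
    adapter = "AGATCGGAAGAGC"            # common Illumina adapter prefix
    read = insert + adapter
    quals = "I" * (len(read) - 3) + "###"  # last three bases are low quality
    print(trim_read(read, quals, adapter))
```

Production trimmers also tolerate mismatches, handle adapters that are only partially present at the read end, and process paired-end reads jointly; none of that is attempted here.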
Categories: Bioinformatics Trends

iATC-FRAKEL: A simple multi-label web-server for recognizing anatomical therapeutic chemical classes of drugs with their fingerprints only

Bioinformatics Oxford Journals - Tue, 10/03/2020 - 5:30am
Motivation: The anatomical therapeutic chemical (ATC) classification system is very important for drug utilization and studies. Correct prediction of the 14 classes in the first level for given drugs is an essential problem for the study of this system. Several multi-label classifiers have been proposed in this regard. However, only two of them provided web-servers, and their performance was not very high. On the other hand, although some of the remaining classifiers can provide better performance, they were built based on prior knowledge of drugs, such as information on chemical-chemical interaction and chemical ontology, leading to limited applications. Furthermore, the provided codes of these classifiers are almost inaccessible for pharmacologists.
Results: In this study, we built a simple web-server, namely iATC-FRAKEL. This web-server only required the SMILES format of drugs as input and extracted their fingerprints for making predictions. The performance of iATC-FRAKEL was much higher than that of all existing web-servers and was comparable to the best multi-label classifier, while having much wider applications.
Availability: The web-server is available at http://cie.shmtu.edu.cn/iatc/index.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

SphereCon - A method for precise estimation of residue relative solvent accessible area from limited structural information

Bioinformatics Oxford Journals - Tue, 10/03/2020 - 5:30am
Motivation: In proteins, solvent accessibility of individual residues is a factor contributing to their importance for protein function and stability. Hence one might wish to calculate solvent accessibility in order to predict the impact of mutations, their pathogenicity, and for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein three-dimensional structure are reliably resolved.
Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere with a cone cut out in the direction opposite of the residue with surrounding atoms. We propose a method for estimating the position and volume of residue atoms in cases when they are not known from the structure, or when the structural data are unreliable or missing. We show that in cases of reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA, and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates.
Availability: https://github.com/kalininalab/spherecon
Supplementary information: Supplementary data are available at Bioinformatics online.
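The geometric primitive mentioned in the abstract, the volume shared by a sphere and a cone, has a closed form in the simplest configuration. The sketch below is not the SphereCon measure itself: it assumes a cone whose apex sits at the sphere centre, in which case the intersection is a spherical sector of volume V = (2π/3) r³ (1 − cos θ), where θ is the cone half-angle. How SphereCon places the cone and accounts for surrounding atoms is not described in the abstract and is not modelled here. The function names are hypothetical.

```python
import math

def spherical_sector_volume(radius, half_angle):
    """Volume of the part of a sphere inside a cone with apex at the sphere centre.

    `half_angle` is the cone half-angle in radians, between 0 and pi.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    if not 0.0 <= half_angle <= math.pi:
        raise ValueError("half_angle must lie in [0, pi]")
    return (2.0 * math.pi / 3.0) * radius ** 3 * (1.0 - math.cos(half_angle))

def sector_fraction(half_angle):
    """Fraction of the full sphere volume covered by the sector (radius cancels)."""
    if not 0.0 <= half_angle <= math.pi:
        raise ValueError("half_angle must lie in [0, pi]")
    return (1.0 - math.cos(half_angle)) / 2.0

if __name__ == "__main__":
    for degrees in (30, 60, 90, 180):
        theta = math.radians(degrees)
        print(degrees, round(sector_fraction(theta), 4))
```

A half-angle of 90 degrees covers half the sphere and 180 degrees covers all of it, which gives a quick sanity check on the formula.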
Categories: Bioinformatics Trends
