DiNeR: a Differential graphical model for analysis of co-regulation Network Rewiring

BMC Bioinformatics - Thu, 02/07/2020 - 5:30am
During transcription, numerous transcription factors (TFs) bind to targets in a highly coordinated manner to control gene expression. Alterations in groups of TF-binding profiles (i.e. “co-binding changes”...
Categories: Bioinformatics Trends

A cell-level quality control workflow for high-throughput image analysis

BMC Bioinformatics - Thu, 02/07/2020 - 5:30am
Image-based high throughput (HT) screening provides a rich source of information on dynamic cellular response to external perturbations. The large quantity of data generated necessitates computer-aided quality...
Categories: Bioinformatics Trends

USMPep: universal sequence models for major histocompatibility complex binding affinity prediction

BMC Bioinformatics - Thu, 02/07/2020 - 5:30am
Immunotherapy is a promising route towards personalized cancer treatment. A key algorithmic challenge in this process is to decide if a given peptide (neoepitope) binds with the major histocompatibility comple...
Categories: Bioinformatics Trends

Prediction of heart disease and classifiers’ sensitivity analysis

BMC Bioinformatics - Thu, 02/07/2020 - 5:30am
Heart disease (HD) is one of the most common diseases today, and early diagnosis is a crucial task for many health care providers seeking to protect their patients from the disease and to sav...
Categories: Bioinformatics Trends

Accounting for grouped predictor variables or pathways in high-dimensional penalized Cox regression models

BMC Bioinformatics - Thu, 02/07/2020 - 5:30am
The standard lasso penalty and its extensions are commonly used to develop a regularized regression model while selecting candidate predictor variables on a time-to-event outcome in high-dimensional data. Howe...
Categories: Bioinformatics Trends

RSSALib: A library for stochastic simulation of complex biochemical reactions

Bioinformatics Oxford Journals - Thu, 02/07/2020 - 5:30am
Motivation: Stochastic chemical kinetics is an essential mathematical framework for investigating the dynamics of biological processes, especially when stochasticity plays a vital role in their development. Simulation is often the only option for the analysis of many practical models due to their analytical intractability. Results: We present the simulation library RSSALib, which implements our recently developed rejection-based stochastic simulation algorithm (RSSA) and a wide range of its improvements to accelerate the simulation and analysis of biochemical reactions. RSSALib supports reactions with complex kinetics and time delays, which are necessary to model the complexities of reaction mechanisms. The library provides both an application program interface (API) and a graphical user interface (GUI) to ease the set-up of simulations and the visualization of their results. Availability: RSSALib is freely available at https://github.com/vo-hong-thanh/rssalib
Categories: Bioinformatics Trends
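
For readers new to stochastic chemical kinetics, the following is a minimal sketch of the classic Gillespie direct method. It is not the rejection-based RSSA described in the abstract above and it does not use the RSSALib API. The birth-death system, the rate constants and all function names are assumptions chosen for illustration.

```python
import math
import random

def gillespie(x0, rates, stoich, propensity, t_end, seed=0):
    """Simulate one trajectory with the Gillespie direct method.

    x0         -- initial copy numbers, one entry per species
    rates      -- rate constants passed through to `propensity`
    stoich     -- stoich[j] is the state change caused by reaction j
    propensity -- function (state, rates) -> list of reaction propensities
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    trajectory = [(t, tuple(x))]
    while True:
        a = propensity(x, rates)
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # Exponential waiting time; 1 - random() lies in (0, 1], so log is safe.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # Pick reaction j with probability a[j] / a0.
        j = rng.choices(range(len(a)), weights=a)[0]
        x = [xi + d for xi, d in zip(x, stoich[j])]
        trajectory.append((t, tuple(x)))
    return trajectory

# Hypothetical birth-death model: 0 -> X at rate k_birth, X -> 0 at rate k_death * X.
def birth_death_propensity(state, rates):
    k_birth, k_death = rates
    return [k_birth, k_death * state[0]]

trajectory = gillespie(
    x0=(5,),
    rates=(2.0, 0.1),
    stoich=[(+1,), (-1,)],
    propensity=birth_death_propensity,
    t_end=10.0,
)
print(len(trajectory), "states recorded; final state:", trajectory[-1])
```

The direct method evaluates every propensity at every step. Rejection-based schemes such as the one named in the abstract aim to reduce that cost, and this sketch makes no attempt to do so.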

Rapid Epistatic Mixed Model Association Studies by Controlling Multiple Polygenic Effects

Bioinformatics Oxford Journals - Thu, 02/07/2020 - 5:30am
Summary: We have developed a rapid mixed model algorithm for exhaustive genome-wide epistatic association analysis by controlling multiple polygenic effects. Our model can simultaneously handle additive by additive epistasis, dominance by dominance epistasis and additive by dominance epistasis, and account for intrasubject fluctuations due to individuals with repeated records. Furthermore, we suggest a simple but efficient approximate algorithm, which allows examination of all pairwise interactions with a running time that is linear in population size. Simulation studies are performed to investigate the properties of REMMAX. Application to publicly available yeast and human data has shown that our mixed model-based method has computational efficiency similar to that of a simple linear model. It took less than 40 hours for the pairwise analysis of 5,000 individuals genotyped with roughly 350,000 SNPs with five threads on an Intel Xeon E5 2.6GHz CPU. Availability and implementation: Source codes are freely available at https://github.com/chaoning/GMAT. Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Simulation, Power Evaluation, and Sample Size Recommendation for Single Cell RNA-seq

Bioinformatics Oxford Journals - Thu, 02/07/2020 - 5:30am
Motivation: Determining the sample size for adequate power to detect statistical significance is a crucial step at the design stage for high-throughput experiments. Even though a number of methods and tools are available for sample size calculation for microarray and RNA-seq in the context of differential expression (DE), this topic is understudied in the field of single-cell RNA sequencing (scRNA-seq). Moreover, the unique data characteristics present in scRNA-seq, such as sparsity and heterogeneity, increase the challenge. Results: We propose POWSC, a simulation-based method, to provide power evaluation and sample size recommendation for single-cell RNA sequencing DE analysis. POWSC consists of a data simulator that creates realistic expression data, and a power assessor that provides a comprehensive evaluation and visualization of the power and sample size relationship. The data simulator in POWSC outperforms two other state-of-the-art simulators in capturing key characteristics of real datasets. The power assessor in POWSC provides a variety of power evaluations including stratified and marginal power analyses for differential expressions characterized by two forms (phase transition or magnitude tuning), under different comparison scenarios. In addition, POWSC offers information for optimizing the tradeoffs between sample size and sequencing depth with the same total reads. Availability: POWSC is an open-source R package available online at https://github.com/suke18/POWSC. Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Scirpy: A Scanpy extension for analyzing single-cell T-cell receptor sequencing data

Bioinformatics Oxford Journals - Thu, 02/07/2020 - 5:30am
Summary: Advances in single-cell technologies have enabled the investigation of T cell phenotypes and repertoires at unprecedented resolution and scale. Bioinformatic methods for the efficient analysis of these large-scale datasets are instrumental for advancing our understanding of adaptive immune responses. However, while well-established solutions are accessible for the processing of single-cell transcriptomes, no streamlined pipelines are available for the comprehensive characterization of T cell receptors. Here we propose Scirpy, a scalable Python toolkit that provides simplified access to the analysis and visualization of immune repertoires from single cells and seamless integration with transcriptomic data. Availability and implementation: Scirpy source code and documentation are available at https://github.com/icbi-lab/scirpy. Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

TDAview: an online visualization tool for topological data analysis

Bioinformatics Oxford Journals - Thu, 02/07/2020 - 5:30am
Summary: TDAview is an online tool for topological data analysis and visualization. It implements the Mapper algorithm for topological data analysis and provides extensive graph visualization options. TDAview is a user-friendly tool that allows biologists and clinicians without programming knowledge to harness the power of topological data analysis. TDAview supports an analysis and visualization mode in which a Mapper graph is constructed based on user-specified parameters, followed by graph visualization. It can also be used in a visualization-only mode in which TDAview is used for visualizing the data properties of a Mapper graph generated using other open-source software. The graph visualization options allow data exploration by graphical display of meta-data variable values for nodes and edges, as well as the generation of publishable figures. TDAview can handle large datasets, with tens of thousands of data points, and thus has a wide range of applications for high-dimensional data, including the construction of topology-based gene co-expression networks. Availability: TDAview is a free online tool available at https://voineagulab.github.io/TDAview/. The source code, usage documentation and example data are available in the TDAview GitHub repository: https://github.com/Voineagulab/TDAview. Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

iPromoter-BnCNN: a Novel Branched CNN Based Predictor for Identifying and Classifying Sigma Promoters

Bioinformatics Oxford Journals - Thu, 02/07/2020 - 5:30am
Motivation: A promoter is a short region of DNA that is responsible for initiating transcription of specific genes. Computational tools for automatic identification of promoters are in high demand. Promoters fall into different types according to their function, and they may show both intra-class and inter-class variation and similarity in terms of consensus sequences. Accurate classification of the various types of sigma promoters remains a challenge. Results: We present iPromoter-BnCNN for identification and accurate classification of six types of promoters: σ24, σ28, σ32, σ38, σ54 and σ70. It is a CNN-based classifier which combines local features related to monomer nucleotide sequence, trimer nucleotide sequence, dimer structural properties and trimer structural properties through the use of parallel branching. We conducted experiments on a benchmark dataset and compared our method with six state-of-the-art tools under 5-fold cross-validation to show that it performs better. Moreover, we tested our classifier on an independent test dataset. Availability: The iPromoter-BnCNN web server is freely available at http://103.109.52.8/iPromoter-BnCNN. The runnable source code can be found at https://colab.research.google.com/drive/1yWWh7BXhsm8U4PODgPqlQRy23QGjF2DZ Supplementary information: Supplementary data (benchmark dataset, independent test dataset, model files, structural property information, attention mechanism details and web server usage) are available at Bioinformatics online.
Categories: Bioinformatics Trends

Using AnABlast for intergenic sORF prediction in the C. elegans genome

Bioinformatics Oxford Journals - Thu, 02/07/2020 - 5:30am
Motivation: Short bioactive peptides encoded by small open reading frames (sORFs) play important roles in eukaryotes. Bioinformatics prediction of ORFs is an early step in a genome sequence analysis, but sORFs encoding short peptides, often using non-AUG initiation codons, are not easily discriminated from false ORFs occurring by chance. Results: AnABlast is a computational tool designed to highlight putative protein-coding regions in genomic DNA sequences. This protein-coding finder is independent of ORF length and reading frame shifts, which makes AnABlast a potentially useful tool to predict sORFs. Using this algorithm, we report the identification of 82 putative new intergenic sORFs in the Caenorhabditis elegans genome. Sequence similarity, motif presence, expression data and RNA interference experiments suggest that these sORFs likely encode functional peptides, encouraging the use of AnABlast as a new approach for the accurate prediction of intergenic sORFs in annotated eukaryotic genomes. Availability: AnABlast is freely available at http://www.bioinfocabd.upo.es/ab/. The C. elegans genome browser with AnABlast results, annotated genes, and all data used in this study is available at http://www.bioinfocabd.upo.es/celegans Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Network analysis of synonymous codon usage

Bioinformatics Oxford Journals - Wed, 01/07/2020 - 5:30am
Motivation: Most amino acids are encoded by multiple synonymous codons, some of which are used more rarely than others. Analyses of positions of such rare codons in protein sequences revealed that rare codons can impact co-translational protein folding and that positions of some rare codons are evolutionarily conserved. Analyses of their positions in protein 3-dimensional structures, which are richer in biochemical information than sequences alone, might further explain the role of rare codons in protein folding. Results: We model protein structures as networks and use network centrality to measure the structural position of an amino acid. We first validate that amino acids buried within the structural core are network-central, and those on the surface are not. Then, we study potential differences between network centralities and thus structural positions of amino acids encoded by conserved rare, non-conserved rare, and commonly used codons. We find that in 84% of proteins, the three codon categories occupy significantly different structural positions. We examine protein groups showing different codon centrality trends, i.e., different relationships between structural positions of the three codon categories. We see several cases of all proteins from our data with some structural or functional property being in the same group. Also, we see a case of all proteins in some group having the same property. Conclusion: Our work shows that codon usage is linked to the final protein structure and thus possibly to co-translational protein folding. Availability: https://nd.edu/~cone/CodonUsage/ Supplementary information: Attached.
Categories: Bioinformatics Trends
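
The idea of measuring structural position by network centrality can be sketched in a few lines. The abstract above does not say which centrality measure or contact definition the authors use, so the distance cutoff, the choice of closeness centrality, the toy coordinates and all function names below are assumptions for illustration only.

```python
import math
from collections import deque

def contact_network(coords, cutoff):
    """Link two residues when their 3D distance is at most `cutoff`."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def closeness_centrality(adj):
    """Closeness = (reachable nodes) / (sum of shortest-path lengths to them)."""
    result = {}
    for source in adj:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result

# Toy "structure": one buried point at the origin and six surface points around it.
coords = [
    (0, 0, 0),
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]
network = contact_network(coords, cutoff=1.1)
centrality = closeness_centrality(network)
for residue, value in centrality.items():
    print(residue, round(value, 3))
```

In this toy case the buried point is linked to every surface point and so has the highest closeness, which mirrors the validation step described in the abstract: core residues are network-central and surface residues are not.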

PRANC: ML species tree estimation from the ranked gene trees under coalescence

Bioinformatics Oxford Journals - Wed, 01/07/2020 - 5:30am
Summary: PRANC computes the Probabilities of RANked gene tree topologies under the multispecies Coalescent. A ranked gene tree is a gene tree accounting for the temporal ordering of internal nodes. PRANC can also estimate the maximum likelihood species tree from a sample of ranked or unranked gene tree topologies. It estimates the maximum likelihood tree with estimated branch lengths in coalescent units. Availability: PRANC is written in C++ and freely available at github.com/anastasiiakim/PRANC Supplementary information: Supplementary material is available.
Categories: Bioinformatics Trends

mirPLS: a partial linear structure identifier method for cancer subtyping using MicroRNAs

Bioinformatics Oxford Journals - Wed, 01/07/2020 - 5:30am
Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that have been successfully identified to be differentially expressed in various cancers. However, some miRNAs were reported to be up-regulated in one subtype of a cancer but down-regulated in another, making overall associations between these miRNAs and the heterogeneous cancer non-linear. These non-linearly associated miRNAs, if identified, are thus informative for cancer subtyping. Results: Here we propose mirPLS, a Partial Linear Structure identifier for miRNA data that simultaneously identifies miRNAs with linear or non-linear associations with cancer status; the non-linearly associated miRNAs can then be used for subsequent cancer subtyping. Simulation studies showed that mirPLS can identify both non-linearly and linearly outcome-associated miRNAs more accurately than the comparison methods. Using the identified non-linearly associated miRNAs substantially improves the cancer subtyping accuracy. Applications to miRNA data of three different cancer types suggest that the cancer subtypes defined by the non-linearly associated miRNAs identified by mirPLS are consistently more predictive of patient survival and more biologically meaningful. Availability: The R package mirPLS is available for download from https://github.com/pfruan/mirPLS. Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

GPress: a framework for querying General Feature Format (GFF) files and expression files in a compressed form

Bioinformatics Oxford Journals - Wed, 01/07/2020 - 5:30am
Motivation: Sequencing data are often summarized at different annotation levels for further analysis, generally using the general feature format (GFF) or its descendants, gene transfer format (GTF) and GFF3. Existing utilities for accessing these files, like gffutils and gffread, do not focus on reducing the storage space, significantly increasing it in some cases. We propose GPress, a framework for querying GFF files in a compressed form. GPress can also incorporate and compress expression files from both bulk and single-cell RNA-Seq experiments, supporting simultaneous queries on both the GFF and expression files. In brief, GPress applies transformations to the data which are then compressed with the general lossless compressor BSC. To support queries, GPress compresses the data in blocks and creates several index tables for fast retrieval. Results: We tested GPress on several GFF files of different organisms, and showed that it achieves on average a 61% reduction in size with respect to gzip (the current de facto compressor for GFF files), while being able to retrieve all annotations for a given identifier or a range of coordinates in a few seconds (when run on a common laptop). In contrast, gffutils provides faster retrieval but doubles the size of the GFF files. When additionally linking an expression file, we show that GPress can reduce its size by more than 68% when compared to gzip (for both bulk and single-cell RNA-Seq experiments), while still retrieving the information within seconds. Finally, applying BSC to the data streams generated by GPress instead of to the original file shows a size reduction of more than 44% on average. Availability: GPress is freely available at https://github.com/qm2/gpress. Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
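
The general pattern of compressing data in blocks and keeping an index table for retrieval can be sketched as follows. This is not GPress code and does not reproduce its transformations or its file format. Python's standard zlib stands in for the BSC compressor, and the toy annotation lines, the block size and the function names are assumptions for illustration.

```python
import zlib

def compress_in_blocks(lines, block_size, key):
    """Compress `lines` in fixed-size blocks and index them by identifier.

    Returns (blocks, index) where index maps identifier -> set of block numbers.
    """
    blocks, index = [], {}
    for start in range(0, len(lines), block_size):
        chunk = lines[start:start + block_size]
        block_id = len(blocks)
        for line in chunk:
            index.setdefault(key(line), set()).add(block_id)
        blocks.append(zlib.compress("\n".join(chunk).encode("utf-8")))
    return blocks, index

def query(blocks, index, identifier, key):
    """Return all lines for `identifier`, decompressing only the blocks that hold it."""
    hits = []
    for block_id in sorted(index.get(identifier, ())):
        text = zlib.decompress(blocks[block_id]).decode("utf-8")
        hits.extend(line for line in text.split("\n") if key(line) == identifier)
    return hits

# Toy GFF-like records; the ninth column is used as a bare identifier for simplicity.
lines = [
    "chr1\ttoy\tgene\t1\t100\t.\t+\t.\tgeneA",
    "chr1\ttoy\texon\t1\t50\t.\t+\t.\tgeneA",
    "chr1\ttoy\tgene\t200\t300\t.\t-\t.\tgeneB",
    "chr2\ttoy\tgene\t5\t80\t.\t+\t.\tgeneC",
    "chr2\ttoy\texon\t60\t80\t.\t+\t.\tgeneC",
]

def identifier_of(line):
    return line.split("\t")[8]

blocks, index = compress_in_blocks(lines, block_size=2, key=identifier_of)
result = query(blocks, index, "geneC", identifier_of)
for record in result:
    print(record)
```

Smaller blocks mean less data to decompress per query but usually a worse compression ratio, which is the kind of tradeoff a block-based design has to balance.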

Analysis of endothelial-to-haematopoietic transition at the single cell level identifies cell cycle regulation as a driver of differentiation

Genome Biology - BiomedCentral - Wed, 01/07/2020 - 5:30am
Haematopoietic stem cells (HSCs) first arise during development in the aorta-gonad-mesonephros (AGM) region of the embryo from a population of haemogenic endothelial cells which undergo endothelial-to-haematop...
Categories: Bioinformatics Trends

Identification of cell type-specific methylation signals in bulk whole genome bisulfite sequencing data

Genome Biology - BiomedCentral - Wed, 01/07/2020 - 5:30am
The traditional approach to studying the epigenetic mechanism CpG methylation in tissue samples is to identify regions of concordant differential methylation spanning multiple CpG sites (differentially methyla...
Categories: Bioinformatics Trends
