Unified Methods for Feature Selection in Large-Scale Genomic Studies with Censored Survival Outcomes

Bioinformatics Oxford Journals - Tue, 10/03/2020 - 5:30am
Abstract
Motivation: One of the major goals in large-scale genomic studies is to identify genes with a prognostic impact on time-to-event outcomes, which provide insight into the disease process. With rapid developments in high-throughput genomic technologies in the past two decades, the scientific community is able to monitor the expression levels of tens of thousands of genes and proteins, resulting in enormous data sets where the number of genomic features is far greater than the number of subjects. Methods based on univariate Cox regression are often used to select genomic features related to survival outcome; however, the Cox model assumes proportional hazards (PH), which is unlikely to hold for each feature. When applied to genomic features exhibiting some form of non-proportional hazards (NPH), these methods could lead to an under- or over-estimation of the effects. We propose a broad array of marginal screening techniques that aid in feature ranking and selection by accommodating various forms of NPH. First, we develop an approach based on Kullback-Leibler information divergence and the Yang-Prentice model that includes methods for the PH and proportional odds (PO) models as special cases. Next, we propose R² measures for the PH and PO models that can be interpreted in terms of explained randomness. Lastly, we propose a generalized pseudo-R² index that includes PH, PO, crossing hazards and crossing odds models as special cases and can be interpreted as the percentage of separability between subjects experiencing the event and not experiencing the event according to feature measurements.
Results: We evaluate the performance of our measures using extensive simulation studies and publicly available data sets in cancer genomics. We demonstrate that the proposed methods successfully address the issue of NPH in genomic feature selection and outperform existing methods.
Availability: R code for the proposed methods is available at github.com/lburns27/Feature-Selection.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
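
The marginal screening idea in the entry above (score each feature on its own against the censored outcome, then rank) can be illustrated with a minimal sketch. This is not the authors' method: it swaps in Harrell's concordance index as a stand-in per-feature score, and the function names `concordance_index` and `rank_features` and the toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, scores):
    """Harrell's C for right-censored data (stand-in screening score).

    A pair is comparable when the subject with the shorter time had an
    observed event. It is concordant when that subject also has the
    higher feature value; tied feature values count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped for simplicity
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored: ordering unknown
        comparable += 1
        if scores[first] > scores[second]:
            concordant += 1.0
        elif scores[first] == scores[second]:
            concordant += 0.5
    return concordant / comparable if comparable else 0.5


def rank_features(times, events, features):
    """Rank feature names by |C - 0.5|, strongest marginal signal first."""
    strength = {
        name: abs(concordance_index(times, events, values) - 0.5)
        for name, values in features.items()
    }
    return sorted(strength, key=strength.get, reverse=True)


if __name__ == "__main__":
    # Toy data (made up for illustration): 1 = event observed, 0 = censored.
    times = [2, 4, 6, 8, 10]
    events = [1, 1, 0, 1, 1]
    features = {
        "gene_a": [5, 4, 3, 2, 1],  # high expression, short survival
        "gene_b": [1, 3, 2, 3, 1],  # no clear relation to survival
    }
    print(rank_features(times, events, features))
```

A rank-based score of this kind does not rely on the proportional hazards assumption, but it also does not capture crossing hazards, which is one of the cases the abstract says the proposed indices are designed to handle.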

Leitmotif: protein motif scanning 2.0

Bioinformatics Oxford Journals - Tue, 10/03/2020 - 5:30am
Abstract
Motivation: Motif-HMM (mHMM) scanning has been shown to possess unique advantages over commonly used sequence-profile search methods (e.g. HMMER, PSI-BLAST), since it is particularly well suited to discriminating proteins with variations inside conserved motifs (e.g. family subtypes) or motifs lacking essential residues (false positives, e.g. pseudoenzymes).
Results: To make mHMM accessible to a broader scientific community, we developed Leitmotif, an mHMM web application with many parametrization options easily accessible through an intuitive interface. A substantial improvement in performance (ROC scores) was obtained by using two novel parameters. To the best of our knowledge, Leitmotif is the only available mHMM application.
Availability: Leitmotif is freely available at https://leitmotif.irb.hr
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking

Bioinformatics Oxford Journals - Tue, 10/03/2020 - 5:30am
Abstract
Summary: B- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground truth immune receptor repertoires for immunoinformatics tool benchmarking. Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full-length variable region immune receptor sequences by tuning the following immune receptor features: (i) species and chain type (BCR, TCR, single, paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation, and (vi) sequence motifs. Each simulated sequence is annotated by the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis, and machine learning methods for motif detection.
Availability: The package is available via https://github.com/GreiffLab/immuneSIM and on CRAN at https://cran.r-project.org/web/packages/immuneSIM. The documentation is hosted at https://immuneSIM.readthedocs.io.
Supplementary information: Supplementary data will be available at Bioinformatics online.
Categories: Bioinformatics Trends

grabseqs: Simple downloading of reads and metadata from multiple next-generation sequencing data repositories

Bioinformatics Oxford Journals - Tue, 10/03/2020 - 5:30am
Abstract
Summary: High-throughput sequencing is a powerful technique for addressing biological questions. Grabseqs streamlines access to publicly available metagenomic data by providing a single, easy-to-use interface to download data and metadata from multiple repositories including the Sequence Read Archive (SRA), the Metagenomics Rapid Annotation through Subsystems Technology (MG-RAST) server, and iMicrobe. Users can download data and metadata in a standardized format from any number of samples or projects from a given repository with a single grabseqs command.
Availability: Grabseqs is an open-source tool implemented in Python and licensed under the MIT License. The source code is freely available from https://github.com/louiejtaylor/grabseqs, the Python Package Index (PyPI), and the Anaconda Cloud repository.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

CPS Analysis: Self-contained validation of biomedical data clustering

Bioinformatics Oxford Journals - Tue, 10/03/2020 - 5:30am
Abstract
Motivation: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community.
Results: We have developed a toolkit called Covering Point Set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying CPS analysis to three usage scenarios for biomedical data, we demonstrate that it is more effective for evaluating the uncertainty of clusters than state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods.
Availability: The method is implemented in an R package called OTclust, available on CRAN.
Supplementary information: Supplementary information is available at Bioinformatics online.
Categories: Bioinformatics Trends

High-resolution modeling of the selection on local mRNA folding strength in coding sequences across the tree of life

Genome Biology - BiomedCentral - Mon, 09/03/2020 - 5:30am
mRNA can form local secondary structure within the protein-coding sequence, and the strength of this structure is thought to influence gene expression regulation. Previous studies suggest that secondary struct...
Categories: Bioinformatics Trends

A benchmark of algorithms for the analysis of pooled CRISPR screens

Genome Biology - BiomedCentral - Mon, 09/03/2020 - 5:30am
Genome-wide pooled CRISPR-Cas-mediated knockout, activation, and repression screens are powerful tools for functional genomic investigations. Despite their increasing importance, there is currently little guid...
Categories: Bioinformatics Trends

Hemispheric asymmetry in the human brain and in Parkinson’s disease is linked to divergent epigenetic patterns in neurons

Genome Biology - BiomedCentral - Mon, 09/03/2020 - 5:30am
Hemispheric asymmetry in neuronal processes is a fundamental feature of the human brain and drives symptom lateralization in Parkinson’s disease (PD), but its molecular determinants are unknown. Here, we ident...
Categories: Bioinformatics Trends

A Graph Regularized Generalized Matrix Factorization Model for Predicting Links in Biomedical Bipartite Networks

Bioinformatics Oxford Journals - Sat, 07/03/2020 - 5:30am
Abstract
Motivation: Predicting potential links in biomedical bipartite networks can provide useful insights into the diagnosis and treatment of complex diseases and the discovery of novel drug targets. Computational methods have been proposed recently to predict potential links for various biomedical bipartite networks. However, existing methods usually rely on the coverage of known links, and may encounter difficulties when dealing with new nodes without any known link information.
Results: In this study, we propose a new link prediction method, named graph regularized generalized matrix factorization (GRGMF), to identify potential links in biomedical bipartite networks. First, we formulate a generalized matrix factorization model to exploit the latent patterns behind observed links. In particular, it can take into account the neighborhood information of each node when learning that node's latent representation, and the neighborhood information can be learned adaptively. Second, we introduce two graph regularization terms that draw on affinity information for each node, derived from external databases, to enhance the learning of latent representations. We conduct extensive experiments on six real datasets. Experimental results show that GRGMF achieves competitive performance on all these datasets, which demonstrates the effectiveness of GRGMF in predicting potential links in biomedical bipartite networks.
Availability and Implementation: The package is available at https://github.com/happyalfred2016/GRGMF.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

HLPpred-Fuse: improved and robust prediction of hemolytic peptide and its activity by fusing multiple feature representation

Bioinformatics Oxford Journals - Sat, 07/03/2020 - 5:30am
Abstract
Motivation: The failure of therapeutic peptides in clinical trials can be attributed to toxicity profiles such as hemolytic activity, which hamper further progress of peptides as drug candidates. The accurate prediction of hemolytic peptides (HLPs) and their activity from given peptides is one of the challenging tasks in immunoinformatics, and is essential for drug development and basic research. Although a few computational methods have been proposed for this purpose, none of them can identify hemolytic peptides and their activities simultaneously.
Results: In this study, we propose a two-layer prediction framework, called HLPpred-Fuse, that can accurately and automatically predict both hemolytic peptides (HLPs or non-HLPs) and HLP activity (high or low). More specifically, a feature representation learning scheme was used to generate 54 probabilistic features by integrating six different machine-learning classifiers and nine different sequence-based encodings. The 54 probabilistic features were then fused to provide sufficiently converged sequence information, which was used as input to an extremely randomized tree to develop two final prediction models that independently identify hemolytic peptides and their activity. Performance comparisons against state-of-the-art methods using cross-validation analysis, an independent test, and a case study demonstrate that HLPpred-Fuse consistently outperformed these methods in the identification of hemolytic activity.
Availability: For the convenience of experimental scientists, a web-based tool has been established at http://thegleelab.org/HLPpred-Fuse.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

nanoTRON: a Picasso module for MLP-based classification of super-resolution data

Bioinformatics Oxford Journals - Sat, 07/03/2020 - 5:30am
Abstract
Motivation: Classification of images is an essential task in higher-level analysis of biological data. By bypassing the diffraction limit of light, super-resolution microscopy opened up a new way to look at molecular details using light microscopy, producing large amounts of data with exquisite spatial detail. Statistical exploration of data usually needs initial classification, which until now has often been performed manually.
Results: We introduce nanoTRON, an interactive open-source tool that allows super-resolution data classification based on image recognition. It extends the software package Picasso with the first deep learning tool with a graphical user interface.
Availability: nanoTRON is written in Python and freely available under the MIT license as a part of the software collection Picasso on GitHub (http://www.github.com/jungmannlab/picasso). All data files and code relevant for the review process of this paper can be accessed at https://datashare.biochem.mpg.de/s/iPBw9tj4OO9X4pC
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

The genome evolution and domestication of tropical fruit mango

Genome Biology - BiomedCentral - Fri, 06/03/2020 - 5:30am
Mango is one of the world’s most important tropical fruits. It belongs to the family Anacardiaceae, which includes several other economically important species, notably cashew, sumac and pistachio from other g...
Categories: Bioinformatics Trends

Gamevar.f90: a software package for calculating individual gametic diversity

BMC Bioinformatics - Fri, 06/03/2020 - 5:30am
Traditional selection in livestock and crops focuses on additive genetic values or breeding values of the individuals. While traditional selection utilizes variation between individuals, differences between ga...
Categories: Bioinformatics Trends

PyBSASeq: a simple and effective algorithm for bulked segregant analysis with whole-genome sequencing data

BMC Bioinformatics - Fri, 06/03/2020 - 5:30am
Bulked segregant analysis (BSA), coupled with next-generation sequencing, allows the rapid identification of both qualitative and quantitative trait loci (QTL), and this technique is referred to as BSA-Seq her...
Categories: Bioinformatics Trends

CNV Radar: an improved method for somatic copy number alteration characterization in oncology

BMC Bioinformatics - Fri, 06/03/2020 - 5:30am
Cancer associated copy number variation (CNV) events provide important information for identifying patient subgroups and suggesting treatment strategies. Technical and logistical issues, however, make it chall...
Categories: Bioinformatics Trends

SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks

Bioinformatics Oxford Journals - Fri, 06/03/2020 - 5:30am
Abstract
Motivation: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools have been developed, mostly using machine learning algorithms, to predict the subcellular location of proteins.
Results: Here, we present a neural-network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS assigns a protein to one of two classes, the endomembrane system and secretory pathway versus all others, with an MCC of 0.75-0.86, outperforming the other state-of-the-art web servers we tested.
Availability: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
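
The SCLpred-EMS entry above reports performance as an MCC (Matthews correlation coefficient) for a two-class problem. As a reminder of what that figure measures, here is a minimal sketch of the standard MCC formula computed from a binary confusion matrix. The function name and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal total is zero (the usual convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 100 positives and 100 negatives in a test set.
    mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
    print(round(mcc, 3))
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction. Because it uses all four cells of the confusion matrix, it is less sensitive to class imbalance than plain accuracy.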
