
CLoNe: automated clustering based on local density neighborhoods for application to biomolecular structural ensembles

Bioinformatics Oxford Journals - Fri, 20/08/2021 - 5:30am
Abstract
Motivation: Proteins are intrinsically dynamic entities. Flexibility sampling methods, such as molecular dynamics or those arising from integrative modeling strategies, are now commonplace and enable the study of molecular conformational landscapes in many contexts. Resulting structural ensembles increase in size as technological and algorithmic advancements take place, making their analysis increasingly demanding. In this regard, cluster analysis remains a go-to approach for their classification. However, many state-of-the-art algorithms are restricted to specific cluster properties. Combined with tedious parameter fine-tuning, cluster analysis of protein structural ensembles suffers from the lack of a generally applicable and easy-to-use clustering scheme.
Results: We present CLoNe, an original Python-based clustering scheme that builds on the Density Peaks algorithm of Rodriguez and Laio. CLoNe relies on a probabilistic analysis of local density distributions derived from nearest neighbors to find relevant clusters regardless of cluster shape, size, distribution and number. We show its capabilities on many toy datasets with properties that otherwise divide state-of-the-art approaches, and show that it improves on the original algorithm in key aspects. Applied to structural ensembles, CLoNe was able to extract meaningful conformations from membrane binding events and ligand-binding pocket opening, as well as identify dominant dimerization motifs or inter-domain organization. CLoNe additionally saves clusters as individual trajectories for further analysis and provides scripts for automated use with molecular visualization software.
Availability and implementation: www.epfl.ch/labs/lbm/resources, github.com/LBM-EPFL/CLoNe.
Categories: Bioinformatics Trends
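
The CLoNe abstract above builds on the Density Peaks algorithm of Rodriguez and Laio. As a rough illustration of that underlying idea only (this is not CLoNe's code, and the function name, the cutoff d_c and the toy data are all assumptions made for this sketch), each point gets a local density rho and a distance delta to the nearest point of higher density; cluster centers are points where both are large.

```python
import numpy as np

def density_peaks(points, d_c):
    """Return (rho, delta) for each point, in the spirit of Density Peaks.

    rho   : number of other points closer than the cutoff d_c
    delta : distance to the nearest point ranked denser (ties broken by index)
    """
    pts = np.asarray(points, dtype=float)
    # Full pairwise Euclidean distance matrix (fine for small toy data).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    rho = (dist < d_c).sum(axis=1) - 1  # subtract 1 to exclude the point itself

    # Rank points from densest to sparsest; a stable sort makes ties deterministic.
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(len(pts))
    # By convention the densest point gets the largest distance in the data set.
    delta[order[0]] = dist[order[0]].max()
    for k in range(1, len(pts)):
        i = order[k]
        delta[i] = dist[i, order[:k]].min()
    return rho, delta

# Hypothetical toy data: two well-separated 2D blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
rho, delta = density_peaks(np.vstack([blob_a, blob_b]), d_c=0.5)

# Candidate centers: the two points with the largest rho * delta.
centers = np.argsort(rho * delta)[-2:]
```

In this sketch the cutoff d_c has to be chosen by hand, which is the kind of parameter tuning the abstract describes as tedious.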

Bipartite graph-based approach for clustering of cell lines by gene expression-drug response associations

Bioinformatics Oxford Journals - Wed, 03/03/2021 - 5:30am
Abstract
Motivation: In pharmacogenomic studies, the biological context of cell lines influences the predictive ability of drug-response models and the discovery of biomarkers. Thus, similar cell lines are often studied together based on prior knowledge of biological annotations. However, this selection approach is not scalable with the number of annotations, and the relationship between gene-drug association patterns and biological context may not be obvious.
Results: We present a procedure to compare cell lines based on their gene-drug association patterns. Starting with a grouping of cell lines from biological annotation, we model gene-drug association patterns for each group as a bipartite graph between genes and drugs. This is accomplished by applying sparse canonical correlation analysis (SCCA) to extract the gene-drug associations, and using the canonical vectors to construct the edge weights. Then, we introduce a nuclear norm-based dissimilarity measure to compare the bipartite graphs. Accompanying our procedure is a permutation test to evaluate the significance of similarity of cell line groups in terms of gene-drug associations. In the pharmacogenomics datasets CTRP2, GDSC2, and CCLE, hierarchical clustering of carcinoma groups based on this dissimilarity measure uniquely reveals clustering patterns driven by carcinoma subtype rather than primary site. Next, we show that the top associated drugs or genes from SCCA can be used to characterize the clustering patterns of haematopoietic and lymphoid malignancies. Finally, we confirm by simulation that when drug responses are linearly dependent on expression, our approach is the only one that can effectively infer the true hierarchy compared to existing approaches.
Availability: Bipartite graph-based hierarchical clustering is implemented in R and can be obtained from CRAN: https://CRAN.R-project.org/package=hierBipartite. The source code is available at https://github.com/CalvinTChi/hierBipartite
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
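
The abstract above mentions a nuclear norm-based dissimilarity between bipartite graphs but does not define it. The sketch below is therefore an assumption, not the hierBipartite definition: it treats each graph as a gene-by-drug edge-weight matrix and uses the nuclear norm (sum of singular values) of the difference, scaled by the two individual norms. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def nuclear_norm_dissimilarity(weights_a, weights_b):
    """Illustrative dissimilarity between two gene-by-drug weight matrices.

    Assumed form: ||A - B||_* / (||A||_* + ||B||_*), which lies in [0, 1]
    by the triangle inequality. The published measure may differ.
    """
    a = np.asarray(weights_a, dtype=float)
    b = np.asarray(weights_b, dtype=float)
    denom = np.linalg.norm(a, "nuc") + np.linalg.norm(b, "nuc")
    if denom == 0:
        return 0.0  # two empty graphs are treated as identical
    return float(np.linalg.norm(a - b, "nuc") / denom)

# Hypothetical edge weights built as outer products of made-up
# canonical vectors (3 genes x 2 drugs) for two cell line groups.
group_1 = np.outer([1.0, 0.0, 0.5], [0.8, 0.2])
group_2 = np.outer([0.0, 1.0, 0.5], [0.1, 0.9])

d_same = nuclear_norm_dissimilarity(group_1, group_1)
d_diff = nuclear_norm_dissimilarity(group_1, group_2)
```

A matrix of such pairwise values could then be passed to any standard hierarchical clustering routine.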

S3V2-IDEAS: a package for normalizing, denoising and integrating epigenomic datasets across different cell types

Bioinformatics Oxford Journals - Wed, 03/03/2021 - 5:30am
Abstract
Summary: Epigenetic modifications reflect key aspects of transcriptional regulation, and many epigenomic data sets have been generated under different biological contexts to provide insights into regulatory processes. However, the technical noise in epigenomic data sets and the many dimensions (features) examined make it challenging to effectively extract biologically meaningful inferences from these data sets. We developed a package that reduces noise while normalizing the epigenomic data by a novel normalization method, followed by integrative dimensional reduction by learning and assigning epigenetic states. This package, called S3V2-IDEAS, can be used to identify epigenetic states for multiple features, or identify discretized signal intensity levels and a master peak list across different cell types for a single feature. We illustrate the outputs and performance of S3V2-IDEAS using 137 epigenomics data sets from the VISION project that provides ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis.
Availability and implementation: S3V2-IDEAS pipeline is freely available as open source software released under an MIT license at: https://github.com/guanjue/S3V2_IDEAS_ESMP
Supplementary information: S3V2_IDEAS_supplementary_materials.docx
Categories: Bioinformatics Trends

Characterizing protein conformers by cross-linking mass spectrometry and pattern recognition

Bioinformatics Oxford Journals - Wed, 03/03/2021 - 5:30am
Abstract
Motivation: Chemical cross-linking coupled to mass spectrometry (XLMS) emerged as a powerful technique for studying protein structures and large-scale protein-protein interactions. Nonetheless, XLMS lacks software tailored toward dealing with multiple conformers; this scenario can lead to high-quality identifications that are mutually exclusive. This limitation hampers the applicability of XLMS in structural experiments of dynamic protein systems, where less abundant conformers of the target protein are expected in the sample.
Results: We present QUIN-XL, software that uses unsupervised clustering to group cross-link identifications by their quantitative profile across multiple samples. QUIN-XL highlights regions of the protein or system presenting changes in its conformation when comparing different biological conditions. We demonstrate our software's usefulness by revisiting the HSP90 protein, comparing three of its different conformers. QUIN-XL's clusters correlate directly with known 3D structures of the conformers, which validates our software.
Availability: QUIN-XL and a user tutorial are freely available at http://patternlabforproteomics.org/quinxl for academic users.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Integrative Survival Analysis of Breast Cancer with Gene Expression and DNA Methylation Data

Bioinformatics Oxford Journals - Wed, 03/03/2021 - 5:30am
Abstract
Motivation: Integrative multi-feature fusion analysis on biomedical data has gained much attention recently. In breast cancer, existing studies have demonstrated that combining genomic mRNA data and DNA methylation data can better stratify cancer patients with distinct prognosis than using a single signature. However, those existing methods simply combine these gene features in series and ignore the correlations between separate omics dimensions over time.
Results: In the present study, we propose an adaptive multi-task learning method, which combines the Cox loss task with the ordinal loss task, for survival prediction of breast cancer patients using multi-modal learning instead of performing survival analysis on each feature data set. First, we use the local maximum quasi-clique merging (lmQCM) algorithm to reduce the mRNA and methylation feature dimensions and extract cluster eigengenes respectively. Then, we add an auxiliary ordinal loss to the original Cox model to improve the ability to optimize the learning process in training and regularization. The auxiliary loss helps to reduce the vanishing gradient problem for earlier layers and helps to decrease the loss of the primary task. Meanwhile, we use an adaptive weights approach to multi-task learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. Finally, we build an ordinal Cox hazards model for survival analysis and use a long short-term memory (LSTM) method to predict patients’ survival risk. We use the cross-validation method and the concordance index (C-index) for assessing the prediction effect. Stringent cross-verification testing processes for the benchmark data set and two additional datasets demonstrate that the developed approach is effective, achieving very competitive performance with existing approaches.
Availability and implementation: https://github.com/bhioswego/ML_ordCOX
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
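
The abstract above evaluates survival predictions with the concordance index (C-index). For readers unfamiliar with the metric, here is a minimal sketch of the commonly used pairwise (Harrell-style) definition; it is not taken from the ML_ordCOX repository, and the function name and toy data are assumptions.

```python
def concordance_index(times, events, risks):
    """Pairwise concordance index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means earlier expected event)

    A pair (i, j) is comparable when patient i had an observed event
    strictly before time j. It is concordant when i also has the higher
    predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical toy cohort: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
good_risks = [0.9, 0.7, 0.5, 0.1]  # risk decreases as survival time grows
bad_risks = [0.1, 0.5, 0.7, 0.9]   # exactly the wrong ordering

c_good = concordance_index(times, events, good_risks)
c_bad = concordance_index(times, events, bad_risks)
```

A value of 0.5 corresponds to random ranking, so scores are usually read relative to that baseline.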

vcf2fhir: a utility to convert VCF files into HL7 FHIR format for genomics-EHR integration

BMC Bioinformatics - Tue, 02/03/2021 - 5:30am
VCF formatted files are the lingua franca of next-generation sequencing, whereas HL7 FHIR is emerging as a standard language for electronic health record interoperability. A growing number of FHIR-based clinic...
Categories: Bioinformatics Trends

InstantDL: an easy-to-use deep learning pipeline for image segmentation and classification

BMC Bioinformatics - Tue, 02/03/2021 - 5:30am
Deep learning contributes to uncovering molecular and cellular processes with highly performant algorithms. Convolutional neural networks have become the state-of-the-art tool to provide accurate and fast ima...
Categories: Bioinformatics Trends

A machine learning-based gene signature of response to the novel alkylating agent LP-184 distinguishes its potential tumor indications

BMC Bioinformatics - Tue, 02/03/2021 - 5:30am
Non-targeted cytotoxics with anticancer activity are often developed through preclinical stages using response criteria observed in cell lines and xenografts. A panel of the NCI-60 cell lines is frequently the...
Categories: Bioinformatics Trends

PnB Designer: a web application to design prime and base editor guide RNAs for animals and plants

BMC Bioinformatics - Tue, 02/03/2021 - 5:30am
The rapid expansion of the CRISPR toolbox through tagging effector domains to either enzymatically inactive Cas9 (dCas9) or Cas9 nickase (nCas9) has led to several promising new gene editing strategies. Recent...
Categories: Bioinformatics Trends

Normalization of single-cell RNA-seq counts by log(x + 1)* or log(1 + x)*

Bioinformatics Oxford Journals - Tue, 02/03/2021 - 5:30am
Abstract
Single-cell RNA-seq technologies have been successfully employed over the past decade to generate many high resolution cell atlases. These have proved invaluable in recent efforts aimed at understanding the cell type specificity of host genes involved in SARS-CoV-2 infections. While single-cell atlases are based on well-sampled highly-expressed genes, many of the genes of interest for understanding SARS-CoV-2 can be expressed at very low levels. Common assumptions underlying standard single-cell analyses don’t hold when examining low-expressed genes, with the result that standard workflows can produce misleading results.
Supplementary information: Supplementary data and all of the code to reproduce Figure 1 are available here: https://github.com/pachterlab/BP_2020_2/.
Categories: Bioinformatics Trends
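
The abstract above does not spell out which assumptions fail for low-expressed genes. One plausible illustration, offered here as an assumption rather than as the authors' argument, is that averaging log(1 + x) values is not the same as taking log(1 + x) of the average, and the gap is much larger for sparse counts. The count vectors below are made up.

```python
import numpy as np

def jensen_gap(counts):
    """Difference between log1p of the mean and the mean of log1p values.

    By Jensen's inequality this is never negative for a concave function
    such as log(1 + x); it is zero only when all counts are equal.
    """
    counts = np.asarray(counts, dtype=float)
    log_of_mean = np.log1p(counts.mean())
    mean_of_logs = np.log1p(counts).mean()
    return float(log_of_mean - mean_of_logs)

# Hypothetical counts for one gene across ten cells.
low_gene = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]                  # sparse, low expression
high_gene = [90, 95, 100, 100, 100, 100, 100, 100, 105, 110]  # well sampled

low_gap = jensen_gap(low_gene)
high_gap = jensen_gap(high_gene)
```

In this toy case the sparse gene shows a far larger gap than the well-sampled one, so a summary computed on log-transformed counts can understate its average expression.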

Gene-Set Integrative Analysis of Multi-Omics Data Using Tensor-based Association Test

Bioinformatics Oxford Journals - Mon, 01/03/2021 - 5:30am
Abstract
Motivation: Facilitated by technological advances and the decrease in costs, it is feasible to gather subject data from several omics platforms. Each platform assesses different molecular events, and the challenge lies in efficiently analyzing these data to discover novel disease genes or mechanisms. A common strategy is to regress the outcomes on all omics variables in a gene set. However, this approach suffers from problems associated with high-dimensional inference.
Results: We introduce a tensor-based framework for variable-wise inference in multi-omics analysis. By accounting for the matrix structure of an individual’s multi-omics data, the proposed tensor methods incorporate the relationship among omics effects, reduce the number of parameters, and boost the modeling efficiency. We derive the variable-specific tensor test and enhance computational efficiency of tensor modeling. Using simulations and data applications on the Cancer Cell Line Encyclopedia (CCLE), we demonstrate our method performs favorably over baseline methods and will be useful for gaining biological insights in multi-omics analysis.
Availability and implementation: R function and instruction are available from the authors’ website: https://www4.stat.ncsu.edu/~jytzeng/Software/TR.omics/TRinstruction.pdf
Supplementary information: Supplementary materials are available at Bioinformatics online.
Categories: Bioinformatics Trends

2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing

Genome Biology - Mon, 01/03/2021 - 5:30am
Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high erro...
Categories: Bioinformatics Trends

Pattern recognition in lymphoid malignancies using CytoGPS and Mercator

BMC Bioinformatics - Mon, 01/03/2021 - 5:30am
There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transfo...
Categories: Bioinformatics Trends

barCoder: a tool to generate unique, orthogonal genetic tags for qPCR detection

BMC Bioinformatics - Mon, 01/03/2021 - 5:30am
Tracking dispersal of microbial populations in the environment requires specific detection methods that discriminate between the target strain and all potential natural and artificial interferents, including p...
Categories: Bioinformatics Trends

Multi-dimensional data integration algorithm based on random walk with restart

BMC Bioinformatics - Sat, 27/02/2021 - 5:30am
The accumulation of various multi-omics data and computational approaches for data integration can accelerate the development of precision medicine. However, the algorithm development for multi-omics data inte...
Categories: Bioinformatics Trends
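
The entry above is truncated, so the authors' integration scheme is not visible here. What follows is only a generic sketch of random walk with restart (RWR) on a single small graph, the building block named in the title. The function name, the restart probability and the toy adjacency matrix are assumptions.

```python
import numpy as np

def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seed`.

    Iterates p <- (1 - restart) * W @ p + restart * p0, where W is the
    column-normalized adjacency matrix and p0 is 1 at the seed node.
    """
    adj = np.asarray(adjacency, dtype=float)
    col_sums = adj.sum(axis=0)
    if np.any(col_sums == 0):
        raise ValueError("every node needs at least one edge")
    w = adj / col_sums  # divide each column by its sum

    p0 = np.zeros(adj.shape[0])
    p0[seed] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        nxt = (1.0 - restart) * (w @ p) + restart * p0
        if np.abs(nxt - p).sum() < tol:
            return nxt
        p = nxt
    return p

# Hypothetical toy graph: a path 0 - 1 - 2 - 3.
path_graph = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = random_walk_with_restart(path_graph, seed=0)
```

The resulting scores rank nodes by proximity to the seed, which is why RWR is often used to propagate information across similarity networks.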

miTAR: a hybrid deep learning-based approach for predicting miRNA targets

BMC Bioinformatics - Sat, 27/02/2021 - 5:30am
microRNAs (miRNAs) have been shown to play essential roles in a wide range of biological processes. Many computational methods have been developed to identify targets of miRNAs. However, the majority of these ...
Categories: Bioinformatics Trends

Blinking Statistics and Molecular Counting in direct Stochastic Reconstruction Microscopy (dSTORM)

Bioinformatics Oxford Journals - Sat, 27/02/2021 - 5:30am
Abstract
Motivation: Many recent advancements in single molecule localisation microscopy exploit the stochastic photo-switching of fluorophores to reveal complex cellular structures beyond the classical diffraction limit. However, this same stochasticity makes counting the number of molecules to high precision extremely challenging, preventing key insight into the cellular structures and processes under observation.
Results: Modelling the photo-switching behaviour of a fluorophore as an unobserved continuous time Markov process transitioning between a single fluorescent and multiple dark states, and fully mitigating for missed blinks and false positives, we present a method for computing the exact probability distribution for the number of observed localisations from a single photo-switching fluorophore. This is then extended to provide the probability distribution for the number of localisations in a dSTORM experiment involving an arbitrary number of molecules. We demonstrate that when training data is available to estimate photo-switching rates, the unknown number of molecules can be accurately recovered from the posterior mode of the number of molecules given the number of localisations. Finally, we demonstrate the method on experimental data by quantifying the number of adapter protein Linker for Activation of T cells (LAT) on the cell surface of the T cell immunological synapse.
Availability: Software available at https://github.com/lp1611/mol_count_dstorm.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

CCmed: Cross-condition mediation analysis for identifying replicable trans-associations mediated by cis-gene expression

Bioinformatics Oxford Journals - Sat, 27/02/2021 - 5:30am
Abstract
Motivation: Trans-acting expression quantitative trait loci (eQTLs) collectively explain a substantial proportion of expression variation, yet are challenging to detect and replicate since their effects are often individually weak. A large proportion of genetic effects on distal genes are mediated through cis-gene expression. Cis-association (between SNP and cis-gene) and gene-gene correlation conditional on SNP genotype could establish trans-association (between SNP and trans-gene). Both cis-association and gene-gene conditional correlation have effects shared across relevant tissues and conditions, and trans-associations mediated by cis-gene expression also have effects shared across relevant conditions.
Results: We proposed a Cross-Condition Mediation analysis method (CCmed) for detecting cis-mediated trans-associations with replicable effects in relevant conditions/studies. CCmed integrates cis-association and gene-gene conditional correlation statistics from multiple tissues/studies. Motivated by the bimodal effect-sharing patterns of eQTLs, we proposed two variations of CCmed, CCmedmost and CCmedspec, for detecting cross-tissue and tissue-specific trans-associations, respectively. We analyzed data of 13 brain tissues from the Genotype-Tissue Expression (GTEx) project, and identified trios with cis-mediated trans-associations across brain tissues, many of which showed evidence of trans-association in two replication studies. We also identified trans-genes associated with schizophrenia loci in at least two brain tissues.
Availability and implementation: CCmed software is available at http://github.com/kjgleason/CCmed.
Supplementary information: Supplementary Material is available at Bioinformatics online.
Categories: Bioinformatics Trends

splice-aware RNA-Seq data simulation

Bioinformatics Oxford Journals - Sat, 27/02/2021 - 5:30am
Abstract
Summary: A plethora of tools exist for RNA-Seq data analysis with a focus on alternative splicing (AS). However, appropriate data for their comparative evaluation is missing. The R package ASimulatoR simulates gold standard RNA-Seq datasets with fine-grained control over the distribution of AS events, which allows for evaluating alternative splicing tools, e.g. to study the effect of sequencing depth on the performance of AS event detection.
Availability and implementation: ASimulatoR is freely available at https://github.com/biomedbigdata/ASimulatoR as an R package under GPL-3 license.
Categories: Bioinformatics Trends
