
A robust approach to 3D neuron shape representation for quantification and classification

BMC Bioinformatics - Thu, 28/09/2023 - 5:30am
We consider the problem of finding an accurate representation of neuron shapes, extracting sub-cellular features, and classifying neurons based on neuron shapes. In neuroscience research, the skeleton represen...
Categories: Bioinformatics Trends

An improved rhythmicity analysis method using Gaussian Processes detects cell-density dependent circadian oscillations in stem cells

Bioinformatics Oxford Journals - Thu, 28/09/2023 - 5:30am
Motivation: Detecting oscillations in time series remains a challenging problem even after decades of research. In chronobiology, rhythms (for instance in gene expression, eclosion, egg-laying and feeding) tend to be low amplitude, display large variations amongst replicates, and often exhibit varying peak-to-peak distances (non-stationarity). Most currently available rhythm detection methods are not specifically designed to handle such datasets, and are also limited by their use of p-values in detecting oscillations.
Results: We introduce a new method, ODeGP (Oscillation Detection using Gaussian Processes), which combines Gaussian Process (GP) regression and Bayesian inference to incorporate measurement errors, non-uniformly sampled data, and a recently developed non-stationary kernel to improve detection of oscillations. By using Bayes factors, ODeGP models both the null (non-rhythmic) and the alternative (rhythmic) hypotheses, thus providing an advantage over p-values. Using synthetic datasets we first demonstrate that ODeGP almost always outperforms eight commonly used methods in detecting stationary as well as non-stationary symmetric oscillations. Next, by analyzing existing qPCR datasets we demonstrate that our method is more sensitive compared to the existing methods at detecting weak and noisy oscillations. Finally, we generate new qPCR data on mouse embryonic stem cells. Surprisingly, we discover using ODeGP that increasing cell density results in rapid generation of oscillations in the Bmal1 gene, thus highlighting our method’s ability to discover unexpected and new patterns. In its current implementation, ODeGP is meant only for analyzing single or a few time-trajectories, not genome-wide datasets.
Availability and implementation: ODeGP is available at https://github.com/Shaonlab/ODeGP
Supplementary information: Supplementary data are available at Journal Name online.
Categories: Bioinformatics Trends

Automatic echocardiographic anomalies interpretation using a stacked residual-dense network model

BMC Bioinformatics - Wed, 27/09/2023 - 5:30am
Echocardiographic interpretation during the prenatal or postnatal period is important for diagnosing cardiac septal abnormalities. However, manual interpretation can be time consuming and subject to human erro...
Categories: Bioinformatics Trends

Fuzzy optimization for identifying antiviral targets for treating SARS-CoV-2 infection in the heart

BMC Bioinformatics - Wed, 27/09/2023 - 5:30am
In this paper, a fuzzy hierarchical optimization framework is proposed for identifying potential antiviral targets for treating severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in the hea...
Categories: Bioinformatics Trends

A heterogeneous graph convolutional attention network method for classification of autism spectrum disorder

BMC Bioinformatics - Wed, 27/09/2023 - 5:30am
Autism spectrum disorder (ASD) is a serious developmental disorder of the brain. Recently, various deep learning methods based on functional magnetic resonance imaging (fMRI) data have been developed for the c...
Categories: Bioinformatics Trends

Design of optimal labeling patterns for optical genome mapping via information theory

Bioinformatics Oxford Journals - Wed, 27/09/2023 - 5:30am
Motivation: Optical genome mapping (OGM) is a technique that extracts partial genomic information from optically imaged and linearized DNA fragments containing fluorescently labeled short sequence patterns. This information can be used for various genomic analyses and applications, such as the detection of structural variations and copy-number variations, epigenomic profiling, and microbial species identification. Currently, the choice of labeled patterns is based on the available bio-chemical methods, and is not necessarily optimized for the application.
Results: In this work, we develop a model of OGM based on information theory, which enables the design of optimal labeling patterns for specific applications and target organism genomes. We validated the model through experimental OGM on human DNA and simulations on bacterial DNA. Our model predicts up to 10-fold improved accuracy by optimal choice of labeling patterns, which may guide future development of OGM bio-chemical labeling methods and significantly improve its accuracy and yield for applications such as epigenomic profiling and cultivation-free pathogen identification in clinical samples.
Availability and implementation: https://github.com/yevgenin/PatternCode
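As a rough illustration of what a "labeling pattern" means in the abstract above, the sketch below finds where a short sequence pattern occurs along a DNA string and the spacing between consecutive labels. It is not taken from the PatternCode repository: the function names, the toy sequence, and the patterns are all hypothetical, and the information-theoretic model itself is not reproduced here.

```python
def label_positions(sequence, pattern):
    """Return 0-based start positions of every (possibly overlapping)
    occurrence of `pattern` in `sequence`. Hypothetical helper."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping matches are also found.
        start = sequence.find(pattern, start + 1)
    return positions


def label_gaps(positions):
    """Distances between consecutive labels along the molecule."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy sequence and patterns, chosen only for illustration.
toy_sequence = "TTCGAAACGATTCGA"
dense_labels = label_positions(toy_sequence, "CGA")    # shorter pattern, more labels
sparse_labels = label_positions(toy_sequence, "TCGA")  # longer pattern, fewer labels
dense_gaps = label_gaps(dense_labels)
```

Different patterns give different label densities on the same sequence, which is the kind of design choice the abstract says the model is meant to optimize.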
Categories: Bioinformatics Trends

Balancing Biomass Reaction Stoichiometry and Measured Fluxes in Flux Balance Analysis

Bioinformatics Oxford Journals - Wed, 27/09/2023 - 5:30am
Motivation: Flux Balance Analysis (FBA) is widely recognized as an important method for studying metabolic networks. When incorporating flux measurements of certain reactions into an FBA problem, it is possible that the underlying linear program may become infeasible, for example, due to measurement or modeling inaccuracies. Furthermore, while the biomass reaction is of central importance in FBA models, its stoichiometry is often a rough estimate and a source of high uncertainty.
Results: In this work, we present a method that allows modifications to the biomass reaction stoichiometry as a means to (i) render the FBA problem feasible and (ii) improve the accuracy of the model by corrections in the biomass composition. Optionally, the adjustment of the biomass composition can be used in conjunction with a previously introduced approach for balancing inconsistent fluxes to obtain a feasible FBA system. We demonstrate the value of our approach by analyzing realistic flux measurements of E. coli. In particular, we find that the growth-associated maintenance (GAM) demand of ATP, which is typically integrated in the biomass reaction, is likely overestimated in recent genome-scale models, at least for certain growth conditions. In light of these findings, we discuss issues related to determination and inclusion of GAM values in constraint-based models. Overall, our method can uncover potential errors and suggest adjustments in the assumed biomass composition in FBA models based on inconsistencies between model and measured fluxes.
Availability: The developed method has been implemented in our software tool CNApy, available from github.com/cnapy-org/CNApy.
Supplementary information: Supplementary data can be found at https://github.com/cnapy-org.
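The infeasibility described in the abstract above can be seen in a toy example. FBA assumes steady state, so the stoichiometric matrix S and flux vector v must satisfy S·v = 0; if all fluxes are fixed to measured values that violate this, no feasible solution exists. The sketch below is an assumption-laden illustration, not the CNApy method: the three-reaction network, the measured values, and the function names are hypothetical, and the "correction" is computed by hand rather than by the optimization the paper describes.

```python
from fractions import Fraction


def mass_balance_residual(S, v):
    """Return S·v, one entry per metabolite. All zeros means steady state."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def toy_network(biomass_coeff):
    """Hypothetical network. Rows: metabolites A, B.
    Columns: uptake (-> A), conversion (A -> B), biomass (biomass_coeff B ->)."""
    return [
        [1, -1, 0],
        [0, 1, -biomass_coeff],
    ]


# Hypothetical measured fluxes for uptake, conversion and biomass.
measured = [Fraction(10), Fraction(10), Fraction(9)]

# With the nominal biomass coefficient of 1, metabolite B is not balanced,
# so fixing all three fluxes to the measurements is infeasible.
residual_nominal = mass_balance_residual(toy_network(Fraction(1)), measured)

# Adjusting the biomass coefficient so that B is balanced restores feasibility.
corrected_coeff = measured[1] / measured[2]
residual_corrected = mass_balance_residual(toy_network(corrected_coeff), measured)
```

In this toy case the corrected coefficient is 10/9. A real model has many metabolites and bounds, so the adjustment would be posed as an optimization problem rather than solved by inspection.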
Categories: Bioinformatics Trends

compleasm: a faster and more accurate reimplementation of BUSCO

Bioinformatics Oxford Journals - Wed, 27/09/2023 - 5:30am
Motivation: Evaluating gene completeness is critical to measuring the quality of a genome assembly. An incomplete assembly can lead to errors in gene predictions, annotation, and other downstream analyses. BUSCO is a widely used tool for assessing the completeness of genome assemblies by testing the presence of a set of single-copy orthologs conserved across a wide range of taxa. However, BUSCO is slow, particularly for large genome assemblies, which makes it cumbersome to apply to a large number of assemblies.
Results: Here, we present compleasm, an efficient tool for assessing the completeness of genome assemblies. Compleasm utilizes the miniprot protein-to-genome aligner and the conserved orthologous genes from BUSCO. It is 14 times faster than BUSCO for human assemblies and reports a more accurate completeness of 99.6%, compared with BUSCO’s 95.7%; the compleasm figure is in close agreement with the annotation completeness of 99.5% for T2T-CHM13.
Availability: https://github.com/huangnengCSU/compleasm
Supplementary information: Supplementary data are available at Bioinformatics online.
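For readers unfamiliar with how a completeness percentage of this kind is formed, the sketch below turns ortholog counts into percentages, treating an ortholog as complete when it is found as a single copy or duplicated. This is a minimal illustration under that assumption, not code from compleasm or BUSCO; the function name and the counts are hypothetical.

```python
def completeness_summary(single, duplicated, fragmented, missing):
    """Summarize ortholog search results as percentages of the total.

    Assumes 'complete' means single-copy plus duplicated orthologs.
    """
    total = single + duplicated + fragmented + missing
    if total == 0:
        raise ValueError("at least one ortholog is required")

    def pct(count):
        return round(100.0 * count / total, 1)

    return {
        "complete": pct(single + duplicated),
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
        "n": total,
    }


# Hypothetical counts, not taken from the paper.
summary = completeness_summary(single=900, duplicated=50, fragmented=30, missing=20)
```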
Categories: Bioinformatics Trends

Inferring circadian gene regulatory relationships from gene expression data with a hybrid framework

BMC Bioinformatics - Tue, 26/09/2023 - 5:30am
The central biological clock governs numerous facets of mammalian physiology, including sleep, metabolism, and immune system regulation. Understanding gene regulatory relationships is crucial for unravelling t...
Categories: Bioinformatics Trends

Extending protein interaction networks using proteoforms and small molecules

Bioinformatics Oxford Journals - Tue, 26/09/2023 - 5:30am
Motivation: Biological network analysis for high-throughput biomedical data interpretation relies heavily on topological characteristics. Networks are commonly composed of nodes representing genes or proteins that are connected by edges when interacting. In this study, we use the rich information available in the Reactome pathway database to build biological networks accounting for small molecules and proteoforms modeled using protein isoforms and post-translational modifications to study the topological changes induced by this refinement of the network representation.
Results: We find that improving the interactome modeling increases the number of nodes and interactions, but that isoform and post-translational modification annotation is still limited compared to what can be expected biologically. We also note that small molecule information can distort the topology of the network due to the high connectedness of these molecules, which does not necessarily represent the reality of biology. However, by restricting the connections of small molecules to the context of biochemical reactions, we find that these improve the overall connectedness of the network and reduce the prevalence of isolated components and nodes. Overall, changing the representation of the network alters the prevalence of articulation points and bridges globally but also within and across pathways. Hence, some molecules can gain or lose in biological importance depending on the level of detail of the representation of the biological system, which might in turn impact network-based studies of diseases or druggability.
Availability: Networks are constructed based on data publicly available in the Reactome Pathway knowledgebase: reactome.org
Supplementary information: The networks produced by this study are available at the public repository: github.com/PathwayAnalysisPlatform/Networks.
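Articulation points (nodes whose removal disconnects the network) and bridges (edges whose removal does so) can be found with a single depth-first search. The sketch below uses the classic low-link approach on a toy graph to show how adding one highly connected small molecule can remove every articulation point and bridge. It is an illustration only: the node names and edges are hypothetical, the code is not from the study's repository, and the recursive search is suitable for small graphs only.

```python
from collections import defaultdict


def articulation_points_and_bridges(edges):
    """Return (articulation_points, bridges) of an undirected simple graph."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    disc, low = {}, {}           # discovery order and low-link value per node
    cut_nodes, bridges = set(), set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        disc[node] = low[node] = counter
        counter += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: node can reach an earlier-discovered node.
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                if parent is not None and low[nxt] >= disc[node]:
                    cut_nodes.add(node)
                if low[nxt] > disc[node]:
                    bridges.add(frozenset((node, nxt)))
        # A DFS root is an articulation point only if it has 2+ DFS children.
        if parent is None and children > 1:
            cut_nodes.add(node)

    for node in list(graph):
        if node not in disc:
            dfs(node, None)
    return cut_nodes, bridges


# Hypothetical protein-only chain: P1 - P2 - P3 - P4.
protein_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
cuts_chain, bridges_chain = articulation_points_and_bridges(protein_edges)

# Same chain plus a hypothetical small molecule connected to every protein.
hub_edges = protein_edges + [("ATP", p) for p in ("P1", "P2", "P3", "P4")]
cuts_hub, bridges_hub = articulation_points_and_bridges(hub_edges)
```

In the chain, P2 and P3 are articulation points and all three edges are bridges; with the hub added, none remain, which mirrors the abstract's point that highly connected small molecules can change the apparent topology.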
Categories: Bioinformatics Trends

ScribbleDom: Using Scribble-Annotated Histology Images to Identify Domains in Spatial Transcriptomics Data

Bioinformatics Oxford Journals - Tue, 26/09/2023 - 5:30am
Motivation: Spatial domain identification is a very important problem in the field of Spatial Transcriptomics (ST). The state-of-the-art solutions to this problem focus on unsupervised methods, as there is a lack of data for a supervised learning formulation. The results obtained from these methods highlight significant opportunities for improvement.
Results: In this paper, we propose a potential avenue for enhancement through the development of a semi-supervised convolutional neural network (CNN) based approach. Named ScribbleDom, our method leverages human experts’ input as a form of semi-supervision, thereby seamlessly combining the cognitive abilities of human experts with the computational power of machines. ScribbleDom incorporates a loss function that integrates two crucial components: similarity in gene expression profiles and adherence to the valuable input of a human annotator through scribbles on histology images, providing prior knowledge about spot labels. The spatial continuity of the tissue domains is taken into account by extracting information on the spot micro-environment through convolution filters of varying sizes, in the form of Inception blocks. By leveraging this semi-supervised approach, ScribbleDom significantly improves the quality of spatial domains, yielding superior results both quantitatively and qualitatively. Our experiments on several benchmark datasets demonstrate the clear edge of ScribbleDom over state-of-the-art methods: improvements of between 1.82% and 169.38% in Adjusted Rand Index (ARI) for 9 of the 12 Human DLPFC samples, and a 15.54% improvement on the Melanoma cancer dataset. Notably, when the expert input is absent, ScribbleDom can still operate in a fully unsupervised manner, like the state-of-the-art methods, and produces results that remain competitive.
Availability: Source code is available at GitHub (https://github.com/1alnoman/ScribbleDom) and Zenodo (https://zenodo.org/badge/latestdoi/681572669).
Supplementary information: Supplementary data are available at Bioinformatics online.
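The Adjusted Rand Index used for the comparisons above measures agreement between two labelings of the same spots, corrected for chance: 1 means identical partitions (up to renaming of labels) and values near 0 mean chance-level agreement. The sketch below computes it from pair counts using only the standard library. It is an illustration of the metric, not code from ScribbleDom, and the label lists are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree

    # Pairs of items placed together in both labelings, in the reference
    # labeling only, and in the predicted labeling only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Hypothetical spot labels.
reference = [0, 0, 1, 1]
same_partition_renamed = [1, 1, 0, 0]
crossed_partition = [0, 1, 0, 1]

ari_same = adjusted_rand_index(reference, same_partition_renamed)
ari_crossed = adjusted_rand_index(reference, crossed_partition)
```

Note that a percentage improvement in ARI, as reported in the abstract, is relative to the competing method's ARI, so large percentages can arise when the baseline value is small.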
Categories: Bioinformatics Trends

Simulating structurally variable Nuclear Pore Complexes for Microscopy

Bioinformatics Oxford Journals - Tue, 26/09/2023 - 5:30am
Motivation: The Nuclear Pore Complex (NPC) is the only passageway for macromolecules between nucleus and cytoplasm, and an important reference standard in microscopy: it is massive and stereotypically arranged. The average architecture of NPC proteins has been resolved with pseudo-atomic precision; however, observed NPC heterogeneities evidence a high degree of divergence from this average. Single Molecule Localization Microscopy (SMLM) images NPCs at protein-level resolution, whereupon image analysis software studies NPC variability. However, the true picture of this variability is unknown. In quantitative image analysis experiments, it is thus difficult to distinguish intrinsically high SMLM noise from variability of the underlying structure.
Results: We introduce CIR4MICS ("ceramics", Configurable, Irregular Rings FOR MICroscopy Simulations), a pipeline that synthesizes ground truth datasets of structurally variable NPCs based on architectural models of the true NPC. Users can select one or more N- or C-terminally tagged NPC proteins, and simulate a wide range of geometric variations. We also represent the NPC as a spring-model such that arbitrary deforming forces, of user-defined magnitudes, simulate irregularly shaped variations. Further, we provide annotated reference datasets of simulated human NPCs, which facilitate a side-by-side comparison with real data. To demonstrate, we synthetically replicate a geometric analysis of real NPC radii and reveal that a range of simulated variability parameters can lead to observed results. Our simulator is therefore valuable to test the capabilities of image analysis methods, as well as to inform experimentalists about the requirements of hypothesis-driven imaging studies.
Availability: Code: https://github.com/uhlmanngroup/cir4mics. Simulated data: BioStudies S-BSST1058.
Supplementary information: Supplementary data are available at
Categories: Bioinformatics Trends

A cell-level discriminative neural network model for diagnosis of blood cancers

Bioinformatics Oxford Journals - Tue, 26/09/2023 - 5:30am
Motivation: Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes.
Results: We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which takes sample-level training data and predicts the diagnosis of the testing samples and the identity of the diagnostic cells in the sample, simultaneously. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations.
Availability: The source code of CSNN and datasets used in the experiments are publicly available on GitHub (http://github.com/erobl/csnn). Raw FCS files can be downloaded from FlowRepository (ID: FR-FCM-Z6YK).
Supplementary information: Supplementary data are available on GitHub and at Bioinformatics online.
Categories: Bioinformatics Trends

CrMP-Sol database: classification, bioinformatic analyses and comparison of cancer-related membrane proteins and their water-soluble variant designs

BMC Bioinformatics - Mon, 25/09/2023 - 5:30am
Membrane proteins are critical mediators for tumor progression and present enormous therapeutic potentials. Although gene profiling can identify their cancer-specific signatures, systematic correlations betwee...
Categories: Bioinformatics Trends

Fast and sensitive validation of fusion transcripts in whole-genome sequencing data

BMC Bioinformatics - Sat, 23/09/2023 - 5:30am
In cancer, genomic rearrangements can create fusion genes that either combine protein-coding sequences from two different partner genes or place one gene under the control of the promoter of another gene. Thes...
Categories: Bioinformatics Trends

DeepCCI: a deep learning framework for identifying cell-cell interactions from single-cell RNA sequencing data

Bioinformatics Oxford Journals - Sat, 23/09/2023 - 5:30am
Motivation: Cell-cell interactions (CCIs) play critical roles in many biological processes such as cellular differentiation, tissue homeostasis and immune response. With the rapid development of high throughput single-cell RNA sequencing (scRNA-seq) technologies, it is of high importance to identify CCIs from the ever-increasing scRNA-seq data. However, limited by algorithmic constraints, current computational methods based on statistical strategies ignore some key latent information contained in scRNA-seq data with high sparsity and heterogeneity.
Results: Here, we developed a deep learning framework named DeepCCI to identify meaningful CCIs from scRNA-seq data. Applications of DeepCCI to a wide range of publicly available datasets from diverse technologies and platforms demonstrate its ability to predict significant CCIs accurately and effectively. Powered by flexible and easy-to-use software, DeepCCI can provide a one-stop solution to discover meaningful intercellular interactions and build CCI networks from scRNA-seq data.
Availability: The source code of DeepCCI is available online at https://github.com/JiangBioLab/DeepCCI.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

MuDCoD: Multi-Subject Community Detection in Personalized Dynamic Gene Networks from Single Cell RNA Sequencing

Bioinformatics Oxford Journals - Sat, 23/09/2023 - 5:30am
Motivation: With the wide availability of single-cell RNA-seq (scRNA-seq) technology, population-scale scRNA-seq datasets across multiple individuals and time points are emerging. While the initial investigations of these datasets tend to focus on standard analysis of clustering and differential expression, leveraging the power of scRNA-seq data at the personalized dynamic gene co-expression network level has the potential to unlock subject and/or time-specific network-level variation, which is critical for understanding phenotypic differences. Community detection from co-expression networks of multiple time points or conditions has been well-studied; however, none of the existing settings included networks from multiple subjects and multiple time points simultaneously. To address this, we develop MuDCoD for multi-subject community detection in personalized dynamic gene networks from scRNA-seq. MuDCoD builds on the spectral clustering framework and promotes information sharing among the networks of the subjects as well as networks at different time points. It clusters genes in the personalized dynamic gene networks and reveals gene communities that are variable or shared not only across time but also among subjects.
Results: Evaluation and benchmarking of MuDCoD against existing approaches reveal that MuDCoD effectively leverages apparent shared signals among networks of the subjects at individual time points, and performs robustly when there is no or little information sharing among the networks. Applications to population-scale scRNA-seq datasets of human-induced pluripotent stem cells during dopaminergic neuron differentiation and CD4+ T cell activation indicate that MuDCoD enables robust inference for identifying time-varying personalized gene modules. Our results illustrate how personalized dynamic community detection can aid in the exploration of subject-specific biological processes that vary across time.
Availability: MuDCoD is publicly available at https://github.com/bo1929/MuDCoD as a Python package. Implementation includes simulation and real-data experiments together with extensive documentation.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Designing and development of multi-epitope chimeric vaccine against Helicobacter pylori by exploring its entire immunogenic epitopes: an immunoinformatic approach

BMC Bioinformatics - Fri, 22/09/2023 - 5:30am
Helicobacter pylori is a prominent causative agent of gastric ulceration, gastric adenocarcinoma and gastric lymphoma and has been categorised as a group 1 carcinogen by WHO. The treatment of H. pylori with prot...
Categories: Bioinformatics Trends

Identification of plant vacuole proteins by using graph neural network and contact maps

BMC Bioinformatics - Fri, 22/09/2023 - 5:30am
Plant vacuoles are essential organelles in the growth and development of plants, and accurate identification of their proteins is crucial for understanding their biological properties. In this study, we develo...
Categories: Bioinformatics Trends

GOAT: Gene-level biomarker discovery from multi-Omics data using graph ATtention neural network for eosinophilic asthma subtype

Bioinformatics Oxford Journals - Fri, 22/09/2023 - 5:30am
Motivation: Asthma is a heterogeneous disease for which various subtypes have been established, but molecular biomarkers of the subtypes are yet to be discovered. The recent availability of multi-omics data has paved the way to discovering molecular biomarkers for the subtypes. However, multi-omics biomarker discovery is challenging because of the complex interplay between different omics layers.
Results: We propose a deep attention model named Gene-level biomarker discovery from multi-Omics data using graph ATtention neural network (GOAT) for identifying molecular biomarkers for eosinophilic asthma (EA) subtypes with multi-omics data. GOAT identifies genes that discriminate subtypes using a graph neural network by modeling complex interactions among genes as the attention mechanism in the deep learning model. In experiments with multi-omics profiles of the COREA asthma cohort of 300 patients, GOAT outperforms existing models and suggests interpretable biological mechanisms underlying asthma subtypes. Importantly, GOAT identified genes that are distinct only in terms of relationship with other genes through attention. To better understand the role of biomarkers, we further investigated two transcription factors (TFs), CTNNB1 and JUN, captured by GOAT. Network propagation and transcriptional network analysis showed the role of these TFs in EA pathophysiology, even though they were not distinct in terms of gene expression level differences.
Availability: https://github.com/DabinJeong/Multi-omics_biomarker.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

