Subscribe to Bioinformatics Oxford Journals feed

Block Aligner: an adaptive SIMD-accelerated aligner for sequences and position-specific scoring matrices

Thu, 03/08/2023 - 5:30am
Abstract
Motivation: Efficiently aligning sequences is a fundamental problem in bioinformatics. Many recent algorithms for computing alignments through Smith-Waterman-Gotoh dynamic programming exploit Single Instruction Multiple Data (SIMD) operations on modern CPUs for speed. However, these advances have largely ignored the difficulty of efficiently handling complex scoring matrices or large gaps (insertions or deletions).
Results: We propose a new SIMD-accelerated algorithm called Block Aligner for aligning nucleotide and protein sequences against other sequences or position-specific scoring matrices. We introduce a new paradigm that uses blocks in the dynamic programming matrix that greedily shift, grow, and shrink. This approach allows regions of the dynamic programming matrix to be computed adaptively. Our algorithm achieves speedups of over 5-10 times compared with some previous methods while incurring an error rate of less than 3% on protein and long-read datasets, despite large gaps and low sequence identities.
Availability: Our algorithm is implemented for global, local, and X-drop alignments. It is available as a Rust library (with C bindings) at https://github.com/Daniel-Liu-c0deb0t/block-aligner.
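For readers new to the recurrence that Block Aligner accelerates, here is a minimal scalar sketch of Smith-Waterman-Gotoh local alignment with affine gaps. It fills the whole matrix cell by cell and has none of Block Aligner's SIMD vectorization or adaptive blocks. The function name and the scoring values (match 2, mismatch -3, gap open 5, gap extend 1) are illustrative assumptions, not Block Aligner's API or defaults.

```python
NEG_INF = float("-inf")

def sw_gotoh(a: str, b: str, match: int = 2, mismatch: int = -3,
             gap_open: int = 5, gap_extend: int = 1) -> int:
    """Best local alignment score with affine gaps.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    m = len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending with a gap in `a` (horizontal move).
    # F: best score ending with a gap in `b` (vertical move).
    # Only the previous row of H and F is needed, so memory is O(len(b)).
    prev_h = [0] * (m + 1)
    prev_f = [NEG_INF] * (m + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * (m + 1)
        cur_f = [NEG_INF] * (m + 1)
        e = NEG_INF
        for j in range(1, m + 1):
            e = max(cur_h[j - 1] - gap_open, e - gap_extend)
            cur_f[j] = max(prev_h[j] - gap_open, prev_f[j] - gap_extend)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at 0 (start a new alignment).
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best
```

For example, `sw_gotoh("AAAACCCC", "AAAAGGCCCC")` bridges the `GG` insertion with a single length-2 gap (8 matches x 2 minus a gap cost of 6). This full-matrix computation is exactly the work that adaptive block schemes try to avoid when most cells cannot contribute to the optimum.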
Categories: Bioinformatics Trends

NCOurd: Modelling length distributions of NCO events and gene conversion tracts

Thu, 03/08/2023 - 5:30am
Abstract
Motivation: Meiotic recombination, along with mutation, is the main driving force of human genetic diversity. Recombination events fall into crossovers, which separate large chromosomal regions originating from different homologous chromosomes, and non-crossovers (NCOs), in which a small segment from one chromosome is embedded in a region originating from the homologous chromosome. NCOs are much less studied than mutations and crossovers because they are short and can only be detected at markers that are heterozygous in the transmitting parent, which leaves most of them undetectable.
Results: The detectable NCOs, known as gene conversions, carry hidden information about NCOs, including their number and length. We introduce NCOurd, an algorithm and accompanying software based on expectation maximisation, to estimate the number of NCOs and their length distribution from gene conversion data.
Availability: https://github.com/DecodeGenetics/NCOurd
Supplementary information: Supplementary data are available at Bioinformatics online.
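A small calculation helps explain why most NCOs go undetected. The sketch below is a simplified model assumed for illustration and is not NCOurd's EM model. It assumes informative (heterozygous) markers fall as a Poisson process with density rho per bp. Under that assumption, a tract of length L is detectable with probability 1 - exp(-rho L). If tract lengths are instead exponential with mean mu, averaging gives rho mu / (1 + rho mu). The function names and the example values are hypothetical.

```python
import math

def p_detect_fixed(length_bp: float, het_density: float) -> float:
    """P(tract of fixed length covers >= 1 informative marker),
    assuming markers form a Poisson process with het_density per bp."""
    return 1.0 - math.exp(-het_density * length_bp)

def p_detect_exponential(mean_length_bp: float, het_density: float) -> float:
    """Same probability averaged over exponentially distributed tract
    lengths with the given mean: E[1 - exp(-rho L)] = rho*mu / (1 + rho*mu)."""
    x = het_density * mean_length_bp
    return x / (1.0 + x)
```

With an illustrative one informative marker per 1,000 bp and a mean tract length of 100 bp, `p_detect_exponential(100, 1e-3)` is about 0.09. Roughly nine in ten tracts would leave no trace in such a model, which is why the number and length distribution of NCOs must be inferred rather than counted.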
Categories: Bioinformatics Trends

KOunt—A reproducible KEGG orthologue abundance workflow

Thu, 03/08/2023 - 5:30am
Abstract
Summary: Accurate gene prediction is essential for successful metagenome analysis. We present KOunt, a Snakemake pipeline that precisely quantifies KEGG orthologue abundance.
Availability and implementation: KOunt is available on GitHub: https://github.com/WatsonLab/KOunt. The KOunt reference database is available on figshare: https://doi.org/10.6084/m9.figshare.21269715. Test data are available at https://doi.org/10.6084/m9.figshare.22250152 and version 1.2.0 of KOunt at https://doi.org/10.6084/m9.figshare.23607834.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

XGDAG: eXplainable Gene–Disease Associations via Graph Neural Networks

Wed, 02/08/2023 - 5:30am
Abstract
Motivation: Disease gene prioritization consists of identifying genes that are likely to be involved in the mechanisms of a given disease and ranking them. Computational methods for uncovering unknown gene–disease associations range from combinatorial to machine learning-based approaches, and in recent years deep learning approaches have outperformed more traditional ones. However, these models are inherently black boxes, which prevents interpretability.
Results: We propose a new methodology for disease gene discovery that applies graph neural networks (GNNs) to graph-structured data, together with an explainability phase that determines the ranking of candidate genes and helps interpret the model’s output. Our approach is based on a positive–unlabeled learning strategy and outperforms existing gene discovery methods by exploiting GNNs in a non-black-box fashion. It remains effective even when a large number of associated genes must be retrieved, a scenario in which gene prioritization methods often lose reliability.
Availability: The source code of XGDAG is available on GitHub at https://github.com/GiDeCarlo/XGDAG
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

PCGAN: A Generative Approach for Protein Complex Identification from Protein Interaction Networks

Wed, 02/08/2023 - 5:30am
Abstract
Motivation: Protein complexes are groups of polypeptide chains linked by noncovalent protein-protein interactions (PPIs); they play important roles in biological systems and perform numerous functions, including DNA transcription, mRNA translation, and signal transduction. In the past decade, a number of computational methods have been developed to identify protein complexes from protein interaction networks (PINs) by mining dense subnetworks or subgraphs.
Results: Unlike existing work, we propose an approach to this task based on generative adversarial networks (GANs), called PCGAN (identifying Protein Complexes by GAN). Using some real complexes as training samples, our method learns a model that generates new complexes from a PIN. To support model training and testing, we construct two more comprehensive and reliable PINs and a larger gold-standard complex set by merging existing ones for the same organism (human and yeast). Extensive comparison studies indicate that our method outperforms existing protein complex identification methods on various performance metrics. Furthermore, functional enrichment analysis shows that the identified complexes are of high biological significance, indicating that the generated complexes are likely to be real complexes.
Availability: https://github.com/yul-pan/PCGAN
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Short-read aligner performance in germline variant identification

Tue, 01/08/2023 - 5:30am
Abstract
Motivation: Read alignment is an essential first step in the characterization of DNA sequence variation. The accuracy of variant calling results depends not only on the quality of read alignment and variant calling software but also on the interaction between these complex software tools.
Results: In this review, we evaluate short-read aligner performance with the goal of optimizing germline variant calling accuracy. We examine the performance of three general-purpose short-read aligners (BWA-MEM, Bowtie 2, and Arioc) in conjunction with three germline variant callers: DeepVariant, FreeBayes, and GATK HaplotypeCaller. We discuss the behavior of the read aligners with regard to the data elements on which the variant callers rely, and illustrate how the runtime configurations of these software tools combine to affect variant calling performance.
Supplementary information: Supplementary information is available at Bioinformatics online.
Categories: Bioinformatics Trends

demuxmix: Demultiplexing oligonucleotide-barcoded single-cell RNA sequencing data with regression mixture models

Tue, 01/08/2023 - 5:30am
Abstract
Motivation: Droplet-based single-cell RNA sequencing (scRNA-seq) is widely used in biomedical research for interrogating the transcriptomes of single cells on a large scale. Pooling and processing cells from different samples together can reduce costs and batch effects. To pool cells, they are often first labeled with hashtag oligonucleotides (HTOs). These HTOs are sequenced alongside the cells’ RNA in the droplets and subsequently used to computationally assign each droplet to its sample of origin, a process referred to as demultiplexing. Accurate demultiplexing is crucial but can be challenging due to background HTOs, low-quality cells/cell debris, and multiplets.
Results: A new demultiplexing method based on negative binomial regression mixture models is introduced. The method, called demuxmix, implements two significant improvements. First, demuxmix’s probabilistic classification framework provides error probabilities for droplet assignments that can be used to discard uncertain droplets and to inform about the quality of the HTO data and the success of the demultiplexing process. Second, demuxmix uses the positive association between the number of detected genes in the RNA library and HTO counts to explain part of the variance in the HTO data, resulting in improved droplet assignments. The improved performance of demuxmix compared to existing demultiplexing methods is assessed using real and simulated data. Finally, the feasibility of accurately demultiplexing experimental designs in which non-labeled cells are pooled with labeled cells is demonstrated.
Availability: R/Bioconductor package demuxmix (https://doi.org/doi:10.18129/B9.bioc.demuxmix)
Supplementary information: Supplementary data are available at Bioinformatics online.
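The error probabilities that demuxmix reports come from a mixture model. The sketch below shows how a two-component negative binomial mixture turns one HTO count into a posterior probability that the droplet carries the tag. It is a simplified sketch, not demuxmix's implementation, and it makes two assumptions. First, each component has a fixed mean and size; demuxmix instead models the mean as a regression on the number of detected genes. Second, the parameter values and priors below are hypothetical, whereas in practice they would be fitted, for example by EM.

```python
import math

def nb_logpmf(k: int, mean: float, size: float) -> float:
    """Log PMF of a negative binomial parameterized by mean and size,
    so that variance = mean + mean**2 / size."""
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log1p(-p))

def posterior_positive(k, neg=(20.0, 5.0), pos=(400.0, 5.0), prior_pos=0.5):
    """Posterior probability that a droplet with HTO count k is tagged,
    under a two-component NB mixture with fixed (mean, size) per component."""
    log_neg = math.log(1.0 - prior_pos) + nb_logpmf(k, *neg)
    log_pos = math.log(prior_pos) + nb_logpmf(k, *pos)
    top = max(log_neg, log_pos)  # log-sum-exp trick for numerical stability
    w_neg, w_pos = math.exp(log_neg - top), math.exp(log_pos - top)
    return w_pos / (w_neg + w_pos)
```

A droplet with count 5 gets a posterior near 0 and one with count 500 a posterior near 1. Counts between the two component means yield intermediate probabilities, which is what lets uncertain droplets be flagged and discarded rather than forced into a class.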
Categories: Bioinformatics Trends

USNAP: Fast unique dense region detection and its application to lung cancer

Tue, 01/08/2023 - 5:30am
Abstract
Motivation: Many real-world problems can be modeled as annotated graphs. Scalable graph algorithms that extract actionable information from such data are in demand, since these graphs are large, vary in topology, and have diverse node/edge annotations. Graphs that change over time form dynamic graphs, which open the possibility of finding patterns across different time points. In this paper, we introduce a scalable algorithm that finds unique dense regions across time points in dynamic graphs. Such algorithms have applications in many areas, including the biological, financial, and social domains.
Results: This manuscript makes three main contributions. First, we designed a scalable algorithm, USNAP, to effectively identify dense subgraphs that are unique to a time stamp in a dynamic graph. Importantly, USNAP provides a lower bound on the density measure at each step of its greedy algorithm. Second, insights obtained from validating USNAP on real data show its effectiveness. Although USNAP is domain independent, we applied it to four non-small cell lung cancer (NSCLC) gene expression datasets, modeling the stages of NSCLC as a dynamic graph that is input to USNAP. Pathway enrichment analyses and comprehensive interpretations from the literature show that USNAP identified biologically relevant mechanisms for different stages of cancer progression. Third, USNAP is scalable, with a time complexity of $O(m + m_c \log n_c + n_c \log n_c)$, where $m$ and $n$ are the numbers of edges and vertices in the dynamic graph, and $m_c$ and $n_c$ are the numbers of edges and vertices in the collapsed graph.
Availability: The code of USNAP is available at https://www.cs.utoronto.ca/~juris/data/USNAP22.
Supplementary information: Supplementary data are available at Bioinformatics online.
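As background for greedy dense-subgraph search, here is a sketch of the classic greedy "peeling" heuristic for the densest subgraph on a single static graph. It uses density |E(S)| / |S| and a heap. It does not reproduce USNAP, which searches for regions unique to one time stamp of a dynamic graph and whose density measure and lower bounds may differ. The function name and edge-list input format are assumptions.

```python
import heapq
from collections import defaultdict

def greedy_peel(edges):
    """Greedy peeling for the densest subgraph (density |E(S)| / |S|).

    Repeatedly removes a minimum-degree vertex and returns the densest
    intermediate vertex set; this classic heuristic is a 2-approximation.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    alive = set(adj)
    degree = {v: len(adj[v]) for v in alive}
    n_edges = sum(degree.values()) // 2
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    best_density = n_edges / len(alive) if alive else 0.0
    removed, best_cut = [], 0  # best_cut = removals before the best set
    while heap:
        d, v = heapq.heappop(heap)
        if v not in alive or d != degree[v]:
            continue  # stale heap entry (lazy deletion)
        alive.remove(v)
        removed.append(v)
        n_edges -= degree[v]
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
        if alive and n_edges / len(alive) > best_density:
            best_density, best_cut = n_edges / len(alive), len(removed)
    return set(adj) - set(removed[:best_cut]), best_density
```

On a 4-clique with a two-edge tail attached, peeling strips the tail first and returns the clique with density 6/4 = 1.5. The heap keeps each removal at logarithmic cost, the same kind of log factor that appears in greedy dense-region algorithms generally.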
Categories: Bioinformatics Trends

Investigating the human and non-obese diabetic mouse MHC class II immunopeptidome using protein language modelling

Tue, 01/08/2023 - 5:30am
Abstract
Motivation: Identifying peptides associated with the major histocompatibility complex class II (MHCII) is a central task in evaluating the immunoregulatory function of therapeutics and drug prototypes. MHCII-peptide presentation prediction has multiple biopharmaceutical applications, including the in silico safety assessment of biologics and engineered derivatives, and the fast progression of antigen-specific immunomodulatory drug discovery programs in immune disease and cancer. This has resulted in the collection of large-scale datasets on adaptive immune receptor antigenic responses and MHC-associated peptide proteomics. In parallel, recent deep learning advances in protein language modelling (PLM) have shown potential to leverage large collections of sequence data and improve MHC presentation prediction.
Results: Here, we train a compact transformer model (AEGIS) on human and mouse MHCII immunopeptidome data, including a preclinical murine model, and evaluate its performance on the peptide presentation prediction task. We show that the transformer performs on par with existing deep learning algorithms and that combining datasets from multiple organisms increases model performance. We trained variants of the model with and without MHCII information. In both variants, including peptides presented by the I-Ag7 MHC class II molecule expressed by non-obese diabetic (NOD) mice enabled, for the first time, accurate in silico prediction of presented peptides in a preclinical type 1 diabetes model organism, which has promising therapeutic applications.
Availability: The source code is available at https://github.com/Novartis/AEGIS.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

ggpicrust2: an R package for PICRUSt2 predicted functional profile analysis and visualization

Tue, 01/08/2023 - 5:30am
Abstract
Summary: Microbiome research is now moving beyond the compositional analysis of microbial taxa in a sample. Increasing evidence from large human microbiome studies suggests that the functional consequences of changes in the intestinal microbiome may provide more power for studying their impact on inflammation and immune responses. Although 16S rRNA analysis is one of the most popular and cost-effective methods for profiling microbial composition, marker-gene sequencing cannot provide direct information about the functional genes present in the genomes of community members. Bioinformatic tools have been developed to predict microbiome function from 16S rRNA gene data. Among them, PICRUSt2 has become one of the most popular functional profile prediction tools; it generates community-wide pathway abundances. However, no state-of-the-art inference tools are available to test differences in pathway abundances between comparison groups. We have developed ggpicrust2, an R package for analyzing functional profiles derived from 16S rRNA sequencing. It enables researchers to conduct extensive differential abundance (DA) analyses and to generate visualizations that effectively highlight functional signals. With ggpicrust2, users can obtain publishable results and gain deeper insights into the functional composition of their microbial communities.
Availability and implementation: The package is open source (license: MIT + file LICENSE) and is available on CRAN and at https://github.com/cafferychen777/ggpicrust2. Its Shiny web application is available at https://a95dps-caffery-chen.shinyapps.io/ggpicrust2_shiny/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

MIX-TPI: A flexible prediction framework for TCR-pMHC interactions based on multimodal representations

Tue, 01/08/2023 - 5:30am
Abstract
Motivation: Interactions between T-cell receptors (TCRs) and peptide-major histocompatibility complexes (pMHCs) are essential for the adaptive immune system. However, identifying these interactions is challenging due to the limited availability of experimental data, the heterogeneity of sequence data, and high experimental validation costs.
Results: To address this issue, we develop a computational framework, named MIX-TPI, to predict TCR-pMHC interactions using amino acid sequences and physicochemical properties. Based on convolutional neural networks, MIX-TPI incorporates sequence-based and physicochemical-based extractors to refine the representations of TCR-pMHC interactions. Each modality is projected into modality-invariant and modality-specific representations to capture the commonalities and differences between features. A self-attention fusion layer is then used to form the classification module. Experimental results demonstrate the effectiveness of MIX-TPI in comparison with other state-of-the-art methods. MIX-TPI also generalizes well on mutually exclusive evaluation datasets and on a paired TCR dataset.
Availability and implementation: The source code of MIX-TPI and the test data are available at https://github.com/Wolverinerine/MIX-TPI.
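To make "self-attention fusion" concrete, here is a generic single-head scaled dot-product self-attention over a small set of representation vectors, for example one per modality-invariant or modality-specific representation. This is a sketch under assumptions, not MIX-TPI's layer. The single head, the random projection weights, and the mean pooling of attended tokens into one fused vector are all illustrative choices.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fusion(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over representation tokens.

    tokens: (n_tokens, d_model), one row per representation to be fused.
    Returns the fused vector (mean of attended tokens, an assumed pooling
    choice) and the (n_tokens, n_tokens) attention weights.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    attended = weights @ v
    return attended.mean(axis=0), weights
```

Each row of `weights` is a probability distribution describing how much one representation draws on the others. This is how an attention-based fusion can weigh sequence-derived against physicochemical features per example instead of using a fixed concatenation.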
Categories: Bioinformatics Trends

AlphaPeptStats: an open-source Python package for automated and scalable statistical analysis of mass spectrometry-based proteomics

Tue, 01/08/2023 - 5:30am
Abstract
Summary: The widespread application of mass spectrometry (MS)-based proteomics in biomedical research increasingly requires robust, transparent, and streamlined solutions to extract statistically reliable insights. We have designed and implemented AlphaPeptStats, a comprehensive Python package with broad functionality for the normalization, imputation, visualization, and statistical analysis of label-free proteomics data. It builds modularly on the established stack of Python scientific libraries and is accompanied by a rigorous testing framework with 98% test coverage. It imports the output of a range of popular search engines. Data can be filtered and normalized according to user specifications. At its heart, AlphaPeptStats provides a wide range of robust statistical algorithms, such as t-tests, ANOVA, PCA, hierarchical clustering, and multiple covariate analysis, all in an automatable manner. Visualization capabilities include heat maps, volcano plots, and scatter plots in publication-ready format. AlphaPeptStats enables researchers to explore complex proteomic datasets manually or automatically to identify interesting patterns and outliers.
Availability: AlphaPeptStats is implemented in Python and is part of the AlphaPept framework. It is released under a permissive Apache license. The source code and one-click installers are freely available on GitHub at https://github.com/MannLabs/alphapeptstats.
Supplementary information: Supplementary data are available at Bioinformatics online.
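The statistics behind a volcano plot can be sketched in a few lines of NumPy. The sketch log2-transforms intensities and median-normalizes each sample. It then computes, per protein, the log2 fold change and Welch's t statistic with Welch-Satterthwaite degrees of freedom. This is an illustrative sketch, not AlphaPeptStats' API. The function name, the median normalization choice, and the assumption of complete data (no missing values, so no imputation) are hypothetical. A two-sided p-value could then be obtained from the t distribution, for example as `2 * scipy.stats.t.sf(abs(t), dof)`.

```python
import numpy as np

def welch_volcano(intensities, group_a, group_b):
    """Per-protein log2 fold change (b vs a) and Welch t statistic.

    intensities: (n_proteins, n_samples) positive label-free intensities
    without missing values; group_a / group_b: column indices per condition.
    Returns (log2_fc, t_stat, dof), one entry per protein.
    """
    log_x = np.log2(np.asarray(intensities, dtype=float))
    # Median normalization: shift each sample so its median equals the global median.
    log_x = log_x - np.median(log_x, axis=0) + np.median(log_x)
    a, b = log_x[:, group_a], log_x[:, group_b]
    n_a, n_b = a.shape[1], b.shape[1]
    se2_a = a.var(axis=1, ddof=1) / n_a  # squared standard error of each mean
    se2_b = b.var(axis=1, ddof=1) / n_b
    log2_fc = b.mean(axis=1) - a.mean(axis=1)
    t_stat = log2_fc / np.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return log2_fc, t_stat, dof
```

Plotting `log2_fc` against `-log10(p)` gives the usual volcano plot. Median normalization is one simple way to remove sample-wide loading differences before comparing groups.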
Categories: Bioinformatics Trends

Exploitation of surrogate variables in random forests for unbiased analysis of mutual impact and importance of features

Mon, 31/07/2023 - 5:30am
Abstract
Motivation: Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between features are usually not considered during selection and are thus also neglected when characterizing the analysed samples.
Results: Here we propose two approaches that focus on the mutual impact of features in random forests. Mutual forest impact (MFI) is a relation parameter that evaluates the mutual association of features with the outcome and hence goes beyond the analysis of correlation coefficients. Mutual impurity reduction (MIR) is an importance measure that combines this relation parameter with the importance of the individual features. MIR and MFI are implemented together with testing procedures that generate p-values for the selection of related and important features. Applications to one experimental and various simulated datasets, and comparisons with other methods for feature selection and relation analysis, show that MFI and MIR are promising for shedding light on the complex relationships between features and outcome. In addition, they are not affected by common biases, such as a preference for features with many possible splits or high minor allele frequencies.
Availability: The approaches are implemented in version 0.3.3 of the R package RFSurrogates, available at github.com/AGSeifert/RFSurrogates; the data are available at doi.org/10.25592/uhhfdm.12620.
Supplementary information: Supplementary data are available at Bioinformatics online.
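Surrogate variables in random forests are judged by how well a split on another feature mimics the primary split at a node. The sketch below computes the standard CART-style adjusted agreement for one node. It is not the MFI or MIR computation from the paper. It only illustrates the kind of split-level agreement that surrogate-based relation measures build on. The function name and inputs are assumptions, and the sketch ignores the option of reversing a surrogate's direction.

```python
import numpy as np

def adjusted_agreement(primary_left, surrogate_left):
    """CART-style adjusted agreement between a primary and a surrogate split.

    Both inputs are boolean arrays: does each sample in the node go left?
    Raw agreement counts samples sent the same way by both splits; the
    adjustment subtracts what the trivial rule "send everyone to the
    majority child" already achieves. 1 = perfect surrogate, 0 = no better
    than the majority rule, negative = worse than the majority rule.
    """
    primary_left = np.asarray(primary_left, dtype=bool)
    surrogate_left = np.asarray(surrogate_left, dtype=bool)
    n = primary_left.size
    n_agree = np.sum(primary_left == surrogate_left)
    n_majority = max(primary_left.sum(), n - primary_left.sum())
    if n_majority == n:
        return 0.0  # degenerate primary split: nothing to mimic
    return (n_agree - n_majority) / (n - n_majority)
```

For a node with 3 samples going left and 5 going right, a surrogate that reproduces the split exactly scores 1.0. A surrogate that sends everything right scores 0.0, even though it agrees on 5 of 8 samples. This is why the adjusted form is preferred over raw agreement.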
Categories: Bioinformatics Trends

ShinyBioHEAT: An interactive Shiny app to identify phenotype driver genes in E. coli and B. subtilis

Mon, 31/07/2023 - 5:30am
Abstract
Summary: In any population under selective pressure, a central challenge is to distinguish the genes that drive adaptation from others that, subject to population variation, harbor many neutral de novo mutations. We recently showed that such genes could be identified by supplementing information on mutational frequency with an evolutionary analysis of the likely functional impact of coding variants. This approach improved the discovery of driver genes in both lab-evolved and environmental E. coli strains. To facilitate general adoption, we have developed ShinyBioHEAT, an R Shiny web application that enables the identification of phenotype driver genes in two commonly used model bacteria, E. coli and B. subtilis, without requiring specific computational skills. ShinyBioHEAT not only supports transparent and interactive analysis of lab evolution data in E. coli and B. subtilis but also creates dynamic visualizations of mutational impact on protein structures, which add orthogonal checks on predicted drivers.
Availability: Code for ShinyBioHEAT is available at https://github.com/LichtargeLab/ShinyBioHEAT. The Shiny application is also hosted at http://bioheat.lichtargelab.org/.
Categories: Bioinformatics Trends

FGCNSurv: dually fused graph convolutional network for multi-omics survival prediction

Mon, 31/07/2023 - 5:30am
Abstract
Motivation: Survival analysis is an important tool for modelling time-to-event data, for example, to predict the survival time of patients after a cancer diagnosis or a certain treatment. While deep neural networks work well in standard prediction tasks, it is still unclear how best to use these deep models in survival analysis because right-censored data are difficult to model, especially for multi-omics data. Although existing methods have shown the advantage of multi-omics integration in survival prediction, it remains challenging to extract complementary information from different omics and improve prediction accuracy.
Results: In this work, we propose FGCNSurv, a multi-omics deep survival prediction approach based on a dually fused graph convolutional network. FGCNSurv is a complete generative model from multi-omics data to patient survival outcomes, comprising feature fusion by a factorized bilinear model, fusion of multiple graphs, higher-level feature extraction by a graph convolutional network (GCN), and survival prediction by a Cox proportional hazards model. The factorized bilinear model captures cross-omics features and quantifies complex relations in multi-omics data. By fusing single-omics and cross-omics features, and simultaneously fusing multiple graphs from different omics, the GCN applied to the resulting dually fused graph can capture higher-level features for computing the survival loss in the Cox-PH model. Comprehensive experimental results on real-world datasets with gene expression and microRNA expression data show that FGCNSurv outperforms existing survival prediction methods and suggest that it can extract complementary information for survival prediction from multi-omics data.
Availability and implementation: The code is freely available at https://github.com/LiminLi-xjtu/FGCNSurv.
Supplementary information: Supplementary data are available at Bioinformatics online.
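The "survival loss in the Cox-PH model" is typically the negative log partial likelihood of the predicted risk scores. Below is a minimal NumPy sketch of that standard loss, averaged over observed events. It assumes no tied event times, and FGCNSurv's exact loss, ties handling, and normalization may differ. The function name and the argument layout are assumptions.

```python
import numpy as np

def cox_neg_log_partial_likelihood(risk, time, event):
    """Mean negative log partial likelihood of a Cox proportional hazards model.

    risk: predicted log-hazard per patient (e.g. a network's output).
    time: follow-up time; event: 1 if the event was observed, 0 if right-censored.
    Assumes no tied event times.
    """
    risk, time, event = (np.asarray(v, dtype=float) for v in (risk, time, event))
    order = np.argsort(-time)  # sort patients by descending follow-up time
    risk, event = risk[order], event[order]
    # After sorting, patient i's risk set {j : t_j >= t_i} is patients 0..i,
    # so a running log-sum-exp gives log(sum_j exp(risk_j)) over each risk set.
    log_risk_set = np.logaddexp.accumulate(risk)
    n_events = max(event.sum(), 1.0)
    # Censored patients (event = 0) contribute only through other risk sets.
    return -np.sum((risk - log_risk_set) * event) / n_events
```

Assigning a higher risk to the patient who dies earlier lowers the loss. That is the ranking behaviour the Cox objective rewards, and censored patients still contribute by appearing in the risk sets of earlier events.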
Categories: Bioinformatics Trends

PS-Net: Human perception-guided segmentation network for EM cell membrane

Fri, 28/07/2023 - 5:30am
Abstract
Motivation: Cell membrane segmentation in electron microscopy (EM) images is a crucial step in EM image processing. However, while popular approaches have achieved performance comparable to that of humans on low-resolution EM datasets, they have shown limited success when applied to high-resolution EM datasets. The human visual system, on the other hand, displays consistently excellent performance on both low and high resolutions. To better understand this limitation, we conducted eye movement and perceptual consistency experiments. Our data showed that human observers are more sensitive to the structure of the membrane while tolerating misalignment, contrary to commonly used evaluation criteria. Additionally, our results indicated that the human visual system processes images in both global-local and coarse-to-fine manners.
Results: Based on these observations, we propose a computational framework for membrane segmentation that incorporates these characteristics of human perception. This framework includes a novel evaluation metric, the perceptual Hausdorff distance (PHD), and an end-to-end network called the PHD-guided segmentation network (PS-Net) that is trained using adaptively tuned PHD loss functions and a multiscale architecture. Our subjective experiments showed that the PHD metric is more consistent with human perception than other criteria, and our proposed PS-Net outperformed state-of-the-art methods on both low and high resolution EM image datasets as well as other natural image datasets.
Availability: The code and dataset can be found at https://github.com/EmmaSRH/PS-Net.
Supplementary information: Supplementary Information for this article is available online.
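The perceptual Hausdorff distance builds on the ordinary Hausdorff distance between two sets of membrane pixels. Here is a sketch of that plain, symmetric Hausdorff distance in NumPy. The perceptual adjustments that make PHD tolerate misalignment while staying sensitive to structure are not described in the abstract and are not reproduced. The helper names are assumptions, and the pairwise-distance matrix makes this suitable only for small point sets.

```python
import numpy as np

def membrane_coordinates(mask):
    """(row, col) coordinates of foreground pixels in a binary membrane mask."""
    return np.argwhere(np.asarray(mask, dtype=bool))

def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two 2-D point sets.

    The largest distance from any point in one set to its nearest
    neighbour in the other set, taken in both directions.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```

Because it is a worst-case measure, a single stray or shifted membrane pixel can dominate the score. This helps explain why a metric tuned to human tolerance of misalignment can rank segmentations differently.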
Categories: Bioinformatics Trends

Molecular Property Prediction by Semantic-invariant Contrastive Learning

Fri, 28/07/2023 - 5:30am
Abstract
Motivation: Contrastive learning has been widely used as a pretext task for the self-supervised pre-training of molecular representation learning models in AI-aided drug design and discovery. However, existing methods that generate molecular views for contrastive learning by adding noise may suffer from semantic inconsistency, which leads to false positive pairs and consequently poor prediction performance.
Results: To address this problem, we first propose a semantic-invariant view generation method that properly breaks molecular graphs into fragment pairs. We then develop a Fragment-based Semantic-Invariant Contrastive Learning (FraSICL) model based on this view generation method for molecular property prediction. The FraSICL model has two branches that generate representations of views for contrastive learning; in addition, a multi-view fusion and an auxiliary similarity loss are introduced to make better use of the information contained in different fragment-pair views. Extensive experiments on various benchmark datasets show that, with the fewest pre-training samples, FraSICL achieves state-of-the-art performance compared with major existing counterpart models.
Availability: The code is publicly available at https://github.com/ZiqiaoZhang/FraSICL.
Supplementary information: Supplementary data are available at Bioinformatics online.
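Contrastive pre-training of this kind usually optimizes an InfoNCE-style loss. The two views of the same molecule form a positive pair, and the other molecules in the batch act as negatives. Here is a minimal one-directional InfoNCE sketch in NumPy. FraSICL's fragment-pair views, multi-view fusion, and auxiliary similarity loss are not reproduced. The temperature of 0.1 and the function name are assumptions.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """One-directional InfoNCE loss for a batch of paired views.

    z1[i] and z2[i] embed two views of molecule i (a positive pair);
    every z2[j] with j != i acts as a negative for z1[i].
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # unit vectors, so
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)  # dot = cosine similarity
    logits = z1 @ z2.T / temperature                     # (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))  # positives lie on the diagonal
```

If a view-generation step changes a molecule's semantics, its "positive" partner is no longer truly similar. The loss then pulls together representations that should differ. That false-positive-pair problem is what the semantic-invariant views are meant to avoid.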
Categories: Bioinformatics Trends

iGRLDTI: An Improved Graph Representation Learning Method for Predicting Drug-Target Interactions over Heterogeneous Biological Information Network

Fri, 28/07/2023 - 5:30am
Abstract
Motivation: Predicting drug-target interactions (DTIs) plays a significant role in facilitating novel drug discovery. Compared with laboratory-based approaches, computational methods for DTI prediction are preferred for their high efficiency and low cost. Recently, much attention has been paid to applying graph neural network (GNN) models to discover underlying DTIs from heterogeneous biological information networks (HBINs). Although GNN-based prediction methods achieve better performance, they are prone to over-smoothing when learning latent representations of drugs and targets from their rich neighborhood information in an HBIN, which reduces their discriminative ability in DTI prediction.
Results: In this work, an improved graph representation learning method, iGRLDTI, is proposed to address this issue by capturing more discriminative representations of drugs and targets in a latent feature space. Specifically, iGRLDTI first constructs an HBIN by integrating the biological knowledge of drugs and targets with their interactions. It then adopts a node-dependent local smoothing strategy to adaptively decide the propagation depth of each biomolecule in the HBIN, significantly alleviating over-smoothing and enhancing the discriminative ability of the feature representations of drugs and targets. Finally, iGRLDTI uses a Gradient Boosting Decision Tree classifier to predict novel DTIs. Experimental results demonstrate that iGRLDTI outperforms several state-of-the-art computational methods on the benchmark dataset. In addition, our case study indicates that iGRLDTI can successfully identify novel DTIs with more distinguishable features of drugs and targets.
Availability: Python code and dataset are available at https://github.com/stevejobws/iGRLDTI/.
Supplementary information: Supplementary data are available at Bioinformatics online.
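The idea of a node-dependent propagation depth can be illustrated on a toy graph. With the random-walk propagation matrix P = D^-1 (A + I) on a connected graph, repeated propagation drives every node's feature toward the same vector pi X, where pi is the stationary distribution. This convergence is over-smoothing. One plausible criterion, assumed here and not taken from iGRLDTI, gives each node the smallest depth k at which its propagated feature lies within eps of that fully smoothed limit. The normalization, `eps`, `max_k`, and the function name are all hypothetical choices.

```python
import numpy as np

def node_dependent_depth(adj, features, eps=0.05, max_k=50):
    """Per-node propagation depth under an assumed smoothing criterion.

    Returns, for each node, the smallest k such that row i of P^k X is
    within eps (Euclidean norm) of the over-smoothed limit pi @ X, where
    P = D^-1 (A + I). Nodes that never get that close receive max_k.
    """
    a_hat = np.asarray(adj, dtype=float) + np.eye(len(adj))  # add self-loops
    degree = a_hat.sum(axis=1)
    p = a_hat / degree[:, None]          # random-walk propagation matrix
    pi = degree / degree.sum()           # stationary distribution of p
    x = np.asarray(features, dtype=float)
    x_inf = pi @ x                       # common limit of every node's feature
    depth = np.full(len(x), max_k)
    done = np.zeros(len(x), dtype=bool)
    h = x
    for k in range(max_k + 1):
        close = ~done & (np.linalg.norm(h - x_inf, axis=1) < eps)
        depth[close] = k
        done |= close
        h = p @ h                        # one more propagation step
    return depth
```

Stopping each node at its own depth keeps well-connected nodes from being propagated long after their features have become indistinguishable from everyone else's. A single global depth cannot make that distinction.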
Categories: Bioinformatics Trends

Single-cell Hi-C data Enhancement with Deep Residual and Generative Adversarial Networks

Thu, 27/07/2023 - 5:30am
Abstract
Motivation: The spatial genome organization of a eukaryotic cell is important for its function. The development of single-cell technologies for probing three-dimensional (3D) genome conformation, especially single-cell chromosome conformation capture techniques (ScHi-C), has enabled us to understand genome function better than before. However, due to the extreme sparsity and high noise of single-cell Hi-C data, it is still difficult to study genome structure and function using the Hi-C data of a single cell.
Results: In this work, we developed ScHiCEDRN, a deep learning method based on deep residual networks and generative adversarial networks, for imputing and enhancing the Hi-C data of a single cell. In terms of both image evaluation and Hi-C reproducibility metrics, ScHiCEDRN outperforms four deep learning methods (DeepHiC, HiCPlus, HiCSR, and Loopenhance) at enhancing raw single-cell Hi-C data of human and Drosophila. The experiments also show that it generates single-cell Hi-C data more suitable for identifying topologically associating domain (TAD) boundaries and reconstructing 3D chromosome structures than existing methods. Moreover, ScHiCEDRN’s performance generalizes well across different single cells and cell types, and it can also be applied to improve population Hi-C data.
Availability: The source code of ScHiCEDRN is available at the GitHub repository: https://github.com/BioinfoMachineLearning/ScHiCEDRN.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

ICARUS: Flexible protein structural alignment based on Protein Units

Thu, 27/07/2023 - 5:30am
Abstract
Motivation: Alignment of protein structures is a major problem in structural biology. The approach commonly tried first is to treat proteins as rigid bodies. However, aligning protein structures can be very complex due to conformational variability or complex evolutionary relationships between proteins, such as insertions, circular permutations, or repetitions. In such cases, introducing flexibility is useful for two reasons: (i) it helps compare two protein chains that adopt different conformational states, for example due to protein/ligand interactions or post-translational modifications, and (ii) it aids in identifying conserved regions in proteins with distant evolutionary relationships.
Results: We propose ICARUS, a new approach for flexible structural alignment based on the identification of Protein Units, evolutionarily preserved structural descriptors of intermediate size between secondary structures and domains. ICARUS significantly outperforms reference methods on a dataset of very difficult structural alignments.
Availability and implementation: Code is freely available at https://github.com/DSIMB/ICARUS.
Supplementary information: Supplementary data are available at Bioinformatics online.
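The rigid-body baseline that flexible methods improve on is optimal superposition of matched atoms. That baseline is commonly computed with the Kabsch algorithm: center both sets, take the SVD of their covariance, and correct for reflections. Here is a NumPy sketch that returns the RMSD after superposition. It assumes residues are already matched one-to-one (e.g. C-alpha atoms) and does not reproduce ICARUS's Protein Unit-based flexible alignment. The function name is an assumption.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD after optimal rigid-body superposition of q onto p (Kabsch).

    p, q: (n, 3) arrays of matched coordinates (e.g. C-alpha atoms).
    Returns (rmsd, rotation, translation) with p ~ q @ rotation.T + translation.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c, q_c = p.mean(axis=0), q.mean(axis=0)
    h = (q - q_c).T @ (p - p_c)              # 3x3 covariance of centered sets
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would mean a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    translation = p_c - q_c @ rotation.T
    diff = p - (q @ rotation.T + translation)
    return np.sqrt((diff ** 2).sum(axis=1).mean()), rotation, translation
```

A single rotation and translation cannot fit two domains that have moved relative to each other. When a hinge motion separates parts of a protein, the rigid RMSD stays high even if each part is conserved, which is the motivation for flexible, unit-based alignment.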
Categories: Bioinformatics Trends
