BQsupports: systematic assessment of the support and novelty of new biomedical associations
Abstract
Motivation: In the Big Data era of biomedicine, there is an unmet need to systematically assess experimental observations in the context of available information. Such an assessment would offer a means for comprehensive and robust validation of biomedical results and provide an initial estimate of the potential novelty of the findings.
Results: Here we present BQsupports, a web-based tool built upon the Bioteque biomedical descriptors that systematically analyzes and quantifies the current support for a given set of observations. The tool relies on over 1,000 distinct types of biomedical descriptors, covering over 11 different biological and chemical entities, including genes, cell lines, diseases and small molecules. By exploring hundreds of descriptors, BQsupports provides support scores for each observation across a wide variety of biomedical contexts. These scores are then aggregated to summarize the biomedical support of the assessed dataset as a whole. Finally, BQsupports also suggests predictive features of the given dataset, which can be exploited in downstream machine learning applications.
Availability: The web application and underlying data are available online (https://bqsupports.irbbarcelona.org).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
A machine learning-based quantitative model (LogBB_Pred) to predict the blood-brain barrier permeability (logBB value) of drug compounds
Abstract
Motivation: Efficient assessment of the blood-brain barrier (BBB) penetration ability of a drug compound is one of the major hurdles in central nervous system (CNS) drug discovery, since experimental methods are costly and time-consuming. To raise the success rate of neurotherapeutic drug discovery, it is essential to develop an accurate computational quantitative model that determines the absolute logBB value (the logarithm of the ratio of a drug's concentration in the brain to its concentration in the blood) of a drug candidate.
Results: Here, we developed a quantitative model (LogBB_Pred) capable of predicting the logBB value of a query compound. The model achieved an R² of 0.61 on an independent test dataset and outperformed other publicly available quantitative models. Compared with the available qualitative (classification) models, which only classify whether a compound is BBB-permeable or not, our model achieved the same accuracy (0.85) as the best qualitative model and far outperformed the other qualitative models (accuracies between 0.64 and 0.70). For further evaluation, our model and the other quantitative and qualitative models were evaluated on a real-world CNS drug screening library. Our model achieved an accuracy of 0.97, while the other models achieved accuracies in the range of 0.29 to 0.83. Consequently, our model can accurately classify BBB-permeable compounds as well as predict the absolute logBB values of drug candidates.
Availability and implementation: The web server is freely available at http://ssbio.cau.ac.kr/software/logbb_pred/. The data used in this study are available for download at http://ssbio.cau.ac.kr/software/logbb_pred/dataset.zip.
Categories: Bioinformatics Trends
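The logBB quantity that LogBB_Pred predicts is defined directly from measured concentrations. A minimal Python sketch of that definition, plus a helper that turns a (predicted) logBB value into a BBB+/BBB- call, follows. The -1 cutoff is an assumption (a convention used by some BBB datasets), not a threshold taken from the LogBB_Pred paper, and the function names are hypothetical.

```python
import math


def log_bb(c_brain: float, c_blood: float) -> float:
    """logBB = log10(C_brain / C_blood); both concentrations in the same units."""
    if c_brain <= 0 or c_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_blood)


def classify_bbb(logbb: float, threshold: float = -1.0) -> str:
    """Map a (predicted) logBB value to a permeability label.

    The default threshold of -1.0 is an assumed convention, not LogBB_Pred's rule.
    """
    return "BBB+" if logbb >= threshold else "BBB-"


# A brain concentration ten times the blood concentration gives logBB = 1.
value = log_bb(10.0, 1.0)
label = classify_bbb(value)
```

A regression model such as LogBB_Pred outputs the continuous value directly, which is why it can also be scored as a classifier once a cutoff like this is fixed.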
SCORPIO: a utility for defining and classifying mutation constellations of virus genomes
Abstract
Summary: Scorpio provides a set of command-line utilities for classifying, haplotyping and defining constellations of mutations for an aligned set of genome sequences. It was developed to enable exploration and classification of variants of concern during the SARS-CoV-2 pandemic, but can be applied more generally to other species.
Availability and implementation: Scorpio is an open-source project distributed under the GNU GPL version 3 license. Source code and binaries are available at https://github.com/cov-lineages/scorpio, and binaries are also available from Bioconda. SARS-CoV-2-specific definitions can be installed as a separate dependency from https://github.com/cov-lineages/constellations.
Categories: Bioinformatics Trends
AFsample: Improving Multimer Prediction with AlphaFold using Massive Sampling
Abstract
The AlphaFold2 neural network model has revolutionized structural biology with unprecedented performance. We demonstrate that by stochastically perturbing the neural network, enabling dropout at inference combined with massive sampling, it is possible to improve the quality of the generated models. We generated around 6,000 models per target, compared with the 25 generated by default by AlphaFold-Multimer, using the v1 and v2 multimer network models with and without templates, and with an increased number of recycles within the network. The method was benchmarked in CASP15; compared with AlphaFold-Multimer v2 using identical input, it improved the average DockQ from 0.41 to 0.55, and it was ranked at the very top of the protein assembly category among all groups participating in CASP15. The simplicity of the method should facilitate its adoption by the field, and the method should be useful for anyone interested in modelling multimeric structures, alternative conformations or flexible structures.
Availability: AFsample is available online at http://wallnerlab.org/AFsample.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
FrameD: Framework for DNA-based Data Storage Design, Verification, and Validation
Abstract
Motivation: DNA-based data storage is a quickly growing field that aims to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium suitable for archival data. In recent years, many DNA-based storage system designs have been proposed. Because no common infrastructure exists for simulating these storage systems, comparing many different designs under many different error models is increasingly difficult. To address this challenge, we introduce FrameD, a simulation infrastructure for DNA storage systems that leverages the underlying modularity of DNA storage system designs to provide a framework for expressing different designs while reusing common components.
Results: We demonstrate the utility of FrameD, and the need for a common simulation platform, through a case study. The case study compares designs that use strand copies differently: some align strand copies using Multiple Sequence Alignment (MSA) algorithms and others do not. We found that whether including MSA in the pipeline is beneficial depends on the error rate and the type of errors being injected; it is not always beneficial. In addition to supporting a wide range of designs, FrameD provides transparent parallelism to handle the large number of reads from sequencing and the many fault-injection iterations required. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design of future DNA-based storage systems.
Availability and implementation: The source code for FrameD, along with the data generated during its demonstration, is available in a public GitHub repository at https://github.com/dna-storage/framed (DOI: 10.5281/zenodo.7757762).
Categories: Bioinformatics Trends
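The FrameD case study hinges on whether strand copies must be aligned before a consensus is called. The Python sketch below, a hypothetical helper that is not part of FrameD's API, shows the alignment-free baseline: a per-position majority vote. It only works when all copies have the same length, that is, when errors are substitutions; insertions and deletions shift positions, which is where an MSA step becomes necessary.

```python
from collections import Counter


def majority_consensus(copies: list[str]) -> str:
    """Call a consensus strand by per-position majority vote, without alignment.

    Assumes all copies have equal length (substitution errors only).
    Ties are broken in favour of the base encountered first at that position.
    """
    if not copies:
        raise ValueError("need at least one strand copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies differ in length; indels present, an MSA step is needed")
    # zip(*copies) walks the copies column by column.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))


consensus = majority_consensus(["ACGT", "ACGA", "TCGT"])
```

With a single deletion in one copy, every downstream column would compare mismatched positions, which is one intuition for why the benefit of MSA depends on the injected error type.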
BioThings Explorer: a query engine for a federated knowledge graph of biomedical APIs
Abstract
Summary: Knowledge graphs are an increasingly common data structure for representing biomedical information. They can easily represent heterogeneous types of information, and many algorithms and tools exist for querying and analyzing graphs. Biomedical knowledge graphs have been used in a variety of applications, including drug repurposing, identification of drug targets, prediction of drug side effects, and clinical decision support. Typically, knowledge graphs are constructed by centralizing and integrating data from multiple disparate sources. Here, we describe BioThings Explorer, an application that can query a virtual, federated knowledge graph derived from the aggregated information in a network of biomedical web services. BioThings Explorer leverages semantically precise annotations of the inputs and outputs of each resource, and automates the chaining of web service calls to execute multi-step graph queries. Because there is no large, centralized knowledge graph to maintain, BioThings Explorer is distributed as a lightweight application that dynamically retrieves information at query time.
Availability and implementation: More information can be found at https://explorer.biothings.io, and code is available at https://github.com/biothings/biothings_explorer.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
INTEGRATE-Circ and INTEGRATE-Vis: Unbiased Detection and Visualization of Fusion-Derived Circular RNA
Abstract
Motivation: Backsplicing of RNA produces circularized rather than linear transcripts, known as circular RNAs. A recently discovered and poorly understood subset of circular RNAs composed of multiple genes, termed fusion-derived circular RNAs (fcircRNAs), represents a class of potential biomarkers shown to have oncogenic potential. Existing analytical tools fail to detect fcircRNAs, making it difficult to assess their prevalence and function comprehensively. Improved detection methods may lead to additional biological and clinical insights into fcircRNAs.
Results: We developed the first unbiased tools for detecting fcircRNAs (INTEGRATE-Circ) and visualizing them (INTEGRATE-Vis) from RNA-Seq data. In our analysis of simulated RNA-Seq data, INTEGRATE-Circ was more sensitive, precise and accurate than other tools, and it also outperformed other tools in an analysis of public lymphoblast cell line data. Finally, we validated in vitro three novel fcircRNAs detected by INTEGRATE-Circ in a well-characterized breast cancer cell line.
Availability: Open-source code for INTEGRATE-Circ and INTEGRATE-Vis is available at https://www.github.com/ChrisMaherLab/INTEGRATE-CIRC and https://www.github.com/ChrisMaherLab/INTEGRATE-Vis.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
mbQTL: An R/Bioconductor Package for Microbial Quantitative Trait Loci (QTL) Estimation
Abstract
Motivation: In recent years, significant strides have been made in genomics, with the launch of large-scale studies collecting host mutational profiles and microbiome data. Combining host gene mutational profiles from healthy and diseased subjects with microbial abundance data holds immense promise for answering several crucial research questions, including how diseases develop and progress and how individuals respond to therapeutic interventions. With the advent of sequencing methods such as 16S ribosomal RNA (rRNA) sequencing and whole-genome sequencing, there is increasing evidence of interplay between human genetics and microbial communities. Quantitative trait loci associated with microbial abundance (mbQTLs) are genetic variants that influence the abundance of microbial populations within the host.
Results: Here we introduce mbQTL, the first R package integrating 16S rRNA sequencing data with single nucleotide variant (SNV) and single nucleotide polymorphism (SNP) data. We describe the statistical methods implemented for identifying microbe-SNV pairs, the relevant statistical measures, and the plotting functionality for interpretation.
Availability: mbQTL is available on Bioconductor at https://bioconductor.org/packages/mbQTL/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
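mbQTL's own statistical methods are documented in the package itself. As a generic illustration only, the simplest microbe-SNV association is an additive model in which microbial abundance is regressed on genotype dosage (0, 1 or 2 copies of the alternative allele). The Python sketch below computes the ordinary least-squares fit of that model for one microbe-SNV pair; the function name and inputs are hypothetical, and real analyses would typically transform abundances and adjust for covariates.

```python
from statistics import fmean


def additive_fit(dosages, abundances):
    """OLS fit of abundance = intercept + slope * dosage for one microbe-SNV pair.

    dosages: alternative-allele counts (0, 1 or 2) per subject.
    abundances: (ideally transformed) abundance of one taxon per subject.
    Returns (intercept, slope).
    """
    if len(dosages) != len(abundances):
        raise ValueError("need one dosage and one abundance per subject")
    mean_g, mean_a = fmean(dosages), fmean(abundances)
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNV: slope is undefined")
    sxy = sum((g - mean_g) * (a - mean_a) for g, a in zip(dosages, abundances))
    slope = sxy / sxx
    return mean_a - slope * mean_g, slope


intercept, slope = additive_fit([0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
```

The slope is the estimated change in abundance per additional copy of the alternative allele; testing it against zero across many microbe-SNV pairs is what then calls for multiple-testing control.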
P-DOR, an easy-to-use pipeline to reconstruct bacterial outbreaks using genomics
Abstract
Summary: Bacterial healthcare-associated infections (HAIs) are a major threat worldwide that can be counteracted by effective infection control measures guided by constant surveillance and timely epidemiological investigations. Genomics is crucial in modern epidemiology, but it lacks standard methods and user-friendly software accessible to users without strong bioinformatics proficiency. To overcome these issues, we developed P-DOR, a novel tool for rapid bacterial outbreak characterization. P-DOR accepts genome assemblies as input, automatically selects a background of publicly available genomes using k-mer distances, and adds it to the analysis dataset before inferring a SNP-based phylogeny. Epidemiological clusters are identified from the phylogenetic tree topology and SNP distances. By analyzing the SNP-distance distribution, the user can gauge an appropriate threshold. Patient metadata can also be provided to produce a spatio-temporal representation of the outbreak. The entire pipeline is fast and scalable and can also be run on low-end computers.
Availability and implementation: P-DOR is implemented in Python 3 and R and can be installed using conda environments. It is available from GitHub at https://github.com/SteMIDIfactory/P-DOR under the GPL-3.0 license.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
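P-DOR identifies clusters from both tree topology and SNP distances. As a simplified, distance-only illustration (not P-DOR's implementation), the Python sketch below groups genomes by single linkage with a union-find structure: any two genomes within a chosen SNP threshold end up in the same cluster. The threshold values and genome names are hypothetical.

```python
def snp_clusters(names, distances, threshold):
    """Single-linkage clusters from pairwise SNP distances.

    names: genome identifiers.
    distances: dict mapping (name_a, name_b) pairs to SNP distances.
    threshold: maximum SNP distance for two genomes to be linked.
    """
    parent = {n: n for n in names}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


dist = {("A", "B"): 3, ("B", "C"): 5, ("A", "C"): 8,
        ("C", "D"): 40, ("A", "D"): 42, ("B", "D"): 41}
clusters = snp_clusters(["A", "B", "C", "D"], dist, threshold=5)
```

Because single linkage chains genomes together, A and C share a cluster here even though they are 8 SNPs apart; inspecting the SNP-distance distribution, as the abstract suggests, is one way to choose a threshold that avoids such over-merging.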
VoDEx: a Python library for time annotation and management of volumetric functional imaging data
Abstract
Summary: In functional imaging studies, accurately synchronizing the time course of experimental manipulations and stimulus presentations with the resulting imaging data is crucial for analysis. Current software tools lack such functionality, requiring manual processing of the experimental and imaging data, which is error-prone and potentially non-reproducible. We present VoDEx, an open-source Python library that streamlines data management and analysis of functional imaging data. VoDEx synchronizes the experimental timeline and events (e.g., presented stimuli, recorded behavior) with the imaging data. VoDEx provides tools for logging and storing the timeline annotation, and enables retrieval of imaging data based on specific time-based and manipulation-based experimental conditions.
Availability and implementation: VoDEx is an open-source Python library and can be installed via the "pip install" command. It is released under a BSD license, and its source code is publicly accessible on GitHub (https://github.com/LemonJust/vodex). A graphical interface is available as the napari-vodex plugin, which can be installed through the napari plugins menu or using "pip install". The source code for the napari plugin is available on GitHub (https://github.com/LemonJust/napari-vodex). The software version at the time of submission is archived at Zenodo (version v1.0.18, https://zenodo.org/record/8061531).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
ActivePPI: Quantifying Protein-Protein Interaction Network Activity with Markov Random Fields
Abstract
Motivation: Protein-protein interactions (PPIs) are crucial components of the biomolecular networks that enable cells to function. Biological experiments have identified a large number of PPIs, and these interactions are stored in knowledge bases. However, these interactions are often restricted to specific cellular environments and conditions. Network activity can be characterized as the extent of agreement between a PPI network (PPIN) and a distinct cellular environment measured by protein mass spectrometry, and it can be quantified as a statistical significance score. Without knowing the activity of these PPIs in specific cellular environments or phenotypes, it is impossible to reveal how they perform and affect cellular functioning.
Results: To calculate the activity of a PPIN under different cellular conditions, we propose a PPIN activity evaluation framework, named ActivePPI, that measures the consistency between network architecture and protein measurement data. ActivePPI estimates the probability density of protein mass spectrometry abundance and models the PPIN using a Markov random field-based method. Furthermore, an empirical P-value is derived from a nonparametric permutation test to quantify the significance of the likelihood of the match between PPIN structure and protein abundance data. Extensive numerical experiments demonstrate the superior performance of ActivePPI in network activity evaluation, pathway activity assessment, and optimal network architecture tuning tasks. In summary, ActivePPI is a versatile tool for evaluating PPI networks that can uncover the functional significance of protein interactions in crucial cellular biological processes and offer further insights into physiological phenomena.
Availability: All source code and data are freely available at https://github.com/zpliulab/ActivePPI.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
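ActivePPI's significance score comes from a nonparametric permutation test. The Python sketch below shows the generic permutation recipe on a deliberately simple, hypothetical agreement score (the sum of abundance products over interacting pairs) rather than ActivePPI's Markov random field likelihood. Abundances are shuffled across proteins to build a null distribution, and the empirical P-value uses the usual (1 + hits) / (1 + permutations) correction so it is never exactly zero.

```python
import random


def edge_agreement(edges, abundance):
    """Hypothetical network-data agreement score: sum of abundance products over edges."""
    return sum(abundance[u] * abundance[v] for u, v in edges)


def permutation_p_value(edges, abundance, n_perm=999, seed=0):
    """Empirical P-value of the observed score against abundance-shuffled networks."""
    rng = random.Random(seed)
    proteins = list(abundance)
    values = [abundance[p] for p in proteins]
    observed = edge_agreement(edges, abundance)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # reassign abundances to proteins at random
        permuted = dict(zip(proteins, values))
        if edge_agreement(edges, permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


edges = [("a", "b"), ("b", "c"), ("a", "c")]
abundance = {"a": 5.0, "b": 5.0, "c": 5.0, "d": 0.0, "e": 0.0, "f": 0.0}
p = permutation_p_value(edges, abundance, n_perm=999, seed=1)
```

Here the three interacting proteins are the three abundant ones, an arrangement that only about 1 in 20 random reassignments reproduces, so the P-value is small; with uniform abundances every permutation ties the observed score and the P-value is 1.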
cloneRate: fast estimation of single-cell clonal dynamics using coalescent theory
Abstract
Motivation: While evolutionary approaches to medicine show promise, measuring evolution itself is difficult due to experimental constraints and the dynamic nature of body systems. In cancer evolution, continuous observation of clonal architecture is impossible, and longitudinal samples from multiple timepoints are rare. Increasingly available single-cell-resolution DNA sequencing datasets enable the reconstruction of past evolution from mutational history, allowing a better understanding of the dynamics prior to detectable disease. There is an unmet need for an accurate, fast and easy-to-use method to quantify clone growth dynamics from these datasets.
Results: We derived methods based on coalescent theory for estimating the net growth rate of clones using either reconstructed phylogenies or the number of shared mutations. We applied and validated these analytical methods, eliminating the need for the complex simulations used in previous methods. When applied to hematopoietic data, we show that our estimates may have broad applications for improving mechanistic understanding and prognostic ability. Compared with clones with a single or unknown driver mutation, clones with multiple drivers have significantly higher growth rates (median 0.94 vs. 0.25 per year; p = 1.6 × 10⁻⁶). Further, stratifying patients with a myeloproliferative neoplasm (MPN) by the growth rate of their fittest clone shows that higher growth rates are associated with shorter time to MPN diagnosis (median 13.9 vs. 26.4 months; p = 0.0026).
Availability and implementation: We developed a publicly available R package, cloneRate, to implement our methods (package website: https://bdj34.github.io/cloneRate/; source code: https://github.com/bdj34/cloneRate/).
Supplementary information: Supplementary material is available at Bioinformatics online.
Categories: Bioinformatics Trends
MoleculeExperiment enables consistent infrastructure for molecule-resolved spatial omics data in Bioconductor
Abstract
Motivation: Imaging-based spatial transcriptomics (ST) technologies have achieved subcellular resolution, enabling detection of individual molecules in their native tissue context. Data from these technologies offer unprecedented opportunities for understanding cellular and subcellular biology. However, computational infrastructure in R/Bioconductor to represent such data is scarce, particularly for summarizing and transforming it for the widely adopted computational tools in single-cell transcriptomics analysis, including the SingleCellExperiment and SpatialExperiment classes. With the emergence of several commercial offerings of imaging-based ST, there is a pressing need for consistent data structure standards for these technologies at the individual molecule level.
Results: To this end, we have developed MoleculeExperiment, an R/Bioconductor package that i) stores molecule and cell segmentation boundary information at the molecule level, ii) standardises this molecule-level information across different imaging-based ST technologies, including 10x Genomics’ Xenium, and iii) streamlines the transition from a MoleculeExperiment object to a SpatialExperiment object. Overall, MoleculeExperiment is generally applicable as a data infrastructure class for consistent analysis of molecule-resolved spatial omics data.
Availability and implementation: The MoleculeExperiment package is publicly available on Bioconductor at https://bioconductor.org/packages/release/bioc/html/MoleculeExperiment.html. Source code is available on GitHub at https://github.com/SydneyBioX/MoleculeExperiment. The vignette for MoleculeExperiment can be found at https://bioconductor.org/packages/release/bioc/html/MoleculeExperiment.html.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
MuTATE—An R Package for Comprehensive Multi-Objective Molecular Modeling
Abstract
Motivation: Comprehensive multi-omics studies have driven advances in disease modeling for effective precision medicine, but they pose a challenge for existing machine learning approaches, which have limited interpretability across clinical endpoints. Automated, comprehensive disease modeling requires a machine learning approach that can simultaneously identify disease subgroups and their defining molecular biomarkers by explaining multiple clinical endpoints. Current tools are restricted to individual endpoints or limited variable types, require advanced computational skills, and depend on resource-intensive manual expert interpretation.
Results: We developed MuTATE (Multi-Target Automated Tree Engine) for automated and comprehensive molecular modeling, which enables user-friendly multi-objective decision tree construction and visualization of relationships between molecular biomarkers and patient subgroups characterized by multiple clinical endpoints. MuTATE incorporates multiple targets throughout model construction and allows for target weights, enabling the construction of interpretable decision trees that provide insights into disease heterogeneity and molecular signatures. MuTATE eliminates the need for manual synthesis of multiple non-explainable models, making it highly efficient and accessible for bioinformaticians and clinicians. The flexibility and versatility of MuTATE make it applicable to a wide range of complex diseases, including cancer, where it can improve therapeutic decisions by providing comprehensive molecular insights for precision medicine. MuTATE has the potential to transform biomarker discovery and subtype identification, leading to more effective and personalized treatment strategies in precision medicine and advancing our understanding of disease mechanisms at the molecular level.
Availability and implementation: MuTATE is freely available on GitHub (https://github.com/SarahAyton/MuTATE) under the GPLv3 license.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
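MuTATE's exact split criterion is not described in this abstract. As an assumed illustration of how target weights can enter multi-objective tree construction, the Python sketch below scores a candidate split by a weighted sum of per-target Gini impurity reductions over categorical endpoints. The function names, the input layout and the weighting scheme are hypothetical, and survival-type endpoints would need a different per-target criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_multi_target_gain(targets, left_idx, weights):
    """Weighted sum of per-target impurity reductions for one candidate split.

    targets: dict mapping endpoint name -> list of class labels, one per sample.
    left_idx: set of sample indices sent to the left child.
    weights: dict mapping endpoint name -> weight (missing names default to 1.0).
    """
    total = 0.0
    for name, labels in targets.items():
        left = [y for i, y in enumerate(labels) if i in left_idx]
        right = [y for i, y in enumerate(labels) if i not in left_idx]
        n = len(labels)
        child_impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        total += weights.get(name, 1.0) * (gini(labels) - child_impurity)
    return total


targets = {"subtype": ["a", "a", "b", "b"], "response": ["y", "n", "y", "n"]}
weights = {"subtype": 1.0, "response": 2.0}
gain_subtype_split = weighted_multi_target_gain(targets, {0, 1}, weights)
gain_response_split = weighted_multi_target_gain(targets, {0, 2}, weights)
```

Each split perfectly separates one endpoint, so the weights decide the winner: doubling the weight on "response" makes the second split preferable.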
A General Framework for Powerful Confounder Adjustment in Omics Association Studies
Abstract
Motivation: Genomic data are subject to various sources of confounding, such as demographic variables, biological heterogeneity and batch effects. To identify genomic features associated with a variable of interest in the presence of confounders, the traditional approach fits a confounder-adjusted regression model to each genomic feature and then applies a multiplicity correction.
Results: This study shows that the traditional approach is suboptimal and proposes a new two-dimensional false discovery rate (FDR) control framework (2dFDR+) that provides a significant power improvement over the conventional method and applies to a wide range of settings. 2dFDR+ uses marginal independence test statistics as auxiliary information to filter out less promising features, and FDR control is then performed on the remaining features using conditional independence test statistics. 2dFDR+ provides (asymptotically) valid inference in settings where the conditional distribution of the genomic variables given the covariate of interest and the confounders is arbitrary and completely unknown. Promising finite-sample performance is demonstrated through extensive simulations and real data applications.
Availability and implementation: R code and vignettes are available at https://github.com/asmita112358/tdfdr.np.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
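The conventional pipeline that 2dFDR+ improves upon ends with a multiplicity correction over the per-feature p-values. For reference, the Python sketch below implements the Benjamini-Hochberg step-up procedure commonly used for that correction. It is not the 2dFDR+ procedure, which additionally uses marginal statistics to filter features before FDR control.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (True = rejected) in the original feature order.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    rejected = set(order[:k])  # reject every feature up to that rank
    return [i in rejected for i in range(m)]


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
decisions = benjamini_hochberg(pvals, alpha=0.05)
```

In this example only the two smallest p-values pass their rank-specific thresholds (0.005 and 0.01), so two features are declared significant. Pre-filtering to a smaller set of promising features lowers m, which is one intuition for where the extra power of filter-then-control schemes comes from.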
chem16S: Community-level chemical metrics for exploring genomic adaptation to environments
Abstract
Summary: The chem16S package combines taxonomic classifications of 16S rRNA gene sequences with amino acid compositions of prokaryotic reference proteomes to generate community reference proteomes. Taxonomic classifications from the RDP Classifier or data objects created by the phyloseq R package are supported. Users can calculate and visualize a variety of chemical metrics to explore the effects of redox, salinity and other physicochemical variables on the genomic adaptation of protein sequences at the community level.
Availability and implementation: Development of chem16S is hosted at https://github.com/jedick/chem16S. Version 1.0.0 is freely available from the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/package=chem16S.
Categories: Bioinformatics Trends
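One chemical metric used in this line of work is the average oxidation state of carbon (Zc). Assuming hypothetical per-taxon elemental formulas as input (chem16S itself starts from amino acid compositions), the Python sketch below forms an abundance-weighted community formula and computes Zc from electroneutrality, with H = +1, N = -3, O = -2 and S = -2. The free amino acids glycine (C2H5NO2) and alanine (C3H7NO2) serve only as easily checked example formulas.

```python
def carbon_oxidation_state(c, h, n, o, s, z=0.0):
    """Average oxidation state of carbon (Zc) from an elemental formula.

    Electroneutrality with H=+1, N=-3, O=-2, S=-2 gives c*Zc + h - 3n - 2o - 2s = z.
    """
    if c <= 0:
        raise ValueError("formula must contain carbon")
    return (z - h + 3 * n + 2 * o + 2 * s) / c


def community_formula(taxa):
    """Abundance-weighted mean elemental formula.

    taxa: list of (abundance, {"C": .., "H": .., "N": .., "O": .., "S": ..}) pairs,
    one per taxon (hypothetical input format).
    """
    total = sum(abundance for abundance, _ in taxa)
    return {el: sum(a * f.get(el, 0) for a, f in taxa) / total for el in "CHNOS"}


def community_zc(taxa):
    f = community_formula(taxa)
    return carbon_oxidation_state(f["C"], f["H"], f["N"], f["O"], f["S"])


glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
alanine = {"C": 3, "H": 7, "N": 1, "O": 2}
zc = community_zc([(1.0, glycine), (1.0, alanine)])
```

A design note: Zc of the pooled formula is carbon-weighted, so for equal abundances of glycine (Zc = 1) and alanine (Zc = 0) it gives 0.4 rather than the simple mean of 0.5.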
Aenmd: Annotating escape from nonsense-mediated decay for transcripts with protein-truncating variants
Abstract
Summary: DNA changes that cause premature termination codons (PTCs) represent a large fraction of clinically relevant pathogenic genomic variation. Typically, PTCs induce transcript degradation by nonsense-mediated mRNA decay (NMD) and render such changes loss-of-function alleles. However, certain PTC-containing transcripts escape NMD and can exert dominant-negative or gain-of-function (DN/GOF) effects. Therefore, systematic identification of human PTC-causing variants and their susceptibility to NMD contributes to the investigation of the role of DN/GOF alleles in human disease. Here we present aenmd, software for annotating PTC-containing transcript-variant pairs for predicted escape from NMD. aenmd is user-friendly and self-contained. It offers functionality not currently available in other methods and is based on established, experimentally validated rules for NMD escape; the software is designed to work at scale and to integrate seamlessly with existing analysis workflows. We applied aenmd to variants in the gnomAD, ClinVar and GWAS Catalog databases and report the prevalence of human PTC-causing variants in these databases, as well as the subset of these variants that could exert DN/GOF effects via NMD escape.
Availability and implementation: aenmd is implemented in the R programming language. Code is available on GitHub as an R package (github.com/kostkalab/aenmd.git) and as a containerized command-line interface (github.com/kostkalab/aenmd_cli.git).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
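The NMD-escape rules that tools like aenmd build on are commonly stated in terms of where the PTC falls in the spliced transcript. The Python sketch below encodes four frequently cited rules: PTC in the last exon, within about 50 nt upstream of the last exon-exon junction, close to the start codon, or inside a long exon. The thresholds (50, 150 and 407 nt), the function name and the coordinate conventions are assumptions for illustration, not aenmd's code.

```python
from bisect import bisect_right
from itertools import accumulate


def nmd_escape_reasons(exon_lengths, ptc_pos, cds_start=0,
                       junction_window=50, start_window=150, long_exon=407):
    """Return the (assumed) rules under which a PTC is predicted to escape NMD.

    exon_lengths: lengths (nt) of the exons of the spliced transcript, 5' to 3'.
    ptc_pos: 0-based transcript position of the first nucleotide of the PTC.
    cds_start: 0-based transcript position of the start codon.
    An empty list means NMD is predicted to be triggered.
    """
    ends = list(accumulate(exon_lengths))  # exclusive end coordinate of each exon
    if not 0 <= ptc_pos < ends[-1]:
        raise ValueError("PTC lies outside the transcript")
    exon_idx = bisect_right(ends, ptc_pos)  # index of the exon containing the PTC
    reasons = []
    if exon_idx == len(exon_lengths) - 1:
        # Also covers single-exon transcripts: no downstream exon junction exists.
        reasons.append("last_exon")
    elif ends[-2] - ptc_pos <= junction_window:
        reasons.append("near_last_junction")
    if cds_start <= ptc_pos < cds_start + start_window:
        reasons.append("start_proximal")
    if exon_lengths[exon_idx] > long_exon:
        reasons.append("long_exon")
    return reasons


exons = [200, 300, 150, 250]
```

For this hypothetical four-exon transcript with the CDS starting at position 50, a PTC at position 300 matches none of the rules (NMD expected), whereas one at position 620 lies 30 nt upstream of the last junction and is predicted to escape.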
DataDTA: a multi-feature and dual-interaction aggregation framework for drug-target binding affinity prediction
Abstract
Motivation: Accurate prediction of drug-target binding affinity (DTA) is crucial for drug discovery. The growing number of published large-scale DTA datasets enables the development of various computational methods for DTA prediction. Numerous deep learning-based methods have been proposed to predict affinities, some of which use only the original sequence information or complex structures, but the effective combination of various types of information and protein-binding pockets has not been fully explored. Therefore, a new method that integrates the available key information is urgently needed to predict DTA and accelerate the drug discovery process.
Results: In this study, we propose a novel deep learning-based predictor, termed DataDTA, to estimate the affinities of drug-target pairs. DataDTA uses descriptors of predicted pockets and the sequences of proteins, as well as low-dimensional molecular features and SMILES strings of compounds, as inputs. Specifically, the pockets were predicted from the three-dimensional structures of proteins, and their descriptors were extracted as part of the input features for DTA prediction. Molecular representations of compounds based on algebraic graph features were collected to supplement the target input information. Furthermore, to ensure effective learning of multiscale interaction features, a dual-interaction aggregation neural network strategy was developed. DataDTA was compared with state-of-the-art methods on different datasets, and the results showed that DataDTA is a reliable tool for affinity estimation. Specifically, DataDTA achieves a concordance index (CI) of 0.806 and an R value of 0.814 on the test dataset, both higher than those of the other methods.
Availability and implementation: The code for DataDTA is available at https://github.com/YanZhu06/DataDTA.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
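The CI reported for DataDTA is the concordance index commonly used in DTA benchmarks: the fraction of pairs with different true affinities whose predictions are ranked in the same order. The Python sketch below computes it with a straightforward O(n²) loop for clarity; it is not DataDTA's evaluation code, and the example affinities are made up.

```python
def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs (different true affinities) ranked correctly.

    Tied predictions count as half-correct; pairs with equal true values are skipped.
    """
    correct = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    correct += 1.0
                elif y_pred[i] == y_pred[j]:
                    correct += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return correct / comparable


ci = concordance_index([5.0, 6.1, 7.3, 8.0], [5.2, 6.0, 8.1, 7.9])
```

Only the last two compounds are swapped in the predicted ranking, so 5 of the 6 comparable pairs are concordant. A CI of 0.5 corresponds to random ordering and 1.0 to a perfect ranking.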
μ-PBWT: a lightweight r-indexing of the PBWT for storing and querying UK Biobank Data
Abstract
Motivation: The positional Burrows-Wheeler Transform (PBWT) is a data structure that indexes haplotype sequences in a manner that enables finding maximal haplotype matches in h sequences containing w variation sites in O(hw) time. This represents a significant improvement over classical quadratic-time approaches. However, the original PBWT data structure does not support queries over Biobank panels consisting of several million haplotypes if the index of the haplotypes must be kept entirely in memory.
Results: In this paper, we leverage the notion of the r-index proposed for the BWT to present a memory-efficient method for constructing and storing the run-length encoded PBWT and for computing set maximal match (SMEM) queries in haplotype sequences. We implement our method, which we refer to as μ-PBWT, and evaluate it on datasets from the 1000 Genomes Project and UK Biobank. Our experiments demonstrate that μ-PBWT reduces memory usage by up to 20% compared with the best current PBWT-based indexing. In particular, μ-PBWT produces an index that stores high-coverage whole-genome sequencing data of chromosome 20 in about a third of the space of its BCF file. μ-PBWT adapts techniques for the run-length compressed BWT to the PBWT (RLPBWT) and keeps in memory only a succinct representation of the RLPBWT that still allows efficient computation of SMEMs over the original panel.
Availability: Our implementation is open source and available at https://github.com/dlcgold/muPBWT. The binary is available at https://bioconda.github.io/recipes/mupbwt/README.html.
Categories: Bioinformatics Trends
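μ-PBWT compresses the run-length encoded PBWT, whose size is governed by the number of runs r. The Python sketch below, an illustration rather than μ-PBWT's implementation, builds PBWT columns with Durbin's prefix-array update (a stable partition of the current haplotype order by the allele at each site) and then run-length encodes each column. The total run count is the r that r-index-style structures scale with.

```python
def pbwt_columns(haplotypes):
    """PBWT columns of a biallelic panel.

    haplotypes: list of equal-length strings over '0'/'1'.
    Column k lists the alleles at site k in the order of the prefix array a_k,
    i.e. haplotypes sorted by their reversed prefixes up to site k.
    """
    order = list(range(len(haplotypes)))
    columns = []
    for k in range(len(haplotypes[0])):
        columns.append("".join(haplotypes[i][k] for i in order))
        # Stable partition by the allele at site k gives the next prefix array.
        zeros = [i for i in order if haplotypes[i][k] == "0"]
        ones = [i for i in order if haplotypes[i][k] == "1"]
        order = zeros + ones
    return columns


def run_length_encode(column):
    """Run-length encode a PBWT column as (allele, run length) pairs."""
    runs = []
    for allele in column:
        if runs and runs[-1][0] == allele:
            runs[-1][1] += 1
        else:
            runs.append([allele, 1])
    return [(allele, length) for allele, length in runs]


panel = ["0101", "1101", "0100", "1100"]
cols = pbwt_columns(panel)
r = sum(len(run_length_encode(c)) for c in cols)
```

Because haplotypes sharing long recent histories end up adjacent in the prefix order, real panels tend to produce long runs, so r grows much more slowly than h × w; that gap is what run-length and r-index representations exploit.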
ePlatypus: an ecosystem for computational analysis of immunogenomics data
Abstract
Motivation: The maturation of systems immunology methodologies requires novel and transparent computational frameworks capable of integrating diverse data modalities in a reproducible manner.
Results: Here, we present the ePlatypus computational immunology ecosystem for immunogenomics data analysis, with a focus on adaptive immune repertoires and single-cell sequencing. ePlatypus is an open-source web-based platform that provides programming tutorials and an integrative database that helps elucidate signatures of B and T cell clonal selection. Furthermore, the ecosystem links novel and established bioinformatics pipelines relevant to single-cell immune repertoires and to other aspects of computational immunology, such as predicting ligand-receptor interactions, structural modeling, simulations, machine learning, graph theory, pseudotime, spatial transcriptomics and phylogenetics. The ePlatypus ecosystem helps extract deeper insights in computational immunology and immunogenomics and promotes open science.
Availability: The Platypus code used in this manuscript can be found at github.com/alexyermanos/Platypus.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends