GENLIB: new function to simulate haplotype transmission in large complex genealogies

Bioinformatics Oxford Journals - Fri, 17/03/2023 - 5:30am
Abstract
Summary: Founder populations with deep genealogical data are well suited for investigating genetic variants contributing to diseases. Here, we present a major update of the genealogical analysis R package GENLIB, centered around a new function which can simulate the transmission of haplotypes from founders to probands along very large and complex user-specified genealogies.
Availability and implementation: The latest update of the GENLIB package (v1.1.9) contains the new gen.simuHaplo() function and is available on the CRAN repository and from https://github.com/R-GENLIB/GENLIB. Examples can be accessed at https://github.com/R-GENLIB/simuhaplo_functions.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
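
The haplotype transmission described in the GENLIB abstract above can be illustrated with a minimal sketch. It is written in Python and is not the gen.simuHaplo() interface: the function names, the marker-based haplotype representation, and the single per-interval recombination probability are all assumptions made for illustration.

```python
import random


def simulate_gamete(hap_a, hap_b, recomb_rate, rng):
    """Build one gamete as a mosaic of a parent's two haplotypes.

    Assumption: a crossover occurs independently between adjacent
    markers with probability `recomb_rate`.
    """
    strands = (hap_a, hap_b)
    current = rng.randrange(2)  # start on a randomly chosen strand
    gamete = []
    for i in range(len(hap_a)):
        if i > 0 and rng.random() < recomb_rate:
            current = 1 - current  # crossover: switch to the other strand
        gamete.append(strands[current][i])
    return gamete


def drop_haplotypes(pedigree, n_markers, recomb_rate, seed=0):
    """Transmit founder haplotype labels down a genealogy.

    `pedigree` maps each individual to (father, mother); founders map to
    (None, None). Parents must be listed before their children.
    Returns a dict: individual -> (paternal haplotype, maternal haplotype).
    """
    rng = random.Random(seed)
    haps = {}
    for ind, (father, mother) in pedigree.items():
        if father is None and mother is None:
            # Each founder contributes two uniquely labelled haplotypes.
            haps[ind] = ([f"{ind}.1"] * n_markers, [f"{ind}.2"] * n_markers)
        else:
            haps[ind] = (
                simulate_gamete(*haps[father], recomb_rate, rng),
                simulate_gamete(*haps[mother], recomb_rate, rng),
            )
    return haps


# Hypothetical three-generation genealogy, for illustration only.
pedigree = {
    "F1": (None, None),
    "F2": (None, None),
    "F3": (None, None),
    "C1": ("F1", "F2"),
    "P1": ("C1", "F3"),
}
haps = drop_haplotypes(pedigree, n_markers=10, recomb_rate=0.1)
print(haps["P1"][0])  # paternal haplotype of the proband, as founder labels
```

Labelling founder haplotypes rather than storing alleles keeps the sketch small: any marker in a proband can be traced back to the founder chromosome it descends from.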

sBGC-hm: An atlas of secondary metabolite biosynthetic gene clusters from the human gut microbiome

Bioinformatics Oxford Journals - Fri, 17/03/2023 - 5:30am
Abstract
Summary: Microbial secondary metabolites exhibit potential medicinal value. A large number of secondary metabolite biosynthetic gene clusters (BGCs) in the human gut microbiome, which exhibit essential biological activity in microbe-microbe and microbe-host interactions, have not been adequately characterised, making it difficult to prioritise these BGCs for experimental characterization. Here, we present sBGC-hm, an atlas of secondary metabolite BGCs that allows researchers to explore the potential therapeutic benefits of these natural products. One of its key features is the ability to assist in optimizing the BGC structure by utilizing the gene co-occurrence matrix obtained from HMP data. Results are viewable online and can be downloaded as spreadsheets.
Availability and implementation: The database is openly available at https://www.wzubio.com/sbgc. The website is powered by an Apache 2 server with PHP and MariaDB.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

PATO: Genome-wide prediction of lncRNA–DNA triple helices

Bioinformatics Oxford Journals - Thu, 16/03/2023 - 5:30am
Abstract
Motivation: Long non-coding RNA (lncRNA) plays a key role in many biological processes. For instance, lncRNA regulates chromatin using different molecular mechanisms, including direct RNA–DNA hybridization via triplexes, cotranscriptional RNA–RNA interactions, and RNA–DNA binding mediated by protein complexes. While the functional annotation of lncRNA transcripts has been widely studied over the last 20 years, barely a handful of tools have been developed with the specific purpose of detecting and evaluating lncRNA–DNA triple helices. What is worse, some of these tools are nearly a decade old, making new triplex-centric pipelines depend on legacy software that cannot thoroughly process all the data made available by Next Generation Sequencing (NGS) technologies.
Results: We present PATO, a modern, fast, and efficient tool for the detection of lncRNA–DNA triplexes that matches NGS processing capabilities. PATO enables the prediction of triple helices at the genome scale and can process more than 60 GB of sequence data in as little as one hour using a two-socket server. Moreover, PATO’s efficiency allows a more exhaustive search of the triplex-forming solution space, and so PATO achieves higher levels of prediction accuracy in far less time than other state-of-the-art tools.
Availability: Source code, user manual, and tests are freely available to download under the MIT License at https://github.com/UDC-GAC/pato.
Categories: Bioinformatics Trends

vaRHC: an R package for semi-automation of variant classification in hereditary cancer genes according to ACMG/AMP and gene-specific ClinGen guidelines

Bioinformatics Oxford Journals - Tue, 14/03/2023 - 5:30am
Abstract
Motivation: Germline variant classification allows accurate genetic diagnosis and risk assessment. However, it is a tedious iterative process integrating information from several sources and types of evidence. It should follow gene-specific (if available) or general updated international guidelines. Thus, it is the main burden of the incorporation of NGS into the clinical setting.
Results: We created the vaRHC R package to assist the process of variant classification in hereditary cancer by: 1) collecting information from diverse databases; 2) assigning or denying different types of evidence according to updated ACMG/AMP gene-specific criteria for ATM, CDH1, CHEK2, MLH1, MSH2, MSH6, PMS2, PTEN, and TP53 and general criteria for other genes; 3) providing an automated classification of variants using a Bayesian metastructure and considering CanVIG-UK recommendations; 4) optionally printing the output to an .xlsx file. A validation using 659 classified variants demonstrated the robustness of vaRHC, presenting a better criteria assignment than Cancer SIGVAR, an available similar tool.
Availability: The source code can be consulted in the GitHub repository (https://github.com/emunte/vaRHC). Additionally, it will be submitted to CRAN soon.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Accurate and efficient protein sequence design through learning concise local environment of residues

Bioinformatics Oxford Journals - Tue, 14/03/2023 - 5:30am
Abstract
Motivation: Computational protein sequence design has been widely applied in rational protein engineering, and increasing the design accuracy and efficiency is highly desired.
Results: Here, we present ProDESIGN-LE, an accurate and efficient approach to protein sequence design. ProDESIGN-LE adopts a concise but informative representation of a residue’s local environment and trains a transformer to learn the correlation between the local environment of residues and their amino acid types. For a target backbone structure, ProDESIGN-LE uses the transformer to assign an appropriate residue type for each position based on its local environment within this structure, eventually acquiring a designed sequence with all residues fitting well with their local environments. We applied ProDESIGN-LE to design sequences for 68 naturally occurring and 129 hallucinated proteins within 20 seconds per protein on average. The designed proteins have their predicted structures perfectly resembling the target structures with a state-of-the-art average TM-score exceeding 0.80. We further experimentally validated ProDESIGN-LE by designing five sequences for an enzyme, chloramphenicol O-acetyltransferase type III (CAT III), and recombinantly expressing the proteins in E. coli. Of these proteins, three exhibited excellent solubility, and one yielded monomeric species with circular dichroism spectra consistent with the natural CAT III protein.
Availability and implementation: The source code of ProDESIGN-LE is available through https://github.com/bigict/ProDESIGN-LE
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Prediction and Curation of Missing Biomedical Identifier Mappings with Biomappings

Bioinformatics Oxford Journals - Tue, 14/03/2023 - 5:30am
Abstract
Motivation: Biomedical identifier resources (such as ontologies, taxonomies, and controlled vocabularies) commonly overlap in scope and contain equivalent entries under different identifiers. Maintaining mappings between these entries is crucial for interoperability and the integration of data and knowledge. However, there are substantial gaps in available mappings, motivating their semi-automated curation.
Results: Biomappings implements a curation workflow for missing mappings which combines automated prediction with human-in-the-loop curation. It supports multiple prediction approaches and provides a web-based user interface for reviewing predicted mappings for correctness, combined with automated consistency checking. Predicted and curated mappings are made available in public, version-controlled resource files on GitHub. Biomappings currently makes available 9,274 curated mappings and 40,691 predicted ones, providing previously missing mappings between widely used identifier resources covering small molecules, cell lines, diseases, and other concepts. We demonstrate the value of Biomappings on case studies involving predicting and curating missing mappings among cancer cell lines as well as small molecules tested in clinical trials. We also present how previously missing mappings curated using Biomappings were contributed back to multiple widely used community ontologies.
Availability: The data and code are available under the CC0 and MIT licenses at https://github.com/biopragmatics/biomappings.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

ExamPle: Explainable deep learning framework for the prediction of plant small secreted peptides

Bioinformatics Oxford Journals - Fri, 10/03/2023 - 5:30am
Abstract
Motivation: Plant Small Secreted Peptides (SSPs) play an important role in plant growth, development, and plant-microbe interactions. Therefore, the identification of SSPs is essential for revealing the functional mechanisms. Over the last few decades, machine learning-based methods have been developed, accelerating the discovery of SSPs to some extent. However, existing methods depend heavily on handcrafted feature engineering, which easily ignores the latent feature representations and impacts the predictive performance.
Results: Here, we propose ExamPle, a novel deep learning model using a Siamese network and multi-view representation for the explainable prediction of plant SSPs. Benchmarking comparison results show that ExamPle performs significantly better than existing methods in the prediction of plant SSPs. Also, our model shows excellent feature extraction ability by using dimension reduction tools. Importantly, by utilizing in silico mutagenesis (ISM) experiments, ExamPle can discover sequence characteristics and identify the contribution of each amino acid. The key novel principle learned by our model is that the head region of the peptide and some specific sequential patterns are strongly associated with the SSPs’ functions. Thus, ExamPle is a competitive model and tool for predicting plant SSPs and designing effective plant SSPs.
Availability: Our codes and datasets are available at https://github.com/Johnsunnn/ExamPle.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
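
The in silico mutagenesis (ISM) idea mentioned in the ExamPle abstract above can be sketched generically: substitute every position with every other amino acid and measure how much a model's score changes. The sketch below is an assumption-laden illustration, not ExamPle's code. In particular, `toy_score` is a hypothetical stand-in for a trained predictor, and averaging absolute score changes is only one of several possible ways to summarise per-position contribution.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def in_silico_mutagenesis(sequence, score_fn):
    """Return one importance value per position of `sequence`.

    Importance is the mean absolute change in `score_fn` over all
    19 single-residue substitutions at that position.
    """
    base = score_fn(sequence)
    importance = []
    for pos, original in enumerate(sequence):
        deltas = []
        for aa in AMINO_ACIDS:
            if aa == original:
                continue  # skip the wild-type residue
            mutant = sequence[:pos] + aa + sequence[pos + 1:]
            deltas.append(abs(score_fn(mutant) - base))
        importance.append(sum(deltas) / len(deltas))
    return importance


def toy_score(sequence):
    """Hypothetical model: fraction of hydrophobic residues in the first 5."""
    head = sequence[:5]
    return sum(aa in "AILMFVW" for aa in head) / len(head)


importance = in_silico_mutagenesis("MALLVKKKKK", toy_score)
for pos, value in enumerate(importance):
    print(pos, round(value, 3))
```

With this toy scorer only the first five positions receive non-zero importance, which mirrors the kind of per-residue readout ISM is meant to provide.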

StonPy: a tool to parse and query collections of SBGN maps in a graph database

Bioinformatics Oxford Journals - Fri, 10/03/2023 - 5:30am
Abstract
Summary: The Systems Biology Graphical Notation (SBGN) has become the de facto standard for the graphical representation of molecular maps. Having rapid and easy access to the content of large collections of maps is necessary to perform semantic or graph-based analysis of these resources. To this end, we propose StonPy, a new tool to store and query SBGN maps in a Neo4j graph database. StonPy notably includes a data model that takes into account all three SBGN languages and a completion module to automatically build valid SBGN maps from query results. StonPy is built as a library that can be integrated into other software and offers a command-line interface that allows users to easily perform all operations.
Availability and implementation: StonPy is implemented in Python 3 under a GPLv3 license. Its code and complete documentation are freely available from https://github.com/adrienrougny/stonpy.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

SLEMM: million-scale genomic predictions with window-based SNP weighting

Bioinformatics Oxford Journals - Fri, 10/03/2023 - 5:30am
Abstract
Motivation: The amount of genomic data is increasing exponentially. Using many genotyped and phenotyped individuals for genomic prediction is appealing yet challenging.
Results: We present SLEMM, a new software tool, to address the computational challenge. SLEMM builds on an efficient implementation of the stochastic Lanczos algorithm for REML in a framework of mixed models. We further implement SNP weighting in SLEMM to improve its predictions. Extensive analyses on seven public datasets, covering 19 polygenic traits in three plant and three livestock species, showed that SLEMM with SNP weighting had overall the best predictive ability among a variety of genomic prediction methods including GCTA’s empirical BLUP, BayesR, KAML, and LDAK’s BOLT and BayesR models. We also compared the methods using nine dairy traits of ∼300k genotyped cows. All had overall similar prediction accuracies, except that KAML failed to process the data. Additional simulation analyses on up to three million individuals and one million SNPs showed that SLEMM outperformed its counterparts in computational performance. Overall, SLEMM can do million-scale genomic predictions with an accuracy comparable to BayesR.
Availability: The software is available at https://github.com/jiang18/slemm.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

RNAget: An API to securely retrieve RNA quantifications

Bioinformatics Oxford Journals - Fri, 10/03/2023 - 5:30am
Abstract
Summary: Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health (GA4GH) project we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA-seq and microarrays. Further, it generalizes to quantification matrices of other sequence-based genomics such as ATAC-seq and ChIP-seq.
Availability and implementation: https://ga4gh-rnaseq.github.io/schema/docs/index.html
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
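
The matrix slicing described in the RNAget abstract above amounts to selecting a subset of features (rows) and samples (columns) from a quantification matrix. The following sketch shows that operation on an in-memory matrix. It does not call the RNAget API and does not reproduce its endpoints or parameters: the `slice_matrix` function, the dictionary layout, and the identifiers are all hypothetical.

```python
def slice_matrix(matrix, feature_ids=None, sample_ids=None):
    """Extract a sub-matrix by feature (row) and sample (column) IDs.

    `matrix` is a dict with keys "features", "samples" and "values",
    where values[i][j] is the quantification of feature i in sample j.
    Passing None for a filter keeps every row or column. IDs that are
    not present in the matrix are silently ignored.
    """
    row_index = {f: i for i, f in enumerate(matrix["features"])}
    col_index = {s: j for j, s in enumerate(matrix["samples"])}

    if feature_ids is None:
        rows = list(matrix["features"])
    else:
        rows = [f for f in feature_ids if f in row_index]

    if sample_ids is None:
        cols = list(matrix["samples"])
    else:
        cols = [s for s in sample_ids if s in col_index]

    values = [
        [matrix["values"][row_index[f]][col_index[s]] for s in cols]
        for f in rows
    ]
    return {"features": rows, "samples": cols, "values": values}


# Hypothetical expression matrix: 3 genes x 3 samples.
expression = {
    "features": ["geneA", "geneB", "geneC"],
    "samples": ["s1", "s2", "s3"],
    "values": [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0],
    ],
}

subset = slice_matrix(expression, feature_ids=["geneC", "geneA"], sample_ids=["s2"])
print(subset)
```

A server-side implementation would apply the same selection before sending data, so that clients download only the rows and columns they asked for.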

Coverage-preserving sparsification of overlap graphs for long-read assembly

Bioinformatics Oxford Journals - Thu, 09/03/2023 - 5:30am
Abstract
Read-overlap-based graph data structures play a central role in computing de novo genome assembly. Most long-read assemblers use Myers’s string graph model to sparsify overlap graphs. Graph sparsification improves assembly contiguity by removing spurious and redundant connections. However, a graph model must be coverage-preserving, i.e., it must ensure that there exist walks in the graph that spell all chromosomes, given sufficient sequencing coverage. This property becomes even more important for diploid genomes, polyploid genomes and metagenomes where there is a risk of losing haplotype-specific information.
We develop a novel theoretical framework under which the coverage-preserving properties of a graph model can be analysed. We first prove that de Bruijn graph and overlap graph models are guaranteed to be coverage-preserving. We next show that the standard string graph model lacks this guarantee. The latter result is consistent with the observation made in (Hui et al., 2016) that removal of contained reads, i.e., the reads that are substrings of other reads, can lead to coverage gaps during string graph construction. Our experiments using simulated long reads from the HG002 human diploid genome show that 50 coverage gaps are introduced on average by ignoring contained reads from nanopore datasets. To remedy this, we propose practical heuristics that are well supported by our theoretical results and are useful for deciding which contained reads should be retained to avoid coverage gaps. Our method retains a small fraction of contained reads (1–2%) and closes the majority of the coverage gaps.
Implementation: Source code is available through GitHub (https://github.com/at-cg/ContainX) and Zenodo with doi: 10.5281/zenodo.7687543
Categories: Bioinformatics Trends
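
The notion of a contained read used in the abstract above (a read that is a substring of another read) can be made concrete with a small sketch. This is not the ContainX method and it ignores sequencing errors and reverse complements. It is a naive quadratic check on exact strings, with hypothetical read names and sequences.

```python
def find_contained_reads(reads):
    """Return the names of reads that are exact substrings of another read.

    `reads` maps read name -> sequence. For identical duplicate reads,
    only the copy with the smallest name is kept as non-contained.
    """
    contained = set()
    for name, seq in reads.items():
        for other_name, other_seq in reads.items():
            if name == other_name:
                continue
            if len(seq) < len(other_seq) and seq in other_seq:
                contained.add(name)
                break
            if seq == other_seq and other_name < name:
                contained.add(name)  # duplicate of a read we keep
                break
    return contained


# Hypothetical reads: r2 lies entirely inside r1, r3 only overlaps r1.
reads = {
    "r1": "ACGTACGTTG",
    "r2": "GTACG",
    "r3": "TTGCA",
}
contained = find_contained_reads(reads)
print(sorted(contained))
```

A string graph construction that discards every read in `contained` is where the coverage gaps discussed in the abstract can arise, for example when a contained read is the only one sampled from one haplotype in that region.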

TopEnzyme: A framework and database for structural coverage of the functional enzyme space

Bioinformatics Oxford Journals - Wed, 08/03/2023 - 5:30am
Abstract
Motivation: TopEnzyme is a database of structural enzyme models created with TopModel and is linked to the SWISS-MODEL repository and AlphaFold Protein Structure Database to provide an overview of structural coverage of the functional enzyme space for over 200,000 enzyme models. It allows the user to quickly obtain representative structural models for 60% of all known enzyme functions.
Results: We assessed the models with TopScore and contributed 9039 good-quality and 1297 high-quality structures. Furthermore, we compared these models to AlphaFold2 models with TopScore and found that the TopScore differs only by 0.04 on average in favor of AlphaFold2. We tested TopModel and AlphaFold2 for targets not seen in the respective training databases and found that both methods create qualitatively similar structures. When no experimental structures are available, this database will facilitate quick access to structural models across the currently most extensive structural coverage of the functional enzyme space within Swiss-Prot.
Availability: We provide a full web interface to the database at https://cpclab.uni-duesseldorf.de/topenzyme/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

CFAGO: cross-fusion of network and attributes based on attention mechanism for protein function prediction

Bioinformatics Oxford Journals - Wed, 08/03/2023 - 5:30am
Abstract
Motivation: Protein function annotation is fundamental to understanding biological mechanisms. The abundant genome-scale protein–protein interaction (PPI) networks, together with other protein biological attributes, provide rich information for annotating protein functions. As PPI networks and biological attributes describe protein functions from different perspectives, it is highly challenging to cross-fuse them for protein function prediction. Recently, several methods have combined PPI networks and protein attributes via graph neural networks (GNNs). However, GNNs may inherit or even magnify the bias caused by noisy edges in PPI networks. Besides, GNNs with many stacked layers may cause over-smoothing of node representations.
Results: We develop a novel protein function prediction method, CFAGO, to integrate single-species PPI networks and protein biological attributes via a multi-head attention mechanism. CFAGO is first pre-trained with an encoder-decoder architecture to capture the universal protein representation of the two sources. It is then fine-tuned to learn more effective protein representations for protein function prediction. Benchmark experiments on human and mouse datasets show that CFAGO outperforms state-of-the-art single-species network-based methods by at least 7.59%, 6.90%, and 11.68% in terms of m-AUPR, M-AUPR and Fmax, respectively, demonstrating that cross-fusion by a multi-head attention mechanism can greatly improve protein function prediction. We further evaluate the quality of the captured protein representations in terms of the Davies Bouldin Score, whose results show that the protein representations cross-fused by the multi-head attention mechanism are at least 2.7% better than the original and concatenated representations. We believe CFAGO is an effective tool for protein function prediction.
Availability: The source code of CFAGO and experiments data are available at: http://bliulab.net/CFAGO/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

NGenomeSyn: an easy-to-use and flexible tool for publication-ready visualization of syntenic relationships across multiple genomes

Bioinformatics Oxford Journals - Wed, 08/03/2023 - 5:30am
Abstract
Summary: Large-scale comparative genomic studies have provided important insights into species evolution and diversity, but they are also very challenging to visualize. Quickly capturing or presenting key information hidden in the vast amount of genomic data and relationships among multiple genomes requires an efficient visualization tool. However, current tools for such visualization remain inflexible in layout and/or require advanced computation skills, especially for visualization of genome-based synteny. Here, we developed an easy-to-use and flexible layout tool, NGenomeSyn (multiple (N) Genome Synteny), for highly customizable, publication-ready visualization of syntenic relationships of the whole genome or local region and genomic features (e.g. repeats, structural variations, genes) across multiple genomes. NGenomeSyn provides an easy way for its users to visualize a large amount of data with a rich layout by simply adjusting options for moving, scaling, and rotation of target genomes. Moreover, NGenomeSyn could be applied to the visualization of relationships in non-genomic data with similar input formats.
Availability and implementation: NGenomeSyn is freely available at GitHub (https://github.com/hewm2008/NGenomeSyn) and Zenodo (https://doi.org/10.5281/zenodo.7645148).
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Genomepy: genes and genomes at your fingertips

Bioinformatics Oxford Journals - Mon, 06/03/2023 - 5:30am
Abstract
Motivation: Analyzing a functional genomics experiment, such as ATAC-, ChIP- or RNA-sequencing, requires genomic resources such as a reference genome assembly and gene annotation. These data can generally be retrieved from different organizations and in different versions. Most bioinformatic workflows require the user to supply this genomic data manually, which can be a tedious and error-prone process.
Results: Here we present genomepy, which can search, download, and preprocess the right genomic data for your analysis. Genomepy can search genomic data on NCBI, Ensembl, UCSC and GENCODE, and inspect available gene annotations to enable an informed decision. The selected genome and gene annotation can be downloaded and preprocessed with sensible, yet controllable, defaults. Additional supporting data can be automatically generated or downloaded, such as aligner indexes, genome metadata and blacklists.
Availability: Genomepy is freely available at https://github.com/vanheeringen-lab/genomepy under the MIT license and can be installed through pip or Bioconda.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

PhaGAA: an integrated web server platform for phage genome annotation and analysis

Bioinformatics Oxford Journals - Mon, 06/03/2023 - 5:30am
Abstract
Motivation: Phage genome annotation plays a key role in the design of phage therapy. To date, there have been various genome annotation tools for phages, but most of these tools focus on mono-functional annotation and have complex operational processes. Accordingly, comprehensive and user-friendly platforms for phage genome annotation are needed.
Results: Here, we propose PhaGAA, an online integrated platform for phage genome annotation and analysis. By incorporating several annotation tools, PhaGAA is constructed to annotate prophage genomes at the DNA and protein levels and provide the analytical results. Furthermore, PhaGAA can mine and annotate phage genomes from bacterial genomes or metagenomes. In summary, PhaGAA will be a useful resource for experimental biologists and will help advance phage synthetic biology in basic and applied research.
Availability: PhaGAA is freely available at http://phage.xialab.info/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

NDEx IQuery: a multi-method network gene set analysis leveraging the Network Data Exchange

Bioinformatics Oxford Journals - Mon, 06/03/2023 - 5:30am
Abstract
Motivation: The investigation of sets of genes using biological pathways is a common task for researchers and is supported by a wide variety of software tools. This type of analysis generates hypotheses about the biological processes active or modulated in a specific experimental context.
Results: The NDEx Integrated Query (IQuery) is a new tool for network and pathway-based gene set interpretation that complements or extends existing resources. It combines novel sources of pathways, integration with Cytoscape, and the ability to store and share analysis results. The NDEx IQuery web application performs multiple gene set analyses based on diverse pathways and networks stored in Network Data Exchange (NDEx). These include curated pathways from Wikipathways and SIGNOR, published pathway figures from the last 27 years, machine-assembled networks using the INDRA system, and the new NCI-PID v2.0, an updated version of the popular NCI Pathway Interaction Database. NDEx IQuery’s integration with MSigDB and cBioPortal now provides pathway analysis in the context of these two resources.
Availability: NDEx IQuery is available at https://www.ndexbio.org/iquery and is implemented in Javascript and Java.
Categories: Bioinformatics Trends

Targeting Tumor Heterogeneity: Multiplex-Detection-Based Multiple Instance Learning for Whole Slide Image Classification

Bioinformatics Oxford Journals - Thu, 02/03/2023 - 5:30am
Abstract
Motivation: Multiple instance learning (MIL) is a powerful technique to classify whole slide images (WSIs) for diagnostic pathology. The key challenge of MIL on WSI classification is to discover the critical instances that trigger the bag label. However, tumor heterogeneity significantly hinders the algorithm’s performance.
Results: Here, we propose a novel multiplex-detection-based multiple instance learning (MDMIL) method which targets tumor heterogeneity through a multiplex detection strategy and feature constraints among samples. Specifically, the internal query (IQ) generated after the probability distribution analysis and the variational query (VQ) optimized throughout the training process are utilized to detect potential instances in the form of internal and external assistance, respectively. The multiplex detection strategy significantly improves the instance-mining capacity of the deep neural network. Meanwhile, a memory-based contrastive loss is proposed to reach consistency on various phenotypes in the feature space. The novel network and loss function jointly achieve high robustness towards tumor heterogeneity. We conduct experiments on three computational pathology datasets: CAMELYON16, TCGA-NSCLC, and TCGA-RCC. Benchmarking experiments on the three datasets illustrate that our proposed MDMIL approach achieves superior performance over several existing state-of-the-art methods.
Availability: MDMIL is available for academic purposes at https://github.com/ZacharyWang-007/MDMIL.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends

Improving annotation propagation on molecular networks through random walks: Introducing ChemWalker

Bioinformatics Oxford Journals - Thu, 02/03/2023 - 5:30am
Abstract
Motivation: Annotation of the mass signals is still the biggest bottleneck for the untargeted mass spectrometry analysis of complex mixtures. Molecular networks are being increasingly adopted by the mass spectrometry community as a tool to annotate large-scale experiments. We have previously shown that the process of propagating annotations from spectral library matches on molecular networks can be automated using Network Annotation Propagation (NAP). One of the limitations of NAP is that the information for the spectral matches is only propagated locally, to the first neighbor of a spectral match. Here we show that annotation propagation can be expanded to nodes not directly connected to spectral matches using random walks on graphs, introducing the ChemWalker Python library.
Results: Similarly to NAP, ChemWalker relies on combinatorial in silico fragmentation results, performed by MetFrag, searching biologically relevant databases. Starting from the combination of a spectral network and the structural similarity among candidate structures, we have used the MetFusion scoring function to create a weight function, producing a weighted graph. This graph was subsequently used by the random walk to calculate the probability of ’walking’ through a set of candidates, starting from seed nodes (represented by spectral library matches). This approach allowed information to propagate to nodes not directly connected to the spectral library match. Compared to NAP, ChemWalker offers a series of improvements in running time, scalability and maintainability, and is available as a standalone Python package.
Availability: ChemWalker is freely available at https://github.com/computational-chemical-biology/ChemWalker
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
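
The propagation step described in the ChemWalker abstract above (a random walk on a weighted graph, starting from seed nodes) can be illustrated with a generic random walk with restart. This sketch is not ChemWalker's implementation: the restart formulation, the `restart` value, the node names, and the edge weights are assumptions, and the MetFusion-based weight function is not reproduced.

```python
def random_walk_with_restart(graph, seeds, restart=0.3, n_iter=100):
    """Estimate steady-state visiting probabilities by power iteration.

    `graph` maps node -> {neighbour: weight}; every neighbour must also
    be a key of `graph`. At each step the walker jumps back to a seed
    with probability `restart`, otherwise it follows an outgoing edge
    with probability proportional to its weight.
    """
    nodes = sorted(graph)
    seed_vec = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(seed_vec)
    for _ in range(n_iter):
        nxt = {n: restart * seed_vec[n] for n in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            if total == 0:
                # Node without outgoing weight: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * prob[u] * seed_vec[n]
                continue
            for v, w in graph[u].items():
                nxt[v] += (1 - restart) * prob[u] * w / total
        prob = nxt
    return prob


# Hypothetical network: "seed" is a library match; "b" is two hops away.
graph = {
    "seed": {"a": 1.0},
    "a": {"seed": 1.0, "b": 0.5},
    "b": {"a": 0.5},
}
prob = random_walk_with_restart(graph, seeds={"seed"})
for node in sorted(prob, key=prob.get, reverse=True):
    print(node, round(prob[node], 3))
```

Node "b" is not adjacent to the seed yet still receives a non-zero probability, which is the property that lets annotations reach beyond first neighbours.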

AGC: Compact representation of assembled genomes with fast queries and updates

Bioinformatics Oxford Journals - Thu, 02/03/2023 - 5:30am
Abstract
Motivation: High-quality sequence assembly is the ultimate representation of complete genetic information of an individual. Several ongoing pangenome projects are producing collections of high-quality assemblies of various species. Each project has already generated assemblies of hundreds of gigabytes on disk, greatly impeding the distribution of and access to such rich datasets.
Results: Here we show how to reduce the size of the sequenced genomes by 2 to 3 orders of magnitude. Our tool compresses the genomes significantly better than the existing programs and is much faster. Moreover, its unique feature is the ability to access any contig (or its part) in a fraction of a second and easily append new samples to the compressed collections. Thanks to this, AGC could be useful not only for backup or transfer purposes, but also for routine analysis of pangenome sequences in common pipelines. With the rapidly reduced cost and improved accuracy of sequencing technologies, we anticipate more comprehensive pangenome projects with much larger sample sizes. AGC is likely to become a foundation tool to store, distribute and access pangenome data.
Availability: The source code of AGC is available at https://github.com/refresh-bio/agc. The package can be installed via Bioconda at https://anaconda.org/bioconda/agc.
Supplementary information: Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics Trends
