...
Tiffanie M Nelson and Jeffrey H Christiansen
Contents
1. Executive Summary
...
Community input is welcomed at all times, as is the nomination of additional members of the SIG, either by adding comments directly to this Google document or by emailing communities@biocommons.org.au.
Feedback on the proposed components outlined in this initial draft plan is now sought from the SIG and any other Australian researchers or their collaborators undertaking metagenomics and microbiome analyses.
...
Figure 2: Estimates of the increasing number of microbiome analysis studies conducted in Australia. To estimate the number of microbiome analysis studies that have been conducted historically in Australia, a search of the Scopus database was conducted for articles with: either A/ ‘shotgun’ or ‘metagenomic’ in the title, abstract, or keyword and ‘Australia’ in the affiliation; or B/ ‘amplicon’ or ‘microbiome’ or ‘microbiota’ or ‘microbial community’ or ‘virome’ in the title, abstract or keyword and genome or sequencing or sequence or genomic or next-generation in the title, abstract or keyword and ‘Australia’ in the affiliation. Articles retrieved from the search were manually reviewed to include only those whose focus included the production of data using a marker gene or metagenomic sequencing method, and to exclude those whose focus was on developing or evaluating analysis methods or tools. Articles retrieved by more than one search were counted once and categorised as either marker gene or shotgun. The complete list of citations including abstracts can be found here.
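The search logic described in the Figure 2 caption can be expressed as a boolean predicate. The following is a minimal sketch, assuming each article record is a dictionary with hypothetical `text` (title, abstract and keywords combined) and `affiliation` fields. It uses simple case-insensitive substring matching and is not the Scopus query syntax used for the report; the manual review step is not modelled.

```python
# Hypothetical sketch of the two search arms (A and B) from the Figure 2 caption.
# The field names "text" and "affiliation" are assumptions made for illustration.

ARM_A_TERMS = ("shotgun", "metagenomic")
ARM_B_TOPIC_TERMS = ("amplicon", "microbiome", "microbiota", "microbial community", "virome")
ARM_B_METHOD_TERMS = ("genome", "sequencing", "sequence", "genomic", "next-generation")


def contains_any(text, terms):
    """Return True if any term occurs in text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def matches_search(record):
    """Apply arm A or arm B, both restricted to an Australian affiliation."""
    if "australia" not in record["affiliation"].lower():
        return False
    text = record["text"]
    arm_a = contains_any(text, ARM_A_TERMS)
    arm_b = contains_any(text, ARM_B_TOPIC_TERMS) and contains_any(text, ARM_B_METHOD_TERMS)
    return arm_a or arm_b
```

Arm B requires both a topic term and a method term, which is why a record that mentions only ‘microbiota’ would not be retrieved.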
In late July 2020, the Australian BioCommons invited over 100 researchers across Australia to participate in a Microbiome Analysis Special Interest Group (SIG). These researchers were identified as having experience in, or interest in, microbiome analysis. The Australian BioCommons sought information from the SIG about each member’s level of expertise, current (and desired) practices and infrastructure used via an on-line survey (number of respondents = 33), and also held an open video conference follow-up to gain further information (minutes and a recording of the meeting are available).
Respondents to the survey and attendees at the meeting collectively indicated they are performing microbiome analyses on samples from both environmental (i.e. marine, freshwater, soil, and air) and host-associated (e.g. animals, plants, corals and humans) habitats. The collective responses also indicated that all of the following approaches are being undertaken by Australian researchers: targeted amplicon sequencing, random shotgun sequencing, taxonomic profiling, functional profiling, generating metagenome-assembled genomes (MAGs), phylogenetic analysis, statistical analyses, and novel gene discovery.
...
Based on information received from the SIG members through the survey (n=33), most researchers use a combination of sequencing platforms to generate their data with the most popular being Illumina, Nanopore, and PacBio.
...
For functional classification, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (which provides information relating to the functional classification of cells and organisms) is accessed by 72% of survey respondents.
3.3.2 Tools
Based on the survey, approximately 100 software tools, pipelines, or packages were identified as being used by respondents for various stages of the microbiome analysis process. These are listed in Appendix 1 of this document.
The data generated in either amplicon marker gene or shotgun metagenomic surveys present a wide variety of possible analysis pathways/workflows to pursue and there are many options for tools/pipelines or processes at each step of a chosen bioinformatic pathway.
...
Several researchers (36%, n=12) reported that they were not using their preferred tools/pipelines (primarily due to not having access to sufficient computational memory to run these tools - see Section 3.4.1) and instead had resorted to a workaround solution with other tools.
...
Figure 4. Schematic diagram showing the proposed infrastructure to support microbiome analyses, and data flow. (D1) Sequence reads or other relevant data are inputs into the Platform for Taxonomic and Functional Microbiome Analyses which provides a command-line interface (CLI)- or graphical user interface (GUI)-based access to tools and workflows for performing amplicon marker gene clustering or metagenome assembly and classification (blue shapes). It is underpinned by sufficient and appropriate computational infrastructure. Closely associated is a data management platform (denoted by the darker green shape) that caters to data management, version control, and association of appropriate (e.g. sample, experimental) metadata with the data files. Outputs of D1 are accessible to both (D2) hosted frameworks to enable researchers to utilise common packages for statistical analysis, visualisation, and exploration of microbiome datasets, and (D3) systems to enable submission/publishing of metagenome-assembled genome files (and sequence read data) to international repositories. Arrows indicate the general flow of data. Thicker arrows indicate increasing data transfer capabilities. See Appendix 1 for a list of tools/pipelines that may be included in D1. Higher resolution image.
D1 - A platform for performing taxonomic and functional analyses of microbiomes:
To address objective 1 (i.e. providing Australian researchers with access to a selection of tools and workflows underpinned by computational resources that allow taxonomic and functional analyses of microbiomes (whether they be derived from amplicon/targeted or shotgun/metagenomics based sequencing approaches) to be performed), it is proposed to implement a platform in Australia, that:
...
D2. Systems to enable statistical analyses and visualisations of microbial community data:
To address objective 2 (i.e. to make it easier for Australian researchers to perform statistical and visualisation analyses of microbiome data), it is proposed to implement:
...
D3 - Systems to enable submission of raw sequencing reads and metagenome-assembled genome files from Australia to appropriate global repositories:
To address objective 3 (i.e. to make it easier to publish and share high-quality final metagenome-assembled genomes (and relevant input data) in accordance with best-practice open science guidelines), it is proposed to implement:
A temporary ‘staging post’ in Australia for metagenome and microbiome (and sequence read) files ready for public international release. The system should include data/metadata formatting checks (which would be enabled by the use of the data management platforms described in D1-E), and support as detailed in D1-F;
Rapid data transfer from the data management platform or the sharing platform to NCBI and/or ENA; and,
Documentation on how to use the system (including a knowledgebase with community-contributed content).
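As an illustration of the data/metadata formatting checks mentioned for the staging post, the following minimal sketch validates a single record before it would be released. The required field names and accepted file suffixes are assumptions made for illustration only; they are not the ENA or NCBI submission checklists, which would need to be consulted for real submissions.

```python
# Hypothetical pre-submission check for one staged record.
# REQUIRED_FIELDS and ALLOWED_SUFFIXES are illustrative assumptions only.

REQUIRED_FIELDS = ("sample_name", "collection_date", "geographic_location", "file_name")
ALLOWED_SUFFIXES = (".fastq.gz", ".fq.gz", ".fasta", ".fa")


def check_record(record):
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing value for '{field}'")
    file_name = str(record.get("file_name") or "")
    if file_name and not file_name.lower().endswith(ALLOWED_SUFFIXES):
        problems.append(f"unexpected file type: '{file_name}'")
    return problems
```

Returning a list of problems, rather than stopping at the first one, would let a researcher correct all issues with a record in one pass before transfer.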
...
Component | Planned dates for delivery | Notes |
D1-Aa. Key tools/workflows installed as modules and optimised for CLI access across a variety of Tier 1 and Tier 2 HPC infrastructures. | Ongoing | As of November 2020, 6 of the tools listed in Appendix 1 (graftm, groopm, metacv, QIIME, QIIME2.0, SortMeRNA) are installed as modules on QRIScloud/UQ-RCC HPC machines (Tinaroo, Awoonga, FlashLite). Installation of further tools as modules across NCI, Pawsey, and QRIScloud/UQ-RCC infrastructures to support microbiome analysis is being undertaken in the BioCommons. Preliminary discussions have been held with the MGnify group at EBI about installing and hosting a MGnify instance (which offers specialised workflows for three different data types: amplicon, raw metagenomic/metatranscriptomic reads, and assembly) on Australian BioCommons associated infrastructure, and with the Marine Metagenomics group from ELIXIR-Norway about the local installation of the Meta-Pipe workflow (for pre-processing, assembly, taxonomic classification and functional analysis of marine metagenomics data). |
D1-Aa. CLI platform appropriately resourced for performing microbiome analyses | Ongoing | BioCommons partner infrastructures at NCI, Pawsey, and QCIF include machines that are capable of performing any part of microbiome analysis. This includes FlashLite at QCIF/UQ, which can be structured to allow ‘supernodes’ of up to 8TB. Mechanisms for increasing access to partner HPC systems, other than through the National Computational Merit Allocation Scheme (NCMAS) or partner shares, are under active exploration by the BioCommons. |
D1-Ab. Key tools/workflows installed as modules and optimised on Galaxy Australia. | Ongoing | As of November 2020, 4 of the tools listed in Appendix 1 (maxbin2, metaSPAdes, mothur, SortMeRNA) are installed on Galaxy Australia. Installation of further tools on Galaxy Australia can be requested by any member of the community at any time. |
D1-Ab. Galaxy Australia appropriately resourced for performing microbiome analyses | Q1 2021 | In addition to the 465 cores at QCIF, UMelb, and Pawsey that currently underpin Galaxy Australia, the Australian BioCommons has secured ARDC funding to purchase an additional minimum of 1x 4TB and 3x 2TB high memory nodes to contribute computational resources to Galaxy Australia. These nodes will be reserved for specific tools requiring high memory, such as those required for MAG assembly. |
D1-Ac. Key tools available as high quality trusted software containers for self-deployment on institutional or independent computational infrastructures | Ongoing | Development of containerised tools to support various life science researcher communities in Australia (including microbiome analysis) is being undertaken in the BioCommons. |
D1-B. Connectable to Nationally available storage (e.g. Cloudstor) | Ongoing | In late 2020, a direct connection between . Streamlined connectivity of Cloudstor storage to Pawsey, QCIF, NCI, and other computational resources will continue in the BioCommons. |
D1-C/D2-B. Appropriate user authorisation and sharing mechanisms | Ongoing | AAF is currently engaged by the BioCommons to explore Access and Authentication Frameworks that will be fit for purpose across all envisaged BioCommons-related platforms and services. |
D1-G. Tool and software workflow documentation with community contributed content. | Ongoing | Tool and workflow documentation for other researcher communities (e.g. de novo genome assembly, and genome annotation) is being organised via an Australian BioCommons GitHub: https://github.com/australianbiocommons. This avenue is available for the microbiome analysis community. |
D1-H. Training re. containerisation of software tools. | Ongoing | Introductory level training around software containerisation (co-organised by BioCommons and Pawsey) occurred in June/July 2020 and will be repeated throughout 2021, 2022, and 2023. See https://www.biocommons.org.au/events/containers-intro and the Australian for recordings of these events. |
...
Component | Notes |
D1-D. A data management system that is tightly linked to the Microbiome Platforms | Considerations for what may be the best technical solution are ongoing. See Requirements of a Data Management Component of the Australian |
D1-H Training re. taxonomic and functional bioinformatics of shotgun and targeted sequencing projects | Discussions with EBI to potentially deliver microbiome analysis related bioinformatics training events to an Australian audience during 2021 or 2022 have begun. |
D2-A. Hosted frameworks to enable researchers to utilise common packages for statistical analysis, visualisation, and exploration of microbiome datasets | ‘Interactive environments’ offered through the Galaxy platform include RStudio, JupyterLab, CloudStor SWAN, and Phinch. These are currently available publicly through the European public Galaxy instance (see https://live.usegalaxy.eu/), and are planned for release via Galaxy Australia in Q1 2021. Galaxy Interactive environments may represent an option for this feature. |
D3-A and D3-B. A temporary ‘staging post’ in Australia for metagenome and microbiome (and sequence read) files ready for public international release, with a rapid data transfer from the data management platform or the sharing platform to NCBI and/or ENA | COPO is a GUI-based metadata platform for brokering life science data submissions to various repositories including the ENA (see https://f1000research.com/articles/9-495). It is being adopted by the Darwin Tree of Life project in the UK as the tool to enable the data and metadata submission to ENA to be completed for genome assemblies of over 60,000 species native to the British Isles. The Australian BioCommons is currently exploring whether a locally supported COPO instance can fulfil the requirements of D3-A/D3-B. |
...
Workflow Step | High-level component | Tool | Brief description | Link to data/software or article |
1 | Quality Control | FastQC | Provides a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. | |
2 | Preprocessing | BLAST+ | A suite of command-line tools to run BLAST, which searches for nucleotide sequence similarities. | |
2 | Preprocessing | ChimeraSlayer | A chimeric sequence detection utility, compatible with near-full length Sanger sequences and shorter 454-FLX sequences (~500 bp). | |
2 | Preprocessing | fastp | Tool designed to provide fast all-in-one preprocessing for FastQ files. | |
2 | Preprocessing | FASTX-Toolkit | A collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing. | |
2 | Preprocessing | FLASH - Fast Length Adjustment of SHort reads | A very fast and accurate software tool to merge paired-end reads from next-generation sequencing experiments. | |
2 | Preprocessing | MultiQC | A reporting tool that parses summary statistics from results and log files generated by other bioinformatics tools. | |
2 | Preprocessing | PANDAseq | A program to align Illumina reads, optionally with PCR primers embedded in the sequence, and reconstruct an overlapping sequence. | |
2 | Preprocessing | PEAR - Paired-End reAd mergeR | A fast and accurate Illumina Paired-End reAd mergeR. | |
2 | Preprocessing | Prinseq | Easy and rapid quality control and data preprocessing of genomic and metagenomic datasets. | |
2 | Preprocessing | Prinseq++ | A program to filter, reformat or trim genomic and metagenomic sequence data. | |
2 | Preprocessing | SortMeRNA | A program for filtering, mapping, and OTU-picking NGS reads in metatranscriptomic and metagenomic data. | |
2 | Preprocessing | Tagcleaner | A tool to automatically detect and efficiently remove tag sequences. | |
2 | Preprocessing | Trimmomatic | A flexible read trimming tool for Illumina NGS data. | |
2 | Preprocessing | UCHIME/ UCHIME2 | Chimera detection tool. | |
2 | Preprocessing | VSEARCH | Processes and prepares metagenomics, genomics, and population genomics nucleotide sequence data. | |
3 | OTU/ASV picking clustering | UPARSE | A method for generating clusters (OTUs) from next-generation sequencing reads | |
3 | OTU/ASV picking clustering | USEARCH | A unique sequence analysis tool with thousands of users worldwide. | |
4 | Taxonomic classification | Centrifuge | A very rapid and memory-efficient system for the classification of DNA sequences from microbial samples. | |
4 | Taxonomic classification | Focus | An agile composition based approach using non-negative least squares (NNLS) to report the organisms present in metagenomic samples and profile their abundances. | |
4 | Taxonomic classification | Gist | A statistical classifier for taxonomic inference for mRNA reads | |
4 | Taxonomic classification | graftm | A tool to identify and classify marker genes in short read datasets. | |
4 | Taxonomic classification | GTDB-TK | A computationally efficient toolkit able to classify thousands of draft genomes in parallel. | |
4 | Taxonomic classification | Kraken/ KRAKEN2 | A taxonomic classification system using exact k-mer matches to achieve high accuracy and fast classification speeds. | |
4 | Taxonomic classification | MetaCV | A composition and phylogeny-based algorithm to classify very short metagenomic reads (75-100 bp) into specific taxonomic and functional groups. | |
4 | Taxonomic classification | MetaPhyler | A novel taxonomic classifier for metagenomic shotgun reads, which uses phylogenetic marker genes as a taxonomic reference. | |
4 | Taxonomic classification | PhymmBL | a new classification approach for metagenomics data which uses interpolated Markov models (IMMs) to taxonomically classify DNA sequences, c | |
5 | Sequence assembly | AMOS/ MetAMOS | An open-source, modular assembly pipeline built upon AMOS and tailored specifically for metagenomic next-generation sequencing data | https://genomebiology.biomedcentral.com/articles/10.1186/gb-2011-12-s1-p25 |
5 | Sequence assembly | BinSanity | A suite of scripts designed to cluster contigs generated from metagenomic assembly into putative genomes. | |
5 | Sequence assembly | Flye | A de novo assembler for single molecule sequencing reads, such as those produced by PacBio and Oxford Nanopore Technologies. | |
5 | Sequence assembly | GATB-minia- pipeline | A de novo assembly pipeline for Illumina data. | |
5 | Sequence assembly | groopm | A metagenomics binning suite. | |
5 | Sequence assembly | IDBA-UD | Designed to utilize paired-end reads to assemble low-depth regions and use progressive depth on contigs to reduce errors in high-depth regions. | |
5 | Sequence assembly | MaxBin/ MaxBin2 | Software for binning assembled metagenomic sequences. | https://toolshed.g2.bx.psu.edu/view/mbernt/maxbin2/cfd50144a871 |
5 | Sequence assembly | MEGAHIT | An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. | |
5 | Sequence assembly | Meta-IDBA | Meta-IDBA algorithm for assembling reads in metagenomic data, which contain multiple genomes from different species | |
5 | Sequence assembly | MetaBAT2 | Clusters metagenomic contigs into different "bins", each of which should correspond to a putative genome. | |
5 | Sequence assembly | MetaCluster | Unsupervised binning method for metagenomic sequences. | |
5 | Sequence assembly | metaSPAdes | A versatile metagenomic assembler | |
5 | Sequence assembly | MetaVelvet | An extension of Velvet assembler to de novo metagenome assembly from short sequence reads | http://metavelvet.dna.bio.keio.ac.jp/ https://pubmed.ncbi.nlm.nih.gov/22821567/ |
5 | Sequence assembly | MIRA | DNA sequence data assembler/mapper for whole genome and EST/RNASeq projects. | http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect_intro_whatismira |
5 | Sequence assembly | S-GSOM | Binning sequences using very sparse labels within a metagenome. | |
5 | Sequence assembly | SOAPdenovo2 | A novel short-read assembly method that can build a de novo draft assembly for the human-sized genomes. | |
5 | Sequence assembly | SPADES - St. Petersburg genome assembler | An assembly toolkit containing various assembly pipelines. | |
5 | Sequence assembly | Unicycler | An assembly pipeline for bacterial genomes. | |
5 | Sequence assembly | Velvet | A de novo genome assembler specially designed for short read sequencing technologies, such as Solexa or 454. | |
6 | Gene prediction and alignment | AMR++ | A bioinformatics pipeline that interfaces with MEGARes to identify and quantify AMR gene accessions contained within a metagenomic sequence dataset. | |
6 | Gene prediction and alignment | BBMap | Splice-aware global aligner for DNA and RNA sequencing reads. It can align reads from all major platforms. | https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbmap-guide/ |
6 | Gene prediction and alignment | BLAT | Accurate and 500 times faster than popular existing tools for mRNA/DNA alignments. | |
6 | Gene prediction and alignment | BMGE - Block Mapping and Gathering with Entropy | Designed to select regions in a multiple sequence alignment that are suited for phylogenetic inference. | https://bmcevolbiol.biomedcentral.com/articles/10.1186/1471-2 |
6 | Gene prediction and alignment | Bowtie/ Bowtie2 | An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. | http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#getting-started-with-bowtie-2-lambda-phage-example |
6 | Gene prediction and alignment | BWA | A software package for mapping low-divergent sequences against a large reference genome, such as the human genome. | |
6 | Gene prediction and alignment | CD-HIT | A very widely used program for clustering and comparing protein or nucleotide sequences. | |
6 | Gene prediction and alignment | DIAMOND | A sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data. | |
6 | Gene prediction and alignment | GlimmerMG | A system for finding genes in environmental shotgun DNA sequences. | |
6 | Gene prediction and alignment | HMMER | Biosequence analysis using profile hidden Markov models. | |
6 | Gene prediction and alignment | Infernal - INFERence of RNA ALignment | A useful tool for identifying RNAs in metagenomics data sets. | |
6 | Gene prediction and alignment | IQ-TREE | Phylogenetic tree inference by maximum likelihood. | |
6 | Gene prediction and alignment | MAFFT - Multiple Alignment with Fast Fourier Transform | A multiple sequence alignment program. | http://evomics.org/resources/software/bioinformatics-software/mafft/ |
6 | Gene prediction and alignment | mauve | A system for constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. | |
6 | Gene prediction and alignment | MetaGene Annotator | A gene-finding program for prokaryote and phage. | |
6 | Gene prediction and alignment | MetaGeneMark | Novel genomic sequences can be analyzed either by the self-training program GeneMarkS (sequences longer than 50 kb) or by GeneMark.hmm. | |
6 | Gene prediction and alignment | Minimap2 | A general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. | https://github.com/lh3/minimap2 https://academic.oup.com/bioinformatics/article/34/18/3094/4994778 |
6 | Gene prediction and alignment | MinPath/ MinPath2 | MinPath (Minimal set of Pathways) is for biological pathway reconstruction using protein family predictions. | https://omics.informatics.indiana.edu/MinPath/ http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000465 |
6 | Gene prediction and alignment | NAST-iEr | Aligns a single raw nucleotide sequence against one or more NAST formatted sequences. | http://microbiomeutil.sourceforge.net/#A_NASTiEr |
6 | Gene prediction and alignment | PhyloSift | A suite of software tools to conduct phylogenetic analysis of genomes and metagenomes. | |
6 | Gene prediction and alignment | PSORTm / PSORTb | For protein subcellular localization prediction (SCL). | |
6 | Gene prediction and alignment | pyani | A Python package and standalone program for calculation of whole-genome similarity measures. | |
6 | Gene prediction and alignment | TETRA | A web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. | |
6 | Gene prediction and alignment | tRNAscan-SE | The de facto tool for predicting tRNA genes in whole genomes. | |
7 | Annotation prediction | BlastKOALA/ GhostKOALA | Automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways. | https://www.sciencedirect.com/science/article/pii/S002228361500649X |
7 | Annotation prediction | dbCAN | A web server for automated Carbohydrate-active enzyme ANnotation. | |
7 | Annotation prediction | eggNOG- mapper | A tool for fast functional annotation of novel sequences. | |
7 | Annotation prediction | KAAS - KEGG Automatic Annotation Server | Provides functional annotation of genes by BLAST or GHOST comparisons against the manually curated KEGG GENES database. | |
7 | Annotation prediction | KofamKOALA | A web server to assign KEGG Orthologs (KOs) to protein sequences by homology search. | https://www.genome.jp/tools/kofamkoala/ https://academic.oup.com/bioinformatics/article/36/7/2251/5631907 |
7 | Annotation prediction | PICRUSt/ PICRUSt2 | A method to predict approximate functional potential of a community based on marker gene sequencing profiles. | |
7 | Annotation prediction | PROKKA | Annotation tool for bacterial, archaeal, and viral genomes. | |
7 | Annotation prediction | SUPER-FOCUS | A tool for metagenomic functional analysis that uses the SEED database. | |
7 | Annotation prediction | Tax4Fun2 | An R-based tool for the rapid prediction of habitat-specific functional profiles and functional redundancy based on 16S rRNA marker gene sequences. | |
8 | Assembly Validation | CheckM | A set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes. | |
8 | Assembly Validation | CheckV | For assessing the quality of metagenome-assembled viral genomes. | |
8 | Assembly Validation | CompareM | A software toolkit which supports performing large-scale comparative genomic analyses. It provides statistics across sets of genomes (e.g., amino acid identity) and for individual genomes. | |
8 | Assembly Validation | Valet | Evaluating metagenomic assemblies. | |
9 | Statistical analysis and visualisation | DADA2 | Fast and accurate sample inference from amplicon data with single-nucleotide resolution. | |
9 | Statistical analysis and visualisation | Krona | Allows hierarchical data to be explored with zooming, multi-layered pie charts. | |
9 | Statistical analysis and visualisation | metagenomeSeq | Designed to determine features (be it Operational Taxonomic Unit (OTU), species, etc.) that are differentially abundant between two or more groups. | |
9 | Statistical analysis and visualisation | MetaPath | Identify differentially abundant pathways in metagenomic data-sets. | |
9 | Statistical analysis and visualisation | Phyloseq | A set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data. | https://www.bioconductor.org/packages/release/bioc/html/phyloseq.html |
10 | Databases | CAzy - Carbohydrate-Active enZYmes Database | Describes the families of structurally-related catalytic and carbohydrate-binding modules (or functional domains) of enzymes that degrade, modify, or create glycosidic bonds. | |
10 | Databases | COG Clusters of Orthologous Groups of proteins | A developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs. | |
10 | Databases | Cyanorak | Cyanorak Information system is a bioinformatics tool dedicated to the curation, comparison and visualization of genomes of strains belonging to the subsection I, cluster 5, a deeply branching group within the Cyanobacteria phylum. | |
10 | Databases | EBI | European Bioinformatics Institute. | |
10 | Databases | eggNOG | A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. | http://eggnog5.embl.de/#/app/home |
10 | Databases | FunGuild | A python-based tool that can be used to taxonomically parse fungal OTUs by ecological guilds independent of sequencing platforms or analysis pipelines. | |
10 | Databases | Greengenes | 16S rRNA gene database or experimental datasets. | |
10 | Databases | GTDB | Genome taxonomy database. | |
10 | Databases | InterPro | Functional analysis of proteins by classifying them into families and predicting domains and important sites. | |
10 | Databases | KEGG - Kyoto Encyclopedia of Genes and Genomes | A database resource for understanding high-level functions and utilities of the biological system. | |
10 | Databases | KOG - eukaryotic orthologous groups | A eukaryote-specific version of the Clusters of Orthologous Groups (COG) tool for identifying ortholog and paralog proteins. | |
10 | Databases | MAR | Marine databases; MarRef, MarDB and MarCat, which are publicly available resources that promote marine research and innovation. | |
10 | Databases | MEROPS | An information resource for peptidases (also termed proteases, proteinases and proteolytic enzymes) and the proteins that inhibit them. | |
10 | Databases | MetaCyc | A curated database of experimentally elucidated metabolic pathways from all domains of life. | |
10 | Databases | NCBI | National Center for Biotechnology Information. | |
10 | Databases | PANTHER - Protein ANalysis THrough Evolutionary Relationships | Designed to classify proteins (and their genes) in order to facilitate high-throughput analysis. | |
10 | Databases | Pfam | A large collection of protein families. | |
10 | Databases | PR2 | A reference database of carefully annotated 18S rRNA sequences using eight unique taxonomic fields. | |
10 | Databases | RDP | Provides the research community with aligned and annotated rRNA gene sequence data. | |
10 | Databases | Rfam | A collection of RNA families, each represented by multiple sequence alignments. | |
10 | Databases | SEED | To provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. | |
10 | Databases | Silva | Comprehensive, quality-checked and regularly updated datasets of aligned small (16S/18S, SSU) and large subunit (23S/28S, LSU) ribosomal RNA (rRNA) sequences for all three domains of life (Bacteria, Archaea and Eukarya). | |
10 | Databases | TARA Oceans | Diversity, evolution and ecology of marine plankton. | https://www.ebi.ac.uk/services/tara-oceans-data http://www.taraoceans-dataportal.org/top/;jsessionid=07217630362165E3CD27AA73D839945D?execution=e1s1 |
10 | Databases | TCDB | A comprehensive IUBMB approved classification system for membrane transport proteins known as the Transporter Classification (TC) system. | |
10 | Databases | TIGRFAM | A resource consisting of curated multiple sequence alignments, Hidden Markov Models (HMMs) for protein sequence classification, and associated information designed to support automated annotation of (mostly prokaryotic) proteins. | |
11 | Other | Anvi'o | An open-source, community-driven analysis and visualization platform for microbial ‘omics. | |
11 | Other | Calypso | An easy-to-use online software, allowing non-expert users to mine, interpret and compare taxonomic information from metagenomic or 16S rDNA datasets. | |
11 | Other | CLC Genomics Workbench | A bioinformatics software solution that allows for comprehensive analysis of your NGS data, including de novo assembly of whole genomes and transcriptomes, resequencing analysis. | |
11 | Other | conda | An open source package management system and environment management system that runs on Windows, macOS and Linux. | |
11 | Other | Galaxy Australia | Galaxy is a web-based analysis and workflow platform. | |
11 | Other | gromacs | A versatile package to perform molecular dynamics. | |
11 | Other | IMG/M | A platform to support the annotation, analysis and distribution of microbial genome and microbiome datasets. | |
11 | Other | Jupyter Notebook | An open-source web application that allows you to create and share documents that contain live code. | |
11 | Other | MEGAN - MEtaGenome ANalyzer | A comprehensive toolbox for interactively analyzing microbiome data. | |
11 | Other | MetaORFA - Metagenomic ORFome Assembly | Metagenomic assembly. | http://allie.dbcls.jp/pair/MetaORFA;Metagenomic+ORFome+Assembly.html |
11 | Other | MetaWRAP | An easy-to-use metagenomic wrapper suite that accomplishes the core tasks of metagenomic analysis from start to finish. | https://github.com/bxlab/metaWRAP https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-018-0541-1 |
11 | Other | MG-RAST | An automatic phylogenetic and functional analysis of metagenomes. | |
11 | Other | MGnify | A resource for the analysis, archiving and browsing of metagenomic and metatranscriptomic data. | |
11 | Other | MOCAT/ MOCAT2 | A package for analyzing metagenomics datasets. | |
11 | Other | Mothur | An open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. | |
11 | Other | Nextflow | Enables scalable and reproducible scientific workflows using software containers. | |
11 | Other | OTUreporter | A modular automated pipeline for the analysis and report of amplicon data. | |
11 | Other | Perl | A general purpose language for getting things done. | |
11 | Other | Python | Programming language | |
11 | Other | QIIME2.0 | Performing microbiome analysis from raw DNA sequencing data. | |
11 | Other | R/R Studio | A development environment for R and Python, with a console, syntax-highlighting editor. | |
11 | Other | RocksDB | A persistent key-value store for flash and RAM storage | |
11 | Other | singularity | Singularity containers can be used to package entire scientific workflows. | |
11 | Other | SOAP - Short Oligonucleotide Analysis Package | A suite of bioinformatics software tools from the BGI Bioinformatics department enabling the assembly, alignment, and analysis of next generation DNA sequencing data. | |
11 | Other | SqueezeMeta | A fully automatic pipeline for metagenomics/metatranscriptomics, covering all steps of the analysis. | https://github.com/jtamames/SqueezeMeta https://www.frontiersin.org/articles/10.3389/fmicb.2018.03349/full#h2 |
11 | Other | VAMPS | A collection of tools for researchers to visualize and analyze data for microbial population structures and distributions. |
A complete list of tools with more details is available here.
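Because each row of the table above pairs a workflow step with a tool, the list can also be handled as structured data, for example to see which tools are available at each step. The following is a minimal sketch, assuming the rows are available as pipe-delimited text in the column order of the table header. The function name is hypothetical, and the three sample rows are abbreviated from the table.

```python
from collections import defaultdict

# Three rows abbreviated from the table above, in the order
# "step | component | tool | description | link".
SAMPLE_ROWS = """\
Workflow Step | High-level component | Tool | Brief description | Link |
1 | Quality Control | FastQC | Quality control checks on raw sequence data. | |
2 | Preprocessing | fastp | Fast all-in-one preprocessing for FastQ files. | |
2 | Preprocessing | Trimmomatic | A flexible read trimming tool for Illumina NGS data. | |
"""


def tools_by_component(table_text):
    """Group tool names by (step number, component), preserving row order."""
    grouped = defaultdict(list)
    for line in table_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) < 3 or not cells[0].isdigit():
            continue  # skip the header row and any blank lines
        step, component, tool = int(cells[0]), cells[1], cells[2]
        grouped[(step, component)].append(tool)
    return dict(grouped)
```

Calling `tools_by_component(SAMPLE_ROWS)` groups FastQC under step 1 and both fastp and Trimmomatic under step 2, which reflects the point made in Section 3.3.2 that there are many options for tools at each step of a chosen pathway.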
Appendix 2
Survey questions posed to the Microbiome Research Community
...