Aim:

Process metagenomics data using metadata groups (--metadata metadata.tsv) to enable alpha and beta diversity analyses.

Interactive HPC session

Open a Terminal (Mac users) or PuTTY (Windows users) and paste the command below at the prompt to start an interactive session:

qsub -I -S /bin/bash -l walltime=4:00:00 -l select=1:ncpus=4:mem=8gb

The interactive session should start in less than a minute.

Running nf-core/ampliseq with metadata information

A metadata file is optional for running the ampliseq pipeline, but it is essential for downstream analyses such as barplots, diversity indices (alpha and beta diversity) and differential abundance testing.
The public data we are using in the workshop has no associated metadata, so we will use an ‘artificially’ created metadata.tsv file that assigns the first 15 samples to a “control” group and the remaining samples to a group called “illumina” (the technology used to create the amplicon data).
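As an illustration of how such a grouping can be produced, the sketch below writes a two-column, tab-delimited table for samples named Illumina1 to IlluminaN. The function name make_metadata is hypothetical and is not part of the workshop material; the metadata.tsv used in this session is provided ready-made.

```shell
# Hypothetical helper (not part of the workshop files): print a metadata
# table for samples Illumina1..Illumina<total>, labelling the first
# <n_control> samples "control" and the remaining ones "illumina".
make_metadata() {
    total="$1"
    n_control="$2"
    printf 'ID\tcondition\n'
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$i" -le "$n_control" ]; then
            printf 'Illumina%d\tcontrol\n' "$i"
        else
            printf 'Illumina%d\tillumina\n' "$i"
        fi
        i=$((i + 1))
    done
}

# Example: 59 samples, the first 15 in the control group.
# make_metadata 59 15 > my_metadata.tsv
```

The example writes to my_metadata.tsv (an arbitrary name) so that it cannot overwrite the workshop file.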

Let’s create a working folder for this exercise and move to it:

mkdir -p $HOME/workshop/2025/S1W1/metagenomics/runs/run3_ampliseq_metadata
cd $HOME/workshop/2025/S1W1/metagenomics/runs/run3_ampliseq_metadata

Let’s copy the samplesheet.tsv, the launch script and the metadata file to the newly created folder:

cp /work/training/2025/S1W1/session3_metagenomics/runs/run3_ampliseq_metadata/samplesheet.tsv .
cp /work/training/2025/S1W1/session3_metagenomics/runs/run3_ampliseq_metadata/launch_nfcore_ampliseq_illumina_metadata.pbs .
cp /work/training/2025/S1W1/session3_metagenomics/runs/run3_ampliseq_metadata/metadata.tsv .
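Before continuing, it can help to confirm that the three files arrived. The following is a minimal sketch; the function name check_inputs is an illustrative assumption, not a workshop command.

```shell
# Hypothetical helper: report whether each given file exists and is
# non-empty. Returns non-zero if any file is missing or empty.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            printf 'ok       %s\n' "$f"
        else
            printf 'MISSING  %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}

# Example, run inside the run3_ampliseq_metadata folder:
# check_inputs samplesheet.tsv launch_nfcore_ampliseq_illumina_metadata.pbs metadata.tsv
```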

Print the content of the metadata file (e.g., cat metadata.tsv):

ID      condition
Illumina1       control
Illumina2       control
Illumina3       control
Illumina4       control
Illumina5       control
Illumina6       control
Illumina7       control
Illumina8       control
Illumina9       control
Illumina10      control
Illumina11      control
Illumina12      control
Illumina13      control
Illumina14      control
Illumina15      control
Illumina16      illumina
Illumina17      illumina
Illumina18      illumina
Illumina19      illumina
Illumina20      illumina
Illumina21      illumina
Illumina22      illumina
Illumina23      illumina
Illumina24      illumina
Illumina25      illumina
Illumina26      illumina
Illumina27      illumina
Illumina28      illumina
Illumina29      illumina
Illumina30      illumina
Illumina31      illumina
Illumina32      illumina
Illumina33      illumina
Illumina34      illumina
Illumina35      illumina
Illumina36      illumina
Illumina37      illumina
Illumina38      illumina
Illumina39      illumina
Illumina40      illumina
Illumina41      illumina
Illumina42      illumina
Illumina43      illumina
Illumina44      illumina
Illumina45      illumina
Illumina46      illumina
Illumina47      illumina
Illumina48      illumina
Illumina49      illumina
Illumina50      illumina
Illumina51      illumina
Illumina52      illumina
Illumina53      illumina
Illumina54      illumina
Illumina55      illumina
Illumina56      illumina
Illumina57      illumina
Illumina58      illumina
Illumina59      illumina
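A quick sanity check of the metadata file is to count the samples in each group and to look for repeated sample IDs. The sketch below assumes a tab-delimited file with a header line, the sample ID in column 1 and the group label in column 2; the function names are illustrative.

```shell
# Print "<condition><TAB><count>" for each group, skipping the header.
count_groups() {
    awk -F '\t' 'NR > 1 && NF >= 2 { n[$2]++ }
                 END { for (g in n) printf "%s\t%d\n", g, n[g] }' "$1" | sort
}

# Print any sample ID that appears more than once (no output means none).
duplicate_ids() {
    awk -F '\t' 'NR > 1 { print $1 }' "$1" | sort | uniq -d
}

# Example:
# count_groups metadata.tsv
# duplicate_ids metadata.tsv
```

For the file shown above, the counts should be 15 for control and 44 for illumina, with no duplicate IDs.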

Print the content of the launch script:

cat launch_nfcore_ampliseq_illumina_metadata.pbs

The parameters:

-r 2.9.0 runs version 2.9.0 of the ampliseq workflow. This is important for version control.
-profile singularity selects Singularity, the type of container we use on the HPC. Nextflow runs the pipeline tools inside containers.
--single_end Since we have single-end data, we need to add this parameter. With paired-end data we would not need to add anything, as paired-end is the default.
--ignore_failed_trimming Some of the samples in the public dataset are of poor quality and fail the adapter trimming step. We are ignoring these in this practice session, but with your own dataset you will want to address this in other ways (e.g. re-sequence samples, remove them as outliers, etc.).
--input "data/samplesheet.tsv" The samplesheet you created. In this case it must be in a ‘data’ subdirectory, but it can be anywhere you like, provided you give the full path to it.
--FW_primer "GGATTAGATACCCBRGTAGTC" --RV_primer "TCACGRCACGAGCTGACGAC" The forward and reverse primers used are from “Comparison of Illumina versus Nanopore 16S rRNA Gene Sequencing of the Human Nasal Microbiota”:

The hypervariable V5 and V6 regions (276 base pairs—bp) of the 16S rRNA gene were amplified using the 785F (5′-GGA TTA GAT ACC CBR GTA GTC-3′) and 1061R (5′-TCA CGR CAC GAG CTG ACG AC-3′) primers [20]

--outdir results The output directory for results. You can call this whatever you like.
--metadata Path to a tab-delimited file with the sample “ID” in column 1 and the “condition” group in column 2 (see above).
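To show how these parameters fit together, the sketch below prints (without running) a nextflow command assembled from the values listed above. It is an assumption about what the call inside the launch script may look like; the actual .pbs file also contains scheduler directives and may differ in detail. The function name print_ampliseq_cmd is hypothetical.

```shell
# Hypothetical sketch: print, but do not execute, a nextflow command that
# combines the parameters described above. The real launch script may differ.
print_ampliseq_cmd() {
    metadata_file="$1"   # path to the metadata table, e.g. metadata.tsv
    cat <<EOF
nextflow run nf-core/ampliseq \\
    -r 2.9.0 \\
    -profile singularity \\
    --single_end \\
    --ignore_failed_trimming \\
    --input "data/samplesheet.tsv" \\
    --FW_primer "GGATTAGATACCCBRGTAGTC" \\
    --RV_primer "TCACGRCACGAGCTGACGAC" \\
    --outdir results \\
    --metadata "$metadata_file"
EOF
}

# Example:
# print_ampliseq_cmd metadata.tsv
```

Printing the command rather than executing it makes it easy to compare against the output of cat on the launch script.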

Submit the job to the cluster

qsub launch_nfcore_ampliseq_illumina_metadata.pbs

The job will take about one hour to complete.

Find the results for alpha and beta diversity in the results/qiime2/diversity folder:

results/qiime2/diversity/
├── alpha_diversity
│   ├── evenness_vector
│   ├── faith_pd_vector
│   ├── observed_features_vector
│   └── shannon_vector
├── beta_diversity
│   ├── bray_curtis_distance_matrix-condition
│   ├── bray_curtis_pcoa_results-PCoA
│   ├── jaccard_distance_matrix-condition
│   ├── jaccard_pcoa_results-PCoA
│   ├── unweighted_unifrac_distance_matrix-condition
│   ├── unweighted_unifrac_pcoa_results-PCoA
│   ├── weighted_unifrac_distance_matrix-condition
│   └── weighted_unifrac_pcoa_results-PCoA
└── WARNING The sampling depth of 500 seems too small for rarefaction.txt

Let’s now inspect precomputed results:

Windows PC: open File Explorer and type the address below into the address bar to connect to your home directory on the HPC, then browse to the /workshop/2025/S1W1/session3_metagenomics folder

\\hpc-fs\home\workshop\2025\S1W1\session3_metagenomics

Mac: open Finder and press “command” + “k” to open the connection prompt, type the address below, and then browse to the /workshop/2025/S1W1/session3_metagenomics folder

smb://hpc-fs/home/workshop/2025/S1W1/session3_metagenomics

Navigate to the runs/run3_ampliseq_metadata folder:

results/
├── barrnap
│   ├── rrna.arc.gff
│   ├── rrna.bac.gff
│   ├── rrna.euk.gff
│   ├── rrna.mito.gff
│   └── summary.tsv
├── cutadapt
│   ├── cutadapt_summary.tsv
│   ├── Illumina10.trimmed.cutadapt.log
│   ├── Illumina11.trimmed.cutadapt.log
│   ├── Illumina12.trimmed.cutadapt.log
│   ├── Illumina13.trimmed.cutadapt.log
│   ├── Illumina14.trimmed.cutadapt.log
│   ├── Illumina15.trimmed.cutadapt.log
│   ├── Illumina16.trimmed.cutadapt.log
│   ├── Illumina17.trimmed.cutadapt.log
│   ├── Illumina18.trimmed.cutadapt.log
│   ├── Illumina19.trimmed.cutadapt.log
│   ├── Illumina1.trimmed.cutadapt.log
│   ├── Illumina20.trimmed.cutadapt.log
│   ├── Illumina21.trimmed.cutadapt.log
│   ├── Illumina22.trimmed.cutadapt.log
│   ├── Illumina23.trimmed.cutadapt.log
│   ├── Illumina24.trimmed.cutadapt.log
│   ├── Illumina25.trimmed.cutadapt.log
│   ├── Illumina26.trimmed.cutadapt.log
│   ├── Illumina27.trimmed.cutadapt.log
│   ├── Illumina28.trimmed.cutadapt.log
│   ├── Illumina29.trimmed.cutadapt.log
│   ├── Illumina2.trimmed.cutadapt.log
│   ├── Illumina30.trimmed.cutadapt.log
│   ├── Illumina31.trimmed.cutadapt.log
│   ├── Illumina32.trimmed.cutadapt.log
│   ├── Illumina33.trimmed.cutadapt.log
│   ├── Illumina34.trimmed.cutadapt.log
│   ├── Illumina35.trimmed.cutadapt.log
│   ├── Illumina36.trimmed.cutadapt.log
│   ├── Illumina37.trimmed.cutadapt.log
│   ├── Illumina38.trimmed.cutadapt.log
│   ├── Illumina39.trimmed.cutadapt.log
│   ├── Illumina3.trimmed.cutadapt.log
│   ├── Illumina40.trimmed.cutadapt.log
│   ├── Illumina41.trimmed.cutadapt.log
│   ├── Illumina42.trimmed.cutadapt.log
│   ├── Illumina43.trimmed.cutadapt.log
│   ├── Illumina44.trimmed.cutadapt.log
│   ├── Illumina45.trimmed.cutadapt.log
│   ├── Illumina46.trimmed.cutadapt.log
│   ├── Illumina47.trimmed.cutadapt.log
│   ├── Illumina48.trimmed.cutadapt.log
│   ├── Illumina49.trimmed.cutadapt.log
│   ├── Illumina4.trimmed.cutadapt.log
│   ├── Illumina50.trimmed.cutadapt.log
│   ├── Illumina51.trimmed.cutadapt.log
│   ├── Illumina52.trimmed.cutadapt.log
│   ├── Illumina53.trimmed.cutadapt.log
│   ├── Illumina54.trimmed.cutadapt.log
│   ├── Illumina55.trimmed.cutadapt.log
│   ├── Illumina56.trimmed.cutadapt.log
│   ├── Illumina57.trimmed.cutadapt.log
│   ├── Illumina58.trimmed.cutadapt.log
│   ├── Illumina59.trimmed.cutadapt.log
│   ├── Illumina5.trimmed.cutadapt.log
│   ├── Illumina6.trimmed.cutadapt.log
│   ├── Illumina7.trimmed.cutadapt.log
│   ├── Illumina8.trimmed.cutadapt.log
│   └── Illumina9.trimmed.cutadapt.log
├── dada2
│   ├── args
│   ├── ASV_seqs.fasta
│   ├── ASV_table.tsv
│   ├── ASV_tax.silva_138.tsv
│   ├── ASV_tax_species.silva_138.tsv
│   ├── DADA2_stats.tsv
│   ├── DADA2_table.rds
│   ├── DADA2_table.tsv
│   ├── log
│   ├── QC
│   └── ref_taxonomy.silva_138.txt
├── fastqc
│   ├── Illumina10_fastqc.html
│   ├── Illumina11_fastqc.html
│   ├── Illumina12_fastqc.html
│   ├── Illumina13_fastqc.html
│   ├── Illumina14_fastqc.html
│   ├── Illumina15_fastqc.html
│   ├── Illumina16_fastqc.html
│   ├── Illumina17_fastqc.html
│   ├── Illumina18_fastqc.html
│   ├── Illumina19_fastqc.html
│   ├── Illumina1_fastqc.html
│   ├── Illumina20_fastqc.html
│   ├── Illumina21_fastqc.html
│   ├── Illumina22_fastqc.html
│   ├── Illumina23_fastqc.html
│   ├── Illumina24_fastqc.html
│   ├── Illumina25_fastqc.html
│   ├── Illumina26_fastqc.html
│   ├── Illumina27_fastqc.html
│   ├── Illumina28_fastqc.html
│   ├── Illumina29_fastqc.html
│   ├── Illumina2_fastqc.html
│   ├── Illumina30_fastqc.html
│   ├── Illumina31_fastqc.html
│   ├── Illumina32_fastqc.html
│   ├── Illumina33_fastqc.html
│   ├── Illumina34_fastqc.html
│   ├── Illumina35_fastqc.html
│   ├── Illumina36_fastqc.html
│   ├── Illumina37_fastqc.html
│   ├── Illumina38_fastqc.html
│   ├── Illumina39_fastqc.html
│   ├── Illumina3_fastqc.html
│   ├── Illumina40_fastqc.html
│   ├── Illumina41_fastqc.html
│   ├── Illumina42_fastqc.html
│   ├── Illumina43_fastqc.html
│   ├── Illumina44_fastqc.html
│   ├── Illumina45_fastqc.html
│   ├── Illumina46_fastqc.html
│   ├── Illumina47_fastqc.html
│   ├── Illumina48_fastqc.html
│   ├── Illumina49_fastqc.html
│   ├── Illumina4_fastqc.html
│   ├── Illumina50_fastqc.html
│   ├── Illumina51_fastqc.html
│   ├── Illumina52_fastqc.html
│   ├── Illumina53_fastqc.html
│   ├── Illumina54_fastqc.html
│   ├── Illumina55_fastqc.html
│   ├── Illumina56_fastqc.html
│   ├── Illumina57_fastqc.html
│   ├── Illumina58_fastqc.html
│   ├── Illumina59_fastqc.html
│   ├── Illumina5_fastqc.html
│   ├── Illumina6_fastqc.html
│   ├── Illumina7_fastqc.html
│   ├── Illumina8_fastqc.html
│   └── Illumina9_fastqc.html
├── input
│   ├── metadata.tsv
│   └── samplesheet.tsv
├── multiqc
│   ├── multiqc_data
│   ├── multiqc_plots
│   └── multiqc_report.html
├── overall_summary.tsv
├── phyloseq
│   └── dada2_phyloseq.rds
├── pipeline_info
│   ├── execution_report_2025-03-28_10-55-51.html
│   ├── execution_timeline_2025-03-28_10-55-51.html
│   ├── execution_trace_2025-03-28_10-55-51.txt
│   ├── params_2025-03-28_10-56-00.json
│   ├── pipeline_dag_2025-03-28_10-55-51.html
│   └── software_versions.yml
├── qiime2
│   ├── abundance_tables
│   ├── alpha-rarefaction
│   ├── ancom
│   ├── barplot
│   ├── diversity
│   ├── input
│   ├── phylogenetic_tree
│   ├── rel_abundance_tables
│   └── representative_sequences
└── summary_report
    ├── dada2_taxonomic_classification_per_taxonomy_level.svg
    ├── evenness_vector_spearman.svg
    ├── faith_pd_vector_spearman.svg
    ├── observed_features_vector_spearman.svg
    ├── rrna_detection_with_barrnap.svg
    ├── shannon_vector_spearman.svg
    ├── stacked_barchart_of_reads.svg
    ├── summary_report.html
    └── versions.yml

Move to the results/qiime2/diversity folder and evaluate the alpha diversity results. In particular, open the interactive “index.html” report for each alpha diversity metric:

results/qiime2/diversity/alpha_diversity/
├── evenness_vector
│   ├── column-condition.jsonp
│   ├── dist
│   ├── index.html
│   ├── kruskal-wallis-pairwise-condition.csv
│   ├── metadata.tsv
│   └── q2templateassets
├── faith_pd_vector
│   ├── column-condition.jsonp
│   ├── dist
│   ├── index.html
│   ├── kruskal-wallis-pairwise-condition.csv
│   ├── metadata.tsv
│   └── q2templateassets
├── observed_features_vector
│   ├── column-condition.jsonp
│   ├── dist
│   ├── index.html
│   ├── kruskal-wallis-pairwise-condition.csv
│   ├── metadata.tsv
│   └── q2templateassets
└── shannon_vector
    ├── column-condition.jsonp
    ├── dist
    ├── index.html
    ├── kruskal-wallis-pairwise-condition.csv
    ├── metadata.tsv
    └── q2templateassets
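The pairwise test tables can also be read from the command line. The sketch below prints every CSV under a folder whose name matches a pattern, each preceded by its path; the function name show_pairwise_tests is illustrative, and no assumption is made about the column names inside the files.

```shell
# Hypothetical helper: print each CSV under DIR whose file name matches
# PATTERN, with a header line showing where it came from.
show_pairwise_tests() {
    find "$1" -type f -name "$2" | sort | while IFS= read -r csv; do
        printf '== %s ==\n' "$csv"
        cat "$csv"
        printf '\n'
    done
}

# Example, run from the run3_ampliseq_metadata folder:
# show_pairwise_tests results/qiime2/diversity/alpha_diversity 'kruskal-wallis-pairwise-*.csv'
```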

Move to the results/qiime2/diversity folder and evaluate the beta diversity results. In particular, open the interactive “index.html” report for each beta diversity metric:

results/qiime2/diversity/beta_diversity/
├── bray_curtis_distance_matrix-condition
│   ├── control-boxplots.pdf
│   ├── control-boxplots.png
│   ├── illumina-boxplots.pdf
│   ├── illumina-boxplots.png
│   ├── index.html
│   ├── permanova-pairwise.csv
│   ├── q2templateassets
│   └── raw_data.tsv
├── bray_curtis_pcoa_results-PCoA
│   ├── css
│   ├── emperor.html
│   ├── img
│   ├── index.html
│   ├── js
│   ├── q2templateassets
│   ├── templates
│   └── vendor
├── jaccard_distance_matrix-condition
│   ├── control-boxplots.pdf
│   ├── control-boxplots.png
│   ├── illumina-boxplots.pdf
│   ├── illumina-boxplots.png
│   ├── index.html
│   ├── permanova-pairwise.csv
│   ├── q2templateassets
│   └── raw_data.tsv
├── jaccard_pcoa_results-PCoA
│   ├── css
│   ├── emperor.html
│   ├── img
│   ├── index.html
│   ├── js
│   ├── q2templateassets
│   ├── templates
│   └── vendor
├── unweighted_unifrac_distance_matrix-condition
│   ├── control-boxplots.pdf
│   ├── control-boxplots.png
│   ├── illumina-boxplots.pdf
│   ├── illumina-boxplots.png
│   ├── index.html
│   ├── permanova-pairwise.csv
│   ├── q2templateassets
│   └── raw_data.tsv
├── unweighted_unifrac_pcoa_results-PCoA
│   ├── css
│   ├── emperor.html
│   ├── img
│   ├── index.html
│   ├── js
│   ├── q2templateassets
│   ├── templates
│   └── vendor
├── weighted_unifrac_distance_matrix-condition
│   ├── control-boxplots.pdf
│   ├── control-boxplots.png
│   ├── illumina-boxplots.pdf
│   ├── illumina-boxplots.png
│   ├── index.html
│   ├── permanova-pairwise.csv
│   ├── q2templateassets
│   └── raw_data.tsv
└── weighted_unifrac_pcoa_results-PCoA
    ├── css
    ├── emperor.html
    ├── img
    ├── index.html
    ├── js
    ├── q2templateassets
    ├── templates
    └── vendor
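With this many folders, a list of the report entry points is a convenient checklist of what to open. The following sketch lists every index.html under a given folder; the function name list_reports is an illustrative assumption.

```shell
# Hypothetical helper: print the path of every index.html found under DIR,
# one per line, in sorted order.
list_reports() {
    find "$1" -type f -name 'index.html' | sort
}

# Example, run from the run3_ampliseq_metadata folder:
# list_reports results/qiime2/diversity
```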

25S1W1 - 6. Run full pipeline with metadata
