25S1W1 - 6. Run full pipeline with metadata
Aim:
Process metagenomics data using metadata groups (--metadata metadata.tsv) to enable the generation of alpha and beta diversity analyses.
Interactive HPC session
Open a Terminal (Mac users) or PuTTY (Windows users) and paste the command below into the command prompt to start an interactive session:
qsub -I -S /bin/bash -l walltime=4:00:00 -l select=1:ncpus=4:mem=8gb
The interactive session should take less than a minute to start.
Running nf-core/ampliseq with metadata information
A metadata file is optional for running the ampliseq pipeline, but it is essential for downstream analyses such as barplots, diversity indices (alpha and beta diversities) or differential abundance testing.
The public data we are using in the workshop does not have associated metadata information, so we will use an ‘artificially’ created metadata.tsv file that assigns the first 15 samples to a “control” group and the remaining samples to a group called “illumina” (the technology used to create the amplicon data).
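The grouping described above can be reproduced with a short loop. This is only a sketch of how such a file could be built, not how the workshop file was made: the output name metadata_demo.tsv is a hypothetical choice so that the provided metadata.tsv is never overwritten, and the sample names Illumina1 to Illumina59 are taken from the listing further down.

```shell
# Sketch: build a two-column, tab-delimited metadata table.
# metadata_demo.tsv is a hypothetical name used for illustration only.
out=metadata_demo.tsv
printf 'ID\tcondition\n' > "$out"

i=1
while [ "$i" -le 59 ]; do
    # First 15 samples go to "control", the rest to "illumina".
    if [ "$i" -le 15 ]; then
        group=control
    else
        group=illumina
    fi
    printf 'Illumina%d\t%s\n' "$i" "$group" >> "$out"
    i=$((i + 1))
done

# Show the header and the first two samples.
head -n 3 "$out"
```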
Let’s create a working folder for this exercise and move to it:
mkdir -p $HOME/workshop/2025/S1W1/metagenomics/runs/run3_ampliseq_metadata
cd $HOME/workshop/2025/S1W1/metagenomics/runs/run3_ampliseq_metadata
Let’s copy the samplesheet.tsv, launch script and metadata file to the newly created folder:
cp /work/training/2025/S1W1/session3_metagenomics/runs/run3_ampliseq_metadata/samplesheet.tsv .
cp /work/training/2025/S1W1/session3_metagenomics/runs/run3_ampliseq_metadata/launch_nfcore_ampliseq_illumina_metadata.pbs .
cp /work/training/2025/S1W1/session3_metagenomics/runs/run3_ampliseq_metadata/metadata.tsv .
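Before launching, it can be worth checking that the sample IDs in the two files agree. The sketch below assumes that both files are tab-delimited, have a header row, and keep the sample ID in the first column; compare_ids is a hypothetical helper name, not part of the pipeline.

```shell
# Report sample IDs that appear in only one of two tab-delimited files.
# Assumes a header row and the sample ID in column 1 of both files.
compare_ids() {
    ids_a=$(mktemp) && ids_b=$(mktemp) || return 1
    tail -n +2 "$1" | cut -f1 | sort > "$ids_a"
    tail -n +2 "$2" | cut -f1 | sort > "$ids_b"
    # comm -3 hides lines common to both inputs, leaving only mismatches:
    # IDs unique to the first file are unindented, IDs unique to the
    # second file are preceded by a tab.
    comm -3 "$ids_a" "$ids_b"
    rm -f "$ids_a" "$ids_b"
}

if [ -f samplesheet.tsv ] && [ -f metadata.tsv ]; then
    compare_ids samplesheet.tsv metadata.tsv
fi
```

No output means every ID was found in both files.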
Print the content of the metadata file (e.g., cat metadata.tsv):
ID condition
Illumina1 control
Illumina2 control
Illumina3 control
Illumina4 control
Illumina5 control
Illumina6 control
Illumina7 control
Illumina8 control
Illumina9 control
Illumina10 control
Illumina11 control
Illumina12 control
Illumina13 control
Illumina14 control
Illumina15 control
Illumina16 illumina
Illumina17 illumina
Illumina18 illumina
Illumina19 illumina
Illumina20 illumina
Illumina21 illumina
Illumina22 illumina
Illumina23 illumina
Illumina24 illumina
Illumina25 illumina
Illumina26 illumina
Illumina27 illumina
Illumina28 illumina
Illumina29 illumina
Illumina30 illumina
Illumina31 illumina
Illumina32 illumina
Illumina33 illumina
Illumina34 illumina
Illumina35 illumina
Illumina36 illumina
Illumina37 illumina
Illumina38 illumina
Illumina39 illumina
Illumina40 illumina
Illumina41 illumina
Illumina42 illumina
Illumina43 illumina
Illumina44 illumina
Illumina45 illumina
Illumina46 illumina
Illumina47 illumina
Illumina48 illumina
Illumina49 illumina
Illumina50 illumina
Illumina51 illumina
Illumina52 illumina
Illumina53 illumina
Illumina54 illumina
Illumina55 illumina
Illumina56 illumina
Illumina57 illumina
Illumina58 illumina
Illumina59 illumina
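To confirm the group sizes without counting rows by eye, the second column can be tallied with awk. This is a minimal sketch that assumes the file is tab-delimited with one header row; count_groups is a hypothetical helper name.

```shell
# Count how many samples fall into each group (column 2),
# skipping the header row.
count_groups() {
    awk -F'\t' '
        NR > 1 && NF >= 2 { n[$2]++ }
        END { for (g in n) printf "%s\t%d\n", g, n[g] }
    ' "$1" | sort
}

if [ -f metadata.tsv ]; then
    count_groups metadata.tsv
fi
```

For the workshop file listed above, this should report 15 control and 44 illumina samples.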
Print the content of the launch script:
cat launch_nfcore_ampliseq_illumina_metadata.pbs
The parameters:
-r 2.9.0
Runs version 2.9.0 of the ampliseq workflow. This is important for version control.
-profile singularity
Singularity is the type of container we use on the HPC. Nextflow runs the pipeline tools inside containers.
--single_end
Since we have single-end data, we need to add this parameter. Paired-end is the default, so paired-end data needs no extra parameter.
--ignore_failed_trimming
Some of the samples in the public dataset are of poor quality and fail the adapter trimming step. We are ignoring these in this practice session, but with your own dataset you will want to address this in other ways (e.g. re-sequence samples, remove them as outliers, etc.).
--input "data/samplesheet.tsv"
The samplesheet you created. In this case it must be in a ‘data’ subdirectory, but it can be anywhere you like, as long as you provide the full path to it.
--FW_primer "GGATTAGATACCCBRGTAGTC" --RV_primer "TCACGRCACGAGCTGACGAC"
The forward and reverse primers used are from “Comparison of Illumina versus Nanopore 16S rRNA Gene Sequencing of the Human Nasal Microbiota”:
The hypervariable V5 and V6 regions (276 base pairs—bp) of the 16S rRNA gene were amplified using the 785F (5′-GGA TTA GAT ACC CBR GTA GTC-3′) and 1061R (5′-TCA CGR CAC GAG CTG ACG AC-3′) primers [20]
--outdir results
The output directory for results. You can call this whatever you like.
--metadata
Specify the sample “ID” and “condition” in columns 1 and 2, respectively, of a tab-delimited file (see above).
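Put together, the parameters above form a single Nextflow command. The sketch below is not the contents of the launch script (which also carries the scheduler settings); it only assembles the options described above and prints the resulting command for review instead of running it. The helper name ampliseq_cmd is hypothetical, and the --metadata value is assumed to be metadata.tsv in the working folder.

```shell
# Dry run: assemble the ampliseq command from the parameters described
# above and print it, without executing Nextflow.
ampliseq_cmd() {
    set -- nextflow run nf-core/ampliseq \
        -r 2.9.0 \
        -profile singularity \
        --single_end \
        --ignore_failed_trimming \
        --input data/samplesheet.tsv \
        --FW_primer GGATTAGATACCCBRGTAGTC \
        --RV_primer TCACGRCACGAGCTGACGAC \
        --outdir results \
        --metadata metadata.tsv
    # "$*" joins all arguments with single spaces.
    printf '%s\n' "$*"
}

ampliseq_cmd
```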
Submit job to the cluster
qsub launch_nfcore_ampliseq_illumina_metadata.pbs
The job will take about an hour to complete.
Find the results for alpha and beta diversity in the ./results/qiime2/diversity folder:
results/qiime2/diversity/
├── alpha_diversity
│ ├── evenness_vector
│ ├── faith_pd_vector
│ ├── observed_features_vector
│ └── shannon_vector
├── beta_diversity
│ ├── bray_curtis_distance_matrix-condition
│ ├── bray_curtis_pcoa_results-PCoA
│ ├── jaccard_distance_matrix-condition
│ ├── jaccard_pcoa_results-PCoA
│ ├── unweighted_unifrac_distance_matrix-condition
│ ├── unweighted_unifrac_pcoa_results-PCoA
│ ├── weighted_unifrac_distance_matrix-condition
│ └── weighted_unifrac_pcoa_results-PCoA
└── WARNING The sampling depth of 500 seems too small for rarefaction.txt
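The warning file at the end of this listing flags a low rarefaction depth (500), which usually points to samples with few reads. One way to look into this is to sum the counts per sample in results/dada2/ASV_table.tsv. The sketch below assumes that this table is tab-delimited, with ASV identifiers in column 1 and one column of counts per sample; sample_totals is a hypothetical helper name.

```shell
# Sum the counts in every sample column of an ASV table and list the
# samples from the lowest to the highest total.
# Assumes: tab-delimited, header row, ASV identifier in column 1.
sample_totals() {
    awk -F'\t' '
        NR == 1 { ncol = NF; for (i = 2; i <= NF; i++) name[i] = $i; next }
        { for (i = 2; i <= ncol; i++) total[i] += $i }
        END { for (i = 2; i <= ncol; i++) printf "%s\t%d\n", name[i], total[i] }
    ' "$1" | sort -t "$(printf '\t')" -k2,2n
}

if [ -f results/dada2/ASV_table.tsv ]; then
    # Show the five samples with the fewest reads.
    sample_totals results/dada2/ASV_table.tsv | head -n 5
fi
```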
Let’s now inspect precomputed results:
Windows PC: open File Explorer and type the address below into the address bar to connect to your home directory on the HPC, and then browse to the /workshop/2025/S1W1/session3_metagenomics folder
\\hpc-fs\home\workshop\2025\S1W1\session3_metagenomics
Mac: open Finder and press “command” + “k” to open the Connect to Server prompt, then enter the address below, and then browse to the /workshop/2025/S1W1/session3_metagenomics folder
smb://hpc-fs/home/workshop/2025/S1W1/session3_metagenomics
Navigate to the runs/run3_ampliseq_metadata folder:
results/
├── barrnap
│ ├── rrna.arc.gff
│ ├── rrna.bac.gff
│ ├── rrna.euk.gff
│ ├── rrna.mito.gff
│ └── summary.tsv
├── cutadapt
│ ├── cutadapt_summary.tsv
│ ├── Illumina10.trimmed.cutadapt.log
│ ├── Illumina11.trimmed.cutadapt.log
│ ├── Illumina12.trimmed.cutadapt.log
│ ├── Illumina13.trimmed.cutadapt.log
│ ├── Illumina14.trimmed.cutadapt.log
│ ├── Illumina15.trimmed.cutadapt.log
│ ├── Illumina16.trimmed.cutadapt.log
│ ├── Illumina17.trimmed.cutadapt.log
│ ├── Illumina18.trimmed.cutadapt.log
│ ├── Illumina19.trimmed.cutadapt.log
│ ├── Illumina1.trimmed.cutadapt.log
│ ├── Illumina20.trimmed.cutadapt.log
│ ├── Illumina21.trimmed.cutadapt.log
│ ├── Illumina22.trimmed.cutadapt.log
│ ├── Illumina23.trimmed.cutadapt.log
│ ├── Illumina24.trimmed.cutadapt.log
│ ├── Illumina25.trimmed.cutadapt.log
│ ├── Illumina26.trimmed.cutadapt.log
│ ├── Illumina27.trimmed.cutadapt.log
│ ├── Illumina28.trimmed.cutadapt.log
│ ├── Illumina29.trimmed.cutadapt.log
│ ├── Illumina2.trimmed.cutadapt.log
│ ├── Illumina30.trimmed.cutadapt.log
│ ├── Illumina31.trimmed.cutadapt.log
│ ├── Illumina32.trimmed.cutadapt.log
│ ├── Illumina33.trimmed.cutadapt.log
│ ├── Illumina34.trimmed.cutadapt.log
│ ├── Illumina35.trimmed.cutadapt.log
│ ├── Illumina36.trimmed.cutadapt.log
│ ├── Illumina37.trimmed.cutadapt.log
│ ├── Illumina38.trimmed.cutadapt.log
│ ├── Illumina39.trimmed.cutadapt.log
│ ├── Illumina3.trimmed.cutadapt.log
│ ├── Illumina40.trimmed.cutadapt.log
│ ├── Illumina41.trimmed.cutadapt.log
│ ├── Illumina42.trimmed.cutadapt.log
│ ├── Illumina43.trimmed.cutadapt.log
│ ├── Illumina44.trimmed.cutadapt.log
│ ├── Illumina45.trimmed.cutadapt.log
│ ├── Illumina46.trimmed.cutadapt.log
│ ├── Illumina47.trimmed.cutadapt.log
│ ├── Illumina48.trimmed.cutadapt.log
│ ├── Illumina49.trimmed.cutadapt.log
│ ├── Illumina4.trimmed.cutadapt.log
│ ├── Illumina50.trimmed.cutadapt.log
│ ├── Illumina51.trimmed.cutadapt.log
│ ├── Illumina52.trimmed.cutadapt.log
│ ├── Illumina53.trimmed.cutadapt.log
│ ├── Illumina54.trimmed.cutadapt.log
│ ├── Illumina55.trimmed.cutadapt.log
│ ├── Illumina56.trimmed.cutadapt.log
│ ├── Illumina57.trimmed.cutadapt.log
│ ├── Illumina58.trimmed.cutadapt.log
│ ├── Illumina59.trimmed.cutadapt.log
│ ├── Illumina5.trimmed.cutadapt.log
│ ├── Illumina6.trimmed.cutadapt.log
│ ├── Illumina7.trimmed.cutadapt.log
│ ├── Illumina8.trimmed.cutadapt.log
│ └── Illumina9.trimmed.cutadapt.log
├── dada2
│ ├── args
│ ├── ASV_seqs.fasta
│ ├── ASV_table.tsv
│ ├── ASV_tax.silva_138.tsv
│ ├── ASV_tax_species.silva_138.tsv
│ ├── DADA2_stats.tsv
│ ├── DADA2_table.rds
│ ├── DADA2_table.tsv
│ ├── log
│ ├── QC
│ └── ref_taxonomy.silva_138.txt
├── fastqc
│ ├── Illumina10_fastqc.html
│ ├── Illumina11_fastqc.html
│ ├── Illumina12_fastqc.html
│ ├── Illumina13_fastqc.html
│ ├── Illumina14_fastqc.html
│ ├── Illumina15_fastqc.html
│ ├── Illumina16_fastqc.html
│ ├── Illumina17_fastqc.html
│ ├── Illumina18_fastqc.html
│ ├── Illumina19_fastqc.html
│ ├── Illumina1_fastqc.html
│ ├── Illumina20_fastqc.html
│ ├── Illumina21_fastqc.html
│ ├── Illumina22_fastqc.html
│ ├── Illumina23_fastqc.html
│ ├── Illumina24_fastqc.html
│ ├── Illumina25_fastqc.html
│ ├── Illumina26_fastqc.html
│ ├── Illumina27_fastqc.html
│ ├── Illumina28_fastqc.html
│ ├── Illumina29_fastqc.html
│ ├── Illumina2_fastqc.html
│ ├── Illumina30_fastqc.html
│ ├── Illumina31_fastqc.html
│ ├── Illumina32_fastqc.html
│ ├── Illumina33_fastqc.html
│ ├── Illumina34_fastqc.html
│ ├── Illumina35_fastqc.html
│ ├── Illumina36_fastqc.html
│ ├── Illumina37_fastqc.html
│ ├── Illumina38_fastqc.html
│ ├── Illumina39_fastqc.html
│ ├── Illumina3_fastqc.html
│ ├── Illumina40_fastqc.html
│ ├── Illumina41_fastqc.html
│ ├── Illumina42_fastqc.html
│ ├── Illumina43_fastqc.html
│ ├── Illumina44_fastqc.html
│ ├── Illumina45_fastqc.html
│ ├── Illumina46_fastqc.html
│ ├── Illumina47_fastqc.html
│ ├── Illumina48_fastqc.html
│ ├── Illumina49_fastqc.html
│ ├── Illumina4_fastqc.html
│ ├── Illumina50_fastqc.html
│ ├── Illumina51_fastqc.html
│ ├── Illumina52_fastqc.html
│ ├── Illumina53_fastqc.html
│ ├── Illumina54_fastqc.html
│ ├── Illumina55_fastqc.html
│ ├── Illumina56_fastqc.html
│ ├── Illumina57_fastqc.html
│ ├── Illumina58_fastqc.html
│ ├── Illumina59_fastqc.html
│ ├── Illumina5_fastqc.html
│ ├── Illumina6_fastqc.html
│ ├── Illumina7_fastqc.html
│ ├── Illumina8_fastqc.html
│ └── Illumina9_fastqc.html
├── input
│ ├── metadata.tsv
│ └── samplesheet.tsv
├── multiqc
│ ├── multiqc_data
│ ├── multiqc_plots
│ └── multiqc_report.html
├── overall_summary.tsv
├── phyloseq
│ └── dada2_phyloseq.rds
├── pipeline_info
│ ├── execution_report_2025-03-28_10-55-51.html
│ ├── execution_timeline_2025-03-28_10-55-51.html
│ ├── execution_trace_2025-03-28_10-55-51.txt
│ ├── params_2025-03-28_10-56-00.json
│ ├── pipeline_dag_2025-03-28_10-55-51.html
│ └── software_versions.yml
├── qiime2
│ ├── abundance_tables
│ ├── alpha-rarefaction
│ ├── ancom
│ ├── barplot
│ ├── diversity
│ ├── input
│ ├── phylogenetic_tree
│ ├── rel_abundance_tables
│ └── representative_sequences
└── summary_report
    ├── dada2_taxonomic_classification_per_taxonomy_level.svg
    ├── evenness_vector_spearman.svg
    ├── faith_pd_vector_spearman.svg
    ├── observed_features_vector_spearman.svg
    ├── rrna_detection_with_barrnap.svg
    ├── shannon_vector_spearman.svg
    ├── stacked_barchart_of_reads.svg
    ├── summary_report.html
    └── versions.yml
Move to the results/qiime2/diversity folder and evaluate the alpha diversity results. In particular, open the interactive “index.html” report for each alpha diversity metric generated:
results/qiime2/diversity/alpha_diversity/
├── evenness_vector
│ ├── column-condition.jsonp
│ ├── dist
│ ├── index.html
│ ├── kruskal-wallis-pairwise-condition.csv
│ ├── metadata.tsv
│ └── q2templateassets
├── faith_pd_vector
│ ├── column-condition.jsonp
│ ├── dist
│ ├── index.html
│ ├── kruskal-wallis-pairwise-condition.csv
│ ├── metadata.tsv
│ └── q2templateassets
├── observed_features_vector
│ ├── column-condition.jsonp
│ ├── dist
│ ├── index.html
│ ├── kruskal-wallis-pairwise-condition.csv
│ ├── metadata.tsv
│ └── q2templateassets
└── shannon_vector
    ├── column-condition.jsonp
    ├── dist
    ├── index.html
    ├── kruskal-wallis-pairwise-condition.csv
    ├── metadata.tsv
    └── q2templateassets
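The Kruskal-Wallis tables listed above can also be read straight from the terminal, which is handy when no browser is available. A minimal sketch, using only the folder and file names shown in the listing; show_alpha_stats is a hypothetical helper name and takes the results folder as its argument.

```shell
# Print the pairwise Kruskal-Wallis table for each alpha diversity metric.
show_alpha_stats() {
    base="$1/qiime2/diversity/alpha_diversity"
    for metric in evenness faith_pd observed_features shannon; do
        csv="$base/${metric}_vector/kruskal-wallis-pairwise-condition.csv"
        echo "== $metric =="
        if [ -f "$csv" ]; then
            cat "$csv"
        else
            echo "(not found: $csv)"
        fi
    done
}

show_alpha_stats results
```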
Move to the results/qiime2/diversity folder and evaluate the beta diversity results. In particular, open the interactive “index.html” report for each beta diversity metric generated:
results/qiime2/diversity/beta_diversity/
├── bray_curtis_distance_matrix-condition
│ ├── control-boxplots.pdf
│ ├── control-boxplots.png
│ ├── illumina-boxplots.pdf
│ ├── illumina-boxplots.png
│ ├── index.html
│ ├── permanova-pairwise.csv
│ ├── q2templateassets
│ └── raw_data.tsv
├── bray_curtis_pcoa_results-PCoA
│ ├── css
│ ├── emperor.html
│ ├── img
│ ├── index.html
│ ├── js
│ ├── q2templateassets
│ ├── templates
│ └── vendor
├── jaccard_distance_matrix-condition
│ ├── control-boxplots.pdf
│ ├── control-boxplots.png
│ ├── illumina-boxplots.pdf
│ ├── illumina-boxplots.png
│ ├── index.html
│ ├── permanova-pairwise.csv
│ ├── q2templateassets
│ └── raw_data.tsv
├── jaccard_pcoa_results-PCoA
│ ├── css
│ ├── emperor.html
│ ├── img
│ ├── index.html
│ ├── js
│ ├── q2templateassets
│ ├── templates
│ └── vendor
├── unweighted_unifrac_distance_matrix-condition
│ ├── control-boxplots.pdf
│ ├── control-boxplots.png
│ ├── illumina-boxplots.pdf
│ ├── illumina-boxplots.png
│ ├── index.html
│ ├── permanova-pairwise.csv
│ ├── q2templateassets
│ └── raw_data.tsv
├── unweighted_unifrac_pcoa_results-PCoA
│ ├── css
│ ├── emperor.html
│ ├── img
│ ├── index.html
│ ├── js
│ ├── q2templateassets
│ ├── templates
│ └── vendor
├── weighted_unifrac_distance_matrix-condition
│ ├── control-boxplots.pdf
│ ├── control-boxplots.png
│ ├── illumina-boxplots.pdf
│ ├── illumina-boxplots.png
│ ├── index.html
│ ├── permanova-pairwise.csv
│ ├── q2templateassets
│ └── raw_data.tsv
└── weighted_unifrac_pcoa_results-PCoA
    ├── css
    ├── emperor.html
    ├── img
    ├── index.html
    ├── js
    ├── q2templateassets
    ├── templates
    └── vendor
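To compare the four distance metrics side by side, the permanova-pairwise.csv files listed above can be stacked into one table with the metric name as an extra first column. This is a sketch only: collect_permanova is a hypothetical helper name, and it assumes that each CSV has a single header row shared by all four files.

```shell
# Stack the PERMANOVA tables of all beta diversity metrics into one CSV,
# adding the metric name as the first column.
collect_permanova() {
    base="$1/qiime2/diversity/beta_diversity"
    header_done=0
    for metric in bray_curtis jaccard unweighted_unifrac weighted_unifrac; do
        csv="$base/${metric}_distance_matrix-condition/permanova-pairwise.csv"
        [ -f "$csv" ] || continue
        # Print the header once, taken from the first file found.
        if [ "$header_done" -eq 0 ]; then
            printf 'metric,%s\n' "$(head -n 1 "$csv")"
            header_done=1
        fi
        # Prefix every data row with the metric name.
        tail -n +2 "$csv" | sed "s/^/$metric,/"
    done
}

collect_permanova results
```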