Overview

Initial requirements

To be able to run these exercises, you’ll need:

Instructions for getting an HPC account are here: https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/getting-started-with-hpc

If you haven’t installed Nextflow, follow the instructions in this link: 25S1W1 - 1. Getting started with Nextflow

You’ll need Terminal (Mac users) or PuTTY (Windows users) on your PC to access the HPC.

You can download PuTTY from here: https://the.earth.li/~sgtatham/putty/latest/w64/putty.exe

Then open PuTTY, enter the HPC (Lyra) address aqua.qut.edu.au in the ‘Host Name’ field, and click ‘Open’.

Set up Windows File Explorer to access your HPC home account. Follow the instructions here:

https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/using-hpc-filesystems

Interactive HPC session

Open Terminal (Mac users) or PuTTY (Windows users) and paste the text below into the command prompt to start an interactive session:

qsub -I -S /bin/bash -l walltime=4:00:00 -l select=1:ncpus=4:mem=8gb

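The interactive session should start in less than a minute. It will allow you to run commands directly on a compute node with the resources requested above (4 CPUs and 8 GB of memory, for up to 4 hours).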
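To confirm that the session has started, you can check the environment variables that PBS sets inside a job. The following is a minimal sketch, not part of the workshop scripts: the function name session_status is hypothetical, and it assumes the scheduler exports PBS_JOBID and PBS_ENVIRONMENT, as PBS normally does.

```shell
# Report whether the current shell is running inside a PBS job.
# session_status is a hypothetical helper name used for illustration only.
session_status() {
    if [ -n "${PBS_JOBID:-}" ]; then
        # PBS_ENVIRONMENT is usually PBS_INTERACTIVE for sessions started with qsub -I
        echo "Inside PBS job ${PBS_JOBID} (${PBS_ENVIRONMENT:-unknown}) on $(hostname)"
    else
        echo "Not inside a PBS job"
    fi
}

session_status
```

If the message says you are not inside a PBS job, the interactive session has probably not started yet or has already ended.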

Create working directories

We’ll be analysing Illumina amplicon data, so first we need to create the workshop directories in your home directory on the HPC. Copy and paste the commands below into PuTTY or Terminal.

To make sure you are in your home directory, run the following command:

cd $HOME

Next, let’s create working folders for today's exercises:

mkdir -p $HOME/workshop/2025/S1W1/metagenomics/scripts
mkdir -p $HOME/workshop/2025/S1W1/metagenomics/runs
mkdir -p $HOME/workshop/2025/S1W1/metagenomics/runs/run1_test
mkdir -p $HOME/workshop/2025/S1W1/metagenomics/runs/run2_ampliseq
mkdir -p $HOME/workshop/2025/S1W1/metagenomics/data/illumina
mkdir -p $HOME/workshop/2025/S1W1/metagenomics/data/mydata
cd $HOME/workshop/2025/S1W1/metagenomics
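The same folder structure can also be created with a short loop, which is handy if you want to reuse it for another project. This is only a sketch: the function name make_workshop_dirs is hypothetical, and the folder names are the ones used in the commands above.

```shell
# Create the workshop folder structure under a given base directory
# and print the resulting directory tree.
# make_workshop_dirs is a hypothetical helper name used for illustration only.
make_workshop_dirs() {
    base=$1
    for sub in scripts runs/run1_test runs/run2_ampliseq data/illumina data/mydata; do
        # -p creates parent folders as needed and does not fail if they already exist
        mkdir -p "$base/$sub"
    done
    find "$base" -type d | sort
}

make_workshop_dirs "$HOME/workshop/2025/S1W1/metagenomics"
```

Because mkdir -p is safe to repeat, running this after the commands above does not change anything; it simply lists the folders so you can check that they are all there.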

Copy scripts for exercises

Let’s now copy the scripts for today’s workshop:

cp /work/training/2025/S1W1/session3_metagenomics/scripts/* $HOME/workshop/2025/S1W1/metagenomics/scripts

Check the list of scripts:

ls -l $HOME/workshop/2025/S1W1/metagenomics/scripts
├── create_samplesheet_nfcore_ampliseq.py
├── launch_nfcore_ampliseq_illumina.pbs
├── launch_nfcore_ampliseq_test.pbs
└── samplesheet.tsv
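If you prefer an automatic check over reading the listing, the sketch below tests for each of the four files shown above. The function name check_scripts is hypothetical and is not one of the workshop scripts.

```shell
# Check that the four expected workshop files are present in a folder.
# check_scripts is a hypothetical helper name used for illustration only.
check_scripts() {
    dir=$1
    missing=0
    for name in create_samplesheet_nfcore_ampliseq.py \
                launch_nfcore_ampliseq_illumina.pbs \
                launch_nfcore_ampliseq_test.pbs \
                samplesheet.tsv; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $name"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing"
}

check_scripts "$HOME/workshop/2025/S1W1/metagenomics/scripts"
```

If any file is reported as missing, repeat the cp command above before continuing.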

Copy public data

Now let’s copy previously downloaded Illumina amplicon data:

cp /work/training/2025/S1W1/session3_metagenomics/data/illumina/* $HOME/workshop/2025/S1W1/metagenomics/data/illumina

This can take a couple of minutes. To check that you have copied the data you can do the following:

ls $HOME/workshop/2025/S1W1/metagenomics/data/illumina
create_samplesheet_nfcore_ampliseq.py*  Illumina24.fastq.gz  Illumina39.fastq.gz  Illumina53.fastq.gz
Illumina10.fastq.gz                     Illumina25.fastq.gz  Illumina3.fastq.gz   Illumina54.fastq.gz
Illumina11.fastq.gz                     Illumina26.fastq.gz  Illumina40.fastq.gz  Illumina55.fastq.gz
Illumina12.fastq.gz                     Illumina27.fastq.gz  Illumina41.fastq.gz  Illumina56.fastq.gz
Illumina13.fastq.gz                     Illumina28.fastq.gz  Illumina42.fastq.gz  Illumina57.fastq.gz
Illumina14.fastq.gz                     Illumina29.fastq.gz  Illumina43.fastq.gz  Illumina58.fastq.gz
Illumina15.fastq.gz                     Illumina2.fastq.gz   Illumina44.fastq.gz  Illumina59.fastq.gz
Illumina16.fastq.gz                     Illumina30.fastq.gz  Illumina45.fastq.gz  Illumina5.fastq.gz
Illumina17.fastq.gz                     Illumina31.fastq.gz  Illumina46.fastq.gz  Illumina6.fastq.gz
Illumina18.fastq.gz                     Illumina32.fastq.gz  Illumina47.fastq.gz  Illumina7.fastq.gz
Illumina19.fastq.gz                     Illumina33.fastq.gz  Illumina48.fastq.gz  Illumina8.fastq.gz
Illumina1.fastq.gz                      Illumina34.fastq.gz  Illumina49.fastq.gz  Illumina9.fastq.gz
Illumina20.fastq.gz                     Illumina35.fastq.gz  Illumina4.fastq.gz   samplesheet.tsv
Illumina21.fastq.gz                     Illumina36.fastq.gz  Illumina50.fastq.gz
Illumina22.fastq.gz                     Illumina37.fastq.gz  Illumina51.fastq.gz
Illumina23.fastq.gz                     Illumina38.fastq.gz  Illumina52.fastq.gz
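The listing above contains 59 FASTQ files (Illumina1 to Illumina59). Rather than counting them by eye, you can count them and test that each one is a complete gzip file. This is a sketch only: the function name check_fastq is hypothetical, and it assumes the gzip command is available on the HPC.

```shell
# Count the *.fastq.gz files in a folder and flag any that fail the gzip integrity test.
# check_fastq is a hypothetical helper name used for illustration only.
check_fastq() {
    dir=$1
    n=0
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue    # the pattern matched no files
        n=$((n + 1))
        # gzip -t reads the whole file and fails if it is truncated or not gzip data
        gzip -t "$f" 2>/dev/null || echo "corrupt or incomplete: $f"
    done
    echo "$n FASTQ files found"
}

check_fastq "$HOME/workshop/2025/S1W1/metagenomics/data/illumina"
```

A count below 59, or any file reported as corrupt or incomplete, suggests the copy was interrupted and should be repeated.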

Let’s move to the data folder:

cd $HOME/workshop/2025/S1W1/metagenomics/data

Now we are ready for the next exercise: downloading public data from the European Nucleotide Archive (ENA): https://www.ebi.ac.uk/ena/browser/view/PRJEB28612

Next page

25S1W1 - 3. Download public metagenomics data from ENA