Overview

Initial requirements

To be able to run these exercises, you’ll need:

An HPC account
Nextflow installed
Access to your HPC home directory from your PC

Instructions for getting an HPC account are here: https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/getting-started-with-hpc

If you haven’t installed Nextflow, follow the instructions in this link: Installing Nextflow
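A quick way to confirm that Nextflow is installed is to check that the command is on your PATH. The snippet below is a minimal sketch; `check_tool` is a hypothetical helper name introduced here for illustration, not part of Nextflow or the HPC setup.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# Check for Nextflow and, if it is present, print its version.
if check_tool nextflow; then
    nextflow -version
fi
```

If the check reports that Nextflow is not found, work through the installation instructions before continuing.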

Set up Windows File Explorer to access your HPC home account. Follow the instructions here:

https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/using-hpc-filesystems

Finally, it would be VERY useful to have completed session 1 of these workshops (Intro to the HPC). If you haven’t, you can watch some videos that go over the basics: https://mediahub.qut.edu.au/media/t/0_d0bsv333

Interactive HPC session

In sessions 2 and 3 (variant calling) we submitted jobs to the HPC via a PBS script. This is useful for large datasets that require lots of processing time or resources. For smaller datasets (like 16S amplicon sequence data), you can start ‘interactive mode’ on the HPC, which allocates you a temporary node with the RAM and CPUs you request.

Open PuTTY and paste the text below into the command prompt:

qsub -I -S /bin/bash -l walltime=4:00:00 -l select=1:ncpus=16:mem=128gb

After a few minutes interactive mode will start. You will now be able to do all your analysis - including running Nextflow and Nanopore workflows - in this interactive session.

NOTE: I’ve selected 16 CPUs and 128 GB of memory. This is based on testing of the Nextflow workflows we’ll be using and their CPU/memory requirements.
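Once the prompt returns, it can be worth confirming that you really are inside the interactive job rather than still on the login node. The sketch below relies on the `PBS_ENVIRONMENT` and `PBS_JOBID` variables that PBS sets inside a job; `in_interactive_job` is a hypothetical helper name used only for this example.

```shell
# Report whether the current shell is inside a PBS interactive job.
# PBS sets PBS_ENVIRONMENT to PBS_INTERACTIVE for jobs started with qsub -I.
in_interactive_job() {
    [ "${PBS_ENVIRONMENT:-}" = "PBS_INTERACTIVE" ]
}

if in_interactive_job; then
    echo "Interactive job ${PBS_JOBID:-unknown} on $(hostname)"
else
    echo "Not in an interactive PBS session (still on the login node?)"
fi
```

If the second message appears, wait for the `qsub -I` command to finish starting the session before running any analysis.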

Create working directories

We’ll be analysing both Illumina and Nanopore data, so first we need to create the workshop directories in your home drive on the HPC. Copy and paste the following into PuTTY:

cd $HOME
mkdir meta_workshop
mkdir meta_workshop/illumina
mkdir meta_workshop/illumina/fastq
mkdir meta_workshop/nanopore
mkdir meta_workshop/nanopore/fastq
cd meta_workshop
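The `mkdir` commands above report an error if a directory already exists, for example when the setup is repeated. As an alternative, the sketch below uses `mkdir -p`, which creates any missing parent directories and does nothing for directories that are already there. `make_workshop_dirs` is a hypothetical helper name; the directory layout is the one used above.

```shell
# Hypothetical helper: create the workshop directory tree under a base
# directory (defaults to $HOME) and list the directories that now exist.
make_workshop_dirs() {
    base="${1:-$HOME}"
    # -p creates parent directories as needed and is safe to re-run.
    mkdir -p "$base/meta_workshop/illumina/fastq" \
             "$base/meta_workshop/nanopore/fastq"
    find "$base/meta_workshop" -type d | sort
}
```

Calling `make_workshop_dirs` with no argument builds the tree under your home directory; follow it with `cd $HOME/meta_workshop` as before.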

Modify Nextflow to run in ‘local’ mode

Since we’re not submitting our Nextflow run as a PBS script, we’ll need to change the parameters in the Nextflow config file to reflect this.

The following commands (run in PuTTY) will open your Nextflow config file in a text editor called nano.

module load nano
nano $HOME/.nextflow/config

Near the top of the file you’ll see a line that says executor = 'pbspro'

Change this to executor = 'local'

Then save the file by pressing Ctrl+O (press Enter to confirm the file name), and exit nano with Ctrl+X.
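If you prefer not to edit the file by hand, the same change can be sketched with `sed`. This assumes the line is written exactly as shown above (`executor = 'pbspro'`, with single spaces around the equals sign); `set_local_executor` is a hypothetical helper name, not a Nextflow command.

```shell
# Hypothetical helper: switch the executor setting in a Nextflow config
# file from 'pbspro' to 'local', keeping a backup copy with a .bak suffix.
set_local_executor() {
    config="$1"
    # -i.bak edits the file in place and saves the original as <file>.bak
    sed -i.bak "s/executor = 'pbspro'/executor = 'local'/" "$config"
    # Print the executor line(s) so the result can be checked by eye.
    grep -n "executor" "$config"
}
```

For example, `set_local_executor $HOME/.nextflow/config` edits your config and prints the updated line. The `.bak` copy can be restored later if you go back to submitting Nextflow runs through PBS.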

Downloading a public dataset

The dataset we’ll be using is from this paper: https://www.mdpi.com/2073-4425/11/9/1105 (more details in the Overview section).

The data is hosted by the European Nucleotide Archive (ENA). In the ENA Browser (https://www.ebi.ac.uk/ena/browser/view/PRJEB28612) you can find the project by the accession number listed in the paper: PRJEB28612. The ENA Browser can then generate a download script to run on a Linux command line.

To save time, I’ve already created this script and downloaded the dataset to the HPC. You’ll just need to copy these files to your workshop directories.

Copy the Illumina and Nanopore fastq files to their respective workshop directories like so:

cp /work/training/metagenomics/public_data/Illumina*.fastq.gz illumina/fastq
cp /work/training/metagenomics/public_data/Nanopore*.fastq.gz nanopore/fastq

This will copy the fastq files into your meta_workshop/illumina/fastq and meta_workshop/nanopore/fastq directories.
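Before moving on, you may want to check that the files arrived intact. The sketch below counts the `.fastq.gz` files in a directory and tests each one with `gzip -t`, which verifies the compressed data without extracting it. `check_fastq_dir` is a hypothetical helper name, and no particular file count is assumed here.

```shell
# Hypothetical helper: count the .fastq.gz files in a directory and
# test each one with gzip -t to detect corrupt or truncated files.
check_fastq_dir() {
    dir="$1"
    count=0
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue      # skip if the glob matched nothing
        if gzip -t "$f"; then
            count=$((count + 1))
        else
            echo "Corrupt or incomplete file: $f" >&2
        fi
    done
    echo "$dir: $count valid fastq.gz file(s)"
}
```

From inside meta_workshop, run `check_fastq_dir illumina/fastq` and `check_fastq_dir nanopore/fastq`. A count of zero, or any "Corrupt or incomplete" message, suggests the copy step should be repeated.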

Now we can go to the next section: Illumina using nfcore/ampliseq
