...

Running PBS jobs on the HPC Confluence page

Note: the wiki page for running PBS jobs is still in development. In the meantime, run an interactive PBS session, as described below in the ‘Alternative to submitting PBS job: interactive session’ section.

Directory structure

When a NextFlow pipeline is run, it generates multiple directories and output files. We therefore recommend you create a directory where you run all your NextFlow pipelines, so that you don’t have output directories and files scattered across your home directory.

...

Code Block
cd ~
mkdir nextflow

Alternative to submitting PBS job: interactive session.

Run tmux first, so that the job keeps running when you log off.

Code Block
module load tmux
tmux
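Giving the tmux session a name can make it easier to find again after you log back in. The following is a minimal sketch, not part of the original instructions: the function name start_session and the default session name ampliseq are hypothetical. It relies on tmux setting the TMUX environment variable inside a session, and on the -A option of new-session, which attaches to the named session if it already exists.

```shell
# Start (or re-attach to) a named tmux session.
# "start_session" and the default name "ampliseq" are illustrative only.
start_session() {
    name="${1:-ampliseq}"
    # tmux sets $TMUX inside a session; avoid nesting sessions
    if [ -n "${TMUX:-}" ]; then
        echo "already inside tmux"
        return 0
    fi
    # -A: attach to the session if it exists, otherwise create it
    tmux new-session -A -s "$name"
}
```

After 'module load tmux', running 'start_session' on the same node where the session was first created should bring you back to it. Detach without stopping the job using Ctrl-b followed by d.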

Start an interactive PBS session:

Code Block
qsub -I -S /bin/bash -l walltime=168:00:00 -l select=1:ncpus=4:mem=8gb
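In the command above, -I requests an interactive job, -S sets the shell, walltime=168:00:00 asks for up to 168 hours (7 days), and select=1:ncpus=4:mem=8gb asks for one chunk with 4 CPUs and 8 GB of memory. If you need different resources, one option is to build the command from variables, as sketched below. The variable names are hypothetical, and the limits allowed on your HPC may differ.

```shell
# Resource settings (same values as the command above; adjust as needed)
WALLTIME="168:00:00"   # maximum run time, hours:minutes:seconds
NCPUS=4                # number of CPUs
MEM="8gb"              # memory

# Assemble the qsub command and print it for checking before use
QSUB_CMD="qsub -I -S /bin/bash -l walltime=${WALLTIME} -l select=1:ncpus=${NCPUS}:mem=${MEM}"
echo "$QSUB_CMD"
```

Once the printed command looks right, copy and run it to start the session.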

...

As mentioned above, spelling mistakes or extra characters in the file paths will cause ampliseq to fail. One way to avoid this is to generate the manifest file on the command line using the Linux tools awk and sed.

Below is an example of how to generate the manifest file. You may need to modify this, depending on how your files are named.

...

NOTE: This is an example nextflow.config file. Don’t simply copy and paste the above. You’ll need to modify the FW_primer = and RV_primer = lines to reflect the primers you used to generate your sequences.

The remaining lines can stay the same, presuming that you called your metadata file 'metadata.txt' and you have all the files in the directory where you will be running ampliseq from.

Running NextFlow’s ampliseq pipeline

...

Notes on amplicon primers

There are multiple sets of amplicon primers, designed to amplify different regions of the 16S gene. Your sequencing company should tell you which primers were used.

The standard Illumina protocol for 16S V3 and V4 region amplicons is here:

https://support.illumina.com/documents/documentation/chemistry_documentation/16s/16s-metagenomic-library-prep-guide-15044223-b.pdf

Note the forward and reverse primers:

16S Amplicon PCR Forward Primer = 5' TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG

16S Amplicon PCR Reverse Primer = 5' GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC

These are not the primers you use in the nextflow.config file.

These Illumina primers contain overhang sequences that don’t anneal to any known DNA region:

Forward overhang: 5’ TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-[locus-specific sequence]

Reverse overhang: 5’ GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-[locus-specific sequence]

The correct primers to use in your nextflow.config file are the 16S Amplicon primers with the overhang sequences removed.

That is:

FW_primer = "CCTACGGGNGGCWGCAG"

RV_primer = "GACTACHVGGGTATCTAATCC"

Again, this is only for the Illumina 16S V3 and V4 region amplicons. If you’ve amplified a different region, you’ll need to provide different primers. If you’re using Illumina, look out for overhang sequences!
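Because each locus-specific primer is simply the full Illumina primer with the overhang stripped from its 5' end, the trimming can be done on the command line rather than by hand, which avoids copy-and-paste mistakes. The following is a minimal sketch using shell parameter expansion. The variable names FW_FULL, RV_FULL, FW_OVERHANG and RV_OVERHANG are hypothetical; the sequences are the ones quoted above.

```shell
# Full Illumina 16S V3-V4 primers (overhang + locus-specific sequence)
FW_FULL="TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG"
RV_FULL="GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC"

# Overhang sequences to remove
FW_OVERHANG="TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
RV_OVERHANG="GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# ${var#prefix} removes the prefix from the start of the value
FW_primer="${FW_FULL#"$FW_OVERHANG"}"
RV_primer="${RV_FULL#"$RV_OVERHANG"}"

# Print the lines in the form used in nextflow.config
echo "FW_primer = \"$FW_primer\""   # FW_primer = "CCTACGGGNGGCWGCAG"
echo "RV_primer = \"$RV_primer\""   # RV_primer = "GACTACHVGGGTATCTAATCC"
```

If a primer does not start with the overhang you supplied, the value is left unchanged, so compare the printed result with the original sequence before using it.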

Running NextFlow’s ampliseq pipeline

Make sure Java is loaded (it should already be loaded if you are continuing from the above steps; otherwise run ‘module load java’) and that you have started an interactive PBS session (again, you should already be in one if you are continuing from above).
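Both conditions can be checked quickly before launching the pipeline. The sketch below is an illustration only: the function name check_ready is hypothetical, and it assumes that PBS sets the PBS_JOBID environment variable inside a job, which is the usual behaviour.

```shell
# Report whether java is available and whether we appear to be inside a PBS job.
# "check_ready" is an illustrative name, not part of ampliseq or PBS.
check_ready() {
    status=0
    # command -v succeeds only if java is on the PATH
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found: try 'module load java'"
        status=1
    fi
    # PBS normally sets PBS_JOBID inside a job, including interactive ones
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "PBS_JOBID is not set: probably not inside a PBS session"
        status=1
    fi
    return "$status"
}
```

Running 'check_ready' prints nothing and returns 0 when both checks pass.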

...

An example of how to analyse the results in R is here (in progress):

Downstream analysis of NextFlow ampliseq output (16S amplicon analysis)

...