Overview
Nextflow is a pipeline engine that can take advantage of the batch nature of the HPC environment to run bioinformatics workflows quickly and efficiently.
For more information about Nextflow, please visit Nextflow - A DSL for parallel and scalable computational pipelines (https://www.nextflow.io/).
The ampliseq pipeline is a bioinformatics analysis pipeline used for 16S rRNA amplicon sequencing data.
https://github.com/nf-core/ampliseq
Install Nextflow locally
Start an interactive PBS session: qsub -I -S /bin/bash -l walltime=168:00:00 -l select=1:ncpus=4:mem=8gb
Start a tmux session so the run survives disconnections: tmux
Load Java (required by Nextflow): module load java
Test Nextflow: nextflow run nf-core/ampliseq -profile test,singularity --metadata "Metadata.tsv"
This failed: the data files listed in Metadata.tsv were not found. Local copies of the files are needed for this parameter.
Tried without metadata: nextflow run nf-core/ampliseq -profile test,singularity
This initially ran, then failed during trimming with:
cutadapt: error: pigz: abort: read error on 1_S103_L001_R2_001.fastq.gz (No such file or directory) (exit code 2)
It looks like the pipeline cannot pull down the remote files, so they need to be pulled locally. It kept failing with a different error every time, which looks like an HPC issue. One intermittent error: the Metadata.tsv file could not be downloaded.
Solution: download all the test files locally into the test run directory (wget each of them):
1. Metadata file: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/Metadata.tsv
2. Classifier: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-gg_13_8-85-qiime2_2019.7-classifier.qza
3. Fastq files:
1_S103: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/1_S103_L001_R1_001.fastq.gz and https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/1_S103_L001_R2_001.fastq.gz
1a_S103: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/1a_S103_L001_R1_001.fastq.gz and https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/1a_S103_L001_R2_001.fastq.gz
2_S115: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/2_S115_L001_R1_001.fastq.gz and https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/2_S115_L001_R2_001.fastq.gz
2a_S115: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/2a_S115_L001_R1_001.fastq.gz and https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/2a_S115_L001_R2_001.fastq.gz
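Since all of these live under the same testdata URL, one way to fetch them in a single pass (a sketch, run from the test directory; BASE is just shorthand introduced here):
BASE=https://github.com/nf-core/test-datasets/raw/ampliseq/testdata
wget "$BASE/Metadata.tsv"
wget "$BASE/GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-gg_13_8-85-qiime2_2019.7-classifier.qza"
# paired-end fastq files for each of the four test samples
for s in 1_S103 1a_S103 2_S115 2a_S115; do
    wget "$BASE/${s}_L001_R1_001.fastq.gz" "$BASE/${s}_L001_R2_001.fastq.gz"
done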
Create a custom test.config file that points to these local files:
NOTE: we should have a standard place on the HPC for these files that we can point (or symlink) to in the local nextflow.config script.
params {
    classifier = "GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-gg_13_8-85-qiime2_2019.7-classifier.qza"
    metadata = "Metadata.tsv"
    readPaths = [
        ['1_S103', ['1_S103_L001_R1_001.fastq.gz', '1_S103_L001_R2_001.fastq.gz']],
        ['1a_S103', ['1a_S103_L001_R1_001.fastq.gz', '1a_S103_L001_R2_001.fastq.gz']],
        ['2_S115', ['2_S115_L001_R1_001.fastq.gz', '2_S115_L001_R2_001.fastq.gz']],
        ['2a_S115', ['2a_S115_L001_R1_001.fastq.gz', '2a_S115_L001_R2_001.fastq.gz']]
    ]
}
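With the files downloaded and the above saved as test.config in the run directory, the test can then be pointed at the local copies (a sketch; -c supplies an additional config file on top of the test profile):
nextflow run nf-core/ampliseq -profile test,singularity -c test.config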
Run Nextflow Tower
Stepping past the test dataset to Mahsa's data:
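As a starting point, a hedged sketch of what the real-data run command might look like. The --reads folder path is an assumption for illustration, the primer sequences are taken from the classifier filename above, and the exact option names should be confirmed against nextflow run nf-core/ampliseq --help for the installed pipeline version:
nextflow run nf-core/ampliseq -profile singularity \
    --reads "path/to/fastq/folder" \
    --FW_primer GTGYCAGCMGCCGCGGTAA \
    --RV_primer GGACTACNVGGGTWTCTAAT \
    --metadata "Metadata.tsv"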
Q for Craig:
Do we need to add any of this to the ~/.nextflow/config file? Perhaps just for Tower?
process {
    executor = 'pbspro'
    scratch = true
    // use a single-quoted string so $HOME is expanded by the shell, not by Groovy
    beforeScript = '''
        mkdir -p /data1/whatmorp/singularity/mnt/session
        source $HOME/.bashrc
        source $HOME/.profile
        '''
}
singularity {
    cacheDir = '/home/whatmorp/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '/home/whatmorp/NXF_CONDA_CACHEDIR'
}
tower {
    // paste your own token from the Tower web UI; don't publish real tokens on the wiki
    accessToken = '<your-tower-access-token>'
    endpoint = 'https://nftower.qut.edu.au/api'
    enabled = true
}
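With enabled = true in the config, runs should report to the Tower endpoint automatically. A sketch of the alternative, using Nextflow's standard TOWER_ACCESS_TOKEN environment variable and the -with-tower flag instead of a hardcoded token:
export TOWER_ACCESS_TOKEN=<your-tower-access-token>
nextflow run nf-core/ampliseq -profile test,singularity -with-tower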
Installing Nextflow
Follow the eResearch wiki entry for installing Nextflow:
...
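In case the wiki entry is unavailable, a typical install uses Nextflow's standard self-installing launcher (a sketch; ~/bin as the install target is an assumption, use any directory on your PATH):
module load java                        # Nextflow requires a recent Java runtime
curl -s https://get.nextflow.io | bash  # downloads a ./nextflow launcher script
mv nextflow ~/bin/                      # assumption: ~/bin exists and is on your PATH
nextflow -version                       # confirm the install works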
If you haven't been set up on the HPC, or haven't used it previously, click on this link for information on how to get access to and use the HPC:
Need a link here for HPC access and usage
Creating a shared workspace on the HPC
...
To request a node using PBS, submit a shell script containing your RAM/CPU/walltime requirements and the code needed to run your analysis. For an overview of submitting a PBS job, see here:
Need a link here for creating PBS jobs
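A minimal sketch of such a submission script (the job name, resource requests, and script name are illustrative only; adjust for your analysis):
#!/bin/bash
#PBS -N ampliseq_run
#PBS -l select=1:ncpus=4:mem=8gb
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR    # start in the directory the job was submitted from
module load java
nextflow run nf-core/ampliseq -profile test,singularity -c test.config
Submit it with qsub run_ampliseq.sh (or whatever the script is named).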
Alternatively, you can start up an 'interactive' node using the following:
...
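For example, the interactive session command used earlier in these notes:
qsub -I -S /bin/bash -l walltime=168:00:00 -l select=1:ncpus=4:mem=8gb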