Overview
Nextflow is a pipeline engine that can take advantage of the batch nature of the HPC environment to run bioinformatics workflows quickly and efficiently.
For more information about Nextflow, please visit Nextflow - A DSL for parallel and scalable computational pipelines (https://www.nextflow.io/).
The ampliseq pipeline is a bioinformatics analysis pipeline used for 16S rRNA amplicon sequencing data.
https://github.com/nf-core/ampliseq
Install Nextflow locally
Start an interactive PBS session: qsub -I -S /bin/bash -l walltime=168:00:00 -l select=1:ncpus=4:mem=8gb
Start a tmux session so the run survives disconnections: tmux
Load Java (required by Nextflow): module load java
Test Nextflow: nextflow run nf-core/ampliseq -profile test,singularity --metadata "Metadata.tsv"
This failed: the data files listed in Metadata.tsv were not found. Local copies of the files are needed for this parameter.
Tried without metadata: nextflow run nf-core/ampliseq -profile test,singularity
This initially ran, then failed during trimming with:
cutadapt: error: pigz: abort: read error on 1_S103_L001_R2_001.fastq.gz (No such file or directory) (exit code 2)
It looks like the pipeline cannot pull down the remote files, so they need to be pulled locally. It kept failing with a different error every time, which looks like an HPC issue. One intermittent error: the Metadata.tsv file could not be downloaded.
Solution: download all the test files locally into the test run directory (wget each of them):
1. Metadata file: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/Metadata.tsv
2. Classifier: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-gg_13_8-85-qiime2_2019.7-classifier.qza
3. Fastq files:
1_S103: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/1_S103_L001_R1_001.fastq.gz and https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/1_S103_L001_R2_001.fastq.gz
1a_S103: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/1a_S103_L001_R1_001.fastq.gz and https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/1a_S103_L001_R2_001.fastq.gz
2_S115: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/2_S115_L001_R1_001.fastq.gz and https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/2_S115_L001_R2_001.fastq.gz
2a_S115: https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/2a_S115_L001_R1_001.fastq.gz and https://github.com/nf-core/test-datasets/raw/ampliseq/testdata/2a_S115_L001_R2_001.fastq.gz
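Since all of these live under the same testdata URL, one way to fetch them in a single pass (a sketch, run from the test directory; BASE is just shorthand introduced here):
BASE=https://github.com/nf-core/test-datasets/raw/ampliseq/testdata
wget "$BASE/Metadata.tsv"
wget "$BASE/GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-gg_13_8-85-qiime2_2019.7-classifier.qza"
# paired-end fastq files for each of the four test samples
for s in 1_S103 1a_S103 2_S115 2a_S115; do
    wget "$BASE/${s}_L001_R1_001.fastq.gz" "$BASE/${s}_L001_R2_001.fastq.gz"
done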
Create a custom test.config file that points to these local files:
NOTE: we should have a standard place on the HPC for these files that we can point (or symlink) to in the local nextflow.config script.
params {
    classifier = "GTGYCAGCMGCCGCGGTAA-GGACTACNVGGGTWTCTAAT-gg_13_8-85-qiime2_2019.7-classifier.qza"
    metadata = "Metadata.tsv"
    readPaths = [
        ['1_S103', ['1_S103_L001_R1_001.fastq.gz', '1_S103_L001_R2_001.fastq.gz']],
        ['1a_S103', ['1a_S103_L001_R1_001.fastq.gz', '1a_S103_L001_R2_001.fastq.gz']],
        ['2_S115', ['2_S115_L001_R1_001.fastq.gz', '2_S115_L001_R2_001.fastq.gz']],
        ['2a_S115', ['2a_S115_L001_R1_001.fastq.gz', '2a_S115_L001_R2_001.fastq.gz']]
    ]
}
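With the files downloaded and the above saved as test.config in the run directory, the test can then be pointed at the local copies (a sketch; -c supplies an additional config file on top of the test profile):
nextflow run nf-core/ampliseq -profile test,singularity -c test.config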
Run Nextflow Tower
Stepping past the test dataset to Mahsa's data:
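As a starting point, a hedged sketch of what the real-data run command might look like. The --reads folder path is an assumption for illustration, the primer sequences are taken from the classifier filename above, and the exact option names should be confirmed against nextflow run nf-core/ampliseq --help for the installed pipeline version:
nextflow run nf-core/ampliseq -profile singularity \
    --reads "path/to/fastq/folder" \
    --FW_primer GTGYCAGCMGCCGCGGTAA \
    --RV_primer GGACTACNVGGGTWTCTAAT \
    --metadata "Metadata.tsv"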
Q for Craig:
Do we need to add any of this to the ~/.nextflow/config file? Perhaps just for Tower?
process {
    executor = 'pbspro'
    scratch = true
    // use a single-quoted string so $HOME is expanded by the shell, not by Groovy
    beforeScript = '''
        mkdir -p /data1/whatmorp/singularity/mnt/session
        source $HOME/.bashrc
        source $HOME/.profile
        '''
}
singularity {
    cacheDir = '/home/whatmorp/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '/home/whatmorp/NXF_CONDA_CACHEDIR'
}
tower {
    // paste your own token from the Tower web UI; don't publish real tokens on the wiki
    accessToken = '<your-tower-access-token>'
    endpoint = 'https://nftower.qut.edu.au/api'
    enabled = true
}
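With enabled = true in the config, runs should report to the Tower endpoint automatically. A sketch of the alternative, using Nextflow's standard TOWER_ACCESS_TOKEN environment variable and the -with-tower flag instead of a hardcoded token:
export TOWER_ACCESS_TOKEN=<your-tower-access-token>
nextflow run nf-core/ampliseq -profile test,singularity -with-tower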
Installing Nextflow
Follow the eResearch wiki entry for installing Nextflow:
...
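In case the wiki entry is unavailable, a typical install uses Nextflow's standard self-installing launcher (a sketch; ~/bin as the install target is an assumption, use any directory on your PATH):
module load java                        # Nextflow requires a recent Java runtime
curl -s https://get.nextflow.io | bash  # downloads a ./nextflow launcher script
mv nextflow ~/bin/                      # assumption: ~/bin exists and is on your PATH
nextflow -version                       # confirm the install works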
If you haven't been set up on the HPC, or haven't used it previously, click on this link for information on how to get access to and use the HPC:
Need a link here for HPC access and usage
Creating a shared workspace on the HPC
...
To request a node using PBS, submit a shell script containing your RAM/CPU/walltime requirements and the code needed to run your analysis. For an overview of submitting a PBS job, see here:
Need a link here for creating PBS jobs
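A minimal sketch of such a submission script (the job name, resource requests, and script name are illustrative only; adjust for your analysis):
#!/bin/bash
#PBS -N ampliseq_run
#PBS -l select=1:ncpus=4:mem=8gb
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR    # start in the directory the job was submitted from
module load java
nextflow run nf-core/ampliseq -profile test,singularity -c test.config
Submit it with qsub run_ampliseq.sh (or whatever the script is named).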
Alternatively, you can start up an 'interactive' node using the following:
...
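For example, the interactive session command used earlier in these notes:
qsub -I -S /bin/bash -l walltime=168:00:00 -l select=1:ncpus=4:mem=8gb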