Prepared by the eResearch Office, QUT.
“Sarek is a workflow designed to detect variants on whole genome or targeted sequencing data”
This page guides QUT users through running the nf-core/sarek workflow on the QUT HPC.
Prerequisite: Anaconda3 or Miniconda3 installed in your account ( https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html )
Further details on the workflow can be found at:
https://nf-co.re/sarek/2.6.1/usage
Install Nextflow
The Sarek workflow requires Nextflow to be installed in your account on the HPC. Details on how to install and test Nextflow, prepare a nextflow.config file, and run a PBS Pro submission script for Nextflow pipelines are available here.
Additional information is available here: https://nf-co.re/usage/installation
The Sarek Workflow Tools
The Sarek workflow will perform the following steps by default:
...
```
params {
    tools = 'HaplotypeCaller,mpileup,snpEff'
}
```
Workflow steps
Sarek is divided into the following steps:
mapping, prepare_recalibration, recalibrate, variant_calling, annotate, ControlFREEC
...
```
# Run the test profile
nextflow run nf-core/sarek -profile test,singularity

# Resume a previous test run
nextflow run nf-core/sarek -profile test,singularity -resume
```
Preparing Data
Starting the Sarek workflow at the “mapping” step requires paired FASTQ files (see the Sarek documentation for details). Create a TAB-separated text file that describes these files; this file is the input for the workflow. For mapping, it needs the following columns (status is 0 for a normal sample and 1 for a tumour sample):
subject sex status sample lane fastq1 fastq2
Example input file, sample.tsv:
```
Subject01	XX	0	Sample01	1	/work/group/data/subject01-sample01_R1.fastq.gz	/work/group/data/subject01-sample01_R2.fastq.gz
Subject02	XX	0	Sample01	1	/work/group/data/subject02-sample01_R1.fastq.gz	/work/group/data/subject02-sample01_R2.fastq.gz
```
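With many samples, writing this file by hand is error-prone. The functions below are a minimal sketch for generating and checking it. They assume (hypothetically) that the FASTQ files follow the `<subject>-<sample>_R1.fastq.gz` / `<subject>-<sample>_R2.fastq.gz` naming used in the example, and that every row shares the same sex, status and lane (defaults XX, 0 and 1; change them to match your data). Subject and sample names are taken from the file names, so they keep the file names' capitalisation. The names `make_sample_tsv` and `check_sample_tsv` are illustrative helpers, not part of Sarek. Note that this TSV layout applies to the Sarek 2.x releases documented above; newer Sarek releases (3.x) expect a comma-separated samplesheet with a header row, so check the usage page for the release you run.

```shell
# Hypothetical helper: print one TAB-separated Sarek row per R1/R2 pair.
# Assumes files are named <subject>-<sample>_R1.fastq.gz / _R2.fastq.gz.
make_sample_tsv() {
  local fastq_dir="$1"
  local sex="${2:-XX}"      # XX or XY
  local status="${3:-0}"    # 0 = normal, 1 = tumour
  local lane="${4:-1}"
  local r1 r2 base subject sample

  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "warning: no R2 file for $r1, skipping" >&2
      continue
    fi
    base=$(basename "$r1" _R1.fastq.gz)      # e.g. subject01-sample01
    subject="${base%%-*}"                    # text before the first '-'
    sample="${base#*-}"                      # text after the first '-'
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
      "$subject" "$sex" "$status" "$sample" "$lane" "$r1" "$r2"
  done
}

# Hypothetical helper: report any row that does not have exactly
# 7 TAB-separated columns; returns non-zero if one is found.
check_sample_tsv() {
  awk -F '\t' '
    NF != 7 { printf "line %d: %d columns\n", NR, NF; bad = 1 }
    END     { exit (bad ? 1 : 0) }
  ' "$1"
}
```

For example, `make_sample_tsv /work/group/data > sample.tsv` followed by `check_sample_tsv sample.tsv`, which prints any offending line numbers. Generating the file with `printf '\t'` guarantees real TAB characters, which are easy to lose when copying from a web page.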
Selecting a Genome
See the Reference Genomes page on the nf-core website (nf-co.re) for details of the available genomes.
...
```
params {
    genome = 'GRCh38'
}
```
Putting it all together
Create a folder to store the run input and output.
...
```
nextflow run nf-core/sarek -profile singularity \
    --input 'input.tsv' --genome 'GRCh38' \
    --tools 'HaplotypeCaller,mpileup,snpEff'
```
Alternatively, store the options in a nextflow.config file in the run folder instead of on the command line:
```
params {
    input  = 'inputsample.tsv'
    genome = 'GRCh38'
    tools  = 'HaplotypeCaller,mpileup,snpEff,VEP,CNVkit'
}

tower {
    accessToken = 'your tower token'
    endpoint    = 'https://nftower.qut.edu.au/api'
    enabled     = true
}
```
Replace 'your tower token' with your own Tower access token. You are assigned a token once you sign in via https://nftower.qut.edu.au/api
With this file in place, the command to run the pipeline is:
```
nextflow run nf-core/sarek -profile singularity
```
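Keeping the Tower token in plain text in nextflow.config makes it easy to share by accident. One possible alternative, sketched below, is to leave `accessToken` out of the file and export the token as the `TOWER_ACCESS_TOKEN` environment variable, which Nextflow uses when `tower.accessToken` is not set in the config. The function name `write_sarek_config` and the input file name `sample.tsv` are illustrative assumptions; adjust them to your run.

```shell
# Hypothetical helper: write nextflow.config into an existing run folder,
# without hard-coding the Tower token (Nextflow falls back to the
# TOWER_ACCESS_TOKEN environment variable when accessToken is not set).
write_sarek_config() {
  local run_dir="$1"
  if [ ! -d "$run_dir" ]; then
    echo "no such folder: $run_dir" >&2
    return 1
  fi
  cat > "$run_dir/nextflow.config" <<'EOF'
params {
    input  = 'sample.tsv'
    genome = 'GRCh38'
    tools  = 'HaplotypeCaller,mpileup,snpEff,VEP,CNVkit'
}

tower {
    endpoint = 'https://nftower.qut.edu.au/api'
    enabled  = true
    // accessToken is read from the TOWER_ACCESS_TOKEN environment variable
}
EOF
}
```

To use it, export the token (for example `export TOWER_ACCESS_TOKEN='your tower token'` in a login-shell start-up file such as `~/.bash_profile`, which the `bash -l` job scripts read), then run `write_sarek_config .` from the run folder.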
Preparing to run on the HPC
To run this on the HPC, create a PBS submission script.
In the folder you created for this run, create launch.pbs:
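```
#!/bin/bash -l
#PBS -N MySarekRun
#PBS -l walltime=168:00:00
#PBS -l select=1:ncpus=1:mem=5gb

# Run from the folder the job was submitted from
cd "$PBS_O_WORKDIR"

# Java heap for the Nextflow process itself (1 GB initial, 4 GB maximum);
# exported so that nextflow sees it
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

nextflow run nf-core/sarek -profile singularity
```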
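If you prefer to set up runs from the terminal, the sketch below creates a run folder and writes a launch.pbs like the one above into it (with `NXF_OPTS` exported so that Nextflow sees it), then uses `bash -n` to check the script's syntax without running it. The function name `write_launch_pbs` is hypothetical, and the resource requests are copied from the script above; they may need adjusting for your data.

```shell
# Hypothetical helper: create the run folder and write launch.pbs into it.
write_launch_pbs() {
  local run_dir="$1"
  mkdir -p "$run_dir" || return 1
  # Quoted 'EOF' keeps $PBS_O_WORKDIR literal until the job runs.
  cat > "$run_dir/launch.pbs" <<'EOF'
#!/bin/bash -l
#PBS -N MySarekRun
#PBS -l walltime=168:00:00
#PBS -l select=1:ncpus=1:mem=5gb

cd "$PBS_O_WORKDIR"
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

nextflow run nf-core/sarek -profile singularity
EOF
  # Parse the script without executing it, to catch syntax errors early.
  bash -n "$run_dir/launch.pbs"
}
```

For example, `write_launch_pbs /work/group/sarek_run1` (a hypothetical path), then `cd` into that folder before submitting.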
Alternatively, define the parameters in the command inside launch.pbs:
```
#!/bin/bash -l
#PBS -N MySarekRun
#PBS -l walltime=168:00:00
#PBS -l select=1:ncpus=1:mem=5gb

cd "$PBS_O_WORKDIR"
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

# Specify the Nextflow version to use to run the workflow
export NXF_VER=22.06.1-edge

nextflow run nf-core/sarek -profile singularity \
    --input sample.tsv -name GRCh38_FBS1_LNCAP \
    --genome GRCh38 --tools HaplotypeCaller,snpEff,VEP \
    --generate_gvcf \
    -r 3.1.1
```
Submitting the job
Once you have created the run folder, the input.tsv file, nextflow.config and launch.pbs, you are ready to submit the job from that folder.
...
```
qsub launch.pbs
```
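qsub prints the ID of the new job on standard output. The sketch below is a hypothetical helper, `submit_run`, that submits the script, appends a timestamp, the job ID and the run folder to a file named `submitted_jobs.txt` (an arbitrary name chosen for illustration), and prints the ID so it can be reused with `qstat`.

```shell
# Hypothetical helper: submit a PBS script and record the job ID.
submit_run() {
  local script="${1:-launch.pbs}"
  local jobid
  # qsub writes the new job ID (e.g. 12345.pbs) to standard output.
  jobid=$(qsub "$script") || { echo "qsub failed for $script" >&2; return 1; }
  # Log: timestamp <TAB> job ID <TAB> folder the job was submitted from
  printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$jobid" "$PWD" \
    >> submitted_jobs.txt
  echo "$jobid"
}
```

For example, `jobid=$(submit_run launch.pbs)` keeps the ID in a shell variable for later `qstat` calls.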
Monitoring the Run
To check the status of your jobs, use:
```
qstat -u $USER
```
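Because a Sarek run can take many hours, you may want a terminal to wait until the job has left the queue. The sketch below assumes PBS Pro behaviour, where `qstat <jobid>` exits with a non-zero status once the job has finished (finished jobs are shown with `qstat -x`). The function name `wait_for_job` and the five-minute default interval are assumptions.

```shell
# Hypothetical helper: poll qstat until the job is no longer listed.
# Note: a temporary qstat failure (e.g. server unreachable) also ends the loop.
wait_for_job() {
  local jobid="$1"
  local interval="${2:-300}"   # seconds between checks
  while qstat "$jobid" > /dev/null 2>&1; do
    sleep "$interval"
  done
  echo "Job $jobid is no longer in the queue; check .nextflow.log and the PBS output files."
}
```

For example, `wait_for_job "$jobid"`, then inspect `.nextflow.log` in the run folder and the PBS output file (named after the job, e.g. `MySarekRun.o<job number>`).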
...