“Sarek is a workflow designed to detect variants on whole genome or targeted sequencing data”

This page is a guide for QUT users on running the nf-core/sarek workflow on the QUT HPC.

Prerequisite: conda3 or miniconda3 installed (https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)

Further details on the workflow can be found at:

https://nf-co.re/sarek/2.6.1/usage

Install Nextflow

The Sarek workflow requires Nextflow to be installed in your account on the HPC. Find details here on how to install and test Nextflow, prepare a nextflow.config file, and run a PBS Pro submission script for Nextflow pipelines.

Additional information available here: https://nf-co.re/usage/installation

The Sarek Workflow Tools

The Sarek workflow will perform the following steps by default:

...

Code Block
params { tools = 'HaplotypeCaller,mpileup,snpEFF' }

Workflow steps

Sarek has these steps:

mapping, prepare_recalibration, recalibrate, variant_calling, annotate, ControlFREEC

...

Code Block
# run test
nextflow run nf-core/sarek -profile test,singularity
# resume
nextflow run nf-core/sarek -profile test,singularity -resume

Preparing Data

Starting the Sarek workflow at the “mapping” step requires paired fastq files (see the Sarek documentation for details). You need to create a TAB-separated text file that will be the input for the workflow. Mapping requires the following columns:

subject sex status sample lane fastq1 fastq2

Example input sample.tsv

Code Block

Subject01 XX  0 Sample01  1 /work/group/data/subject01-sample01_R1.fastq.gz /work/group/data/subject01-sample01_R2.fastq.gz
Subject02 XX  0 Sample01  1 /work/group/data/subject02-sample01_R1.fastq.gz /work/group/data/subject02-sample01_R2.fastq.gz
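
Because the columns must be separated by real TAB characters, it can help to generate the file with printf instead of typing it by hand. The following is a minimal sketch, assuming the example paths shown above and an output file named sample.tsv; the `data` variable is only an illustration and should point at your own fastq files.

```shell
#!/bin/bash
# Write a two-row Sarek input file with real TAB separators.
# Columns: subject sex status sample lane fastq1 fastq2
# The paths are the example paths from this page; replace them with your own.
data=/work/group/data
{
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' Subject01 XX 0 Sample01 1 \
    "$data/subject01-sample01_R1.fastq.gz" "$data/subject01-sample01_R2.fastq.gz"
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' Subject02 XX 0 Sample01 1 \
    "$data/subject02-sample01_R1.fastq.gz" "$data/subject02-sample01_R2.fastq.gz"
} > sample.tsv

# Report any row that does not have exactly 7 TAB-separated fields.
awk -F'\t' 'NF != 7 { printf "line %d has %d fields\n", NR, NF; bad = 1 } END { exit bad }' sample.tsv
```

The awk check exits non-zero if any row has the wrong number of fields, which usually means a space was used where a TAB was expected.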

Selecting a Genome

See the nf-core Reference Genomes page (nf-co.re) for details on the genomes available.

...

Code Block
params { genome = 'GRCh38' }

Putting it all together

Create a folder to store the run input and output.

...

Code Block
nextflow run nf-core/sarek -profile singularity --input 'input.tsv' --genome 'GRCh38' --tools 'HaplotypeCaller,mpileup,snpEFF'

Or, create a nextflow.config file to hold the options instead of passing them on the command line.

Code Block

params {
  input = 'input.tsv'
  genome = 'GRCh38'
  tools = 'HaplotypeCaller,mpileup,snpEFF,VEP,CNVkit'
}
tower {
  accessToken = 'your tower token'
  endpoint = 'https://nftower.qut.edu.au/api'
  enabled = true
}

Replace 'your tower token' with your own Tower access token. You will be assigned a token once you sign in via https://nftower.qut.edu.au/api

With this file in place, the command to run the pipeline is:

Code Block
nextflow run nf-core/sarek -profile singularity

Preparing to run on the HPC

To run the workflow on the HPC, you need to create a PBS submission script.

In the folder you created for this run, create a file named launch.pbs:

Code Block
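#!/bin/bash -l
#PBS -N MySarekRun
#PBS -l walltime=168:00:00
#PBS -l select=1:ncpus=1:mem=5gb

# run from the folder the job was submitted from
cd $PBS_O_WORKDIR

# limit the Java memory used by the Nextflow head process;
# exported so that the nextflow command can see it
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

# options are read from nextflow.config in this folder
nextflow run nf-core/sarek -profile singularity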

Alternative: run Sarek with the parameters defined in the command instead of in nextflow.config.

Code Block

#!/bin/bash -l
#PBS -N MySarekRun
#PBS -l walltime=168:00:00
#PBS -l select=1:ncpus=1:mem=5gb
cd $PBS_O_WORKDIR
# exported so that the nextflow command can see it
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

#specify the nextflow version to use to run the workflow
export NXF_VER=22.06.1-edge

nextflow run nf-core/sarek -profile singularity \
  --input sample.tsv -name GRCh38_FBS1_LNCAP \
  --genome GRCh38 --tools HaplotypeCaller,snpEff,VEP \
  --generate_gvcf \
  -r 3.1.1

Submitting the job

Once you have created the folder for the run, the input.tsv file, nextflow.config and launch.pbs, you are ready to submit.
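
Before submitting, a quick check that all three files are present can save a wasted trip through the queue. The function below is a sketch only: `check_run_files` is a hypothetical helper name, and the file names are the ones used on this page (input.tsv, nextflow.config, launch.pbs), so adjust them if yours differ.

```shell
#!/bin/bash
# check_run_files [DIR]: confirm the files this page asks for exist in DIR
# (default: current folder). Prints each missing or empty file to stderr
# and returns non-zero if any are absent.
check_run_files() {
  local dir="${1:-.}" missing=0 f
  for f in input.tsv nextflow.config launch.pbs; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: only submit when every file is present.
# check_run_files . && qsub launch.pbs
```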

...

Code Block
qsub launch.pbs

Monitoring the Run

You can use the command

Code Block
qstat -u $USER

...
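
qstat reports only the scheduler state of the job. To see what the workflow itself is doing, you can look at the log that Nextflow writes as .nextflow.log in the folder it was launched from. A minimal sketch, where `show_run_log` is a hypothetical helper name and the default of 20 lines is an arbitrary choice:

```shell
#!/bin/bash
# show_run_log [LOG] [N]: print the last N lines (default 20) of a Nextflow log.
# Nextflow writes .nextflow.log in the directory where it was launched.
show_run_log() {
  local log="${1:-.nextflow.log}" n="${2:-20}"
  if [ -f "$log" ]; then
    tail -n "$n" "$log"
  else
    echo "no log found at $log (the job may still be queued)" >&2
    return 1
  fi
}

# Usage, from the run folder:
# show_run_log
```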
