Prepared by the eResearch Office, QUT.

This page is a guide for QUT users on how to install and run the Nextflow nf-core/rnaseq workflow on the HPC.

Pre-requisites

Install Nextflow

The nf-core/rnaseq workflow requires Nextflow to be installed in your account on the HPC. Details on how to install and test Nextflow, prepare a nextflow.config file and run a PBS Pro submission script for Nextflow pipelines can be found here.

Additional information is available at: https://nf-co.re/usage/installation
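The linked page describes QUT's recommended nextflow.config. As a minimal sketch only, assuming Nextflow's built-in PBS Pro executor (pbspro), the shell snippet below writes a nextflow.config in the run folder so that each pipeline task is submitted as its own PBS job. The queueSize value is an illustrative assumption, not a QUT setting; Singularity itself is already switched on by -profile singularity.

```shell
# Minimal sketch of a nextflow.config for running tasks as PBS Pro jobs.
# Assumption: QUT's recommended config (see the linked page) may differ.

# Keep a copy of any existing config before overwriting it
if [ -e nextflow.config ]; then
  cp nextflow.config nextflow.config.bak
fi

cat > nextflow.config <<'EOF'
// Submit each pipeline task as its own PBS Pro job
process {
  executor = 'pbspro'
}

// Limit how many jobs Nextflow keeps queued at once (illustrative value)
executor {
  queueSize = 20
}
EOF

echo "Wrote $(pwd)/nextflow.config"
```

Keep the file in the same folder as launch.pbs: Nextflow reads nextflow.config from the launch directory automatically.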

Additional details on the workflow can be found at:

Overview: https://nf-co.re/rnaseq/3.0

Usage: https://nf-co.re/rnaseq/3.0/usage

Github: https://github.com/nf-core/rnaseq

Getting Started

Download and run the workflow on the minimal test dataset provided by nf-core/rnaseq. We recommend the singularity profile on QUT’s HPC; conda is the other available option. Note: the docker profile is not available on the HPC.

nextflow run nf-core/rnaseq -profile test,singularity

Running the pipeline using custom data

Example of a typical command to run an RNA-seq analysis of mouse samples (GRCm38 reference genome, STAR alignment with Salmon quantification):

  nextflow run nf-core/rnaseq \
      --input index.csv \
      --genome GRCm38 \
      --aligner star_salmon \
      -profile singularity \
      -r 3.3

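Note: if a run was interrupted or failed at a particular step, or you want to change a parameter for a particular step, you do not need to re-run every process. Add Nextflow's -resume option to reuse the cached results of the steps that already completed successfully. Run the command from the same directory as the original run, because the cache is stored there (in the work and .nextflow directories).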

  nextflow run nf-core/rnaseq \
      --input index.csv \
      --genome GRCm38 \
      --aligner star_salmon \
      -profile singularity \
      -r 3.3 \
      -resume

Preparing the samplesheet (index.csv) file

Prepare a samplesheet CSV file (called index.csv on this page) describing the samples to be processed. The file name is arbitrary, as long as it matches the path passed to --input. See the example index.csv files below.

Example index.csv (previous versions):

group,replicate,fastq_1,fastq_2,strandedness
control,1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
control,2,/path/to/fastq/control-2_R1.fastq.gz,/path/to/fastq/control-2_R2.fastq.gz,unstranded
control,3,/path/to/fastq/control-3_R1.fastq.gz,/path/to/fastq/control-3_R2.fastq.gz,unstranded
infected,1,/path/to/fastq/infected-1_R1.fastq.gz,/path/to/fastq/infected-1_R2.fastq.gz,unstranded
infected,2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
infected,3,/path/to/fastq/infected-3_R1.fastq.gz,/path/to/fastq/infected-3_R2.fastq.gz,unstranded

Samplesheet format for version 3.3 (the version used on this page):

sample,fastq_1,fastq_2,strandedness
control_rep1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
control_rep2,/path/to/fastq/control-2_R1.fastq.gz,/path/to/fastq/control-2_R2.fastq.gz,unstranded
control_rep3,/path/to/fastq/control-3_R1.fastq.gz,/path/to/fastq/control-3_R2.fastq.gz,unstranded
infected_rep1,/path/to/fastq/infected-1_R1.fastq.gz,/path/to/fastq/infected-1_R2.fastq.gz,unstranded
infected_rep2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
infected_rep3,/path/to/fastq/infected-3_R1.fastq.gz,/path/to/fastq/infected-3_R2.fastq.gz,unstranded
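For more than a handful of samples, writing the samplesheet by hand invites copy-paste mistakes. The function below is a hypothetical helper (not part of nf-core/rnaseq) that builds a samplesheet with the sample,fastq_1,fastq_2,strandedness header of version 3.x, assuming paired-end files named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz in one folder. It writes absolute paths and uses one strandedness value for every sample.

```shell
# Hypothetical helper, not part of nf-core/rnaseq.
# Assumes paired-end files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
# Usage: make_samplesheet <fastq_dir> [strandedness] > index.csv
make_samplesheet() {
  local fastq_dir strandedness r1 r2 sample
  # Resolve to an absolute path so the samplesheet works from any directory
  fastq_dir=$(cd "$1" && pwd) || return 1
  strandedness=${2:-unstranded}

  echo "sample,fastq_1,fastq_2,strandedness"
  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    # If nothing matched, the loop sees the literal pattern; skip it
    [ -e "$r1" ] || continue
    r2=${r1%_R1.fastq.gz}_R2.fastq.gz
    if [ ! -e "$r2" ]; then
      echo "warning: no R2 file for $r1, skipping" >&2
      continue
    fi
    sample=$(basename "$r1" _R1.fastq.gz)
    echo "$sample,$r1,$r2,$strandedness"
  done
}
```

For example, make_samplesheet /path/to/fastq unstranded > index.csv. Sample names are taken from the file names (e.g. control-1), so rename them (e.g. to control_rep1) if you prefer the convention shown above.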

Preparing to run on the HPC

To run the pipeline on the HPC, create a PBS submission script. For example, create a file called launch.pbs with a text editor of your choice (e.g., vi or nano) and paste in the code below:

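#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Load Java and set the memory used by the Nextflow head process
# (export is needed so that nextflow sees NXF_OPTS)
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/rnaseq -profile singularity -r 3.3 --input index.csv --genome GRCm38 --aligner star_salmon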

We recommend running the nf-core/rnaseq pipeline once and then checking the FastQC reports in the results folder for sequence biases at the 5' and 3' ends of the reads. Then, use the PBS script below to have the pipeline trim a set number of bases from the 5' end (--clip_r1 and --clip_r2, for read 1 and read 2) or the 3' end (--three_prime_clip_r1 and --three_prime_clip_r2). The script also removes ribosomal RNA reads with SortMeRNA (--remove_ribo_rna), as these reads are usually not informative for gene expression analysis.
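To see which samples are affected before choosing clip values, you can scan the summary.txt file inside each FastQC .zip report, which lists a PASS, WARN or FAIL status per module. The hypothetical flag_bias function below (not part of nf-core/rnaseq or FastQC) reads those tab-separated lines on standard input and prints the files whose "Per base sequence content" module (or another module named as the first argument) is flagged WARN or FAIL. The HTML report still needs to be opened to see how many bases are affected.

```shell
# Hypothetical helper, not part of nf-core/rnaseq or FastQC.
# Reads FastQC summary.txt lines (STATUS<TAB>MODULE<TAB>FILE) on stdin and
# prints "FILE<TAB>STATUS" for the chosen module when it is WARN or FAIL.
# Usage: ... | flag_bias ["Per base sequence content"]
flag_bias() {
  awk -F'\t' -v module="${1:-Per base sequence content}" '
    $2 == module && ($1 == "WARN" || $1 == "FAIL") { print $3 "\t" $1 }
  '
}
```

For example, assuming the default ./results output folder with the raw-read FastQC reports in results/fastqc (check the layout of your own run): for z in results/fastqc/*_fastqc.zip; do unzip -p "$z" '*/summary.txt'; done | flag_bias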

#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

#load java and set up memory settings for the Nextflow head process
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

#run the rnaseq pipeline
#--min_mapped_reads: minimum % of uniquely mapped reads for a sample to be kept
#-dump-channels: print debug output for channels that use the dump operator
#-with-dag can output files in .png, .pdf, .svg or .html (.png/.pdf/.svg need Graphviz)
nextflow run nf-core/rnaseq -profile singularity -r 3.3 \
        --input index.csv \
        --genome GRCm38 \
        --aligner star_salmon \
        --min_mapped_reads 5 \
        --clip_r1 10 \
        --clip_r2 10 \
        --three_prime_clip_r1 2 \
        --three_prime_clip_r2 2 \
        --remove_ribo_rna \
        -dump-channels \
        -with-dag flowchart.png

Submitting the job

Once you have created the run folder containing the samplesheet (index.csv), nextflow.config and launch.pbs, you are ready to submit.
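Before submitting, it is worth checking the samplesheet, since a wrong or duplicated FASTQ path (easy to introduce when copying rows) is otherwise only caught once the job runs. The hypothetical check_samplesheet function below (not part of nf-core/rnaseq) assumes the four-column version 3.x layout and reports rows with the wrong number of columns, an unexpected strandedness value, a missing FASTQ file, or a FASTQ file listed more than once; it returns non-zero if anything is wrong.

```shell
# Hypothetical pre-flight check, not part of nf-core/rnaseq.
# Assumes the 4-column layout: sample,fastq_1,fastq_2,strandedness
# Usage: check_samplesheet [index.csv] && qsub launch.pbs
check_samplesheet() {
  awk -F',' '
    { sub(/\r$/, "") }            # tolerate Windows line endings
    NR == 1 { next }              # skip the header row
    NF == 0 { next }              # ignore blank lines
    NF != 4 {
      print "line " NR ": expected 4 columns, found " NF > "/dev/stderr"
      bad = 1; next
    }
    $4 != "unstranded" && $4 != "forward" && $4 != "reverse" {
      print "line " NR ": unexpected strandedness " $4 > "/dev/stderr"
      bad = 1
    }
    {
      for (i = 2; i <= 3; i++) {
        if ($i == "") continue    # fastq_2 is left empty for single-end data
        if (seen[$i]++) {
          print "line " NR ": FASTQ listed twice: " $i > "/dev/stderr"
          bad = 1
        }
        if (system("test -e \"" $i "\"") != 0) {
          print "line " NR ": file not found: " $i > "/dev/stderr"
          bad = 1
        }
      }
    }
    END { exit bad }
  ' "${1:-index.csv}"
}
```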

Submit the run on Lyra with this command:

qsub launch.pbs

Monitoring the Run

To check on the jobs you are running, use:

qstat -u $USER

Alternatively, use:

qjobs

Nextflow will launch additional jobs during the run, so expect to see more than one job listed.

You can also check the .nextflow.log file in the run folder for details on the progress of the run.
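Nextflow writes .nextflow.log in the launch directory and keeps older logs as .nextflow.log.1, .nextflow.log.2 and so on. For a quick overview, the hypothetical nf_log_summary function below (not a Nextflow command) counts WARN and ERROR lines, assuming the usual log layout in which the level appears between spaces after the timestamp and thread name, and then prints the last few lines.

```shell
# Hypothetical helper, not part of Nextflow.
# Usage: nf_log_summary [logfile] [lines_to_show]
nf_log_summary() {
  local log=${1:-.nextflow.log} n=${2:-20}
  if [ ! -f "$log" ]; then
    echo "no log file at $log" >&2
    return 1
  fi
  # grep -c prints 0 (and exits 1) when there are no matches
  echo "WARN lines:  $(grep -c ' WARN ' "$log")"
  echo "ERROR lines: $(grep -c ' ERROR ' "$log")"
  echo "--- last $n lines of $log ---"
  tail -n "$n" "$log"
}
```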

Finally, if you have configured a connection to Nextflow Tower, you can log in to Tower and monitor your run there.
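If Tower reporting is not yet set up, one approach, sketched below under stated assumptions, is to enable the tower scope in nextflow.config and supply your personal access token through the TOWER_ACCESS_TOKEN environment variable, which Nextflow reads by default. The snippet assumes you have already generated a token in Tower; for a one-off run, adding -with-tower to the nextflow run command has the same effect.

```shell
# Sketch only: enable Nextflow Tower reporting for runs launched from this folder.
# Assumption: you already have a personal access token generated in Tower.
# Nextflow reads the token from TOWER_ACCESS_TOKEN, so keep it out of shared
# files; for example, add this line (with your real token) to launch.pbs:
#   export TOWER_ACCESS_TOKEN='paste-your-token-here'

# Append a tower block to nextflow.config unless one is already present
if ! grep -q '^tower' nextflow.config 2>/dev/null; then
  cat >> nextflow.config <<'EOF'

tower {
  enabled = true
}
EOF
fi
```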
