Prepared by the eResearch Office, QUT.

This page guides QUT users through installing and running the Nextflow nf-core/rnaseq workflow on the HPC.

...

Pre-requisites

Install Nextflow

The nf-core/rnaseq workflow requires Nextflow to be installed in your account on the HPC. Find details on how to install and test Nextflow here. Prepare a nextflow.config file and run a PBS Pro submission script for Nextflow pipelines.

Additional information is available here: https://nf-co.re/usage/installation

...

Additional details on the workflow can be found at:

Overview: https://nf-co.re/rnaseq/3.10.1

Usage: https://nf-co.re/rnaseq/3.10.1/usage

GitHub: https://github.com/nf-core/rnaseq

Pipeline Summary

...

The pipeline is built using Nextflow and processes data using the following steps:

Getting Started

Download and run the workflow using the minimal test data provided by nf-core/rnaseq. We recommend using ‘singularity’ as the profile for QUT’s HPC. Another profile option is ‘conda’. Note: the profile option ‘docker’ is unavailable on the HPC.

Code Block
nextflow run nf-core/rnaseq -r 3.10.1 -profile test,singularity --outdir results

Preparing a ‘samplesheet.csv’ file

Prepare a sample sheet file that specifies the input files to be used. To do this, we use an nf-core script to generate the ‘samplesheet.csv’ file as follows (setting strandedness to auto allows the pipeline to determine the strandedness of your RNA-seq data automatically):

Code Block
#load python 3.10
module load python/3.10.8-gcccore-12.2.0

#download script and make executable
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py

#generate the samplesheet.csv file
./fastq_dir_to_samplesheet.py /path/to/directory/containing/fastq_files/ samplesheet.csv \
    --strandedness auto \
    --read1_extension _R1.fastq.gz \
    --read2_extension _R2.fastq.gz

A samplesheet.csv file tells the workflow the location of the read 1 (R1) and read 2 (R2) files for each sample, along with the sample name (here combining the group, i.e., control or infected, and the replicate number) and the orientation of the reads (i.e., forward, reverse, unstranded or auto).

Example samplesheet.csv (version 3.10.1):

Code Block
sample,fastq_1,fastq_2,strandedness
control_1,/path/to/directory/containing/fastq_files/control-1_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-1_R2.fastq.gz,auto
control_2,/path/to/directory/containing/fastq_files/control-2_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-2_R2.fastq.gz,auto
control_3,/path/to/directory/containing/fastq_files/control-3_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-3_R2.fastq.gz,auto
infected_1,/path/to/directory/containing/fastq_files/infected-1_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-1_R2.fastq.gz,auto
infected_2,/path/to/directory/containing/fastq_files/infected-2_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-2_R2.fastq.gz,auto
infected_3,/path/to/directory/containing/fastq_files/infected-3_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-3_R2.fastq.gz,auto
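A wrong path in the samplesheet is an easy mistake to make when editing it by hand. The following is a minimal sketch of a check that every FASTQ file listed in the samplesheet exists. It assumes the four-column layout shown above (sample,fastq_1,fastq_2,strandedness) and paths without commas. The function name check_samplesheet is hypothetical and is not part of nf-core/rnaseq.

```shell
# Hypothetical helper (not part of nf-core): verify that each FASTQ path
# listed in a samplesheet exists. Returns non-zero if any file is missing.
# The body runs in a subshell so its variables do not leak into the caller.
check_samplesheet() (
    sheet="$1"
    missing=0
    {
        read -r _header   # discard the header row
        # The "|| [ -n ... ]" part handles a last line without a trailing newline.
        while IFS=, read -r sample fastq_1 fastq_2 strandedness || [ -n "$sample" ]; do
            for fq in "$fastq_1" "$fastq_2"; do
                # fastq_2 is empty for single-end samples, so skip blanks
                [ -z "$fq" ] && continue
                if [ ! -f "$fq" ]; then
                    echo "missing: $fq (sample $sample)" >&2
                    missing=$((missing + 1))
                fi
            done
        done
    } < "$sheet"
    [ "$missing" -eq 0 ]
)
```

For example, running check_samplesheet samplesheet.csv in the run folder prints one line per missing file and returns a non-zero exit status if any are missing.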

Preparing to run on the HPC

To run this on the HPC, a PBS submission script needs to be created using a text editor. For example, in the folder you have created for this run, create a file called launch.pbs using a text editor of choice (e.g., vim, vi or nano), then copy and paste the code below:

Code Block
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#load java and set up memory settings to run nextflow
#(export makes NXF_OPTS visible to the nextflow process)
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

#run the rnaseq pipeline
nextflow run nf-core/rnaseq \
      -profile singularity \
      -r 3.10.1 \
      --input samplesheet.csv \
      --genome GRCh38 \
      --outdir results \
      --aligner star_rsem \
      --min_mapped_reads 5

Additional options:

--aligner star_salmon

We recommend running the Nextflow nf-core/rnaseq pipeline once and then inspecting the fastqc results folder to check whether sequence biases are present at the 5' and 3' ends of the reads. Then, we can use the PBS script below to tell the pipeline to remove a defined number of bases from the 5' end (--clip_r1 or --clip_r2) or the 3' end (--three_prime_clip_r1 or --three_prime_clip_r2). We can also tell the pipeline to remove ribosomal RNA, as these sequences are non-informative.

Code Block
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#load java and set up memory settings to run nextflow
#(export makes NXF_OPTS visible to the nextflow process)
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

#run the rnaseq pipeline
nextflow run nf-core/rnaseq --input samplesheet.csv \
        --outdir results \
        -r 3.10.1 \
        --genome GRCh38 \
        -profile singularity \
        --aligner star_salmon \
        --clip_r1 10 \
        --clip_r2 10 \
        --three_prime_clip_r1 2 \
        --three_prime_clip_r2 2
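Choosing clip values starts with finding the FastQC reports from the first run. The following is a minimal sketch that lists them. It assumes the first run used --outdir results and that the reports are written as *_fastqc.html files in a fastqc subfolder of the output directory. Both are assumptions to check against your own results folder, and the function name list_fastqc_reports is hypothetical.

```shell
# Hypothetical helper: list FastQC HTML reports under <outdir>/fastqc.
# The fastqc subfolder name is an assumption about the output layout.
list_fastqc_reports() {
    # $1 is the pipeline output directory (defaults to "results")
    find "${1:-results}/fastqc" -type f -name '*_fastqc.html' | sort
}
```

Calling list_fastqc_reports results prints one report path per line. The reports can then be copied to a local machine and opened in a web browser to inspect the per-base sequence content plots.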

Submitting the job

Once you have created the folder for the run, the samplesheet.csv file, nextflow.config, and launch.pbs, you are ready to submit.
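As a quick sanity check before submitting, the three files named above can be tested for existence. This is a minimal sketch. The function name preflight is hypothetical, and it only checks that each file is present and non-empty, not that its contents are valid.

```shell
# Hypothetical helper: confirm the run folder holds the files needed to submit.
# The body runs in a subshell so its variables do not leak into the caller.
preflight() (
    run_dir="${1:-.}"   # run folder, defaults to the current directory
    status=0
    for f in samplesheet.csv nextflow.config launch.pbs; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "$run_dir/$f" ]; then
            echo "missing or empty: $run_dir/$f" >&2
            status=1
        fi
    done
    [ "$status" -eq 0 ]
)
```

It can be chained with the submission command, for example preflight . && qsub launch.pbs, so that the job is only submitted when all three files are in place.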

Submit the run with this command (on Lyra):

Code Block
qsub launch.pbs

Monitoring the Run

To check on the jobs you are running, you can use the command:

Code Block
qstat -u $USER

Alternatively, use the following command:

Code Block
qjobs

Nextflow will launch additional jobs during the run.
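Nextflow also writes a log file named .nextflow.log in the folder the run was launched from, which is often the quickest place to see what the pipeline is doing. The following is a minimal sketch for viewing the end of that log. The function name show_run_log is hypothetical.

```shell
# Hypothetical helper: print the last lines of a Nextflow log file.
# $1 = path to the log (default: .nextflow.log in the current folder)
# $2 = number of lines to show (default: 20)
show_run_log() {
    if [ -f "${1:-.nextflow.log}" ]; then
        tail -n "${2:-20}" "${1:-.nextflow.log}"
    else
        echo "no log found at ${1:-.nextflow.log}" >&2
        return 1
    fi
}
```

Run show_run_log from the run folder, or pass a path and a line count, for example show_run_log .nextflow.log 50.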

...

Finally, if you have configured the connection to NFTower, you can log on and check your run.

Troubleshooting

  1. I have been using version 3.3 and now, when I run version 3.10.1, I get an error that the asset is corrupted. What should I do?

Code Block
#delete the existing assets associated with the RNAseq pipeline:
cd ~/.nextflow/assets/nf-core
rm -r rnaseq/

#run a test again with the new version, for example, version 3.10.1. See details on how to run a test above (under 'Getting Started')

Add output folders/files

sample data

Running the pipeline using custom data

Example of a typical command to run an RNA-seq analysis for mouse samples:

Code Block
nextflow run nf-core/rnaseq --input samplesheet.csv \
        --outdir results \
        -r 3.10.1 \
        --genome GRCm38 \
        -profile singularity \
        --aligner star_rsem \
        --clip_r1 10 \
        --clip_r2 10 \
        --three_prime_clip_r1 2 \
        --three_prime_clip_r2 2

Note: if the run was interrupted, did not complete a particular step, or you want to modify a parameter for a particular step, Nextflow lets you continue the workflow with “-resume” instead of re-running all processes.

Code Block
nextflow run nf-core/rnaseq --input samplesheet.csv \
        --outdir results \
        -r 3.10.1 \
        --genome GRCm38 \
        -profile singularity \
        --aligner star_rsem \
        --clip_r1 10 \
        --clip_r2 10 \
        --three_prime_clip_r1 2 \
        --three_prime_clip_r2 2 \
        -resume
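Resuming relies on files that Nextflow left behind in the folder of the earlier run, so the command needs to be launched from that same folder. The following is a minimal sketch of a check that those files are present. It assumes the default locations, that is, a work folder and a .nextflow/cache folder in the launch directory, with no custom work directory set. The function name can_resume is hypothetical.

```shell
# Hypothetical helper: check that a folder holds what -resume depends on.
# Assumes the default work directory (no -w option or NXF_WORK override).
can_resume() {
    # $1 is the launch folder, defaulting to the current directory
    [ -d "${1:-.}/work" ] && [ -d "${1:-.}/.nextflow/cache" ]
}
```

For example, can_resume . || echo "nothing to resume here" warns when the current folder has no earlier run, in which case “-resume” would start the workflow from the beginning.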