nf-core/rnaseq: Gene Expression Analysis (version 3.10.1)

Prepared by the eResearch Office, QUT.

This page provides a guide for QUT users on how to install and run the Nextflow nf-core/rnaseq workflow on the HPC.

Pre-requisites

Install Nextflow

The nf-core/rnaseq workflow requires Nextflow to be installed in your account on the HPC. Find details on how to install and test Nextflow here. You will also need to prepare a nextflow.config file and a PBS Pro submission script for Nextflow pipelines.

Additional information is available here: https://nf-co.re/usage/installation

Additional details on the workflow can be found at:

Overview: https://nf-co.re/rnaseq/3.10.1

Usage: https://nf-co.re/rnaseq/3.10.1/usage

GitHub: https://github.com/nf-core/rnaseq (RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control)

Pipeline Summary

The pipeline is built using Nextflow and processes data using the following steps:

Getting Started

Download and run the workflow using the minimal test data provided by nf-core/rnaseq. We recommend using ‘singularity’ as the profile on QUT’s HPC. Another profile option is ‘conda’. Note: the ‘docker’ profile is unavailable on the HPC.

nextflow run nf-core/rnaseq -profile test,singularity --outdir results -r 3.10.1

Preparing a ‘samplesheet.csv’ file

Prepare a sample sheet file that specifies the input files to be used. To do this, we use an nf-core script to generate the ‘samplesheet.csv’ file as follows (setting strandedness to auto allows the pipeline to determine the strandedness of your RNA-seq data automatically):

# load python 3.10
module load python/3.10.8-gcccore-12.2.0
# download script and make executable
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py
# generate the samplesheet.csv file
./fastq_dir_to_samplesheet.py /path/to/directory/containing/fastq_files/ samplesheet.csv \
    --strandedness auto \
    --read1_extension _R1.fastq.gz \
    --read2_extension _R2.fastq.gz

Example samplesheet.csv (version 3.10.1):

sample,fastq_1,fastq_2,strandedness
control_1,/path/to/directory/containing/fastq_files/control-1_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-1_R2.fastq.gz,auto
control_2,/path/to/directory/containing/fastq_files/control-2_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-2_R2.fastq.gz,auto
control_3,/path/to/directory/containing/fastq_files/control-3_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-3_R2.fastq.gz,auto
infected_1,/path/to/directory/containing/fastq_files/infected-1_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-1_R2.fastq.gz,auto
infected_2,/path/to/directory/containing/fastq_files/infected-2_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-2_R2.fastq.gz,auto
infected_3,/path/to/directory/containing/fastq_files/infected-3_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-3_R2.fastq.gz,auto
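A mistyped path, such as an R1 file paired with another sample's R2 file, is easy to miss in a sheet like this, so it can help to sanity-check the file before submitting. The sketch below is not part of nf-core/rnaseq: the function name check_samplesheet is hypothetical, and it assumes paired-end data named with the _R1.fastq.gz / _R2.fastq.gz extensions used above. It checks the header and reports rows where fastq_1 and fastq_2 do not share the same prefix.

```shell
# check_samplesheet: hypothetical helper, not part of nf-core/rnaseq.
# Assumes paired-end files named <prefix>_R1.fastq.gz / <prefix>_R2.fastq.gz.
# Returns non-zero if the header is unexpected or any R1/R2 pair mismatches.
check_samplesheet() {
    awk -F',' '
        BEGIN { bad = 0 }
        NR == 1 {
            if ($0 != "sample,fastq_1,fastq_2,strandedness") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        NF == 0 { next }    # skip blank lines
        {
            r1 = $2; r2 = $3
            sub(/_R1\.fastq\.gz$/, "", r1)   # strip the read 1 extension
            sub(/_R2\.fastq\.gz$/, "", r2)   # strip the read 2 extension
            if (r1 != r2) {
                print "line " NR ": R1/R2 mismatch for sample " $1
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}
```

For example, check_samplesheet samplesheet.csv prints nothing and returns success when every row is consistent. It does not check that the files exist on disk.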

Preparing to run on the HPC

To run this on the HPC, a PBS submission script needs to be created. For example, create a file called launch.pbs using a text editor of your choice (e.g., vi or nano) and then copy and paste the code below:

We recommend running the Nextflow nf-core/rnaseq pipeline once and then checking the fastqc results folder to assess whether sequence biases are present at the 5' and 3' ends of the reads. Then, the PBS script can tell the pipeline to remove a defined number of bases from the 5' end (--clip_r1 or --clip_r2) or the 3' end (--three_prime_clip_r1 or --three_prime_clip_r2). We can also tell the pipeline to remove ribosomal RNA reads (--remove_ribo_rna), as these sequences are non-informative.
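The submission script itself is not reproduced on this page, so the following is only a sketch of what launch.pbs might contain. The PBS resource requests, the job name, the 'module load java' line, the genome key GRCm38 and the clipping values are all assumptions to adapt; the clipping and rRNA-removal flags are the nf-core/rnaseq parameters named above. It is written as a heredoc so the file can be created from the shell instead of a text editor.

```shell
# Write a sketch of launch.pbs into the current directory (this overwrites
# any existing launch.pbs). Resource values, job name, module name, genome
# key and clipping values are assumptions; adjust them for your data and site.
cat > launch.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=48:00:00

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Assumed module name; Nextflow needs Java available on the node
module load java

# Example values only: trim 10 bases from the 5' end and 2 bases from the
# 3' end of each read (choose real values from your FastQC report), and
# remove ribosomal RNA reads.
nextflow run nf-core/rnaseq -r 3.10.1 \
    -profile singularity \
    --input samplesheet.csv \
    --outdir results \
    --genome GRCm38 \
    --clip_r1 10 --clip_r2 10 \
    --three_prime_clip_r1 2 --three_prime_clip_r2 2 \
    --remove_ribo_rna
EOF
```

For the first exploratory run, the clipping and --remove_ribo_rna lines can simply be left out and added once the FastQC results have been inspected.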

Submitting the job

Once you have created the folder for the run, the samplesheet.csv file, nextflow.config, and launch.pbs, you are ready to submit.

Submit the run with this command (on Lyra):
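The command is not shown on this page. On a PBS Pro system, jobs are normally submitted with qsub, so a minimal sketch, assuming the script is named launch.pbs and sits in the current directory, could look like this. The guard is only there so the snippet is harmless on a machine without PBS.

```shell
# Submit the job script to the PBS scheduler; qsub prints the new job ID.
job_script=launch.pbs   # assumed file name, matching the text above
if command -v qsub >/dev/null 2>&1; then
    qsub "$job_script"
else
    echo "qsub not found: run this on an HPC login node" >&2
fi
```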

Monitoring the Run

You can use the following command:

Alternatively, use the following command:

These commands show the jobs you are running. Nextflow will launch additional jobs during the run.
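The monitoring commands themselves are not shown on this page. As one standard option on PBS Pro (an assumption, not necessarily the command originally listed here), qstat with the -u option lists the jobs that belong to a given user:

```shell
# List the jobs owned by the current user with the standard PBS command.
me=$(id -un)   # current login name
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$me"
else
    echo "qstat not found: run this on an HPC login node" >&2
fi
```

Expect to see the launch job plus the jobs that Nextflow submits on its behalf.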

You can also check the .nextflow.log file for details on what is going on.
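As a small illustration, the log can be inspected from the directory where the run was launched. The helper name show_nf_log is hypothetical; it prints the last 20 lines of the log and counts the lines that contain the string ERROR.

```shell
# show_nf_log: hypothetical helper, not a Nextflow command.
# Prints the tail of a Nextflow log and the number of lines containing ERROR.
# Defaults to .nextflow.log in the current directory.
show_nf_log() {
    log="${1:-.nextflow.log}"
    if [ ! -f "$log" ]; then
        echo "No log at $log; run this from the launch directory" >&2
        return 1
    fi
    tail -n 20 "$log"
    # grep -c prints 0 when nothing matches
    echo "ERROR lines: $(grep -c 'ERROR' "$log")"
}
```

Calling show_nf_log with no argument reads .nextflow.log; use tail -f .nextflow.log instead to follow the log while the run is in progress.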

Finally, if you have configured a connection to NFTower, you can log on and check your run there.

 

Troubleshooting

  1. I have been using version 3.3 and now, when I run version 3.10.1, I get an error that the asset is corrupted. What should I do?

 

Add output folders/files

 

sample data

Running the pipeline using custom data

Example of a typical command to run an RNA-seq analysis for mouse samples:

Note: if the run was interrupted, a particular step did not complete, or you want to modify a parameter for a particular step, Nextflow lets you resume the workflow with the “-resume” option instead of re-running all processes.
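The example command is not shown on this page. As a hedged sketch, a run for mouse samples might look like the following. The iGenomes key GRCm38, the input and output names and the run_rnaseq wrapper are assumptions for illustration. Passing "resume" as the first argument appends -resume so that cached steps are reused, and setting DRY_RUN=1 prints the command instead of running it.

```shell
# run_rnaseq: hypothetical wrapper around the pipeline command.
# Usage: run_rnaseq            (fresh run)
#        run_rnaseq resume     (reuse cached results via -resume)
# Set DRY_RUN=1 to print the command without executing it.
run_rnaseq() {
    resume_flag=""
    if [ "${1:-}" = "resume" ]; then
        resume_flag="-resume"
    fi
    # $resume_flag is left unquoted on purpose so it vanishes when empty.
    # GRCm38 is an assumed genome key for mouse; change it for other species.
    set -- nextflow run nf-core/rnaseq -r 3.10.1 \
        -profile singularity \
        --input samplesheet.csv \
        --outdir results \
        --genome GRCm38 \
        $resume_flag
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Note that -resume takes a single dash because it is a Nextflow option, whereas pipeline parameters such as --input take two. In practice the command would go inside launch.pbs rather than being run on the login node.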