This section demonstrates how to run the Nextflow nf-core/rnaseq pipeline on the HPC cluster, first as a test with minimal data and then on real data.
Create a new folder called ‘nextflow’ under your ‘myname’ folder, and inside it a subfolder called ‘run1’ for the first test of the pipeline:
mkdir myname/nextflow
mkdir myname/nextflow/run1
Go to the subfolder to run the RNAseq test
cd myname/nextflow/run1
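The folder steps above can also be combined. The sketch below is one way to do it, using a hypothetical helper named new_run_dir (not part of Nextflow or nf-core) that creates a run folder together with any missing parent folders and then moves into it.

```shell
#!/bin/bash
# Hypothetical helper: create a run folder (and any missing parents), then enter it.
new_run_dir() {
    local run_dir="$1"
    # -p creates parent folders as needed and succeeds if the folder already exists.
    mkdir -p "$run_dir" && cd "$run_dir" || return 1
}
```

For example, `new_run_dir myname/nextflow/run1` would replace the separate mkdir and cd commands, and the same helper could be reused for later runs.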
Download and run the workflow using the minimal test data provided by nf-core/rnaseq. We recommend the ‘singularity’ profile on QUT’s HPC; ‘conda’ is another option. Note: the ‘docker’ profile is unavailable on the HPC.
nextflow run nf-core/rnaseq -profile test,singularity --outdir results -r 3.12.0
To execute the above command on the HPC cluster, prepare a PBS Pro submission script (here called launch.pbs) as follows:
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd "$PBS_O_WORKDIR"

#load java and set up memory settings to run nextflow
#(NXF_OPTS must be exported so that the nextflow launcher can see it)
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/rnaseq -profile test,singularity --outdir results -r 3.12.0
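Once you have created the folder for the run and saved the script above as launch.pbs (together with a nextflow.config file, if you use one), you are ready to submit. The test profile supplies its own input data, so no samplesheet.csv file is needed for this run.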
Submit the run with this command (on Lyra):
qsub launch.pbs
To check on the jobs you are running, use the command
qstat -u $USER
Alternatively, use the command
qjobs
Nextflow will launch additional jobs during the run.
You can also check the .nextflow.log file for details on what is going on.
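As a rough illustration of checking the log, the sketch below defines a small shell helper that prints the most recent log lines and counts the lines that mention "error". The function name check_nextflow_log is hypothetical, and the sketch assumes the log file sits in the folder from which the run was launched unless another path is given.

```shell
#!/bin/bash
# Hypothetical helper: summarise a Nextflow log file.
# Usage: check_nextflow_log [path/to/.nextflow.log]
check_nextflow_log() {
    local log_file="${1:-.nextflow.log}"
    if [ ! -f "$log_file" ]; then
        echo "No log file found at $log_file" >&2
        return 1
    fi
    # Show the most recent activity.
    echo "Last 20 lines of $log_file:"
    tail -n 20 "$log_file"
    # Count lines that mention an error (case-insensitive).
    local error_count
    error_count=$(grep -ci 'error' "$log_file" || true)
    echo "Lines mentioning 'error': $error_count"
}
```

Running `check_nextflow_log` from inside the run folder reads the default .nextflow.log; a non-zero count is only a hint to read the surrounding lines, not proof that the run failed.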
To process real data, prepare a sample sheet file that specifies the input files to be used. An nf-core script generates the ‘samplesheet.csv’ file, following the steps below. Setting strandedness to ‘auto’ lets the pipeline determine the strandedness of your RNA-seq data automatically.
Load the Python 3.10 module:
module load python/3.10.8-gcccore-12.2.0
Download the script for creating the ‘samplesheet.csv’ metadata file.
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py
If you do not already know it, determine the path to the folder where the FASTQ files are located.
cd /myname/data
#then run the following command
pwd
Using a text editor (e.g., Vim), edit the following command so that it points to the appropriate path to the files:
#generate the samplesheet.csv file
./fastq_dir_to_samplesheet.py /path/to/directory/containing/fastq_files/ samplesheet.csv \
    --strandedness auto \
    --read1_extension _R1.fastq.gz \
    --read2_extension _R2.fastq.gz
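Before generating the sample sheet, it can help to confirm that every read 1 file has a matching read 2 file. The following is a minimal sketch of such a check, assuming paired-end data and the _R1.fastq.gz / _R2.fastq.gz naming used above; the function name find_unpaired_fastq is hypothetical and is not part of nf-core.

```shell
#!/bin/bash
# Hypothetical helper: list R1 files in a folder that lack a matching R2 file.
# Assumes the _R1.fastq.gz / _R2.fastq.gz naming convention.
find_unpaired_fastq() {
    local fastq_dir="$1"
    local r1 r2 missing=0
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        # If the glob matched nothing, the pattern is returned literally; skip it.
        [ -e "$r1" ] || continue
        # Derive the expected mate name by swapping the suffix.
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Missing mate for: $r1"
            missing=$((missing + 1))
        fi
    done
    echo "Unpaired R1 files: $missing"
    # Return success only when every R1 file has a mate.
    [ "$missing" -eq 0 ]
}
```

For example, `find_unpaired_fastq /path/to/directory/containing/fastq_files` prints each R1 file without a mate and returns a non-zero status if any are found.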
Example of the ‘samplesheet.csv’ file required by the nf-core/rnaseq pipeline, version 3.12.0:
sample,fastq_1,fastq_2,strandedness
control_1,/path/to/directory/containing/fastq_files/control-1_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-1_R2.fastq.gz,auto
control_2,/path/to/directory/containing/fastq_files/control-2_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-2_R2.fastq.gz,auto
control_3,/path/to/directory/containing/fastq_files/control-3_R1.fastq.gz,/path/to/directory/containing/fastq_files/control-3_R2.fastq.gz,auto
infected_1,/path/to/directory/containing/fastq_files/infected-1_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-1_R2.fastq.gz,auto
infected_2,/path/to/directory/containing/fastq_files/infected-2_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-2_R2.fastq.gz,auto
infected_3,/path/to/directory/containing/fastq_files/infected-3_R1.fastq.gz,/path/to/directory/containing/fastq_files/infected-3_R2.fastq.gz,auto
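A sample sheet with many rows is easy to get wrong by hand, so a quick sanity check before submitting may be worthwhile. The sketch below assumes the four-column header shown in the example and flags any FASTQ path that appears more than once, which is usually a copy-and-paste slip. The function name check_samplesheet is hypothetical; this is not the validation that the pipeline itself performs.

```shell
#!/bin/bash
# Hypothetical helper: basic sanity checks on a samplesheet.csv file.
# Assumes the header: sample,fastq_1,fastq_2,strandedness
check_samplesheet() {
    local sheet="$1"
    local expected="sample,fastq_1,fastq_2,strandedness"
    local header dups
    # Compare the first line with the expected header (ignoring Windows line endings).
    header=$(head -n 1 "$sheet" | tr -d '\r')
    if [ "$header" != "$expected" ]; then
        echo "Unexpected header: $header"
        return 1
    fi
    # Collect the FASTQ paths from columns 2 and 3 and report any listed more than once.
    dups=$(tail -n +2 "$sheet" | tr -d '\r' |
        awk -F',' '{ if ($2 != "") print $2; if ($3 != "") print $3 }' |
        sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "FASTQ paths listed more than once:"
        echo "$dups"
        return 1
    fi
    echo "Samplesheet looks consistent"
}
```

Running `check_samplesheet samplesheet.csv` in the run folder prints either the offending paths or a short confirmation.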
Prepare the following ‘launch_QC.pbs’ script:
#!/bin/bash -l
#PBS -N nfrnaseq_QC
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd "$PBS_O_WORKDIR"

#load java and set up memory settings to run nextflow
#(NXF_OPTS must be exported so that the nextflow launcher can see it)
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/rnaseq \
    -profile singularity \
    -r 3.12.0 \
    --input samplesheet.csv \
    --outdir results \
    --genome GRCh38 \
    --skip_trimming \
    --skip_alignment \
    --skip_pseudo_alignment
Once you have created the folder for the run, the samplesheet.csv file, nextflow.config, and launch_QC.pbs, you are ready to submit.
Submit the run with this command
qsub launch_QC.pbs
To check on the jobs you are running, use the command
qstat -u $USER
Alternatively, use the command
qjobs
Nextflow will launch additional jobs during the run.
You can also check the .nextflow.log file for details on what is going on.