...
This page guides QUT users through installing and running the Nextflow nf-core/rnaseq workflow on the HPC.
Pre-requisites
Installed conda3 or miniconda3 ( https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html )
Basic Unix command line knowledge (example: https://researchcomputing.princeton.edu/education/external-online-resources/linux ; https://swcarpentry.github.io/shell-novice/ )
Familiarity with a Unix text editor (for example, Vi/Vim or Nano)
An account on QUT’s HPC. Apply for a new HPC account here.
R tutorials:
Install Nextflow
The nf-core/rnaseq workflow requires Nextflow to be installed in your account on the HPC. Find details on how to install and test Nextflow here. You will also need to prepare a nextflow.config file and a PBS Pro submission script for running Nextflow pipelines.
Additional information is available here: https://nf-co.re/usage/installation
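As a rough illustration of the nextflow.config file mentioned above, the sketch below writes a minimal configuration that submits pipeline tasks through PBS Pro and enables Singularity. The settings used (`process.executor`, `singularity.enabled`, `singularity.autoMounts`) are standard Nextflow options, but the values are assumptions and not QUT's recommended configuration. The function name `write_example_config` is hypothetical. Follow the site instructions for the authoritative version.

```shell
#!/bin/bash
# Sketch only: write a minimal Nextflow config for a PBS Pro cluster.
# The values below are assumptions, not QUT's official settings.
write_example_config() {
    local target="${1:-nextflow.config}"
    # Refuse to overwrite a config file that already exists.
    if [ -e "$target" ]; then
        echo "Refusing to overwrite existing file: $target" >&2
        return 1
    fi
    cat > "$target" <<'EOF'
// Submit each pipeline task as its own PBS Pro job
process {
    executor = 'pbspro'
}
// Run the pipeline tools inside Singularity containers
singularity {
    enabled    = true
    autoMounts = true
}
EOF
    echo "Wrote $target"
}
```

Calling `write_example_config nextflow.config` in the run folder creates the file only if it does not already exist, so an existing site-specific configuration is never replaced.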
Additional details on the workflow can be found at:
Overview: https://nf-co.re/rnaseq/3.0
...
GitHub: https://github.com/nf-core/rnaseq
Pipeline Summary
...
FastQC - Raw read QC
UMI-tools extract - UMI barcode extraction
TrimGalore - Adapter and quality trimming
SortMeRNA - Removal of ribosomal RNA (optional)
STAR and Salmon - Fast splice-aware genome alignment and transcriptome quantification
STAR via RSEM - Alignment and quantification of expression levels
HISAT2 - Memory-efficient splice-aware alignment to a reference
SAMtools - Sort and index alignments
UMI-tools dedup - UMI-based deduplication
picard MarkDuplicates - Duplicate read marking
StringTie - Transcript assembly and quantification
BEDTools and bedGraphToBigWig - Create bigWig coverage files
RSeQC - Various RNA-seq QC metrics
Qualimap - Various RNA-seq QC metrics
dupRadar - Assessment of technical/biological read duplication
Preseq - Estimation of library complexity
featureCounts - Read counting relative to gene biotype
DESeq2 - PCA plot and sample pairwise distance heatmap and dendrogram
MultiQC - Present QC for raw reads, alignment, read counting, and sample similarity
Pseudo-alignment and quantification
Salmon - Wicked fast gene and isoform quantification relative to the transcriptome
Workflow reporting and genomes
Reference genome files - Saving reference genome indices/files
Pipeline information - Report metrics generated during the workflow execution
Getting Started
Download and run the workflow using the minimal test data provided by nf-core/rnaseq. We recommend using the ‘singularity’ profile on QUT’s HPC. The ‘conda’ profile is an alternative. Note: the ‘docker’ profile is unavailable on the HPC.
Code Block |
---|
nextflow run nf-core/rnaseq -profile test,singularity --outdir results -r 3.10.1 |
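Before launching the test run, it can help to confirm that the required commands are available in your session. The helper below is a minimal sketch; the function name `check_tools` is hypothetical, and whether `singularity` needs a `module load` on the HPC is an assumption to verify locally.

```shell
#!/bin/bash
# Sketch: report which of the named commands are missing from PATH.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one command was not found.
    return "$missing"
}
```

For example, `check_tools nextflow java singularity` prints one line per command and returns a non-zero status if any of them cannot be found.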
Running the pipeline using custom data
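Example of a typical command to run an RNA-seq analysis for human samples (--genome GRCh38 selects the human reference; for mouse samples, use a mouse genome key such as GRCm38 instead):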
...
Code Block |
---|
nextflow run nf-core/rnaseq --input samplesheet.csv \
    --outdir results \
    -r 3.10.1 \
    --genome GRCh38 \
    -profile singularity \
    --aligner star_rsem \
    --clip_r1 10 \
    --clip_r2 10 \
    --three_prime_clip_r1 1 \
    --three_prime_clip_r2 1 \
    -resume |
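The pipeline parameters in the command above (the options starting with two dashes) can also be kept in a YAML file and passed with Nextflow's `-params-file` option, which makes a run easier to repeat. The sketch below writes such a file using the same values as the command above; the function name `write_params_file` and the file name `params.yaml` are hypothetical.

```shell
#!/bin/bash
# Sketch: store the pipeline parameters from the example command in a YAML file.
write_params_file() {
    local target="${1:-params.yaml}"
    cat > "$target" <<'EOF'
input: samplesheet.csv
outdir: results
genome: GRCh38
aligner: star_rsem
clip_r1: 10
clip_r2: 10
three_prime_clip_r1: 1
three_prime_clip_r2: 1
EOF
}
```

After running `write_params_file params.yaml`, the equivalent launch command would be `nextflow run nf-core/rnaseq -r 3.10.1 -profile singularity -params-file params.yaml -resume`. Nextflow's own options (those with a single dash) stay on the command line.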
Preparing a ‘samplesheet.csv’ file
Prepare a sample sheet file that specifies the input files to be used. To do this, we use an nf-core script to generate the ‘samplesheet.csv’ file as follows:
...
Code Block |
---|
sample,fastq_1,fastq_2,strandedness
control_1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
control_2,/path/to/fastq/control-2_R1.fastq.gz,/path/to/fastq/control-2_R2.fastq.gz,unstranded
control_3,/path/to/fastq/control-3_R1.fastq.gz,/path/to/fastq/control-3_R2.fastq.gz,unstranded
infected_1,/path/to/fastq/infected-1_R1.fastq.gz,/path/to/fastq/infected-1_R2.fastq.gz,unstranded
infected_2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
infected_3,/path/to/fastq/infected-3_R1.fastq.gz,/path/to/fastq/infected-3_R2.fastq.gz,unstranded |
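For folders with many samples, a sheet in this format can be generated rather than typed by hand. The function below is a minimal sketch and is not the nf-core script mentioned above. It assumes paired-end files named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` in a single folder, one pair per sample, and it converts hyphens in file names to underscores for the sample column, as in the example. The function name `make_samplesheet` is hypothetical.

```shell
#!/bin/bash
# Sketch: build a samplesheet from paired files named
# <sample>_R1.fastq.gz and <sample>_R2.fastq.gz in one folder.
make_samplesheet() {
    local fastq_dir="$1" strandedness="${2:-unstranded}"
    local r1 r2 sample
    echo "sample,fastq_1,fastq_2,strandedness"
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        # If nothing matches, the pattern is left as-is; skip it.
        [ -e "$r1" ] || continue
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Skipping $r1: no matching R2 file" >&2
            continue
        fi
        # control-1_R1.fastq.gz -> control_1
        sample=$(basename "$r1" _R1.fastq.gz | tr '-' '_')
        echo "${sample},${r1},${r2},${strandedness}"
    done
}
```

Typical use would be `make_samplesheet /path/to/fastq > samplesheet.csv`. Single-end data and samples sequenced across several lanes are not handled by this sketch.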
Preparing to run on the HPC
To run the pipeline on the HPC, create a PBS submission script. For example, create a file called launch.pbs using a text editor of your choice (e.g., vi or nano), then copy and paste the code below:
...
Code Block |
---|
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#work on current directory (folder)
cd "$PBS_O_WORKDIR"
#load java and set up memory settings to run nextflow
#(NXF_OPTS must be exported so that the nextflow process can read it)
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
#run the rnaseq pipeline
nextflow run nf-core/rnaseq --input samplesheet.csv \
    --outdir results \
    -r 3.10.1 \
    --genome GRCh38 \
    -profile singularity \
    --aligner star_rsem \
    --clip_r1 10 \
    --clip_r2 10 \
    --three_prime_clip_r1 1 \
    --three_prime_clip_r2 1 |
Submitting the job
Once you have created the folder for the run and it contains the samplesheet.csv, nextflow.config, and launch.pbs files, you are ready to submit.
...
Code Block |
---|
qsub launch.pbs |
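A quick check of the run folder before calling qsub can catch missing files and sample sheet mistakes early. The function below is a minimal sketch that looks for the three files named above, confirms that every FASTQ path in columns 2 and 3 of the sample sheet exists, and flags any file listed more than once. The function name `preflight` is hypothetical.

```shell
#!/bin/bash
# Sketch: confirm the run folder is complete before calling qsub.
preflight() {
    local run_dir="${1:-.}" status=0 f path paths dups
    local sheet="$run_dir/samplesheet.csv"
    for f in samplesheet.csv nextflow.config launch.pbs; do
        if [ ! -s "$run_dir/$f" ]; then
            echo "missing or empty: $run_dir/$f" >&2
            status=1
        fi
    done
    [ -s "$sheet" ] || return 1
    # Columns 2 and 3 hold the FASTQ paths; skip the header and empty fields.
    paths=$(tail -n +2 "$sheet" | cut -d, -f2,3 | tr ',' '\n' | sed '/^$/d')
    while IFS= read -r path; do
        if [ -n "$path" ] && [ ! -e "$path" ]; then
            echo "FASTQ not found: $path" >&2
            status=1
        fi
    done <<EOF
$paths
EOF
    # A file listed twice usually points to a copy-and-paste error.
    dups=$(printf '%s\n' "$paths" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "FASTQ listed more than once:" >&2
        echo "$dups" >&2
        status=1
    fi
    return "$status"
}
```

Used as `preflight . && qsub launch.pbs`, the job is submitted only when all checks pass.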
Monitoring the Run
Use the following command to check the status of your submitted jobs:
Code Block |
---|
qstat -u $USER |
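The scheduler view can be combined with Nextflow's own log, which Nextflow writes to a file called `.nextflow.log` in the folder it was launched from. The helper below is a minimal sketch that prints both; the function name `run_status` and the default of 20 lines are arbitrary choices.

```shell
#!/bin/bash
# Sketch: show the scheduler view of your jobs plus the tail of the Nextflow log.
run_status() {
    local run_dir="${1:-.}" lines="${2:-20}"
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "${USER:-$(id -un)}" || echo "qstat returned an error" >&2
    else
        echo "qstat not found (are you logged in to the HPC?)" >&2
    fi
    if [ -f "$run_dir/.nextflow.log" ]; then
        echo "--- last $lines lines of $run_dir/.nextflow.log ---"
        tail -n "$lines" "$run_dir/.nextflow.log"
    else
        echo "No .nextflow.log in $run_dir yet" >&2
    fi
}
```

Running `run_status .` from the run folder shows the job list followed by the most recent pipeline messages; `run_status . 50` shows more of the log.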
...