This page provides a guide for QUT users to run the nf-core/rnaseq workflow on the QUT HPC.
Further details on the workflow can be found at:
...
FastQC - Raw read QC
UMI-tools extract - UMI barcode extraction
TrimGalore - Adapter and quality trimming
SortMeRNA - Removal of ribosomal RNA (optional)
STAR and Salmon - Fast splice-aware genome alignment and transcriptome quantification
STAR via RSEM - Alignment and quantification of expression levels
HISAT2 - Memory-efficient splice-aware alignment to a reference
SAMtools - Sort and index alignments
UMI-tools dedup - UMI-based deduplication
Picard MarkDuplicates - Duplicate read marking
StringTie - Transcript assembly and quantification
BEDTools and bedGraphToBigWig - Create bigWig coverage files
RSeQC - Various RNA-seq QC metrics
Qualimap - Various RNA-seq QC metrics
dupRadar - Assessment of technical / biological read duplication
Preseq - Estimation of library complexity
featureCounts - Read counting relative to gene biotype
DESeq2 - PCA plot and sample pairwise distance heatmap and dendrogram
MultiQC - Present QC for raw reads, alignment, read counting and sample similarity
Pseudo-alignment and quantification
Salmon - Wicked fast gene and isoform quantification relative to the transcriptome
Workflow reporting and genomes
Reference genome files - Saving reference genome indices/files
Pipeline information - Report metrics generated during the workflow execution
Getting Started
Download and run the workflow using the minimal test dataset provided by nf-core/rnaseq. We recommend using ‘singularity’ as the profile on QUT’s HPC; the other available option is ‘conda’. Note: the ‘docker’ profile is not available on the HPC.
Code Block |
---|
nextflow run nf-core/rnaseq -profile test,singularity |
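Before launching the test run, it can help to confirm that the tools the workflow depends on are available in your session. The sketch below is a hypothetical helper (`require_tools` is our own name, not part of Nextflow or nf-core) that checks each command with the shell builtin `command -v`. It assumes `java`, `nextflow` and `singularity` are the commands you need; on the HPC you may first need `module load java`, as in the PBS script later on this page.

```shell
#!/bin/bash
# Hypothetical helper: report any command that is not on PATH.
# Returns 0 if every command is found, 1 otherwise.
require_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool list for a run with the singularity profile; adjust as needed.
require_tools java nextflow singularity || echo "Load or install the missing tools before running the workflow." >&2
```

If nothing is reported as missing, the test command above should be able to start.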
Running the pipeline using custom data
Example of a typical command to run an RNA-seq analysis on mouse samples:
Code Block |
---|
nextflow run nf-core/rnaseq \
--input samplesheet.csv \
--genome GRCm38 \
-profile conda |
Note: if the run was interrupted, did not complete a particular step, or you want to modify a parameter for a particular step, you do not need to re-run every process. Add the “-resume” option and Nextflow will reuse the cached results of steps that already completed and are unaffected by your changes.
Code Block |
---|
nextflow run nf-core/rnaseq \
--input samplesheet.csv \
--genome GRCm38 \
-profile conda \
-resume |
Preparing a ‘samplesheet.csv’ file
A samplesheet.csv file tells the workflow where to find the read 1 (R1) and read 2 (R2) FASTQ files for each sample, along with other sample information: the ‘group’ (e.g., control or infected), the replicate number and the strandedness of the library (i.e., forward, reverse or unstranded).
Example samplesheet.csv:
Code Block |
---|
group,replicate,fastq_1,fastq_2,strandedness
control,1,/path/to/fastq/control-1_R1.fastq.gz,/path/to/fastq/control-1_R2.fastq.gz,unstranded
control,2,/path/to/fastq/control-2_R1.fastq.gz,/path/to/fastq/control-2_R2.fastq.gz,unstranded
control,3,/path/to/fastq/control-3_R1.fastq.gz,/path/to/fastq/control-3_R2.fastq.gz,unstranded
infected,1,/path/to/fastq/infected-1_R1.fastq.gz,/path/to/fastq/infected-1_R2.fastq.gz,unstranded
infected,2,/path/to/fastq/infected-2_R1.fastq.gz,/path/to/fastq/infected-2_R2.fastq.gz,unstranded
infected,3,/path/to/fastq/infected-3_R1.fastq.gz,/path/to/fastq/infected-3_R2.fastq.gz,unstranded |
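Writing the samplesheet by hand is error-prone with many samples, because it is easy to copy a line and forget to change a sample number in one of the paths. The function below is a minimal sketch that builds the samplesheet from a folder of paired FASTQ files. It assumes, hypothetically, that files follow the `<group>-<replicate>_R1.fastq.gz` / `<group>-<replicate>_R2.fastq.gz` naming used in the example above and that all samples share one strandedness (unstranded by default). The name `make_samplesheet` is our own; adapt the file pattern to your data.

```shell
#!/bin/bash
# Hypothetical helper: print an nf-core/rnaseq samplesheet for paired FASTQ
# files named <group>-<replicate>_R1.fastq.gz and <group>-<replicate>_R2.fastq.gz.
# Usage: make_samplesheet FASTQ_DIR [strandedness] > samplesheet.csv
make_samplesheet() {
  if [ ! -d "${1:-}" ]; then
    echo "usage: make_samplesheet FASTQ_DIR [strandedness]" >&2
    return 1
  fi
  local fastq_dir strandedness=${2:-unstranded}
  local r1 r2 base group replicate
  fastq_dir=$(cd "$1" && pwd)   # absolute path, so the sheet works from any folder

  echo "group,replicate,fastq_1,fastq_2,strandedness"
  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                  # no files matched the pattern
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"       # expected read 2 partner
    if [ ! -e "$r2" ]; then
      echo "warning: no R2 file for $r1, skipping" >&2
      continue
    fi
    base=$(basename "$r1" _R1.fastq.gz)       # e.g. control-1
    group=${base%-*}                          # text before the last '-'
    replicate=${base##*-}                     # text after the last '-'
    echo "$group,$replicate,$r1,$r2,$strandedness"
  done
}
```

For example, save it to a file, `source` it, and run `make_samplesheet /path/to/fastq > samplesheet.csv`. Open the result before submitting and check that the columns match the samplesheet format documented for the pipeline version you run.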
Preparing to run on the HPC
To run this on the HPC, you need to create a PBS submission script.
In the folder you have created for this run, create a file named launch.pbs using a text editor (e.g., vim or nano):
Code Block |
---|
#!/bin/bash -l
#PBS -N nfrna2
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
cd "$PBS_O_WORKDIR"   # run from the folder where qsub was called
module load java      # Nextflow requires Java
export NXF_OPTS='-Xms1g -Xmx4g'   # Java heap limits for the Nextflow head job
nextflow run nf-core/rnaseq \
-profile conda \
--input samplesheet.csv \
--genome GRCm38 \
--aligner star_rsem \
--min_mapped_reads 5 |
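Nextflow reads a file named nextflow.config from the folder it is launched in. Without an executor setting, it runs every process locally, i.e., inside the small head job requested in launch.pbs. The function below is a minimal sketch that writes a nextflow.config telling Nextflow to submit each process as its own PBS job. It assumes the PBS Pro executor (`pbspro`) and an arbitrary limit of 20 queued jobs; `write_nextflow_config` is a hypothetical name, and you should check QUT's own recommended configuration and add any queue or account settings your project requires.

```shell
#!/bin/bash
# Hypothetical helper: write a minimal nextflow.config into a run folder.
# Assumptions (check against QUT guidance): PBS Pro executor, at most 20 queued jobs.
write_nextflow_config() {
  local run_dir=${1:-.}
  local config="$run_dir/nextflow.config"
  if [ -e "$config" ]; then
    echo "$config already exists; not overwriting" >&2
    return 1
  fi
  cat > "$config" <<'EOF'
// Submit each workflow process as a separate PBS job
process {
  executor = 'pbspro'
}

// Cap how many jobs Nextflow keeps in the queue at once (assumed value)
executor {
  queueSize = 20
}
EOF
}
```

Source the snippet and run `write_nextflow_config .` in the run folder before submitting. The function refuses to overwrite an existing file, so a config you already have is left untouched.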
Submitting the job
Once you have created the folder for the run, the samplesheet.csv file, nextflow.config and launch.pbs, you are ready to submit.
Submit the run with this command (on Lyra):
Code Block |
---|
qsub launch.pbs |
Monitoring the Run
To check on the jobs you are running, use the command:
Code Block |
---|
qstat -u $USER |
Nextflow will launch additional jobs during the run, so these will also appear in the list.
You can also check the .nextflow.log file for details on what is going on.
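For a quick look at problems without reading the whole log, you can filter .nextflow.log for warning and error lines. The function below (`nf_log_problems` is a hypothetical name) is a small sketch; it assumes each log line carries its level as a separate word such as `WARN` or `ERROR`, and prints only the most recent matches.

```shell
#!/bin/bash
# Hypothetical helper: show the last N warning/error lines from a Nextflow log.
# Assumes log levels appear as whole words (WARN, ERROR) on each line.
nf_log_problems() {
  local log=${1:-.nextflow.log}
  local n=${2:-20}
  if [ ! -f "$log" ]; then
    echo "log file not found: $log" >&2
    return 1
  fi
  grep -wE 'WARN|ERROR' "$log" | tail -n "$n"
}
```

For example, `nf_log_problems .nextflow.log 50` shows the last 50 matches. To follow the log live while the run is going, `tail -f .nextflow.log` also works.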
Finally, if you have configured the connection to Nextflow Tower, you can log in and check your run there.