...
This page provides a guide for QUT users on how to install and run the Nextflow nf-core/rnaseq workflow on the HPC.
Further details on the workflow can be found at:
https://nf-co.re/rnaseq/3.0/usage
...
Pre-requisites
conda3 or miniconda3 installed (https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)
Basic Unix command line knowledge (examples: https://researchcomputing.princeton.edu/education/external-online-resources/linux ; https://swcarpentry.github.io/shell-novice/)
Familiarity with a Unix text editor (for example, Vi/Vim or Nano)
An HPC account on QUT’s Lyra
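A quick way to confirm the software pre-requisites is to check that the tools are on your PATH. The following is a minimal sketch, not an official QUT script: the function names have_tool and report_tools are made up for illustration, and it assumes conda and the editors are exposed as the commands conda, vim and nano in your login shell.

```shell
# Return 0 when the named command is available on the PATH, non-zero otherwise.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Print a one-line status for each tool given as an argument.
report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# conda covers the conda3/miniconda3 requirement; vim and nano are the
# example editors named above (only one of them is needed).
report_tools conda vim nano
```

A "missing" line for conda usually means conda has not been installed yet or has not been initialised for your shell.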
Install Nextflow
The nf-core/rnaseq workflow requires Nextflow to be installed in your account on the HPC. Follow the HPC's Nextflow instructions to install and test Nextflow, prepare a nextflow.config file, and run a PBS Pro submission script for Nextflow pipelines.
Additional information is available at: https://nf-co.re/usage/installation
Additional details on the workflow can be found at:
https://nf-co.re/rnaseq/3.0/usage
https://nf-co.re/rnaseq/3.0/output
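As a rough illustration of the install-and-test step, the sketch below installs Nextflow with conda and then runs Nextflow's built-in "hello" pipeline. It is one possible route under stated assumptions, not necessarily the procedure recommended for the QUT HPC: it assumes conda is already active and that installing from the bioconda channel is acceptable on your system. The run helper and the DRY_RUN variable are illustrative names; by default the commands are only printed so that you can review them first.

```shell
# Set DRY_RUN=0 to execute the commands; by default they are only printed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_nextflow() {
    # Install Nextflow from Bioconda into the active conda environment.
    run conda install -y -c conda-forge -c bioconda nextflow
    # Confirm the installation by printing the installed version.
    run nextflow -version
    # Run the built-in "hello" demo pipeline as a smoke test.
    run nextflow run hello
}

install_nextflow
```

Once the printed commands look right, re-run the script with DRY_RUN=0 to execute them.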
Pipeline summary
FastQC - Raw read QC
UMI-tools extract - UMI barcode extraction
TrimGalore - Adapter and quality trimming
SortMeRNA - Removal of ribosomal RNA (optional)
STAR and Salmon - Fast splice-aware genome alignment and transcriptome quantification
STAR via RSEM - Alignment and quantification of expression levels
HISAT2 - Memory-efficient splice-aware alignment to a reference
SAMtools - Sort and index alignments
UMI-tools dedup - UMI-based deduplication
Picard MarkDuplicates - Duplicate read marking
StringTie - Transcript assembly and quantification
BEDTools and bedGraphToBigWig - Create bigWig coverage files
RSeQC - Various RNA-seq QC metrics
Qualimap - Various RNA-seq QC metrics
dupRadar - Assessment of technical / biological read duplication
Preseq - Estimation of library complexity
featureCounts - Read counting relative to gene biotype
DESeq2 - PCA plot and sample pairwise distance heatmap and dendrogram
MultiQC - Present QC for raw reads, alignment, read counting and sample similarity
Pseudo-alignment and quantification
Salmon - Wicked fast gene and isoform quantification relative to the transcriptome
Workflow reporting and genomes
Reference genome files - Saving reference genome indices/files
Pipeline information - Report metrics generated during the workflow execution
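The summary above lists three alternative alignment routes (STAR and Salmon, STAR via RSEM, HISAT2) plus Salmon pseudo-alignment. The sketch below shows how these routes are typically selected on the command line. It assumes the --aligner and --pseudo_aligner parameters as described on the nf-core/rnaseq 3.0 usage page, so check that page before relying on it. The function name rnaseq_cmd and the file name samplesheet.csv are placeholders, and the command is deliberately incomplete: reference genome and profile options are left out. The function only prints the command; it does not launch the workflow.

```shell
# Build (but do not run) an nf-core/rnaseq command line for a given route.
# $1: one of star_salmon, star_rsem, hisat2, or salmon (pseudo-alignment).
rnaseq_cmd() {
    # samplesheet.csv is a placeholder input file name.
    base="nextflow run nf-core/rnaseq -r 3.0 --input samplesheet.csv"
    case "$1" in
        star_salmon|star_rsem|hisat2)
            echo "$base --aligner $1" ;;
        salmon)
            echo "$base --pseudo_aligner salmon" ;;
        *)
            echo "unknown route: $1" >&2
            return 1 ;;
    esac
}

rnaseq_cmd star_salmon
rnaseq_cmd salmon
```

Printing the command first makes it easy to paste the result into a PBS Pro submission script once the remaining options have been added.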
...