nf-core/smrnaseq version 2.1.0: small RNA sequencing
This page guides QUT users through installing and running the Nextflow nf-core/smrnaseq workflow on the HPC.
Prerequisites
Basic Unix command-line knowledge (see, for example, https://researchcomputing.princeton.edu/education/external-online-resources/linux and https://swcarpentry.github.io/shell-novice/)
Familiarity with a Unix text editor (for example, Vi/Vim or Nano)
An account on QUT’s HPC. Apply for a new HPC account here.
R tutorials:
Install Nextflow
The nf-core/smrnaseq workflow requires Nextflow to be installed on your HPC account. Find details on how to install and test Nextflow here. Then prepare a nextflow.config file and a PBS Pro submission script for your Nextflow pipelines.
Additional information is available here: https://nf-co.re/usage/installation
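A minimal nextflow.config is sketched below. It is written with a shell heredoc so you can paste it on the login node. The sketch assumes you want Nextflow to send each pipeline task to PBS Pro as a separate job, so the submission script itself only runs the Nextflow head process. Singularity itself is switched on by the `-profile singularity` option in the submission script. The queue size and cache directory are hypothetical values. QUT's Nextflow page remains the reference for the settings recommended on this cluster.

```shell
#!/bin/bash
# Write a minimal nextflow.config in the current (launch) directory.
# Assumptions (hypothetical values, adjust for your account): every task is
# submitted as its own PBS Pro job, at most 20 tasks are queued at once, and
# Singularity images are cached under your home directory.
# The unquoted EOF lets the shell expand ${HOME} to an absolute path.
cat > nextflow.config <<EOF
// Submit each pipeline task to the PBS Pro scheduler
process {
    executor = 'pbspro'
}

// Cap the number of tasks Nextflow submits at the same time
executor {
    queueSize = 20
}

// Mount host paths automatically and reuse downloaded container images
singularity {
    autoMounts = true
    cacheDir = "${HOME}/.singularity/cache"
}
EOF

echo "Wrote $(pwd)/nextflow.config"
```

Nextflow reads nextflow.config from the directory it is launched in. Keep this file next to samplesheet.csv and the PBS script.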
Additional details on the workflow can be found at:
Overview: https://nf-co.re/smrnaseq/2.2.1
Usage: https://nf-co.re/smrnaseq/2.2.1/docs/usage
Parameters: https://nf-co.re/smrnaseq/2.2.1/parameters
Pipeline Summary
1. Raw read QC (FastQC)
2. Adapter trimming (Trim Galore!)
   1. Insert size calculation
   2. Collapse reads (seqcluster)
3. Contamination filtering (Bowtie2)
4. Alignment against miRBase mature miRNA (Bowtie1)
5. Alignment against miRBase hairpin
   1. Unaligned reads from step 3 (Bowtie1)
   2. Collapsed reads from step 2.2 (Bowtie1)
6. Post-alignment processing of miRBase hairpin
   1. Basic statistics from step 3 and step 4.1 (SAMtools)
   2. Analysis of miRBase or MirGeneDB hairpin counts (edgeR)
      - TMM normalization and a table of the most highly expressed hairpins
      - MDS plot clustering samples
      - Heatmap of sample similarities
   3. miRNA and isomiR annotation from step 4.1 (mirtop)
7. Alignment against host reference genome (Bowtie1)
   1. Post-alignment processing of alignment against host reference genome (SAMtools)
8. Novel and known miRNA discovery (MiRDeep2)
   1. Mapping against reference genome with the mapper module
   2. Known and novel miRNA discovery with the mirdeep2 module
9. miRNA quality control (mirtrace)
10. QC report for raw reads, alignment, and expression results (MultiQC)
Prepare a metadata table for raw data (FASTQ files)
Create a CSV file, for example named samplesheet.csv, with the following columns:
sample,fastq_1
CONTROL_REP1,/path/to/fastq/AEG588A1_S1_L002_R1_001.fastq.gz
CONTROL_REP2,/path/to/fastq/AEG588A2_S2_L002_R1_001.fastq.gz
CONTROL_REP3,/path/to/fastq/AEG588A3_S3_L002_R1_001.fastq.gz
TREATMENT_REP1,/path/to/fastq/AEG588A4_S4_L003_R1_001.fastq.gz
TREATMENT_REP2,/path/to/fastq/AEG588A5_S5_L003_R1_001.fastq.gz
TREATMENT_REP3,/path/to/fastq/AEG588A6_S6_L003_R1_001.fastq.gz
TREATMENT_REP3,/path/to/fastq/AEG588A6_S6_L004_R1_001.fastq.gz
Modify the example file with your own sample IDs and the paths to your FASTQ files.
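A bad path in samplesheet.csv usually shows up only after the job has started, so it can help to check the file on the login node first. The `check_samplesheet` function below is a hypothetical helper, not part of nf-core. It assumes the two-column layout shown above with no quoted fields or embedded commas. The pipeline's own input validation remains the authority.

```shell
#!/bin/bash
# check_samplesheet: hypothetical pre-submission check for a two-column
# samplesheet (sample,fastq_1). Prints one message per problem and returns
# non-zero if anything is wrong.
check_samplesheet() {
    local sheet="$1"
    local errors=0
    local line_no=0
    local sample fastq extra

    if [ ! -f "$sheet" ]; then
        echo "ERROR: $sheet not found" >&2
        return 1
    fi

    # The header must match exactly (a Windows line ending is tolerated)
    if [ "$(head -n 1 "$sheet" | tr -d '\r')" != "sample,fastq_1" ]; then
        echo "ERROR: header must be 'sample,fastq_1'" >&2
        errors=$((errors + 1))
    fi

    # Each data row needs a sample ID and an existing gzipped FASTQ file.
    # The "|| [ -n ... ]" also processes a last line without a newline.
    while IFS=, read -r sample fastq extra || [ -n "$sample" ]; do
        line_no=$((line_no + 1))
        [ "$line_no" -eq 1 ] && continue        # skip the header row
        fastq="${fastq%$'\r'}"                  # strip a trailing CR
        if [ -z "$sample" ] || [ -z "$fastq" ] || [ -n "$extra" ]; then
            echo "ERROR: line $line_no: expected exactly 'sample,fastq_1'" >&2
            errors=$((errors + 1))
            continue
        fi
        case "$fastq" in
            *.fastq.gz|*.fq.gz) ;;
            *) echo "ERROR: line $line_no: $fastq does not end in .fastq.gz or .fq.gz" >&2
               errors=$((errors + 1)) ;;
        esac
        if [ ! -f "$fastq" ]; then
            echo "ERROR: line $line_no: file not found: $fastq" >&2
            errors=$((errors + 1))
        fi
    done < "$sheet"

    [ "$errors" -eq 0 ] && echo "$sheet looks OK"
    [ "$errors" -eq 0 ]
}
```

For example, run `check_samplesheet samplesheet.csv` before `qsub`. The unmodified example file will fail this check until the `/path/to/fastq/` placeholders are replaced with real paths.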
Run the nextflow nf-core/smrnaseq pipeline
Create a file named launch_nfsmRNAseq.pbs with the following contents:
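#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Nextflow needs Java; export the JVM memory limits so Nextflow picks them up
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

# Launch nf-core/smrnaseq release 2.1.0 using Singularity containers
nextflow run nf-core/smrnaseq -r 2.1.0 \
    -profile singularity \
    --outdir outdir \
    --input samplesheet.csv \
    --genome GRCh38 \
    --three_prime_adapter 'AACTGTAGGCACCATCAAT' \
    --fastp_min_length 18 \
    --fastp_max_length 30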
Submit the job to the PBS scheduler:
qsub launch_nfsmRNAseq.pbs
Monitor the job's progress on the HPC:
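One way to check on the run from the login node is sketched below. It assumes PBS Pro's `qstat` and Nextflow's default log file, `.nextflow.log`. Nextflow writes that log in the launch directory, which here is the submission directory because the script changes into `$PBS_O_WORKDIR`. The `show_progress` function name and its defaults are hypothetical.

```shell
#!/bin/bash
# show_progress: hypothetical helper that lists your PBS jobs and prints the
# tail of the Nextflow log for a run directory (default: current directory).
show_progress() {
    local run_dir="${1:-.}"
    local lines="${2:-20}"

    # List your queued and running PBS jobs: the Nextflow head job and,
    # if tasks are submitted through PBS, the individual task jobs
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "${USER:-$(id -un)}"
    else
        echo "qstat not found; run this on the HPC login node" >&2
    fi

    # Show the most recent Nextflow log messages
    if [ -f "$run_dir/.nextflow.log" ]; then
        tail -n "$lines" "$run_dir/.nextflow.log"
    else
        echo "No .nextflow.log in $run_dir yet" >&2
        return 1
    fi
}
```

For example, `show_progress . 50` run from the submission directory lists your jobs and prints the last 50 lines of the log.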
Alternatively, follow the progress of the submitted job in Nextflow Tower.