nf-core/smRNA-seq small RNAs

This page guides QUT users through installing and running the Nextflow nf-core/smrnaseq workflow on the HPC.

Pre-requisites

Basic Unix command line knowledge (example: https://researchcomputing.princeton.edu/education/external-online-resources/linux ; https://swcarpentry.github.io/shell-novice/ )
- https://sandbox.bio/
Familiarity with at least one Unix text editor (for example, Vi/Vim or Nano):
- VIM ( https://bioinformatics.uconn.edu/vim-guide/ ; https://missing.csail.mit.edu/2020/editors/)
- Nano (https://engineering.purdue.edu/ECN/Support/KB/Docs/BasictutorialforNanou ; https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/ )
Have an HPC account on QUT’s HPC server. Apply for a new HPC account here.
R tutorials:
- https://girke.bioinformatics.ucr.edu/GEN242/tutorials/

Install Nextflow

The nf-core/smrnaseq workflow requires Nextflow to be installed on your HPC account. Find details on how to install and test Nextflow here. Then prepare a nextflow.config file and run a PBS Pro submission script for Nextflow pipelines.

Additional information is available here: https://nf-co.re/usage/installation
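Before moving on, it can help to confirm that Java and Nextflow are actually visible in your session. The following is a minimal sketch, not part of the official install guide: require_tool is a hypothetical helper name, and the "java" module name is taken from the PBS script later on this page.

```shell
# Hypothetical helper: report whether a command is available on PATH.
# Returns non-zero if the command is missing.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 ($(command -v "$1"))"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Typical use on the HPC (assumes the "java" module used later on this page):
#   module load java
#   require_tool java && java -version
#   require_tool nextflow && nextflow -version
```

If either tool is reported as missing, revisit the installation steps before submitting any jobs.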

Additional details on the workflow can be found at:

Overview: https://nf-co.re/smrnaseq/2.2.1

Usage: https://nf-co.re/smrnaseq/2.2.1/docs/usage

Parameters: https://nf-co.re/smrnaseq/2.2.1/parameters

Pipeline Summary

1. Raw read QC (FastQC)
2. Adapter trimming (Trim Galore!)
   1. Insert Size calculation
   2. Collapse reads (seqcluster)
3. Contamination filtering (Bowtie2)
4. Alignment against miRBase mature miRNA (Bowtie1)
5. Alignment against miRBase hairpin
   1. Unaligned reads from step 3 (Bowtie1)
   2. Collapsed reads from step 2.2 (Bowtie1)
6. Post-alignment processing of miRBase hairpin
   1. Basic statistics from step 3 and step 4.1 (SAMtools)
   2. Analysis on miRBase, or MirGeneDB hairpin counts (edgeR)
      - TMM normalization and a table of top expression hairpin
      - MDS plot clustering samples
      - Heatmap of sample similarities
   3. miRNA and isomiR annotation from step 4.1 (mirtop)
7. Alignment against host reference genome (Bowtie1)
   1. Post-alignment processing of alignment against host reference genome (SAMtools)
8. Novel miRNAs and known miRNAs discovery (MiRDeep2)
   1. Mapping against reference genome with the mapper module
   2. Known and novel miRNA discovery with the mirdeep2 module
9. miRNA quality control (mirtrace)
10. Present QC for raw read, alignment, and expression results (MultiQC)

Prepare a metadata table for raw data (FASTQ files)

Create a CSV file, for example, named samplesheet.csv, that has the following information:

sample,fastq_1
CONTROL_REP1,/path/to/fastq/AEG588A1_S1_L002_R1_001.fastq.gz
CONTROL_REP2,/path/to/fastq/AEG588A2_S2_L002_R1_001.fastq.gz
CONTROL_REP3,/path/to/fastq/AEG588A3_S3_L002_R1_001.fastq.gz
TREATMENT_REP1,/path/to/fastq/AEG588A4_S4_L003_R1_001.fastq.gz
TREATMENT_REP2,/path/to/fastq/AEG588A5_S5_L003_R1_001.fastq.gz
TREATMENT_REP3,/path/to/fastq/AEG588A6_S6_L003_R1_001.fastq.gz
TREATMENT_REP3,/path/to/fastq/AEG588A6_S6_L004_R1_001.fastq.gz

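Modify the example file with your project sample IDs and the paths to your FASTQ files. Rows that share a sample ID (such as the two TREATMENT_REP3 rows above, which come from different lanes) are treated as the same sample, and the pipeline concatenates their reads before analysis.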
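A wrong header or a mistyped FASTQ path is a common reason for a run to fail shortly after submission. The sketch below checks both before you submit. It assumes the two-column, single-end layout shown above; check_samplesheet is a hypothetical helper name and is not part of nf-core or Nextflow.

```shell
# Hypothetical helper: sanity-check a samplesheet before submitting the job.
# Usage: check_samplesheet samplesheet.csv
check_samplesheet() {
    sheet=$1
    status=0

    # The first line must match the header used in the example above.
    header=$(head -n 1 "$sheet" | tr -d '\r')
    if [ "$header" != "sample,fastq_1" ]; then
        echo "ERROR: unexpected header: $header" >&2
        return 1
    fi

    # Every remaining row needs a sample ID and a path to an existing file.
    while IFS=, read -r sample fastq; do
        [ -z "$sample$fastq" ] && continue   # skip blank lines
        if [ -z "$sample" ]; then
            echo "ERROR: missing sample ID for: $fastq" >&2
            status=1
        fi
        if [ ! -f "$fastq" ]; then
            echo "ERROR: FASTQ not found for $sample: $fastq" >&2
            status=1
        fi
    done <<EOF
$(tail -n +2 "$sheet" | tr -d '\r')
EOF

    return $status
}
```

Running check_samplesheet samplesheet.csv prints one error line per problem row and returns a non-zero exit status if anything is wrong, so it can be chained in front of qsub with &&.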

Run the nextflow nf-core/smrnaseq pipeline

Create a launch_nfsmRNAseq.pbs file that has the following information:

#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

cd $PBS_O_WORKDIR
module load java
# Export the JVM memory settings so the nextflow process inherits them
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/smrnaseq -r 2.1.0 \
	-profile singularity \
	--outdir outdir \
	--input samplesheet.csv \
	--genome GRCh38 \
	--three_prime_adapter 'AACTGTAGGCACCATCAAT' \
	--fastp_min_length 18 \
	--fastp_max_length 30

Submit the job to the PBS scheduler:

qsub launch_nfsmRNAseq.pbs

Monitor the progress of the job on the HPC:

qjobs
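The scheduler only reports whether the job is queued or running. To see which pipeline step is currently executing, you can also look at the log that Nextflow writes to .nextflow.log in the directory the job was launched from. The sketch below wraps that in a small function; show_nextflow_log is a hypothetical helper name, and the default of 20 lines is an arbitrary choice.

```shell
# Hypothetical helper: print the last N lines (default 20) of the Nextflow log
# found in a launch directory (default: the current directory).
show_nextflow_log() {
    dir=${1:-.}
    lines=${2:-20}
    log="$dir/.nextflow.log"
    if [ -f "$log" ]; then
        tail -n "$lines" "$log"
    else
        echo "No .nextflow.log in $dir yet; the job may still be queued." >&2
        return 1
    fi
}

# Example use from the directory where qsub was run:
#   show_nextflow_log . 40
# The standard PBS command for listing your own jobs is:
#   qstat -u "$USER"
```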

Alternatively, view the progress of the submitted job on Nextflow Tower.
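Tower monitoring only works if the run is started with the -with-tower option and an access token is available in the TOWER_ACCESS_TOKEN environment variable. As a hedged sketch, the function below adds the option only when a token is set, so the same PBS script works with or without Tower. tower_flag is a hypothetical helper name, and how you obtain and store the token is left to your own setup.

```shell
# Hypothetical helper: print "-with-tower" only when a Tower token is set,
# so the run does not fail for users who have not configured Tower.
tower_flag() {
    if [ -n "${TOWER_ACCESS_TOKEN:-}" ]; then
        echo "-with-tower"
    fi
}

# Example use inside the PBS script (other options as shown earlier):
#   export TOWER_ACCESS_TOKEN=<your token>
#   nextflow run nf-core/smrnaseq -r 2.1.0 -profile singularity ... $(tower_flag)
```

The substitution is left unquoted on purpose: when no token is set it expands to nothing rather than to an empty argument.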