2024 eResearch - Session 6 - Hands-on smRNAseq training

1 Public small RNA-seq data
2 Exercise 1: Running a test with nf-core sample data
- 2.1 Submitting the job
- 2.2 Monitoring the Run
3 Exercise 2: Running the small RNA pipeline using public human data
- 3.1 Create the metadata file (samplesheet.csv):
- 3.2 Precomputed results:
4 Exercise 3: Running the small RNA pipeline using MirGeneDB
5 Differential expression analysis using RStudio
6 Running R Scripts on the HPC

Public small RNA-seq data

Species	ENA link	Description
Human	https://www.ebi.ac.uk/ena/browser/view/PRJEB5212?show=publications	RNA-seq of micro RNAs (miRNAs) in Human prefrontal cortex to identify differentially expressed miRNAs between Huntington's Disease and control brain samples

1. Connect to an rVDI virtual desktop machine

To access and run an rVDI virtual desktop:

Go to https://rvdi.qut.edu.au/

Click on ‘VMware Horizon HTML Access’

Log on with your QUT username and password

*NOTE: you need to be connected to the QUT network first, either on campus or remotely via VPN.

2. Open PuTTY terminal

Click on the PuTTY icon
Double click on “Lyra”
Enter your password and connect to the HPC

Copying data for hands-on exercises

Before we start using the HPC, let’s start an interactive session:

qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=1:mem=4gb

Get a copy of the scripts to be used in this module

Use the terminal to log into the HPC and create a scripts folder for the nf-core/smrnaseq pipeline, then copy the workshop scripts into it. For example:

mkdir -p $HOME/workshop/small_RNAseq/scripts
cp /work/training/smallRNAseq/scripts/* $HOME/workshop/small_RNAseq/scripts/
ls -l $HOME/workshop/small_RNAseq/scripts/

Line 1: The -p option creates parent directories as required, so this single command creates $HOME/workshop/, $HOME/workshop/small_RNAseq/ and the subfolder $HOME/workshop/small_RNAseq/scripts/
Line 2: Copies all files (matched by the asterisk) from /work/training/smallRNAseq/scripts/ to the newly created folder $HOME/workshop/small_RNAseq/scripts/
Line 3: Lists the files in the scripts folder
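
If you would like to see what -p and the asterisk do before touching your home directory, the following minimal sketch repeats the same steps inside a throwaway folder. The folder created by mktemp and the two file names (a.pbs, b.sh) are made up for illustration and are not part of the workshop data.

```shell
# Work in a temporary folder so nothing in $HOME is changed
demo_root=$(mktemp -d)

# Without -p, mkdir fails because the parent folders do not exist yet
if mkdir "$demo_root/workshop/small_RNAseq/scripts" 2>/dev/null; then
    plain_result="created"
else
    plain_result="failed"
fi

# With -p, every missing parent folder is created in one step
mkdir -p "$demo_root/workshop/small_RNAseq/scripts"
# Running it again is not an error when the folder already exists
mkdir -p "$demo_root/workshop/small_RNAseq/scripts"

# Make two placeholder files and copy them with an asterisk (glob)
mkdir -p "$demo_root/source"
touch "$demo_root/source/a.pbs" "$demo_root/source/b.sh"
cp "$demo_root/source/"* "$demo_root/workshop/small_RNAseq/scripts/"

# Count the copied files
copied=$(ls "$demo_root/workshop/small_RNAseq/scripts" | wc -l | tr -d ' ')

echo "mkdir without -p: $plain_result"
echo "files copied: $copied"

# Clean up the temporary folder
rm -rf "$demo_root"
```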

Copy multiple subdirectories and files using rsync

mkdir -p $HOME/workshop/small_RNAseq/data/
rsync -rv /work/training/smallRNAseq/data/ $HOME/workshop/small_RNAseq/data/

Line 1: Creates the folder $HOME/workshop/small_RNAseq/data/
Line 2: rsync copies all subfolders and files from the source folder to the destination folder. -r = recursive, copies directories and their contents; -v = verbose, prints each file as it is transferred

Create a folder for running the nf-core small RNA-seq pipeline

Let’s create one folder per run for the nf-core/smrnaseq pipeline:

mkdir -p $HOME/workshop/small_RNAseq
mkdir $HOME/workshop/small_RNAseq/run1_test
mkdir $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/

Lines 1-3: create the workshop folder and one sub-folder for each exercise
Line 4: change the directory to the folder “small_RNAseq”

Exercise 1: Running a test with nf-core sample data

First, let’s check that the nf-core/smrnaseq pipeline runs correctly on the HPC by running a test with the sample data bundled with the pipeline.

Copy the launch_nf-core_smallRNAseq_test.pbs to the working directory

cd $HOME/workshop/small_RNAseq/run1_test
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_test.pbs .

View the content of the script as follows:

cat launch_nf-core_smallRNAseq_test.pbs

#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#load java and set up memory settings to run nextflow
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

# run the test
nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0

where:

nextflow command: nextflow run
pipeline name: nf-core/smrnaseq
pipeline version: -r 2.1.0
container type and sample data: -profile test,singularity
output directory: --outdir results
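
To make the role of each part easier to see, here is a small sketch that assembles the same command from named variables and only prints it, without starting Nextflow. The variable names (pipeline, version, profile, outdir) are chosen for this illustration and do not appear in the workshop script.

```shell
# Each piece of the command, stored under a descriptive name
pipeline="nf-core/smrnaseq"     # pipeline name
version="2.1.0"                 # pipeline version, passed with -r
profile="test,singularity"      # sample data (test) and container type (singularity)
outdir="results"                # output directory

# Assemble the full command as text; nothing is executed here
cmd="nextflow run $pipeline -profile $profile --outdir $outdir -r $version"

# Print the command so it can be compared with the PBS script
echo "$cmd"
```

Changing one variable, for example outdir, and printing again is a safe way to preview a modified command before putting it into a PBS script.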

Submitting the job

Now we can submit the small RNAseq test job to the HPC scheduler:

qsub launch_nf-core_smallRNAseq_test.pbs

Monitoring the Run

qjobs

Exercise 2: Running the small RNA pipeline using public human data

The pipeline requires preparing at least 2 files:

Metadata file (samplesheet.csv) that specifies the “sample name” and “location of FASTQ files” ('Read 1').
PBS Pro script (launch_nf-core_smallRNAseq_human.pbs) with instructions to run the pipeline

Create the metadata file (samplesheet.csv):

Change to the data folder directory:

cd $HOME/workshop/small_RNAseq/data/human
pwd

Copy the bash script to the working folder

cp /work/training/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/small_RNAseq/data/human

Note: you could replace ‘$HOME/workshop/small_RNAseq/data/human’ with “.”. A dot means ‘current directory’, so the file is copied to the directory where you are currently located.

View the content of the script:

cat create_nf-core_smallRNAseq_samplesheet.sh

#!/bin/bash -l

#User defined variables.
##########################################################
#double quotes so that $HOME is expanded by the shell
DIR="$HOME/workshop/small_RNAseq/data/human"
INDEX='samplesheet.csv'
##########################################################

#load python module
module load python/3.10.8-gcccore-12.2.0

#fetch the script to create the sample metadata table
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py

#generate initial sample metadata file
./fastq_dir_to_samplesheet.py "$DIR" index.csv \
    --strandedness auto \
    --read1_extension .fastq.gz

#format index file: keep only the first two columns (sample and fastq_1)
awk -F "," '{print $1 "," $2}' index.csv > "${INDEX}"

#Remove intermediate files:
rm index.csv fastq_dir_to_samplesheet.py
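
The awk step in the script is what reduces the intermediate table to the two columns that the small RNA pipeline needs. The sketch below runs the same awk command on two made-up lines. It assumes the intermediate index.csv has four columns (sample, fastq_1, fastq_2, strandedness); the sample name and path are placeholders, not workshop data.

```shell
# Two example lines in the assumed four-column layout of index.csv
# (the sample ID and path below are placeholders)
trimmed=$(
    printf '%s\n' \
        'sample,fastq_1,fastq_2,strandedness' \
        'SRR0000001,/path/to/SRR0000001.fastq.gz,,auto' |
    awk -F "," '{print $1 "," $2}'   # -F "," splits on commas; keep columns 1 and 2
)

# Show the two-column result
echo "$trimmed"
```

Only the sample name and the 'Read 1' path remain, which matches the sample,fastq_1 layout of the final samplesheet.csv.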

Let’s generate the metadata file by running the following command:

sh create_nf-core_smallRNAseq_samplesheet.sh

Check the newly created samplesheet.csv file:

ls -l
cat samplesheet.csv

sample,fastq_1
SRR20753704,/work/training/smallRNAseq/data/SRR20753704.fastq.gz
SRR20753705,/work/training/smallRNAseq/data/SRR20753705.fastq.gz
SRR20753706,/work/training/smallRNAseq/data/SRR20753706.fastq.gz
SRR20753707,/work/training/smallRNAseq/data/SRR20753707.fastq.gz
SRR20753708,/work/training/smallRNAseq/data/SRR20753708.fastq.gz
SRR20753709,/work/training/smallRNAseq/data/SRR20753709.fastq.gz
SRR20753716,/work/training/smallRNAseq/data/SRR20753716.fastq.gz
SRR20753717,/work/training/smallRNAseq/data/SRR20753717.fastq.gz
SRR20753718,/work/training/smallRNAseq/data/SRR20753718.fastq.gz
SRR20753719,/work/training/smallRNAseq/data/SRR20753719.fastq.gz
SRR20753720,/work/training/smallRNAseq/data/SRR20753720.fastq.gz
SRR20753721,/work/training/smallRNAseq/data/SRR20753721.fastq.gz
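
Before submitting a run, it can help to confirm that the header is correct and that every FASTQ path in the samplesheet points to an existing file. The function below is a minimal sketch of such a check. check_samplesheet is a hypothetical helper written for this tutorial, not part of nf-core, and the demo files (sampleA, sampleB) are placeholders created in a temporary folder.

```shell
# Hypothetical helper: check the header and that each FASTQ file exists
check_samplesheet() {
    sheet=$1
    missing=0
    line_no=0
    # Read "sample,fastq_1" rows; the || part also handles a last line without newline
    while IFS=, read -r sample fastq || [ -n "$sample" ]; do
        line_no=$((line_no + 1))
        if [ "$line_no" -eq 1 ]; then
            # The first line must be the expected header
            if [ "$sample,$fastq" != "sample,fastq_1" ]; then
                echo "unexpected header: $sample,$fastq"
                return 1
            fi
            continue
        fi
        # Skip blank lines
        [ -z "$sample" ] && continue
        if [ ! -f "$fastq" ]; then
            echo "missing FASTQ for $sample: $fastq"
            missing=$((missing + 1))
        fi
    done < "$sheet"
    echo "rows with missing FASTQ files: $missing"
    [ "$missing" -eq 0 ]
}

# Demo with placeholder data: sampleA exists, sampleB does not
demo=$(mktemp -d)
touch "$demo/sampleA.fastq.gz"
printf 'sample,fastq_1\nsampleA,%s\nsampleB,%s\n' \
    "$demo/sampleA.fastq.gz" "$demo/sampleB.fastq.gz" > "$demo/samplesheet.csv"

if report=$(check_samplesheet "$demo/samplesheet.csv"); then
    status=0
else
    status=1
fi
echo "$report"
```

On the real data you would call the function as check_samplesheet samplesheet.csv from the folder that contains the file; a non-zero exit status means at least one path needs fixing.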