Hands-on smRNAseq training

1 Pre-requisites:
- 1.1 Installing PuTTY and connecting to the HPC (Windows users; Mac users can directly use the available ‘terminal’ app)
2 BYO data or download public small RNA-seq datasets
3 Download Reference microRNA sequences from miRBase
4 Run a test
- 4.1 Submitting the job
- 4.2 Monitoring the Run
5 Preparing a sample metadata file
6 Run the nextflow nf-core/smRNAseq pipeline.

Pre-requisites:

Review this detailed one-hour introduction to the Vim editor: https://www.youtube.com/watch?v=IiwGbcd8S7I
(Optional) Familiarity with a Unix text editor (for example vi/Vim or Nano):
- Vim (VIM Guide | Computational Biology Core; Editors (Vim))
- Nano (Basic tutorial for Nano users; https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/)

Installing PuTTY and connecting to the HPC (Windows users; Mac users can directly use the available ‘terminal’ app)

Install PuTTY:

Installing PuTTY - QUT Media Hub

Connect to the HPC:

Connecting to the HPC with PuTTY - QUT MediaHub

BYO data or download public small RNA-seq datasets

Either bring your own dataset or use the following guide to download public small RNA-seq data.

Public human small RNA-seq data:

https://www.ebi.ac.uk/ena/browser/view/PRJEB5212 : RNA-seq of microRNAs (miRNAs) in human prefrontal cortex to identify miRNAs that are differentially expressed between Huntington's disease and control brain samples.
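One possible way to fetch the FASTQ files for this project is through the ENA Portal API file report, which lists the FASTQ location of every run in a project as a tab-separated table. The sketch below assumes that route. The helper name ena_fastq_urls, the fastq output folder and the fastq_urls.txt file name are illustrative and not part of the original guide.

```shell
# ENA file report for the project: one row per run, tab-separated,
# with the columns run_accession and fastq_ftp (assumed download route).
REPORT_URL="https://www.ebi.ac.uk/ena/portal/api/filereport?accession=PRJEB5212&result=read_run&fields=run_accession,fastq_ftp&format=tsv"

# Read a file report on stdin and print one download URL per FASTQ file.
# The fastq_ftp column carries no scheme and separates multiple files with ';'.
ena_fastq_urls() {
    awk -F'\t' 'NR > 1 && $2 != "" {
        n = split($2, files, ";")
        for (i = 1; i <= n; i++) print "ftp://" files[i]
    }'
}

# Usage on a machine with internet access (left commented so that nothing
# is downloaded by accident):
#   mkdir -p fastq
#   curl -s "$REPORT_URL" | ena_fastq_urls > fastq_urls.txt
#   wget -c -P fastq -i fastq_urls.txt
```

Writing the URLs to a file first makes it easy to inspect or subset the list before starting a large download.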

Download Reference microRNA sequences from miRBase

First, let’s download a copy of the miRBase reference sequences, including hairpin and mature microRNA sequences.

microRNA mature sequences:

wget https://www.mirbase.org/download_file/mature.fa

Hairpin sequences:

wget https://www.mirbase.org/download_file/hairpin.fa

The genomic coordinates of the precursor and mature sequences (a GFF3 file) are passed to the pipeline with the --mirna_gtf parameter:

--mirna_gtf /work/trtp/data/mirbase/hsa.gff3
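After downloading mature.fa and hairpin.fa, a quick sanity check is to count the FASTA records and confirm that entries for your species are present. miRBase headers start with a three-letter species prefix, such as hsa for human. The helper name count_fasta_records below is illustrative.

```shell
# Count FASTA records in a file, optionally restricted to headers that
# start with a given prefix (e.g. "hsa-" for human entries).
count_fasta_records() {
    fasta_file="$1"
    prefix="${2:-}"
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    grep -c "^>${prefix}" "$fasta_file" || true
}

# Report on whichever reference files are present in the current folder.
for fasta in mature.fa hairpin.fa; do
    if [ -s "$fasta" ]; then
        echo "$fasta: $(count_fasta_records "$fasta") records," \
             "$(count_fasta_records "$fasta" hsa-) human"
    else
        echo "$fasta: missing or empty" >&2
    fi
done
```

A count of zero, or a "missing or empty" message, usually means the download did not complete and should be repeated.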

Alternatively, submit the following PBS Pro script to the cluster. Before running the script, create a ‘reference’ folder (e.g., /myteam/data/reference/).
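The original script is not reproduced on this page, so the following is only a minimal sketch of what such a PBS Pro job could look like. The script file name download_mirbase.pbs, the job name and the resource requests are assumptions to adapt to your site, and REF_DIR uses the placeholder path from the text.

```shell
# Write a PBS Pro job script that downloads the miRBase references.
# The heredoc delimiter is quoted so variables are expanded at run time,
# not while the file is being written.
cat > download_mirbase.pbs <<'EOF'
#!/bin/bash -l
#PBS -N download_mirbase
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=01:00:00

# Placeholder reference folder; create it before submitting the job.
REF_DIR=/myteam/data/reference
cd "$REF_DIR" || exit 1

wget https://www.mirbase.org/download_file/mature.fa
wget https://www.mirbase.org/download_file/hairpin.fa
EOF

echo "Wrote download_mirbase.pbs; submit it with: qsub download_mirbase.pbs"
```

Note that this only works if the compute nodes on your cluster can reach the internet; otherwise run the wget commands on a login node.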

Run a test

Before running the pipeline with real data, run the following test:

To submit the above command to the HPC cluster, prepare the following script:
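The test command and launch script are not reproduced on this page. As a minimal sketch, nf-core pipelines ship a test profile with a small bundled dataset, so the test could be run as nextflow run nf-core/smrnaseq -profile test,singularity --outdir results_test. The block below wraps that command in a launch.pbs file. The singularity profile, the results_test folder, the resource requests and the commented module line are assumptions about your HPC setup.

```shell
# Write a PBS Pro launch script for the pipeline's built-in test profile.
cat > launch.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfsmrnaseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Run from the folder the job was submitted from.
cd "$PBS_O_WORKDIR" || exit 1

# Site-specific setup (assumption): load Java/Nextflow if your HPC uses modules.
# module load java

# nf-core test profile with Singularity containers (assumed container engine).
nextflow run nf-core/smrnaseq -profile test,singularity --outdir results_test
EOF

echo "Wrote launch.pbs"
```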

Submitting the job

Once you have created the folder for the run, the samplesheet.csv file, nextflow.config, and launch.pbs, you are ready to submit.

Submit the run with the PBS Pro qsub command, giving it the launch script: qsub launch.pbs
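On a PBS Pro cluster, submission is normally a single qsub call that prints the ID of the new job. The sketch below wraps that call in a small helper that first checks the script exists and that qsub is available. The function name submit_job and its return codes are illustrative.

```shell
# Submit a PBS script and report the job ID that qsub prints.
submit_job() {
    pbs_script="$1"
    if [ ! -f "$pbs_script" ]; then
        echo "Script not found: $pbs_script" >&2
        return 1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not available: run this on an HPC login node" >&2
        return 2
    fi
    job_id=$(qsub "$pbs_script") || return 3
    echo "Submitted $pbs_script as job $job_id"
}

# Usage, from the run folder on the HPC:
#   submit_job launch.pbs
```

Keeping a note of the job ID is useful later, because the scheduler's status and delete commands take it as an argument.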

Monitoring the Run

You can check on the job that you are running with the PBS Pro job-status command, qstat. Note that Nextflow will launch additional jobs during the run.

You can also check the .nextflow.log file for details on what is going on.
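As a minimal sketch of the monitoring steps, the block below lists your own jobs with qstat where it is available and defines a small helper that prints the tail of a Nextflow log together with any ERROR or WARN lines. The helper name summarise_nextflow_log and the choice of 20 lines are illustrative.

```shell
# List your own jobs (PBS Pro); skipped where qstat is not installed.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$(id -un)" || echo "qstat could not reach the scheduler" >&2
fi

# Summarise a Nextflow log: the most recent lines plus any ERROR/WARN entries.
summarise_nextflow_log() {
    log_file="${1:-.nextflow.log}"
    if [ ! -f "$log_file" ]; then
        echo "No log found at $log_file" >&2
        return 1
    fi
    echo "== Last 20 lines of $log_file =="
    tail -n 20 "$log_file"
    echo "== Lines containing ERROR or WARN =="
    grep -E 'ERROR|WARN' "$log_file" || echo "(none)"
}

# Usage, from the folder where the run was launched:
#   summarise_nextflow_log
```

To follow the log live instead, tail -f .nextflow.log prints new lines as Nextflow writes them.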

Preparing a sample metadata file

Now let’s prepare a samplesheet.csv file that specifies the names of your samples and the location of the raw FASTQ files.

To generate the above file, let’s use the following PBS Pro script (called “launch_create_smRNAseq_samplesheet.pbs”):
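The original script is not reproduced on this page. The sketch below shows one way to build the file, assuming the two-column sample,fastq_1 layout used by recent nf-core/smrnaseq releases (check the pipeline's usage documentation for your version) and single-end files ending in .fastq.gz. The function name create_samplesheet, the resource requests and the placeholder DIR value are assumptions.

```shell
#!/bin/bash -l
#PBS -N create_samplesheet
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -l walltime=00:10:00

# Folder holding the raw FASTQ files (placeholder; replace with your own path).
DIR="/path/to/raw/fastq"

# Write a samplesheet with one row per *.fastq.gz file in a folder.
# The sample name is the file name without its .fastq.gz suffix.
create_samplesheet() {
    fastq_dir="$1"
    out_csv="$2"
    echo "sample,fastq_1" > "$out_csv"
    for fq in "$fastq_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue   # the glob matched nothing
        echo "$(basename "$fq" .fastq.gz),$fq" >> "$out_csv"
    done
}

if [ -d "$DIR" ]; then
    # Under PBS, write the file in the folder the job was submitted from.
    cd "${PBS_O_WORKDIR:-.}" || exit 1
    create_samplesheet "$DIR" samplesheet.csv
    echo "Wrote samplesheet.csv with $(($(wc -l < samplesheet.csv) - 1)) samples"
else
    echo "DIR does not exist yet: $DIR" >&2
fi
```

Use an absolute path for DIR so that the FASTQ locations in the samplesheet remain valid wherever the pipeline is launched from.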

Assign the path where the raw FASTQ files are located to the “DIR” variable in the script. Copy the path and paste it into the script using vi or Vim (see the pre-requisites above).

Run the nextflow nf-core/smRNAseq pipeline.

Create a launch_nfsmRNAseq.pbs file that has the following information:
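The contents of the launch file are not reproduced on this page, so the following is a minimal sketch under stated assumptions: Singularity as the container engine, human samples (hsa), the reference files from the earlier download step stored in the placeholder folder /myteam/data/reference, and parameter names as used by recent nf-core/smrnaseq releases. The resource requests cover only the Nextflow head job and are also assumptions.

```shell
# Write a PBS Pro launch script for the full pipeline run.
cat > launch_nfsmRNAseq.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=48:00:00

# Run from the folder that holds samplesheet.csv and nextflow.config.
cd "$PBS_O_WORKDIR" || exit 1

# Placeholder folder holding mature.fa and hairpin.fa from miRBase.
REF_DIR=/myteam/data/reference

# Parameter names can differ between pipeline releases; confirm them with
#   nextflow run nf-core/smrnaseq --help
nextflow run nf-core/smrnaseq \
    -profile singularity \
    --input samplesheet.csv \
    --outdir results \
    --mirtrace_species hsa \
    --mature "$REF_DIR/mature.fa" \
    --hairpin "$REF_DIR/hairpin.fa" \
    --mirna_gtf /work/trtp/data/mirbase/hsa.gff3 \
    -resume
EOF

echo "Wrote launch_nfsmRNAseq.pbs"
```

The -resume flag lets Nextflow reuse cached results from earlier tasks if the job has to be resubmitted.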

Submit the job to the HPC cluster with qsub launch_nfsmRNAseq.pbs, then monitor its progress as described under “Monitoring the Run” above.