Public small RNA-seq data

Species: Human

ENA link: https://www.ebi.ac.uk/ena/browser/view/PRJEB5212?show=publications

Description: RNA-seq of microRNAs (miRNAs) in human prefrontal cortex to identify miRNAs differentially expressed between Huntington's disease and control brain samples

1. Connect to an rVDI virtual desktop machine

To access and run an rVDI virtual desktop:

...

*NOTE: you need to be connected to the QUT network first, either on campus or remotely via VPN.

2. Open PuTTY terminal

  • Click on the PuTTY icon

  • Double click on “Lyra”

  • Enter your password and connect to the HPC

Precomputed results from session 6:

We have already run the small RNA-seq samples. The results can be found at:

...

Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either using the R script or using a script called “transpose_csv.py”.

Let’s first create a “DESeq2” folder and copy the files needed for the statistical analysis:

...

Code Block
python transpose_csv.py --input mature_counts.csv --out mature_counts.txt
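
For readers curious about what the transposition involves, here is a minimal Python sketch. It is not the workshop's transpose_csv.py, whose contents are not shown on this page. The function name transpose_table and the tab-delimited output are assumptions made purely for illustration.

```python
import csv


def transpose_table(in_path, out_path, out_delimiter="\t"):
    """Swap the rows and columns of a comma-separated table.

    Hypothetical helper for illustration only. The tab-delimited output
    is an assumption, not the documented behaviour of transpose_csv.py.
    """
    with open(in_path, newline="") as handle:
        rows = list(csv.reader(handle))

    # zip() would silently truncate ragged rows, so insist on a rectangle.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same number of fields")

    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=out_delimiter, lineterminator="\n")
        # zip(*rows) yields the columns of the input, one tuple per column.
        writer.writerows(zip(*rows))
```

With a sketch like this, calling transpose_table("mature_counts.csv", "mature_counts.txt") would write a file in which each input column has become a row. Check the resulting layout against what your R script expects before relying on it.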

Differential expression analysis using RStudio

Differential expression analysis for smRNA-Seq is similar to regular RNA-Seq. Since you have already done the step-wise analysis in session 5, in this session we will streamline the analysis by running a single R script.

As with the previous RNA-Seq analysis, we will run this smRNA-Seq differential expression analysis in RStudio on an rVDI virtual machine. The reason is the same as before: the required R packages are pre-installed on these virtual machines, which saves time. As before, you can also copy and paste this script into RStudio on your local computer and adapt it to your own dataset.

1. Connect to an rVDI virtual desktop machine

To access and run an rVDI virtual desktop:

...

*NOTE: you need to be connected to the QUT network first, either on campus or remotely via VPN.

2. Create a working directory

As we discussed in session 5, R requires that you set a working directory, where it automatically looks for input files/data and outputs figures, tables, etc. We’ll need to first create this directory.

a. Open Windows Explorer.

b. Go to: H:\workshop\small_RNAseq

c. Create a new folder here called ‘DESeq2’ (NOTE: R is case-sensitive, so it must be named exactly like this)

3. Run analysis script in RStudio

a. Open RStudio

b. Create a new R script ('File'->'New File'-> ‘R script’)

c. Hit the save button and save this file in the working directory you created above (H:\workshop\small_RNAseq\DESeq2). Name the R script ‘DESeq2.R’.

...


Homework for you to try:

We have also analysed this dataset against a different microRNA database, called MirGeneDB. MirGeneDB is a database of manually curated microRNA genes that have been validated and annotated as initially described in Fromm et al. 2015 and Fromm et al. 2020. MirGeneDB 2.1 includes more than 16,000 microRNA gene entries representing more than 1,500 miRNA families from 75 metazoan species, and was published in the 2022 NAR database issue.

The output of this analysis can be found at /work/training/2024/smallRNAseq/runs/run3_MirGeneDB. For practice before working on your own data, try editing the R scripts we’ve given you so that they produce the same plots as above for this analysis.

Running R Scripts on the HPC

If all your data is on the HPC, or your analysis is too large or takes too long on your desktop/laptop, it is possible to run the R scripts on the HPC.

Preparing your R script for the HPC

QUT’s HPC is based on Linux, so the paths to your files are likely to differ from the ones you use on Windows. We must update the paths in the script to the HPC paths.

...

The H: and W: drives do not exist on the HPC. The folders are there, just under a different path.
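
To make the idea of a path change concrete, here is a small Python sketch that rewrites an H: drive path as a Linux-style path. It relies on the point made on this page that H: points at your HPC home folder. The function name h_drive_to_hpc and the example home folder /home/example_user are hypothetical, so substitute your real home folder. The W: drive is not handled because its HPC location is not stated here.

```python
from pathlib import PurePosixPath, PureWindowsPath


def h_drive_to_hpc(windows_path, hpc_home):
    """Rewrite an H:\\ path as a Linux path under the given HPC home folder.

    Hypothetical helper for illustration; hpc_home must be supplied by
    you, because the real location is specific to your account.
    """
    win = PureWindowsPath(windows_path)
    if win.drive.upper() != "H:":
        raise ValueError("only H: drive paths are handled in this sketch")
    # parts[0] is the drive anchor ("H:\\"); the rest are folder names.
    return str(PurePosixPath(hpc_home, *win.parts[1:]))


# Example with a made-up home folder (an assumption, not a real QUT path).
example = h_drive_to_hpc(r"H:\workshop\small_RNAseq\DESeq2", "/home/example_user")
```

The same substitution can of course be done by hand in the R script. The key points are that the drive letter disappears and backslashes become forward slashes.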

Preparing a Script to run the R script on the HPC

A job script needs to be built to request resources and run the R script. This one works well for the DESeq2.R script:

...

Using RStudio, create a text file and paste in the contents of this script.

Save it as launch_R.pbs in H:\workshop\small_RNAseq\DESeq2 (the same folder as DESeq2.R). Remember, H: points to your HPC home folder.

Running the Script on the HPC

Now that the script is on the HPC, we can run it, but we have to convert it first. RStudio on Windows saves the text file in “Windows” format, which the HPC has trouble reading, so we convert it to “Linux” format. Once the file is converted, we can submit the script to the scheduler and wait for it to run.

Code Block
# Convert the launch_R.pbs to Linux format
dos2unix launch_R.pbs
# Once this is run, you do not need to run it again unless you edit the file in RStudio again

# Submit the job to the HPC
qsub launch_R.pbs

# Check the status of the job
qjobs
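
The difference between the “Windows” and “Linux” formats is the line ending: Windows text files end each line with a carriage return plus a line feed (CRLF), while Linux uses a line feed alone (LF). The Python sketch below illustrates that conversion. It is not how dos2unix itself is implemented, and the function names are hypothetical. On the HPC, the dos2unix command above is all you need.

```python
def to_unix_line_endings(data):
    """Return the bytes with every Windows CRLF replaced by a Linux LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file_in_place(path):
    """Rewrite a file so that it uses Linux line endings.

    Hypothetical helper for illustration; it reads and writes in binary
    mode so that Python does not translate line endings itself.
    """
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_line_endings(data))
```

Running the conversion on a file that already has Linux line endings leaves it unchanged.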

Installing R packages on the HPC (Not Needed Today)

Just as with RStudio on a Windows computer, you need to install the packages your script uses before you can run your R script on the HPC. We have done this for you for this training session, but to install your own packages you can follow a procedure like this:

...