Running R Scripts on the HPC
If all your data is on the HPC, or your analysis is too large or takes too long on your desktop/laptop, it is possible to run the R scripts on the HPC.
Preparing your R script for the HPC
QUT’s HPC runs Linux, so the paths to your files on the HPC are likely to differ from the Windows paths in your script. We must update them to the HPC paths.
In RStudio, adjust the paths in your script. The DESeq2.R script has three lines that need to change for it to work on the HPC:
# The setwd line needs to be changed to:
setwd("~/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase/DESeq2")

# The line where you read in the mature counts table needs to be changed to:
metacounts <- read.table("~/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase/DESeq2/mature_counts.txt", header = TRUE, row.names = 1)

# The line where you read in the metadata table needs to be changed to:
meta <- read.table("~/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase/DESeq2/metadata_microRNA.txt", header = TRUE)
The H: and W: drives do not exist on the HPC. The folders are still there, just under different paths: for example, H: corresponds to your HPC home folder (~, also written $HOME).
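To see how an H: drive path maps onto the HPC, here is a small bash sketch. It assumes that H: corresponds to your HPC home folder ($HOME); the function name win_to_hpc_path is hypothetical and not part of the workshop material. It deliberately rejects W: paths, whose HPC location is not covered here.

```bash
#!/bin/bash
# Hypothetical helper (not part of the workshop files): translate a path on the
# H: drive into the equivalent HPC path, assuming H: maps to $HOME.
win_to_hpc_path() {
  local p="$1"
  p="${p//\\//}"   # turn every backslash into a forward slash
  case "$p" in
    [Hh]:/*) printf '%s\n' "$HOME/${p#?:/}" ;;   # drop the "H:/" prefix
    *) echo "not an H: drive path: $1" >&2; return 1 ;;
  esac
}

# Example: the DESeq2 folder used in this workshop
win_to_hpc_path 'H:\workshop\2024-2\session6_smallRNAseq\runs\run1_human_miRBase\DESeq2'
```

Running it prints the DESeq2 folder path under your own home directory, which is the same location that the ~/... paths in the setwd and read.table lines refer to.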
Preparing a Job Script to Run the R Script on the HPC
Log on to the HPC via PuTTY (or, on a Mac, ssh to lyra from the Terminal), then change to the folder where we saved the DESeq2.R file:
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase/DESeq2
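Before going further, it can be worth checking that the files DESeq2.R reads are actually in this folder, since a missing file otherwise only shows up once the job runs. A minimal bash sketch: the function name check_inputs is hypothetical, and the file names are taken from the setwd/read.table lines in DESeq2.R, so adjust the list if your copy reads other files.

```bash
#!/bin/bash
# Hypothetical pre-flight check: are the files DESeq2.R needs in this folder?
# File names come from the read.table() lines in DESeq2.R; edit if yours differ.
check_inputs() {
  local f missing=0
  for f in DESeq2.R mature_counts.txt metadata_microRNA.txt; do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "MISSING: $f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]   # success only when every file is present
}

check_inputs || echo "Copy the missing file(s) into $(pwd) before submitting the job."
```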
This time you need to install the R libraries used by this analysis yourself (unlike on the rVDI, where they were installed for you).
First, make a folder to hold the libraries:
mkdir -p $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase/r_library
Then add these sections to the start of the DESeq2.R script, so that any missing packages are installed when it runs:
#### Differential expression analysis ####
# When you see '## USER INPUT', this means you have to modify the code for your computer or dataset. All other code can be run as-is (i.e. you don't need to understand the code, just run it)

#### 2. Installing required packages ####
# **NOTE: this section only needs to be run once (or occasionally to update the packages)

# Set the CRAN mirror used when installing packages
chooseCRANmirror(graphics = getOption("menu.graphics"), ind = 3, local.only = TRUE)

# Install devtools (skipped if it is already installed)
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools", repos = "https://cran.csiro.au/")

# R packages required for the analysis
bioconductor_packages <- c("DESeq2", "EnhancedVolcano", "org.Hs.eg.db", "org.Mm.eg.db", "org.Rn.eg.db", "org.EcK12.eg.db", "org.EcSakai.eg.db", "org.Dr.eg.db", "org.Dm.eg.db")
cran_packages <- c("ggrepel", "ggplot2", "plyr", "reshape2", "readxl", "FactoMineR", "factoextra", "pheatmap")

# Compare installed packages to the lists above and return vectors of missing packages
new_packages <- bioconductor_packages[!(bioconductor_packages %in% installed.packages()[,"Package"])]
new_cran_packages <- cran_packages[!(cran_packages %in% installed.packages()[,"Package"])]

# Install BiocManager if needed, then any missing Bioconductor packages
# (Bioconductor 3.16 is the release for R 4.2, matching the r/4.2.2 module)
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager", repos = "https://cran.csiro.au/")
if (length(new_packages)) BiocManager::install(new_packages, version = "3.16", update = FALSE, ask = FALSE)

# Install missing CRAN packages
if (length(new_cran_packages)) install.packages(new_cran_packages, repos = "https://cran.csiro.au/")

# Optional: update all installed CRAN and Bioconductor packages.
# Uncomment occasionally; leaving it on makes every job slower.
# BiocManager::install(update = TRUE, ask = FALSE)
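After the first job has run, you can check from the command line which of these packages ended up in the r_library folder, because each installed R package is stored as a sub-folder named after the package. A bash sketch; check_r_library is a hypothetical name. A package reported as missing here may still be available if the r module already provides it in its own library, so treat this as a rough check only.

```bash
#!/bin/bash
# Sketch: report which packages have a folder in the personal R library.
# A "missing" result may just mean the package comes from the r module instead.
R_LIB="$HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase/r_library"

check_r_library() {
  local lib="$1" pkg
  shift
  for pkg in "$@"; do
    if [ -d "$lib/$pkg" ]; then
      echo "installed: $pkg"
    else
      echo "missing:   $pkg"
    fi
  done
}

check_r_library "$R_LIB" DESeq2 EnhancedVolcano ggplot2 ggrepel pheatmap FactoMineR factoextra
```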
You also need a job script that requests resources from the scheduler and runs the R script. This one works well for the DESeq2.R script.
In RStudio, create a Text File (or Shell Script) and paste in the contents of the job script:
File → New File → Text File (or Shell Script)
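Save it as launch_R.pbs in H:\workshop\2024-2\session6_smallRNAseq\runs\run1_human_miRBase\DESeq2 (the same folder as DESeq2.R). Remember, H: points to your HPC home folder. If you created a Shell Script instead, save it as launch_R: RStudio adds .sh automatically, giving launch_R.sh, so use launch_R.sh in place of launch_R.pbs in the commands below.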
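#!/bin/bash -l
# Job name, resources (1 CPU, 4 GB memory, 2 hours), and email on abort/begin/end
#PBS -N R_analysis
#PBS -l select=1:ncpus=1:mem=4gb
#PBS -l walltime=02:00:00
#PBS -m abe

# Load R
module purge
module load r/4.2.2-foss-2022b

# Personal R library folder (created if it does not already exist)
mkdir -p "$HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase/r_library"
export R_LIBS_USER="$HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase/r_library"

# Run from the folder the job was submitted from
cd "$PBS_O_WORKDIR"
Rscript DESeq2.R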
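Before submitting, a quick look at the job script can catch common problems. This bash sketch (the function name check_job_script is hypothetical) reports Windows (CRLF) line endings, a missing #! first line, and an Rscript target that is not in the current folder, then lists the #PBS directives. It assumes you will submit from the folder that holds DESeq2.R, because the job changes to $PBS_O_WORKDIR, the folder qsub was run from.

```bash
#!/bin/bash
# Sketch of a pre-submission check for a PBS job script (default: launch_R.pbs).
check_job_script() {
  local job="${1:-launch_R.pbs}" ok=0 rscript
  if [ ! -f "$job" ]; then
    echo "no such file: $job" >&2
    return 1
  fi
  # A carriage return anywhere means Windows (CRLF) line endings
  if grep -q $'\r' "$job"; then
    echo "$job has Windows (CRLF) line endings; run: dos2unix $job" >&2
    ok=1
  fi
  # The first two bytes should be the "#!" interpreter line
  if [ "$(head -c 2 "$job")" != '#!' ]; then
    echo "$job does not start with a #! line" >&2
    ok=1
  fi
  # The file passed to Rscript should exist in the submission folder
  rscript=$(awk '$1 == "Rscript" { print $2; exit }' "$job" | tr -d '\r')
  if [ -n "$rscript" ] && [ ! -f "$rscript" ]; then
    echo "$job runs $rscript, but it is not in $(pwd)" >&2
    ok=1
  fi
  grep '^#PBS' "$job" | tr -d '\r'   # show the resources being requested
  return "$ok"
}

check_job_script launch_R.pbs || echo "Fix the problems above before running qsub."
```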
Running the Script on the HPC
Now that the job script is on the HPC, we can run it, but we have to convert it first. RStudio on Windows saves text files with Windows line endings (CRLF), which Linux tools on the HPC can misread, so we convert the file to Linux line endings (LF) with dos2unix. Once the file is converted, we can submit it to the scheduler and wait for it to run. Make sure you are in the DESeq2 folder (the job runs from the folder it was submitted from), then copy and paste each line that does not start with # into the Linux command line on the HPC.
# Convert launch_R.pbs to Linux format
dos2unix launch_R.pbs
# Once this is run, you do not need to run it again, unless you edit the file in RStudio again

# Submit the job to the HPC
qsub launch_R.pbs

# Check the status of the job
qjobs
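When the job finishes, PBS normally writes its standard output and standard error to files in the folder you submitted from, named after the job (here R_analysis, from the #PBS -N line) followed by .o or .e and the job number. Assuming that default naming, this bash sketch (show_latest_job_log is a hypothetical name) prints the last 20 lines of the newest file of each kind; error messages from DESeq2.R go to standard error, so they usually appear in the .e file.

```bash
#!/bin/bash
# Sketch: show the newest output (.o) and error (.e) files for a PBS job,
# assuming the default <job name>.o<number> / .e<number> naming.
show_latest_job_log() {
  local name="${1:-R_analysis}" kind f
  for kind in o e; do
    # ls -t lists newest first; errors (no matching files) are discarded
    f=$(ls -t "$name".$kind* 2>/dev/null | head -n 1)
    if [ -n "$f" ]; then
      echo "===== $f ====="
      tail -n 20 "$f"
    else
      echo "no $name.$kind* file yet (the job may still be queued or running)"
    fi
  done
}

show_latest_job_log R_analysis
```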