25S1W1 - 2. Setup
Overview
Installing R and RStudio
You have three main options for running this analysis in RStudio:
Use QUT's rVDI virtual desktop machines
Install R and RStudio on your own PC
Use the provided PCs in the QUT computer labs
Option 1: Use QUT's rVDI virtual desktop machines
This is the preferred method: R, RStudio and all the R packages required for the analysis are already installed. Installing these yourself can take over 30 minutes on your own PC, so using an rVDI machine saves time.
rVDI provides a virtual Windows desktop that can be run in your web browser.
To access and run an rVDI virtual desktop:
Go to https://rvdi.qut.edu.au/
Click on ‘VMware Horizon HTML Access’
Log on with your QUT username and password.
Click the ‘R Workshop’ option, which is already set up for this workshop.
*NOTE: you need to be connected to the QUT network first, either by being on campus or by connecting remotely via VPN.
Option 2: Install R and RStudio on your own PC
Go to the Posit website and follow the instructions provided to install first R and then RStudio.
Download and install R, following the default prompts:
https://cran.r-project.org/bin/windows/base/
Download and install RStudio, following the default prompts:
Option 3: Use the provided PCs in the QUT computer labs
The PCs in the computer labs already have R and RStudio installed. If using this option, you will need to install the required R packages (unlike rVDI). The code for installing these packages is in the analysis section below.
Map your HPC home directory
On the rVDI machines, your HPC home directory is already mapped to the H drive.
If you are using a different computer, follow the instructions here to map your HPC home directory:
Interactive HPC session
In the next step we’re going to copy some data files to your home directory.
Just like in the last session, we’re going to start an interactive PBS session to do this, so we’re not all copying large datasets on the head node.
Use PuTTY to connect to aqua and, once you have logged in, copy and paste the text below into the command prompt:
qsub -I -S /bin/bash -l walltime=4:00:00 -l select=1:ncpus=8:mem=32gb
Wait until the interactive session starts, then move to the next step (below).
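If you are unsure whether the interactive session has started, a quick check like the one below can help. This is only a sketch: it assumes PBS sets the PBS_JOBID and PBS_ENVIRONMENT variables inside a job (PBS_ENVIRONMENT is normally PBS_INTERACTIVE for sessions started with qsub -I), and the function name in_pbs_session is our own, not part of PBS.

```shell
# Report whether the current shell is running inside a PBS job.
# Assumption: PBS exports PBS_JOBID and PBS_ENVIRONMENT inside a job.
in_pbs_session() {
    if [ -n "${PBS_JOBID:-}" ]; then
        echo "Inside PBS job ${PBS_JOBID} (${PBS_ENVIRONMENT:-unknown}) on $(uname -n)"
        return 0
    fi
    echo "Not inside a PBS job: still on the head node, or the session has not started yet"
    return 1
}

# Run the check; '|| true' stops a negative result from aborting a strict shell.
in_pbs_session || true
```

If the message says you are not inside a PBS job, wait for the qsub prompt to return before copying any data.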
Create your workshop folders and check your data
In your H drive (the drive you just mapped), check whether there is an ASV_table.tsv file in
H:\workshop\2025\S1W1\metagenomics\runs\run2_ampliseq\results\dada2
This is the base data file we’ll be working with - an abundance table of read counts per ASV (amplicon sequence variant, i.e. taxonomic group) per sample. Using this we’ll be able to quantify and visualise taxonomic diversity and structure in R.
Check to see if this file is in H:\workshop\2025\S1W1\metagenomics\data\illumina\
If you can’t see the ASV_table.tsv file (your Nextflow job may have failed, for example), run the following to copy the results of a successful nf-core/ampliseq run to your HPC home directory:
cp -r /work/training/metagenomics/public_data/Illumina/results $HOME/workshop/2025/S1W1/metagenomics/runs/run2_ampliseq/results/dada2/
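You can also look for the file from the PuTTY session rather than through the H drive. The sketch below searches the whole run folder, since the exact subfolder the file ends up in depends on how the results were copied. The function name find_asv_table is our own, and the default path assumes the workshop folder layout used on this page.

```shell
# Look for ASV_table.tsv anywhere under a run directory and print its path.
# Default directory: the workshop run folder (an assumption); pass another as $1.
find_asv_table() {
    run_dir="${1:-$HOME/workshop/2025/S1W1/metagenomics/runs/run2_ampliseq}"
    hits=$(find "$run_dir" -type f -name 'ASV_table.tsv' 2>/dev/null)
    if [ -n "$hits" ]; then
        echo "$hits"
        return 0
    fi
    echo "ASV_table.tsv not found under $run_dir" >&2
    return 1
}

# Run the search on the default folder; '|| true' keeps a miss from aborting the shell.
find_asv_table || true
```

If nothing is found, the file has not been created or copied yet. If a path is printed, that is the location to use when reading the table into R.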
The analysis also requires a metadata.tsv file, which describes each sample. We will create a folder and copy this file into it:
mkdir -p $HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/
cp /work/training/2025/S1W1/session3_metagenomics/scripts/metadata.tsv $HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/
If we have a look at this file using the cat metadata.tsv command, it will show this:
ID Nose_size Batch
Illumina10 Big Early
Illumina11 Big Early
Illumina12 Big Early
Illumina13 Big Early
Illumina14 Big Early
Illumina15 Big Early
Illumina16 Big Early
Illumina17 Big Early
Illumina18 Big Early
Illumina19 Big Early
Illumina1 Big Early
Illumina20 Big Early
Illumina21 Medium Early
Illumina22 Medium Early
Illumina23 Medium Early
Illumina24 Medium Early
Illumina25 Medium Early
Illumina26 Medium Early
Illumina27 Medium Early
Illumina28 Medium Early
Illumina29 Medium Early
Illumina2 Medium Early
Illumina30 Medium Early
Illumina31 Medium Early
Illumina32 Medium Early
Illumina33 Medium Early
Illumina34 Medium Early
Illumina35 Medium Early
Illumina36 Medium Early
Illumina37 Small Early
Illumina38 Small Early
Illumina39 Small Early
Illumina3 Small Early
Illumina40 Small Early
Illumina41 Small Early
Illumina42 Small Early
Illumina43 Small Early
Illumina44 Small Early
Illumina45 Small Early
Illumina46 Small Late
Illumina47 Small Late
Illumina48 Small Late
Illumina49 Small Late
Illumina4 Small Late
Illumina50 Small Late
Illumina51 Small Late
Illumina52 Small Late
Illumina53 Small Late
Illumina54 Small Late
Illumina55 Small Late
Illumina56 Small Late
Illumina57 Small Late
Illumina58 Small Late
Illumina59 Small Late
Illumina5 Small Late
Illumina6 Small Late
Illumina7 Small Late
Illumina8 Small Late
Illumina9 Small Late
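Before moving on, it can be useful to see how many samples fall into each group. The sketch below counts samples per Nose_size and Batch combination. It assumes the file has a single header line followed by the three columns shown above; the function name summarise_metadata is our own.

```shell
# Count samples per Nose_size/Batch combination in a metadata file.
# Columns are split on whitespace, so tab- or space-separated files both work.
summarise_metadata() {
    awk 'NR > 1 && NF >= 3 { n[$2 "\t" $3]++ }
         END { for (k in n) print k "\t" n[k] }' "$1" | sort
}

# Run it on the workshop copy of the file, if it is present.
meta="$HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/metadata.tsv"
if [ -f "$meta" ]; then
    summarise_metadata "$meta"
fi
```

Each output line gives a nose size, a batch and the number of samples in that combination, which is a quick way to spot unbalanced groups before the analysis.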
We’ll also need to create a working directory to store our R scripts and the results of today’s analysis.
In PuTTY run the following:
mkdir $HOME/workshop/2025/S1W1/metagenomics/R_analysis
Downloading the sarek results
In section 7 of today’s workshop we’re going to do some downstream analysis of the results of the nf-core/sarek variant calling workflow that was run in session 2 of these workshops. We realise not everyone attended that session or successfully completed the sarek workflow, so you can download the results of successful runs by running the following commands in PuTTY:
mkdir $HOME/workshop/2025/S1W1/metagenomics/sarek
cp -r /work/training/2025/S1W1/session2_variant_calling/runs/ $HOME/workshop/2025/S1W1/metagenomics/sarek
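If you want to check how far the copy has progressed, one option is to compare file counts between the source and the destination from a second PuTTY session. This is a rough sketch only: count_files is our own helper, and the destination path assumes that cp -r places the copy in a runs subfolder of the sarek directory, which is the usual behaviour when the destination folder already exists.

```shell
# Print the number of regular files under a directory (0 if it does not exist).
count_files() {
    find "$1" -type f 2>/dev/null | wc -l | tr -d ' '
}

src="/work/training/2025/S1W1/session2_variant_calling/runs"
# Assumed location of the copy: cp -r creates 'runs' inside the existing sarek folder.
dest="$HOME/workshop/2025/S1W1/metagenomics/sarek/runs"

echo "source: $(count_files "$src") files, copied so far: $(count_files "$dest") files"
```

When the two numbers match, the copy has most likely finished. Matching counts do not prove that every file copied completely, so treat this as a progress indicator rather than a verification step.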
While these datasets are copying we can move to the next section (leave PuTTY running).
Open RStudio and create a new R script
The RStudio icon is on your rVDI desktop. Open RStudio.
In this workshop we will not be teaching you R or RStudio, other than the very basics. For the most part you can just copy and paste the code into R and run it. There are many beginner’s R courses available, for example this QCIF course: https://www.qcif.edu.au/trainingcourses/introduction-to-programming%3A-r-for-reproducible-scientific-analysis . QCIF courses are free to QUT staff and HDR students.
An overview of how to navigate the RStudio GUI is here: https://heardlibrary.github.io/digital-scholarship/script/r/navigate/
Create a new script by selecting ‘File’ → ‘New File’ → ‘R Script’. Now select ‘File’ → ‘Save’ and save the script in H:/workshop/2025/S1W1/metagenomics/R_analysis (i.e. the directory we just created). Name the script ‘R_metagenomics’.
Go to the next section, ‘Setting up your R environment’