Overview

Installing R and RStudio

You have three main options for running this analysis in RStudio:

  1. Use QUT's rVDI virtual desktop machines

  2. Install R and RStudio on your own PC

  3. Use the provided PCs in the QUT computer labs

Option 1: Use QUT's rVDI virtual desktop machines

This is the preferred method, as R, RStudio and all the R packages required for the analysis are already installed. Installing all of these on your own PC can take over 30 minutes, so using an rVDI machine saves time.

rVDI provides a virtual Windows desktop that can be run in your web browser.

To access and run an rVDI virtual desktop:

Go to https://rvdi.qut.edu.au/

Click on ‘VMware Horizon HTML Access’.

Log on with your QUT username and password.

Click the ‘R Workshop’ option, which is already set up for this workshop.

*NOTE: you need to be connected to the QUT network first, either on campus or remotely via VPN.

Option 2: Install R and RStudio on your own PC

Go to https://posit.co/download/rstudio-desktop/ and follow the instructions provided to install R first and then RStudio.

Download and install R, following the default prompts:

https://cran.r-project.org/bin/windows/base/

Download and install RStudio, following the default prompts:

https://posit.co/download/rstudio-desktop/

Option 3: Use the provided PCs in the QUT computer labs

The PCs in the computer labs already have R and RStudio installed. If you use this option, you will need to install the required R packages yourself (unlike on rVDI). The code for installing these packages is in the analysis section below.

Map your HPC home directory

On the rVDI machines, your HPC home directory is already mapped to the H drive.

If you are using a different computer, follow the instructions here to map your HPC home directory:

https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/using-hpc-filesystems

Interactive HPC session

In the next step we’re going to copy some data files to your home directory.

Just like in the last session, we’re going to start an interactive PBS session to do this, so we’re not all copying large datasets on the head node.

Use PuTTY to connect to aqua. Once you’ve logged in, copy and paste the text below into the command prompt:

qsub -I -S /bin/bash -l walltime=4:00:00 -l select=1:ncpus=8:mem=32gb
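For reference, the options in the command above can be read as follows. This is only an annotated sketch: the variable names (WALLTIME, NCPUS, MEM, QSUB_CMD) are illustrative and not part of the workshop material, and the script prints the command rather than submitting it.

```shell
# Resource values copied from the workshop command above.
WALLTIME="4:00:00"   # maximum run time of the session (hours:minutes:seconds)
NCPUS=8              # number of CPU cores requested
MEM="32gb"           # amount of memory requested

# -I            start an interactive session instead of running a batch script
# -S /bin/bash  use bash as the shell inside the session
# -l ...        resource requests: walltime, and one chunk (select=1)
#               with the CPU and memory amounts set above
QSUB_CMD="qsub -I -S /bin/bash -l walltime=${WALLTIME} -l select=1:ncpus=${NCPUS}:mem=${MEM}"

# Print the assembled command so it can be checked before pasting into PuTTY.
echo "$QSUB_CMD"
```

The printed line should be identical to the command given above.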

Wait until the interactive session starts, then move to the next step (below).

Create your workshop folders and check your data

In your H drive (the drive you just mapped), check whether there is an ASV_table.tsv file in:

H:\workshop\2025\S1W1\metagenomics\runs\run2_ampliseq\results\dada2

This is the base data file we’ll be working with - an abundance table of read counts per ASV (amplicon sequence variant, i.e. taxonomic group) per sample. Using this table we’ll be able to quantify and visualise taxonomic diversity and structure in R.

Check to see if this file is in H:\workshop\2025\S1W1\metagenomics\data\illumina\

If you can’t see the ASV_table.tsv file (your Nextflow job may have failed, for example), run the following to copy the results of a successful nf-core/ampliseq run to your HPC home directory:

cp -r /work/training/metagenomics/public_data/Illumina/results $HOME/workshop/2025/S1W1/metagenomics/runs/run2_ampliseq/results/dada2/
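If you prefer to check for the file from the PuTTY session rather than through the H drive, a minimal sketch along these lines could be used. The function name check_asv_table and the RESULTS_DIR variable are hypothetical helpers introduced here for illustration; the default path is the dada2 results folder named earlier in this section.

```shell
# Folder expected to contain ASV_table.tsv (path from the workshop text).
# Set RESULTS_DIR beforehand to check a different location.
RESULTS_DIR="${RESULTS_DIR:-$HOME/workshop/2025/S1W1/metagenomics/runs/run2_ampliseq/results/dada2}"

# Returns 0 if ASV_table.tsv exists and is not empty in the given folder,
# and 1 otherwise. Prints which case applied.
check_asv_table() {
  if [ -s "$1/ASV_table.tsv" ]; then
    echo "Found: $1/ASV_table.tsv"
    return 0
  else
    echo "Missing: $1/ASV_table.tsv"
    return 1
  fi
}

check_asv_table "$RESULTS_DIR" || echo "Run the cp command above to copy the backup results."
```

The -s test treats an empty file as missing, so a truncated table from a failed run is also reported.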

The analysis also requires a metadata.tsv file, which describes each sample. Create a folder for it and copy the provided file into that folder:

mkdir -p $HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/
cp /work/training/2025/S1W1/session3_metagenomics/scripts/metadata.tsv $HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/

If we have a look at this file, using the cat metadata.tsv command, it will show this:

ID	Nose_size	Batch
Illumina10	Big	Early
Illumina11	Big	Early
Illumina12	Big	Early
Illumina13	Big	Early
Illumina14	Big	Early
Illumina15	Big	Early
Illumina16	Big	Early
Illumina17	Big	Early
Illumina18	Big	Early
Illumina19	Big	Early
Illumina1	Big	Early
Illumina20	Big	Early
Illumina21	Medium	Early
Illumina22	Medium	Early
Illumina23	Medium	Early
Illumina24	Medium	Early
Illumina25	Medium	Early
Illumina26	Medium	Early
Illumina27	Medium	Early
Illumina28	Medium	Early
Illumina29	Medium	Early
Illumina2	Medium	Early
Illumina30	Medium	Early
Illumina31	Medium	Early
Illumina32	Medium	Early
Illumina33	Medium	Early
Illumina34	Medium	Early
Illumina35	Medium	Early
Illumina36	Medium	Early
Illumina37	Small	Early
Illumina38	Small	Early
Illumina39	Small	Early
Illumina3	Small	Early
Illumina40	Small	Early
Illumina41	Small	Early
Illumina42	Small	Early
Illumina43	Small	Early
Illumina44	Small	Early
Illumina45	Small	Early
Illumina46	Small	Late
Illumina47	Small	Late
Illumina48	Small	Late
Illumina49	Small	Late
Illumina4	Small	Late
Illumina50	Small	Late
Illumina51	Small	Late
Illumina52	Small	Late
Illumina53	Small	Late
Illumina54	Small	Late
Illumina55	Small	Late
Illumina56	Small	Late
Illumina57	Small	Late
Illumina58	Small	Late
Illumina59	Small	Late
Illumina5	Small	Late
Illumina6	Small	Late
Illumina7	Small	Late
Illumina8	Small	Late
Illumina9	Small	Late
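To get a quick overview of how the samples are spread across groups, the rows can be counted per Nose_size and Batch combination. The sketch below assumes the file is tab-separated with a single header line, as shown above; count_groups and METADATA are illustrative names, not part of the workshop scripts.

```shell
# Location of the metadata file copied in the previous step.
# Set METADATA beforehand to summarise a different file.
METADATA="${METADATA:-$HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/metadata.tsv}"

# Skip the header line, then count the rows for each
# Nose_size (column 2) and Batch (column 3) combination.
count_groups() {
  awk -F'\t' 'NR > 1 { n[$2 "\t" $3]++ } END { for (k in n) print k "\t" n[k] }' "$1" | sort
}

if [ -f "$METADATA" ]; then
  count_groups "$METADATA"
else
  echo "metadata.tsv not found at $METADATA"
fi
```

For the table shown above, this should report 12 Big/Early, 17 Medium/Early, 10 Small/Early and 20 Small/Late samples (59 in total).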

We also need to create a working directory to store our R scripts and the results of today’s analysis.

In PuTTY run the following:

mkdir -p $HOME/workshop/2025/S1W1/metagenomics/R_analysis
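Before moving on, it can help to confirm that the folders used in this section are all in place. The following is a minimal sketch; report_dirs and BASE are hypothetical names introduced here, and the folder list is taken from the paths used earlier in this section.

```shell
# Top-level workshop folder (path from the workshop text).
# Set BASE beforehand to check a different location.
BASE="${BASE:-$HOME/workshop/2025/S1W1/metagenomics}"

# Usage: report_dirs BASE_DIR SUBDIR...
# Prints ok/missing for each subfolder; the return value is the
# number of missing folders (0 means everything was found).
report_dirs() {
  base_dir="$1"
  shift
  missing=0
  for sub in "$@"; do
    if [ -d "$base_dir/$sub" ]; then
      echo "ok       $base_dir/$sub"
    else
      echo "missing  $base_dir/$sub"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

report_dirs "$BASE" runs/run2_ampliseq/results/dada2 data/illumina/fastq R_analysis || true
```

Any line starting with "missing" points to a step above that still needs to be run.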

Downloading the sarek results

In section 7 of today’s workshop we’re going to do some downstream analysis of the output of the nf-core/sarek variant calling workflow that was run in session 2 of these workshops. We realise that not everyone attended that session or successfully completed the sarek workflow, so you can copy the results of successful runs by running the following commands in PuTTY:

mkdir -p $HOME/workshop/2025/S1W1/metagenomics/sarek
cp -r /work/training/2025/S1W1/session2_variant_calling/runs/ $HOME/workshop/2025/S1W1/metagenomics/sarek

While these datasets are copying we can move to the next section (leave PuTTY running).

Open RStudio and create a new R script

The RStudio icon is on your rVDI desktop. Open RStudio.

In this workshop we will not be teaching you R or RStudio beyond the very basics. For the most part you can just copy and paste the code into R and run it. There are many beginner’s R courses available, for example this QCIF course: https://www.qcif.edu.au/trainingcourses/introduction-to-programming%3A-r-for-reproducible-scientific-analysis . QCIF courses are free to QUT staff and HDR students.

An overview of how to navigate the RStudio GUI is here: https://heardlibrary.github.io/digital-scholarship/script/r/navigate/

Create a new script via ‘File’ → ‘New File’ → ‘R Script’. Then select ‘File’ → ‘Save’ and save the script in H:/workshop/2025/S1W1/metagenomics/R_analysis (i.e. the directory we just created). Name the script ‘R_metagenomics’.

Go to the next section, ‘Setting up your R environment’