25S1W1 - 2. Setup

Overview

Installing R and RStudio

 

You have three main options for running this analysis in RStudio:

  1. Use QUT’s rVDI virtual desktop machines

  2. Install R and RStudio on your own PC

  3. Use the provided PCs in the QUT computer labs

 

Option 1: Use QUT’s rVDI virtual desktop machines

 

This is the preferred method, because R, RStudio and all the R packages required for the analysis are already installed. Installing all of these on your own PC can take over 30 minutes, so using an rVDI machine saves time.

rVDI provides a virtual Windows desktop that can be run in your web browser.

To access and run an rVDI virtual desktop:

Go to https://rvdi.qut.edu.au/

Click on ‘VMware Horizon HTML Access’.

Log on with your QUT username and password.

Click the ‘R Workshop’ option, which is already set up for this workshop.

*NOTE: you need to be connected to the QUT network first, either by being on campus or by connecting remotely via VPN.

 

Option 2: Install R and RStudio on your own PC

 

Go to the Posit page and follow the instructions provided to install first R and then RStudio.

Download and install R, following the default prompts:

https://cran.r-project.org/bin/windows/base/

Download and install RStudio, following the default prompts:

Posit

 

Option 3: Use the provided PCs in the QUT computer labs

 

The PCs in the computer labs already have R and RStudio installed. If using this option, you will need to install the required R packages (unlike rVDI). The code for installing these packages is in the analysis section below.

 

Map your HPC home directory

 

On the rVDI machines, your HPC home directory is already mapped to the H drive.

If you are using a different computer, follow the instructions here to map your HPC home directory:

https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/using-hpc-filesystems

 

Interactive HPC session

 

In the next step we’re going to copy some data files to your home directory.

Just like in the last session, we’re going to start an interactive PBS session to do this, so that we’re not all copying large datasets on the head node.

Use PuTTY to connect to aqua. Once you’ve logged in, copy and paste the text below into the command prompt:

qsub -I -S /bin/bash -l walltime=4:00:00 -l select=1:ncpus=8:mem=32gb

Wait until the interactive session starts, then move to the next step (below).
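If you are not sure whether the interactive session has started, the short check below may help. It is a minimal sketch that relies on the PBS_JOBID environment variable, which PBS sets inside the jobs it starts; the function name in_pbs_job is our own and is not part of PBS.

```shell
# Succeeds (exit status 0) only if this shell is running inside a PBS job.
# PBS sets PBS_JOBID in the environment of every job it starts.
in_pbs_job() {
    [ -n "${PBS_JOBID:-}" ]
}

if in_pbs_job; then
    echo "Inside PBS job ${PBS_JOBID} on $(hostname)"
else
    echo "No PBS job detected: you are probably still on the head node"
fi
```

If the second message appears, wait for the qsub prompt to return before copying any data.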

 

Create your workshop folders and check your data

 

In your H drive (the drive you just mapped), check whether there is an ASV_table.tsv file in:

H:\workshop\2025\S1W1\metagenomics\runs\run2_ampliseq\results\dada2

This is the base data file we’ll be working with: an abundance table of read counts per ASV (amplicon sequence variant, i.e. taxonomic group) per sample. Using this table we’ll be able to quantify and visualise taxonomic diversity and structure in R.

Check to see if this file is in H:\workshop\2025\S1W1\metagenomics\data\illumina\

If you can’t see the ASV_table.tsv file (your Nextflow job may have failed, for example), run the following to copy the results of a successful nf-core/ampliseq run to your HPC home directory:

cp -r /work/training/metagenomics/public_data/Illumina/results $HOME/workshop/2025/S1W1/metagenomics/runs/run2_ampliseq/results/dada2/
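The same check can be done from the PuTTY session, before or after the copy. The sketch below assumes that the H drive corresponds to $HOME on the HPC, so the Windows path above maps to the Linux path shown; check_file and DADA2_DIR are illustrative names of our own.

```shell
# Linux path assumed to correspond to the H:\ location above
# (assumption: the H drive is your HPC home directory, i.e. $HOME).
DADA2_DIR="$HOME/workshop/2025/S1W1/metagenomics/runs/run2_ampliseq/results/dada2"

# Print "found: <path>" if the path is a regular file, otherwise "missing: <path>".
check_file() {
    if [ -f "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_file "$DADA2_DIR/ASV_table.tsv"
```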

 

The analysis also requires a metadata.tsv file. We will create a folder for it and copy the file into that folder:

mkdir -p $HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/
cp /work/training/2025/S1W1/session3_metagenomics/scripts/metadata.tsv $HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/

If we look at this file with the cat metadata.tsv command, it shows the following:

ID          Nose_size  Batch
Illumina10  Big        Early
Illumina11  Big        Early
Illumina12  Big        Early
Illumina13  Big        Early
Illumina14  Big        Early
Illumina15  Big        Early
Illumina16  Big        Early
Illumina17  Big        Early
Illumina18  Big        Early
Illumina19  Big        Early
Illumina1   Big        Early
Illumina20  Big        Early
Illumina21  Medium     Early
Illumina22  Medium     Early
Illumina23  Medium     Early
Illumina24  Medium     Early
Illumina25  Medium     Early
Illumina26  Medium     Early
Illumina27  Medium     Early
Illumina28  Medium     Early
Illumina29  Medium     Early
Illumina2   Medium     Early
Illumina30  Medium     Early
Illumina31  Medium     Early
Illumina32  Medium     Early
Illumina33  Medium     Early
Illumina34  Medium     Early
Illumina35  Medium     Early
Illumina36  Medium     Early
Illumina37  Small      Early
Illumina38  Small      Early
Illumina39  Small      Early
Illumina3   Small      Early
Illumina40  Small      Early
Illumina41  Small      Early
Illumina42  Small      Early
Illumina43  Small      Early
Illumina44  Small      Early
Illumina45  Small      Early
Illumina46  Small      Late
Illumina47  Small      Late
Illumina48  Small      Late
Illumina49  Small      Late
Illumina4   Small      Late
Illumina50  Small      Late
Illumina51  Small      Late
Illumina52  Small      Late
Illumina53  Small      Late
Illumina54  Small      Late
Illumina55  Small      Late
Illumina56  Small      Late
Illumina57  Small      Late
Illumina58  Small      Late
Illumina59  Small      Late
Illumina5   Small      Late
Illumina6   Small      Late
Illumina7   Small      Late
Illumina8   Small      Late
Illumina9   Small      Late
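Rather than reading every row, you can count how many samples fall into each Nose_size and Batch combination. This is a minimal sketch using awk; it assumes the three-column layout shown above with a single header row, and summarise_metadata and META are illustrative names of our own.

```shell
# Count samples per Nose_size/Batch combination in a whitespace-delimited
# (tab or space) metadata file with one header row.
summarise_metadata() {
    # NR > 1 skips the header; $2 is Nose_size and $3 is Batch
    awk 'NR > 1 { n[$2 " " $3]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}

# Location the metadata file was copied to in the step above
META="$HOME/workshop/2025/S1W1/metagenomics/data/illumina/fastq/metadata.tsv"
if [ -f "$META" ]; then
    summarise_metadata "$META"
fi
```

Each output line lists a Nose_size, a Batch and the number of samples in that group, which is a quick way to confirm the file copied completely.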

 

We’ll need to also create a working directory to store our R scripts and the results of today’s analysis.

In PuTTY run the following:

mkdir $HOME/workshop/2025/S1W1/metagenomics/R_analysis

Downloading the sarek results

In section 7 of today’s workshop we’re going to do some downstream analysis of the output of the nf-core/sarek variant calling workflow that was run in session 2 of these workshops. We realise that not everyone attended that session or successfully completed the sarek workflow, so you can download the results of successful runs by running the following commands in PuTTY:

mkdir $HOME/workshop/2025/S1W1/metagenomics/sarek
cp -r /work/training/2025/S1W1/session2_variant_calling/runs/ $HOME/workshop/2025/S1W1/metagenomics/sarek

While these datasets are copying we can move to the next section (leave PuTTY running).
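When you return to PuTTY later, one rough way to confirm the copy has finished is to compare the number of files in the source and destination. This is only a sketch: it assumes that cp -r placed the data in a runs subfolder inside sarek, and count_files, SRC and DEST are illustrative names of our own.

```shell
# Print the number of regular files under a directory (searched recursively).
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

SRC=/work/training/2025/S1W1/session2_variant_calling/runs
# Assumption: cp -r created a "runs" subfolder inside the sarek directory
DEST="$HOME/workshop/2025/S1W1/metagenomics/sarek/runs"

if [ -d "$SRC" ] && [ -d "$DEST" ]; then
    echo "source files: $(count_files "$SRC")"
    echo "copied files: $(count_files "$DEST")"
fi
```

If the two counts differ, the copy is probably still running or was interrupted.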

 

Open RStudio and create a new R script

 

The RStudio icon is on your rVDI desktop. Open RStudio.

In this workshop we will not be teaching you R or RStudio, other than the very basics. For the most part you can just copy and paste the code into R and run it. There are many beginner’s R courses available, for example this QCIF course: https://www.qcif.edu.au/trainingcourses/introduction-to-programming%3A-r-for-reproducible-scientific-analysis . QCIF courses are free to QUT staff and HDR students.

An overview of how to navigate the RStudio GUI is here: https://heardlibrary.github.io/digital-scholarship/script/r/navigate/

Create a new script via ‘File’ → ‘New File’ → ‘R Script’. Then choose ‘File’ → ‘Save’ and save the script in H:/workshop/2025/S1W1/metagenomics/R_analysis (i.e. the directory we just created), naming it ‘R_metagenomics’.

 

Go to the next section, ‘Setting up your R environment’

 

 

 

 

 

 

 

 
