Overview

Installing R and RStudio

The analysis scripts in this guide are written in R. We will use RStudio, a graphical front end (GUI) for R, to run them.

You have three main options for running this analysis in RStudio:

...

Log on with your QUT username and password.

Click the ‘R Metagenomics’ option, which has been set up in advance for this workshop.

*NOTE: you must first be connected to the QUT network, either on campus or remotely via VPN.

...

Map your HPC home directory

On the rVDI machines, your HPC home directory is already mapped to the H drive.

If you are using a different computer, follow the instructions here to map your HPC home directory:

https://qutvirtual4.qut.edu.au/group/staff/research/conducting/facilities/advanced-research-computing-storage/supercomputing/using-hpc-filesystems

Interactive HPC session

In the next step we’re going to copy some data files to your home directory.

Just like in the last session, we’re going to start an interactive PBS session to do this, so we’re not all copying large datasets on the head node.

Open PuTTY and paste the text below into the command prompt:

Code Block
qsub -I -S /bin/bash -l walltime=4:00:00 -l select=1:ncpus=8:mem=32gb

Wait until the interactive session starts, then move to the next step (below).
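
To confirm that you are inside the interactive session rather than still on the head node, a check along these lines may help. It is a minimal sketch that assumes the scheduler is PBS Professional, which sets the PBS_ENVIRONMENT variable to PBS_INTERACTIVE inside interactive jobs. The function name in_pbs_session is a hypothetical helper, not part of the workshop material.

```shell
# in_pbs_session is a hypothetical helper name used only for this sketch.
# Assumption: PBS Professional sets PBS_ENVIRONMENT=PBS_INTERACTIVE
# inside a job started with "qsub -I".
in_pbs_session() {
  [ "${PBS_ENVIRONMENT:-}" = "PBS_INTERACTIVE" ]
}

if in_pbs_session; then
  # PBS_JOBID is also set by PBS inside a running job.
  echo "Interactive session running on $(uname -n), job ${PBS_JOBID:-unknown}"
else
  echo "Not inside an interactive PBS session (possibly still on the head node)"
fi
```

If the second message appears, wait for the qsub prompt to return before copying any data.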

Create your workshop folders and check your data

In your H drive (the drive you just mapped), check whether there is an ASV_table.tsv file in

H:\meta_workshop\illumina\results\dada2

...

The analysis also requires the metadata.tsv file we created at the start of the last workshop. Check that this file is in H:\meta_workshop\illumina\data

If you can’t see the ASV_table.tsv or metadata.tsv files (your Nextflow job may have failed, for example), open PuTTY and run the following to copy them from a successful nfcore/ampliseq run to your HPC home directory:

Code Block
cp -r /work/training/metagenomics/public_data/Illumina/results $HOME/meta_workshop/illumina
mkdir -p $HOME/meta_workshop/illumina/data
cp -r /work/training/metagenomics/public_data/Illumina/data/metadata.tsv $HOME/meta_workshop/illumina/data
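
The same file check can also be done from PuTTY instead of browsing the H drive, whether the files came from your own run or from the copy above. The following is a minimal sketch, assuming the folder layout described in this section. The function name check_inputs is a hypothetical helper, not part of the workshop scripts.

```shell
# check_inputs is a hypothetical helper name used only for this sketch.
# It reports whether the two input files exist under the given base folder.
check_inputs() {
  base="$1"
  missing=0
  for f in \
    "$base/illumina/results/dada2/ASV_table.tsv" \
    "$base/illumina/data/metadata.tsv"
  do
    if [ -f "$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  # Succeed only when both files are present.
  [ "$missing" -eq 0 ]
}

check_inputs "$HOME/meta_workshop" || echo "One or more input files are missing; copy the example results as described above."
```

Both files should be reported as found before you continue to the R analysis.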

We also need to create a working directory to store our R scripts and the results of today’s analysis.

In PuTTY run the following:

Code Block
mkdir -p $HOME/meta_workshop/R_analysis

Downloading the sarek results

In session 7 we’re going to do some downstream analysis of the results from the nfcore/sarek variant calling workflow that was run in sessions 2 and 3 of these workshops. We realise that not everyone attended those sessions or successfully completed the sarek workflows, so you can download the results of successful runs by running the following command in PuTTY:

Code Block
mkdir $HOME/workshop/sarek
cp -r /work/training/sarek/

...

runs/ $HOME/workshop/sarek

While these datasets are copying we can move to the next section (leave PuTTY running).

Open RStudio and create a new R script

The RStudio icon is on your rVDI desktop. Open RStudio.

In this workshop we will not be teaching you R or RStudio beyond the very basics. For the most part you can just copy and paste the code into R and run it. There are many beginner R courses available, for example this QCIF course: https://www.qcif.edu.au/trainingcourses/introduction-to-programming%3A-r-for-reproducible-scientific-analysis . QCIF courses are free for QUT staff and HDR students.

An overview of how to navigate the RStudio GUI is here: https://heardlibrary.github.io/digital-scholarship/script/r/navigate/

Create a new script via ‘File’ → ‘New File’ → ‘R Script’. Then select ‘File’ → ‘Save’ and save the script in H:/meta_workshop/R_analysis (i.e. the directory we just created). Name the script “R_metagenomics”.

Go to the next section, ‘Setting up your R environment’.