Aims for today
Identify statistically significant (FDR < 0.05) differentially expressed genes.
Visualise results with PCA plots, heatmaps and volcano plots.
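As a minimal illustration of what the FDR < 0.05 threshold means, the sketch below applies the Benjamini-Hochberg adjustment to a few raw p-values using base R's `p.adjust`. The gene names and p-values are invented for illustration only; in the actual analysis, adjusted p-values come from DESeq2.

```r
# Hypothetical raw p-values for five genes (illustrative values only)
pvals <- c(geneA = 0.0001, geneB = 0.004, geneC = 0.02,
           geneD = 0.045, geneE = 0.6)

# Benjamini-Hochberg adjustment, which controls the false discovery rate (FDR)
padj <- p.adjust(pvals, method = "BH")

# Genes that pass the FDR < 0.05 threshold
sig_genes <- names(padj)[padj < 0.05]

print(round(padj, 4))
print(sig_genes)
```

Note that geneD has a raw p-value below 0.05 but no longer passes once the p-values are adjusted for multiple testing, which is why the threshold is applied to the adjusted values.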
Requirements
Process your samples (FASTQ files) with the Nextflow nf-core/rnaseq pipeline using the ‘star_salmon’ option (Task 4 session), or with an alternative pipeline that generates feature counts.
Have RStudio working. Use Option 1 below for the in-class session.
Installing R and RStudio
The analysis scripts in this guide are written in R. We will use RStudio, a graphical front end for R, to run them.
You have three main options for running this analysis in RStudio:
Use QUT's rVDI virtual desktop machines
Install R and RStudio on your own PC
Use the provided PCs in the QUT computer labs
Option 1: Use QUT's rVDI virtual desktop machines
This is the preferred method, as R, RStudio and all the R packages required for the analysis are already installed. Installing these on your own PC can take over 30 minutes, so using an rVDI machine saves time.
rVDI provides a virtual Windows desktop that can be run in your web browser.
To access and run an rVDI virtual desktop:
Go to https://rvdi.qut.edu.au/
Click on ‘VMware Horizon HTML Access’ and select R_Megenomics (if you have more than one desktop available to you).
Log on with your QUT username and password.
NOTE: You must first be connected to the QUT network, either on campus or remotely via VPN.
Option 2: Install R and RStudio on your own PC
Go to https://posit.co/download/rstudio-desktop/ and follow the instructions there to install R first and then RStudio.
Download and install R, following the default prompts (the link below is for Windows):
https://cran.r-project.org/bin/windows/base/
Download and install RStudio, following the default prompts:
https://posit.co/download/rstudio-desktop/
Option 3: Use the provided PCs in the QUT computer labs
The PCs in the computer labs already have R and RStudio installed. Unlike rVDI, however, they do not have the required R packages pre-installed, so you will need to install these yourself. The code for installing these packages is in the analysis section below.
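Before installing anything, it can help to check which packages are already present on the machine you are using. The following is a minimal sketch using base R only. The helper name `missing_packages` is hypothetical, and the package list is an assumption: only DESeq2 is named in this guide, so replace it with the full list given in the analysis section.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Assumed package list; extend it with the packages from the analysis section
required <- c("DESeq2")

# An empty result (character(0)) means everything in `required` is installed
print(missing_packages(required))
```

If the result is empty, you can skip the installation step and go straight to loading the packages.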
We will now perform the following differential expression (DE) analysis tasks in RStudio:
Preparing your data. Two data files are needed for this analysis: a samples table and a count table.
R packages
Install the required R packages. This only needs to be done once; after installation, the packages only need to be loaded. NOTE: If using an rVDI virtual machine, the R packages are already installed.
Load the required R packages. Unlike installing the packages, this needs to be done every time you run the analysis.
Import your data files (count table and samples table) into R
Checking for outliers and batch effects
PCA plot
Pairwise samples heatmap
Identify differentially expressed (DE) genes using DESeq2
Annotating your DE genes
Volcano plot
DE genes heatmap
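The data preparation step listed above can be sketched with toy data. This is a minimal illustration in base R, not the guide's own script: the column names (`gene_id`, `sample_ID`, `group`), the sample names and all count values are assumptions made up for the example. It shows the two tables and a check that they agree, since DESeq2 expects the count columns to be in the same order as the rows of the samples table.

```r
# Toy count table: genes in rows, samples in columns (values are made up)
counts <- data.frame(
  gene_id = c("gene1", "gene2", "gene3"),
  ctrl_1  = c(10, 0, 250),
  ctrl_2  = c(12, 1, 240),
  treat_1 = c(55, 0, 260),
  treat_2 = c(60, 2, 255),
  stringsAsFactors = FALSE
)

# Toy samples table: one row per sample, with its experimental group
samples <- data.frame(
  sample_ID = c("treat_1", "treat_2", "ctrl_1", "ctrl_2"),
  group     = c("treatment", "treatment", "control", "control"),
  stringsAsFactors = FALSE
)

# Move the gene IDs to row names so that only count columns remain
count_matrix <- as.matrix(counts[, -1])
rownames(count_matrix) <- counts$gene_id

# Every sample in the samples table must have a column in the count table
stopifnot(all(samples$sample_ID %in% colnames(count_matrix)))

# Reorder the count columns to match the row order of the samples table
count_matrix <- count_matrix[, samples$sample_ID]
stopifnot(identical(colnames(count_matrix), samples$sample_ID))

print(count_matrix)
```

In the real analysis the two tables are imported from files rather than typed in, but the same consistency check applies.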