2024-2: 5a.2 R packages (DE) - installing, loading and data importation

1 Install required R packages - only need to run it if you are using your own computer
2 To load required R packages:
3 To import your data files into R:

The following analysis contains R code (in the grey text boxes). Copy and paste it into the R script you created on the previous page, and then run it.

Install required R packages - only need to run it if you are using your own computer

Copy and paste the following code into the R script you just created, then run it: highlight all of the code in your R script, then press the Run button. This installs all the required packages and their dependencies, and may take 30 minutes or more to complete. It may occasionally prompt you to update packages; if this happens, type 'a' (for all).

NOTE: you only need to run this section once on each laptop/PC, and you don't need to run it at all if you're using an rVDI machine.

#### Differential expression analysis ####
# When you see '## USER INPUT', this means you have to modify the code for your computer or dataset. All other code can be run as-is (i.e. you don't need to understand the code, just run it)
#### 2. Installing required packages ####
# **NOTE: this section only needs to be run once (or occasionally to update the packages)
# Install devtools
install.packages("devtools", repos = "http://cran.us.r-project.org")
# Install R packages. This only needs to be run once.
bioconductor_packages <- c("DESeq2", "EnhancedVolcano", "org.Hs.eg.db", "org.Mm.eg.db", "org.Rn.eg.db", "org.EcK12.eg.db", "org.EcSakai.eg.db", "org.Dr.eg.db", "org.Dm.eg.db")
cran_packages <- c("ggrepel", "ggplot2", "plyr", "reshape2", "readxl", "FactoMineR", "factoextra", "pheatmap")
# Compares installed packages to above packages and returns a vector of missing packages
new_packages <- bioconductor_packages[!(bioconductor_packages %in% installed.packages()[,"Package"])]
new_cran_packages <- cran_packages[!(cran_packages %in% installed.packages()[,"Package"])]
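# Install missing Bioconductor packages (BiocManager is installed first if it isn't already)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager", repos = "http://cran.us.r-project.org")
BiocManager::install(new_packages)
# Install missing CRAN packages
if (length(new_cran_packages)) install.packages(new_cran_packages, repos = "http://cran.us.r-project.org")
# Update all installed packages to the latest version.
# update.packages() only checks the repository it is given (CRAN here), so Bioconductor packages are updated through BiocManager.
BiocManager::install(update = TRUE, ask = FALSE)
# oldPkgs restricts the CRAN update to the packages listed above
update.packages(oldPkgs = cran_packages, ask = FALSE, repos = "http://cran.us.r-project.org")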
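Because the installation can run for a long time and print a lot of output, it is easy to miss a package that failed to install. A minimal check like the one below could be run afterwards. It is a sketch: `missing_packages()` is a hypothetical helper written for this guide, not part of any package. It tries to load each package's namespace with `requireNamespace()` (without attaching it) and reports any that cannot be loaded.

```r
# Hypothetical helper: returns the packages that cannot be loaded, i.e. that are
# missing (or broken) in every library listed by .libPaths()
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE instead of throwing an error when a
  # package cannot be loaded; quietly = TRUE suppresses its messages
  loadable <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!loadable]
}

# The same package names as in the installation code above
required <- c("DESeq2", "EnhancedVolcano", "org.Hs.eg.db", "org.Mm.eg.db",
              "org.Rn.eg.db", "org.EcK12.eg.db", "org.EcSakai.eg.db",
              "org.Dr.eg.db", "org.Dm.eg.db",
              "ggrepel", "ggplot2", "plyr", "reshape2", "readxl",
              "FactoMineR", "factoextra", "pheatmap")

still_missing <- missing_packages(required)
if (length(still_missing) > 0) {
  message("Not installed (or not loadable): ", paste(still_missing, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

If any packages are reported, re-running the installation code above is a reasonable first step.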

To load required R packages:

This section loads the packages you installed in the previous section. Unlike installing, loading needs to be done every time you start a new R session, and should only take a few seconds.

#### 3. Loading required packages ####
# This section needs to be run every time
# Load packages
bioconductor_packages <- c("DESeq2", "EnhancedVolcano", "org.Hs.eg.db", "org.Mm.eg.db", "org.Rn.eg.db", "org.EcK12.eg.db", "org.EcSakai.eg.db", "org.Dr.eg.db", "org.Dm.eg.db")
cran_packages <- c("ggrepel", "ggplot2", "plyr", "reshape2", "readxl", "FactoMineR", "factoextra", "pheatmap")
lapply(cran_packages, require, character.only = TRUE)
lapply(bioconductor_packages, require, character.only = TRUE)
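`require()` does not stop the script when a package is missing. It only returns `FALSE` with a warning, which is easy to overlook in the `lapply()` output. As a sketch (the helper `load_or_stop()` is hypothetical, not from any package), the loading step could instead fail with an explicit list of the packages that did not load:

```r
# Hypothetical helper: attach each package, then stop with a clear error
# listing every package that failed to load
load_or_stop <- function(pkgs) {
  # require() attaches a package and returns FALSE (with a warning) if it fails;
  # suppressWarnings() hides that warning because failures are reported below
  ok <- vapply(pkgs, function(p) {
    suppressWarnings(require(p, character.only = TRUE, quietly = TRUE))
  }, logical(1))
  if (!all(ok)) {
    stop("These packages failed to load: ", paste(pkgs[!ok], collapse = ", "),
         ". Try re-running the installation section.", call. = FALSE)
  }
  invisible(ok)
}
```

It could be used in place of the two `lapply()` lines, e.g. `load_or_stop(cran_packages)` followed by `load_or_stop(bioconductor_packages)`.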

To import your data files into R:

In this section, we will import your count table and samples table into R.

You'll need to change the 'setwd' line to your own working directory. In RStudio, click 'Session' → 'Set Working Directory' → 'Choose Directory...' and choose the analysis workshop directory you created previously, which contains your R script file and the 'data' directory. RStudio then prints the matching setwd() command in the Console, which you can copy into your script.
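The file paths in the code below (e.g. `./data/salmon.merged.gene_counts.tsv`) are relative, so R looks for them inside the current working directory. After running the `setwd` line, a check like this sketch can confirm that R is looking in the right place. `check_input_files()` is a hypothetical helper, and its default file names are simply the ones this page expects. On macOS and Linux folder names are case-sensitive, so 'data' and 'Data' are different directories.

```r
# Hypothetical helper: report whether the expected input files can be found
# relative to the current working directory; returns TRUE only if all exist
check_input_files <- function(files = c("./data/salmon.merged.gene_counts.tsv",
                                        "./data/metadata.xlsx")) {
  message("Current working directory: ", getwd())
  found <- file.exists(files)
  for (i in seq_along(files)) {
    message(if (found[i]) "  found:   " else "  MISSING: ", files[i])
  }
  invisible(all(found))
}

check_input_files()
```

If your metadata is a text file, pass the file names explicitly, e.g. `check_input_files(c("./data/salmon.merged.gene_counts.tsv", "./data/metadata.txt"))`.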

#### 4. Import your count data ####
# Before running this section, make sure you have:
# a) your count table (the salmon.merged.gene_counts.tsv file, if you used the Nextflow nf-core/rnaseq pipeline to analyse your data), copied into a subdirectory called 'data';
# b) your metadata file, also in the 'data' subdirectory. This should be either an Excel file called 'metadata.xlsx' or a tab-separated text file called 'metadata.txt'. It needs 3 columns called 'sample_name', 'sample_ID' and 'group':
#    - 'sample_name' must be EXACTLY the same as the sample (column) names in the count table;
#    - 'sample_ID' holds the labels you want on your plots, since the original sample names are often long and uninformative (e.g. for a 'high fat' group you might use HF1, HF2, HF3, etc.);
#    - 'group' is the experimental group each sample belongs to.

## USER INPUT
# Set working directory.
# Change this to your working directory (in the RStudio menu: Session -> Set Working Directory -> Choose Directory...)
setwd("H:/workshop/2024/rnaseq/DE_analysis_workshop")


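# Import your count data. Make sure you've created a 'data' subdirectory and put the count table file there.
# sep = "\t" reads the file as tab-separated; check.names = FALSE keeps the sample names exactly as they are in the file
# (otherwise R may alter them, e.g. replacing '-' with '.', and they would no longer match 'sample_name' in the metadata)
metacountdata <- read.table("./data/salmon.merged.gene_counts.tsv", header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)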


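# Import metadata. This needs a metadata.xlsx file in the 'data' subdirectory.
meta <- read_excel("./data/metadata.xlsx")
# If your metadata is a tab-separated text file instead, use this line in place of the one above:
# meta <- read.delim("./data/metadata.txt")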


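# Remove the 1st column of the count table (gene_name); the gene IDs are already the row names
counts <- metacountdata[ ,2:ncol(metacountdata)]
# Put the count columns in the same order as the samples in the metadata
# (this stops with 'undefined columns selected' if any sample_name is not in the count table)
counts <- counts[as.character(meta$sample_name)]
# Rename the columns from the original sample names to the new sample IDs
colnames(counts) <- meta$sample_ID
# DESeq2 needs integer counts, but Salmon's estimated counts are fractional, so round them up to whole numbers with ceiling()
counts <- ceiling(counts)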
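Two sanity checks may be useful around the steps above. They are sketched here with hypothetical helper names (`unmatched_samples()`, `check_final_counts()`) and are not part of DESeq2 or any other package. The first lists metadata `sample_name` values that have no matching column in the raw count table, which is the usual reason the reordering line stops with 'undefined columns selected'. The second checks that the final `counts` table looks like what DESeq2 expects: non-negative, whole-number counts with no missing values, and columns that follow the metadata's `sample_ID` order.

```r
# Hypothetical helper: metadata sample names with no matching column in the
# raw count table (returns an empty character vector if everything matches)
unmatched_samples <- function(count_table, meta) {
  setdiff(as.character(meta$sample_name), colnames(count_table))
}

# Hypothetical helper: basic checks on the final count table; stops with an
# error naming the first check that fails, otherwise returns TRUE invisibly
check_final_counts <- function(counts, meta) {
  m <- as.matrix(counts)
  stopifnot(
    is.numeric(m),
    !anyNA(m),
    all(m >= 0),
    all(m == round(m)),                       # whole numbers after ceiling()
    identical(colnames(counts), as.character(meta$sample_ID)),
    !anyDuplicated(colnames(counts))          # sample IDs must be unique
  )
  invisible(TRUE)
}
```

For example, `unmatched_samples(metacountdata, meta)` should return an empty character vector, and `check_final_counts(counts, meta)` should run without an error.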
