Overview
Table of Contents | ||
---|---|---|
|
...
Copy and paste the following code into the R script you just created, then run the code. This will install all the required packages and dependencies and may take 45 minutes or more to complete. It may prompt you occasionally to update packages - select 'a' for all if/when this occurs.
NOTE: you only need to run this section once if you’re running this analysis on your own laptop/PC, and you don’t need to run it if you’re using an rVDI machine as all the packages are already installed.
Code Block |
---|
#### Metagenomics analysis ####
# When you see '## USER INPUT', this means you have to modify the code for your computer or dataset. All other code can be run as-is (i.e. you don't need to understand the code, just run it)
#### 1. Installing required packages ####
# **NOTE: this section only needs to be run once (or occasionally to update the packages)
# Install devtools
install.packages("devtools", repos = "http://cran.us.r-project.org")
# Install R packages. This only needs to be run once.
# Make a vector of CRAN and Bioconductor packages
bioconductor_packages <- c("VariantAnnotation", "biomaRt", "clusterProfiler", "org.Hs.eg.db")
cran_packages <- c("devtools", "tidyverse", "DT", "gt", "openxlsx", "dplyr", "scales", "ggplot2", "plotly", "tidyr", "ggsci", "viridis", "vcfR", "data.table", "remotes", "vegan")
# Compares installed packages to above packages and returns a vector of missing packages
new_packages <- bioconductor_packages[!(bioconductor_packages %in% installed.packages()[,"Package"])]
new_cran_packages <- cran_packages[!(cran_packages %in% installed.packages()[,"Package"])]
# Install missing bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(new_packages)
# Install missing cran packages
if (length(new_cran_packages)) install.packages(new_cran_packages, repos = "http://cran.us.r-project.org")
# Update all installed packages to the latest version
update.packages(bioconductor_packages, ask = FALSE)
update.packages(cran_packages, ask = FALSE, repos = "http://cran.us.r-project.org")
# Install ampvis2 (needs to be installed from Github)
remotes::install_github("kasperskytte/ampvis2")
|
...
Code Block |
---|
#### 2. Loading required packages ####
# This section needs to be run every time
# Load packages
bioconductor_packages <- c("VariantAnnotation", "biomaRt", "clusterProfiler", "org.Hs.eg.db")
cran_packages <- c("devtools", "tidyverse", "DT", "gt", "openxlsx", "dplyr", "scales", "ggplot2", "plotly", "tidyr", "ggsci", "viridis", "vcfR", "data.table", "remotes", "vegan")
lapply(cran_packages, require, character.only = TRUE)
lapply(bioconductor_packages, require, character.only = TRUE)
library(ampvis2)
|
...
To set your working directory, click ‘Session’ → ‘Set working directory’ → ‘Choose working directory’ and then choose the H:/meta_workshop/R_analysis directory.
Summary of DADA2 quality filtration
...
Copy, paste and run the following into your RStudio script:
Code Block |
---|
#### 2. Summary of DADA2 quality filtration #### ## Make a summary table # Import the DADA2 filtration summary information dada <- read.table("../illumina/results/dada2/dada2_stats.tsv", sep = "\t", header = T) # Calculate the filtration percentages and add as new columns dada$percentage.of.input.passed.filter <- paste0(round((dada$filtered/dada$DADA2_input)*100,2), "%") dada$percentage.of.input.denoised <- paste0(round((dada$denoised/dada$DADA2_input)*100,2), "%") dada$percentage.of.input.non.chimeric <- paste0(round((dada$nonchim/dada$DADA2_input)*100,2), "%") # Rearrange the columns and change the column names dada <- dada[c(1,2,3,6,4,7,5,8)] colnames(dada) <- c("Sample", "Unfiltered_reads", "Filtered", "%", "Denoised", "%", "Non-chimeric", "%") # Export as a csv file write.csv(dada, "DADA2_filtration.csv") |
The final line in the above code box will write out the results as a table called DADA2_filtration.csv
. It will write this out in your working directory (H:/meta_workshop/R_analysis). You can go there in Windows File Explorer and open the table in Excel.
...
Run the following in your R script:
Code Block |
---|
## Plot the summary results
# Pull out the required columns
dada_bp <- data.frame(dada$Sample, dada$Unfiltered_reads - dada$`Non-chimeric`, dada$`Non-chimeric`)
colnames(dada_bp) <- c("Sample", "Removed reads", "Remaining reads")
df <- tidyr::pivot_longer(dada_bp, cols =c("Removed reads", "Remaining reads"))
colnames(df) <- c("Sample", "Reads", "Read_count")
# Plot the DADA2 filtration results
p <- ggplot(df, aes(Sample, Read_count, fill=Reads)) + geom_bar(stat = "identity", position = 'stack')
p <- p + scale_fill_brewer(palette="Dark2") + ylab("Read count") + theme_bw() +
theme(text = element_text(size = 14), axis.text.x = element_text(angle = 90, size = 10))
p
|
The above will display the plot in RStudio.
You can also save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. These files can be found in your working directory.
Tip: you can adjust the width and height of the saved images by changing width =
and height =
in the code below.
Code Block |
---|
# Export as a 300dpi tiff |
...
tiff_exp <- "DADA2_summary.tiff" ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm") |
...
# Export as a pdf |
...
pdf_exp <- "DADA2_summary.pdf" ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm") |
Go to the next section: Alpha diversity