Table of Contents | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
...
In this section we can estimate alpha diversity using Shannon’s index, Simpson's index, Chao1 richness and observed ASVs. Different diversity indices were used because each have different strengths and weaknesses and are better contextually - i.e. depending on the structure and nature of your dataset. For example, Shannon’s index is more sensitive to sample size and influenced by rare ASVs. Simpson's index is less sensitive to sample size and or rare ASVs, but is influenced by dominance/abundance of ASVs.
...
Observed ASVs calculates diversity based on the number of distinct Amplicon Sequence Variants. Originally this method was developed for observed OTUs, as seen in the below paper.
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K. Huber, T., Davis, D., Hu., Hugenholtz, P., Andersen, G.L. (2006). “Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB”. Applied and Environmental Microbiology (72): 5069–5072.
Combined Shannon and Observed ASV. An additional plot combing both Shannon’s index and Observed ASV indices has been included, to compare similarities and differences between these results. As each index uses different units, results for both have been normalised between 0 and 1.
Kruskal-Wallis rank sum test is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups. This statistical analysis is provided for each plot, to estimate if there is a significant difference (q.value < 0.05) between all groups.
Pairwise Wilcoxon rank sum test (AKA: Mann-Whitney test is the same as the Kruskal-Wallis test, but applied pairwise to each group (technically, The Kruskal-Wallis test is the generalization of the Wilcoxon rank sum test).
4a. Preparing your data
ampvis2 requires 3 components:
A table of read counts per sample per ASV (abundance table)
A table that matches ASVs to taxonomy information
A metadata table containing your sample IDs and any variables (e.g. experimental groups) associated with each sample.
...
nfcore/ampliseq generates the abundance and taxonomy tables, and we created the metadata table in the previous section, so we can import all 3 into R.
Import the metadata table
When you ran your sequences through the ampliseq pipeline, you submitted the samples with a metadata file. This file contains information on your samples and variables. We need to import this metadata file to run our analysis on selected variables.
Code Block |
---|
samples_table <- read.table("../illumina/data/metadata.tsv", sep = "\t", header = T) |
Have a look at your samples table and variables (metadata):
Code Block |
---|
samples_table |
Import the ASV abundance table
First import the unfiltered ASV table (ampvis2 does internal filtering).
Code Block |
---|
asvtable <- read.table("../illumina/results/dada2/ASV_table.tsv", check.names = FALSE, sep = "\t", skip = 1, stringsAsFactors = FALSE, comment.char = "", header = T)
colnames(asvtable)[1] <- "ASV" |
Have a look at the top few rows of the ASV table, to see if it looks right. The sample IDs should be the column names and the ASV IDs in the first column. All the other columns should contain numbers (i.e. the count of the number of times each ASV was found in each sample).
Code Block |
---|
head(asvtable) |
Import the taxonomy table
Read in the taxonomy data - i.e. the taxonomic assignments for each ASV
...
Larsen, N., Rojas, M., Brodie, E.L., Keller, K. Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). “Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB”. Applied and Environmental Microbiology (72): 5069–5072.
Combined Shannon and Observed ASV. An additional plot combing both Shannon’s index and Observed ASV indices has been included, to compare similarities and differences between these results. As each index uses different units, results for both have been normalised between 0 and 1.
Kruskal-Wallis rank sum test is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups. This statistical analysis is provided for each plot, to estimate if there is a significant difference (q.value < 0.05) between all groups.
Pairwise Wilcoxon rank sum test (AKA: Mann-Whitney test is the same as the Kruskal-Wallis test, but applied pairwise to each group (technically, The Kruskal-Wallis test is the generalization of the Wilcoxon rank sum test).
4a. Preparing your data
ampvis2 requires 3 components:
A table of read counts per sample per ASV (abundance table)
A table that matches ASVs to taxonomy information
A metadata table containing your sample IDs and any variables (e.g. experimental groups) associated with each sample.
...
nfcore/ampliseq generates the abundance and taxonomy tables, and we created the metadata table in the previous section, so we can import all 3 into R.
Import the metadata table
When you ran your sequences through the ampliseq pipeline, you submitted the samples with a metadata file. This file contains information on your samples and variables. We need to import this metadata file to run our analysis on selected variables.
Code Block |
---|
samples_table <- read.table("../illumina/results/dada2/ASV_tax.silva_138data/metadata.tsv", sep = "\t", header = T, quote = "") |
Have a look again at the top 6 rows
Code Block |
---|
head(mytax) |
Combine the taxonomy and ASV tables
Code Block |
---|
asv_table <- cbind(asvtable, mytaxsplit)
# Also remove the ASV_ID, Confidence and sequence columns, as they are not needed
asv_table <- subset(asv_table, select=-c(ASV_ID, Confidence, sequence)) |
Code Block |
---|
head(asv_table) |
...
at your samples table and variables (metadata):
Code Block |
---|
samples_table |
Import the ASV abundance table
First import the unfiltered ASV table (ampvis2 does internal filtering).
Code Block |
---|
asvtable <- read.table("../illumina/results/dada2/ASV_table.tsv", check.names = FALSE, sep = "\t", stringsAsFactors = FALSE, comment.char = "", header = T)
colnames(asvtable)[1] <- "ASV" |
Have a look at the top few rows of the ASV table, to see if it looks right. The sample IDs should be the column names and the ASV IDs in the first column. All the other columns should contain numbers (i.e. the count of the number of times each ASV was found in each sample).
Code Block |
---|
head(asvtable) |
Import the taxonomy table
Read in the taxonomy data - i.e. the taxonomic assignments for each ASV
Code Block |
---|
mytax <- read.table("../illumina/results/dada2/ASV_tax.silva_138.tsv", sep = "\t", header = T, quote = "") |
Have a look again at the top 6 rows
Code Block |
---|
head(mytax) |
Combine the taxonomy and ASV tables
Code Block |
---|
asv_table <- cbind(asvtable, mytax)
# Also remove the ASV_ID, Confidence and sequence columns, as they are not needed
asv_table <- subset(asv_table, select=-c(ASV_ID, confidence, sequence)) |
Code Block |
---|
head(asv_table) |
Convert the imported data to ampvis2 format
...
There are a multitude of other modifications you can make to a ggplot, too many to describe in this notebookworkshop. But there are plenty of online guides on how to modify ggplot plots. Here's an example: http://www.sthda.com/english/wiki/be-awesome-in-ggplot2-a-practical-guide-to-be-highly-effective-r-software-and-data-visualization
The above changes and more can be applied to any ampvis2 or ggplot plot in this notebookworkshop.
Once you have your rarefecation plot looking how you like, you can export it as a 300dpi (i.e. publication quality) tiff or pdf file:
...
Code Block |
---|
pdf_exp <- paste0("rarefaction_", group, "_samples.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")
|
You can now find these files in your working directory (which you originally defined in the 'Setting up your analysis environment' section).
...
Briefly, the above code changes the theme (theme_bw()
), the Y axis label (ylab(indicname)
- note: the current y axis name is inside a variable called indicname
, so you need to replace it with text in quotes, e.g. "Chao richness"
), removes the X axis label (xlab("")
), flips the X and Y axis (coord_flip()
), changes the font size to 17 and removes the legend (both in theme()
). Finally, a trend line is added (geom_smooth()
). Any of these components can be removed or modified as you like.
You can save Save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. These files can be found in your working directory.Export as a 300dpi tifftiff and pdf file.
Code Block |
---|
tiff_exp <- paste0("alpha_div_box_plot_", group, "_", indicname, "_samples.tiff") ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm") |
Export as a pdf
Code Block |
---|
pdf_exp <- paste0("alpha_div_box_plot_", group, "_", indicname,"_samples.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm") |
...
To compare the differences between groups within your variable, a Kruskal-Wallis test (one-way analysis of variance) can be performed to test for overall differences between all groups, and a Wilcoxon rank sum to test pairwise differences between each group.
...