Overview
Taxonomic diversity can be measured using a variety of statistical models. Alpha diversity represents the species diversity (richness, evenness, compositional complexity) within experimental samples and treatment groups.
...
Combined Shannon and Observed ASV. An additional plot combining both Shannon’s index and the Observed ASV index has been included, to compare similarities and differences between these results. As each index uses different units, results for both have been normalised between 0 and 1.
Kruskal-Wallis rank sum test is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups. This statistical analysis is provided for each plot, to estimate if there is a significant difference (q.value < 0.05) between all groups.
Pairwise Wilcoxon rank sum test (also known as the Mann-Whitney test) addresses the same question as the Kruskal-Wallis test, but is applied to each pair of groups in turn (technically, the Kruskal-Wallis test is the generalisation of the Wilcoxon rank sum test to more than two groups).
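For the combined Shannon and Observed ASV plot described above, scores are rescaled to lie between 0 and 1. The notebook's own normalisation code is not shown in this overview, so the following is only a minimal sketch assuming min-max normalisation; the function name `normalise01` and the example scores are hypothetical.

```r
# Min-max normalisation: rescale a numeric vector to the range [0, 1].
# Assumption: this is one common way to normalise; it may differ from
# the exact method used later in the notebook.
normalise01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # All values identical: there is no spread to rescale, return zeros
    return(rep(0, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical index scores for five samples (different units and scales)
shannon  <- c(2.1, 3.4, 2.8, 4.0, 3.1)
observed <- c(120, 340, 250, 410, 300)

shannon_norm  <- normalise01(shannon)
observed_norm <- normalise01(observed)

print(round(shannon_norm, 3))
print(round(observed_norm, 3))
```

After rescaling, the lowest score of each index maps to 0 and the highest to 1, so the two indices can share one axis.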
3. Preparing your data
Import the samples table
When you ran your sequences through the ampliseq pipeline, you submitted the samples with a metadata file. This file contains information on your samples and variables. We need to import this metadata file to run our analysis on selected variables.
...
Have a look at your samples table and variables (metadata):
Code Block |
---|
samples_table |
Import the ASV abundance table
First import the unfiltered ASV table (ampvis2 does internal filtering).
...
Code Block |
---|
subgroup <- "all" |
Convert the imported data to ampvis2 format
The following code cell manipulates the data in a variety of ways (see the in-code #comments for explanations) to prepare the data for conversion to an ampvis2 object.
...
Code Block |
---|
ampvisdata <- amp_load(otutable = asv_table, metadata = samples_table) |
4. Choosing a categorical variable to analyse
In your metadata you'll usually have multiple variables. These need to be analysed individually, by selecting the variable in this section, then running the remaining analysis sections on this chosen variable. You can then re-run the analysis on another variable by returning to this section, changing the variable name, then running again the remaining analysis sections.
...
Code Block |
---|
group <- "Time" |
Ordering your variable
ampvis2 draws its plots with the ggplot2 package. ggplot2 converts character variables to factors and, by default, orders their levels alphabetically on the plot.
...
Code Block |
---|
ampvisdata$metadata[[group]] <- factor(ampvisdata$metadata[[group]], levels = lev) |
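The effect of setting factor levels explicitly can be seen in a small base R sketch. The time point labels below are hypothetical and only illustrate why alphabetical ordering is often not what you want.

```r
# Hypothetical values for a metadata variable (illustrative only)
time_points <- c("Week10", "Week2", "Week1", "Week2", "Week10", "Week1")

# Default behaviour: levels are sorted alphabetically,
# so "Week10" is placed before "Week2"
default_order <- factor(time_points)
print(levels(default_order))

# Explicit ordering: supply the levels in the order you want them plotted
lev <- c("Week1", "Week2", "Week10")
custom_order <- factor(time_points, levels = lev)
print(levels(custom_order))
```

Any value that is not listed in `levels` becomes `NA`, so it is worth checking that `lev` contains every group name exactly as it is spelled in the metadata.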
5. Rarefaction curve
This section is for plotting rarefaction curves for your samples, coloured by your chosen variable (if you want to change variables, go back and re-run section 4, choosing a different variable).
...
You can print this out as-is simply by:
Code Block |
---|
p |
Modifying plot attributes
You can make additional modifications to the plot colours, axis labels, font size, theme, etc:
...
Once you have your rarefaction plot looking how you like, you can export it as a 300 dpi (i.e. publication-quality) TIFF or PDF file.
Exporting your plot as a file
You can save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. These files can be found in your working directory.
...
You can now find these files in your working directory (which you originally defined in the 'Setting up your analysis environment' section).
6. Diversity index plots and statistics - single categorical variable
The overview section outlined (with links and references) the alpha diversity indices that can be examined in this Notebook.
...
Code Block |
---|
samples_table$sample.id |
Calculate alpha diversity
First you need to calculate the alpha diversity index scores using the ampvis2 function amp_alphadiv(). This will calculate all 4 indices.
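To give a feel for what these scores mean, here is a base R sketch that computes three of the indices by hand for a single sample. It is not the ampvis2 implementation: the counts are hypothetical, Simpson's index is assumed to be in its Gini-Simpson form (1 minus the sum of squared proportions), and Chao1 is left out for brevity.

```r
# Hypothetical ASV counts for one sample (zeros = ASVs absent from this sample)
counts <- c(10, 10, 10, 10, 0, 0)

present <- counts[counts > 0]      # keep only ASVs observed in the sample
p <- present / sum(present)        # relative abundance of each observed ASV

observed_asvs <- length(present)   # richness: number of ASVs observed
shannon <- -sum(p * log(p))        # Shannon's index (natural log)
simpson <- 1 - sum(p^2)            # Simpson's index (Gini-Simpson form, assumed)

print(c(observed = observed_asvs, shannon = shannon, simpson = simpson))
```

With four equally abundant ASVs, Shannon's index equals log(4) and the Gini-Simpson index equals 0.75; uneven abundances would lower both values.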
...
Code Block |
---|
write.csv(div_indices, paste0(group, "_diversity_indices_raw_scores.csv")) |
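As a quick illustration of how the output file name is assembled, the sketch below builds the same name with `paste0()` and writes a small table to a temporary directory. The `div_indices` data frame here is a hypothetical stand-in, and `tempdir()` is used only so the example does not touch your working directory.

```r
group <- "Time"   # the variable chosen in section 4

# Hypothetical stand-in for the table of diversity scores
div_indices <- data.frame(Shannon = c(2.1, 3.4), row.names = c("S1", "S2"))

# The file name is prefixed with the chosen variable
csv_name <- paste0(group, "_diversity_indices_raw_scores.csv")
csv_path <- file.path(tempdir(), csv_name)

write.csv(div_indices, csv_path)

# Read the file back to confirm what was written
check <- read.csv(csv_path, row.names = 1)
print(csv_name)
print(check)
```

Because the name includes `group`, re-running the analysis on another variable writes a new file instead of overwriting the previous one.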
Choose the index you want to plot
Choose the diversity index scores you want to plot.
...
Shannon's index is used by default ("Shannon"). Change this to "Simpson" to plot Simpson's index scores, "Chao1" for Chao1 richness or "ObservedOTUs" for Observed ASVs.
Box and whisker plot
You can view the basic plot like so:
...
Code Block |
---|
pdf_exp <- paste0("alpha_div_box_plot_", group, "_", indicname, "_", subgroup, "_samples.pdf") |
ggsave(filename = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm") |
Statistical analysis
To compare the differences between groups within your variable, a Kruskal-Wallis test (a rank-based, nonparametric alternative to one-way analysis of variance) can be performed to test for overall differences between all groups, and a pairwise Wilcoxon rank sum test can be performed to test for differences between each pair of groups.
...
To see the pairwise results (p values):
Code Block |
---|
wt_pair |
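The two tests can be tried on toy data using base R's `kruskal.test()` and `pairwise.wilcox.test()`. This is a minimal sketch, not the notebook's own code: the scores and group names are hypothetical, and the Benjamini-Hochberg adjustment (`p.adjust.method = "BH"`) is an assumption about how the pairwise p values are corrected.

```r
# Hypothetical diversity scores for three groups of four samples each
scores <- data.frame(
  value = c(2.1, 2.3, 2.0, 2.4,
            3.1, 3.3, 3.0, 3.4,
            4.1, 4.3, 4.0, 4.4),
  grp   = factor(rep(c("A", "B", "C"), each = 4))
)

# Overall test: is there a difference somewhere among all the groups?
kw <- kruskal.test(value ~ grp, data = scores)
print(kw$p.value)

# Pairwise tests: which pairs of groups differ?
# Assumption: p values are adjusted with the Benjamini-Hochberg method
wt_pair <- pairwise.wilcox.test(scores$value, scores$grp,
                                p.adjust.method = "BH")
print(wt_pair$p.value)
```

The pairwise result is a lower-triangular matrix of adjusted p values, with one cell per pair of groups. A significant Kruskal-Wallis result only says that some groups differ; the pairwise table shows which ones.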
Combining diversity plots
You can combine two diversity box and whisker plots, for a side-by-side comparison of results.
...
Code Block |
---|
pdf_exp <- paste0("alpha_div_Shannon_obsASV_box_plot_", group, "_", subgroup, "_samples.pdf") |
ggsave(filename = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm") |
7. Diversity index plots and statistics - multiple categorical variables
In the previous section you examined a single variable.
...
If there is only one subcategory for this secondary variable (possible if you have selected out subsamples in the '3. Preparing your data' section) then the plots will fail. If you have a great many subcategories then there may be too many facets, making the results hard to see. Usually between 2-6 subcategories is optimal.
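Before plotting, it can help to count how many subcategories each candidate variable has. The sketch below uses a hypothetical samples table and a hypothetical helper, `count_subcategories()`; only the column names `sample.id`, `Time` and `Phase` are borrowed from this Notebook.

```r
# Hypothetical samples table (values are illustrative only)
samples_table <- data.frame(
  sample.id = paste0("S", 1:6),
  Time      = c("T1", "T1", "T2", "T2", "T3", "T3"),
  Phase     = c("P1", "P1", "P1", "P1", "P1", "P1")
)

# Hypothetical helper: number of distinct values in a metadata column
count_subcategories <- function(tbl, column) {
  length(unique(tbl[[column]]))
}

n_time  <- count_subcategories(samples_table, "Time")
n_phase <- count_subcategories(samples_table, "Phase")

print(c(Time = n_time, Phase = n_phase))
```

In this toy table `Phase` has a single subcategory, so it would not be usable as the secondary variable, while `Time` has three.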
A generalised linear model is applied to examine statistically significant correlations. In addition to the scatter plot, glm (t statistic, p value) and correlation (correlation, p value) statistics can be generated.
NOTE: This section, as with the previous plotting sections, requires that you've run the '3. Preparing your data' section and chosen the samples you want to work with. If you want to change your samples, go back to that section and re-run it with new parameters.
...
Code Block |
---|
colnames(samples_table) |
...
9. Diversity index plots and statistics - continuous variable
Generalised linear model t value =
Code Block |
---|
round(glm_sum$coefficients[2,3], 4) |
...
Code Block |
---|
round(cor_stat$p.value, 4) |
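The statistics above can be reproduced on toy data with base R. This is a sketch under stated assumptions: the data frame `dat` and its columns are hypothetical, the model is a default (Gaussian) `glm()`, and the correlation is the Pearson default of `cor.test()`, which may differ from the method used in the Notebook. The object names `glm_sum` and `cor_stat` mirror those in the code cells above.

```r
# Hypothetical data: a continuous variable and a diversity score per sample
dat <- data.frame(
  depth   = c(1, 2, 3, 4, 5, 6, 7, 8),
  shannon = c(2.0, 2.3, 2.2, 2.9, 3.1, 3.0, 3.6, 3.8)
)

# Generalised linear model (Gaussian family by default)
glm_fit <- glm(shannon ~ depth, data = dat)
glm_sum <- summary(glm_fit)

# Row 2 of the coefficient table is the slope for 'depth';
# column 3 is its t value and column 4 its p value
t_value <- glm_sum$coefficients[2, 3]
glm_p   <- glm_sum$coefficients[2, 4]

# Correlation test (Pearson by default, assumed here)
cor_stat <- cor.test(dat$depth, dat$shannon)

print(round(t_value, 4))
print(round(glm_p, 4))
print(round(cor_stat$estimate, 4))
print(round(cor_stat$p.value, 4))
```

For a single predictor with a Gaussian model, the slope's t value and p value match those of the Pearson correlation test, so the two sets of statistics describe the same relationship.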
Adding another variable
You can add facets based on an additional categorical variable. First select which categorical variable you want to examine (remember, these are the column names of the samples table):
Code Block |
---|
var3 <- "Phase" |
...