
Overview

Taxonomic diversity can be measured using a variety of statistical models. Alpha diversity represents the species diversity (richness, evenness, compositional complexity) within experimental samples and treatment groups.

...

  1. Combined Shannon and Observed ASV. An additional plot combining both Shannon’s index and the Observed ASV index has been included, to compare similarities and differences between these results. As each index uses different units, results for both have been normalised between 0 and 1.

  2. Kruskal-Wallis rank sum test is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups. This statistical analysis is provided for each plot, to estimate if there is a significant difference (q.value < 0.05) between all groups.

  3. Pairwise Wilcoxon rank sum test (also known as the Mann-Whitney test) is the two-group counterpart of the Kruskal-Wallis test, applied here to each pair of groups (technically, the Kruskal-Wallis test is the generalisation of the Wilcoxon rank sum test to more than two groups).
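
The normalisation mentioned in item 1 can be sketched as follows. This is a minimal illustration that assumes min-max scaling; the source only states that both indices are normalised between 0 and 1, so the scaling method, the data frame `div_example` and its values are all hypothetical.

```r
# Hypothetical per-sample scores; real values would come from your own data.
div_example <- data.frame(
  Shannon = c(2.1, 3.4, 2.8, 4.0),
  ObservedASVs = c(120, 340, 210, 400)
)

# Min-max scaling (an assumed method): (x - min) / (max - min) maps
# each index onto the range [0, 1] so the two can share one axis.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

div_scaled <- as.data.frame(lapply(div_example, min_max))
div_scaled
```

After scaling, the lowest score of each index is 0 and the highest is 1, so the two indices can be compared on the same plot even though their original units differ.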

3. Preparing your data

Import the samples table

When you ran your sequences through the ampliseq pipeline, you submitted the samples with a metadata file. This file contains information on your samples and variables. We need to import this metadata file to run our analysis on selected variables.

...

Have a look at your samples table and variables (metadata):

Code Block
samples_table

Import the ASV abundance table

First import the unfiltered ASV table (ampvis2 does internal filtering).

...

Code Block
subgroup <- "all"

Convert the imported data to ampvis2 format

The following code cell manipulates the data in a variety of ways (see the in-code #comments for explanations) to prepare the data for conversion to an ampvis2 object.

...

Code Block
ampvisdata <- amp_load(otutable = asv_table,
              metadata = samples_table)

4. Choosing a categorical variable to analyse

In your metadata you'll usually have multiple variables. These need to be analysed individually, by selecting the variable in this section, then running the remaining analysis sections on this chosen variable. You can then re-run the analysis on another variable by returning to this section, changing the variable name, then running again the remaining analysis sections.

...

Code Block
group <- "Time"

Ordering your variable

Plotting in ampvis2 is handled by the ggplot2 package. ggplot2 converts categorical variables to factors and, by default, orders them alphabetically on the plot.

...

Code Block
ampvisdata$metadata[[group]] <- factor(ampvisdata$metadata[[group]], levels = lev)
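
As a small illustration of why explicit ordering matters, the sketch below compares default and explicit factor levels in base R. The data frame `meta_example`, the vector `lev_example` and the "Week" values are hypothetical and stand in for your own metadata and `lev`.

```r
# Hypothetical metadata; "Time" and its values are illustrative only.
meta_example <- data.frame(Time = c("Week10", "Week2", "Week1", "Week2"))

# Default factor(): levels are sorted alphabetically,
# so "Week10" comes before "Week2".
default_levels <- levels(factor(meta_example$Time))

# Explicit levels: the order given here is the order used on the plot axis.
lev_example <- c("Week1", "Week2", "Week10")
meta_example$Time <- factor(meta_example$Time, levels = lev_example)
ordered_levels <- levels(meta_example$Time)

default_levels
ordered_levels
```

Any value that is not listed in the levels vector becomes NA, so it is worth checking that every group in your variable appears in the vector you supply.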

5. Rarefaction curve

This section is for plotting rarefaction curves for your samples, coloured by your chosen variable (if you want to change variables, go back and re-run section 4, choosing a different variable).
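
To show what a rarefaction curve represents, here is a minimal base R sketch that subsamples the reads of one made-up sample at increasing depths and counts the ASVs observed at each depth. It is a conceptual illustration only and is not claimed to be how ampvis2 computes its curves; the counts and all object names are hypothetical.

```r
set.seed(1)  # make the random subsampling reproducible

# Hypothetical read counts per ASV for a single sample (100 reads in total).
counts_example <- c(ASV1 = 50, ASV2 = 30, ASV3 = 15, ASV4 = 4, ASV5 = 1)

# Expand the counts into one entry per read.
reads <- rep(names(counts_example), times = counts_example)

# Subsample without replacement at each depth and count distinct ASVs.
depths <- seq(10, sum(counts_example), by = 10)
observed <- sapply(depths, function(d) length(unique(sample(reads, d))))

rare_curve <- data.frame(depth = depths, observed_asvs = observed)
rare_curve
```

A curve that flattens as depth increases suggests that further sequencing would reveal few additional ASVs in that sample.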

...

You can print this out as-is simply by:

Code Block
p

Modifying plot attributes

You can make additional modifications to the plot colours, axis labels, font size, theme, etc:

...

Once you have your rarefaction plot looking how you like, you can export it as a 300dpi (i.e. publication quality) tiff or pdf file:

Exporting your plot as a file

You can save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. These files can be found in your working directory.

...

You can now find these files in your working directory (which you originally defined in the 'Setting up your analysis environment' section).

6. Diversity index plots and statistics - single categorical variable

The overview section outlined (with links and references) the alpha diversity indices that can be examined in this Notebook.

...

Code Block
samples_table$sample.id

Calculate alpha diversity

First you need to calculate the alpha diversity index scores using the ampvis2 function amp_alphadiv(). This will calculate all 4 indices.

...

Code Block
write.csv(div_indices, paste0(group, "_diversity_indices_raw_scores.csv"))
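
For intuition about what these scores measure, the sketch below computes three of the indices by hand for one made-up sample using their common textbook definitions. It is not the ampvis2 implementation. The Simpson value uses the 1 - sum(p^2) form, which is an assumption to check against the package you use. Chao1 is omitted, and all names and counts are hypothetical.

```r
# Hypothetical ASV read counts for one sample (the last ASV is absent).
counts_example <- c(40, 30, 20, 10, 0)

# Relative abundance of each ASV that is present.
prop <- counts_example[counts_example > 0] / sum(counts_example)

# Observed ASVs: number of ASVs with at least one read (richness).
observed_asvs <- sum(counts_example > 0)

# Shannon index: -sum(p * ln(p)); higher means richer and more even.
shannon <- -sum(prop * log(prop))

# Simpson index in its 1 - sum(p^2) form (assumed convention).
simpson <- 1 - sum(prop^2)

c(ObservedASVs = observed_asvs, Shannon = shannon, Simpson = simpson)
```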

Choose the index you want to plot

Choose the diversity index scores you want to plot.

...

Shannon's index is used by default ("Shannon"). Change this to "Simpson" to plot Simpson's index scores, "Chao1" for Chao1 richness or "ObservedOTUs" for Observed ASVs.

Box and whisker plot

You can view the basic plot like so:

...

Code Block
pdf_exp <- paste0("alpha_div_box_plot_", group, "_", indicname, "_", subgroup, "_samples.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

Statistical analysis

To compare the differences between groups within your variable, a Kruskal-Wallis test (a rank-based, nonparametric alternative to one-way analysis of variance) can be performed to test for overall differences between all groups, and a pairwise Wilcoxon rank sum test can be performed to test for differences between each pair of groups.

...

To see the pairwise results (p values):

Code Block
wt_pair
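
The two tests described in this section can be sketched with base R on made-up data as shown below. The data frame `div_example`, the object names and the "BH" p-value adjustment are assumptions made for illustration; they are not taken from the notebook's own code.

```r
# Hypothetical Shannon scores for three groups of four samples each.
div_example <- data.frame(
  Shannon = c(2.1, 2.3, 2.2, 2.4, 3.1, 3.3, 3.2, 3.4, 4.1, 4.3, 4.2, 4.4),
  Time = factor(rep(c("T1", "T2", "T3"), each = 4))
)

# Kruskal-Wallis: one overall test for a difference among all groups.
kw_example <- kruskal.test(Shannon ~ Time, data = div_example)

# Pairwise Wilcoxon rank sum tests with multiple-testing adjustment
# (Benjamini-Hochberg is an assumed choice here).
wt_example <- pairwise.wilcox.test(div_example$Shannon, div_example$Time,
                                   p.adjust.method = "BH")

kw_example$p.value
wt_example$p.value  # lower-triangular matrix of adjusted p values
```

With three groups the pairwise result is a 2 x 2 matrix in which each filled cell is the adjusted p value for one pair of groups.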

Combining diversity plots

You can combine two diversity box and whisker plots, for a side-by-side comparison of results.

...

Code Block
pdf_exp <- paste0("alpha_div_Shannon_obsASV_box_plot_", group, "_", subgroup, "_samples.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

7. Diversity index plots and statistics - multiple categorical variables

In the previous section you examined a single variable.

...

If there is only one subcategory for this secondary variable (possible if you have selected out subsamples in the '3. Preparing your data' section) then the plots will fail. If you have a great many subcategories then there may be too many facets, making the results hard to see. Usually between 2-6 subcategories is optimal.
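
One way to check the number of subcategories before plotting is sketched below in base R. The data frame `meta_example`, the variable name `var2_example` and the 2 to 6 range check are illustrative assumptions based on the guidance above, not part of the notebook's own code.

```r
# Hypothetical metadata with a primary and a secondary categorical variable.
meta_example <- data.frame(
  Time = c("T1", "T1", "T2", "T2", "T3", "T3"),
  Phase = c("A", "B", "A", "B", "A", "B")
)

# Hypothetical name of the secondary variable used for faceting.
var2_example <- "Phase"

# Count the distinct subcategories and the samples in each one.
n_sub <- length(unique(meta_example[[var2_example]]))
samples_per_sub <- table(meta_example[[var2_example]])

# TRUE when the count falls in the suggested 2 to 6 range.
facets_ok <- n_sub >= 2 && n_sub <= 6

n_sub
samples_per_sub
facets_ok
```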

A generalised linear model is applied to examine statistically significant correlations. In addition to the scatter plot, glm (t statistic, p value) and correlation (correlation, p value) statistics can be generated.

NOTE: This section, as with the previous plotting sections, requires that you've run the '3. Preparing your data' section and chosen the samples you want to work with. If you want to change your samples, go back to that section and re-run it with new parameters.

...

Code Block
colnames(samples_table)

...

9. Diversity index plots and statistics - continuous variable

Generalised linear model t value =


Code Block
round(glm_sum$coefficients[2,3], 4)

...

Code Block
round(cor_stat$p.value, 4)
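
The statistics extracted above can be reproduced on made-up data with base R, as a minimal sketch. The data frame `cont_example`, the variable `pH` and the object names ending in `_example` are hypothetical. A Gaussian glm and a Pearson correlation are assumed here because the notebook's model-fitting code is not shown.

```r
# Hypothetical continuous variable (pH) and Shannon scores for 8 samples.
cont_example <- data.frame(
  pH = c(5.1, 5.6, 6.0, 6.4, 6.9, 7.3, 7.8, 8.2),
  Shannon = c(2.0, 2.4, 2.3, 2.9, 3.1, 3.0, 3.6, 3.8)
)

# Gaussian generalised linear model (the glm() default family).
glm_example <- glm(Shannon ~ pH, data = cont_example)
glm_sum_example <- summary(glm_example)

# Row 2 of the coefficient table is the slope for pH:
# column 3 holds the t value and column 4 its p value.
t_value <- round(glm_sum_example$coefficients[2, 3], 4)
glm_p <- round(glm_sum_example$coefficients[2, 4], 4)

# Pearson correlation test between the two variables.
cor_example <- cor.test(cont_example$pH, cont_example$Shannon)
cor_est <- round(unname(cor_example$estimate), 4)
cor_p <- round(cor_example$p.value, 4)

c(t = t_value, glm_p = glm_p, correlation = cor_est, cor_p = cor_p)
```

For a single continuous predictor in a Gaussian model, the slope's p value and the Pearson correlation p value coincide, which is a useful sanity check on your own results.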

Adding another variable

... facets based on an additional categorical variable. First select which categorical variable you want to examine (remember, these are the column names of the samples table):


Code Block
var3 <- "Phase"

...