Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents
minLevel1
maxLevel6
outlinefalse
stylenone
typelist
printabletrue

Overview

Taxonomic diversity can be measured using a variety of statistical models. Alpha diversity represents the diversity (richness, evenness, compositional complexity) within a group (e.g. an experiment group) and beta diversity (next section) examines diversity between groups.

'Diversity' is based on the amplicon sequence variants (ASV) detected by DADA2 per sample.

In this section we can estimate alpha diversity using Shannon’s index, Simpson's index, Chao1 richness and observed ASVs. Different diversity indices were used because each have different strengths and weaknesses and are better contextually - i.e. depending on the structure and nature of your dataset. For example, Shannon’s index is more sensitive to sample size and influenced by rare ASVs. Simpson's index is less sensitive to sample size and or rare ASVs, but is influenced by dominance/abundance of ASVs.

A good overview of most of these methods can be seen in this paper:

https://www.jmb.or.kr/submission/Journal/027/JMB027-12-02_FDOC_2.pdf

The plots and tables in each section (where each section = a variable) are as follows:

  1. Shannon's index calculates richness and diversity using a natural logarithm and accounts for both abundance and evenness of the taxa present. Shannon's index is a universally accepted index for diversity as it accounts for entropy in an ecosystem or in representative samples.

Shannon, C.E. and Weaver, W. (1949). “The mathematical theory of communication”. University of Illonois Press, Champaign, Illonois.

  1. Simpson's index is used to estimate dominance of the species but does not account for species richness.

Simpson, E. H. (1949). Measurement of diversity. nature, 163(4148), 688-688.

  1. Chao1 richness is a non-parametric estimator of diversity, based on abundance.

Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of statistics, 265-270.

  1. Observed ASVs calculates diversity based on the number of distinct Amplicon Sequence Variants. Originally this method was developed for observed OTUs, as seen in the below paper.

DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K. Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). “Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB”. Applied and Environmental Microbiology (72): 5069–5072.

  1. Combined Shannon and Observed ASV. An additional plot combing both Shannon’s index and Observed ASV indices has been included, to compare similarities and differences between these results. As each index uses different units, results for both have been normalised between 0 and 1.

  2. Kruskal-Wallis rank sum test is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups. This statistical analysis is provided for each plot, to estimate if there is a significant difference (q.value < 0.05) between all groups.

  3. Pairwise Wilcoxon rank sum test (AKA: Mann-Whitney test is the same as the Kruskal-Wallis test, but applied pairwise to each group (technically, The Kruskal-Wallis test is the generalization of the Wilcoxon rank sum test).

4a. Preparing your data

ampvis2 requires 3 components:

  1. A table of read counts per sample per ASV (abundance table)

  2. A table that matches ASVs to taxonomy information

  3. A metadata table containing your sample IDs and any variables (e.g. experimental groups) associated with each sample.

...

nfcore/ampliseq generates the abundance and taxonomy tables, and we created the metadata table in the previous section, so we can import all 3 into R.

Import the metadata table

When you ran your sequences through the ampliseq pipeline, you submitted the samples with a metadata file. This file contains information on your samples and variables. We need to import this metadata file to run our analysis on selected variables.

Code Block
samples_table <- read.table("../illumina/data/metadata.tsv", sep = "\t", header = T)

Have a look at your samples table and variables (metadata):

Code Block
samples_table

Import the ASV abundance table

First import the unfiltered ASV table (ampvis2 does internal filtering).

Code Block
asvtable <- read.table("../illumina/results/dada2/ASV_table.tsv", check.names = FALSE, sep = "\t", skip = 1, stringsAsFactors = FALSE, comment.char = "", header = T)
colnames(asvtable)[1] <- "ASV"

Have a look at the top few rows of the ASV table, to see if it looks right. The sample IDs should be the column names and the ASV IDs in the first column. All the other columns should contain numbers (i.e. the count of the number of times each ASV was found in each sample).

Code Block
head(asvtable)

Import the taxonomy table

Read in the taxonomy data - i.e. the taxonomic assignments for each ASV

Code Block
mytax <- read.table("../illumina/results/dada2/ASV_tax.silva_138.tsv", sep = "\t", header = T, quote = "")

Have a look again at the top 6 rows

Code Block
head(mytax)

Combine the taxonomy and ASV tables

Code Block
asv_table <- cbind(asvtable, mytaxsplit)
# Also remove the ASV_ID, Confidence and sequence columns, as they are not needed
asv_table <- subset(asv_table, select=-c(ASV_ID, Confidence, sequence))
Code Block
head(asv_table)

...

Code Block
asv_table$Kingdom <- gsub("D_0", "k", asv_table$Kingdom)
asv_table$Phylum <- gsub("D_1", "p", asv_table$Phylum)
asv_table$Class <- gsub("D_2", "c", asv_table$Class)
asv_table$Order <- gsub("D_3", "o", asv_table$Order)
asv_table$Family <- gsub("D_4", "f", asv_table$Family)
asv_table$Genus <- gsub("D_5", "g", asv_table$Genus)
asv_table$Species <- gsub("D_6", "s", asv_table$Species)

Convert the imported data to ampvis2 format

Now combine the samples data with the ASV table using amp_load(). This creates an ampvis2 database that can be used by ampvis2

Code Block
ampvisdata <- amp_load(otutable = asv_table, metadata = samples_table)

You can see information about the ampvis2 object you just created by typing its name

Code Block
ampvisdata

4b. Choosing a variable to analyse

In your metadata you'll usually have multiple variables. These need to be analysed individually, by selecting the variable in this section, then running the remaining analysis sections on this chosen variable. You can then re-run the analysis on another variable by returning to this section, changing the variable name, then running again the remaining analysis sections.

You can view your variables as column names in your samples_table:

Code Block
colnames(samples_table)

Enter the column name of the variable you want to analyse (i.e. change group <- "Myvariable" in the below cell to your chosen variable's column name). This has to be exactly the same as the column name, including capitalisation, characters such as underscores, etc.

Code Block
group <- "Nose_size"

Ordering your variable

The plotting done in ampvis2 is done by the ggplot2 package. ggplot factorises variables and automatically orders them on the plot by alphabetical order.

This can cause your groups to be ordered incorrectly on the plot axes (e.g. a time series may not be plotted sequentially).

You can manually set the order of your variable here.

First have a look at how ggplot will order your variable.

Code Block
levels(factor(ampvisdata$metadata[[group]]))

If these are in the order you want to see them on your plot axes, nothing needs to be done. If they are in the wrong order you need to order them manually by setting the levels.

Choose how you want to order your groups here:

Code Block
lev <- c("Small", "Medium", "Big")

To order your variable you need to put all the variable levels into the lev = c(..). Make sure each level is in double quotes and separated by a comma.

Then run the following to apply the levels to your data:

Code Block
ampvisdata$metadata[[group]] <- factor(ampvisdata$metadata[[group]], levels = lev)

4c. Rarefaction curve

This section is for plotting rarefaction curves for your samples, coloured by your chosen variable (if you want to change variables, go back and re-run section 4, choosing a different variable).

Use the ampvis2 function amp_rarecurve() to generate the rarefaction curve plot.

Code Block
p <- amp_rarecurve(ampvisdata, color_by = group, stepsize = 200)

You can print this out as-is simply by:

Code Block
p

Modifying plot attributes

You can make additional modifications to the plot colours, axis labels, font size, theme, etc:

Code Block
p <- p + scale_color_manual(values=c("red", "green", "blue", "orange", "black", "forestgreen", "firebrick", "goldenrod", "darkviolet", "olivedrab", "purple", "tomato", "yellow", "cornflowerblue")) +
theme_bw() +
ylab("Number of observed ASVs") +
theme(text = element_text(size = 14))
p

You can edit the above modifications and then re-run the code cell to generate a plot of your liking. The modifications in the above code are:

Colours - scale_color_manual(). See the available R colours here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf. If you set colours manually, you need to enter as many colours (or more, never fewer) as there are categories in your variable.

Theme - theme_bw(). There are many themes available, seen here: https://ggplot2.tidyverse.org/reference/ggtheme.html . You can change theme_bw() to theme_minimal() or theme_light() for example, to see how it looks.

Y axis label - ylab(). You can change this to what you like. You can also add an ylab() to change the X axis label.

Font size - theme(text = element_text(size = 14)). There are many things you can change in the theme, such as text position, font type, plot margins, etc. See here: https://ggplot2.tidyverse.org/reference/element.html

There are a multitude of other modifications you can make to a ggplot, too many to describe in this notebook. But there are plenty of online guides on how to modify ggplot plots. Here's an example: http://www.sthda.com/english/wiki/be-awesome-in-ggplot2-a-practical-guide-to-be-highly-effective-r-software-and-data-visualization

The above changes and more can be applied to any ampvis2 or ggplot plot in this notebook.

Once you have your rarefecation plot looking how you like, you can export it as a 300dpi (i.e. publication quality) tiff or pdf file:

Exporting your plot as a file

You can save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. These files can be found in your working directory.

Tip: you can adjust the width and height of the saved images by changing width = and height = in the code below (and every time ggsave appears in this workflow).

Export as a 300dpi TIFF

Code Block
tiff_exp <- paste0("rarefaction_", group, "_samples_.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

Export as a pdf

Code Block
pdf_exp <- paste0("rarefaction_", group, "_samples.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

You can now find these files in your working directory (which you originally defined in the 'Setting up your analysis environment' section).

4d. Diversity index plots and statistics - single categorical variable

The overview section outlined (with links and references) the alpha diversity indices that can be examined.

Briefly, these are: Shannon’s index, Simpson's index, Chao1 richness and Observed ASVs. Each of these has strengths and weaknesses. It's up to you, the researcher, to explore the literature and decide which is the best index to use for your data.

This section will generate the diversity index scores and then plot them.

This is based on the variable you chose in the previous sections. If you’ve forgotten what variable group you’ve selected, type:

Code Block
group

Calculate alpha diversity

First you need to calculate the alpha diversity index scores using ampvis2 function amp_alphadiv(). This will calculate all 4 indices.

Code Block
div_indices <- amp_alphadiv(ampvisdata, richness = T)

The raw diversity index scores are now attached to your metadata table. You can view the raw scores by (scroll to last 5 columns for scores):

Code Block
div_indices

Export this table to your working directory as a csv file:

Code Block
write.csv(div_indices, "diversity_indices_raw_scores.csv")

Choose the index you want to plot

Choose the diversity index scores you want to plot.

Code Block
indicname <- "Shannon"

Shannon's index is used by default ("Shannon") Change this to "Simpson" to plot Simpson's index scores, "Chao1" for Chao1 richness or "ObservedOTUs" for Observed ASVs.

Box and whisker plot

You can view the basic plot like so:

Code Block
p <- ggplot(div_indices, aes_string(x=group, y=indicname)) + 
  geom_boxplot(fill = "grey", outlier.size = 2.5, lwd=0.7)
p

And you can use the above plot as-is or modify it. For example:

Code Block
p <- p + theme_bw() + 
ylab(indicname) + 
xlab("") + 
coord_flip() + 
theme(text = element_text(size = 17)) + 
theme(legend.position = "none") + 
geom_smooth(se = F, aes(group=1, color = "red"))
p

Change the above code to modify the plot as you like. See the rarefaction curve section for more details on modifying a plot.

...

Table of Contents
minLevel1
maxLevel6
outlinefalse
stylenone
typelist
printabletrue

Overview

Taxonomic diversity can be measured using a variety of statistical models. Alpha diversity represents the diversity (richness, evenness, compositional complexity) within a group (e.g. an experiment group) and beta diversity (next section) examines diversity between groups.

'Diversity' is based on the amplicon sequence variants (ASV) detected by DADA2 per sample.

In this section we can estimate alpha diversity using Shannon’s index, Simpson's index, Chao1 richness and observed ASVs. Different diversity indices have different strengths and weaknesses and are better contextually - i.e. depending on the structure and nature of your dataset. For example, Shannon’s index is more sensitive to sample size and influenced by rare ASVs. Simpson's index is less sensitive to sample size and or rare ASVs, but is influenced by dominance/abundance of ASVs.

A good overview of most of these methods can be seen in this paper:

https://www.jmb.or.kr/submission/Journal/027/JMB027-12-02_FDOC_2.pdf

The plots and tables in each section (where each section = a variable) are as follows:

  1. Shannon's index calculates richness and diversity using a natural logarithm and accounts for both abundance and evenness of the taxa present. Shannon's index is a universally accepted index for diversity as it accounts for entropy in an ecosystem or in representative samples.

Shannon, C.E. and Weaver, W. (1949). “The mathematical theory of communication”. University of Illonois Press, Champaign, Illonois.

  1. Simpson's index is used to estimate dominance of the species but does not account for species richness.

Simpson, E. H. (1949). Measurement of diversity. nature, 163(4148), 688-688.

  1. Chao1 richness is a non-parametric estimator of diversity, based on abundance.

Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scandinavian Journal of statistics, 265-270.

  1. Observed ASVs calculates diversity based on the number of distinct Amplicon Sequence Variants. Originally this method was developed for observed OTUs, as seen in the below paper.

DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K. Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). “Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB”. Applied and Environmental Microbiology (72): 5069–5072.

  1. Combined Shannon and Observed ASV. An additional plot combing both Shannon’s index and Observed ASV indices has been included, to compare similarities and differences between these results. As each index uses different units, results for both have been normalised between 0 and 1.

  2. Kruskal-Wallis rank sum test is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups. This statistical analysis is provided for each plot, to estimate if there is a significant difference (q.value < 0.05) between all groups.

  3. Pairwise Wilcoxon rank sum test (AKA: Mann-Whitney test is the same as the Kruskal-Wallis test, but applied pairwise to each group (technically, The Kruskal-Wallis test is the generalization of the Wilcoxon rank sum test).

4a. Preparing your data

ampvis2 requires 3 components:

  1. A table of read counts per sample per ASV (abundance table)

  2. A table that matches ASVs to taxonomy information

  3. A metadata table containing your sample IDs and any variables (e.g. experimental groups) associated with each sample.

...

nfcore/ampliseq generates the abundance and taxonomy tables, and we created the metadata table in the previous section, so we can import all 3 into R.

Import the metadata table

When you ran your sequences through the ampliseq pipeline, you submitted the samples with a metadata file. This file contains information on your samples and variables. We need to import this metadata file to run our analysis on selected variables.

Code Block
samples_table <- read.table("../illumina/data/metadata.tsv", sep = "\t", header = T)

Have a look at your samples table and variables (metadata):

Code Block
samples_table

Import the ASV abundance table

First import the unfiltered ASV table (ampvis2 does internal filtering).

Code Block
asvtable <- read.table("../illumina/results/dada2/ASV_table.tsv", check.names = FALSE, sep = "\t", stringsAsFactors = FALSE, comment.char = "", header = T)
colnames(asvtable)[1] <- "ASV"

Have a look at the top few rows of the ASV table, to see if it looks right. The sample IDs should be the column names and the ASV IDs in the first column. All the other columns should contain numbers (i.e. the count of the number of times each ASV was found in each sample).

Code Block
head(asvtable)

Import the taxonomy table

Read in the taxonomy data - i.e. the taxonomic assignments for each ASV

Code Block
mytax <- read.table("../illumina/results/dada2/ASV_tax.silva_138.tsv", sep = "\t", header = T, quote = "")

Have a look again at the top 6 rows

Code Block
head(mytax)

Combine the taxonomy and ASV tables

Code Block
asv_table <- cbind(asvtable, mytax)
# Also remove the ASV_ID, Confidence and sequence columns, as they are not needed
asv_table <- subset(asv_table, select=-c(ASV_ID, confidence, sequence))
Code Block
head(asv_table)

Convert the imported data to ampvis2 format

Now combine the samples data with the ASV table using amp_load(). This creates an ampvis2 database that can be used by ampvis2

Code Block
ampvisdata <- amp_load(otutable = asv_table, metadata = samples_table)

You can see information about the ampvis2 object you just created by typing its name

Code Block
ampvisdata

4b. Choosing a variable to analyse

In your metadata you'll usually have multiple variables. These need to be analysed individually, by selecting the variable in this section, then running the remaining analysis sections on this chosen variable. You can then re-run the analysis on another variable by returning to this section, changing the variable name, then running again the remaining analysis sections.

You can view your variables as column names in your samples_table:

Code Block
colnames(samples_table)

Enter the column name of the variable you want to analyse (i.e. change group <- "Myvariable" in the below cell to your chosen variable's column name). This has to be exactly the same as the column name, including capitalisation, characters such as underscores, etc.

Code Block
group <- "Nose_size"

Ordering your variable

The plotting done in ampvis2 is done by the ggplot2 package. ggplot factorises variables and automatically orders them on the plot by alphabetical order.

This can cause your groups to be ordered incorrectly on the plot axes (e.g. a time series may not be plotted sequentially).

You can manually set the order of your variable here.

First have a look at how ggplot will order your variable.

Code Block
levels(factor(ampvisdata$metadata[[group]]))

If these are in the order you want to see them on your plot axes, nothing needs to be done. If they are in the wrong order you need to order them manually by setting the levels.

Choose how you want to order your groups here:

Code Block
lev <- c("Small", "Medium", "Big")

To order your variable you need to put all the variable levels into the lev = c(..). Make sure each level is in double quotes and separated by a comma.

Then run the following to apply the levels to your data:

Code Block
ampvisdata$metadata[[group]] <- factor(ampvisdata$metadata[[group]], levels = lev)

4c. Rarefaction curve

This section is for plotting rarefaction curves for your samples, coloured by your chosen variable (if you want to change variables, go back and re-run section 4, choosing a different variable).

Use the ampvis2 function amp_rarecurve() to generate the rarefaction curve plot.

Code Block
p <- amp_rarecurve(ampvisdata, color_by = group, stepsize = 200)

You can print this out as-is simply by:

Code Block
p

Modifying plot attributes

You can make additional modifications to the plot colours, axis labels, font size, theme, etc:

Code Block
p <- p + scale_color_manual(values=c("red", "green", "blue", "orange", "black", "forestgreen", "firebrick", "goldenrod", "darkviolet", "olivedrab", "purple", "tomato", "yellow", "cornflowerblue")) +
theme_bw() +
ylab("Number of observed ASVs") +
theme(text = element_text(size = 14))
p

You can edit the above modifications and then re-run the code cell to generate a plot of your liking. The modifications in the above code are:

Colours - scale_color_manual(). See the available R colours here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf. If you set colours manually, you need to enter as many colours (or more, never fewer) as there are categories in your variable.

Theme - theme_bw(). There are many themes available, seen here: https://ggplot2.tidyverse.org/reference/ggtheme.html . You can change theme_bw() to theme_minimal() or theme_light() for example, to see how it looks.

Y axis label - ylab(). You can change this to what you like. You can also add an ylab() to change the X axis label.

Font size - theme(text = element_text(size = 14)). There are many things you can change in the theme, such as text position, font type, plot margins, etc. See here: https://ggplot2.tidyverse.org/reference/element.html

There are a multitude of other modifications you can make to a ggplot, too many to describe in this workshop. But there are plenty of online guides on how to modify ggplot plots. Here's an example: http://www.sthda.com/english/wiki/be-awesome-in-ggplot2-a-practical-guide-to-be-highly-effective-r-software-and-data-visualization

The above changes and more can be applied to any ampvis2 or ggplot plot in this workshop.

Once you have your rarefecation plot looking how you like, you can export it as a 300dpi (i.e. publication quality) tiff or pdf file:

Exporting your plot as a file

You can save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. These files can be found in your working directorypdf file. These files can be found in your working directory.

Tip: you can adjust the width and height of the saved images by changing width = and height = in the code below (and every time ggsave appears in this workflow).

Export as a 300dpi tiffTIFF

Code Block
tiff_exp <- paste0("alpha_div_box_plot_rarefaction_", group, "_", indicname, "_samplessamples_.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

...

Code Block
pdf_exp <- paste0("alpha_div_box_plot_rarefaction_", group, "_", indicname,"_samples.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

Statistical analysis

To compare the differences between groups within your variable, a Kruskal-Wallis test (one-way analysis of variance) can be performed to test for overall differences between all groups, and a Wilcoxon rank sum to test pairwise differences between each group.

Kruskal-Wallis:

Code Block
kr_all <- kruskal.test(get(indicname) ~ get(group), data = div_indices)

This runs the test on the diversity index you chose: indicname, the variable: group and uses the diversity indices table you generated: div_indices

To remind you what diversity index and variable you're examining:

Code Block
paste0("Diverity index: ", indicname)
paste0("Variable: ", group)

To see the results:

Code Block
kr_all

Pairwise Wilcoxon rank sum test:

Code Block
wt_pair <- pairwise.wilcox.test(div_indices[[indicname]], div_indices[[group]], p.adjust.method = "BH")
wt_pair <- data.frame(wt_pair$p.value)
colnames(wt_pair) <- gsub("X", "", colnames(wt_pair))

To see the pairwise results (p values).

Code Block
wt_pair

4e. Diversity index plots and statistics - multiple categorical variables

In the previous section you examined a single variable.

In this section you can examine multiple variables by estimating diversity for your primary variable, then splitting the resulting plots into 'facets', based on another variable. This often allows for a better examination of associations and trends between variables.

To refresh your memory, the variable you've chosen to analyse is:

Code Block
group

The first few steps are the same as the previous section. You generate your diversity indices and then select which index you want to plot:

Code Block
div_indices <- amp_alphadiv(ampvisdata, richness = T)
Code Block
indicname <- "Shannon"

Shannon's index is used by default ("Shannon") Change this to "Simpson" to plot Simpson's index scores, "Chao1" for Chao1 richness and "ObservedOTUs" for Observed ASVs.

Now you need to choose a secondary variable.

You can see all the available variables you can choose from by looking at your samples_table column names

Code Block
colnames(samples_table)

Select the secondary variable.

Code Block
var2 <- "Batch"

IMPORTANT: your plots will be split into as many facets as there are unique subcategories in your secondary variable. To see how many subcategories:

Code Block
unique(samples_table[[var2]])

Rarefaction curvegeneralised linear model is applied to examine statistically significant correlations.In addition to the scatter plot, glm (t statistic, p value) and correlation (correlation, p value) statistics can be generated.

NOTE: This section, as with the previous plotting sections, requites that you've run the '3. Preparing your data' section and chosen the samples you want to work with. If you want to change your samples, go back to that section and re-run it with new parameters.

To refresh your memory, the samples you've chosen to analyse are:

[ ]:

Code Block
samples_table$sample.id

Now choose the continuous variable you want to analyse:

Enter the column name of the variable you want to analyse (i.e. change group <- "Myvariable" in the below cell to your chosen variable's column name). This has to be exactly the same as the column name, including capitalisation, characters such as underscores, etc.

[ ]:

Code Block
group <- "DM_digestibility_perc"

You can see view your variable names as column names in your samples table:

[ ]:

Code Block
colnames(samples_table)

4f. Diversity index plots and statistics - continuous variable Generalised linear model t value =

[ ]:

Code Block
round(glm_sum$coefficients[2,3], 4)

Pearson's correlation =

[ ]:

Code Block
round(cor_stat$estimate, 4)

Pearson's correlation p value =

[ ]:

Code Block
round(cor_stat$p.value, 4)

Adding another variablefacets based on an additional categorical variable.First select which categorical variable you want to examine (remember, these are the column names of the samples table):

[ ]:

Code Block
var3 <- "Phase"

Make sure you've run the plotting code in the previous section. This will create an object called 'p' which you can now split into facets based on the variable you chose in the above code cell"

[ ]:

Code Block
p <- p + facet_grid(get(var3) ~ .)
Code Block
p

And once again, you can export the plot as a 300dpi TIFF and pdf file

[ ]:

Code Block
tiffcm")

You can now find these files in your working directory (which you originally defined in the 'Setting up your analysis environment' section).

4d. Diversity index plots and statistics - single categorical variable

The overview section outlined (with links and references) the alpha diversity indices that can be examined.

Briefly, these are: Shannon’s index, Simpson's index, Chao1 richness and Observed ASVs. Each of these has strengths and weaknesses. It's up to you, the researcher, to explore the literature and decide which is the best index to use for your data.

This section will generate the diversity index scores and then plot them.

This is based on the variable you chose in the previous sections. If you’ve forgotten what variable group you’ve selected, type:

Code Block
group

Calculate alpha diversity

First you need to calculate the alpha diversity index scores using ampvis2 function amp_alphadiv(). This will calculate all 4 indices.

Code Block
div_indices <- amp_alphadiv(ampvisdata, richness = T)

The raw diversity index scores are now attached to your metadata table. You can view the raw scores by (scroll to last 5 columns for scores):

Code Block
div_indices

Export this table to your working directory as a csv file:

Code Block
write.csv(div_indices, "diversity_indices_raw_scores.csv")

Choose the index you want to plot

Choose the diversity index scores you want to plot.

Code Block
indicname <- "Shannon"

Shannon's index is used by default ("Shannon") Change this to "Simpson" to plot Simpson's index scores, "Chao1" for Chao1 richness or "ObservedOTUs" for Observed ASVs.

Box and whisker plot

Create a box and whisker plot showing diversity per group. You can view the basic plot like so:

Code Block
p <- ggplot(div_indices, aes_string(x=group, y=indicname)) + 
  geom_boxplot(fill = "grey", outlier.size = 2.5, lwd=0.7)
p

And you can use the above plot as-is or modify it. For example:

Code Block
p <- p + theme_bw() + 
ylab(indicname) + 
xlab("") + 
coord_flip() + 
theme(text = element_text(size = 17)) + 
theme(legend.position = "none") + 
geom_smooth(se = F, aes(group=1, color = "red"))
p

Change the above code to modify the plot as you like. See the rarefaction curve section for more details on modifying a plot.

Briefly, the above code changes the theme (theme_bw()), the Y axis label (ylab(indicname) - note: the current y axis name is inside a variable called indicname, so you need to replace it with text in quotes, e.g. "Chao richness"), removes the X axis label (xlab("")), flips the X and Y axis (coord_flip()), changes the font size to 17 and removes the legend (both in theme()). Finally, a trend line is added (geom_smooth()). Any of these components can be removed or modified as you like.

Save your plot as a tiff and pdf file.

Code Block
tiff_exp <- paste0("alpha_div_box_plot_", group, "_", indicname, "_samples.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0("alpha_div_scatterbox_glmplot_", group, "_", indicname,"_samples.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, var2, "_", var3, "_", indicname, "_", subgroup, "_samples.tiff")
Code Block
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

[ ]:

Code Block
pdf_exp <- paste0("alpha_div_scatter_glm_", group, "_", var2, "_", var3, "_", indicname, "_", subgroup, "_samples.pdf")
Code Block
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

...

height = 20, units = "cm")

Statistical analysis

To compare the differences between groups within your variable, a Kruskal-Wallis test (one-way analysis of variance) can be performed to test for overall differences between all groups, and a Wilcoxon rank sum to test pairwise differences between each group.

Kruskal-Wallis:

Code Block
kr_all <- kruskal.test(get(indicname) ~ get(group), data = div_indices)

This runs the test on the diversity index you chose: indicname, the variable: group and uses the diversity indices table you generated: div_indices

To remind you what diversity index and variable you're examining:

Code Block
paste0("Diverity index: ", indicname)
paste0("Variable: ", group)

To see the results:

Code Block
kr_all

Pairwise Wilcoxon rank sum test:

Code Block
wt_pair <- pairwise.wilcox.test(div_indices[[indicname]], div_indices[[group]], p.adjust.method = "BH")
wt_pair <- data.frame(wt_pair$p.value)
colnames(wt_pair) <- gsub("X", "", colnames(wt_pair))

To see the pairwise results (p values).

Code Block
wt_pair

Go to the next section: Beta Diversity