...
Observed ASVs calculates diversity based on the number of distinct Amplicon Sequence Variants. This method was originally developed for observed OTUs, as described in the paper below.
DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Davis, D., Hu, P., Andersen, G.L. (2006). "Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB". Applied and Environmental Microbiology, 72: 5069–5072.
Combined Shannon and Observed ASV. An additional plot combining both Shannon's index and the Observed ASVs index has been included, to compare similarities and differences between these results. As each index uses different units, results for both have been normalised between 0 and 1.
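That normalisation is a standard min-max rescaling. A minimal sketch in base R (the Shannon values here are invented for illustration, not values from this workflow):

```r
# Min-max normalisation: rescale any vector onto a common 0-1 scale
normalise01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Toy Shannon index values for four samples
shannon <- c(1.2, 2.5, 3.1, 2.0)
round(normalise01(shannon), 2)  # smallest value maps to 0, largest to 1
```

After rescaling, the two indices can be drawn on the same axis even though their raw units differ.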
Kruskal-Wallis rank sum test is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups. This statistical analysis is provided for each plot, to estimate whether there is a significant difference (q.value < 0.05) between any of the groups.
Pairwise Wilcoxon rank sum test (AKA Mann-Whitney test) is similar to the Kruskal-Wallis test, but applied pairwise to each pair of groups (technically, the Kruskal-Wallis test is the generalisation of the Wilcoxon rank sum test).
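A minimal base-R sketch of both tests on toy data (the diversity values and group labels are invented for illustration; in the workflow these would be per-sample diversity values and your grouping variable):

```r
# Toy diversity values for three hypothetical groups of 5 samples each
set.seed(1)
div    <- c(rnorm(5, mean = 2.0), rnorm(5, mean = 2.1), rnorm(5, mean = 3.0))
groups <- factor(rep(c("A", "B", "C"), each = 5))

# Overall test: is at least one group different from the others?
kruskal.test(div ~ groups)

# Pairwise comparisons, with p-values adjusted for multiple testing
pairwise.wilcox.test(div, groups, p.adjust.method = "BH")
```

The Kruskal-Wallis result tells you whether any difference exists; the pairwise table tells you which specific pairs of groups differ.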
4a. Preparing your data
ampvis2 requires 3 components:
A table of read counts per sample per ASV (abundance table)
A table that matches ASVs to taxonomy information
A metadata table containing your sample IDs and any variables (e.g. experimental groups) associated with each sample.
...
nf-core/ampliseq generates the abundance and taxonomy tables, and we created the metadata table in the previous section, so we can import all 3 into R.
Import the metadata table
When you ran your sequences through the ampliseq pipeline, you submitted the samples with a metadata file. This file contains information on your samples and variables. We need to import this metadata file to run our analysis on selected variables.
```r
samples_table <- read.table("metadata.tsv", sep = "\t", header = TRUE)
```
Have a look at your samples table and variables (metadata):
```r
samples_table
```
Import the ASV abundance table
First import the unfiltered ASV table (ampvis2 does internal filtering).
```r
asvtable <- read.table("./results/abundance_table/unfiltered/feature-table.tsv",
                       check.names = FALSE, sep = "\t", skip = 1,
                       stringsAsFactors = FALSE, comment.char = "", header = TRUE)
colnames(asvtable)[1] <- "ASV"
```
Have a look at the top few rows of the ASV table, to see if it looks right. The sample IDs should be the column names and the ASV IDs in the first column. All the other columns should contain numbers (i.e. the count of the number of times each ASV was found in each sample).
```r
head(asvtable)
```
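Beyond eyeballing the output of head(), you can check the layout programmatically. A small sketch using a toy table (the ASV and sample names here are hypothetical) showing the checks to apply to your own asvtable:

```r
# Toy ASV table in the expected layout: ASV IDs in the first column,
# one column of numeric read counts per sample
asvtable_toy <- data.frame(
  ASV     = c("ASV_1", "ASV_2", "ASV_3"),
  SampleA = c(10L, 0L, 7L),
  SampleB = c(3L, 25L, 0L)
)

# First column should hold the ASV IDs...
stopifnot(colnames(asvtable_toy)[1] == "ASV")
# ...and every remaining column should contain numeric counts
stopifnot(all(sapply(asvtable_toy[-1], is.numeric)))
```

If the second check fails on your real table, a sample column was probably read as text, which usually points to a parsing problem with the input file.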
Important: always run the next code cell, even if you haven't subset your data. If you're working with a subset of your data (see section 3), you'll need to choose a name for your subgroup (e.g. subgroup <- "high_fat"). If you haven't subset your data (i.e. you're working with all your samples), use subgroup <- "all". This is used for naming output plots and files by subset group, so you're not overwriting them each time you run a subset of your data.
```r
subgroup <- "all"
```
Import the taxonomy table
Read in the taxonomy data, i.e. the taxonomic assignments for each ASV:
```r
mytax <- read.table("./results/taxonomy/taxonomy.tsv", sep = "\t", header = TRUE, quote = "")
```
Each taxonomic level needs to be in a separate column, but in taxonomy.tsv they are all combined in one column, separated by ';', so we need to split on that delimiter. Use the tidyr::separate function for this.
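A minimal sketch of that split on a toy taxonomy table, assuming tidyr is installed; the column name Taxon and the example taxonomy strings are assumptions for illustration (check names(mytax) for your actual column name):

```r
library(tidyr)

# Toy taxonomy table with all ranks combined in one 'Taxon' column
mytax_toy <- data.frame(
  Feature.ID = c("asv1", "asv2"),
  Taxon      = c("D_0_Bacteria;D_1_Firmicutes;D_2_Bacilli",
                 "D_0_Bacteria;D_1_Proteobacteria"),
  stringsAsFactors = FALSE
)

# Split on ';' into one column per rank;
# fill = "right" pads missing lower ranks with NA instead of warning
mytaxsplit_toy <- separate(mytax_toy, Taxon,
                           into = c("Kingdom", "Phylum", "Class", "Order",
                                    "Family", "Genus", "Species"),
                           sep = ";", fill = "right")
```

The same separate() call, applied to mytax, produces one column per taxonomic rank ready to combine with the abundance table.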
...
Have a look again at the top 6 rows
```r
head(mytax)
```
Combine the taxonomy and ASV tables
```r
asv_table <- cbind(asvtable, mytaxsplit)
# Also remove the Feature.ID and Confidence columns, as they are not needed
asv_table <- subset(asv_table, select = -c(Feature.ID, Confidence))
```
```r
head(asv_table)
```
The taxonomy data from ampliseq is prefaced by D_0_, D_1_, etc., but ampvis2 expects taxonomy assignments to be k_, p_, etc. (kingdom, phylum, etc.). So we need to convert these using gsub()
```r
asv_table$Kingdom <- gsub("D_0", "k", asv_table$Kingdom)
asv_table$Phylum  <- gsub("D_1", "p", asv_table$Phylum)
asv_table$Class   <- gsub("D_2", "c", asv_table$Class)
asv_table$Order   <- gsub("D_3", "o", asv_table$Order)
asv_table$Family  <- gsub("D_4", "f", asv_table$Family)
asv_table$Genus   <- gsub("D_5", "g", asv_table$Genus)
asv_table$Species <- gsub("D_6", "s", asv_table$Species)
```
...
```r
ampvisdata <- amp_load(otutable = asv_table,
                       metadata = samples_table)
```
...
You can see information about the ampvis2 object you just created by typing its name
```r
ampvisdata
```
4b. Choosing a variable to analyse
In your metadata you'll usually have multiple variables. These need to be analysed individually: select the variable in this section, then run the remaining analysis sections on that chosen variable. You can then re-run the analysis on another variable by returning to this section, changing the variable name, and running the remaining analysis sections again.
...
Enter the column name of the variable you want to analyse (i.e. change group <- "Myvariable" in the cell below to your chosen variable's column name). This has to match the column name exactly, including capitalisation and characters such as underscores.
```r
group <- "Time"
```
Ordering your variable
The plotting in ampvis2 is done by the ggplot2 package. ggplot factorises variables and automatically orders them alphabetically on the plot.
...
If these are in the order you want to see them on your plot axes, nothing needs to be done. If they are in the wrong order you need to order them manually by setting the levels.
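A minimal base-R illustration of setting the levels (the time-point values are assumed for illustration):

```r
# Alphabetical ordering puts "120h" before "60h", which is rarely what
# you want on a time axis
x <- c("120h", "60h", "96h", "72h")
sort(unique(x))   # alphabetical: "120h" "60h" "72h" "96h"

# Setting the levels explicitly controls the order used on the plot
f <- factor(x, levels = c("60h", "72h", "96h", "120h"))
levels(f)
```

ggplot2 respects the factor's level order, so fixing the levels once fixes the axis order on every plot that uses the variable.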
Choose how you want to order your groups here:
...
```r
lev <- c("60h", "72h", "96h", "120h")
```
To order your variable you need to put all the variable levels into lev <- c(..). Make sure each level is in double quotes and the levels are separated by commas.
...
Colours - scale_color_manual(). See the available R colours here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf. If you set colours manually, you need to enter as many colours (or more, never fewer) as there are categories in your variable.
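A small sketch of building a manual colour vector (the group names and colour choices are hypothetical):

```r
# One colour per category; there must be at least as many colours as groups
my_groups  <- c("60h", "72h", "96h", "120h")
my_colours <- c("steelblue", "darkorange", "forestgreen", "firebrick")
stopifnot(length(my_colours) >= length(my_groups))

# The vector would then be passed to the plot with, e.g.:
#   p + scale_color_manual(values = my_colours)
```

Keeping the colour vector next to the group vector makes the length check trivial whenever you add or remove a category.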
...
You can save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. These files can be found in your working directory.
Tip: you can adjust the width and height of the saved images by changing width = and height = in the code below (and every time ggsave appears in this workflow). Pdf files can be opened within Jupyter, so a good way to test a suitable width/height is to save the image by running the pdf code below with the default 20cm width/height, open the pdf file by clicking on it in the file browser panel (to the left of this notebook), then change the width/height and repeat this process as needed.
Export as a 300dpi TIFF
```r
tiff_exp <- paste0("rarefaction_", group, "_", subgroup, "_samples.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff",
       plot = p, width = 20, height = 20, units = "cm")
```
...
```r
pdf_exp <- paste0("rarefaction_", group, "_", subgroup, "_samples.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")
```
...