Content Comparison

...

The following table shows how many reads were kept and removed. Percentage columns are the filtered, merged, and non-chimeric reads compared to the number of original, unfiltered reads. Thus the number of non-chimeric reads and percentage is the number and % of reads remaining after all filtration steps.

First, choose your working directory. This is the directory where your ampliseq results directories are. See the next section, Alpha diversity, for more details on this.

Code Block
setwd("~/Mahsa_paper_1")

Then, import the DADA2 stat and create the percentage columns:

Code Block

dada <- read.table("results/abundance_table/unfiltered/dada_stats.tsv", sep = "\t", header = T)
dada$percentage.of.input.passed.filter <- paste0(round(dada$percentage.of.input.passed.filter, 2), "%")
dada$percentage.of.input.denoised <- paste0(round(dada$percentage.of.input.denoised, 2), "%")
dada$percentage.of.input.non.chimeric <- paste0(round(dada$percentage.of.input.non.chimeric, 2), "%")
colnames(dada) <- c("Sample", "Unfiltered_reads", "Filtered", "%", "Denoised", "%", "Non-chimeric", "%")

To generate the table we'll use the DT: datatables package. This creates a table that is searchable, columns can be ordered, table can be exported as a csv or Excel file, etc.

Code Block

DT::datatable(dada, rownames = F,
              width = "100%",
              extensions = 'Buttons',
              options = list(scrollX = TRUE,
                             dom = 'Bfrtip',
                             columnDefs = list(list(className = 'dt-center', targets="_all")),
                             buttons =
          list('copy', 'print', list(
            extend = 'collection',
            buttons = list(
                list(extend = 'csv', filename = "DADA_filtration"),
                list(extend = 'excel', filename = "DADA_filtration"),
                list(extend = 'pdf', filename = "DADA_filtration")),
            text = 'Download'
          ))
      )
    )

Stacked bar plot of DADA2 filtration

Now we'll plot the above results (filtered vs unfiltered reads) on a stacked barplot.

We'll do this using the ggplot package.

Import the data:

Code Block

dada_bp <- data.frame(dada$Sample, dada$Unfiltered_reads - dada$`Non-chimeric`, dada$`Non-chimeric`)
colnames(dada_bp) <- c("Sample", "Removed reads", "Remaining reads")
df <- tidyr::pivot_longer(dada_bp, cols =c("Removed reads", "Remaining reads"))
colnames(df) <- c("Sample", "Reads", "Read_count")
options(scipen = 999)

Now plot this:

Code Block

options(repr.plot.width=12, repr.plot.height=8)
p <- ggplot(df, aes(Sample, Read_count, fill=Reads)) + geom_bar(stat = "identity", position = 'stack')
p <- p + scale_fill_brewer(palette="Dark2") + ylab("Read count") + theme_bw() +
  theme(text = element_text(size = 14), axis.text.x = element_text(angle = 90, size = 10))
p

You can save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. These files can be found in your working directory.

Tip: you can adjust the width and height of the saved images by changing width = and height = in the code below. Pdf files can be opened within Jupyter, so a good way to test a suitable width/height would be to save the image by running the pdf code below with the default 20cm width/height, then open the pdf file by clicking on it in the file browser panel (to the left of this notebook), then change the width/height and repeat this process as needed.

Export as a 300dpi tiff

Code Block
tiff_exp <- "DADA2_summary.tiff" ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

Export as a pdf

Code Block
pdf_exp <- "DADA2_summary.pdf" ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

Version	Old Version 2	New Version 3
Changes made by	Paul Whatmore (Deactivated)	Paul Whatmore (Deactivated)
Saved on	28/05/2024	28/05/2024

Versions Compared

Key

Stacked bar plot of DADA2 filtration