2024-2: 5b.3 KEGG pathway enrichment
3a. Gene ID conversion
KEGG pathways are based on Entrez gene IDs, NCBIs numeric gene identifiers. Therefore, to run this workflow you need to convert your input gene IDs (in our case gene symbols) into Entrez IDs
NOTE: You need to have the correct path for your DE_genes_Basal_cells_Vs_Differentiated_cells.csv file, that you generated in the Differential Expression session earlier. It should be in
H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables folder for most people. If not, change this to the correct path containing your DE genes table.
If you’ve deleted your DE_genes_Basal_cells_Vs_Differentiated_cells.csv file or didn’t attend the Differential Expression section 4 from earlier, you can download the file here , create the H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables folders and put the file in the Tables folder. Copy and paste this section into R.
#### 7. Convert to Entrez gene IDs ####
# Set your working directory
setwd("H:/workshop/2024/rnaseq/functional_annotation")
# Import your DE genes data:
# NOTE: THE DIRECTORY THAT CONTAINS YOUR RESULTS DATA MAY VARY. YOU NEED TO LOOK IN 'H:/workshop/RNAseq' TO SEE WHERE YOUR 'Tables' SUBDIRECTORY IS, AND CHANGE THE BELOW PATH TO REFLECT THAT.
dat <- read.csv("H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables/DE_genes_Basal_cells_Vs_Differentiated_cells.csv", row.names = 1)
# Convert the gene symbol column to Entrez IDs
# NOTE: 'OrgDb=' needs to be your organism database. We're using mouse here, so we're using 'OrgDb=org.Mm.eg.db' if your species was human you'd use 'OrgDb=org.Hs.eg.db', etc
gene_list <- bitr(gene = dat$SYMBOL, fromType="SYMBOL", toType="ENTREZID", OrgDb=org.Mm.eg.db, drop=TRUE)
3b. KEGG pathway enrichment
Now we can run the clusterProfiler function enrichKEGG
to match your set of Entrez IDs to the KEGG pathways for your species.
NOTE: our data is from mouse, so we’re using the KEGG species ID for mouse (organism = "mmu"
). The list of KEGG species IDs is here (e.g. if your data is human, you need to change this to organism = "hsu"
) :
https://www.genome.jp/kegg/catalog/org_list.html
Copy and paste this section into R.
#### 8. KEGG pathway enrichment ####
# Use clusterProfiler's enrichKEGG() function to match your genes list to KEGG pathways
kk <- enrichKEGG(gene = gene_list$ENTREZID, organism = "mmu", pvalueCutoff = 0.05, qvalueCutoff = 0.2)
# Now you can save this as a table of enriched KEGG pathways
# Create a 'results' subdirectory where all figures will be output
dir.create("results", showWarnings = FALSE)
write.csv(as.data.frame(kk), "./results/Enriched_KEGG_pathways.csv")
3c. Plotting enriched KEGG pathways
Here we’ll plot the enriched KEGG pathways. We’ll generate 3 plots (bar plot, dot plot, network plot) to visualise the same results in 3 different ways. This gives you options to present the data how you choose.
All of these plotting functions are part of the clusterProfiler package and can be modified to change the number of pathways shown and also change a wide array of other plot characteristics, such as plot colours, labels, legend, etc. More details can be seen here: 📖 Introduction | Biomedical Knowledge Mining using GOSemSim and clusterProfiler
Copy and paste this section into R.
#### 9. Plotting KEGG results ####
# Remove some text (' - Mus musculus (house mouse)') from each pathway description, for brevity
kk@result$Description <- gsub(" - Mus musculus (house mouse)", "", kk@result$Description, fixed = T)
# Create a barplot
p <- barplot(kk, showCategory = 14, font.size = 14)
p
# Export as a 300dpi tiff
tiff_exp <- "./results/KEGG_bar.tiff"
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
# Export as a pdf
pdf_exp <- "./results/KEGG_bar.pdf"
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")
# Dot plot
p <- dotplot(kk, showCategory = 8, font.size = 14)
p
# Export as a 300dpi tiff
tiff_exp <- "./results/KEGG_dot.tiff"
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
# Export as a pdf
pdf_exp <- "./results/KEGG_dot.pdf"
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")
# Network plot
p <- cnetplot(kk, showCategory = 8, colorEdge = F, node_label = "category")
p
# Export as a 300dpi tiff
tiff_exp <- "./results/KEGG_network.tiff"
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
# Export as a pdf
pdf_exp <- "./results/KEGG_network.pdf"
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")
3d. KEGG pathway maps
The package pathview can pull down KEGG pathway maps from the KEGG database and annotate these with your input genes. It shades the genes by colour and intensity (e.g. using log fold change data, upregulated genes can be coloured green and downregulated coloured red).
NOTE: this only plots one of your enriched pathways at a time. You’ll need to enter one of these KEGG pathways IDs into into the keggpath <- ..
line. You can see your set of enriched pathways in the H:\workshop\2024\rnaseq\functional_annotation\results\Enriched_KEGG_pathways.csv or type as.data.frame(kk)$ID
into the RStudio console.
You can change the KEGG pathway to a different pathway and re-run this for as many of your pathways as you like. Copy and paste this section into R.