/
2024-2: 5b.3 KEGG pathway enrichment

2024-2: 5b.3 KEGG pathway enrichment

3a. Gene ID conversion

KEGG pathways are based on Entrez gene IDs, NCBIs numeric gene identifiers. Therefore, to run this workflow you need to convert your input gene IDs (in our case gene symbols) into Entrez IDs

NOTE: You need to have the correct path for your DE_genes_Basal_cells_Vs_Differentiated_cells.csv file, that you generated in the Differential Expression session earlier. It should be in

H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables folder for most people. If not, change this to the correct path containing your DE genes table.

If you’ve deleted your DE_genes_Basal_cells_Vs_Differentiated_cells.csv file or didn’t attend the Differential Expression section 4 from earlier, you can download the file here , create the H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables folders and put the file in the Tables folder. Copy and paste this section into R.

 

#### 7. Convert to Entrez gene IDs #### # Set your working directory setwd("H:/workshop/2024/rnaseq/functional_annotation") # Import your DE genes data: # NOTE: THE DIRECTORY THAT CONTAINS YOUR RESULTS DATA MAY VARY. YOU NEED TO LOOK IN 'H:/workshop/RNAseq' TO SEE WHERE YOUR 'Tables' SUBDIRECTORY IS, AND CHANGE THE BELOW PATH TO REFLECT THAT. dat <- read.csv("H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables/DE_genes_Basal_cells_Vs_Differentiated_cells.csv", row.names = 1) # Convert the gene symbol column to Entrez IDs # NOTE: 'OrgDb=' needs to be your organism database. We're using mouse here, so we're using 'OrgDb=org.Mm.eg.db' if your species was human you'd use 'OrgDb=org.Hs.eg.db', etc gene_list <- bitr(gene = dat$SYMBOL, fromType="SYMBOL", toType="ENTREZID", OrgDb=org.Mm.eg.db, drop=TRUE)

 

3b. KEGG pathway enrichment

Now we can run the clusterProfiler function enrichKEGG to match your set of Entrez IDs to the KEGG pathways for your species.

NOTE: our data is from mouse, so we’re using the KEGG species ID for mouse (organism = "mmu"). The list of KEGG species IDs is here (e.g. if your data is human, you need to change this to organism = "hsu") :

https://www.genome.jp/kegg/catalog/org_list.html

Copy and paste this section into R.

#### 8. KEGG pathway enrichment #### # Use clusterProfiler's enrichKEGG() function to match your genes list to KEGG pathways kk <- enrichKEGG(gene = gene_list$ENTREZID, organism = "mmu", pvalueCutoff = 0.05, qvalueCutoff = 0.2) # Now you can save this as a table of enriched KEGG pathways # Create a 'results' subdirectory where all figures will be output dir.create("results", showWarnings = FALSE) write.csv(as.data.frame(kk), "./results/Enriched_KEGG_pathways.csv")

 

3c. Plotting enriched KEGG pathways

Here we’ll plot the enriched KEGG pathways. We’ll generate 3 plots (bar plot, dot plot, network plot) to visualise the same results in 3 different ways. This gives you options to present the data how you choose.

All of these plotting functions are part of the clusterProfiler package and can be modified to change the number of pathways shown and also change a wide array of other plot characteristics, such as plot colours, labels, legend, etc. More details can be seen here: 📖 Introduction | Biomedical Knowledge Mining using GOSemSim and clusterProfiler

 Copy and paste this section into R.

#### 9. Plotting KEGG results #### # Remove some text (' - Mus musculus (house mouse)') from each pathway description, for brevity kk@result$Description <- gsub(" - Mus musculus (house mouse)", "", kk@result$Description, fixed = T) # Create a barplot p <- barplot(kk, showCategory = 14, font.size = 14) p # Export as a 300dpi tiff tiff_exp <- "./results/KEGG_bar.tiff" ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm") # Export as a pdf pdf_exp <- "./results/KEGG_bar.pdf" ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm") # Dot plot p <- dotplot(kk, showCategory = 8, font.size = 14) p # Export as a 300dpi tiff tiff_exp <- "./results/KEGG_dot.tiff" ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm") # Export as a pdf pdf_exp <- "./results/KEGG_dot.pdf" ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm") # Network plot p <- cnetplot(kk, showCategory = 8, colorEdge = F, node_label = "category") p # Export as a 300dpi tiff tiff_exp <- "./results/KEGG_network.tiff" ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm") # Export as a pdf pdf_exp <- "./results/KEGG_network.pdf" ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

 

 

KEGG_bar.tiff
Figure 7. KEGG bar plot

 

 

KEGG_dot.tiff
Figure 8. KEGG dot plot

 

 

3d. KEGG pathway maps

The package pathview can pull down KEGG pathway maps from the KEGG database and annotate these with your input genes. It shades the genes by colour and intensity (e.g. using log fold change data, upregulated genes can be coloured green and downregulated coloured red).

NOTE: this only plots one of your enriched pathways at a time. You’ll need to enter one of these KEGG pathways IDs into into the keggpath <- .. line. You can see your set of enriched pathways in the H:\workshop\2024\rnaseq\functional_annotation\results\Enriched_KEGG_pathways.csv or type as.data.frame(kk)$ID into the RStudio console.

You can change the KEGG pathway to a different pathway and re-run this for as many of your pathways as you like. Copy and paste this section into R.

 

 

 


Related content

2024-2: 5b.1 Preparing your data for FA
2024-2: 5b.1 Preparing your data for FA
More like this
2024-2: 5a.1 Preparing your data for DE
2024-2: 5a.1 Preparing your data for DE
More like this
2024-2: 5a.3 Checking for outliers and batch effects
2024-2: 5a.3 Checking for outliers and batch effects
Read with this
2024-2: 5b-Homework - Functional Annotation
2024-2: 5b-Homework - Functional Annotation
More like this
2024-2: 5b.4 GO (Gene ontology) term enrichment
2024-2: 5b.4 GO (Gene ontology) term enrichment
Read with this
eResearch - Session 3 : Differential expression analysis using R
eResearch - Session 3 : Differential expression analysis using R
More like this