Skip to end of metadata
Go to start of metadata

You are viewing an old version of this content. View the current version.

Compare with Current View Version History

« Previous Version 11 Current »

3a. Gene ID conversion

KEGG pathways are based on Entrez gene IDs, NCBIs numeric gene identifiers. Therefore, to run this workflow you need to convert your input gene IDs (in our case gene symbols) into Entrez IDs

NOTE: You need to have the correct path for your DE_genes_Basal_cells_Vs_Differentiated_cells.csv file, that you generated in the Differential Expression session earlier. It should be in

H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables folder for most people. If not, change this to the correct path containing your DE genes table.

If you’ve deleted your DE_genes_Basal_cells_Vs_Differentiated_cells.csv file or didn’t attend the Differential Expression section 4 from earlier, you can download the file here , create the H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables folders and put the file in the Tables folder. Copy and paste this section into R.

 

#### 7. Convert to Entrez gene IDs ####
# Set your working directory
setwd("H:/workshop/2024/rnaseq/functional_annotation")
# Import your DE genes data:
# NOTE: THE DIRECTORY THAT CONTAINS YOUR RESULTS DATA MAY VARY. YOU NEED TO LOOK IN 'H:/workshop/RNAseq' TO SEE WHERE YOUR 'Tables' SUBDIRECTORY IS, AND CHANGE THE BELOW PATH TO REFLECT THAT.
dat <- read.csv("H:/workshop/2024/rnaseq/DE_analysis_workshop/Tables/DE_genes_Basal_cells_Vs_Differentiated_cells.csv", row.names = 1)
# Convert the gene symbol column to Entrez IDs
# NOTE: 'OrgDb=' needs to be your organism database. We're using mouse here, so we're using 'OrgDb=org.Mm.eg.db' if your species was human you'd use 'OrgDb=org.Hs.eg.db', etc
gene_list <- bitr(gene = dat$SYMBOL, fromType="SYMBOL", toType="ENTREZID", OrgDb=org.Mm.eg.db, drop=TRUE)

3b. KEGG pathway enrichment

Now we can run the clusterProfiler function enrichKEGG to match your set of Entrez IDs to the KEGG pathways for your species.

NOTE: our data is from mouse, so we’re using the KEGG species ID for mouse (organism = "mmu"). The list of KEGG species IDs is here (e.g. if your data is human, you need to change this to organism = "hsu") :

https://www.genome.jp/kegg/catalog/org_list.html

Copy and paste this section into R.

#### 8. KEGG pathway enrichment ####
# Use clusterProfiler's enrichKEGG() function to match your genes list to KEGG pathways
kk <- enrichKEGG(gene = gene_list$ENTREZID, organism = "mmu", pvalueCutoff = 0.05, qvalueCutoff = 0.2)
# Now you can save this as a table of enriched KEGG pathways
# Create a 'results' subdirectory where all figures will be output
dir.create("results", showWarnings = FALSE)
write.csv(as.data.frame(kk), "./results/Enriched_KEGG_pathways.csv")

3c. Plotting enriched KEGG pathways

Here we’ll plot the enriched KEGG pathways. We’ll generate 3 plots (bar plot, dot plot, network plot) to visualise the same results in 3 different ways. This gives you options to present the data how you choose.

All of these plotting functions are part of the clusterProfiler package and can be modified to change the number of pathways shown and also change a wide array of other plot characteristics, such as plot colours, labels, legend, etc. More details can be seen here: https://yulab-smu.top/biomedical-knowledge-mining-book/index.html

 Copy and paste this section into R.

#### 9. Plotting KEGG results ####
# Remove some text (' - Mus musculus (house mouse)') from each pathway description, for brevity
kk@result$Description <- gsub(" - Mus musculus (house mouse)", "", kk@result$Description, fixed = T)
# Create a barplot
p <- barplot(kk, showCategory = 14, font.size = 14)
p
# Export as a 300dpi tiff
tiff_exp <- "./results/KEGG_bar.tiff"
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
# Export as a pdf
pdf_exp <- "./results/KEGG_bar.pdf"
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")
# Dot plot
p <- dotplot(kk, showCategory = 8, font.size = 14)
p
# Export as a 300dpi tiff
tiff_exp <- "./results/KEGG_dot.tiff"
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
# Export as a pdf
pdf_exp <- "./results/KEGG_dot.pdf"
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")
# Network plot
p <- cnetplot(kk, showCategory = 8, colorEdge = F, node_label = "category")
p
# Export as a 300dpi tiff
tiff_exp <- "./results/KEGG_network.tiff"
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
# Export as a pdf
pdf_exp <- "./results/KEGG_network.pdf"
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

 

KEGG_bar.tiff

KEGG_dot.tiff

3d. KEGG pathway maps

The package pathview can pull down KEGG pathway maps from the KEGG database and annotate these with your input genes. It shades the genes by colour and intensity (e.g. using log fold change data, upregulated genes can be coloured green and downregulated coloured red).

NOTE: this only plots one of your enriched pathways at a time. You’ll need to enter one of these KEGG pathways IDs into into the keggpath <- .. line. You can see your set of enriched pathways in the H:\workshop\2024\rnaseq\functional_annotation\results\Enriched_KEGG_pathways.csv or type as.data.frame(kk)$ID into the RStudio console.

You can change the KEGG pathway to a different pathway and re-run this for as many of your pathways as you like. Copy and paste this section into R.

#### 10. KEGG pathway maps ####
# pathview requires a named vector as input, so we first will pull out your numeric data (e.g. log fold change data) and name this using your gene IDs.
lfc <- dat$log2FoldChange
names(lfc) <- dat$SYMBOL
# Select the pathway you want to annotate from one of the significantly enriched pathways
# To view these:
as.data.frame(kk)$ID
# Enter one of these pathways below
keggpath <- "mmu04360"
# Run the pathview function to generate the pathway map
pview <- pathview(gene.data  = lfc,
                  pathway.id = keggpath,
                  species    = "mmu",
                  gene.idtype = "SYMBOL",
                  low = list(gene = "red"),
                  mid = list(gene = "gray"),
                  high = list(gene = "green"),
                  out.suffix = "KEGG_pathway_map",
                  limit = list(gene=max(abs(lfc)), cpd=1))

 


  • No labels