2024-2: 5b-Introduction - Functional Annotation (FA)

[Figure: KEGG pathway and GO term analysis of CCR genes and gene interactions. (A) KEGG pathway.]
Figure 6. From ‘Clinical significance and prospective molecular mechanism of C‑C motif chemokine receptors in patients with early‑stage pancreatic ductal adenocarcinoma after pancreaticoduodenectomy’ 2019 Oncology Reports 42(5). CC BY-NC-ND 4.0 https://www.researchgate.net/figure/KEGG-pathway-and-GO-term-analysis-of-CCR-genes-and-gene-interactions-A-KEGG-pathway_fig1_335148343

What is functional annotation?

Many types of genetic analysis output a set of genes that are associated with a specific experimental condition. The classic example is RNA-Seq, which yields a set of genes that are differentially expressed between experimental conditions. However, microRNA analysis, epigenetic analysis (e.g. differential methylation), variant calling, and various other analysis types can also produce a set of condition-associated genes.
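The gene set used for functional annotation is usually taken from a DE results table by applying a significance cut-off and an effect-size cut-off. The Python sketch below illustrates only that selection step. The field names (`padj`, `log2_fold_change`), the thresholds (adjusted p-value below 0.05, absolute log2 fold change of at least 1) and the gene names are hypothetical choices for illustration, not values prescribed by this workshop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DERow:
    """One row of a differential expression results table (hypothetical layout)."""
    gene: str                # gene identifier, e.g. a gene symbol
    log2_fold_change: float  # effect size between the two conditions
    padj: Optional[float]    # adjusted p-value; None when no value was reported


def select_de_genes(rows: List[DERow], padj_cutoff: float = 0.05,
                    lfc_cutoff: float = 1.0) -> List[str]:
    """Return genes passing both the significance and the effect-size threshold."""
    selected = []
    for row in rows:
        if row.padj is None:  # skip genes without an adjusted p-value
            continue
        if row.padj < padj_cutoff and abs(row.log2_fold_change) >= lfc_cutoff:
            selected.append(row.gene)
    return selected


# Hypothetical DE results
rows = [
    DERow("GENE_A", 2.3, 0.001),
    DERow("GENE_B", -1.5, 0.02),
    DERow("GENE_C", 0.4, 0.0001),  # significant, but the fold change is too small
    DERow("GENE_D", 3.0, 0.2),     # large fold change, but not significant
    DERow("GENE_E", 1.8, None),    # no adjusted p-value
]
print(select_de_genes(rows))  # ['GENE_A', 'GENE_B']
```

Both up- and down-regulated genes are kept here because the fold-change test uses the absolute value; splitting them into separate lists is an equally common choice.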

Functional annotation takes a set of genes (such as differentially expressed genes) and tests whether these genes are enriched, that is, over-represented compared with what would be expected by chance, in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and Gene Ontology (GO) terms.
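Over-representation is commonly assessed with a one-sided hypergeometric test, which is the approach described for clusterProfiler's over-representation analysis. For a single pathway or term, with N genes in the background, M of them annotated to the term, n genes in your list and k of those annotated to the term, the p-value is the probability of seeing k or more overlapping genes by chance. The Python sketch below (standard library only) computes that p-value together with the GeneRatio (k/n) and BgRatio (M/N) values that clusterProfiler reports. The gene names and set sizes are hypothetical, and restricting both the gene list and the term to a supplied background is a simplifying assumption.

```python
from math import comb


def hypergeometric_upper_tail(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, M in the term, n drawn)."""
    # Number of draws with i = k .. min(M, n) term genes, over all possible draws.
    favourable = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, min(M, n) + 1))
    return favourable / comb(N, n)


def over_representation_test(gene_list, term_genes, background):
    """One-sided hypergeometric test for a single gene set (pathway or GO term)."""
    universe = set(background)
    genes = set(gene_list) & universe  # genes of interest present in the background
    term = set(term_genes) & universe  # term genes present in the background
    overlap = genes & term
    N, M, n, k = len(universe), len(term), len(genes), len(overlap)
    return {
        "GeneRatio": f"{k}/{n}",
        "BgRatio": f"{M}/{N}",
        "pvalue": hypergeometric_upper_tail(k, M, n, N),
        "genes": sorted(overlap),
    }


# Hypothetical example: 20 background genes, a 5-gene term, 4 genes of interest
background = [f"GENE{i}" for i in range(1, 21)]
term_genes = [f"GENE{i}" for i in range(1, 6)]
gene_list = ["GENE1", "GENE2", "GENE3", "GENE10"]

result = over_representation_test(gene_list, term_genes, background)
print(result)
```

Here 3 of the 4 genes of interest fall in the 5-gene term, giving GeneRatio 3/4, BgRatio 5/20 and a p-value of 155/4845 (about 0.032).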

KEGG

KEGG is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from genomic and molecular-level information. It is a computer representation of the biological system, consisting of molecular building blocks of genes and proteins (genomic information) and chemical substances (chemical information) that are integrated with the knowledge on molecular wiring diagrams of interaction, reaction and relation networks (systems information). It also contains disease and drug information (health information) as perturbations to the biological system.
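For enrichment purposes, each KEGG pathway is effectively treated as a set of genes. As a rough illustration (an assumption about the data layout, not a description of clusterProfiler's internals), the Python sketch below turns tab-separated `gene<TAB>pathway` pairs, in the style returned by KEGG's REST `link` operation (for example https://rest.kegg.jp/link/pathway/hsa), into a mapping from pathway ID to gene IDs. The sample lines are illustrative only, not a current KEGG download. For human (`hsa`) entries, the number after the prefix is the NCBI Gene (Entrez) ID.

```python
from collections import defaultdict


def parse_kegg_link(text):
    """Turn 'gene<TAB>pathway' lines into {pathway_id: set of gene IDs}."""
    pathways = defaultdict(set)
    for line in text.strip().splitlines():
        gene, pathway = line.split("\t")
        # Drop the database prefixes: 'hsa:10' -> '10', 'path:hsa00232' -> 'hsa00232'
        gene_id = gene.split(":", 1)[1]
        pathway_id = pathway.split(":", 1)[1]
        pathways[pathway_id].add(gene_id)
    return dict(pathways)


# Illustrative lines in the assumed format (not a real, current KEGG download)
sample = "hsa:10\tpath:hsa00232\nhsa:10\tpath:hsa00983\nhsa:100\tpath:hsa00230\n"
gene_sets = parse_kegg_link(sample)
print(gene_sets)
```

One gene can belong to several pathways, which is why the result maps each pathway to a set rather than each gene to a single pathway.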

GO

The Gene Ontology (GO) provides a computational representation of our current scientific knowledge about the functions of genes (or, more properly, the protein and non-coding RNA molecules produced by genes) from many different organisms, from humans to bacteria. It is widely used to support scientific research, and has been cited in tens of thousands of publications.

Understanding gene function—how individual genes contribute to the biology of an organism at the molecular, cellular and organism levels—is one of the primary aims of biomedical research. Moreover, experimental knowledge obtained in one organism is often applicable to other organisms, particularly if the organisms share the relevant genes because they inherited them from their common ancestor.

Associations of gene products to GO terms are statements that describe one of three aspects:

Molecular Function: the molecular activities of individual gene products

Cellular Component: where the gene products are active

Biological Process: the pathways and larger processes to which that gene product’s activity contributes
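Because each GO term belongs to exactly one of these three aspects, enrichment is usually run separately per aspect (clusterProfiler's `enrichGO` takes an `ont` argument of "BP", "MF", "CC" or "ALL"). The Python sketch below groups gene-to-term associations by aspect, using the single-letter aspect codes (P, F, C) found in GO annotation (GAF) files. The GO term IDs are real terms, but the gene names and their associations are hypothetical.

```python
from collections import defaultdict

# Single-letter aspect codes used in GO annotation (GAF) files
ASPECTS = {"F": "Molecular Function", "C": "Cellular Component", "P": "Biological Process"}


def genes_per_term_by_aspect(annotations):
    """Build {aspect name: {GO ID: set of genes}} from (gene, GO ID, aspect code) triples."""
    grouped = defaultdict(lambda: defaultdict(set))
    for gene, go_id, aspect in annotations:
        grouped[ASPECTS[aspect]][go_id].add(gene)
    # Convert the nested defaultdicts into plain dicts for easier inspection
    return {aspect: dict(terms) for aspect, terms in grouped.items()}


# Hypothetical gene-to-term associations
annotations = [
    ("GENE_A", "GO:0003677", "F"),  # DNA binding
    ("GENE_A", "GO:0005634", "C"),  # nucleus
    ("GENE_A", "GO:0006915", "P"),  # apoptotic process
    ("GENE_B", "GO:0016301", "F"),  # kinase activity
    ("GENE_B", "GO:0005737", "C"),  # cytoplasm
    ("GENE_C", "GO:0006915", "P"),  # apoptotic process
]

by_aspect = genes_per_term_by_aspect(annotations)
print(by_aspect["Biological Process"])
```

Each per-aspect mapping of term to genes can then be fed, term by term, into an over-representation test.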

R Packages

We’ll be using two main R packages:

clusterProfiler (https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html): used for functional enrichment of KEGG pathways and GO terms.

You can read more about clusterProfiler’s statistical and analysis methods here: https://yulab-smu.top/biomedical-knowledge-mining-book/index.html

pathview (https://www.bioconductor.org/packages/release/bioc/html/pathview.html): used to generate annotated KEGG pathway maps.
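Because enrichment tests many pathways or GO terms at once, the raw p-values are corrected for multiple testing; clusterProfiler's `enrichKEGG` and `enrichGO` use the Benjamini-Hochberg (BH) method by default (`pAdjustMethod = "BH"`). The Python sketch below mirrors the BH calculation performed by R's `p.adjust(method = "BH")`, for illustration only; the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four pathways
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

In this example the second and third pathways both end up at about 0.053, because BH takes a running minimum so that a pathway with a smaller raw p-value never gets a larger adjusted p-value.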

 

Connect to an rVDI virtual desktop machine

As with the differential expression analyses earlier, we will run this analysis in RStudio on an rVDI virtual machine. The reason is the same as before: the required R packages are pre-installed on these virtual machines, which saves time. As before, you can also copy and paste this script into RStudio on your local computer and adapt it to your own dataset.

Overview of the FA section: we will now perform the following tasks in RStudio

  1. Preparing your data for functional annotation analysis. Only one data file is needed for this analysis: the differentially expressed gene table generated in the earlier DE analysis.

  2. R packages

    1. Installing the required R packages (this only needs to be run once; after installation, the packages only need to be loaded). NOTE: If you are using an rVDI virtual machine, the R packages are already installed.

    2. Loading the required R packages. Unlike installation, this needs to be done every time you run the analysis.

  3. KEGG pathway enrichment

    1. Gene ID conversion

    2. KEGG pathway enrichment

    3. Plotting enriched KEGG pathways

    4. KEGG pathway maps

  4. GO term enrichment

    1. GO term enrichment

    2. Plotting enriched GO terms
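Gene ID conversion is needed because the identifiers in a DE table (for example gene symbols or Ensembl IDs) often differ from the identifiers a database expects; for human KEGG pathways these are NCBI Gene (Entrez) IDs. In R this is typically done with clusterProfiler's `bitr()` together with an organism annotation package such as org.Hs.eg.db. The Python sketch below only illustrates the idea: `SYMBOL_TO_ENTREZ` is a hypothetical, hand-written stand-in for such a database, and the sketch assumes one-to-one mappings, whereas real conversions can be one-to-many or fail for some genes.

```python
def convert_ids(genes, lookup):
    """Split genes into (original, converted) pairs and identifiers with no match."""
    mapped, unmapped = [], []
    for gene in genes:
        if gene in lookup:
            mapped.append((gene, lookup[gene]))
        else:
            unmapped.append(gene)  # real tools typically report or drop these
    return mapped, unmapped


# Hypothetical lookup table: gene symbol -> NCBI Gene (Entrez) ID
SYMBOL_TO_ENTREZ = {"TP53": "7157", "BRCA1": "672", "EGFR": "1956"}

mapped, unmapped = convert_ids(["TP53", "EGFR", "NOT_A_GENE"], SYMBOL_TO_ENTREZ)
print(mapped)    # [('TP53', '7157'), ('EGFR', '1956')]
print(unmapped)  # ['NOT_A_GENE']
```

Keeping track of unmapped genes matters because they silently drop out of every downstream enrichment test.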

