Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

...

To access this count table:

Go to the/sandpit/demo/run3_full_pipeline/ folder that contains the results from running the nfcore/rnaseq pipeline. The output folders from task 3 look like this:

...

Now let’s find the full path to the ‘salmon.merged.gene_counts.tsv’ file:

  • Windows:

  • Mac:

    • cd /folder/that/contains/feature_counts/

    • pwd

  • Rstudio:

    • Open Rstudio, go to the top bar a click on “Session” → “Select working directory: → “Choose directory

    • The path to the directory will be printed in Rstudio console, copy and paste in line 1 of the script ‘RNAseq_DESeq2_analysis.R’ (see below)

...

Code Block
#### 6a. Annotating your DE genes ####

# Annotation packages for human (org.Hs.eg.db), mouse (org.Mm.eg.db), rat (org.Rn.eg.db), E. coli strain K12 (org.EcK12.eg.db), E. coli strain Sakai (org.EcSakai.eg.db), zebrafish (org.Dr.eg.db) and Drosophila (org.Dm.eg.db) were installed with the required R packages. If your species is not in the above list, contact the eResearch team.

## USER INPUT
# Input your species genome below (select from the above list)
my_genome <- org.Mm.eg.db

# Pull out just the Entrez/Ensembl gene IDs
gene_ids <- row.names(res)

# You can see the list of annotations you can apply to your data by:
keytypes(my_genome)

# Annotate gene symbol and description to your DE gene IDs (you can add other keytypes from the above 'keytypes(my_genome)' list, if you choose)
cols <- c("SYMBOL", "GENENAME")

## USER INPUT
# Provide the gene ID type for your DE data
# If you have Ensemble IDs, enter "ENSEMBL", if you have Entrez IDs, enter "ENTREZID", if you have gene symbols, enter "SYMBOL"
idtype <- "ENSEMBL"

# Map the Entrez/Ensembl gene IDs to gene symbol and description
# NOTE: IF YOUR GENE ID IS AN ENTREZ ID, CHANGE keytype="ENSEMBL" TO keytype="ENTREZID"
map <- AnnotationDbi::select(my_genome, keys=gene_ids, columns=cols, keytype=idtype)

# Combine the annotation data with the DE data
# Since we're matching Ensembl -> Entrez there aren't a 1:1 mapping, so need to merge rather than cbind
annot <- merge(x = res, y = map, by.x = 0, by.y = idtype, all = F)
# There isn't always a 1:1 mapping between gene identifiers, so we also need to remove duplicates
annot_nodups <- annot[!duplicated(annot$Row.names), ]
# Reorder by adjusted p
annot_nodups <- annot_nodups[order(annot_nodups$padj), ]

# Add normalised counts to the output table. This is so you can later plot expression trends for individual genes in R, Excel, etc.
# Need to normalise the counts first, using the size factors calculated by DESeq2 (in the 'deseq' object)
expdata_norm <- as.matrix(expdata) %*% diag(deseq$sizeFactor)
colnames(expdata_norm) <- colnames(expdata)
annot_counts <- merge(x = annot_nodups, y = expdata_norm, by.x = "Row.names", by.y = 0, all = TRUE)

# Export the full annotated dataset as a csv (for journal submission)
# Creates a 'Tables' subdirectory where all tables will be output
dir.create("Tables", showWarnings = FALSE)
write.csv(annot_counts, file=paste0("./Tables/all_genes_", paste(degroups, collapse = "_Vs_"), ".csv"), row.names = FALSE)

# Pull out just significant genes (change from 0.05 to 0.01 if needed)
DE_genes <- subset(annot_counts, padj < 0.05, select=colnames(annot_counts))
# NOTE: add lfc cutoffs if needed. E.g., log2FoldChange > 1 and < log2FoldChange -1 cutoff
# DE_genes <- subset(DE_genes, log2FoldChange > 1 | log2FoldChange < -1, select=colnames(DE_genes))

# Export the sig DE genes as a table (to be used in downstream analysis)
write.csv(DE_genes, file=paste0("./Tables/DE_genes_", paste(degroups, collapse = "_Vs_"), ".csv"), row.names = FALSE)

...