Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

...

  1. Open RStudio (you can type it in the Windows search bar)

  2. Create a new R script: ‘File’ → “New File” → “R script”

  3. Save this script where your samples folders are (‘File’ → ‘Save’). These should be on your H or W drive. Save the script file as scrnaseq.R

In the following sections you will be copying and running the R code into your scrnaseq.R script.

Cell Ranger (and nfcore/scrnaseq) generates a default folder and file output structure. There will be a main folder that contains all the sample subfolders (NOTE: this is where you must save your R script). Each sample folder will have an ‘outs’ subfolder. This ‘outs’ folder contains a ‘filtered_feature_bc_matrix’ folder, which contains the files that Seurat uses in its analysis.

...

You can manually set your working directory in RStudio by selecting ‘Session' -> 'Set working directory' -> 'Choose directory'. Choose the same directory as you saved your scrnaseq.R script, previous section. This will output the setwd(...) command with your working directory into the console window (bottom left panel). Copy this command to replace the default setwd(...) line in your R script.

...

Code Block
#### 4e. Differential expression between samples ####

# You will need to choose two of your samples to compare, based on your aggregated dataset. 
# To see which samples you've aggregated:
samples

# Now select the first sample (which will be the baseline sample):
## **USER INPUT**
desamp1 <- "Cerebellum"

# Then select the second (comparison) sample:
## **USER INPUT**
desamp2 <- "Cortex"

# Then run the Seurat differential expression function:
DE_genes <- FindMarkers(data1, group.by = 'orig.ident', ident.1 = desamp1, ident.2 = desamp2, logfc.threshold = 0.2)

# We can see how many 'differentially expressed' genes this produced by:
nrow(DE_genes)

# As mentioned in the descriptive text for this section, using 'logfc.threshold = 0.2' may not find all DE genes

# Run the following:
tail(DE_genes)
# If all the listed genes are non-significant (i.e. p_val_adj < 0.05) then you have captured all genes
# If not, change to 'logfc.threshold = 0.1' and re-run the 'DE_genes <- FindMarkers(..)' line
# Run 'tail(DE_genes)' to again see if you've captured all DE genes
# If you still have significant genes in your list, re-run with 'logfc.threshold = 0'
# This will take a long time to run, but is guaranteed to capture all DE genes

# You can also view the top 20 DE genes by:
head(DE_genes, 20)

# You can extract just the **significantly** differentially expressed genes (adjusted p < 0.05) like so:
DE_genes_sig <- DE_genes[DE_genes$p_val_adj < 0.05, ]

# See how many significantly DE genes there are:
nrow(DE_genes_sig)

# You can also extract genes based on log fold change as well. 
# Select a log fold change threshold here:
## **USER INPUT**
lfc_threshold <- 0.3

# Then filter your data by this log fold change threshold.
DE_genes_sig_lfc <- DE_genes_sig[DE_genes_sig$avg_log2FC < -lfc_threshold | DE_genes_sig$avg_log2FC > lfc_threshold, ]

# Then see how many sig DE genes remain after lfc filtration:
nrow(DE_genes_sig_lfc)

# **NOTE** You can easily filter out all your results here by using too high a lfc threshold. 
# You can see the maximum log fold change for all DE genes like so:
max(DE_genes$avg_log2FC)

# And you can view the minimum as well. 
# Use this min/max log fold change information to decide on a suitable log fold change threshold. 
# This can vary greatly, depending on your data.
min(DE_genes$avg_log2FC)

# You can export the DE genes table to your working directory as a csv file that can be opened in Excel:
write.csv(DE_genes_sig_lfc, paste0("DE_", desamp1, "_vs_", desamp2, ".csv"))


## Visualising DE results ##

# There are a variety of ways to visualise your DE results. 
# Below are a few examples and more can be added to this workflow as needed.

# Not all of these methods have the space to plot all DE genes, so we can provide them with a list of DE genes to plot:
# The below pulls out the top 20 most significantly DE genes, by adjusted p value.
# You can change this number '[1:20]' to whatever number of top genes you prefer
# You can also enter selected genes symbols to plot (e.g., choose genes of interest from the table of DE genes), 
# e.g. plotgenes <- c("pDC", "Eryth", "Mk", "DC"). You can include as many genes as you like.
## **USER INPUT**
plotgenes <- rownames(DE_genes_sig)[1:20]

# Scatter plot of individual genes

# You can select `reduction =` to be either `"pca"` for PCA plot, or `"umap"` or `"tsne"`. You can also change the colours or point sizes.
p <- FeaturePlot(data1, features = plotgenes[1:10], cols = c("lightgrey", "red"), reduction = "pca", pt.size = 1)
p

# Export as a pdf and tiff
tiff_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Dot plot

p <- DotPlot(data1, features = plotgenes, group.by = 'orig.ident', dot.scale = 12, cols = c("lightgrey", "red")) + RotatedAxis() +
ylab("Sample") + xlab("Genes") +
theme(text = element_text(size = 18))
p

# Export as a pdf and tiff
tiff_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_dotplot.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_dotplot.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Violin plots of individual genes

p <- VlnPlot(data1, features = plotgenes[1:10], split.plot = TRUE, split.by = 'orig.ident', cols = c25, pt.size = 0)
p

# Export as a pdf and tiff
tiff_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_violin.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_violin.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Heatmap
p <- DoHeatmap(data1, features = plotgenes, raster = T, group.by = 'orig.ident') + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16))
p

# Export as a pdf and tiff
tiff_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_heatmap.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_heatmap.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

5. Differential expression

This section examines significantly differentially expressed genes for each cluster, for a selected sample. Significantly DE genes are those with false-discovery adjusted p values of < 0.05.

IMPORTANT: To run this section, you must have processed your sample(s) by completing the '3.Analysis using the seurat package' section. That section removes outliers and non-target cells and identifies clusters. This differential expression analysis uses these filtered cells and cluster information. Section 3 only has to run once for each sample, as it outputs a datafile for each sample that is imported into this section.

5a. Select a sample to work with and import sample data

Code Block
#### 5a. Select a sample to work with and import sample data ####

# Find any sample datasets you generated by running section 3
dir(pattern = "seurat_filtered.rds")