Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

...

  1. Open RStudio (you can type it in the Windows search bar)

  2. Create a new R script: ‘File’ → “New File” → “R script”

  3. Save this script where your samples folders are (‘File’ → ‘Save’). These should be on your H or W drive. Save the script file as scrnaseq.R

In the following sections you will be copying and running the R code into your scrnaseq.R script.

Cell Ranger (and nfcore/scrnaseq) generates a default folder and file output structure. There will be a main folder that contains all the sample subfolders (NOTE: this is where you must save your R script). Each sample folder will have an ‘outs’ subfolder. This ‘outs’ folder contains a ‘filtered_feature_bc_matrix’ folder, which contains the files that Seurat uses in its analysis.

...

You can manually set your working directory in RStudio by selecting ‘Session' -> 'Set working directory' -> 'Choose directory'. Choose the same directory as you saved your scrnaseq.R script, previous section. This will output the setwd(...) command with your working directory into the console window (bottom left panel). Copy this command to replace the default setwd(...) line in your R script.

...

Code Block
#### 4e. Differential expression between samples ####

# You will need to choose two of your samples to compare, based on your aggregated dataset. 
# To see which samples you've aggregated:
samples

# Now select the first sample (which will be the baseline sample):
## **USER INPUT**
desamp1 <- "Cerebellum"

# Then select the second (comparison) sample:
## **USER INPUT**
desamp2 <- "Cortex"

# Then run the Seurat differential expression function:
DE_genes <- FindMarkers(data1, group.by = 'orig.ident', ident.1 = desamp1, ident.2 = desamp2, logfc.threshold = 0.2)

# We can see how many 'differentially expressed' genes this produced by:
nrow(DE_genes)

# As mentioned in the descriptive text for this section, using 'logfc.threshold = 0.2' may not find all DE genes

# Run the following:
tail(DE_genes)
# If all the listed genes are non-significant (i.e. p_val_adj < 0.05) then you have captured all genes
# If not, change to 'logfc.threshold = 0.1' and re-run the 'DE_genes <- FindMarkers(..)' line
# Run 'tail(DE_genes)' to again see if you've captured all DE genes
# If you still have significant genes in your list, re-run with 'logfc.threshold = 0'
# This will take a long time to run, but is guaranteed to capture all DE genes

# You can also view the top 20 DE genes by:
head(DE_genes, 20)

# You can extract just the **significantly** differentially expressed genes (adjusted p < 0.05) like so:
DE_genes_sig <- DE_genes[DE_genes$p_val_adj < 0.05, ]

# See how many significantly DE genes there are:
nrow(DE_genes_sig)

# You can also extract genes based on log fold change as well. 
# Select a log fold change threshold here:
## **USER INPUT**
lfc_threshold <- 0.3

# Then filter your data by this log fold change threshold.
DE_genes_sig_lfc <- DE_genes_sig[DE_genes_sig$avg_log2FC < -lfc_threshold | DE_genes_sig$avg_log2FC > lfc_threshold, ]

# Then see how many sig DE genes remain after lfc filtration:
nrow(DE_genes_sig_lfc)

# **NOTE** You can easily filter out all your results here by using too high a lfc threshold. 
# You can see the maximum log fold change for all DE genes like so:
max(DE_genes$avg_log2FC)

# And you can view the minimum as well. 
# Use this min/max log fold change information to decide on a suitable log fold change threshold. 
# This can vary greatly, depending on your data.
min(DE_genes$avg_log2FC)

# You can export the DE genes table to your working directory as a csv file that can be opened in Excel:
write.csv(DE_genes_sig_lfc, paste0("DE_", desamp1, "_vs_", desamp2, ".csv"))


## Visualising DE results ##

# There are a variety of ways to visualise your DE results. 
# Below are a few examples and more can be added to this workflow as needed.

# Not all of these methods have the space to plot all DE genes, so we can provide them with a list of DE genes to plot:
# The below pulls out the top 20 most significantly DE genes, by adjusted p value.
# You can change this number '[1:20]' to whatever number of top genes you prefer
# You can also enterplot selectedspecific genes symbolsby to plotname (i.e.g., choose genes of interest from the table of DE genes), 
# e.g. plotgenes <- c("pDC", "Eryth", "Mk", "DC"). You can include as many genes as you like.
## **USER INPUT**
plotgenes <- rownames(DE_genes_sig)[1:20]

# Scatter plot of individual genes

# You can select `reduction =` to be either `"pca"` for PCA plot, or `"umap"` or `"tsne"`. You can also change the colours or point sizes.
p <- FeaturePlot(data1, features = plotgenes[1:10], cols = c("lightgrey", "red"), reduction = "pca", pt.size = 1)
p

# Export as a pdf and tiff
tiff_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Dot plot

p <- DotPlot(data1, features = plotgenes, group.by = 'orig.ident', dot.scale = 12, cols = c("lightgrey", "red")) + RotatedAxis() +
ylab("Sample") + xlab("Genes") +
theme(text = element_text(size = 18))
p

# Export as a pdf and tiff
tiff_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_dotplot.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_dotplot.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Violin plots of individual genes

p <- VlnPlot(data1, features = plotgenes[1:10], split.plot = TRUE, split.by = 'orig.ident', cols = c25, pt.size = 0)
p

# Export as a pdf and tiff
tiff_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_violin.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_violin.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Heatmap
p <- DoHeatmap(data1, features = plotgenes, raster = T, group.by = 'orig.ident') + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16))
p

# Export as a pdf and tiff
tiff_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_heatmap.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(desamp1, "_vs_", desamp2, "_DE_genes_heatmap.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

...

Code Block
#### 5c. Differential expression: one cluster vs all other cells ####

# First, we should have a look at the clusters we have to work with:
levels(x = data)

# You'll need to select one of these clusters as the baseline cluster (`declust <-`). 
# You can re-run the below code with a different baseline cluster if you want to look at other comparisons:
## **USER INPUT**
declust <- 0

# Then run the Seurat differential expression function:
DE_genes <- FindMarkers(data, ident.1 = declust, logfc.threshold = 0.2)

# NOTE: modify the 'logfc.threshold =' parameter so that you've captured all DE genes.
# See section '4e. Differential expression between samples' for an explanation

# Look at the bottom 6 genes, to see if you've captured all DE genes. If not, lower 'logfc.threshold =' and re-run
tail(DE_genes)

# We can see how many genes this found by:
nrow(DE_genes)

# View the top 20 DE genes by:
head(DE_genes, 20)

# You can extract just the **significantly** differentially expressed genes (adjusted p < 0.05) like so:
DE_genes_sig <- DE_genes[DE_genes$p_val_adj < 0.05, ]

# See how many significantly DE genes there are
nrow(DE_genes_sig)

# You can also extract genes based on log fold change as well. Enter a log fold change threshold here (e.g. lfc_threshold <- 1):
# By default, this is zero, which will not apply any log fold change filtration
lfc_threshold <- 0

# Then filter your data by this log fold change threshold.
DE_genes_sig_lfc <- DE_genes_sig[DE_genes_sig$avg_log2FC < -lfc_threshold | DE_genes_sig$avg_log2FC > lfc_threshold, ]

# See how many sig DE genes remain after lfc filtration:
nrow(DE_genes_sig_lfc)
# **NOTE** You can easily filter out all your results here by using too high a lfc threshold.

# Export the DE genes table to your working directory as a csv file that can be opened in Excel:
write.csv(DE_genes_sig_lfc, paste0("DE_1vsALL_clust_", declust, "_", sample, ".csv"))

## Visualising DE results ##

# There are a variety of ways to visualise your DE results. 
# Below are a few examples and more can be added to this workflow as needed.

# Not all of these methods have the space to plot all DE genes, so we can provide them with a list of DE genes to plot:
# The below pulls out the top 20 most significantly DE genes, by adjusted p value.
# You can change this number '[1:20]' to whatever number of top genes you prefer
# You can also plot specific genes by name (i.e., choose genes of interest from the table of DE genes), 
# e.g. plotgenes <- c("pDC", "Eryth", "Mk", "DC"). You can include as many genes as you like.
## **USER INPUT**
plotgenes <- rownames(DE_genes_sig)[1:20]

# Scatter plot of individual genes

# You can select `reduction =` to be either `"pca"` for PCA plot, or `"umap"` or `"tsne"`. You can also change the colours or point sizes.
p <- FeaturePlot(data, features = plotgenes, cols = c("lightgrey", "red"), reduction = "pca", pt.size = 1)
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "_DE_genes.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "_DE_genes.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Dot plot

p <- DotPlot(data, features = plotgenes, dot.scale = 8, cols = c("lightgrey", "red")) + RotatedAxis() +
ylab("Cluster") + xlab("Genes") +
theme(text = element_text(size = 18))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "_DE_genes_dotplot.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "_DE_genes_dotplot.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Violin plots of individual genes
p <- VlnPlot(data, features = plotgenes, cols = c25, pt.size = 0)
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "_DE_genes_violin.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "_DE_genes_violin.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Heatmap

p <- DoHeatmap(data, features = plotgenes, raster = T) + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "_DE_genes_heatmap.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "_DE_genes_heatmap.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")