Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

...

  1. Open RStudio (you can type it in the Windows search bar)

  2. Create a new R script: ‘File’ → “New File” → “R script”

  3. Save this script where your samples folders are (‘File’ → ‘Save’). These should be on your H or W drive. Save the script file as scrnaseq.R

In the following sections you will be copying and running the R code into your scrnaseq.R script.

Cell Ranger (and nfcore/scrnaseq) generates a default folder and file output structure. There will be a main folder that contains all the sample subfolders (NOTE: this is where you must save your R script). Each sample folder will have an ‘outs’ subfolder. This ‘outs’ folder contains a ‘filtered_feature_bc_matrix’ folder, which contains the files that Seurat uses in its analysis.

...

You can manually set your working directory in RStudio by selecting ‘Session' -> 'Set working directory' -> 'Choose directory'. Choose the same directory as you saved your scrnaseq.R script, previous section. This will output the setwd(...) command with your working directory into the console window (bottom left panel). Copy this command to replace the default setwd(...) line in your R script.

...

Code Block
#### 5c. Differential expression: one cluster vs all other cells ####

# First, we should have a look at the clusters we have to work with:
levels(x = data)

# You'll need to select one of these clusters as the baseline cluster (`declust <-`). 
# You can re-run the below code with a different baseline cluster if you want to look at other comparisons:
## **USER INPUT**
declust <- 0

# Then run the Seurat differential expression function:
DE_genes <- FindMarkers(data, ident.1 = declust, logfc.threshold = 0.2)

# NOTE: modify the 'logfc.threshold =' parameter so that you've captured all DE genes.
# See section '4e. Differential expression between samples' for an explanation

# Look at the bottom 6 genes, to see if you've captured all DE genes. If not, lower 'logfc.threshold =' and re-run
tail(DE_genes)

# We can see how many genes this found by:
nrow(DE_genes)

# View the top 20 DE genes by:
head(DE_genes, 20)

# You can extract just the **significantly** differentially expressed genes (adjusted p < 0.05) like so:
DE_genes_sig <- DE_genes[DE_genes$p_val_adj < 0.05, ]

# See how many significantly DE genes there are
nrow(DE_genes_sig)

# You can also extract genes based on log fold change as well. Enter a log fold change threshold here (e.g. lfc_threshold <- 1):
# By default, this is zero, which will not apply any log fold change filtration
lfc_threshold <- 0

# Then filter your data by this log fold change threshold.
DE_genes_sig_lfc <- DE_genes_sig[DE_genes_sig$avg_log2FC < -lfc_threshold | DE_genes_sig$avg_log2FC > lfc_threshold, ]

# See how many sig DE genes remain after lfc filtration:
nrow(DE_genes_sig_lfc)
# **NOTE** You can easily filter out all your results here by using too high a lfc threshold.

# Export the DE genes table to your working directory as a csv file that can be opened in Excel:
write.csv(DE_genes_sig_lfc, paste0("DE_1vsALL_clust_", declust, "_", sample, ".csv"))

## Visualising DE results ##

# There are a variety of ways to visualise your DE results. 
# Below are a few examples and more can be added to this workflow as needed.

# Not all of these methods have the space to plot all DE genes, so we can provide them with a list of DE genes to plot:
# The below pulls out the top 20 most significantly DE genes, by adjusted p value.
# You can change this number '[1:20]' to whatever number of top genes you prefer
# You can also plot specific genes by name (i.e., choose genes of interest from the table of DE genes), 
# e.g. plotgenes <- c("pDC", "Eryth", "Mk", "DC"). You can include as many genes as you like.
## **USER INPUT**
plotgenes <- rownames(DE_genes_sig)[1:20]

# Scatter plot of individual genes

# You can select `reduction =` to be either `"pca"` for PCA plot, or `"umap"` or `"tsne"`. You can also change the colours or point sizes.
p <- FeaturePlot(data, features = plotgenes, cols = c("lightgrey", "red"), reduction = "pca", pt.size = 1)
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "_DE_genes.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "_DE_genes.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Dot plot

p <- DotPlot(data, features = plotgenes, dot.scale = 8, cols = c("lightgrey", "red")) + RotatedAxis() +
ylab("Cluster") + xlab("Genes") +
theme(text = element_text(size = 18))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "_DE_genes_dotplot.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "_DE_genes_dotplot.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Violin plots of individual genes
p <- VlnPlot(data, features = plotgenes, cols = c25, pt.size = 0)
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "_DE_genes_violin.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "_DE_genes_violin.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Heatmap

p <- DoHeatmap(data, features = plotgenes, raster = T) + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "_DE_genes_heatmap.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "_DE_genes_heatmap.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# You can go back to the start of this section and change 'declust <- ' to another cluster to examine DE for that cluster.

5d. Differential expression: one cluster vs another cluster

Code Block
#### 5d.Differential expression: one cluster vs another cluster ####

# In this section you'll select to clusters to compare. Any upregulated genes (postive log fold change) are more highly expressed in the second cluster.

# First, select your baseline cluster:
## **USER INPUT**
declust <- 0

# Then, the cluster you want to compare it to:
## **USER INPUT**
declust2 <- 1

# Then run the Seurat DE function.
DE_genes <- FindMarkers(data, ident.1 = declust, ident.2 = declust2, logfc.threshold = 0.2)

# As in the previous section, you can run `tail(DE_genes)` to see if you need to reduce `logfc.threshold = 0.2` so as to capture all **significantly** DE genes.
tail(DE_genes)

# And again you can filter by p value and log fold change (i.e. change `p_val_adj` and `lfc_threshold` to suitable numbers), count the number of sig DE genes and view the top few.
DE_genes_sig <- DE_genes[DE_genes$p_val_adj < 0.05, ]
lfc_threshold <- 0.
DE_genes_sig_lfc <- DE_genes_sig[DE_genes_sig$avg_log2FC < -lfc_threshold | DE_genes_sig$avg_log2FC > lfc_threshold, ]
nrow(DE_genes_sig_lfc)
head(DE_genes_sig_lfc)

# Export the table of DE genes as a csv file:
write.csv(DE_genes_sig_lfc, paste0("DE_1vs1_clust", declust, "_vs_clust", declust2, "_", sample, ".csv"))

## Plotting ##

# Pull out the top 20 genes to plot (or change to the range or names of genes you want to plot, as in the previous section.)
plotgenes <- rownames(DE_genes_sig)[1:20]

# Scatter plot
p <- FeaturePlot(data, features = plotgenes, cols = c("red", "lightgrey"), reduction = "pca", pt.size = 1)
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "clust_", declust2, "_DE_genes.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "clust_", declust2, "_DE_genes.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# Dot plot

p <- DotPlot(data, features = plotgenes, dot.scale = 8, cols = c("red", "lightgrey")) + RotatedAxis() +
ylab("Cluster") + xlab("Genes") +
theme(text = element_text(size = 18))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "clust_", declust2, "_DE_genes_dotplot.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "clust_", declust2, "_DE_genes_dotplot.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Violin plots

p <- VlnPlot(data, features = plotgenes, cols = c25, pt.size = 0)
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "clust_", declust2, "_DE_genes_violin.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "clust_", declust2, "_DE_genes_violin.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# Heatmap
p <- DoHeatmap(data, features = plotgenes, raster = T) + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "clust_", declust, "clust_", declust2, "_DE_genes_heatmap.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "clust_", declust, "clust_", declust2, "_DE_genes_heatmap.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

# You can go back to the start of this section and change declust <- and declust2 <- to compare different clusters.