Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

...

  1. Open RStudio (you can type it in the Windows search bar)

  2. Create a new R script: ‘File’ → “New File” → “R script”

  3. Save this script where your samples folders are (‘File’ → ‘Save’). These should be on your H or W drive. Save the script file as scrnaseq.R

In the following sections you will be copying and running the R code into your scrnaseq.R script.

Cell Ranger (and nfcore/scrnaseq) generates a default folder and file output structure. There will be a main folder that contains all the sample subfolders (NOTE: this is where you must save your R script). Each sample folder will have an ‘outs’ subfolder. This ‘outs’ folder contains a ‘filtered_feature_bc_matrix’ folder, which contains the files that Seurat uses in its analysis.

...

You can manually set your working directory in RStudio by selecting ‘Session' -> 'Set working directory' -> 'Choose directory'. Choose the same directory as you saved your scrnaseq.R script, previous section. This will output the setwd(...) command with your working directory into the console window (bottom left panel). Copy this command to replace the default setwd(...) line in your R script.

...

Code Block
#### 4c. PCA, UMAP and t-SNE plots (plotting dimensionality reduction data) ####

# PCA plot
p <- DimPlot(data1, reduction = "pca", group.by = 'orig.ident', pt.size = 2, cols = alpha(c25[1:length(samples)],0.5)) + 
  theme_bw() +
  theme(axis.title=element_text(size=16), axis.text=element_text(size=14))
p

# The above generates a combined plot.
# Alternatively, you can separate the samples into 'facets' in a single plot, by running the following:
# NOTE: run either the combined or the split plot. Exporting the plot only exports the last plot you generated.
p <- DimPlot(data1, reduction = "pca", split.by = 'orig.ident', group.by = 'orig.ident', pt.size = 2, cols = alpha(c25[1:length(samples)],0.5)) + 
  theme_bw() +
  theme(axis.title=element_text(size=16), axis.text=element_text(size=14))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_PCA_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_PCA_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# UMAP plot, samples combined
p <- DimPlot(data1, reduction = "umap", group.by = 'orig.ident', pt.size = 2, cols = alpha(c25[1:length(samples)], 0.5)) + 
  theme_bw() +
  theme(legend.position="none", axis.title=element_text(size=16), axis.text=element_text(size=14))
p

# Or as faceted plot (as above, with PCA)
p <- DimPlot(data1, reduction = "umap", split.by = 'orig.ident', group.by = 'orig.ident', pt.size = 2, cols = alpha(c25[1:length(samples)],0.5)) + 
  theme_bw() +
  theme(axis.title=element_text(size=16), axis.text=element_text(size=14))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_umap_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_umap_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# tSNE plot, samples combined
p <- DimPlot(data1, reduction = "tsne", group.by = 'orig.ident', pt.size = 2, cols = alpha(c25[1:length(samples)], 0.5)) + 
  theme_bw() +
  theme(legend.position="none", axis.title=element_text(size=16), axis.text=element_text(size=14))
p

# Or as faceted plot
p <- DimPlot(data1, reduction = "tsne", split.by = 'orig.ident', group.by = 'orig.ident', pt.size = 2, cols = alpha(c25[1:length(samples)],0.5)) + 
  theme_bw() +
  theme(axis.title=element_text(size=16), axis.text=element_text(size=14))
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_tsne_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_tsne_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

4d. Visualise gene expression by marker

This section allows you to produce a side-by-side visualisation of gene expression for your aggregated samples, for specific genes.

...

Code Block
#### 5a. Select a sample to work with and import sample data ####

# Find any sample datasets you generated by running section 3
dir(pattern = "seurat_filtered.rds")

# Choose the sample name you wish to work with from the list above (just the name, without the '_seurat_filtered.rds'):
## **USER INPUT**
sample <- "Cerebellum"

# If you want to analyse a different sample, simply come back to this section, change the sample name, then re-run the following sections.
# Now import the data file for that sample:
data <- readRDS(paste0(sample, "_seurat_filtered.rds"))

# See a summary of your data object:
data

5b. Explanation of differential expression strategies

Bulk RNA-Seq usually examines differentially expressed genes between two treatments or tissues. scRNA-Seq is somewhat different, in that it is based on heterogeneity of gene expression within a group of cells, so differential expression can also be examined between clusters.

Given that there are usually multiple clusters, and differential expression analysis requires a comparison between two groups, there are different strategies used to combine the two group analysis with the multiple cluster analysis. Three strategies included in this workflow include:

  1. Comparing a single cluster to all other clusters/cells (section 5c). This enables one to examine how gene expression in a single cluster differs from the entire dataset.

  2. Comparing one cluster to another cluster (section 5d). This requires two clusters being initially chosen, then examining how gene expression differs between them. Note that to do every possible combination of cluster comparisons may require running this section many times. E.g. if your sample has 8 clusters, this represents 7+6+5+4+3+2+1 = you would have to run that section 28 times to compare every cluster to every other cluster.

  3. Doing a batch comparison of each cluster to all other clusters/cells (section 5e). Seurat has a function (FindAllMarkers()) to accomplish this, but this can take some time to run, as multiple clusters are being compared at once, and it also produces a multilevel dataset. This is an alternative to running each cluster one at a time, as in section 5c.

IMPORTANT: the first cluster you choose in each analysis below represents the baseline or control group. Genes are either upregulated (i.e. positive log fold change) or downregulated (-ve lfc) in comparison to this baseline group. This means that any downregulated genes (i.e. negative log fold change) are more highly expressed in this baseline cluster, but any upregulated genes (+ lfc) are more highly expressed in the cluster(s) being compared to this baseline cluster.

5c. Differential expression: one cluster vs all other cells

In this section you'll select a single cluster, then compare gene expression in this cluster to all other combined clusters. Any upregulated genes (positive log fold change) are more highly expressed in the combined clusters.

Code Block
#### 5c. Differential expression: one cluster vs all other cells ####

# First, we should have a look at the clusters we have to work with:
levels(x = data)

# You'll need to select one of these clusters as the baseline cluster (`declust <-`). 
# You can re-run the below code with a different baseline cluster if you want to look at other comparisons:
## **USER INPUT**
declust <- 0

# Then run the Seurat differential expression function:
DE_genes <- FindMarkers(data, ident.1 = declust, logfc.threshold = 0.2)

# NOTE: modify the 'logfc.threshold =' parameter so that you've captured all DE genes.
# See section '4e. Differential expression between samples' for an explanation

# Look at the bottom 6 genes, to see if you've captured all DE genes. If not, lower 'logfc.threshold =' and re-run
tail(DE_genes)

# We can see how many genes this found by:
nrow(DE_genes)

# View the top 20 DE genes by:
head(DE_genes, 20)

# You can extract just the **significantly** differentially expressed genes (adjusted p < 0.05) like so:
DE_genes_sig <- DE_genes[DE_genes$p_val_adj < 0.05, ]

# See how many significantly DE genes there are
nrow(DE_genes_sig)

# You can also extract genes based on log fold change as well. Enter a log fold change threshold here (e.g. lfc_threshold <- 1):
# By default, this is zero, which will not apply any log fold change filtration
lfc_threshold <- 0

# Then filter your data by this log fold change threshold.
DE_genes_sig_lfc <- DE_genes_sig[DE_genes_sig$avg_log2FC < -lfc_threshold | DE_genes_sig$avg_log2FC > lfc_threshold, ]