Table of Contents

Now you’ll be able to browse and copy files between your virtual Windows machine and the HPC.

Installing R and RStudio

Seurat is We’ll be analysing our data in an R packageenvironment, so first we need to install R and RStudio on the rVDI machine. You will be copying and pasting script from this workflow into RStudio.

...

1c. Running nfcore/scrnaseq as a PBS script

2. Downstream analysis

...

Seurat is:

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data.

...

in R

We’ll be using various R packages to analyse our 10x single cell data. In this section we’ll create an R script in RStudio and install the required R packages.

2a. Open RStudio and create a new R script

...

Open RStudio (you can type it in the Windows search bar)
Create a new R script: ‘File’ → “New File” → “R script”
Save this script where your samples folders are (‘File’ → ‘Save’). These should be on your H or W drive. Save the script file as ‘scrnaseq.R’

In the following sections you will be copying and running the R code into your scrnaseq.R script.

Cell Ranger (and nfcore/scrnaseq) generates a default folder and file output structure. There will be a main folder that contains all the sample subfolders (NOTE: this is where you must save your R script). Each sample folder will have an ‘outs’ subfolder. This ‘outs’ folder contains a ‘filtered_feature_bc_matrix’ folder, which contains the files that Seurat uses in its analysis.

...

You can manually set your working directory in RStudio by selecting ‘Session' -> 'Set working directory' -> 'Choose directory'. Choose the same directory as you saved your scrnaseq.R script, previous section. This will output the setwd(...) command with your working directory into the console window (bottom left panel). Copy this command to replace the default setwd(...) line in your R script.

...

Code Block

#### 2d. Loading required packages ####

# This section needs to be run every time

# Load packages
bioconductor_packages <- c("clusterProfiler", "pathview", "AnnotationHub", "org.Mm.eg.db")
cran_packages <- c("Seurat", "patchwork", "ggplot2", "tidyverse", "viridis", "plyr", "readxl", "scales")
lapply(cran_packages, require, character.only = TRUE)
lapply(bioconductor_packages, require, character.only = TRUE)

2e. Select a sample to work with and import the data into R

...

3. Analysis using the seurat package

Seurat is:

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data.

In this section 3 we’ll be using a variety of Seurat functions to analyse our scRNA-Seq data.

3a. Select a sample to work with and import the data into R

In this section we’ll choose one of our samples to analyse, then import that sample’s scRNA-Seq dataset into R. You can re-run this, and following sections, for each sample dataset that you have. Just replace the sample name in sample <- "xxxxxx" below.

Code Block

#### 3a. Choose a sample to work with and import the data for that sample into R ####

# Give the sample name here that you want to work with.
## **USER INPUT**
sample <- "Cerebellum"
# To see the available samples (choose a sample name from this list):
list.dirs(full.names = F, recursive = F)

# Use Seurat's 'Read10X()' function to read in the full sample database. Cell Ranger creates 3 main database files that need to be combined into a single Seurat object.
# Note: these datasets can be very large and take several minutes to import into R. 
mat <- Read10X(data.dir = paste0(sample, "/outs/filtered_feature_bc_matrix"))

# Have a look at the top 10 rows and columns to see if the data has been imported correctly. You should see gene IDs as rows and barcodes (i.e. cells) as columns
as.matrix(mat[1:10, 1:10])

# Now convert this to a Seurat object. Again, this may take several minutes to load and use a lot of memory
mat2 <- CreateSeuratObject(counts = mat, project = sample)

# You can see a summary of the data by simply running the Seurat object name
mat2

# Set a colour palette that can contrast multiple clusters when you plot them. 
# You can change these colours as you like. 
# You can see what R colours are available here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
c25 <- c("dodgerblue2", "#E31A1C", "green4",  "#6A3D9A", "#FF7F00", "black", "gold1", "skyblue2", "#FB9A99", "palegreen2", "#CAB2D6", "#FDBF6F", "gray70", "khaki2", "maroon", "orchid1", "deeppink1", "blue1", "steelblue4", "darkturquoise", "green1", "yellow4", "yellow3", "darkorange4", "brown")

...

3b. Identify markers in cells

Now we're going to identify individual markers that were present (i.e. were expressed) in our dataset.

IMPORTANT: the gene symbols you provide in the following section have to exactly match the gene symbols in your dataset (including capitalisation).
Gene symbols are more like 'common names' and can vary between databases.
Your main gene identifiers are Ensembl IDs and we need to find the gene symbols that match these Ensembl IDs.
For example, the gene P2ry12 is also called ADPG-R, BDPLT8, HORK3 and various other IDs, depending on the database it's listed in.
In the Ensemble database it's listed as P2ry12 (not P2RY12, remember, case matters) and matches Ensembl ID ENSMUSG00000036353.
For this reason it's advisable to first search the Ensembl website for your markers of interest and for your organism, to ensure you are providing gene symbols that match the Ensembl IDs.
https://asia.ensembl.org/Mus_musculus/Info/Index

Code Block

#### 2f3b. Identify markers in cells ####

# Create a vector called 'markers' that contains each of the markers you want to examine. 
# These should be gene symbols. Replace the gene symbols below with your target markers.
## **USER INPUT**
markers <- c("P2ry12", "Tmem119", "Itgam")

# You can see if the markers you provided are present:
sum(row.names(mat) %in% markers)
# If you input 3 markers and the output from the above code = 3, then all are present. 
# If the result is 2 then 2 of the 3 markers you provided are found in your data, etc.

## USER INPUT
# You can see if an individual marker is present like so (substitute for a marker of choice):
sum(row.names(mat) == "P2ry12")
# Outputs 1 if the marker is present, 0 if it isn't

# Pull out just the read counts for your defined markers
y <- mat[row.names(mat) %in% markers, ]

# Now we can count the number of cells containing zero transcripts for each of the examined markers. 
# This enables an examination of the number of cells that have zero expression for these markers,
# and therefore the number of cells that can be considered non-target cells.

# First count all cells
# Then make a loop to cycle through all markers (defined in previously created 'markers' vector)
a <- length(colnames(y))
for (i in 1:length(markers)) {
  
  a <- c(a, sum(y[i,] == 0))
  
}
# Do a sum of the columns
y2 <- colSums(y)
# See if any zeros. If so, these cells are not target cells (as determined by absence of any target cell markers)
count <- c(a, sum(y2 == 0))
# Name the vector elements
names(count) <- c("Total_cells", markers, "All_zero")
# Generate the table
as.data.frame(count)

# The generated table shows:
# 1) the total number of cells for your sample ("Total_cells" row), 
# 2) the number of cells which had 0 expression for each marker, and,
# 3) the number of cell that had zero expression for all of the markers you provided ("All_zero" row).

...

3c. Processing expression data (dimensionality reduction)

There are a variety of methods to visualise expression in single cell data. The most commonly used methods - PCA, t-SNE and UMAP - involve 'dimensionality', i.e. converting expression to x-n dimensions (which can then be plotted) based on gene expression per cell.

...

The first 4 steps are completed in the code below (this may take a few minutes to run)

Code Block

#### 2g3c. Processing expression data (dimensionality reduction) ####

# Normalise data
mat3 <- NormalizeData(mat2)

# Identification of variable features
mat3 <- FindVariableFeatures(mat3, selection.method = "vst", nfeatures = nrow(mat3))

# Scaling the data
all.genes <- rownames(mat3)
mat3 <- ScaleData(mat3, features = all.genes)

# Perform linear dimensional reduction (PCA. This may take several minutes to run)
mat3 <- RunPCA(mat3, features = VariableFeatures(object = mat3))

...

3d. Plot of highly variable genes

Using the FindVariableFeatures results, we can visualise the most highly variable genes, including a count of variable and non variable genes in your dataset. The below code outputs the top 10 genes, but you can adjust this number as desired (i.e. in top_genes <- head(VariableFeatures(mat3), 10) change 10 to another number).

...

Seurat plots are based on the ggplot package. There are a multitude of other modifications you can make to a ggplot, too many to describe in this notebook. But there are plenty of online guides on how to modify ggplot plots. Here's an example: http://www.sthda.com/english/wiki/be-awesome-in-ggplot2-a-practical-guide-to-be-highly-effective-r-software-and-data-visualization

Code Block

#### 2h3d. Plot of highly variable genes ####

# Identify the 10 most highly variable genes
top_genes <- head(VariableFeatures(mat3), 10)

# plot variable features with labels
p <- VariableFeaturePlot(mat3, pt.size = 2, cols = c("black", "firebrick"))
p <- LabelPoints(plot = p, points = top_genes, repel = TRUE) +
theme_bw() +
theme(text = element_text(size = 17))
p

# You can save your plot as a 300dpi (i.e. publication quality) tiff or pdf file. 
# These files can be found in your working directory.
# You can adjust the width and height of the saved images by changing width = and height = in the code below.

# Export as a 300dpi tiff
tiff_exp <- paste0(sample, "_top_genes.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

# Export as a pdf
pdf_exp <- paste0(sample, "_top_genes.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

...

3e. PCA, UMAP and t-SNE plots (plotting dimensionality reduction data)

In the section 2g 3c we ran dimensionality reduction based on Principal Component Analysis (PCA).

There are other dimensionality reduction methods that are more commonly presented in singe cell papers, particularly Uniform Manifold Approximation and Projection (UMAP and t-distributed stochastic neighbour embedding (t-SNE) dimensionality reduction, so here we will generate and visualise dimensionality reduction from all 3 methods (PCA, UMAP, t-SNE).

Code Block

#### 2i3e. PCA, UMAP and t-SNE plots ####

# First we need to also run UMAP and tSNE dimensionality reduction and add these results to our main Seurat object.
# As with PCA, these may take several minutes to run
mat3 <- RunUMAP(mat3, dims = 1:3, verbose = F)
mat3 <- RunTSNE(mat3, dims = 1:3, verbose = F)

# Now we can plot all 3 results. You can modify the code to alter the appearance of these plots as described in section 2h.


# PCA plot

p <- DimPlot(mat3, reduction = "pca", pt.size = 2, label = TRUE, label.size = 6, label.color = "white", cols = c("firebrick")) +
theme_bw() +
theme(legend.position="none", axis.title=element_text(size=16), axis.text=element_text(size=14))
p

# Export as a TIFF and PDF
tiff_exp <- paste0(sample, "_PCA_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_PCA_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# UMAP

p <- DimPlot(mat3, reduction = "umap", pt.size = 2, label = TRUE, label.size = 8, label.color = "black", cols = c("firebrick")) +
theme_bw() +
theme(legend.position="none", axis.title=element_text(size=16), axis.text=element_text(size=14))
p

tiff_exp <- paste0(sample, "_umap_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_umap_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# t-SNE

p <- DimPlot(mat3, reduction = "tsne", pt.size = 2, label = TRUE, label.size = 8, label.color = "black", cols = c("firebrick")) +
theme_bw() +
theme(legend.position="none", axis.title=element_text(size=16), axis.text=element_text(size=14))
p

tiff_exp <- paste0(sample, "_tsne_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_tsne_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

...

3f. Remove low quality or outlier cells

From the Seurat website:

Seurat allows you to easily explore QC metrics and filter cells based on any user-defined criteria. A few QC metrics commonly used by the community include

...

For example, if your violin plot looked like the below figure, a reasonable nFeature_RNA maximum would be 4000 and a minimum of 200. This would remove outliers.

...

Code Block

#### 2j3f. Remove low quality or outlier cells ####

# Generate a violin plot.
VlnPlot(mat2, features = c("nFeature_RNA", "nCount_RNA"), ncol = 3)

# And a scatter plot
FeatureScatter(mat2, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

# Use these plot to choose and enter a minimum (nFeature_RNA > xxx) and maxiumum (nFeature_RNA < xxxx) below
mat3_filt <- subset(mat3, subset = nFeature_RNA > 200 & nFeature_RNA < 4000)

...

3g. Visualise gene expression by marker

In the section 2i 3e (plotting dimensionality reduction data) we visualised total expression, i.e. all genes within each cell.

...

In addition to the dimensionality reduction plots, you can visualise expression for your selected markers with a heatmap (note: this requires at least 2 markers, preferably several, to be visually meaningful). You can change the colour range as you like by providing a high, centre and low colour (scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")))

Code Block

#### 2k3g. Visualise gene expression by marker ####

# PCA

p <- FeaturePlot(mat3_filt, features = markers, reduction = "pca", cols = c("red", "lightgrey"), pt.size = 1)
p

# Export as a 300dpi tiff
tiff_exp <- paste0(sample, "_pca_markers.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
# Export as a pdf
pdf_exp <- paste0(sample, "_pca_markers.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# UMAP

p <- FeaturePlot(mat3, features = markers, reduction = "umap", cols = c("red", "lightgrey"), pt.size = 1)
p

tiff_exp <- paste0(sample, "_umap_markers.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_umap_markers.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# t-SNE

p <- FeaturePlot(mat3, features = markers, reduction = "tsne", cols = c("red", "lightgrey"), pt.size = 1)
p

tiff_exp <- paste0(sample, "_tsne_markers.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_tsne_markers.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# Heatmap

p <- DoHeatmap(mat3, features = markers, group.bar = FALSE) + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16))
p

tiff_exp <- paste0(sample, "_hmap_markers.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_hmap_markers.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


# Clustered heatmap

# You can also cluster the cells and then plot a heatmap based on this cluster data, to see if expression of your chosen markers is related to clusters

# First, generate the clusters for your sample
# Choose the `resolution =` score carefully (between 0 and 1). A lower score means fewer clusters. 
# You can adjust this higher or lower to see how it affects your clusters.
mat3 <- FindNeighbors(mat3, dims = 1:10)
mat3 <- FindClusters(mat3, resolution = 0.5)

# Then generate the heatmap
p <- DoHeatmap(mat3, features = markers, raster = T) + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16)) + labs(color = "Dose (mg)")
p

tiff_exp <- paste0(sample, "_hmap_markers_clustered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

pdf_exp <- paste0(sample, "_hmap_markers_clustered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

...

3h. Remove non-target cells

In this section we will remove any 'non-target' cells from our dataset. 'Non-target' cells are defined as those with 0 reads (i.e. 0 expression) for our marker(s) of choice.

If you don't want to remove any cells based on expression of specific markers, skip this section

Code Block

#### 2l3h. Remove non-target cells ####

# View your chosen markers
zm <- mat[row.names(mat) %in% markers, ]
as.matrix(zm)

# Then we can again see a count of cells that have 0 reads (thus 0 expression) for each marker. 
# The 'All markers' row indicates the number of cells that have zero expression in any of the provided markers. 
# These are the cells that will be filtered out from your dataset
a <- length(colnames(zm))
for (i in 1:length(markers)) {a <- c(a, sum(zm[i,] == 0))}
a <- c(a, length(colnames(zm)) - sum(apply(as.matrix(zm) == 0, 2, sum) == 0))
names(a) <- c("Total cells", markers, "All markers")
as.data.frame(a)

# Enter the markers that you want to use in the filtration.
marker_rem <- c("P2ry12", "Tmem119")

# Remove cells from main Seurat object that have zero expression for **any** of these markers.
zm <- mat[row.names(mat) %in% marker_rem, ]
# This line does a sum of every column and then outputs column where this = 0 (if any cell contains reads, this will at least = 1).
zm_1 <- as.matrix(zm)[, apply(as.matrix(zm) == 0, 2, sum) == 0]
# Then we can filter the Seurat object to contain just these cell (i.e. barcode) IDs
mat3_filt <- subset(mat3_filt, cells = colnames(zm_1))

mat3_filt

...

3i. Clustering by gene expression

This section examines clustering by gene expression similarity for each sample. PCA, t-SNE and UMAP plots are used to visualize the gene expression patterns and clusters.

This section is split into three subsections - “i“1) Choosing the correct resolution”, where you’ll use the clustree tool to identify the optimal number of clusters, and “ii“2) Calculate the clusters” where you’ll use the clustree results to generate optimised clusters, and “iii“3) Plot the results“, where you’ll generate 'before and after' plots, to visualise how filtration (inc. removing the non-target cells) changed the data structure.

You can use this section to examine if your cell filtration had a meaningful effect on your data structure. If it didn't, you may want to choose a different set of markers or filtering parameters to filter with.

...

1) Choosing the correct resolution

A cluster represents a unique group of cells, based on gene expression patterns. But what consitutes 'unique'? When you calculate the clustering (using Seurat's FindClusters() function), it's important to use the correct resolution score to generate accurate, biologically meaningful clusters. Using a lower resolution score will generate fewer clusters (but you risk combining two clusters that should be distinct), a higher score will generate more clusters (but you risk falsely splitting a biologically relevant cluster of cells). Every single cell dataset is different (cell population similarity, sequencing depth, etc) and as such the optimal resolution score needs to be chosen for each dataset.

...

Read the clustree manual to understand how to interpret the generated tree: https://cran.r-project.org/web/packages/clustree/vignettes/clustree.html

Code Block

#### 2m(i). Choosing the correct resolution ####

# Install and load the clustree package:
install.packages("clustree")
library(clustree)

# Then generate a range of clusters, from 0 to 1, at 0.1 increments ("resolution = seq(0, 1, 0.1)").
mat3_clust <- FindNeighbors(mat3_filt, dims = 1:10)
mat3_clust <- FindClusters(mat3_clust, resolution = seq(0, 1, 0.1), verbose = F)

# Convert the results into a Seurat object, which can be used as input into "clustree()"
clus_seurat <- CreateSeuratObject(counts = mat3_clust@assays$RNA$counts, meta.data = mat3_clust[[]])
clus_seurat[['TSNE']] <- CreateDimReducObject(embeddings = Embeddings(object = mat3_clust, reduction = "pca"), key = "tSNE_")

# Generate the tree. 
# Refer to the clustree manual for tips on how to use this tree to choose the optimal `resolution`
# https://cran.r-project.org/web/packages/clustree/vignettes/clustree.html
clustree(clus_seurat, prefix = "RNA_snn_res.") + scale_color_manual(values=c25) + scale_edge_color_continuous(low = "blue", high = "red")

ii) Calculate the clusters

In this subsection we’ll re-cluster the data based on the clustree results.

Choose the the resolution = xx score in FindClusters() to the optimal resolution that we can see in the clustree plot (e.g. 0.5, or 0.6, etc.).

Code Block

#### 2m(ii). Calculate the clusters ####

# First you need to re-run the variable feature calculation, scaling, dimensionality reduction (PCA, t-SNE and UMAP). 
# This may take several minutes to run.
mat3_filt <- FindVariableFeatures(mat3_filt, selection.method = "vst", nfeatures = nrow(mat3_filt))
all.genes <- rownames(mat3_filt)
mat3_filt <- ScaleData(mat3_filt, features = all.genes)
mat3_filt <- RunPCA(mat3_filt, dims = 1:3, verbose = F)
mat3_filt <- RunUMAP(mat3_filt, dims = 1:3, verbose = F)
mat3_filt <- RunTSNE(mat3_filt, dims = 1:3, verbose = F)

# Then you generate a 'nearest neighbor' graph by calculating the neighborhood overlap (Jaccard index) 
# between every cell and identify clusters of cells based on shared nearest neighbor (SNN).
mat3_filt <- FindNeighbors(mat3_filt, dims = 1:10)
## **USER INPUT** Enter a resolution based off the clustree plot
mat3_filt <- FindClusters(mat3_filt, resolution = 0.6)

# You can see how many cells there are per cluster like so:
cellcount <- as.data.frame(table(mat3_filt@meta.data[4]))
names(cellcount) <- c("Cluster", "Cell_count")
cellcount

iii) Plot the results

Here we’ll plot ‘before and after’ filtration plots, to examine how filtration affected the data structure.

We’ll plot all clusters, for PCA, t-SNE and UMAP, and you can also choose an individual cluster to plot for any of these dimensionality reduction plot types.

Code Block#### 2m(iii). Plot the results #### ## PCA Plot.r-project.org/web/packages/clustree/vignettes/clustree.html

Code Block

#### 3i(1). Choosing the correct resolution ####

# Install and load the clustree package:
install.packages("clustree")
library(clustree)

# Then generate a range of clusters, from 0 to 1, at 0.1 increments ("resolution = seq(0, 1, 0.1)").
mat3_clust <- FindNeighbors(mat3_filt, dims = 1:10)
mat3_clust <- FindClusters(mat3_clust, resolution = seq(0, 1, 0.1), verbose = F)

# Convert the results into a Seurat object, which can be used as input into "clustree()"
clus_seurat <- CreateSeuratObject(counts = mat3_clust@assays$RNA$counts, meta.data = mat3_clust[[]])
clus_seurat[['TSNE']] <- CreateDimReducObject(embeddings = Embeddings(object = mat3_clust, reduction = "pca"), key = "tSNE_")

# Generate the tree. 
# Refer to the clustree manual for tips on how to use this tree to choose the optimal `resolution`
# https://cran.r-project.org/web/packages/clustree/vignettes/clustree.html
clustree(clus_seurat, prefix = "RNA_snn_res.") + scale_color_manual(values=c25) + scale_edge_color_continuous(low = "blue", high = "red")

2) Calculate the clusters

In this subsection we’ll re-cluster the data based on the clustree results.

Choose the the resolution = xx score in FindClusters() to the optimal resolution that we can see in the clustree plot (e.g. 0.5, or 0.6, etc.).

Code Block

#### 3i(2). Calculate the clusters ####

# First you need to re-run the variable feature calculation, scaling, dimensionality reduction (PCA, t-SNE and UMAP). 
# This may take several minutes to run.
mat3_filt <- FindVariableFeatures(mat3_filt, selection.method = "vst", nfeatures = nrow(mat3_filt))
all.genes <- rownames(mat3_filt)
mat3_filt <- ScaleData(mat3_filt, features = all.genes)
mat3_filt <- RunPCA(mat3_filt, dims = 1:3, verbose = F)
mat3_filt <- RunUMAP(mat3_filt, dims = 1:3, verbose = F)
mat3_filt <- RunTSNE(mat3_filt, dims = 1:3, verbose = F)

# Then you generate a 'nearest neighbor' graph by calculating the neighborhood overlap (Jaccard index) 
# between every cell and identify clusters of cells based on shared nearest neighbor (SNN).
mat3_filt <- FindNeighbors(mat3_filt, dims = 1:10)
## **USER INPUT** Enter a resolution based off the clustree plot
mat3_filt <- FindClusters(mat3_filt, resolution = 0.6)

# You can see how many cells there are per cluster like so:
cellcount <- as.data.frame(table(mat3_filt@meta.data[4]))
names(cellcount) <- c("Cluster", "Cell_count")
cellcount

3) Plot the results

Here we’ll plot ‘before and after’ filtration plots, to examine how filtration affected the data structure.

We’ll plot all clusters, for PCA, t-SNE and UMAP, and you can also choose an individual cluster to plot for any of these dimensionality reduction plot types.

Code Block

#### 3i(3). Plot the results ####

## PCA Plot ##

# Before filtration:
p <- DimPlot(mat3, reduction = "pca", pt.size = 2, cols = c25) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# After filtration:
p <- DimPlot(mat3_filt, reduction = "pca", pt.size = 2, cols = c25) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# Export as pdf and tiff
tiff_exp <- paste0(sample, "_pca_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_pca_filtered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

## PCA: Plot one individual cluster ##

# Here you can visualise a single cluster by colouring it a specific colour.

# First define the colours, based on the cluster information in your seurat data.
# You can change the background colours ("gray70") and the highlighted cluster colour ("red") to whatever you like.
newcols <- rep("gray70", length(levels(mat3_filt))-1)
newcols <- c(newcols, "red")

# Enter which cluster you want to visualise in the `myclust` variable below.
# This is based on the cluster numbers in the previous plot.
## **USER INPUT**
myclust <- 3

# Plot the cluster, placing this cluster on top (the `order` parameter).
p <- DimPlot(mat3_filt, reduction = "pca", pt.size = 2, cols = newcols, order = myclust) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_clust_", myclust, "_pca_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_clust_", myclust, "_pca_filtered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


## UMAP ##

# Before filtration:
p <- DimPlot(mat3, reduction = "pcaumap", pt.size = 2, cols = c25) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=1816), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# After filtration:
p <- DimPlot(mat3_filt, reduction = "pcaumap", pt.size = 2, cols = c25) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_pcaumap_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_pcaumap_filtered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

## PCA: Plot one individual cluster ##

# Here you can visualise a single cluster by colouring it a specific colour.

# First define the colours, based on the cluster information in your seurat data.
# You can change the background colours ("gray70") and the highlighted cluster colour ("red") to whatever you like. 20, height = 20, units = "cm")

## UMAP: Plot individual clusters ##

# Select your colours
newcols <- rep("gray70", length(levels(mat3_filt))-1)
newcols <- c(newcols, "red")

# EnterSelect whichyour cluster youand wantthen to visualise in the `myclust` variable below.
# This is based on the cluster numbers in the previous plot.
plot it
## **USER INPUT**
myclust <- 3

# Plot the cluster, placing this cluster on top (the `order` parameter). 3
p <- DimPlot(mat3_filt, reduction = "pcaumap", pt.size = 2, cols = newcols, order = myclust) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_clust_", myclust, "_pcaumap_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_clust_", myclust, "_pcaumap_filtered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


## UMAPt_SNE ##

# Before filtration:
p <- DimPlot(mat3, reduction = "umaptsne", pt.size = 2, cols = c25) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=1618), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# After filtration:
p <- DimPlot(mat3_filt, reduction = "umaptsne", pt.size = 2, cols = c25) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_umaptsne_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_umaptsne_filtered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

## UMAPt_SNE: Plot individual clusters ##

# Select your colours
newcols <- rep("gray70", length(levels(mat3_filt))-1)
newcols <- c(newcols, "red")

# Select your cluster and then plot it
## **USER INPUT**
.
myclust <- 3
p <- DimPlot(mat3_filt, reduction = "umaptsne", pt.size = 2, cols = newcols, order = myclust) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_clust_", myclust, "_umaptsne_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_clust_", myclust, "_umap_filtered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


## t_SNE ##

# Before filtration:
p pdf_exp <- DimPlotpaste0(mat3sample, reduction = "tsne_clust_", pt.size = 2, cols = c25) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# Aftermyclust, "_tsne_filtered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


## Clusters - Heatmap of selected markers ##

# Recall which markers you previously defined:
markers

# Before filtration:
p <- DimPlotDoHeatmap(mat3_filt, reductionfeatures = "tsne"markers, pt.sizeraster = 2, cols = c25T) + 
themescale_fill_bwgradientn()colors += theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size =14 16)) + labs(color = "ClusterDose (mg)")
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_tsne_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf After filtration:
p <- DoHeatmap(mat3_filt, features = markers, raster = T) + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16)) + labs(color = "Dose (mg)")
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_tsnehmap_filtered.pdftiff")
ggsave(file = pdftiff_exp, dpi = 300, compression = "lzw", device = "pdftiff", plot = p, width = 20, height = 20, units = "cm")

## t_SNE: Plot individual clusters ##

# Select your colours
newcols <- rep("gray70", length(levels(mat3_filt))-1)
newcols)
pdf_exp <- cpaste0(newcolssample, "red_hmap_filtered.pdf")
ggsave(file # Select your cluster and then plot it.
myclust <- 3
p <- DimPlot(mat3_filt, reduction = "tsne", pt.size = 2, cols = newcols, order = myclust) + 
theme_bw() +
theme(legend.title=element_text(size=14), legend.text = element_text(size = 14), axis.title=element_text(size=18), axis.text=element_text(size=14)) + labs(color="Cluster")
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_clust_", myclust, "_tsne_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf= pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

3j. Plot marker expression before and after filtration

Then we can see how the expression of specific markers changed before and after filtration.

Code Block

#### 3j. Plot marker expression before and after filtration ####

# Select your set of markers (you could re-enter the same ones you used earlier, or use a different set):
## **USER INPUT**
markers <- c("P2ry12", "Tmem119")

# Select which type of plot you want to generate ("pca", "umap" or "tsne"):
## **USER INPUT**
redplot <- "pca"

## Before filtration ##

p <- FeaturePlot(mat3, features = markers, reduction = redplot, cols = c("lightgrey", "red"), pt.size = 1)
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_clust_", myclustredplot, "_tsne_filteredmarkers.pdftiff")
ggsave(file = pdftiff_exp, dpi = 300, compression = "lzw", device = "pdftiff", plot = p, width = 20, height = 20, units = "cm")


## Clusters - Heatmap of selected markers ##

# Recall which markers you previously defined:
markers

# Before filtration:
p pdf_exp <- DoHeatmappaste0(mat3sample, features = markers, raster = T) + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16)) + labs(color = "Dose (mg)")
p

# After filtration:"_", redplot, "_markers.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")


## After filtration ##

p <- DoHeatmapFeaturePlot(mat3_filt, features = markers, raster = T) + 
scale_fill_gradientn(colors = c("darkorange", "floralwhite", "dodgerblue4")) + 
theme(text = element_text(size = 16)) + labs(color = "Dose (mg)"reduction = redplot, cols = c("lightgrey", "red"), pt.size = 1)
p

# Export as a pdf and tiff
tiff_exp <- paste0(sample, "_hmap", redplot, "_markers_filtered.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")
pdf_exp <- paste0(sample, "_", redplot, "_hmapmarkers_filtered.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

2o. Output filtered results

Now will export the filtered dataset (non-target cells removed, other filtration applied), for analysis in the next sections of this analysis workflow.

You need to run this entire '5. Filtering cells using markers' Notebook once for every sample you have, because the other sections rely on the output created here for each sample.

Version	Old Version 46	New Version 47
Changes made by	Paul Whatmore (Deactivated)	Paul Whatmore (Deactivated)
Saved on	Feb 20, 2024	Feb 20, 2024

Versions Compared

Key

Installing R and RStudio

1c. Running nfcore/scrnaseq as a PBS script

2. Downstream analysis

in R

2a. Open RStudio and create a new R script

2e. Select a sample to work with and import the data into R

3. Analysis using the seurat package

3a. Select a sample to work with and import the data into R

3b. Identify markers in cells

3c. Processing expression data (dimensionality reduction)

3d. Plot of highly variable genes

3e. PCA, UMAP and t-SNE plots (plotting dimensionality reduction data)

3f. Remove low quality or outlier cells

3g. Visualise gene expression by marker

3h. Remove non-target cells

3i. Clustering by gene expression

1) Choosing the correct resolution

ii) Calculate the clusters

iii) Plot the results

2) Calculate the clusters

3) Plot the results

3j. Plot marker expression before and after filtration

2o. Output filtered results

Content Comparison

Versions Compared

Key

Installing R and RStudio

1c. Running nfcore/scrnaseq as a PBS script

2. Downstream analysis

in R

2a. Open RStudio and create a new R script

2e. Select a sample to work with and import the data into R

3. Analysis using the seurat package

3a. Select a sample to work with and import the data into R

3b. Identify markers in cells

3c. Processing expression data (dimensionality reduction)

3d. Plot of highly variable genes

3e. PCA, UMAP and t-SNE plots (plotting dimensionality reduction data)

3f. Remove low quality or outlier cells

3g. Visualise gene expression by marker

3h. Remove non-target cells

3i. Clustering by gene expression

1) Choosing the correct resolution

ii) Calculate the clusters

iii) Plot the results

2) Calculate the clusters

3) Plot the results

3j. Plot marker expression before and after filtration

2o. Output filtered results