
...

  1. Open RStudio (you can type it in the Windows search bar)

  2. Create a new R script: ‘File’ → ‘New File’ → ‘R Script’

  3. Save this script in the folder that contains your sample folders (‘File’ → ‘Save’). This folder should be on your H or W drive. Name the script file scrnaseq.R

In the following sections you will copy the R code into your scrnaseq.R script and run it from there.

Cell Ranger (and nf-core/scrnaseq) generates a default folder and file output structure. There will be a main folder that contains all the sample subfolders (NOTE: this is where you must save your R script). Each sample folder will have an ‘outs’ subfolder. This ‘outs’ folder contains a ‘filtered_feature_bc_matrix’ folder, which holds the files that Seurat uses in its analysis.
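The layout described above can be checked from R before running any analysis. The following is a minimal sketch in base R. The function name `find_sample_matrices` is hypothetical (it is not part of Seurat or Cell Ranger), and it assumes that every immediate subfolder of the main folder is a sample folder.

```r
# Hypothetical helper: list each sample subfolder of `main_dir` and report
# whether it contains outs/filtered_feature_bc_matrix.
find_sample_matrices <- function(main_dir = ".") {
  # Immediate subfolders only (assumed to be one per sample)
  samples <- list.dirs(main_dir, full.names = FALSE, recursive = FALSE)
  matrix_dirs <- file.path(main_dir, samples, "outs", "filtered_feature_bc_matrix")
  data.frame(
    sample = samples,
    matrix_dir = matrix_dirs,
    found = dir.exists(matrix_dirs),
    stringsAsFactors = FALSE
  )
}

# Example: check the folder that holds scrnaseq.R and the sample subfolders
# find_sample_matrices(".")
```

Any row with `found = FALSE` is either not a sample folder (for example a hidden project folder) or a sample whose output is incomplete.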

...

You can manually set your working directory in RStudio by selecting ‘Session’ → ‘Set Working Directory’ → ‘Choose Directory’. Choose the directory where you saved your scrnaseq.R script in the previous section. This prints the setwd(...) command with your working directory in the console window (bottom left panel). Copy this command over the default setwd(...) line in your R script.

...

Code Block
#### 2l. Remove non-target cells ####

# View your chosen markers
zm <- mat[row.names(mat) %in% markers, ]
as.matrix(zm)

# Then we can again see a count of cells that have 0 reads (thus 0 expression) for each marker.
# The 'All markers' row indicates the number of cells that have zero expression for at least one of the provided markers.
# These are the cells that will be filtered out from your dataset
# Rows are looked up by name, because the rows of zm follow the gene order in mat, not the order in markers.
found <- markers[markers %in% row.names(zm)]
a <- length(colnames(zm))
for (m in found) {a <- c(a, sum(zm[m, ] == 0))}
a <- c(a, length(colnames(zm)) - sum(apply(as.matrix(zm) == 0, 2, sum) == 0))
names(a) <- c("Total cells", found, "All markers")
as.data.frame(a)

# Enter the markers that you want to use in the filtration.
marker_rem <- c("P2ry12", "Tmem119")

# Remove cells from main Seurat object that have zero expression for **any** of these markers.
zm <- mat[row.names(mat) %in% marker_rem, ]
# For each cell (column), count how many markers have 0 reads, then keep only the cells where this count is 0
# (i.e. cells with at least 1 read for every marker in marker_rem).
zm_1 <- as.matrix(zm)[, apply(as.matrix(zm) == 0, 2, sum) == 0, drop = FALSE]
# Then we can filter the Seurat object to contain just these cell (i.e. barcode) IDs
mat3_filt <- subset(mat3_filt, cells = colnames(zm_1))

mat3_filt
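
To see what this filter keeps, the same logic can be run on a small made-up matrix. This is only a sketch: the counts, the cell names and the helper name `cells_expressing_all` are invented for illustration, and a plain base R matrix stands in for the count matrix `mat`.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up)
toy <- matrix(
  c(3, 0, 2, 1,   # P2ry12
    1, 4, 0, 2,   # Tmem119
    5, 5, 5, 5),  # Actb
  nrow = 3, byrow = TRUE,
  dimnames = list(c("P2ry12", "Tmem119", "Actb"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Hypothetical helper: names of cells with non-zero counts for every marker
cells_expressing_all <- function(counts, markers) {
  zm <- counts[rownames(counts) %in% markers, , drop = FALSE]
  # colSums(zm == 0) counts, per cell, how many markers have zero reads
  colnames(zm)[colSums(zm == 0) == 0]
}

kept <- cells_expressing_all(toy, c("P2ry12", "Tmem119"))
kept
```

Here `cell2` and `cell3` are dropped because each has zero reads for one of the two markers, so only `cell1` and `cell4` remain.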

2m. Clustering by gene expression

This section examines clustering by gene expression similarity for each sample. PCA, t-SNE and UMAP plots are used to visualize the gene expression patterns and clusters.

This section is split into two subsections: “i) Choosing the correct resolution”, where you’ll use the clustree tool to identify the optimal number of clusters, and “ii) Calculate the clusters”, where you’ll use the clustree results to generate optimised clusters and then produce 'before and after' plots to visualise how removing the non-target cells changed the data structure.

You can use this section to examine if your cell filtration had a meaningful effect on your data structure. If it didn't, you may want to choose a different set of markers or filtering parameters to filter with.

i) Choosing the correct resolution

A cluster represents a unique group of cells, based on gene expression patterns. But what constitutes 'unique'? When you calculate the clustering (using Seurat's FindClusters() function), it's important to choose a suitable resolution value so that the clusters are biologically meaningful. A lower resolution generates fewer clusters (at the risk of merging two clusters that should be distinct), while a higher resolution generates more clusters (at the risk of falsely splitting a biologically relevant cluster of cells). Every single-cell dataset is different (cell population similarity, sequencing depth, etc.), so the resolution needs to be chosen for each dataset.

The package clustree generates a tree based on multiple resolution scores, which can help you in picking the optimal score.

Read the clustree manual to understand how to interpret the generated tree: https://cran.r-project.org/web/packages/clustree/vignettes/clustree.html

Code Block
#### 2m(i). Choosing the correct resolution ####

# Install the clustree package if it is not already installed, then load it:
if (!requireNamespace("clustree", quietly = TRUE)) install.packages("clustree")
library(clustree)

# Then cluster at a range of resolutions, from 0 to 1 in 0.1 increments ("resolution = seq(0, 1, 0.1)").
mat3_clust <- FindNeighbors(mat3_filt, dims = 1:10)
mat3_clust <- FindClusters(mat3_clust, resolution = seq(0, 1, 0.1), verbose = FALSE)

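# Build a new Seurat object from the counts and the metadata (which now holds one clustering column per resolution), to use as input into "clustree()"
clus_seurat <- CreateSeuratObject(counts = mat3_clust@assays$RNA@counts, meta.data = mat3_clust[[]])
# Note: the reduction stored under the name 'TSNE' (key "tSNE_") holds the PCA embeddings of mat3_clust
clus_seurat[['TSNE']] <- CreateDimReducObject(embeddings = Embeddings(object = mat3_clust, reduction = "pca"), key = "tSNE_")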

# Generate the tree. 
# Refer to the clustree manual for tips on how to use this tree to choose the optimal `resolution`
# https://cran.r-project.org/web/packages/clustree/vignettes/clustree.html
clustree(clus_seurat, prefix = "RNA_snn_res.") + scale_color_manual(values=c25) + scale_edge_color_continuous(low = "blue", high = "red")
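
Alongside the tree, it can help to tabulate how many clusters each resolution produced. The sketch below assumes the clustering columns in the metadata carry the `RNA_snn_res.` prefix used in the clustree() call above. The metadata data frame is invented so that the example runs without Seurat, and `clusters_per_resolution` is a hypothetical helper name.

```r
# Made-up metadata standing in for mat3_clust[[]]: one row per cell,
# one column of cluster labels per resolution
meta <- data.frame(
  nCount_RNA      = c(100, 120, 90, 200, 210, 190),
  RNA_snn_res.0   = factor(c(0, 0, 0, 0, 0, 0)),
  RNA_snn_res.0.5 = factor(c(0, 0, 0, 1, 1, 1)),
  RNA_snn_res.1   = factor(c(0, 0, 2, 1, 1, 1))
)

# Hypothetical helper: number of distinct clusters in each resolution column
clusters_per_resolution <- function(meta, prefix = "RNA_snn_res.") {
  cols <- colnames(meta)[startsWith(colnames(meta), prefix)]
  vapply(meta[cols], function(x) length(unique(x)), integer(1))
}

n_clusters <- clusters_per_resolution(meta)
n_clusters
```

With a real object, the same helper could be called as `clusters_per_resolution(mat3_clust[[]])`. A resolution range over which the cluster count stays stable is often a reasonable place to start reading the tree.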

ii) Calculate the clusters

Code Block
#### 2m(ii). Calculate the clusters ####