Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Table of Contents

...

  1. Open RStudio (you can type it in the Windows search bar)

  2. Create a new R script: ‘File’ → “New File” → “R script”

  3. Save this script where your samples folders are (‘File’ → ‘Save’). These should be on your H or W drive. Save the script file as scrnaseq.R

In the following sections you will be copying and running the R code into your scrnaseq.R script.

Cell Ranger (and nfcore/scrnaseq) generates a default folder and file output structure. There will be a main folder that contains all the sample subfolders (NOTE: this is where you must save your R script). Each sample folder will have an ‘outs’ subfolder. This ‘outs’ folder contains a ‘filtered_feature_bc_matrix’ folder, which contains the files that Seurat uses in its analysis.

...

You can manually set your working directory in RStudio by selecting ‘Session' -> 'Set working directory' -> 'Choose directory'. Choose the same directory as you saved your scrnaseq.R script, previous section. This will output the setwd(...) command with your working directory into the console window (bottom left panel). Copy this command to replace the default setwd(...) line in your R script.

...

There are other dimensionality reduction methods that are more commonly presented in singe cell papers, particularly Uniform Manifold Approximation and Projection (UMAP and t-distributed stochastic neighbour embedding (t-SNE) dimensionality reduction, so here we will generate and visualise dimensionality reduction from all 3 methods (PCA, UMAP, t-SNE).

Code Block
#### 2i. PCA, UMAP and t-SNE plots ####

# First we need to also run UMAP and tSNE dimensionality reduction and add these results to our main Seurat
object.

...

 object.
# As with PCA, these may take several minutes to run
mat3 <- RunUMAP(mat3, dims = 1:3, verbose = F)
mat3 <- RunTSNE(mat3, dims = 1:3, verbose = F)

...



# Now we can plot all 3 results. You can modify the code to alter the appearance of these plots as described in section 2h.

PCA

...




# PCA plot

p <- DimPlot(mat3, reduction = "pca", pt.size = 2, label = TRUE, label.size = 6, label.color = "white", cols = c("firebrick")) +
theme_bw() +
theme(legend.position="none", axis.title=element_text(size=16), axis.text=element_text(size=14))
p

# Export as a TIFF and PDF
tiff_exp <- paste0(sample, "_PCA_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

...


pdf_exp <- paste0(sample, "_PCA_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

...

UMAP

...


# UMAP

p <- DimPlot(mat3, reduction = "umap", pt.size = 2, label = TRUE, label.size = 8, label.color = "black", cols = c("firebrick")) +
theme_bw() +
theme(legend.position="none", axis.title=element_text(size=16), axis.text=element_text(size=14))
p

tiff_exp <- paste0(sample, "_umap_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

...

pdf_exp <- paste0(sample, "_umap_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

...

tSNE

...




# t-SNE

p <- DimPlot(mat3, reduction = "tsne", pt.size = 2, label = TRUE, label.size = 8, label.color = "black", cols = c("firebrick")) +
theme_bw() +
theme(legend.position="none", axis.title=element_text(size=16), axis.text=element_text(size=14))
p

tiff_exp <- paste0(sample, "_tsne_pre_filtration.tiff")
ggsave(file = tiff_exp, dpi = 300, compression = "lzw", device = "tiff", plot = p, width = 20, height = 20, units = "cm")

...


pdf_exp <- paste0(sample, "_tsne_pre_filtration.pdf")
ggsave(file = pdf_exp, device = "pdf", plot = p, width = 20, height = 20, units = "cm")

2j. Remove low quality or outlier cells

...

So in this section we can filter out cells that have very low and very high gene counts.

First you’ll need to visualise the spread of genes and reads using a violin plot and/or scatter plot.

You can then choose to filter out the top and bottom 'outliers' based on these violin and scatter plots, by entering a maximum and minimum nFeature_RNA below.

This max/min number can vary greatly depending on the sequencing depth of your samples and other factors. Use the violin plot to guide your decision.

For example, if your violin plot looked like the below figure, a reasonable nFeature_RNA maximum would be 4000 and a minimum of 200. This would remove outliers.

...

Code Block
#### 2j. Remove low quality or outlier cells ####

# Generate a violin plot.
VlnPlot(mat2, features = c("nFeature_RNA", "nCount_RNA"), ncol = 3)

# And a scatter plot
FeatureScatter(mat2, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

# Use these plot to choose and enter a minimum (nFeature_RNA > xxx) and maxiumum (nFeature_RNA < xxxx) below
mat3_filt <- subset(mat3, subset = nFeature_RNA > 200 & nFeature_RNA < 4000)

2k. Visualise gene expression by marker