...

It is also an R package, so we will be using RStudio (which you installed earlier) to run the analysis script.

2a. Open RStudio and create a new R script

RStudio is a GUI (graphical user interface) for R. It makes navigating R easier.

  1. Open RStudio (for example, by typing "RStudio" in the Windows search bar)

  2. Create a new R script: ‘File’ → ‘New File’ → ‘R Script’

  3. Save this script in the main folder that contains your sample folders (‘File’ → ‘Save’). This folder should be on your H or W drive. Save the script file as scrnaseq.R

In the following sections you will copy each block of R code into your scrnaseq.R script and run it (select the lines and press Ctrl+Enter).

Cell Ranger (and nf-core/scrnaseq) generates a default folder and file output structure. There is a main folder that contains one subfolder per sample (NOTE: this main folder is where you must save your R script). Each sample folder has an ‘outs’ subfolder, and this ‘outs’ folder contains a ‘filtered_feature_bc_matrix’ folder holding the count matrix files that Seurat uses in its analysis.
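The folder layout described above can be checked from R. This is a minimal sketch using only base R: find_samples() is a hypothetical helper (not part of Seurat or Cell Ranger) that lists the sample folders under a main folder which contain outs/filtered_feature_bc_matrix. The demonstration builds a mock layout in a temporary folder; the sample names are only examples.

```r
# Hypothetical helper: return the names of sample folders under `root` that
# contain outs/filtered_feature_bc_matrix (the Cell Ranger layout described above)
find_samples <- function(root = ".") {
  samples <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  has_matrix <- dir.exists(file.path(root, samples, "outs", "filtered_feature_bc_matrix"))
  samples[has_matrix]
}

# Build a mock main folder in R's temporary directory (example names only)
demo_root <- file.path(tempdir(), "count_demo")
dir.create(file.path(demo_root, "Cerebellum", "outs", "filtered_feature_bc_matrix"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_root, "Cortex", "outs", "filtered_feature_bc_matrix"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_root, "scripts"), showWarnings = FALSE)  # not a sample: no outs/ folder

# Only the two sample folders are returned; 'scripts' is skipped
find_samples(demo_root)
```

Once your working directory is set (next section), calling find_samples() with its default root of "." would list your own samples.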

2b. Set your working directory

In R, the working directory is the folder that R reads data files from and writes output files to whenever you give a relative file path. For our purposes, the working directory must be set to the location on the HPC where your scRNA-seq dataset is stored.
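A quick way to see how the working directory affects file paths, using only base R and R's temporary folder (so nothing on your H or W drive is touched). setwd() invisibly returns the previous working directory, which lets you switch back afterwards; the file name wd_demo.txt is just an example.

```r
# setwd() changes the working directory and invisibly returns the previous one
old_wd <- setwd(tempdir())
getwd()                                            # now points at R's temporary folder

# A relative path like this is resolved against the working directory
writeLines("example", "wd_demo.txt")
file.exists(file.path(tempdir(), "wd_demo.txt"))   # the file landed in the temporary folder

# Switch back to the original working directory
setwd(old_wd)
```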

...

You can set your working directory manually in RStudio by selecting ‘Session’ → ‘Set Working Directory’ → ‘Choose Directory…’. Choose the directory where you saved your scrnaseq.R script in the previous section. RStudio then prints the matching setwd(...) command in the Console (bottom-left panel). Copy this command and use it to replace the default setwd(...) line in your R script.

...

Code Block
####  Set your working directory ####

# Change the below to the directory that contains your sample folders (you may have to browse H or W drive to find this)
# **USER INPUT**
setwd("H:/sam_dando/dataset1/count")

# You can see the sample subdirectories by:
list.dirs(full.names = FALSE, recursive = FALSE)
# You should see directories that are named after your samples.
# If you don't see this, browse through your H or W drives to find the correct path for your sample directories.

2c. Installing packages

This step installs all the required packages and their dependencies, and may take 30 minutes or more to complete. R may occasionally prompt you to update packages; enter 'a' (all) when this happens.

Code Block
#### Installing required packages ####

# This section only needs to be run once on a computer.
# Once the packages are installed, they need to be loaded every time they are used (next section)

# Create vectors of required package names
bioconductor_packages <- c("clusterProfiler", "pathview", "AnnotationHub", "org.Mm.eg.db")
cran_packages <- c("Seurat", "patchwork", "ggplot2", "tidyverse", "viridis", "plyr", "readxl", "scales")

# Compare installed packages to the packages above and return vectors of missing packages
installed_packages <- rownames(installed.packages())
new_bioc_packages <- bioconductor_packages[!(bioconductor_packages %in% installed_packages)]
new_cran_packages <- cran_packages[!(cran_packages %in% installed_packages)]

# Install BiocManager if needed, then any missing Bioconductor packages
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
if (length(new_bioc_packages)) BiocManager::install(new_bioc_packages)

# Install missing CRAN packages
if (length(new_cran_packages)) install.packages(new_cran_packages, repos = "https://cloud.r-project.org")

# Update all out-of-date installed packages (CRAN and Bioconductor) to their latest versions
BiocManager::install(update = TRUE, ask = FALSE)
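After the installation step it can help to confirm that every required package is actually present. A minimal sketch using only base R: check_packages() is a hypothetical helper, not part of any package, and it treats a package as installed when system.file() finds its installation folder (system.file() returns an empty string otherwise).

```r
# Hypothetical helper: report whether each package is installed and, if so, its version
check_packages <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1), USE.NAMES = FALSE)
  version <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) as.character(packageVersion(p)) else NA_character_
  }, character(1), USE.NAMES = FALSE)
  data.frame(package = pkgs, installed = installed, version = version)
}

# With the vectors from the installation block you could run:
# check_packages(c(cran_packages, bioconductor_packages))
check_packages(c("stats", "notARealPackage123"))
```

Any row with installed = FALSE points to a package that needs to be installed again before moving on.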

2d. Loading packages

Code Block
#### Loading required packages ####

# This section needs to be run every time you start a new R session

# Load packages; library() stops with an error if a package is missing, so any problem is obvious
bioconductor_packages <- c("clusterProfiler", "pathview", "AnnotationHub", "org.Mm.eg.db")
cran_packages <- c("Seurat", "patchwork", "ggplot2", "tidyverse", "viridis", "plyr", "readxl", "scales")
invisible(lapply(cran_packages, library, character.only = TRUE))
invisible(lapply(bioconductor_packages, library, character.only = TRUE))
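If several packages fail to load, it is easier to fix them when they are all listed at once. A minimal sketch using only base R: load_packages() is a hypothetical helper that relies on require() returning FALSE (rather than stopping) for a missing package, collects those results, and then stops with one message naming every failure.

```r
# Hypothetical helper: attach each package, then stop once with a list of any failures
load_packages <- function(pkgs) {
  ok <- vapply(pkgs, function(p) {
    suppressWarnings(suppressPackageStartupMessages(
      require(p, character.only = TRUE, quietly = TRUE)))
  }, logical(1))
  if (!all(ok)) {
    stop("Could not load: ", paste(pkgs[!ok], collapse = ", "),
         ". Re-run the 'Installing packages' section first.")
  }
  invisible(ok)
}

# Example with two base R packages; in the analysis you would pass
# c(cran_packages, bioconductor_packages)
load_packages(c("stats", "utils"))
```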

 

2e. Select a sample to work with and import the data into R

Code Block
#### Choose a sample to work with and import the data for that sample into R ####

# Enter the name of the sample you want to work with.
## **USER INPUT**
sample <- "Cerebellum"
# To see the available samples:
list.dirs(full.names = FALSE, recursive = FALSE)

# Use Seurat's Read10X() function to read in the sample's count data. Cell Ranger writes 3 files (barcodes, features and matrix) to the sample's 'outs/filtered_feature_bc_matrix' folder, and Read10X() combines them into a single sparse count matrix.
# Note: these datasets can be very large and take several minutes to import into R. They can also use a lot of memory, so make sure your computer is up to the job (i.e. has at least 16GB of RAM)
mat <- Read10X(data.dir = file.path(sample, "outs", "filtered_feature_bc_matrix"))

# Have a look at the top 10 rows and columns to check that the data has been imported correctly. You should see gene names as rows and barcodes (i.e. cells) as columns
as.matrix(mat[1:10, 1:10])

# Now convert this to a Seurat object. Again, this may take several minutes to load and use a lot of memory
mat2 <- CreateSeuratObject(counts = mat, project = sample)

# You can see a summary of the data by simply running the Seurat object name
mat2

# Set a colour palette with enough contrasting colours to tell multiple clusters apart when you plot them.
# You can change these colours as you like. 
# You can see what R colours are available here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
c25 <- c("dodgerblue2", "#E31A1C", "green4",  "#6A3D9A", "#FF7F00", "black", "gold1", "skyblue2", "#FB9A99", "palegreen2", "#CAB2D6", "#FDBF6F", "gray70", "khaki2", "maroon", "orchid1", "deeppink1", "blue1", "steelblue4", "darkturquoise", "green1", "yellow4", "yellow3", "darkorange4", "brown")
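To make concrete what Read10X() does with the three Cell Ranger files, here is a minimal sketch using the Matrix package (which ships with R). It is not Seurat's implementation: read_10x_sketch() is a hypothetical helper, the mock dataset (IDs, gene names, barcodes and counts) is made up, and it assumes uncompressed files named matrix.mtx, features.tsv and barcodes.tsv, whereas recent Cell Ranger versions normally write gzip-compressed .gz files.

```r
library(Matrix)

# Hypothetical helper: combine the three files into one features x barcodes sparse matrix
read_10x_sketch <- function(dir) {
  counts <- readMM(file.path(dir, "matrix.mtx"))          # sparse counts: rows = features, columns = barcodes
  features <- read.delim(file.path(dir, "features.tsv"), header = FALSE,
                         stringsAsFactors = FALSE)        # V1 = gene ID, V2 = gene name, V3 = feature type
  barcodes <- read.delim(file.path(dir, "barcodes.tsv"), header = FALSE,
                         stringsAsFactors = FALSE)        # V1 = one cell barcode per line
  counts <- as(counts, "CsparseMatrix")                   # compressed sparse column format
  dimnames(counts) <- list(make.unique(features$V2),      # gene names as row names
                           barcodes$V1)                   # barcodes as column names
  counts
}

# Build a tiny mock 'filtered_feature_bc_matrix' folder (all values made up)
demo_dir <- file.path(tempdir(), "filtered_feature_bc_matrix_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeMM(sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(5, 2, 7), dims = c(3, 2)),
        file.path(demo_dir, "matrix.mtx"))
write.table(data.frame(c("ID001", "ID002", "ID003"), c("Gad1", "Pcp2", "Calb1"), "Gene Expression"),
            file.path(demo_dir, "features.tsv"), sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
writeLines(c("AAACCTGA-1", "AAACGGGT-1"), file.path(demo_dir, "barcodes.tsv"))

mat_demo <- read_10x_sketch(demo_dir)
as.matrix(mat_demo)
```

The matrix.mtx file only stores the non-zero counts with their row and column positions, which is why the gene and barcode names have to be attached from the other two files.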