nf-core/scrnaseq: A single-cell RNA-seq pipeline for 10x Genomics data
Prepared by the eResearch Office, QUT.
This page provides a guide for QUT users on how to install and run the nf-core/scrnaseq Nextflow workflow on the HPC.
Further details on setting up and running the nf-core/scrnaseq workflow, and on its output, can be found here:
https://nf-co.re/scrnaseq/1.1.0
https://nf-co.re/scrnaseq/1.1.0/usage
https://nf-co.re/scrnaseq/1.1.0/parameters
https://nf-co.re/scrnaseq/1.1.0/output
Analysis environment
Nextflow is designed to be run on the Linux command line. This guide assumes the QUT HPC (high performance computing cluster) will be used for this. When running a complex analysis (e.g. a Nextflow workflow) on the HPC, commands are submitted to a job queue using a system called PBS.
If you don’t have an HPC account or haven’t used the HPC or PBS before, follow the link below for details on how to create an HPC account and how to use the PBS job submission system:
This nf-core/scrnaseq guide is designed so that people with even minimal Linux command-line experience can use it. However, some familiarity with using Linux at a command line is recommended. If you have no experience at all using Linux commands, we advise you watch an introductory video or video series on the topic, such as:
Linux Command Line Tutorial For Beginners 1 - Introduction
Install Nextflow
The nf-core/scrnaseq workflow requires Nextflow to be installed in your account on the HPC. Find details on how to install and test Nextflow here, including preparing a nextflow.config file and running a PBS Pro submission script for Nextflow pipelines.
Additional information is available here: Docs: Installation of nf-core dependencies
Test the nf-core/scrnaseq workflow
The nf-core/scrnaseq workflow includes a built-in test dataset. The command below is the basic command to run the nf-core/scrnaseq test.
nextflow run nf-core/scrnaseq --fasta https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa --gtf https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf -profile test,singularity
However, as mentioned, commands should be submitted to the queue rather than copied and pasted directly into the command line. See the section below for how to submit the above command as a PBS job.
Submitting as a PBS job
To create a PBS job script to run the above nf-core/scrnaseq test, follow these steps:
1. Load a text editor, such as nano (copy and paste the following code into the command line)
module load nano
2. Create an empty file called ‘launch.pbs’ using nano
nano launch.pbs
3. Copy and paste the following into nano
4. Save the file and exit. Pressing Ctrl+X exits nano, which first asks whether you want to save.
5. The job can then be submitted with qsub:
qsub launch.pbs
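Step 3 above needs the contents of a job script. The following is only a minimal sketch of what launch.pbs might contain, written to the file with a heredoc as an alternative to pasting into nano. The job name, the CPU, memory and walltime requests, the `module load java` line and the `NXF_OPTS` values are all assumptions for illustration and should be replaced with the values recommended for the QUT HPC. The `nextflow run` command is the test command given earlier in this guide.

```shell
# Write a minimal PBS Pro job script to launch.pbs.
# All resource values below are illustrative assumptions, not QUT recommendations.
cat > launch.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfscrnaseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Assumption: Java is provided as an environment module on the HPC
module load java

# Limit the memory used by the Nextflow head process (assumed values)
export NXF_OPTS='-Xms1g -Xmx4g'

# The test command from this guide, split over lines for readability
nextflow run nf-core/scrnaseq \
  -profile test,singularity \
  --fasta https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa \
  --gtf https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf
EOF

# Check the generated script for shell syntax errors without running it
bash -n launch.pbs
```

The quoted 'EOF' marker keeps `$PBS_O_WORKDIR` and the line-continuation backslashes literal, so they are written to the file unchanged and only interpreted when PBS runs the job.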
Viewing the workflow progress
When submitted as a PBS job, Nextflow splits the analysis up into multiple sub-processes that are run and queued according to available HPC memory and CPUs. Nextflow Tower has been set up on the HPC so you can observe sub-processes as they are submitted, run and completed, as well as a multitude of other details.
To access Nextflow Tower, follow the link below.
This will ask you for an email address. Submit your QUT email address and you will be sent a link to access Tower.
You can now watch the progress of your nf-core/scrnaseq test workflow (or any other Nextflow workflows you are running). When all the sub-processes are complete and the nf-core/scrnaseq run turns green, the test has succeeded.
Full workflow
nextflow run nf-core/scrnaseq -profile test,singularity --aligner kallisto -resume --fasta https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa --gtf https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf
Mouse genome
Mouse GTF
http://ftp.ensembl.org/pub/release-104/gtf/mus_musculus/Mus_musculus.GRCm39.104.chr.gtf.gz
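The Ensembl GTF above is gzip-compressed. As a rough sketch, it could be fetched and unpacked on the HPC as shown below. This assumes `wget` and `gunzip` are available; the variable names and the `fetch_gtf` helper are hypothetical, and this guide does not state whether the workflow requires an uncompressed GTF.

```shell
# URL taken from this guide
GTF_URL="http://ftp.ensembl.org/pub/release-104/gtf/mus_musculus/Mus_musculus.GRCm39.104.chr.gtf.gz"

# Derive the local file names from the URL
GTF_GZ="${GTF_URL##*/}"   # strip everything up to the last slash
GTF="${GTF_GZ%.gz}"       # strip the trailing .gz

# Hypothetical helper: download (resuming a partial download if present),
# then unpack to a new file while keeping the original .gz
fetch_gtf() {
  wget -c "$GTF_URL" && gunzip -c "$GTF_GZ" > "$GTF"
}

echo "Run fetch_gtf to download $GTF_GZ and unpack it to $GTF"
```

After calling `fetch_gtf`, the unpacked file could then be passed to the workflow in place of the test GTF, for example as `--gtf "$GTF"`.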