...
Installed conda3 or miniconda3 ( https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html )
Basic Unix command line knowledge (examples: https://researchcomputing.princeton.edu/education/external-online-resources/linux ; https://swcarpentry.github.io/shell-novice/ )
Familiarity with a Unix text editor (for example, Vi/Vim or Nano)
Have an HPC account on QUT’s lyraHPC compute. Apply for a new HPC account here.
R tutorials:
...
The nf-core/rnaseq workflow requires Nextflow to be installed in your account on the HPC. Find details on how to install and test Nextflow here. Prepare a nextflow.config file and run a PBS Pro submission script for Nextflow pipelines.
Additional information is available here: https://nf-co.re/usage/installation
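Before moving on, it can help to confirm that the required tools are visible in your session. The following is a minimal sketch, assuming Nextflow and Singularity are expected on your PATH; the helper name `check_tool` is hypothetical and is not part of Nextflow or nf-core.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools used in this guide (the tool list is an assumption about your setup).
for tool in nextflow singularity; do
    check_tool "$tool" || true
done
```

If Nextflow is reported as found, `nextflow -version` prints the installed version.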
...
Usage: https://nf-co.re/rnaseq/3.0/usage
GitHub: https://github.com/nf-core/rnaseq
Pipeline
...
Summary
FastQC - Raw read QC
UMI-tools extract - UMI barcode extraction
TrimGalore - Adapter and quality trimming
SortMeRNA - Removal of ribosomal RNA (optional)
STAR and Salmon - Fast spliced aware genome alignment and transcriptome quantification
STAR via RSEM - Alignment and quantification of expression levels
HISAT2 - Memory efficient splice aware alignment to a reference
SAMtools - Sort and index alignments
UMI-tools dedup - UMI-based deduplication
picard MarkDuplicates - Duplicate read marking
StringTie - Transcript assembly and quantification
BEDTools and bedGraphToBigWig - Create bigWig coverage files
RSeQC - Various RNA-seq QC metrics
Qualimap - Various RNA-seq QC metrics
dupRadar - Assessment of technical/biological read duplication
Preseq - Estimation of library complexity
featureCounts - Read counting relative to gene biotype
DESeq2 - PCA plot and sample pairwise distance heatmap and dendrogram
MultiQC - Present QC for raw reads, alignment, read counting, and sample similarity
Pseudo-alignment and quantification
Salmon - Wicked fast gene and isoform quantification relative to the transcriptome
Workflow reporting and genomes
Reference genome files - Saving reference genome indices/files
Pipeline information - Report metrics generated during the workflow execution
...
Download and run the workflow using the minimal test dataset provided by nf-core/rnaseq. We recommend using singularity as the profile on QUT’s HPC. Another profile option is ‘conda’. Note: the ‘docker’ profile is unavailable on the HPC.
Code Block |
---|
nextflow run nf-core/rnaseq -profile test,singularity --outdir results -r 3.10.1 |
Running the pipeline using custom data
Prepare a sample sheet file that specifies the input files to be used. To do this, we use an nf-core script to generate the ‘samplesheet.csv’ file as follows:
Code Block |
---|
#download script and make executable
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py
#generate the samplesheet.csv file
./fastq_dir_to_samplesheet.py /path/to/directory/containing/fastq_files/ samplesheet.csv \
--strandedness reverse \
--read1_extension R1.fastq.gz \
--read2_extension R2.fastq.gz |
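The generated file can be sanity-checked before launching the pipeline. The sketch below assumes the four-column layout used by nf-core/rnaseq 3.x (`sample,fastq_1,fastq_2,strandedness`). The sample names and paths in the example file are made up for illustration, and `check_samplesheet` is a hypothetical helper, not an nf-core command.

```shell
#!/usr/bin/env bash
# Write a small example samplesheet (sample names and paths are made up).
cat > example_samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
control_rep1,/data/control_rep1_R1.fastq.gz,/data/control_rep1_R2.fastq.gz,reverse
treated_rep1,/data/treated_rep1_R1.fastq.gz,/data/treated_rep1_R2.fastq.gz,reverse
EOF

# Hypothetical helper: check the header, and that every data row has four
# comma-separated fields with a non-empty sample name and first FASTQ path.
check_samplesheet() {
    awk -F',' '
        NR == 1 {
            if ($0 != "sample,fastq_1,fastq_2,strandedness") {
                print "unexpected header: " $0
                bad = 1
            }
            next
        }
        NF != 4 || $1 == "" || $2 == "" {
            print "problem on line " NR ": " $0
            bad = 1
        }
        END {
            if (bad) exit 1
            print "ok: " (NR - 1) " sample rows"
        }
    ' "$1"
}

check_samplesheet example_samplesheet.csv
```

To check your own file, call `check_samplesheet samplesheet.csv`; a non-zero exit status means at least one line needs attention.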
Example of a typical command to run an RNA-seq analysis for mouse samples:
Code Block |
---|
nextflow run nf-core/rnaseq --input samplesheet.csv \
--outdir results \
-r 3.10.1 \
--genome GRCm38 \
-profile singularity \
--aligner star_salmon \
--clip_r1 10 \
--clip_r2 10 \
--three_prime_clip_r1 1 \
--three_prime_clip_r2 1 |
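On the HPC, a Nextflow command of this kind is typically wrapped in a PBS Pro submission script rather than typed at the prompt. The sketch below writes such a script to a file and syntax-checks it. The job name, resource requests, walltime, and the file name `launch_rnaseq.pbs` are placeholders chosen for illustration, not QUT recommendations; adjust them to your project and site.

```shell
#!/usr/bin/env bash
# Write a PBS Pro job script. All resource values below are placeholders.
cat > launch_rnaseq.pbs <<'EOF'
#!/bin/bash -l
#PBS -N nfrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Start in the directory the job was submitted from.
cd "$PBS_O_WORKDIR"

# Load Java/Nextflow here if your site uses environment modules (site-specific).

nextflow run nf-core/rnaseq \
    --input samplesheet.csv \
    --outdir results \
    -r 3.10.1 \
    --genome GRCm38 \
    -profile singularity
EOF

# Syntax-check the script without running it.
bash -n launch_rnaseq.pbs && echo "launch_rnaseq.pbs passed the syntax check"
```

Once the script looks right, it can be submitted with `qsub launch_rnaseq.pbs`.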
Note: if a run was interrupted, a particular step did not complete, or you want to change a parameter for a particular step, you do not need to re-run every process. Adding the ‘-resume’ option lets Nextflow reuse the cached results of completed steps and continue the workflow from there.
Code Block |
---|
nextflow run nf-core/rnaseq --input samplesheet.csv \
--outdir results \
-r 3.10.1 \
--genome GRCm38 \
-profile singularity \
--aligner star_salmon \
--clip_r1 10 \
--clip_r2 10 \
--three_prime_clip_r1 1 \
--three_prime_clip_r2 1 \
-resume |
Preparing a ‘samplesheet.csv’ file
...