Table of Contents |
---|
Aim
...
Demonstrate how to run the nextflow nf-core/rnaseq pipeline in the HPC cluster. Initially, running a test and then processing real-life data.
...
To execute the above command in the HPC cluster, prepare a PBS Pro submission script as follows:
Code Block |
---|
#!/bin/bash -l #PBS -N nfrna2nfrnaseq_test #PBS -l select=1:ncpus=2:mem=4gb #PBS -l walltime=24:00:00 #work on current directory (folder) cd $PBS_O_WORKDIR #load java and set up memory settings to run nextflow module load java NXF_OPTS='-Xms1g -Xmx4g' nextflow run nf-core/rnaseq -profile test,singularity --outdir results -r 3.12.0 |
...
Case 2: Run RNA-seq QC check
Preparing a ‘samplesheet.csv’ file
Prepare a sample sheet file that specifies the input files to be used. To do this, we use an nf-core script to generate the ‘samplesheet.csv’ file as follows (setting strandedness to auto allows the pipeline to determine the strandedness of your RNA-seq data automatically):
...
Code Block |
---|
#!/bin/bash -l #PBS -N nfrnaseq_QC #PBS -l select=1:ncpus=2:mem=4gb #PBS -l walltime=24:00:00 #work on current directory (folder) cd $PBS_O_WORKDIR #load java and set up memory settings to run nextflow module load java NXF_OPTS='-Xms1g -Xmx4g' nextflow run nf-core/rnaseq \ -profile singularity \ -r 3.12.0 \ --input samplesheet.csv \ --outdir results \ --genome GRCh38GRCm38-local \ --skip_trimming \ --skip_alignment \ --skip_pseudo_alignment |
We recommend running the nextflow nf-core/rnaseq pipeline once and then assessing the fastqc results folder to assess if sequence biases are present in the 5'-end and 3'-end ends of the sequences. Version 3.12.0 allows running the pipeline to do quality assessment only, without any alignment, read counting, or trimming. To execute that option, add the following flags to your nextflow run nf-core/rnaseq command: --skip_trimming
, --skip_alignment
, and --skip_pseudo_alignment
.
Submitting the job
Once you have created the folder for the run, the samplesheet.csv file, and launch.pbs, you are ready to submit.
Submit the run with this command
Code Block |
---|
qsub launch.pbs |
Monitoring the Run
You can use the command
Code Block |
---|
qstat -u $USER |
Alternatively, use the command
Code Block |
---|
qjobs |
to check on the jobs, you are running. Nextflow will launch additional jobs during the run.
You can also check the .nextflow.config, and launch.pbs.log file for details on what is going on.
Assessing QC report
Evaluate the nucleotide distributions in the 5'-end and 3'-end of the sequenced reads (Read1 and Read 2). Define how many nucleotides should be trimmed from each region of the sequences. This will inform the parameter setting for Case 3 below.
Case 3: Run RNA-seq pipeline
Adjusting the Trim Galore options
When the initial trimming is done, verify if any more clipping needs to be done and run the nf-core/rnaseq pipeline that will perform all the steps. For example:
Code Block |
---|
#!/bin/bash -l
#PBS -N nfRNAseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=48:00:00
cd $PBS_O_WORKDIR
module load java
NXF_OPTS='-Xms1g -Xmx4g'
nextflow run nf-core/rnaseq --input samplesheet.csv \
--outdir results \
-r 3.12.0 \
--genome GRCm38-local \
-profile singularity \
--aligner star_salmon \
--extra_trimgalore_args "--quality 30 --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 1 --three_prime_clip_r2 1 " |
Submitting the job
Once you have created the folder for the run, the samplesheet.csv file, and ‘launch.pbs', you are ready to submit.
Submit the run with this command (On Lyra)
Code Block |
---|
qsub launch.pbs |
Monitoring the Run
You can use the command
Code Block |
---|
qstat -u $USER |
...
Code Block |
---|
qjobs |
to check on the jobs , you are running. Nextflow will launch additional jobs during the run.
You can also check the .nextflow.log file for details on what is going on.
Outputs
The pipeline will produce two folders, one called “work,” where all the processing is done, and another called “results,” where we can find the pipeline's outputs. The content of the results folder is as follows:
Code Block |
---|
results/
├── fastqc
├── multiqc
│ └── star_salmon
├── pipeline_info
├── star_salmon
│ ├── bigwig
│ ├── CD49fmNGFRm_rep1
│ ├── CD49fmNGFRm_rep2
│ ├── CD49fmNGFRm_rep3
│ ├── CD49fpNGFRp_rep1
│ ├── CD49fpNGFRp_rep2
│ ├── CD49fpNGFRp_rep3
│ ├── deseq2_qc
│ ├── dupradar
│ ├── featurecounts
│ ├── log
│ ├── MTEC_rep1
│ ├── MTEC_rep2
│ ├── MTEC_rep3
│ ├── picard_metrics
│ ├── qualimap
│ ├── rseqc
│ ├── samtools_stats
│ └── stringtie
└── trimgalore
└── fastqc |
The quantification of the gene and transcript expressions can be found in the ‘star_salmon’ directory.
Code Block |
---|
cd results/star_salmon |
The following feature count tables are generated:
Code Block |
---|
#gene level expression salmon.merged.gene_counts_length_scaled.rds salmon.merged.gene_counts_length_scaled.tsv salmon.merged.gene_counts.rds salmon.merged.gene_counts_scaled.rds salmon.merged.gene_counts_scaled.tsv salmon.merged.gene_counts.tsv <--- This file will be used for differential expression analysis using DESeq2 salmon.merged.gene_tpm.tsv #transcript level expression salmon.merged.transcript_counts.rds salmon.merged.transcript_counts.tsv salmon.merged.transcript_tpm.tsv |