
Overview of today’s session:

...

inspect the results from Session 2
run an advanced RNA-seq pipeline to measure the expression of genes
(optional) run statistical analysis to identify differentially expressed genes

Task 1: Evaluation of RNA-seq results using a basic (generic) nextflow pipeline

The nextflow/RNA-seq pipeline automatically generates two output folders:

...

Connect to the work folder via HPC-FS (see Session 2). Browse to the MultiQC output folder, which summarises the FastQC results: run1_star_salmon → results → multiqc.

Task 2: Run the nextflow nf-core/rnaseq pipeline by including advanced filtering parameters

Requirements:

index.csv → a file that provides a list of sample IDs and their associated FASTQ files (read 1 and read 2)
launch.pbs → a script to submit the job to the HPC cluster
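
Writing the rows of index.csv by hand is error-prone when there are many samples. The following is a minimal sketch of how the rows could be generated from a folder of paired FASTQ files. The function name `make_samplesheet` is hypothetical, and the sketch assumes that files are named `<ID>_1.fastq.gz` and `<ID>_2.fastq.gz` and that the file ID is acceptable as the sample ID. It prints data rows only, so the header line still has to be added.

```shell
#!/bin/bash
# make_samplesheet: hypothetical helper, not part of nf-core/rnaseq.
# Prints one CSV row per complete FASTQ pair found in a folder.
# Usage: make_samplesheet FASTQ_DIR [STRANDEDNESS]
make_samplesheet() {
    local dir="$1" strand="${2:-unstranded}" r1 r2 id
    for r1 in "$dir"/*_1.fastq.gz; do
        # Skip the unexpanded pattern when no files match
        [ -e "$r1" ] || continue
        # Derive the read 2 file name from the read 1 file name
        r2="${r1%_1.fastq.gz}_2.fastq.gz"
        # Only emit a row when the read 2 mate is present
        [ -e "$r2" ] || continue
        id="$(basename "$r1" _1.fastq.gz)"
        echo "$id,$r1,$r2,$strand"
    done
}
```

For example, `make_samplesheet /work/kenna_team/data/raw_data >> index.csv` would append the rows to a file that already contains the header line. Sample IDs such as control_r1 cannot be derived from the file names, so they would still need to be edited by hand.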

Example index.csv file for nf-core/rnaseq version 3.3:

Code Block

sample,fastq_1,fastq_2,strandedness
control_r1,/work/kenna_team/data/raw_data/SRR1039508_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039508_2.fastq.gz,unstranded
dex_r1,/work/kenna_team/data/raw_data/SRR1039509_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039509_2.fastq.gz,unstranded
control_r2,/work/kenna_team/data/raw_data/SRR1039512_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039512_2.fastq.gz,unstranded
dex_r2,/work/kenna_team/data/raw_data/SRR1039513_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039513_2.fastq.gz,unstranded
control_r3,/work/kenna_team/data/raw_data/SRR1039516_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039516_2.fastq.gz,unstranded
dex_r3,/work/kenna_team/data/raw_data/SRR1039517_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039517_2.fastq.gz,unstranded
control_r4,/work/kenna_team/data/raw_data/SRR1039520_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039520_2.fastq.gz,unstranded
dex_r4,/work/kenna_team/data/raw_data/SRR1039521_1.fastq.gz,/work/kenna_team/data/raw_data/SRR1039521_2.fastq.gz,unstranded
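
A wrong path in index.csv is typically only discovered after the job has left the queue, so it may be worth checking the file before submitting. The following is a minimal sketch, assuming a four-column sheet with one header row as in the example above. The function name `check_samplesheet` is hypothetical and is not a command provided by the pipeline.

```shell
#!/bin/bash
# check_samplesheet: hypothetical helper, not part of nf-core/rnaseq.
# Prints a message for every problem found and returns non-zero if any exist.
# Usage: check_samplesheet index.csv
check_samplesheet() {
    local sheet="$1" status=0 line_no=0
    local id fastq_1 fastq_2 strandedness extra fq
    # The "|| [ -n ... ]" part also processes a last line without a newline
    while IFS=, read -r id fastq_1 fastq_2 strandedness extra || [ -n "$id" ]; do
        line_no=$((line_no + 1))
        # Skip the header row
        [ "$line_no" -eq 1 ] && continue
        # Each data row needs exactly four comma-separated fields
        if [ -z "$strandedness" ] || [ -n "$extra" ]; then
            echo "line $line_no: expected 4 columns"
            status=1
            continue
        fi
        # Both FASTQ files must exist and be readable
        for fq in "$fastq_1" "$fastq_2"; do
            if [ ! -r "$fq" ]; then
                echo "line $line_no: cannot read $fq"
                status=1
            fi
        done
    done < "$sheet"
    return "$status"
}
```

One way to use it is `check_samplesheet index.csv && qsub launch.pbs`, so that the job is only submitted when every listed file can be read.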

Example of a launch.pbs script with advanced parameter options:

Code Block

#!/bin/bash -l
#PBS -N nfrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

# Use the current directory to run the workflow
cd "$PBS_O_WORKDIR"

module load java
# Export the JVM memory settings so that Nextflow can read them
export NXF_OPTS='-Xms1g -Xmx4g'

# Run the Nextflow RNA-seq pipeline
nextflow run nf-core/rnaseq \
    -profile singularity \
    -r 3.3 \
    --aligner star_salmon \
    --input index.csv \
    --genome GRCh38 \
    --clip_r1 10 --clip_r2 10 \
    --three_prime_clip_r1 2 --three_prime_clip_r2 2 \
    --save_trimmed \
    -resume
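
To make the effect of the clipping options concrete, the sketch below removes a fixed number of bases from each end of a single read, using the same values as the command above (10 from the 5' end, 2 from the 3' end). It is only an illustration: the function name `clip_read` is hypothetical, and in the pipeline the trimming step is handled by the trimming tool, which also performs adapter and quality trimming that this sketch ignores.

```shell
#!/bin/bash
# clip_read: hypothetical illustration only, not the pipeline's implementation.
# Usage: clip_read SEQUENCE FIVE_PRIME_BASES THREE_PRIME_BASES
clip_read() {
    local seq="$1" five="$2" three="$3"
    # Number of bases that remain after clipping both ends
    local keep=$(( ${#seq} - five - three ))
    # A read no longer than the total clip length has nothing left
    if [ "$keep" -le 0 ]; then
        echo ""
        return 0
    fi
    # Bash substring expansion: skip "five" bases, then keep "keep" bases
    echo "${seq:five:keep}"
}

# A 20-base read: 10 bases clipped from the 5' end and 2 from the 3' end
clip_read "AAAAAAAAAACCCCGGGGTT" 10 2   # prints CCCCGGGG
```

Because --save_trimmed is set in the command above, the trimmed FASTQ files are kept in the results, which makes it possible to compare read lengths before and after trimming.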

Session 3 exercises:

Run the nf-core/rnaseq pipeline using the Airway smooth muscle public data (PMID: 24926665. GEO: GSE52778) - aligner option set to ‘star_salmon’
Same as above but aligner option set to ‘star_rsem’

Create a new working folder:

Code Block
mkdir session3
cd session3
mkdir run1_star_salmon
cd run1_star_salmon
cd ..

Copy index.csv and launch.pbs files to the newly created folder

Code Block
cp /work/kenna_team/scripts/star_salmon/session3/* .

Check that files were copied into the new working folder

Code Block
ls -a
# expected listing: ./ ../ index.csv launch.pbs
# verify the content of index.csv
cat index.csv
# also check the PBS Pro submission script
cat launch.pbs

Run the workflow:

Code Block
qsub launch.pbs

Monitor the progress of the workflow:

Code Block
qjobs

or

Code Block
qstat -u userID

Repeat the above process for ‘star_rsem’.

The only difference is the location from which the index.csv and launch.pbs files are copied:

Code Block
cp /work/kenna_team/scripts/star_rsem/session3/* .
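
If the prepared star_rsem files are not available, an equivalent launch script could in principle be derived from the star_salmon one. The sketch below assumes that the two launch.pbs scripts differ only in the value passed to `--aligner`, which this page does not confirm. The function name `switch_aligner` is hypothetical.

```shell
#!/bin/bash
# switch_aligner: hypothetical helper, not part of nf-core/rnaseq.
# Prints a copy of a launch script with the --aligner value replaced.
# Usage: switch_aligner LAUNCH_SCRIPT NEW_ALIGNER
switch_aligner() {
    local script="$1" aligner="$2"
    # Replace only the word that follows "--aligner"; other text is unchanged
    sed -E "s/(--aligner[[:space:]]+)[A-Za-z0-9_]+/\1${aligner}/" "$script"
}
```

For example, `switch_aligner launch.pbs star_rsem > launch_rsem.pbs` writes the modified script to a new file and leaves the original untouched. Comparing the two files with `diff` is a quick way to confirm that only the aligner changed.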

Version	Old Version 3	New Version 4
Changes made by	Roberto Barrero Gumiel	Roberto Barrero Gumiel
Saved on	Oct 07, 2021	Oct 07, 2021
