5. RNA-seq pipeline

Overview

  • Use a modified launch script to run the full pipeline, with read-trimming parameters set based on the QC output.

  • Inspect the precomputed results.

Run the full nf-core/rnaseq pipeline

STEP 1: Copy the metadata file (samplesheet.csv) into the working folder (run2_RNAseq)

cp $HOME/workshop/2024-2/session4_RNAseq/data/mouse/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
  • Line 1: Copy the samplesheet.csv file to the working directory

  • Line 2: Move to the working directory
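Before launching, it can be worth sanity-checking the copied sample sheet. The sketch below assumes the sheet follows the standard nf-core/rnaseq layout with the header sample,fastq_1,fastq_2,strandedness; check_samplesheet is a hypothetical helper, not part of the workshop scripts. It counts the data rows and warns about any FASTQ path that does not exist (relative paths are resolved against the current directory).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the workshop materials.
# Assumes the standard nf-core/rnaseq header: sample,fastq_1,fastq_2,strandedness
check_samplesheet() {
  local sheet="$1" expected="sample,fastq_1,fastq_2,strandedness" header
  # Read the header line and strip any Windows carriage return
  header=$(head -n 1 "$sheet" | tr -d '\r')
  if [ "$header" != "$expected" ]; then
    echo "ERROR: unexpected header: $header" >&2
    return 1
  fi
  # Walk the data rows; fastq_2 is left empty for single-end samples
  tail -n +2 "$sheet" | tr -d '\r' | {
    rows=0
    missing=0
    # 'strand' captures the 4th field so fastq_2 does not swallow it;
    # the || test also processes a final line that lacks a newline
    while IFS=, read -r sample fq1 fq2 strand || [ -n "$sample" ]; do
      [ -z "$sample" ] && continue
      rows=$((rows + 1))
      for fq in "$fq1" "$fq2"; do
        if [ -n "$fq" ] && [ ! -e "$fq" ]; then
          echo "WARNING: sample $sample: file not found: $fq" >&2
          missing=$((missing + 1))
        fi
      done
    done
    echo "rows=$rows missing=$missing"
    # Non-zero exit status if any FASTQ path is missing
    [ "$missing" -eq 0 ]
  }
}
```

Usage, from the working directory:

check_samplesheet samplesheet.csv

If the workshop sheet uses a different header (for example, one written for an older pipeline release), adjust the expected value to match rather than editing the sheet.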

Copy the PBS Pro script to run the nf-core/rnaseq pipeline:

cp $HOME/workshop/2024-2/session4_RNAseq/scripts/launch_nf-core_RNAseq_pipeline.pbs $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq

NOTE: If you had issues with the commands above, run the following instead to copy the samplesheet.csv and launch script files:

cp /work/training/2024/rnaseq/data/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cp /work/training/2024/rnaseq/scripts/launch_nf-core_RNAseq_pipeline.pbs $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
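The "try the workshop copy, otherwise use the shared copy" logic can also be scripted so the fallback happens automatically. This is a minimal sketch; copy_first_available is a hypothetical helper name, and the source directories passed to it are simply the two locations used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy NAME from the first SOURCE directory that has it.
# Usage: copy_first_available NAME DEST_DIR SOURCE_DIR [SOURCE_DIR ...]
copy_first_available() {
  local name="$1" dest="$2" src
  shift 2
  for src in "$@"; do
    if [ -f "$src/$name" ]; then
      # Stop with an error if the copy itself fails (e.g. DEST_DIR missing)
      cp "$src/$name" "$dest"/ || return 1
      echo "copied $name from $src"
      return 0
    fi
  done
  echo "ERROR: $name not found in: $*" >&2
  return 1
}
```

Usage with the paths from this section:

RUN_DIR=$HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
copy_first_available samplesheet.csv "$RUN_DIR" $HOME/workshop/2024-2/session4_RNAseq/data/mouse /work/training/2024/rnaseq/data
copy_first_available launch_nf-core_RNAseq_pipeline.pbs "$RUN_DIR" $HOME/workshop/2024-2/session4_RNAseq/scripts /work/training/2024/rnaseq/scripts
cd "$RUN_DIR"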

Adjusting the Trim Galore (read trimming) options

Print the contents of the launch_nf-core_RNAseq_pipeline.pbs script:

cat launch_nf-core_RNAseq_pipeline.pbs
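The trimming settings sit among the other pipeline options in the launch script, so a targeted search can be quicker than reading the whole file. The sketch below is a hypothetical helper; the search pattern 'trim|clip' is an assumption chosen to catch Trim Galore flags such as --clip_R1 and --three_prime_clip_R1 as well as pipeline options such as --skip_trimming.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list lines of a launch script that mention trimming.
show_trim_settings() {
  local script="$1"
  # -n line numbers, -i case-insensitive, -E extended regex
  grep -n -i -E 'trim|clip' "$script" || echo "no trimming options found in $script"
}
```

Usage:

show_trim_settings launch_nf-core_RNAseq_pipeline.pbs

How extra Trim Galore flags reach the tool depends on the pipeline release pinned in the script (recent nf-core/rnaseq releases expose an --extra_trimgalore_args parameter), so check the parameter documentation for that version before editing.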

Submitting the job

qsub launch_nf-core_RNAseq_pipeline.pbs

Monitoring the run

qjobs
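qjobs is not a standard PBS Pro command and is presumably provided by the cluster; on a plain PBS Pro system, qstat -u $USER lists your jobs. A full pipeline run can take hours, so the minimal sketch below polls qstat until the job leaves the queue. wait_for_job is a hypothetical helper, and it assumes qstat exits with a non-zero status once a job ID is no longer active, which is the usual PBS Pro behaviour for finished jobs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until a PBS job has left the queue.
# Assumes qstat exits non-zero once the job ID is no longer active.
wait_for_job() {
  local jobid="$1" interval="${2:-60}"
  # Poll every 'interval' seconds (default 60) while qstat still knows the job
  while qstat "$jobid" > /dev/null 2>&1; do
    sleep "$interval"
  done
  echo "job $jobid is no longer in the queue"
}
```

Usage, capturing the job ID that qsub prints when submitting:

JOBID=$(qsub launch_nf-core_RNAseq_pipeline.pbs)
wait_for_job "$JOBID" 300

On PBS Pro, qstat -x "$JOBID" can then show the record of the finished job.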

Outputs

The pipeline produces two folders: “work”, where all the intermediate processing happens, and “results”, which holds the pipeline's final outputs. The contents of the results folder from a precomputed run (run3_RNAseq) are as follows:

/work/training/2024/rnaseq/runs/run3_RNAseq/results/
├── fastqc
│   ├── SRR20622172_fastqc.html
│   ├── SRR20622172_fastqc.zip
│   ├── SRR20622173_fastqc.html
│   ├── SRR20622173_fastqc.zip
│   ├── SRR20622174_fastqc.html
│   ├── SRR20622174_fastqc.zip
│   ├── SRR20622175_fastqc.html
│   ├── SRR20622175_fastqc.zip
│   ├── SRR20622176_fastqc.html
│   ├── SRR20622176_fastqc.zip
│   ├── SRR20622177_fastqc.html
│   ├── SRR20622177_fastqc.zip
│   ├── SRR20622178_fastqc.html
│   ├── SRR20622178_fastqc.zip
│   ├── SRR20622179_fastqc.html
│   ├── SRR20622179_fastqc.zip
│   ├── SRR20622180_fastqc.html
│   └── SRR20622180_fastqc.zip
├── multiqc
│   └── star_salmon
├── pipeline_info
│   ├── execution_report_2024-08-08_12-45-46.html
│   ├── execution_timeline_2024-08-08_12-45-46.html
│   ├── execution_trace_2024-08-08_12-45-46.txt
│   ├── params_2024-08-08_14-01-19.json
│   ├── pipeline_dag_2024-08-08_12-45-46.html
│   └── software_versions.yml
├── star_salmon
│   ├── bigwig
│   ├── deseq2_qc
│   ├── dupradar
│   ├── featurecounts
│   ├── log
│   ├── metadata.xlsx
│   ├── picard_metrics
│   ├── qualimap
│   ├── rseqc
│   ├── salmon.merged.gene_counts_length_scaled.rds
│   ├── salmon.merged.gene_counts_length_scaled.tsv
│   ├── salmon.merged.gene_counts.rds
│   ├── salmon.merged.gene_counts_scaled.rds
│   ├── salmon.merged.gene_counts_scaled.tsv
│   ├── salmon.merged.gene_counts.tsv
│   ├── salmon.merged.gene_lengths.tsv
│   ├── salmon.merged.gene_tpm.tsv
│   ├── salmon.merged.transcript_counts.rds
│   ├── salmon.merged.transcript_counts.tsv
│   ├── salmon.merged.transcript_lengths.tsv
│   ├── salmon.merged.transcript_tpm.tsv
│   ├── samtools_stats
│   ├── SRR20622172
│   ├── SRR20622172.markdup.sorted.bam
│   ├── SRR20622172.markdup.sorted.bam.bai
│   ├── SRR20622173
│   ├── SRR20622173.markdup.sorted.bam
│   ├── SRR20622173.markdup.sorted.bam.bai
│   ├── SRR20622174
│   ├── SRR20622174.markdup.sorted.bam
│   ├── SRR20622174.markdup.sorted.bam.bai
│   ├── SRR20622175
│   ├── SRR20622175.markdup.sorted.bam
│   ├── SRR20622175.markdup.sorted.bam.bai
│   ├── SRR20622176
│   ├── SRR20622176.markdup.sorted.bam
│   ├── SRR20622176.markdup.sorted.bam.bai
│   ├── SRR20622177
│   ├── SRR20622177.markdup.sorted.bam
│   ├── SRR20622177.markdup.sorted.bam.bai
│   ├── SRR20622178
│   ├── SRR20622178.markdup.sorted.bam
│   ├── SRR20622178.markdup.sorted.bam.bai
│   ├── SRR20622179
│   ├── SRR20622179.markdup.sorted.bam
│   ├── SRR20622179.markdup.sorted.bam.bai
│   ├── SRR20622180
│   ├── SRR20622180.markdup.sorted.bam
│   ├── SRR20622180.markdup.sorted.bam.bai
│   ├── stringtie
│   └── tx2gene.tsv
└── trimgalore
    ├── fastqc
    ├── SRR20622172.fastq.gz_trimming_report.txt
    ├── SRR20622173.fastq.gz_trimming_report.txt
    ├── SRR20622174.fastq.gz_trimming_report.txt
    ├── SRR20622175.fastq.gz_trimming_report.txt
    ├── SRR20622176.fastq.gz_trimming_report.txt
    ├── SRR20622177.fastq.gz_trimming_report.txt
    ├── SRR20622178.fastq.gz_trimming_report.txt
    ├── SRR20622179.fastq.gz_trimming_report.txt
    └── SRR20622180.fastq.gz_trimming_report.txt
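A quick completeness check after a run is to confirm that every sample in the sample sheet produced a BAM file and its index. This is a minimal sketch: check_bams is a hypothetical helper, and it assumes the sample names in column 1 of the sheet match the BAM file prefixes and that the BAMs follow the <sample>.markdup.sorted.bam naming seen in the star_salmon listing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm each sample in the sample sheet has a
# non-empty BAM and BAM index under RESULTS/star_salmon.
# Assumes BAMs are named <sample>.markdup.sorted.bam
check_bams() {
  local results="$1" sheet="$2" missing=0 sample f
  # Unique sample names from column 1 (header skipped, CRs stripped)
  for sample in $(tail -n +2 "$sheet" | tr -d '\r' | cut -d, -f1 | sort -u); do
    for f in "$results/star_salmon/$sample.markdup.sorted.bam" \
             "$results/star_salmon/$sample.markdup.sorted.bam.bai"; do
      # -s is true only for files that exist and are non-empty
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        missing=$((missing + 1))
      fi
    done
  done
  echo "missing=$missing"
  [ "$missing" -eq 0 ]
}
```

Usage, from the run2_RNAseq working directory once the job has finished:

check_bams results samplesheet.csv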

Gene- and transcript-level expression quantifications are stored in the ‘star_salmon’ directory:

cd results/star_salmon

The following Salmon expression tables (counts and TPM) are generated:

#gene level expression
salmon.merged.gene_counts_length_scaled.rds
salmon.merged.gene_counts_length_scaled.tsv
salmon.merged.gene_counts.rds
salmon.merged.gene_counts_scaled.rds
salmon.merged.gene_counts_scaled.tsv
salmon.merged.gene_counts.tsv   <--- This file will be used for differential expression analysis using DESeq2
salmon.merged.gene_tpm.tsv

#transcript level expression
salmon.merged.transcript_counts.rds
salmon.merged.transcript_counts.tsv
salmon.merged.transcript_tpm.tsv
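Before loading salmon.merged.gene_counts.tsv into DESeq2, a quick summary of its size and per-sample totals (an approximate library size) can catch an empty or failed sample. This is a sketch under assumptions: count_summary is a hypothetical helper, and it assumes a tab-separated table whose first two columns are identifiers (gene_id and gene_name in recent nf-core/rnaseq releases) followed by one column per sample. Check the header with head -n 1 first and pass a different number of identifier columns as the second argument if needed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise a merged Salmon count table.
# Assumes tab-separated input whose first NID columns are identifiers
# (default 2: gene_id and gene_name) followed by one column per sample.
count_summary() {
  local tsv="$1" nid="${2:-2}"
  awk -F'\t' -v nid="$nid" '
    NR == 1 {
      # Remember sample names from the header row
      for (i = nid + 1; i <= NF; i++) name[i] = $i
      ncol = NF
      next
    }
    {
      # Accumulate per-sample totals (Salmon counts can be fractional)
      genes++
      for (i = nid + 1; i <= NF; i++) total[i] += $i
    }
    END {
      printf "genes=%d samples=%d\n", genes, ncol - nid
      # One line per sample: name, then total rounded to an integer
      for (i = nid + 1; i <= ncol; i++) printf "%s\t%.0f\n", name[i], total[i]
    }' "$tsv"
}
```

Usage, from results/star_salmon:

count_summary salmon.merged.gene_counts.tsv

A sample whose total is far below the others is worth investigating in the MultiQC report before running the differential expression analysis.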