Copy and paste the code below to the terminal:
cp $HOME/workshop/2024/rnaseq/data/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run3_RNAseq
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run3_RNAseq
Line 1: Copy the samplesheet.csv file to the working directory.
Line 2: Move to the working directory.
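Before launching the pipeline it can help to confirm that the copied samplesheet looks as expected. The sketch below assumes the four-column header used by nf-core/rnaseq 3.x (sample,fastq_1,fastq_2,strandedness). check_samplesheet is a hypothetical helper name and is not part of the workshop material.

```shell
# Hypothetical helper: verify that a samplesheet exists and starts with the
# header expected by nf-core/rnaseq 3.x (an assumption; check the pipeline docs).
check_samplesheet() {
    local sheet="$1"
    local expected="sample,fastq_1,fastq_2,strandedness"
    if [ ! -f "$sheet" ]; then
        echo "missing: $sheet"
        return 1
    fi
    # Read the first line and strip a trailing carriage return (Windows line endings)
    local header
    header=$(head -n 1 "$sheet" | tr -d '\r')
    if [ "$header" != "$expected" ]; then
        echo "unexpected header: $header"
        return 1
    fi
    # Count data rows (all non-empty lines after the header)
    local n
    n=$(tail -n +2 "$sheet" | grep -c . || true)
    echo "ok: $n data rows"
}
```

From the working directory it could be called as `check_samplesheet samplesheet.csv`.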
Copy the PBS Pro script to run the nf-core/rnaseq pipeline:
cp $HOME/workshop/2024/rnaseq/scripts/launch_nf-core_RNAseq_pipeline.pbs $HOME/workshop/2024-2/session4_RNAseq/runs/run3_RNAseq
Adjusting the Trim Galore (read trimming) options
Print the content of the launch_nf-core_RNAseq_pipeline.pbs script:
cat launch_nf-core_RNAseq_pipeline.pbs
#!/bin/bash -l
#PBS -N nfRNAseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=48:00:00

#work on current directory
cd $PBS_O_WORKDIR

#load java and set up memory settings to run nextflow
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/rnaseq --input samplesheet.csv \
    --outdir results \
    -r 3.14.0 \
    --genome GRCm38-local \
    -profile singularity \
    --aligner star_salmon \
    --extra_trimgalore_args "--quality 30 --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 1 --three_prime_clip_r2 1 "
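The fixed clipping options passed through --extra_trimgalore_args remove a known number of bases from each read: --clip_r1 10 trims 10 bp from the 5' end of read 1 and --three_prime_clip_r1 1 trims 1 bp from the 3' end (the _r2 options do the same for read 2 of paired-end data). The sketch below works through that arithmetic. The 100 bp read length is a hypothetical value chosen for illustration, and quality trimming (--quality 30) and adapter removal can shorten reads further.

```shell
# Worked arithmetic for the fixed clipping options passed to Trim Galore.
# read_length is a hypothetical value chosen for illustration only.
read_length=100
clip_5prime=10    # --clip_r1 10
clip_3prime=1     # --three_prime_clip_r1 1

# Maximum length a read can keep after both fixed clips; quality and adapter
# trimming may shorten it further.
max_remaining=$((read_length - clip_5prime - clip_3prime))
echo "A ${read_length} bp read keeps at most ${max_remaining} bp"
```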
Submitting the job
qsub launch_nf-core_RNAseq_pipeline.pbs
Monitoring the run
qjobs
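Besides qjobs, progress can be followed with standard tools. The sketch below assumes the stock PBS Pro qstat command is available on the cluster and relies on the .nextflow.log file that Nextflow writes in the directory it was launched from. show_progress is a hypothetical helper name.

```shell
# Hypothetical helper: show queue status and the latest Nextflow log lines.
show_progress() {
    local run_dir="${1:-.}"
    # qstat is the standard PBS Pro status command; skip it where it is absent
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "${USER:-$(id -un)}"
    fi
    # Nextflow writes .nextflow.log in the directory it was launched from
    if [ -f "$run_dir/.nextflow.log" ]; then
        tail -n 5 "$run_dir/.nextflow.log"
    else
        echo "no .nextflow.log in $run_dir yet"
    fi
}
```

Called without arguments from the working directory (`show_progress`), it inspects the current run.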
Outputs
The pipeline will produce two folders, one called “work,” where all the processing is done, and another called “results,” where we can find the pipeline's outputs. The content of the results folder is as follows:
/work/training/2024/rnaseq/runs/run3_RNAseq/results/
├── fastqc
│   ├── SRR20622172_fastqc.html
│   ├── SRR20622172_fastqc.zip
│   ├── SRR20622173_fastqc.html
│   ├── SRR20622173_fastqc.zip
│   ├── SRR20622174_fastqc.html
│   ├── SRR20622174_fastqc.zip
│   ├── SRR20622175_fastqc.html
│   ├── SRR20622175_fastqc.zip
│   ├── SRR20622176_fastqc.html
│   ├── SRR20622176_fastqc.zip
│   ├── SRR20622177_fastqc.html
│   ├── SRR20622177_fastqc.zip
│   ├── SRR20622178_fastqc.html
│   ├── SRR20622178_fastqc.zip
│   ├── SRR20622179_fastqc.html
│   ├── SRR20622179_fastqc.zip
│   ├── SRR20622180_fastqc.html
│   └── SRR20622180_fastqc.zip
├── multiqc
│   └── star_salmon
├── pipeline_info
│   ├── execution_report_2024-08-08_12-45-46.html
│   ├── execution_timeline_2024-08-08_12-45-46.html
│   ├── execution_trace_2024-08-08_12-45-46.txt
│   ├── params_2024-08-08_14-01-19.json
│   ├── pipeline_dag_2024-08-08_12-45-46.html
│   └── software_versions.yml
├── star_salmon
│   ├── bigwig
│   ├── deseq2_qc
│   ├── dupradar
│   ├── featurecounts
│   ├── log
│   ├── metadata.xlsx
│   ├── picard_metrics
│   ├── qualimap
│   ├── rseqc
│   ├── salmon.merged.gene_counts_length_scaled.rds
│   ├── salmon.merged.gene_counts_length_scaled.tsv
│   ├── salmon.merged.gene_counts.rds
│   ├── salmon.merged.gene_counts_scaled.rds
│   ├── salmon.merged.gene_counts_scaled.tsv
│   ├── salmon.merged.gene_counts.tsv
│   ├── salmon.merged.gene_lengths.tsv
│   ├── salmon.merged.gene_tpm.tsv
│   ├── salmon.merged.transcript_counts.rds
│   ├── salmon.merged.transcript_counts.tsv
│   ├── salmon.merged.transcript_lengths.tsv
│   ├── salmon.merged.transcript_tpm.tsv
│   ├── samtools_stats
│   ├── SRR20622172
│   ├── SRR20622172.markdup.sorted.bam
│   ├── SRR20622172.markdup.sorted.bam.bai
│   ├── SRR20622173
│   ├── SRR20622173.markdup.sorted.bam
│   ├── SRR20622173.markdup.sorted.bam.bai
│   ├── SRR20622174
│   ├── SRR20622174.markdup.sorted.bam
│   ├── SRR20622174.markdup.sorted.bam.bai
│   ├── SRR20622175
│   ├── SRR20622175.markdup.sorted.bam
│   ├── SRR20622175.markdup.sorted.bam.bai
│   ├── SRR20622176
│   ├── SRR20622176.markdup.sorted.bam
│   ├── SRR20622176.markdup.sorted.bam.bai
│   ├── SRR20622177
│   ├── SRR20622177.markdup.sorted.bam
│   ├── SRR20622177.markdup.sorted.bam.bai
│   ├── SRR20622178
│   ├── SRR20622178.markdup.sorted.bam
│   ├── SRR20622178.markdup.sorted.bam.bai
│   ├── SRR20622179
│   ├── SRR20622179.markdup.sorted.bam
│   ├── SRR20622179.markdup.sorted.bam.bai
│   ├── SRR20622180
│   ├── SRR20622180.markdup.sorted.bam
│   ├── SRR20622180.markdup.sorted.bam.bai
│   ├── stringtie
│   └── tx2gene.tsv
└── trimgalore
    ├── fastqc
    ├── SRR20622172.fastq.gz_trimming_report.txt
    ├── SRR20622173.fastq.gz_trimming_report.txt
    ├── SRR20622174.fastq.gz_trimming_report.txt
    ├── SRR20622175.fastq.gz_trimming_report.txt
    ├── SRR20622176.fastq.gz_trimming_report.txt
    ├── SRR20622177.fastq.gz_trimming_report.txt
    ├── SRR20622178.fastq.gz_trimming_report.txt
    ├── SRR20622179.fastq.gz_trimming_report.txt
    └── SRR20622180.fastq.gz_trimming_report.txt
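In the listing, each sample in star_salmon has a .markdup.sorted.bam file with a matching .bam.bai index. A quick way to confirm that a run finished cleanly is to look for BAM files without an index. The following is a minimal sketch; find_unindexed_bams is a hypothetical helper name.

```shell
# Hypothetical helper: list BAM files in a directory that lack a .bai index.
find_unindexed_bams() {
    local dir="$1"
    local bam
    for bam in "$dir"/*.markdup.sorted.bam; do
        # If the glob matched nothing, the pattern is returned literally; skip it
        [ -e "$bam" ] || continue
        if [ ! -f "$bam.bai" ]; then
            echo "$bam"
        fi
    done
}
```

Running `find_unindexed_bams results/star_salmon` prints nothing when every BAM file has its index.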
The gene-level and transcript-level expression quantification can be found in the ‘star_salmon’ directory.
cd results/star_salmon
The following Salmon count tables are generated:
#gene level expression
salmon.merged.gene_counts_length_scaled.rds
salmon.merged.gene_counts_length_scaled.tsv
salmon.merged.gene_counts.rds
salmon.merged.gene_counts_scaled.rds
salmon.merged.gene_counts_scaled.tsv
salmon.merged.gene_counts.tsv <--- This file will be used for differential expression analysis using DESeq2
salmon.merged.gene_tpm.tsv

#transcript level expression
salmon.merged.transcript_counts.rds
salmon.merged.transcript_counts.tsv
salmon.merged.transcript_tpm.tsv
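Before passing salmon.merged.gene_counts.tsv to DESeq2, it is worth checking its dimensions. The sketch below assumes the table is tab-separated with gene_id and gene_name as the first two columns and one column per sample after that, as in nf-core/rnaseq 3.x output. summarise_counts is a hypothetical helper name.

```shell
# Hypothetical helper: report the number of genes and samples in a merged
# counts table. Assumes a tab-separated file whose first two columns are
# gene_id and gene_name, with one column per sample after that.
summarise_counts() {
    awk -F '\t' '
        NR == 1 { samples = NF - 2 }   # header row: columns beyond the two ID columns
        NR > 1  { genes++ }            # every other row is one gene
        END     { printf "%d genes x %d samples\n", genes, samples }
    ' "$1"
}
```

From results/star_salmon it could be called as `summarise_counts salmon.merged.gene_counts.tsv`. If the column assumption holds, the sample count should match the nine SRR accessions listed in the results folder.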