...

To execute the above command on the HPC cluster, prepare a PBS Pro submission script as follows:

...

Code Block
#!/bin/bash -l
#PBS -N nfrnaseq_QC
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

#load java and set up memory settings for the Nextflow JVM
#(NXF_OPTS must be exported for nextflow to pick it up)
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run nf-core/rnaseq \
      -profile singularity \
      -r 3.12.0 \
      --input samplesheet.csv \
      --outdir results \
      --genome GRCm38-local \
      --skip_trimming \
      --skip_alignment \
      --skip_pseudo_alignment

We recommend first running the nf-core/rnaseq pipeline once in QC-only mode and then inspecting the FastQC reports in the results folder to check whether sequence biases are present at the 5' and 3' ends of the reads; what you find there determines how many bases to clip in the full run. Version 3.12.0 allows running the pipeline for quality assessment only, without any alignment, read counting, or trimming. To enable this, add the following flags to your nextflow run nf-core/rnaseq command: --skip_trimming, --skip_alignment, and --skip_pseudo_alignment.
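
The FastQC summaries can also be scanned from the command line before you open the HTML reports. A bias at the 5' end (common with random-hexamer priming) usually shows up as a WARN or FAIL on the "Per base sequence content" module. The sketch below assumes the standard FastQC zip layout, in which <name>_fastqc/summary.txt lists a status, a module name and a file name per line, separated by tabs, and that the zips sit in results/fastqc; flag_fastqc_modules is a hypothetical helper, not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads FastQC summary.txt content on stdin
# (STATUS<TAB>MODULE<TAB>FILE per line) and prints every module that did not PASS.
flag_fastqc_modules() {
    awk -F'\t' '$1 != "PASS" { printf "%s: %s %s\n", $3, $1, $2 }'
}

# Assumed location of the QC-only run's FastQC zips.
for zip in results/fastqc/*_fastqc.zip; do
    [ -e "$zip" ] || continue               # glob matched nothing: skip
    unzip -p "$zip" '*/summary.txt' | flag_fastqc_modules
done
```

Rows reporting "Per base sequence content" at the start of the reads are the ones that point to a 5' clip.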

...

...

Code Block
#!/bin/bash -l
#PBS -N nfRNAseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=48:00:00

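#work in the directory the job was submitted from
cd "$PBS_O_WORKDIR"

#load java and export memory settings for the Nextflow JVM
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

#full run: trim with Trim Galore, align with STAR and quantify with Salmon.
#--quality 30 trims low-quality 3' ends (Phred < 30); --clip_r1/--clip_r2 10 remove 10 bp
#from the 5' end and --three_prime_clip_r1/--three_prime_clip_r2 1 remove 1 bp from the 3' end
nextflow run nf-core/rnaseq --input samplesheet.csv \
        --outdir results \
        -r 3.12.0 \
        --genome GRCm38-local \
        -profile singularity \
        --aligner star_salmon \
        --extra_trimgalore_args "--quality 30 --clip_r1 10 --clip_r2 10 --three_prime_clip_r1 1 --three_prime_clip_r2 1"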
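
The Trim Galore options are passed as one quoted string, so a typo or stray space inside the quotes is easy to miss. One way to keep the clip lengths chosen from the FastQC reports in a single place is a small helper inside launch.pbs. build_trimgalore_args is a hypothetical name, and the sketch assumes the same clip lengths are applied to read 1 and read 2, as in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assemble the string passed to --extra_trimgalore_args.
#   $1 = Phred cutoff for quality trimming (--quality)
#   $2 = bases removed from the 5' end of R1 and R2
#   $3 = bases removed from the 3' end of R1 and R2
build_trimgalore_args() {
    local quality="$1" clip5="$2" clip3="$3"
    printf '%s' "--quality ${quality} --clip_r1 ${clip5} --clip_r2 ${clip5} --three_prime_clip_r1 ${clip3} --three_prime_clip_r2 ${clip3}"
}

TRIM_ARGS=$(build_trimgalore_args 30 10 1)
# Then, in the nextflow run command:
#   --extra_trimgalore_args "$TRIM_ARGS"
```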

...

Once you have created the folder for the run, the samplesheet.csv file, nextflow.config, and launch.pbs, you are ready to submit.
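
Before submitting, a quick check can catch a missing file or a mistyped FASTQ path, which would otherwise only surface after the job has left the queue. check_run_folder is a hypothetical helper (not part of nf-core or Lyra); it assumes the nf-core/rnaseq samplesheet header sample,fastq_1,fastq_2,strandedness and FASTQ paths that are absolute or relative to the run folder, and it is meant to be run from inside that folder.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, run from the run folder.
# Returns 0 if everything looks fine, 1 otherwise (problems go to stderr).
check_run_folder() {
    local f header missing=0 sample fq1 fq2 strand fq
    for f in samplesheet.csv nextflow.config launch.pbs; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    header=$(head -n 1 samplesheet.csv | tr -d '\r')
    if [ "$header" != "sample,fastq_1,fastq_2,strandedness" ]; then
        echo "unexpected samplesheet header: $header" >&2
        return 1
    fi
    # Check each FASTQ path; fastq_2 may be empty for single-end data.
    while IFS=, read -r sample fq1 fq2 strand || [ -n "$sample" ]; do
        for fq in "$fq1" "$fq2"; do
            if [ -n "$fq" ] && [ ! -e "$fq" ]; then
                echo "$sample: file not found: $fq" >&2
                missing=1
            fi
        done
    done < <(tail -n +2 samplesheet.csv | tr -d '\r')
    return "$missing"
}
```

Usage: check_run_folder && qsub launch.pbs submits only when the check passes.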

Submit the run on Lyra with this command:

Code Block
qsub launch.pbs
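
PBS Pro's qsub prints the new job ID on standard output, so it can be captured for later queries. The submit_run helper below is hypothetical, and last_jobid.txt is an arbitrary file name; the recorded ID can then be passed to qstat -f to see the job's full status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit a PBS script and record the job ID qsub prints.
submit_run() {
    local script="${1:-launch.pbs}"
    local jobid
    jobid=$(qsub "$script") || return 1
    # Keep the ID so you can later run: qstat -f "$(cat last_jobid.txt)"
    echo "$jobid" > last_jobid.txt
    echo "$jobid"
}
```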

...

Code Block
qjobs

to check on the jobs you are running. Nextflow will launch additional jobs during the run, so expect more entries than just the launch.pbs job.

You can also check the .nextflow.log file in the run folder for details on the run's progress.
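
The log is long, so it can help to count just the warnings and errors. This sketch assumes the usual .nextflow.log layout, in which each line has a timestamp, a [thread] name and then the log level (for example: Jun-01 10:00:01.000 [main] WARN  nextflow.Session - ...); count_log_levels is a hypothetical helper, not a Nextflow command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count WARN and ERROR lines in a Nextflow log.
# Matches the level right after the closing "]" of the thread name,
# so thread names containing spaces (e.g. "[Task monitor]") are handled.
count_log_levels() {
    local log="${1:-.nextflow.log}"
    awk '
        match($0, /\] (WARN|ERROR) /) {
            lvl = substr($0, RSTART + 2, RLENGTH - 3)
            n[lvl]++
        }
        END { printf "WARN %d\nERROR %d\n", n["WARN"], n["ERROR"] }
    ' "$log"
}
```

To read the messages themselves, grep -E '\] (WARN|ERROR) ' .nextflow.log | tail -n 20 shows the most recent ones.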

Outputs

The pipeline produces two folders: “work”, where all the intermediate processing is done, and “results”, which contains the pipeline's outputs. The content of the results folder is as follows:

Code Block
results/
├── fastqc
├── multiqc
│   └── star_salmon
├── pipeline_info
├── star_salmon
│   ├── bigwig
│   ├── CD49fmNGFRm_rep1
│   ├── CD49fmNGFRm_rep2
│   ├── CD49fmNGFRm_rep3
│   ├── CD49fpNGFRp_rep1
│   ├── CD49fpNGFRp_rep2
│   ├── CD49fpNGFRp_rep3
│   ├── deseq2_qc
│   ├── dupradar
│   ├── featurecounts
│   ├── log
│   ├── MTEC_rep1
│   ├── MTEC_rep2
│   ├── MTEC_rep3
│   ├── picard_metrics
│   ├── qualimap
│   ├── rseqc
│   ├── samtools_stats
│   └── stringtie
└── trimgalore
    └── fastqc
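
A quick way to confirm that every sample was processed is to compare the sample names in samplesheet.csv with the per-sample folders under results/star_salmon (for example MTEC_rep1). The sketch assumes sample names sit in the first CSV column and that each per-sample folder is named after that column; missing_sample_dirs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print samples from the samplesheet that have no
# folder under the star_salmon output directory.
missing_sample_dirs() {
    local sheet="${1:-samplesheet.csv}" outdir="${2:-results/star_salmon}" s
    # sort -u collapses rows that share a sample name (e.g. resequenced runs)
    tail -n +2 "$sheet" | cut -d, -f1 | tr -d '\r' | sort -u | while read -r s; do
        if [ -n "$s" ] && [ ! -d "$outdir/$s" ]; then
            echo "$s"
        fi
    done
}
```

No output means every sample has a folder.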

Gene- and transcript-level quantifications can be found in the ‘star_salmon’ directory.

Code Block
cd results/star_salmon

The following merged count and TPM tables are generated:

Code Block
#gene level expression
salmon.merged.gene_counts_length_scaled.rds
salmon.merged.gene_counts_length_scaled.tsv
salmon.merged.gene_counts.rds
salmon.merged.gene_counts_scaled.rds
salmon.merged.gene_counts_scaled.tsv
salmon.merged.gene_counts.tsv   <--- This file will be used for differential expression analysis using DESeq2
salmon.merged.gene_tpm.tsv

#transcript level expression
salmon.merged.transcript_counts.rds
salmon.merged.transcript_counts.tsv
salmon.merged.transcript_tpm.tsv
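
DESeq2's DESeqDataSetFromMatrix() expects integer counts, whereas salmon.merged.gene_counts.tsv holds Salmon's estimated counts, which are usually fractional, so the values need rounding before they are loaded. A minimal awk sketch follows, assuming the file's first column is gene_id, the second gene_name and the remaining columns one per sample; round_gene_counts is a hypothetical helper, and rounding in R with round() works just as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write an integer count matrix (gene_id + samples)
# from a Salmon merged gene count table. gene_name (column 2) is dropped so
# gene_id can serve as row names in DESeq2.
# Note: awk's %.0f rounds to the nearest integer; exact halves go to the even neighbour.
round_gene_counts() {
    local in="${1:-salmon.merged.gene_counts.tsv}"
    awk 'BEGIN { FS = OFS = "\t" }
    {
        line = $1
        for (i = 3; i <= NF; i++) {
            v = (NR == 1) ? $i : sprintf("%.0f", $i)   # keep header names as-is
            line = line OFS v
        }
        print line
    }' "$in"
}
```

Usage: round_gene_counts salmon.merged.gene_counts.tsv > gene_counts_int.tsv (the output name is arbitrary).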