
...

Use the terminal to log into the HPC and create a /small_RNAseq/ folder to run the nf-core/smrnaseq pipeline. For example:

Code Block
mkdir -p $HOME/workshop/small_RNAseq/scripts
cp /work/training/2024/smallRNAseq/scripts/* $HOME/workshop/small_RNAseq/scripts/
ls -l $HOME/workshop/small_RNAseq/scripts/
  • Line 1: The -p option creates parent directories as required. Thus the line 1 command creates /workshop/, /workshop/small_RNAseq/ and the subfolder /workshop/small_RNAseq/scripts/ in one step

  • Line 2: Copies all files (as noted by the asterisk) from /work/training/2024/smallRNAseq/scripts/ to the newly created folder $HOME/workshop/small_RNAseq/scripts/

  • Line 3: Lists the files in the scripts folder
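
The behaviour of -p can be tried safely outside your home folder. The following is a minimal sketch that works in a throwaway directory; the variable names demo and plain_mkdir are illustrative only and are not part of the workshop scripts.

```shell
# Work in a temporary directory so nothing in $HOME is touched (illustrative only)
demo=$(mktemp -d)

# -p creates every missing parent folder in one go
mkdir -p "$demo/workshop/small_RNAseq/scripts"

# Running it again is harmless: -p does not complain if the folder already exists
mkdir -p "$demo/workshop/small_RNAseq/scripts"

# Without -p, mkdir fails when a parent folder is missing
if mkdir "$demo/missing_parent/child" 2>/dev/null; then
    plain_mkdir="succeeded"
else
    plain_mkdir="failed"
fi
echo "mkdir without -p: $plain_mkdir"
```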

Copy multiple subdirectories and files using rsync

Code Block
mkdir -p $HOME/workshop/small_RNAseq/data/
rsync -rv /work/training/2024/smallRNAseq/data/ $HOME/workshop/small_RNAseq/data/
  • Line 1: The first command creates the folder /data/

  • Line 2: rsync copies all subfolders and files from the specified source folder to the selected destination folder. -r copies directories recursively; -v prints verbose messages as files are transferred
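
After a recursive copy it can be useful to confirm that nothing was left behind. The sketch below is one possible check, run on throwaway folders rather than the workshop paths; src, dst and the file names are made up for illustration. The trailing slash on the source means "copy the contents of this folder", which is why the rsync command above ends its source path with /.

```shell
# Throwaway source and destination folders (stand-ins for the workshop paths)
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/human" "$src/mirbase"
touch "$src/human/sample1.fastq.gz" "$src/mirbase/hairpin.fa"

# Copy recursively; fall back to cp if rsync is not installed on this machine
if command -v rsync >/dev/null 2>&1; then
    rsync -r "$src/" "$dst/"
else
    cp -r "$src/." "$dst/"
fi

# Compare the number of files on each side
src_count=$(find "$src" -type f | wc -l | tr -d ' ')
dst_count=$(find "$dst" -type f | wc -l | tr -d ' ')
echo "source: $src_count files, destination: $dst_count files"
```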

Create a folder for running the nf-core small RNA-seq pipeline

...

Code Block
mkdir -p $HOME/workshop/small_RNAseq
mkdir $HOME/workshop/small_RNAseq/run1_test
mkdir -p $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/
  • Lines 1-3: create sub-folders for each exercise

  • Line 4: change the directory to the folder “small_RNAseq”

Exercise 1: Running a test with nf-core sample data

First, let’s assess the execution of the nf-core/smrnaseq pipeline by running a test using sample data.

...

Code Block
cd $HOME/workshop/small_RNAseq/run1_test
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_human_test.pbs .

View the content of the script as follows:

Code Block
cat launch_nf-core_smallRNAseq_human_test.pbs

#!/bin/bash -l
#PBS -N nfsRNAseq_test
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#PBS -m abe

#work in current directory (folder)
cd $PBS_O_WORKDIR

#load java and set up memory settings to run nextflow
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

# run the test
nextflow run nf-core/smrnaseq -r 2.3.1 \
        -profile test,singularity \
        --outdir results

where:

  • nextflow command: nextflow run

  • pipeline name: nf-core/smrnaseq

  • pipeline version: -r 2.3.1

  • container type and sample data: -profile test,singularity

  • output directory: --outdir results
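
To see how these pieces fit together, the command can be assembled from variables and printed without running it. This is only a sketch: the variable names are illustrative, and the values are the ones listed above.

```shell
# Values from the list above; the variable names are illustrative
pipeline="nf-core/smrnaseq"
version="2.3.1"
profile="test,singularity"
outdir="results"

# Assemble and print the command without running it
cmd="nextflow run $pipeline -r $version -profile $profile --outdir $outdir"
echo "$cmd"
```

Keeping the version in a single variable makes it harder for the script and its documentation to drift apart when the pipeline is upgraded.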

Submitting the job

Now we can submit the small RNAseq test job to the HPC scheduler:

Code Block
qsub launch_nf-core_smallRNAseq_human_test.pbs

Monitoring the Run

Code Block
qjobs

Exercise 2: Running the small RNA pipeline using public human data

The pipeline requires preparing at least 2 files:

  • Metadata file (samplesheet.csv) that specifies the “sample name” and “location of FASTQ files” ('Read 1').

  • PBS Pro script (launch_nf-core_smallRNAseq_human.pbs) with instructions to run the pipeline

...

How to create the metadata file (samplesheet.csv):

Change to the data folder directory:

...

Code Block
cp /work/training/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/small_RNAseq/data/human
  • Note: you could replace ‘$HOME/workshop/small_RNAseq/data/human’ with “.”. A dot indicates ‘current directory’ and will copy the file to the directory where you are currently located

View the content of the script:

Code Block
cat create_nf-core_smallRNAseq_samplesheet.sh

#!/bin/bash -l

#User defined variables.
##########################################################
#double quotes so that $HOME is expanded to your home folder
DIR="$HOME/workshop/small_RNAseq/data/human"
INDEX='samplesheet.csv'
##########################################################

#load python module
module load python/3.10.8-gcccore-12.2.0

#fetch the script to create the sample metadata table
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py

#generate initial sample metadata file
./fastq_dir_to_samplesheet.py  $DIR index.csv \
        --strandedness auto \
        --read1_extension .fastq.gz

#format index file: keep only the first two columns (sample name and Read 1 path)
cat index.csv | awk -F "," '{print $1 "," $2}' > ${INDEX}

#Remove intermediate files:
rm index.csv fastq_dir_to_samplesheet.py
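
The awk step in the script keeps only the first two comma-separated columns. Below is a small sketch of that step on made-up input. It assumes the intermediate index.csv has the columns sample, fastq_1, fastq_2 and strandedness; that layout is an assumption about the helper script's output and is not shown on this page.

```shell
# Made-up two-line index.csv for illustration (column layout is an assumption)
demo_index=$(mktemp)
printf '%s\n' \
    'sample,fastq_1,fastq_2,strandedness' \
    'ERR409878,/path/to/ERR409878.fastq.gz,,auto' > "$demo_index"

# -F "," sets the field separator; $1 and $2 are the first two columns
trimmed=$(awk -F "," '{print $1 "," $2}' "$demo_index")
echo "$trimmed"
```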

Let’s generate the metadata file by running the following command:

Code Block
sh create_nf-core_smallRNAseq_samplesheet.sh

Check the newly created samplesheet.csv file:

Code Block
ls -l
cat samplesheet.csv

sample,fastq_1
ERR409878,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409878.fastq.gz
ERR409879,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409879.fastq.gz
ERR409880,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409880.fastq.gz
ERR409881,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409881.fastq.gz
ERR409882,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409882.fastq.gz
ERR409883,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409883.fastq.gz
ERR409884,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409884.fastq.gz
ERR409885,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409885.fastq.gz
ERR409886,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409886.fastq.gz
ERR409887,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409887.fastq.gz
ERR409888,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409888.fastq.gz
ERR409889,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409889.fastq.gz
ERR409890,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409890.fastq.gz
ERR409891,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409891.fastq.gz
ERR409892,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409892.fastq.gz
ERR409893,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409893.fastq.gz
ERR409894,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409894.fastq.gz
ERR409895,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409895.fastq.gz
ERR409896,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409896.fastq.gz
ERR409897,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409897.fastq.gz
ERR409898,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409898.fastq.gz
ERR409899,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409899.fastq.gz
ERR409900,/home/thomsonv/workshop/small_RNAseq/data/human/ERR409900.fastq.gz
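
Before launching the pipeline it may be worth confirming that the samplesheet has the expected header and that every listed FASTQ file exists. The following is a minimal sketch of such a check. check_samplesheet is a hypothetical helper written for this page, not part of nf-core, and the demonstration at the end uses a throwaway folder instead of the real data.

```shell
# check_samplesheet is a hypothetical helper, not part of nf-core.
# It returns 0 only if the header is "sample,fastq_1" and every listed file exists.
check_samplesheet() {
    sheet="$1"
    missing=0
    {
        # The first line must be the two-column header
        IFS= read -r header
        if [ "$header" != "sample,fastq_1" ]; then
            echo "unexpected header: $header"
            return 1
        fi
        # Remaining lines: sample name, then path to the Read 1 FASTQ file
        while IFS=, read -r sample fastq || [ -n "$sample" ]; do
            [ -z "$sample" ] && continue
            if [ ! -f "$fastq" ]; then
                echo "missing file for $sample: $fastq"
                missing=$((missing + 1))
            fi
        done
    } < "$sheet"
    [ "$missing" -eq 0 ]
}

# Demonstration on a throwaway samplesheet (paths are made up)
demo_dir=$(mktemp -d)
touch "$demo_dir/ERR409878.fastq.gz"
printf 'sample,fastq_1\nERR409878,%s/ERR409878.fastq.gz\n' "$demo_dir" > "$demo_dir/samplesheet.csv"
check_samplesheet "$demo_dir/samplesheet.csv" && echo "samplesheet looks OK"
```

On the HPC the same function could be called as check_samplesheet samplesheet.csv from the data folder.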

Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_human.pbs)

Copy and paste the code below to the terminal:

Code Block
cp $HOME/workshop/small_RNAseq/data/human/samplesheet.csv $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_human.pbs $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
  • Line 1: Copy the samplesheet.csv file to the working directory

  • Line 2: Copy the launch_nf-core_smallRNAseq_human.pbs submission script to the working directory

  • Line 3: Move to the working directory

View the content of the launch_nf-core_smallRNAseq_human.pbs script:

Code Block
cat launch_nf-core_smallRNAseq_human.pbs

#!/bin/bash -l
#PBS -N nfsRNAseq_human
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#PBS -m abe

#run the tasks in the current working directory
cd $PBS_O_WORKDIR

#load java and assign up to 4GB RAM memory for nextflow to use
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

#run the small RNAseq pipeline
nextflow run nf-core/smrnaseq -r 2.3.1 \
        -profile singularity \
        --outdir results \
        --input samplesheet.csv \
        --genome GRCh38-local \
        --mirtrace_species hsa \
        --three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \
        --fastp_min_length 18 \
        --fastp_max_length 30 \
        --hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \
        --mature /work/training/smallRNAseq/data/mirbase/mature.fa \
        --mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \
        -resume
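
A job that fails because an input file is missing still has to wait in the queue first, so a quick check before submitting can save time. The sketch below shows one way to do this. require_files is a hypothetical helper written for this page, and the demonstration uses throwaway files rather than the real miRBase references.

```shell
# require_files is a hypothetical helper: it returns non-zero if any
# argument is not an existing, readable file.
require_files() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "not found or not readable: $f"
            status=1
        fi
    done
    return "$status"
}

# On the HPC this could be called with the paths used in the script, e.g.
# require_files samplesheet.csv /work/training/smallRNAseq/data/mirbase/hairpin.fa

# Self-contained demonstration with throwaway files
ref_dir=$(mktemp -d)
touch "$ref_dir/hairpin.fa" "$ref_dir/mature.fa"
require_files "$ref_dir/hairpin.fa" "$ref_dir/mature.fa" && echo "all inputs present"
```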

Submit the job to the HPC cluster:

...

Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either using the R script or using a script called “transpose_csv.py”.

Let’s initially create a “DESeq2” folder and copy the files needed for the statistical analysis:

...

Code Block
python transpose_csv.py --input mature_counts.csv --out mature_counts.txt
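
To illustrate what transposing means here (rows become columns and columns become rows), the sketch below does the same kind of operation with awk on a tiny made-up table. It is a stand-in for illustration only and is not the workshop's transpose_csv.py: it writes comma-separated output and assumes that no field contains a quoted comma.

```shell
# transpose_csv is an illustrative awk-based stand-in, not the workshop's transpose_csv.py.
# It reads a simple comma-separated file and prints it with rows and columns swapped.
transpose_csv() {
    awk -F',' '
    {
        # Store every cell, indexed by (row, column)
        for (i = 1; i <= NF; i++) cell[NR, i] = $i
        if (NF > ncol) ncol = NF
    }
    END {
        # Each original column becomes one output row
        for (i = 1; i <= ncol; i++) {
            line = cell[1, i]
            for (j = 2; j <= NR; j++) line = line "," cell[j, i]
            print line
        }
    }' "$1"
}

# Demonstration on a made-up 3 x 2 table
demo_counts=$(mktemp)
printf 'miRNA,sampleA\nmir-1,10\nmir-2,20\n' > "$demo_counts"
transpose_csv "$demo_counts"
```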

Differential expression analysis using RStudio

Differential expression analysis for smRNA-Seq is similar to regular RNA-Seq. Since you have already done the step-wise analysis in session 3, in this session we will streamline the analysis by running a single R script.

...

a. Open Windows Explorer.

b. Go to: H:\workshop\small_RNAseq

c. Create a new folder here called ‘DESeq2’ (NOTE: R is case-sensitive, so it must be named exactly like this)

...

c. Hit the save button and save this file in the working directory you created above (H:\workshop\small_RNAseq\DESeq2). Name the R script ‘DESeq2.R’.

...

Using RStudio, create a Text File and paste in the contents of this script.

Save it as launch_R.pbs in H:\workshop\small_RNAseq\DESeq2 (the same folder as DESeq2.R). Remember, H: points to your HPC home folder.

...