Overview

  • Create a metadata “samplesheet.csv” for small RNAseq datasets.

  • Learn to use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).

  • Learn how to prepare a PBS script to run the expression profiling of small RNAs against the reference miRBase database annotated microRNAs.

Preparing the pipeline inputs

The pipeline requires preparing at least 2 files:

...

PBS Pro script (launch_nf-core_smallRNAseq_miRBase.pbs) with instructions to run the pipeline

...

nextflow.config: revision 2.3.1 of the nf-core/smrnaseq pipeline may not be able to locate the reference adapter sequences, so we will use a local nextflow.config file to tell Nextflow where to find the adapters needed to trim the raw small RNA-seq data.

A. Create the metadata file (samplesheet.csv):

Change to the data folder directory:

Code Block
cd $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease

Copy the bash script to the working folder

Code Block
cp /work/training/2024/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease
  • Note: you could replace the destination path (‘$HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease’) with “.”. A dot indicates the ‘current directory’, so the file is copied to the directory where you are currently located.

View the content of the script:

Code Block
cat create_nf-core_smallRNAseq_samplesheet.sh

...

NOTE: modify ‘read1_extension’ as appropriate for your data. For example: _1.fastq.gz or _R1_001.fastq.gz or _R1.fq.gz , etc
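
To pick the right value, it can help to see what remains of each file name once a candidate extension is removed. The sketch below is illustrative only: the scratch directory, the file names and the `ext` variable are made up for the example, and it mimics, rather than reproduces, how the samplesheet script derives sample names.

```shell
# Hypothetical file names, created in a scratch directory for illustration
demo_dir=$(mktemp -d)
touch "$demo_dir/ERR409878.fastq.gz" "$demo_dir/ERR409879.fastq.gz"

# Candidate value for read1_extension (assumption: single-end files ending in .fastq.gz)
ext=".fastq.gz"

# Strip the directory and the extension from every matching file
sample_names=""
for fq in "$demo_dir"/*"$ext"; do
    name=$(basename "$fq" "$ext")
    sample_names="$sample_names$name "
done
echo "$sample_names"
```

If the names printed still carry part of the extension (for example a trailing _R1), the chosen value is too short for your files.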

Let’s generate the metadata file by running the following command:

Code Block
sh create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease

Check the newly created samplesheet.csv file:

Code Block
cat samplesheet.csv

sample,fastq_1
ERR409878,/work/training/2024/smallRNAseq/data/human_disease/ERR409878.fastq.gz
ERR409879,/work/training/2024/smallRNAseq/data/human_disease/ERR409879.fastq.gz
ERR409880,/work/training/2024/smallRNAseq/data/human_disease/ERR409880.fastq.gz
ERR409881,/work/training/2024/smallRNAseq/data/human_disease/ERR409881.fastq.gz
ERR409882,/work/training/2024/smallRNAseq/data/human_disease/ERR409882.fastq.gz
ERR409883,/work/training/2024/smallRNAseq/data/human_disease/ERR409883.fastq.gz
ERR409884,/work/training/2024/smallRNAseq/data/human_disease/ERR409884.fastq.gz
ERR409885,/work/training/2024/smallRNAseq/data/human_disease/ERR409885.fastq.gz
ERR409886,/work/training/2024/smallRNAseq/data/human_disease/ERR409886.fastq.gz
ERR409887,/work/training/2024/smallRNAseq/data/human_disease/ERR409887.fastq.gz
ERR409888,/work/training/2024/smallRNAseq/data/human_disease/ERR409888.fastq.gz
ERR409889,/work/training/2024/smallRNAseq/data/human_disease/ERR409889.fastq.gz
ERR409890,/work/training/2024/smallRNAseq/data/human_disease/ERR409890.fastq.gz
ERR409891,/work/training/2024/smallRNAseq/data/human_disease/ERR409891.fastq.gz
ERR409892,/work/training/2024/smallRNAseq/data/human_disease/ERR409892.fastq.gz
ERR409893,/work/training/2024/smallRNAseq/data/human_disease/ERR409893.fastq.gz
ERR409894,/work/training/2024/smallRNAseq/data/human_disease/ERR409894.fastq.gz
ERR409895,/work/training/2024/smallRNAseq/data/human_disease/ERR409895.fastq.gz
ERR409896,/work/training/2024/smallRNAseq/data/human_disease/ERR409896.fastq.gz
ERR409897,/work/training/2024/smallRNAseq/data/human_disease/ERR409897.fastq.gz
ERR409898,/work/training/2024/smallRNAseq/data/human_disease/ERR409898.fastq.gz
ERR409899,/work/training/2024/smallRNAseq/data/human_disease/ERR409899.fastq.gz
ERR409900,/work/training/2024/smallRNAseq/data/human_disease/ERR409900.fastq.gz
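
Before launching the pipeline it may be worth checking that the samplesheet has the expected shape. A minimal sketch, run here on a small stand-in file: the `sheet` variable and the /path/to/ entries are placeholders, not the workshop data.

```shell
# Build a small stand-in samplesheet (paths are placeholders, not real files)
sheet=$(mktemp)
printf '%s\n' \
    'sample,fastq_1' \
    'ERR409878,/path/to/ERR409878.fastq.gz' \
    'ERR409879,/path/to/ERR409879.fastq.gz' > "$sheet"

# Read the header, then count data rows and rows without exactly two fields
header=$(head -n 1 "$sheet")
bad_rows=$(awk -F ',' 'NR > 1 && NF != 2 { n++ } END { print n + 0 }' "$sheet")
n_samples=$(awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$sheet")

echo "header: $header"
echo "samples: $n_samples, malformed rows: $bad_rows"
```

On the real file, point `sheet` at samplesheet.csv; for the dataset listed above, 23 samples and no malformed rows would be expected.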

B. Prepare PBS Pro script to run the nf-core/smrnaseq pipeline

Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_miRBase.pbs)

Copy and paste the code below to the terminal:

Code Block
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/launch_nf-core_smallRNAseq_miRBase.pbs $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/nextflow.config $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
  • Line 1: copy the samplesheet.csv file to the working directory

  • Line 2: copy the launch_nf-core_smallRNAseq_miRBase.pbs submission script to the working directory

  • Line 3: copy the nextflow.config file from the shared folder to the working directory

  • Line 4: move to the working directory

View the content of the launch_nf-core_smallRNAseq_miRBase.pbs script:

Code Block
cat launch_nf-core_smallRNAseq_miRBase.pbs 

#!/bin/bash -l
#PBS -N nfsmallRNAseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#run the tasks in the current working directory
cd $PBS_O_WORKDIR

#load java and assign up to 4GB RAM memory for nextflow to use
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

#run the small RNAseq pipeline
nextflow run nf-core/smrnaseq -r 2.3.1 \
        -profile singularity \
        --outdir results \
        --input samplesheet.csv \
        --genome GRCh38-local \
        --mirtrace_species hsa \
        --three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \
        --fastp_min_length 18 \
        --fastp_max_length 30 \
        --hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \
        --mature /work/training/smallRNAseq/data/mirbase/mature.fa \
        --mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \
        -resume

Submit the job to the HPC cluster:

Code Block
qsub launch_nf-core_smallRNAseq_miRBase.pbs

Monitor the progress:

Code Block
qjobs

The job will take several hours to run, hence we will use precomputed results for the statistical analysis in the next section.

STEP1: copy metadata (samplesheet.csv) into the working folder (run2_RNAseq)

Code Block
cp $HOME/workshop/2024-2/session4_RNAseq/data/mouse/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
  • Line 1: Copy the samplesheet.csv file to the working directory

  • Line 2: move to the working directory

Copy the PBS Pro script to run the nf-core/rnaseq pipeline:

Code Block
cp $HOME/workshop/2024-2/session4_RNAseq/scripts/launch_nf-core_RNAseq_pipeline.pbs $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq

NOTE: if you had issues with the above lines, run the following code instead to copy the samplesheet.csv and launch files:

Code Block
cp /work/training/2024/rnaseq/data/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cp /work/training/2024/rnaseq/scripts/launch_nf-core_RNAseq_pipeline.pbs $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq

Adjusting the Trim Galore (read trimming) options

Print the content of the launch_nf-core_RNAseq_pipeline.pbs script:

Code Block
cat launch_nf-core_RNAseq_pipeline.pbs

Submitting the job

Code Block
qsub launch_nf-core_RNAseq_pipeline.pbs

Monitoring the Run

Code Block
qjobs

Outputs

The pipeline will produce two folders, one called “work,” where all the processing is done, and another called “results,” where we can find the pipeline's outputs. The content of the results folder is as follows:

The quantification of the gene and transcript expressions can be found in the ‘star_salmon’ directory.

Code Block
cd results/star_salmon

The following feature count tables are generated:

Copying data for hands-on exercises

Before we start using the HPC, let’s start an interactive session:

Code Block
qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=1:mem=4gb

Get a copy of the scripts to be used in this module

Use the terminal to log into the HPC, create a scripts folder for the small RNA-seq workshop, and copy the workshop scripts into it. For example:

Code Block
mkdir -p $HOME/workshop/small_RNAseq/scripts
cp /work/training/smallRNAseq/scripts/* $HOME/workshop/small_RNAseq/scripts/
ls -l $HOME/workshop/small_RNAseq/scripts/
  • Line 1: the -p option creates parent directories as required. Thus the command creates $HOME/workshop/, $HOME/workshop/small_RNAseq/ and the subfolder $HOME/workshop/small_RNAseq/scripts/ if they do not exist yet

  • Line 2: copies all files (as indicated by the asterisk) from /work/training/smallRNAseq/scripts/ to the newly created folder $HOME/workshop/small_RNAseq/scripts/

  • Line 3: lists the files in the scripts folder
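
The effect of -p can be tried safely in a scratch location. A small sketch, assuming nothing beyond standard `mkdir` and `mktemp`; the `base` and `plain_mkdir` variables are introduced only for this illustration.

```shell
# Work in a scratch directory so nothing in $HOME is touched
base=$(mktemp -d)

# Without -p, mkdir fails when the parent folders do not exist yet
if mkdir "$base/workshop/small_RNAseq/scripts" 2>/dev/null; then
    plain_mkdir="ok"
else
    plain_mkdir="failed"
fi

# With -p, the missing parent folders are created as well
mkdir -p "$base/workshop/small_RNAseq/scripts"

# Running it again is harmless: -p does not complain about existing folders
mkdir -p "$base/workshop/small_RNAseq/scripts"
echo "plain mkdir: $plain_mkdir"
```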

Copy multiple subdirectories and files using rsync

Code Block
mkdir -p $HOME/workshop/small_RNAseq/data/
rsync -rv /work/training/smallRNAseq/data/ $HOME/workshop/small_RNAseq/data/
  • Line 1: creates the folder $HOME/workshop/small_RNAseq/data/

  • Line 2: rsync copies all subfolders and files from the specified source folder to the selected destination folder. -r = recursive (copy directories and their contents); -v = verbose (print messages about the files being transferred)

Create a folder for running the nf-core small RNA-seq pipeline

Let’s create the run folders for the nf-core/smrnaseq pipeline:

Code Block
mkdir -p $HOME/workshop/small_RNAseq
mkdir $HOME/workshop/small_RNAseq/run1_test
mkdir $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/
  • Lines 1-3: create the “small_RNAseq” folder and one sub-folder for each exercise

  • Line 4: change the directory to the folder “small_RNAseq”

Exercise 1: Running a test with nf-core sample data

First, let’s assess the execution of the nf-core/smrnaseq pipeline by running a test using sample data.

Copy the launch_nf-core_smallRNAseq_test.pbs to the working directory

Code Block
cd $HOME/workshop/small_RNAseq/run1_test
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_test.pbs .

View the content of the script as follows:

Code Block
cat launch_nf-core_smallRNAseq_test.pbs

#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#work on current directory (folder)
cd $PBS_O_WORKDIR

#load java and set up memory settings to run nextflow
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

# run the test
nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0

where:

  • nextflow command: nextflow run

  • pipeline name: nf-core/smrnaseq

  • pipeline version: -r 2.1.0

  • container type and sample data: -profile test,singularity

  • output directory: --outdir results

Submitting the job

Now we can submit the small RNAseq test job to the HPC scheduler:

Code Block
qsub launch_nf-core_smallRNAseq_test.pbs

Monitoring the Run

Code Block
qjobs

Exercise 2: Running the small RNA pipeline using public human data

The pipeline requires preparing at least 2 files:

...

PBS Pro script (launch_nf-core_smallRNAseq_human.pbs) with instructions to run the pipeline

Create the metadata file (samplesheet.csv):

Change to the data folder directory:

Code Block
cd $HOME/workshop/small_RNAseq/data/human
pwd

Copy the bash script to the working folder

Code Block
cp /work/training/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/small_RNAseq/data/human
  • Note: you could replace the destination path (‘$HOME/workshop/small_RNAseq/data/human’) with “.”. A dot indicates the ‘current directory’, so the file is copied to the directory where you are currently located.

View the content of the script:

Code Block
cat create_nf-core_smallRNAseq_samplesheet.sh

#!/bin/bash -l

#User defined variables.
##########################################################
#double quotes so that $HOME is expanded by the shell
DIR="$HOME/workshop/small_RNAseq/data/human"
INDEX='samplesheet.csv'
##########################################################

#load python module
module load python/3.10.8-gcccore-12.2.0

#fetch the script to create the sample metadata table
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py

#generate initial sample metadata file
./fastq_dir_to_samplesheet.py  "$DIR" index.csv \
        --strandedness auto \
        --read1_extension .fastq.gz

#format index file: keep only the first two columns (sample, fastq_1)
awk -F "," '{print $1 "," $2}' index.csv > ${INDEX}

#Remove intermediate files:
rm index.csv fastq_dir_to_samplesheet.py

Let’s generate the metadata file by running the following command:

Code Block
sh create_nf-core_smallRNAseq_samplesheet.sh

Check the newly created samplesheet.csv file:

Code Block
ls -l
cat samplesheet.csv

sample,fastq_1
SRR20753704,/work/training/smallRNAseq/data/SRR20753704.fastq.gz
SRR20753705,/work/training/smallRNAseq/data/SRR20753705.fastq.gz
SRR20753706,/work/training/smallRNAseq/data/SRR20753706.fastq.gz
SRR20753707,/work/training/smallRNAseq/data/SRR20753707.fastq.gz
SRR20753708,/work/training/smallRNAseq/data/SRR20753708.fastq.gz
SRR20753709,/work/training/smallRNAseq/data/SRR20753709.fastq.gz
SRR20753716,/work/training/smallRNAseq/data/SRR20753716.fastq.gz
SRR20753717,/work/training/smallRNAseq/data/SRR20753717.fastq.gz
SRR20753718,/work/training/smallRNAseq/data/SRR20753718.fastq.gz
SRR20753719,/work/training/smallRNAseq/data/SRR20753719.fastq.gz
SRR20753720,/work/training/smallRNAseq/data/SRR20753720.fastq.gz
SRR20753721,/work/training/smallRNAseq/data/SRR20753721.fastq.gz

 

Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_human.pbs)

Copy and paste the code below to the terminal:

Code Block
cp $HOME/workshop/small_RNAseq/data/human/samplesheet.csv $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_human.pbs $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
  • Line 1: Copy the samplesheet.csv file to the working directory

  • Line 2: copy the launch_nf-core_smallRNAseq_human.pbs submission script to the working directory

  • Line 3: move to the working directory

View the content of the launch_nf-core_smallRNAseq_human.pbs script:

Code Block
cat launch_nf-core_smallRNAseq_human.pbs

#!/bin/bash -l
#PBS -N nfsmallRNAseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#PBS -m abe

#run the tasks in the current working directory
cd $PBS_O_WORKDIR

#load java and assign up to 4GB RAM memory for nextflow to use
module load java
export NXF_OPTS='-Xms1g -Xmx4g'

#run the small RNAseq pipeline
nextflow run nf-core/smrnaseq -r 2.1.0 \
        -profile singularity \
        --outdir results \
        --input samplesheet.csv \
        --genome GRCh38-local \
        --mirtrace_species hsa \
        --three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \
        --fastp_min_length 18 \
        --fastp_max_length 30 \
        --hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \
        --mature /work/training/smallRNAseq/data/mirbase/mature.fa \
        --mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \
        -resume

Submit the job to the HPC cluster:

Code Block
qsub launch_nf-core_smallRNAseq_human.pbs

Monitor the progress:

Code Block
qjobs

The job will take several hours to run, hence we will use precomputed results for the statistical analysis in the next section.

Precomputed results:

We ran the small RNA seq samples and the results can be found at:

Code Block
/work/training/smallRNAseq/runs/run2_smallRNAseq_human

The results of the miRNA profiling can be found in the folder called “edger”:

Code Block
results/
├── edger
├── fastp
├── fastqc
├── genome
├── index
├── mirdeep
├── mirdeep2
├── mirtop
├── mirtrace
├── multiqc
├── pipeline_info
├── samtools
└── unmapped

Inside the “edger” folder, find the “mature_counts.csv” file:

Code Block
hairpin_counts.csv
hairpin_CPM_heatmap.pdf
hairpin_edgeR_MDS_distance_matrix.txt
hairpin_edgeR_MDS_plot_coordinates.txt
hairpin_edgeR_MDS_plot.pdf
hairpin_log2CPM_sample_distances_dendrogram.pdf
hairpin_log2CPM_sample_distances_heatmap.pdf
hairpin_log2CPM_sample_distances.txt
hairpin_logtpm.csv
hairpin_logtpm.txt
hairpin_normalized_CPM.txt
hairpin_unmapped_read_counts.txt
mature_counts.csv       <-- we will use this file for the statistical analysis in the next section
mature_counts.txt
mature_CPM_heatmap.pdf
mature_edgeR_MDS_distance_matrix.txt
mature_edgeR_MDS_plot_coordinates.txt
mature_edgeR_MDS_plot.pdf
mature_log2CPM_sample_distances_dendrogram.pdf
mature_log2CPM_sample_distances_heatmap.pdf
mature_log2CPM_sample_distances.txt
mature_logtpm.csv
mature_logtpm.txt
mature_normalized_CPM.txt
mature_unmapped_read_counts.txt

Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either with the R script or with a script called “transpose_csv.py”.

Let’s initially create a “DESeq2” folder and copy the files needed for the statistical analysis:

Code Block
mkdir -p $HOME/workshop/small_RNAseq/DESeq2
cp $HOME/workshop/small_RNAseq/scripts/transpose_csv.py $HOME/workshop/small_RNAseq/DESeq2
cp $HOME/workshop/small_RNAseq/data/human/metadata_microRNA.txt $HOME/workshop/small_RNAseq/DESeq2
cp /work/training/smallRNAseq/runs/deprecated/run2_smallRNAseq_human/results/edger/mature_counts.csv $HOME/workshop/small_RNAseq/DESeq2
cd $HOME/workshop/small_RNAseq/DESeq2

To transpose the initial “mature_counts.csv” file, do the following:

...

Overview

  • Create a metadata “samplesheet.csv” for small RNAseq datasets.

  • Learn to use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).

  • Learn how to prepare a PBS script to run the expression profiling of small RNAs against the reference miRBase database annotated microRNAs.

Preparing the pipeline inputs

The pipeline requires preparing at least 2 files:

  • Metadata file (samplesheet.csv) that specifies the name of the samples, the location of the FASTQ files (‘Read 1’ and ‘Read 2’), and the strandedness (forward, reverse, or auto). Note: auto is used when the strandedness of the data is unknown.

  • PBS Pro script (launch_nf-core_smallRNAseq_miRBase.pbs) with instructions to run the pipeline

  • nextflow.config: revision 2.3.1 of the nf-core/smrnaseq pipeline may not be able to locate the reference adapter sequences, so we will use a local nextflow.config file to tell Nextflow where to find the adapters needed to trim the raw small RNA-seq data.

A. Create the metadata file (samplesheet.csv):

Change to the data folder directory:

Code Block
cd $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease

Copy the bash script to the working folder

Code Block
cp /work/training/2024/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease
  • Note: you could replace the destination path (‘$HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease’) with “.”. A dot indicates the ‘current directory’, so the file is copied to the directory where you are currently located.

View the content of the script:

Code Block
cat create_nf-core_smallRNAseq_samplesheet.sh

...

NOTE: modify ‘read1_extension’ as appropriate for your data. For example: _1.fastq.gz or _R1_001.fastq.gz or _R1.fq.gz , etc

Let’s generate the metadata file by running the following command:

Code Block
sh create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease

Check the newly created samplesheet.csv file:

Code Block
cat samplesheet.csv

sample,fastq_1
ERR409878,/work/training/2024/smallRNAseq/data/human_disease/ERR409878.fastq.gz
ERR409879,/work/training/2024/smallRNAseq/data/human_disease/ERR409879.fastq.gz
ERR409880,/work/training/2024/smallRNAseq/data/human_disease/ERR409880.fastq.gz
ERR409881,/work/training/2024/smallRNAseq/data/human_disease/ERR409881.fastq.gz
ERR409882,/work/training/2024/smallRNAseq/data/human_disease/ERR409882.fastq.gz
ERR409883,/work/training/2024/smallRNAseq/data/human_disease/ERR409883.fastq.gz
ERR409884,/work/training/2024/smallRNAseq/data/human_disease/ERR409884.fastq.gz
ERR409885,/work/training/2024/smallRNAseq/data/human_disease/ERR409885.fastq.gz
ERR409886,/work/training/2024/smallRNAseq/data/human_disease/ERR409886.fastq.gz
ERR409887,/work/training/2024/smallRNAseq/data/human_disease/ERR409887.fastq.gz
ERR409888,/work/training/2024/smallRNAseq/data/human_disease/ERR409888.fastq.gz
ERR409889,/work/training/2024/smallRNAseq/data/human_disease/ERR409889.fastq.gz
ERR409890,/work/training/2024/smallRNAseq/data/human_disease/ERR409890.fastq.gz
ERR409891,/work/training/2024/smallRNAseq/data/human_disease/ERR409891.fastq.gz
ERR409892,/work/training/2024/smallRNAseq/data/human_disease/ERR409892.fastq.gz
ERR409893,/work/training/2024/smallRNAseq/data/human_disease/ERR409893.fastq.gz
ERR409894,/work/training/2024/smallRNAseq/data/human_disease/ERR409894.fastq.gz
ERR409895,/work/training/2024/smallRNAseq/data/human_disease/ERR409895.fastq.gz
ERR409896,/work/training/2024/smallRNAseq/data/human_disease/ERR409896.fastq.gz
ERR409897,/work/training/2024/smallRNAseq/data/human_disease/ERR409897.fastq.gz
ERR409898,/work/training/2024/smallRNAseq/data/human_disease/ERR409898.fastq.gz
ERR409899,/work/training/2024/smallRNAseq/data/human_disease/ERR409899.fastq.gz
ERR409900,/work/training/2024/smallRNAseq/data/human_disease/ERR409900.fastq.gz

B. Prepare PBS Pro script to run the nf-core/smrnaseq pipeline

Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_miRBase.pbs)

Copy and paste the code below to the terminal:

Code Block
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/launch_nf-core_smallRNAseq_miRBase.pbs $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/nextflow.config $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
  • Line 1: copy the samplesheet.csv file to the working directory

  • Line 2: copy the launch_nf-core_smallRNAseq_miRBase.pbs submission script to the working directory

  • Line 3: copy the nextflow.config file from the shared folder to the working directory

  • Line 4: move to the working directory

View the content of the launch_nf-core_smallRNAseq_miRBase.pbs script:

Code Block
cat launch_nf-core_smallRNAseq_miRBase.pbs 

...

TIP: release 2.3.1 of the nf-core/smrnaseq pipeline is not able to find the reference adapter sequences used to trim the raw small RNA-seq data, so we need to tell it where the folder containing the adapter sequences file is located. To do this, we prepare a “nextflow.config” file (see below). This file should already be in your working directory. Print its content as follows:

Code Block
cat nextflow.config
Code Block
singularity {
    runOptions = '-B $HOME/.nextflow/assets/nf-core/smrnaseq/assets'
}

Note: a config file placed in the working folder can override parameters defined by the global ~/.nextflow/config file or by the config file defined as part of the pipeline.
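
If nextflow.config is missing from the working directory, it could be recreated by hand. A minimal sketch, assuming the same bind path as printed above; the `run_dir` variable is a stand-in (here a scratch folder) for your actual run directory.

```shell
# Stand-in for the run directory (assumption: replace with your own working folder)
run_dir=$(mktemp -d)

# 'EOF' is quoted so the shell does not expand $HOME while writing the file
cat > "$run_dir/nextflow.config" <<'EOF'
singularity {
    runOptions = '-B $HOME/.nextflow/assets/nf-core/smrnaseq/assets'
}
EOF

cat "$run_dir/nextflow.config"
```

Keeping `$HOME` literal in the file matches the version shown above.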

Submit the job to the HPC cluster:

Code Block
qsub launch_nf-core_smallRNAseq_miRBase.pbs

Monitor the progress:

Code Block
qjobs

The job will take several hours to run, hence we will use precomputed results for the statistical analysis in the next section.

Outputs

The pipeline will produce two folders, one called “work,” where all the processing is done, and another called “results,” where we can find the pipeline's outputs. The content of the results folder is as follows:

Code Block
results/
├── bowtie_index
│   ├── mirna_hairpin
│   └── mirna_mature
├── fastp
│   └── on_raw
├── fastqc
│   ├── raw
│   └── trimmed
├── mirna_quant
│   ├── bam
│   ├── edger_qc <-- Mature miRNA (mature_counts.csv) and precursor miRNA (hairpin_counts.csv) count tables can be found in this subfolder.
│   ├── mirtop
│   ├── reference
│   └── seqcluster
├── mirtrace
│   ├── mirtrace-report.html
│   ├── mirtrace-results.json
│   ├── mirtrace-stats-contamination_basic.tsv
│   ├── mirtrace-stats-contamination_detailed.tsv
│   ├── mirtrace-stats-length.tsv
│   ├── mirtrace-stats-mirna-complexity.tsv
│   ├── mirtrace-stats-phred.tsv
│   ├── mirtrace-stats-qcstatus.tsv
│   ├── mirtrace-stats-rnatype.tsv
│   ├── qc_passed_reads.all.collapsed
│   └── qc_passed_reads.rnatype_unknown.collapsed
├── multiqc
│   ├── multiqc_data
│   ├── multiqc_plots
│   └── multiqc_report.html
└── pipeline_info
    ├── execution_report_2024-08-20_16-55-53.html
    ├── execution_timeline_2024-08-20_16-55-53.html
    ├── execution_trace_2024-08-20_16-55-53.txt
    ├── nf_core_smrnaseq_software_mqc_versions.yml
    ├── params_2024-08-20_16-56-04.json
    └── pipeline_dag_2024-08-20_16-55-53.html

The quantification of the mature miRNA and hairpin expression can be found in the results/mirna_quant/edger_qc directory, relative to the working directory.

Code Block
cd results/mirna_quant/edger_qc
Code Block
hairpin_counts.csv
hairpin_CPM_heatmap.pdf
hairpin_edgeR_MDS_distance_matrix.txt
hairpin_edgeR_MDS_plot_coordinates.txt
hairpin_edgeR_MDS_plot.pdf
hairpin_log2CPM_sample_distances_dendrogram.pdf
hairpin_log2CPM_sample_distances_heatmap.pdf
hairpin_log2CPM_sample_distances.txt
hairpin_logtpm.csv
hairpin_logtpm.txt
hairpin_normalized_CPM.txt
hairpin_unmapped_read_counts.txt
mature_counts.csv <-- Expression mature miRNAs. This file will be used to identify differentially expressed miRNAs (Session 7)
mature_CPM_heatmap.pdf
mature_edgeR_MDS_distance_matrix.txt
mature_edgeR_MDS_plot_coordinates.txt
mature_edgeR_MDS_plot.pdf
mature_log2CPM_sample_distances_dendrogram.pdf
mature_log2CPM_sample_distances_heatmap.pdf
mature_log2CPM_sample_distances.txt
mature_logtpm.csv
mature_logtpm.txt
mature_normalized_CPM.txt
mature_unmapped_read_counts.txt

Let’s inspect the mature_counts.csv file, using the ‘cat’ command to print it on the screen:

Code Block
cat mature_counts.csv
Code Block
"","hsa-let-7a-5p","hsa-let-7a-3p","hsa-let-7a-2-3p","hsa-let-7b-5p","hsa-let-7b-3p","hsa-let-7c-5p","hsa-let-7c-3p","hsa-let-7d-5p","hsa-let-7d-3p","hsa-
"ERR409882",364608,341,16,59417,1998,68342,44,14861,3790,29486,207,211184,228,1462,7002,2,49664,1,1091,174,326,43,6,468,7,1482,1615,9,17256,534,573,6526,0
"ERR409879",305651,184,6,52115,1476,58425,30,12397,2659,23604,201,198778,151,1013,5486,1,48381,4,945,202,194,40,7,368,3,1097,1317,6,12662,561,372,3693,2,1
"ERR409881",712880,165,9,83857,2335,162724,83,30556,4503,68044,385,456864,348,1818,9893,0,111712,5,1495,259,174,48,6,318,2,1466,2220,4,17865,466,551,10360
"ERR409884",182178,111,3,27892,913,39989,21,7751,1886,13902,159,127386,132,743,3651,3,40311,0,629,117,97,21,11,305,2,1147,902,2,8313,368,242,2276,0,1146,4
"ERR409889",568269,257,13,92339,2239,100021,45,20819,3511,44172,207,276474,259,1376,12407,5,83908,5,1971,467,403,70,30,1082,7,3082,3172,14,24112,819,421,6
"ERR409894",314053,137,9,44708,1220,74145,74,12313,2827,25295,196,196866,158,896,4681,3,43677,1,806,138,131,22,7,296,3,1181,1169,5,11145,611,360,3742,5,12
"ERR409887",178201,48,4,25678,733,41506,27,7833,1613,15724,121,123391,98,497,3288,0,39434,1,445,97,65,15,3,150,2,539,461,3,5837,186,161,2958,2,847,3,1544,
"ERR409880",318121,136,3,46347,1260,65606,39,11095,2269,24585,200,191072,194,1118,5599,2,67420,3,1242,155,168,22,2,505,6,1708,1836,3,11293,482,359,3652,1,
"ERR409890",332579,105,7,40131,955,73537,38,13528,2029,31807,158,207846,175,962,5146,0,42402,0,659,149,102,20,4,219,3,964,1086,4,11957,423,385,6017,4,1556
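
Because the table has one column per miRNA, the ‘cat’ output wraps and is hard to read. A possible alternative is to show only the first few columns with `cut`. It is sketched here on a tiny stand-in table: the `counts` file is created for the example, with values copied from the first rows above.

```shell
# Tiny stand-in for mature_counts.csv (first three miRNA columns only)
counts=$(mktemp)
printf '%s\n' \
    '"","hsa-let-7a-5p","hsa-let-7a-3p","hsa-let-7a-2-3p"' \
    '"ERR409882",364608,341,16' \
    '"ERR409879",305651,184,6' > "$counts"

# Keep the sample column plus the first two miRNAs, and only the first rows
preview=$(cut -d ',' -f 1-3 "$counts" | head -n 5)
printf '%s\n' "$preview"

# Count the columns in the header (number of miRNAs + 1)
n_cols=$(head -n 1 "$counts" | awk -F ',' '{ print NF }')
echo "columns: $n_cols"
```

On the real table, replace `$counts` with mature_counts.csv.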

Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either with the R script or with a script called “transpose_csv.py”.

Let’s copy the transpose_csv.py script to the working folder:

Code Block
cp /work/training/2024/smallRNAseq/scripts/transpose_csv.py .

To check how to use the script, do the following:

Code Block
python transpose_csv.py --help
Code Block
usage: transpose_csv.py [-h] --input INPUT --output OUTPUT

Transpose a CSV file and generate a tab-delimited TXT file.

optional arguments:
  -h, --help       show this help message and exit
  --input INPUT    Input CSV file containing mature miRNA counts.
  --output OUTPUT  Output tab-delimited TXT file.

To transpose the initial “mature_counts.csv” file, do the following:

Code Block
python transpose_csv.py --input mature_counts.csv --output mature_counts.txt
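
To see what the transposition does, the same idea can be sketched with `awk` on a tiny stand-in table. This is not the workshop's transpose_csv.py, only an illustration: it assumes fields contain no embedded commas, the `in_csv` and `out_txt` files are created for the example, and the `microRNA` label for the first header cell is taken from the transposed table printed in this section.

```shell
# Tiny stand-in for mature_counts.csv (samples in rows, miRNAs in columns)
in_csv=$(mktemp)
out_txt=$(mktemp)
printf '%s\n' \
    '"","hsa-let-7a-5p","hsa-let-7a-3p"' \
    '"ERR409882",364608,341' \
    '"ERR409879",305651,184' > "$in_csv"

# Read every cell into a matrix, then print columns as rows, tab-delimited
awk -F ',' '
{
    gsub(/"/, "")                      # drop the double quotes around names
    for (i = 1; i <= NF; i++) cell[NR, i] = $i
    if (NF > n_cols) n_cols = NF
}
END {
    cell[1, 1] = "microRNA"            # label for the otherwise empty first cell
    for (i = 1; i <= n_cols; i++) {
        line = cell[1, i]
        for (r = 2; r <= NR; r++) line = line "\t" cell[r, i]
        print line
    }
}' "$in_csv" > "$out_txt"

cat "$out_txt"
```

Each miRNA becomes a row and each sample a column, which is the layout of the transposed table.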

Let’s now print the transposed mature counts table:

Code Block
cat mature_counts.txt
Code Block
microRNA	ERR409882	ERR409879	ERR409881	ERR409884	ERR409889	ERR409894	ERR409887	ERR409880	ERR409890	ERR409878	ERR409885	ERR409886	ERR409891	ERR409899	ERR409893	ERR409896	ERR409895	ERR409888	ERR409892	ERR409898	ERR409897	ERR409883	ERR409900
hsa-let-7a-5p	364608	305651	712880	182178	568269	314053	178201	318121	332579	432950	546049	208284	351586	289926	417695	421395	531417	320229	249354	186910	242774	287209	1258946
hsa-let-7a-3p	341	184	165	111	257	137	48	136	105	288	205	85	100	47	114	102	106	167	88	85	101	262	439
hsa-let-7a-2-3p	16	6	9	3	13	9	4	3	7	17	12	4	7	2	5	9	9	123	13
hsa-let-7b-5p	59417	52115	83857	27892	92339	44708	25678	46347	40131	61357	59795	27498	37230	43602	62467	45870	75630	56350	28694	25897	28396	100340	174494
hsa-let-7b-3p	1998	1476	2335	913	2239	1220	733	1260	955	2535	1771	894	1118	999	1404	1294	1670	2293	1343	764	793	1180	4682
hsa-let-7c-5p	68342	58425	162724	39989	100021	74145	41506	65606	73537	69994	128501	47783	75857	66085	86159	84951	118227	53364	62697	41282	46267	117557	271459
hsa-let-7c-3p	44	30	83	21	45	74	27	39	38	41	70	33	34	38	54	55	77	290	23	28	108	155
hsa-let-7d-5p	14861	12397	30556	7751	20819	12313	7833	11095	13528	15326	22746	7913	14405	14122	19831	17317	27318	10690	9305	8698	10109	9076	55680
hsa-let-7d-3p	3790	2659	4503	1886	3511	2827	1613	2269	2029	3311	3875	1999	2370	2978	3305	2460	5372	3015	2857	1747	1800	3443	8451
hsa-let-7e-5p	29486	23604	68044	13902	44172	25295	15724	24585	31807	29935	46550	18482	29525	26036	39737	33998	51987	20686	20070	15501	19495	50124	114605
hsa-let-7e-3p	207	201	385	159	207	196	121	200	158	229	393	195	198	150	248	154	285	199	185	141	168	512	686
hsa-let-7f-5p	211184	198778	456864	127386	276474	196866	123391	191072	207846	250192	376484	129113	231751	223462	288225	276677	425085	154352	133796	126870	155630	150954	802971
hsa-let-7f-1-3p	228	151	348	132	259	158	98	194	175	233	261	116	156	118	187	174	243	159	192	124	103	226	560
hsa-let-7f-2-3p	1462	1013	1818	743	1376	896	497	1118	962	1387	1999	783	1062	701	1100	1127	1215	642	1024	578	799	1847	3221
hsa-miR-15a-5p	7002	5486	9893	3651	12407	4681	3288	5599	5146	9213	7931	3166	4055	5151	5804	5614	12518	4751	4126	3567	3704	4998	16596
hsa-miR-15a-3p	2	1	0	3	5	3	0	2	0	6	0	0	1	0	1	0	1	09
hsa-miR-16-5p	49664	48381	111712	40311	83908	43677	39434	67420	42402	73300	73543	36662	45433	50446	54609	55368	106992	56124	49628	34812	35133	42637	177920
hsa-miR-16-1-3p	1	4	5	0	5	1	1	3	0	3	1	2	1	1	3	1	3	011	15
hsa-miR-17-5p	1091	945	1495	629	1971	806	445	1242	659	1301	1079	662	666	548	805	681	822	934	590	537	561	1311	2793
hsa-miR-17-3p	174	202	259	117	467	138	97	155	149	247	216	153	110	157	161	134	311	132	113	136	133	380	420
hsa-miR-18a-5p	326	194	174	97	403	131	65	168	102	327	137	104	104	76	115	98	101	222	69	85	74	649	455
hsa-miR-18a-3p	43	40	48	21	70	22	15	22	20	38	44	23	18	25	40	22	62	421	24	34	17	86
hsa-miR-19a-5p	6	7	6	11	30	7	3	2	4	12	3	4	7	2	4	2	6	010
hsa-miR-19a-3p	468	368	318	305	1082	296	150	505	219	542	473	399	298	232	247	237	307	358	346	253	259	817	772
hsa-miR-19b-1-5p	7	3	2	2	7	3	2	6	3	5	2	3	1	0	1	1	113	10
hsa-miR-19b-3p	1482	1097	1466	1147	3082	1181	539	1708	964	1773	1656	1216	1054	766	884	878	1046	1284	1312	788	912	5402	3276
hsa-miR-20a-5p	1615	1317	2220	902	3172	1169	461	1836	1086	2163	1632	884	1199	855	1156	1097	1036	1176	649	673	869	1269	4785
hsa-miR-20a-3p	9	6	4	2	14	5	3	3	4	6	6	7	3	1	2	5	7	96
hsa-miR-21-5p	17256	12662	17865	8313	24112	11145	5837	11293	11957	21365	20088	14302	13403	11423	12353	15500	17497	10842	8743	7668	9397	21234	33370
hsa-miR-21-3p	534	561	466	368	819	611	186	482	423	803	398	595	296	540	464	376	510	470	306	261	267	490	1435
hsa-miR-22-5p	573	372	551	242	421	360	161	359	385	483	562	219	293	160	357	346	231	406	363	229	229	998	1543
hsa-miR-22-3p	6526	3693	10360	2276	6428	3742	2958	3652	6017	4830	7321	4210	4172	4826	6177	5011	9282	3041	5602	3335	2297	10176	16041
hsa-miR-23a-5p	0	2	6	0	7	5	2	1	4	1	1	2	1	9	4	1	7	012	10
hsa-miR-23a-3p	1785	1388	3082	1146	2153	1286	847	2136	1556	1966	2835	2553	1797	1339	1637	1648	2208	1208	2269	894	1054	4353	5098
hsa-miR-24-1-5p	24	15	11	4	23	2	3	11	7	25	14	6	7	2	9	6	8	111	7	2	60	36
hsa-miR-24-3p	5206	4549	6172	2715	6773	3471	1544	4085	3320	5937	4608	3329	2943	2039	3436	2830	2510	3465	3630	3148	1886	23971	13975
hsa-miR-24-2-5p	1	0	9	3	2	3	2	3	1	3	7	6	3	3	11	2	1	114	7
hsa-miR-25-5p	6	2	5	1	6	5	2	1	1	4	1	2	0	2	1	1	4	07
hsa-miR-25-3p	10678	8254	14145	4943	16696	7599	4914	9122	7057	11815	10600	5989	5872	7239	8869	7667	14988	8566	5127	3997	5421	12219	27213
hsa-miR-26a-5p	942607	674879	1353549	460648	1173421	1104148	481683	1026301	905383	1514327	1284627	1028272	954763	588897	719907	838025	918553	861348	1194165	539098	578238	1940621	2160732
hsa-miR-26a-1-3p	33	13	38	11	23	15	5	13	19	12	19	21	11	7	26	15	114	4	11	92	67
hsa-miR-26b-5p	8873	8470	12404	6004	16179	7036	4039	10340	7269	13698	16293	6699	6824	5658	6723	8201	10241	8101	6964	5297	5465	12834	21565
hsa-miR-26b-3p	117	120	161	55	191	64	47	86	88	139	165	86	73	81	85	74	176	886	52	62	93	260
hsa-miR-27a-5p	10	5	6	3	3	47	3	6	4	13	5	12	4	5	12	2	7	347	5	3	15	22
hsa-miR-27a-3p	6316	6048	11563	4314	8880	4538	3946	7333	7578	8045	10429	11134	5353	7110	7001	7213	11208	4270	8606	4711	4313	23050	18861
hsa-miR-28-5p	1189	1003	2766	714	2114	963	798	1139	1316	1423	2938	1438	1285	1523	1462	1706	3236	894	1527	826	943	1703	3740
hsa-miR-28-3p	9931	8512	22673	5850	17423	10445	6004	12452	11139	12577	20995	14581	12398	12654	12736	12438	25137	8456	9078	4997	7356	19199	36389
hsa-miR-29a-5p	139	119	221	78	197	133	48	154	137	129	278	95	129	56	115	114	136	6165	76	76	402	302
hsa-miR-29a-3p	47851	40318	86030	31781	67338	40611	24682	43362	52719	43606	87923	43894	39369	32490	41001	40846	60529	26618	64033	30538	27862	171798	127958
hsa-miR-30a-5p	53741	58354	125902	46942	92089	61011	33253	92094	76030	69153	138612	76104	54456	41981	58085	62219	82200	31718	61622	27745	49837	268845	163743
hsa-miR-30a-3p	5559	4462	11786	3361	6961	6289	3173	7023	5990	6561	12540	4597	5335	4929	6342	5095	9863	3229	6463	2249	3918	16800	17563
hsa-miR-31-5p	203	200	445	273	239	161	134	182	235	193	407	107	191	189	205	135	240	9180	210	140	572	859
hsa-miR-31-3p	16	9	19	1	5	5	3	5	6	3	7	1	6	0	10	4	6	153	15
hsa-miR-32-5p	824	539	651	344	1123	336	205	575	356	895	663	293	333	324	431	450	604	354	267	305	247	491	1286
hsa-miR-32-3p	45	34	52	21	61	21	17	29	28	51	45	19	34	27	28	46	37	321	12	13	22	146
hsa-miR-33a-5p	652	573	664	327	1196	541	185	494	498	841	478	339	222	276	526	325	525	181	341	285	309	3858	1399
hsa-miR-33a-3p	114	83	116	44	123	62	31	67	70	109	109	31	39	38	83	57	72	753	34	49	115	233
hsa-miR-92a-1-5p	13	2	13	4	4	14	1	3	3	2	5	2	5	3	4	4	711	2	2	3	31
hsa-miR-92a-3p	42246	38723	50223	20238	68324	35052	19661	34443	24237	42942	41887	22290	22354	41671	37906	29984	79714	41615	17825	19775	19460	39084	108578
hsa-miR-93-5p	6901	5416	7821	3750	9831	4443	2768	5608	3821	7290	5275	3226	3474	3914	5811	4263	8262	6879	2346	2739	2781	10365	19463
hsa-miR-93-3p	28	10	38	8	42	9	6	15	18	19	20	22	15	6	25	10	19	122	12	9	47	47
hsa-miR-95-5p	84	53	122	46	126	58	26	68	88	83	113	62	76	39	65	82	77	386	32	48	546	217
hsa-miR-95-3p	1110	969	2741	778	1742	1128	644	1520	1625	1217	2296	994	1364	889	1325	1264	1222	735	1296	818	824	3110	4329
hsa-miR-96-5p	26	9	41	12	23	9	47	35	36	17	18	217	27	37	16	24	97	119	9	9	56	58
hsa-miR-98-5p	50824	30986	80693	21549	45532	38319	21471	33600	39626	48053	58479	21535	42277	31044	46855	45473	54780	35260	25804	22597	25577	51970	142624
hsa-miR-98-3p	719	484	911	349	492	393	221	427	478	485	1016	320	645	568	656	668	1103	258	449	299	332	292	1828
hsa-miR-99a-5p	15965	13513	33096	12728	25485	14486	9571	20179	18562	13461	23895	22426	15968	14763	15562	12327	16006	11674	24577	8407	8378	64336	54743
hsa-miR-99a-3p	302	365	621	291	422	335	185	376	358	436	591	258	303	210	318	287	439	240	294	178	251	1491	1198
hsa-miR-100-5p	79481	37070	63987	30825	107620	30986	18025	44661	40955	44899	60082	52166	39781	40535	41072	28735	28319	53908	38033	20461	14628	91004	131887
hsa-miR-100-3p	154	89	184	37	166	144	36	62	62	180	168	40	89	68	93	111	101	101	86	29	48	65	319
hsa-miR-101-5p	449	636	966	420	894	557	310	586	529	627	901	538	456	481	574	496	829	243	728	374	412	753	1422
hsa-miR-101-3p	19431	18440	29284	14575	31144	18471	7449	17823	18165	24323	34198	9800	13607	10769	15929	17565	21584	9448	9053	8167	13225	57684	46489
hsa-miR-29b-1-5p	126	99	298	100	157	143	45	118	143	112	198	62	135	94	151	129	119	83	119	80	62	417	520
hsa-miR-29b-3p	9231	10383	16612	7804	18075	9233	3548	12501	11360	9461	18227	6548	9338	4033	8918	6990	7204	3688	7076	5152	6062	35669	28289
hsa-miR-29b-2-5p	43	33	86	31	59	46	23	61	47	34	48	29	46	31	53	27	223	41	26	21	180	164
hsa-miR-103a-2-5p	47	32	74	21	36	45	22	46	53	36	64	26	25	20	25	30	218	56	21	18	244	149
hsa-miR-103a-3p	57341	56545	120575	45558	74052	60887	33067	65870	58842	65720	82673	35489	52492	42411	73148	55872	72334	50436	50157	29897	36824	208477	252692
hsa-miR-103a-1-5p	6	2	4	0	1	4	2	3	4	4	0	0	1	0	3	4	222	4
hsa-miR-105-5p	94	67	195	23	93	96	64	50	103	74	177	43	95	93	129	124	188	3109	59	41	370	360
hsa-miR-105-3p	7	5	16	4	14	6	7	5	11	13	12	5	7	2	8	5	11	650	44
hsa-miR-106a-5p	99	71	304	88	191	114	43	249	146	128	125	164	83	83	130	104	159	682	63	128	292	575
hsa-miR-106a-3p	0	0	0	0	0	0	0	0	0	0	0	0	1	0	1	0	0	00
hsa-miR-107	12864	12924	25287	10703	16544	12143	8198	13385	13381	13838	20856	7270	10120	10615	15478	12161	22185	9413	10800	6737	8071	37333	47477
hsa-miR-16-2-3p	13	13	23	3	28	11	11	22	15	22	19	11	8	21	15	14	68	110	10	32
hsa-miR-192-5p	10850	10721	16893	7812	16012	8565	5717	11078	9463	14552	16540	8147	7275	8285	9508	10099	11859	7911	7783	5007	6835	23833	26345
hsa-miR-192-3p	3	1	5	1	1	1	1	3	2	7	5	4	1	2	1	1	1	24
hsa-miR-196a-5p	0	0	19	3	11	11	0	9	6	0	5	12	1	8	0	0	30	011	0	1
hsa-miR-196a-1-3p	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	00
hsa-miR-197-5p	2	7	6	3	2	5	4	1	2	1	5	4	2	4	3	4	2	112	14
hsa-miR-197-3p	4641	5186	9789	4043	6037	4628	3665	6083	4062	5562	7040	6654	4797	5322	6538	4461	8605	6048	7292	3844	2793	15747	20189
hsa-miR-199a-5p	140	82	1008	145	269	206	165	646	153	249	885	715	383	207	241	300	643	9596	93	173	723	520
hsa-miR-199a-3p	939	837	5648	918	1410	1120	1047	3302	915	1580	6687	3161	2028	1693	1651	2319	4909	433	2284	496	1079	2042	3434
hsa-miR-208a-3p	0	0	0	0	0	0	0	1	0	1	0	1	0	0	0	0	1	01
hsa-miR-129-5p	6973	6508	13303	3986	6530	7258	3774	5453	7559	7194	7962	3709	6279	4931	10167	6221	6809	4274	4919	3070	4283	64162	33485
hsa-miR-129-1-3p	1734	1327	2143	840	1159	857	773	1283	1321	1454	1387	546	1278	486	1733	870	749	1014	1408	642	784	4986	4616
hsa-miR-148a-5p	89	66	372	67	142	114	63	176	85	120	403	132	115	70	100	135	181	576	41	72	266	286
hsa-miR-148a-3p	6130	5176	25339	4577	8570	6976	5268	12690	6384	7454	37987	11049	9217	4590	5783	8804	11497	2906	10277	2773	5167	16279	19534
hsa-miR-30c-5p	8513	6090	11549	4502	9475	6710	3768	8687	7097	7538	13893	6697	6948	4477	6579	5918	6878	5295	9006	3720	4738	18333	20643
hsa-miR-30c-2-3p	361	296	668	213	521	294	199	357	322	374	798	339	353	324	535	380	823	241	238	149	293	2099	1116