Content Comparison

Overview

Create a metadata “samplesheet.csv” for small RNAseq datasets.
Learn to use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).
Learn how to prepare a PBS script to profile small RNA expression against the microRNAs annotated in the miRBase reference database.

Preparing the pipeline inputs

The pipeline requires preparing at least 2 files:

...

PBS Pro script (launch_nf-core_smallRNAseq_miRBase.pbs) with instructions to run the pipeline

...

Nextflow.config: revision 2.3.1 of the nf-core/smrnaseq pipeline may not be able to locate the reference adapter sequences, so we will use a local nextflow.config file to tell Nextflow where to find the reference adapters needed to trim the raw small RNA-seq data.

A. Create the metadata file (samplesheet.csv):

Change to the data directory:

Code Block
cd $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease

Copy the bash script to the working folder:

Code Block
cp /work/training/2024/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease

Note: if you are already in the destination directory, you could replace the destination path with “.”. A dot means ‘current directory’, so the file is copied to the directory where you are currently located.

View the content of the script:

Code Block
cat create_nf-core_smallRNAseq_samplesheet.sh

...

NOTE: modify ‘read1_extension’ as appropriate for your data, for example _1.fastq.gz, _R1_001.fastq.gz or _R1.fq.gz.
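The workshop script itself is not reproduced here. As a rough illustration only, the bash function below builds the same `sample,fastq_1` layout from single-end FASTQ files named `<sample><read1_extension>`. `make_samplesheet` is a hypothetical helper, not the workshop's create_nf-core_smallRNAseq_samplesheet.sh, and its argument order is an assumption.

```shell
# Hypothetical sketch, NOT the workshop's create_nf-core_smallRNAseq_samplesheet.sh.
# Builds a "sample,fastq_1" samplesheet from single-end FASTQ files in a directory.
# Assumes file names follow <sample><read1_extension>, e.g. ERR409878.fastq.gz.
make_samplesheet() {
    local data_dir=$1
    local read1_extension=${2:-.fastq.gz}   # e.g. _1.fastq.gz, _R1_001.fastq.gz, _R1.fq.gz
    local out=${3:-samplesheet.csv}
    local f sample abs_dir
    echo "sample,fastq_1" > "$out"
    for f in "$data_dir"/*"$read1_extension"; do
        [ -e "$f" ] || continue                      # no match: skip the unexpanded glob
        sample=$(basename "$f" "$read1_extension")   # strip the extension to get the sample ID
        abs_dir=$(cd "$(dirname "$f")" && pwd)       # write absolute paths
        echo "${sample},${abs_dir}/$(basename "$f")" >> "$out"
    done
}
```

For example, `make_samplesheet "$PWD" .fastq.gz samplesheet.csv` run inside the data folder would list every `*.fastq.gz` file found there.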

Let’s generate the metadata file by running the following command:

Code Block
sh create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease

Check the newly created samplesheet.csv file:

Code Block
cat samplesheet.csv

sample,fastq_1
ERR409878,/work/training/2024/smallRNAseq/data/human_disease/ERR409878.fastq.gz
ERR409879,/work/training/2024/smallRNAseq/data/human_disease/ERR409879.fastq.gz
ERR409880,/work/training/2024/smallRNAseq/data/human_disease/ERR409880.fastq.gz
ERR409881,/work/training/2024/smallRNAseq/data/human_disease/ERR409881.fastq.gz
ERR409882,/work/training/2024/smallRNAseq/data/human_disease/ERR409882.fastq.gz
ERR409883,/work/training/2024/smallRNAseq/data/human_disease/ERR409883.fastq.gz
ERR409884,/work/training/2024/smallRNAseq/data/human_disease/ERR409884.fastq.gz
ERR409885,/work/training/2024/smallRNAseq/data/human_disease/ERR409885.fastq.gz
ERR409886,/work/training/2024/smallRNAseq/data/human_disease/ERR409886.fastq.gz
ERR409887,/work/training/2024/smallRNAseq/data/human_disease/ERR409887.fastq.gz
ERR409888,/work/training/2024/smallRNAseq/data/human_disease/ERR409888.fastq.gz
ERR409889,/work/training/2024/smallRNAseq/data/human_disease/ERR409889.fastq.gz
ERR409890,/work/training/2024/smallRNAseq/data/human_disease/ERR409890.fastq.gz
ERR409891,/work/training/2024/smallRNAseq/data/human_disease/ERR409891.fastq.gz
ERR409892,/work/training/2024/smallRNAseq/data/human_disease/ERR409892.fastq.gz
ERR409893,/work/training/2024/smallRNAseq/data/human_disease/ERR409893.fastq.gz
ERR409894,/work/training/2024/smallRNAseq/data/human_disease/ERR409894.fastq.gz
ERR409895,/work/training/2024/smallRNAseq/data/human_disease/ERR409895.fastq.gz

...
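Before launching the pipeline, it can help to confirm that the samplesheet has the `sample,fastq_1` header and that every FASTQ path it lists actually exists. A minimal bash sketch, assuming a plain CSV without quoted commas; `check_samplesheet` is a hypothetical helper, not part of the workshop scripts:

```shell
# Sketch: validate a single-end nf-core/smrnaseq samplesheet.
# Checks the header and that each fastq_1 path points to an existing file.
# Assumes a simple CSV (no quoted fields containing commas).
check_samplesheet() {
    local sheet=${1:-samplesheet.csv}
    local status=0 header sample fastq
    {
        IFS= read -r header
        if [ "$header" != "sample,fastq_1" ]; then
            echo "Unexpected header in $sheet: $header" >&2
            return 1
        fi
        # "|| [ -n ... ]" also processes a last line without a trailing newline
        while IFS=, read -r sample fastq || [ -n "$sample" ]; do
            [ -n "$sample" ] || continue               # skip blank lines
            if [ ! -f "$fastq" ]; then
                echo "Missing FASTQ for $sample: $fastq" >&2
                status=1
            fi
        done
    } < "$sheet"
    return "$status"
}
```

Usage: `check_samplesheet samplesheet.csv && echo "samplesheet OK"`. Missing files are all reported before the function returns a non-zero status.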

Overview

Similar to exercise 6.4, we will:
- Use the “samplesheet.csv” metadata file for small RNAseq datasets created in exercise 6.4.
- Use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).
- Use a PBS script to profile miRNA expression against MirGeneDB, a curated database that includes experimentally validated miRNAs.

Prepare pipeline inputs

Let’s move to the working directory:

Code Block
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB

Now, let’s copy the samplesheet.csv and nextflow.config files:

Code Block
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv .
cp $HOME/workshop/2024-2/session6_smallRNAseq/scripts/nextflow.config .

Create the metadata file (samplesheet.csv):

Copy the bash script to the working folder:

Code Block
cp /work/training/2024/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease

...

Note: if you are already in the destination directory, you could replace the destination path with “.”. A dot means ‘current directory’, so the file is copied to the directory where you are currently located.

B. Prepare PBS Pro script to run the nf-core/smrnaseq pipeline

Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_miRBase.pbs):

Copy and paste the code below to the terminal:

Code Block

cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/launch_nf-core_smallRNAseq_miRBase.pbs $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/nextflow.config $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase

Line 1: Copy the samplesheet.csv file to the working directory.
Line 2: Copy the launch_nf-core_smallRNAseq_miRBase.pbs submission script to the working directory.
Line 3: Copy the nextflow.config file from the shared folder to your working directory.
Line 4: Move to the working directory.
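The four commands above repeat the same long run-directory path. As an optional illustration, the bash sketch below holds the paths in variables. `stage_run` is a hypothetical helper, not part of the workshop material, and unlike the original commands it also creates the run directory if it does not exist yet.

```shell
# Hypothetical helper: stage the inputs for one nf-core/smrnaseq run and move into it.
# Mirrors lines 1-4 of the commands above; adds "mkdir -p" for a missing run directory.
stage_run() {
    local samplesheet=$1   # e.g. $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv
    local scripts_dir=$2   # e.g. /work/training/2024/smallRNAseq/scripts
    local run_dir=$3       # e.g. $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
    mkdir -p "$run_dir" &&
    cp "$samplesheet" "$run_dir" &&
    cp "$scripts_dir/launch_nf-core_smallRNAseq_miRBase.pbs" "$run_dir" &&
    cp "$scripts_dir/nextflow.config" "$run_dir" &&
    cd "$run_dir"          # a function runs in the current shell, so this cd persists
}
```

Usage: `stage_run "$HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv" /work/training/2024/smallRNAseq/scripts "$HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase"`.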

View the content of the launch_nf-core_smallRNAseq_miRBase.pbs script:

...

TIP: when running release 2.3.1 of the nf-core/smrnaseq pipeline, the pipeline cannot find the reference adapter sequences used to trim the raw small RNAseq reads, so we need to tell it which folder contains the adapter sequences file. To do this, we prepare a “nextflow.config” file (see below). This file should already be in your working directory. Print its content as follows:

...

Code Block
singularity { runOptions = '-B $HOME/.nextflow/assets/nf-core/smrnaseq/assets' }
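The nextflow.config file should already be in your working directory. If you ever need to recreate it, a minimal bash sketch is shown below. It writes the same one-line setting, then checks whether the folder being bind-mounted exists. The quoted heredoc delimiter ('EOF') keeps `$HOME` literal in the written file, so it matches the line above exactly.

```shell
# Sketch: (re)create the launch-directory nextflow.config shown above.
# Quoting 'EOF' stops the shell from expanding $HOME while writing the file.
cat > nextflow.config <<'EOF'
singularity { runOptions = '-B $HOME/.nextflow/assets/nf-core/smrnaseq/assets' }
EOF

# The bind-mounted folder only exists once Nextflow has downloaded the pipeline
# into ~/.nextflow/assets (e.g. after a first run or "nextflow pull").
assets_dir="$HOME/.nextflow/assets/nf-core/smrnaseq/assets"
if [ -d "$assets_dir" ]; then
    echo "Found pipeline assets: $assets_dir"
else
    echo "Not found: $assets_dir (try: nextflow pull nf-core/smrnaseq -r 2.3.1)" >&2
fi
```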

Note: a nextflow.config file placed in the launch (working) directory can override parameters defined in the global ~/.nextflow/config file or in the config file that ships with the pipeline.

Submit the job to the HPC cluster:

...

Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either with the R script or with a script called “transpose_csv.py”.
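For a simple comma-separated table, the transposition (rows become columns) can also be sketched with awk. This is an illustrative alternative, not the workshop's transpose_csv.py or R script, and it assumes no field contains a quoted comma; `transpose_csv` is a hypothetical function name.

```shell
# Sketch: transpose a simple CSV with awk (no quoted fields containing commas).
# Every cell is stored by (row, column); the END block prints column i as row i.
transpose_csv() {
    awk -F, '
        {
            for (i = 1; i <= NF; i++) cell[NR, i] = $i
            if (NF > max_nf) max_nf = NF
        }
        END {
            for (i = 1; i <= max_nf; i++) {
                row = cell[1, i]
                for (j = 2; j <= NR; j++) row = row "," cell[j, i]
                print row
            }
        }' "$1"
}
```

Usage: `transpose_csv mature_counts.csv > mature_counts_transposed.csv` (the output file name is only an example). Because the whole table is held in memory, this suits count matrices of modest size.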

Let’s copy the transpose_csv.py script to the working folder:

...

Version	Old Version 3	New Version 4
Changes made by	Roberto Barrero Gumiel	Roberto Barrero Gumiel
Saved on	Oct 27, 2024	Oct 27, 2024
