Overview
Similar to exercise 6.4, we will:
Use the “samplesheet.csv” metadata file created for the small RNAseq datasets in exercise 6.4.
Use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).
Use a PBS script to run the expression profiling of miRNAs against MirGeneDB, a curated database that includes experimentally validated miRNAs.
Prepare pipeline inputs
Let’s move to the working directory:
Code Block |
---|
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB |
Now, let’s copy the samplesheet.csv, the nextflow.config and the PBS script for running the pipeline against MirGeneDB:
Code Block |
---|
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv .
cp $HOME/workshop/2024-2/session6_smallRNAseq/scripts/nextflow.config .
cp $HOME/workshop/2024-2/session6_smallRNAseq/scripts/launch_nf-core_smallRNAseq_MirGeneDB.pbs . |
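Before going further, it can help to confirm that the three files actually arrived in the working directory. The snippet below is a minimal sketch of such a check; the helper name `check_inputs` is hypothetical and is not part of the workshop scripts.

```shell
# Hypothetical helper: report whether each listed file exists and is not empty
check_inputs() {
  missing=0
  for f in "$@"; do
    if [ -s "$f" ]; then
      echo "OK: $f"
    else
      echo "MISSING or empty: $f"
      missing=1
    fi
  done
  # Non-zero exit status if at least one file is missing or empty
  return "$missing"
}

# Check the three pipeline inputs copied above
check_inputs samplesheet.csv nextflow.config launch_nf-core_smallRNAseq_MirGeneDB.pbs \
  || echo "Some inputs are missing: re-run the cp commands above"
```

If any file is reported as missing, check that you are in the run2_human_MirGeneDB directory before copying again.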
Create the metadata file (samplesheet.csv):
Copy the bash script to the working folder
Code Block |
---|
cp /work/training/2024/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Note: if you are already located in the destination directory, you could replace the destination path with “.”. A dot indicates the current directory, so the file will be copied to the directory where you are currently located.
B. Prepare PBS Pro script to run the nf-core/smrnaseq pipeline
Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_miRBase.pbs)
Copy and paste the code below to the terminal:
...
Print the content of the launch_nf-core_smallRNAseq_MirGeneDB.pbs script:
Code Block |
---|
cat launch_nf-core_smallRNAseq_MirGeneDB.pbs |
...
TIP: when running the nf-core/smrnaseq pipeline (release 2.3.1), the pipeline is not able to find the reference adapter sequences used for trimming the raw small RNAseq reads, so we need to specify the folder where the adapter sequences file is located. To do this, we prepare a “nextflow.config” file (see below). This file should already be in your working directory. Print the content as follows:
...
|
---|
Note: if a config file is placed in the working folder, it can override parameters defined by the global ~/.nextflow/config file or by the config file defined as part of the pipeline.
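To see which settings are in effect once the overrides are applied, Nextflow can print the configuration it resolves for the current directory. The snippet below is a minimal sketch, assuming the `nextflow` command is available on your PATH in the current session; the helper name `show_resolved_config` is hypothetical.

```shell
# Hypothetical helper: print the configuration Nextflow resolves for this directory
show_resolved_config() {
  if command -v nextflow >/dev/null 2>&1; then
    # Run from the working directory so that ./nextflow.config is picked up
    nextflow config
  else
    echo "nextflow not found on PATH" >&2
  fi
}

show_resolved_config
```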
Submit the job to the HPC cluster:
...
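Once the job has been submitted, its state can be followed with the standard PBS `qstat` command. The snippet below is a minimal sketch, assuming `qstat` is available on the node you are logged in to; the helper name `show_my_jobs` is hypothetical.

```shell
# Hypothetical helper: list the PBS jobs that belong to the current user
show_my_jobs() {
  if command -v qstat >/dev/null 2>&1; then
    qstat -u "$(id -un)"
  else
    echo "qstat not found: run this on the HPC cluster" >&2
  fi
}

show_my_jobs
```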
Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either with the R script or with a script called “transpose_csv.py”.
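To illustrate what transposing means here (rows become columns and columns become rows), the snippet below shows a small awk-based sketch. It is not the workshop's transpose_csv.py script: the function name `transpose_csv` is hypothetical, and the sketch assumes a plain comma-separated file with no quoted fields that contain commas.

```shell
# Hypothetical helper: transpose a simple CSV file (no quoted commas)
transpose_csv() {
  awk -F',' '
    {
      # Store every cell, indexed by (row, column)
      for (i = 1; i <= NF; i++) cell[NR, i] = $i
      if (NF > ncol) ncol = NF
    }
    END {
      # Each original column is printed as one output row
      for (i = 1; i <= ncol; i++) {
        line = cell[1, i]
        for (j = 2; j <= NR; j++) line = line "," cell[j, i]
        print line
      }
    }' "$1"
}

# Example usage (the output file name is an assumption):
# transpose_csv mature_counts.csv > mature_counts_transposed.csv
```

For the actual analysis, use the R script or transpose_csv.py as described in this exercise.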
Let’s copy the transpose_csv.py script to the working folder:
...