Overview
Create a metadata “samplesheet.csv” for small RNAseq datasets.
Learn to use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).
Learn how to prepare a PBS script to run the expression profiling of small RNAs against the reference miRBase database annotated microRNAs.
Preparing the pipeline inputs
The pipeline requires preparing at least 3 files:
...
PBS Pro script (launch_nf-core_smallRNAseq_miRBase.pbs) with instructions to run the pipeline
...
nextflow.config: revision 2.3.1 of the nf-core/smrnaseq pipeline may not be able to locate the reference adapter sequences, so we will use a local nextflow.config file to tell Nextflow where to find the adapter sequences needed to trim the raw small RNA-seq data
A. Create the metadata file (samplesheet.csv):
Change to the data folder directory:
Code Block |
---|
cd $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Copy the bash script to the working folder
Code Block |
---|
cp /work/training/2024/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Note: you could replace ‘$HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease’ with “.” A dot indicates the ‘current directory’, so the file is copied to the directory where you are currently located
View the content of the script:
Code Block |
---|
cat create_nf-core_smallRNAseq_samplesheet.sh |
...
NOTE: modify ‘read1_extension’ as appropriate for your data, for example _1.fastq.gz, _R1_001.fastq.gz or _R1.fq.gz
Let’s generate the metadata file by running the following command:
Code Block |
---|
sh create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Check the newly created samplesheet.csv file:
Code Block |
---|
cat samplesheet.csv |
sample,fastq_1
ERR409878,/work/training/2024/smallRNAseq/data/human_disease/ERR409878.fastq.gz
ERR409879,/work/training/2024/smallRNAseq/data/human_disease/ERR409879.fastq.gz
ERR409880,/work/training/2024/smallRNAseq/data/human_disease/ERR409880.fastq.gz
ERR409881,/work/training/2024/smallRNAseq/data/human_disease/ERR409881.fastq.gz
ERR409882,/work/training/2024/smallRNAseq/data/human_disease/ERR409882.fastq.gz
ERR409883,/work/training/2024/smallRNAseq/data/human_disease/ERR409883.fastq.gz
ERR409884,/work/training/2024/smallRNAseq/data/human_disease/ERR409884.fastq.gz
ERR409885,/work/training/2024/smallRNAseq/data/human_disease/ERR409885.fastq.gz
ERR409886,/work/training/2024/smallRNAseq/data/human_disease/ERR409886.fastq.gz
ERR409887,/work/training/2024/smallRNAseq/data/human_disease/ERR409887.fastq.gz
ERR409888,/work/training/2024/smallRNAseq/data/human_disease/ERR409888.fastq.gz
ERR409889,/work/training/2024/smallRNAseq/data/human_disease/ERR409889.fastq.gz
ERR409890,/work/training/2024/smallRNAseq/data/human_disease/ERR409890.fastq.gz
ERR409891,/work/training/2024/smallRNAseq/data/human_disease/ERR409891.fastq.gz
ERR409892,/work/training/2024/smallRNAseq/data/human_disease/ERR409892.fastq.gz
ERR409893,/work/training/2024/smallRNAseq/data/human_disease/ERR409893.fastq.gz
ERR409894,/work/training/2024/smallRNAseq/data/human_disease/ERR409894.fastq.gz
ERR409895,/work/training/2024/smallRNAseq/data/human_disease/ERR409895.fastq.gz
ERR409896,/work/training/2024/smallRNAseq/data/human_disease/ERR409896.fastq.gz
ERR409897,/work/training/2024/smallRNAseq/data/human_disease/ERR409897.fastq.gz
ERR409898,/work/training/2024/smallRNAseq/data/human_disease/ERR409898.fastq.gz
ERR409899,/work/training/2024/smallRNAseq/data/human_disease/ERR409899.fastq.gz
ERR409900,/work/training/2024/smallRNAseq/data/human_disease/ERR409900.fastq.gz |
---|
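To make the structure of the samplesheet concrete, here is a minimal shell sketch that builds the same two-column layout (sample, fastq_1) from a folder of single-end FASTQ files. It is not the workshop script: the function name make_samplesheet and the placeholder file names are assumptions used only for illustration, and the extension argument plays the role of ‘read1_extension’.

```shell
# Hypothetical helper (not part of the workshop scripts): write a
# "sample,fastq_1" samplesheet for every file in a directory that ends
# with the given read 1 extension.
make_samplesheet() {
    local dir="$1" ext="$2" out="$3" f
    echo "sample,fastq_1" > "$out"
    for f in "$dir"/*"$ext"; do
        [ -e "$f" ] || continue    # skip if the pattern matched no file
        # sample name = file name without the directory and the extension
        echo "$(basename "$f" "$ext"),$f" >> "$out"
    done
}

# Demo on empty placeholder files in a temporary directory
demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA.fastq.gz" "$demo_dir/sampleB.fastq.gz"
make_samplesheet "$demo_dir" ".fastq.gz" "$demo_dir/samplesheet.csv"
cat "$demo_dir/samplesheet.csv"
```

If the extension does not match your file names, the sheet will contain only the header line, which is a quick way to spot a wrong ‘read1_extension’.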
B. Prepare PBS Pro script to run the nf-core/smrnaseq pipeline
Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_miRBase.pbs)
Copy and paste the code below to the terminal:
Code Block |
---|
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/launch_nf-core_smallRNAseq_miRBase.pbs $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/nextflow.config $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: Copy the launch_nf-core_smallRNAseq_miRBase.pbs submission script to the working directory
Line 3: Copy the nextflow.config file from the shared folder to the working directory
Line 4: Move to the working directory
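Because the job can run for hours, it may be worth confirming that every FASTQ path in the samplesheet exists before submitting. The following is a minimal sketch of such a check, not a feature of nf-core/smrnaseq; the function name check_samplesheet and the demo file names are hypothetical.

```shell
# Hypothetical pre-flight check (not part of nf-core/smrnaseq): confirm that
# every FASTQ path listed in the second column of a samplesheet exists.
check_samplesheet() {
    local sheet="$1" missing=0 header sample fastq
    {
        read -r header    # skip the "sample,fastq_1" header line
        while IFS=, read -r sample fastq; do
            if [ ! -f "$fastq" ]; then
                echo "missing: $sample -> $fastq"
                missing=$((missing + 1))
            fi
        done
    } < "$sheet"
    echo "$missing file(s) missing"
    [ "$missing" -eq 0 ]    # return status: success only if nothing is missing
}

# Demo: one placeholder file that exists and one path that does not
tmp=$(mktemp -d)
touch "$tmp/s1.fastq.gz"
printf 'sample,fastq_1\ns1,%s\ns2,%s\n' "$tmp/s1.fastq.gz" "$tmp/s2.fastq.gz" > "$tmp/samplesheet.csv"
check_samplesheet "$tmp/samplesheet.csv" || echo "fix the samplesheet before running qsub"
```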
View the content of the launch_nf-core_smallRNAseq_miRBase.pbs script:
Code Block |
---|
cat launch_nf-core_smallRNAseq_miRBase.pbs |
#!/bin/bash -l
#PBS -N nfsmallRNAseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#run the tasks in the current working directory
cd $PBS_O_WORKDIR
#load java and assign up to 4GB RAM memory for nextflow to use
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
#run the small RNAseq pipeline
nextflow run nf-core/smrnaseq -r 2.3.1 \
-profile singularity \
--outdir results \
--input samplesheet.csv \
--genome GRCh38-local \
--mirtrace_species hsa \
--three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \
--fastp_min_length 18 \
--fastp_max_length 30 \
--hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \
--mature /work/training/smallRNAseq/data/mirbase/mature.fa \
--mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \
-resume
Submit the job to the HPC cluster:
Code Block |
---|
qsub launch_nf-core_smallRNAseq_miRBase.pbs |
Monitor the progress:
Code Block |
---|
qjobs |
The job will take several hours to run, hence we will use precomputed results for the statistical analysis in the next section.
STEP1: copy metadata (samplesheet.csv) into the working folder (run2_RNAseq)
Code Block |
---|
cp $HOME/workshop/2024-2/session4_RNAseq/data/mouse/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: move to the working directory
Copy the PBS Pro script to run the nf-core/rnaseq pipeline:
Code Block |
---|
cp $HOME/workshop/2024-2/session4_RNAseq/scripts/launch_nf-core_RNAseq_pipeline.pbs $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq |
NOTE: if you had issues with the above lines, run the following code instead to copy the samplesheet.csv and launch files:
Code Block |
---|
cp /work/training/2024/rnaseq/data/samplesheet.csv $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cp /work/training/2024/rnaseq/scripts/launch_nf-core_RNAseq_pipeline.pbs $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq
cd $HOME/workshop/2024-2/session4_RNAseq/runs/run2_RNAseq |
Adjusting the Trim Galore (read trimming) options
Print the content of the launch_nf-core_RNAseq_pipeline.pbs script:
Code Block |
---|
cat launch_nf-core_RNAseq_pipeline.pbs |
Submitting the job
Code Block |
---|
qsub launch_nf-core_RNAseq_pipeline.pbs |
Monitoring the Run
Code Block |
---|
qjobs |
Outputs
The pipeline will produce two folders, one called “work,” where all the processing is done, and another called “results,” where we can find the pipeline's outputs. The content of the results folder is as follows:
The quantification of the gene and transcript expressions can be found in the ‘star_salmon’ directory.
Code Block |
---|
cd results/star_salmon |
The following feature count tables are generated:
Copying data for hands-on exercises
Before we start using the HPC, let’s start an interactive session:
Code Block |
---|
qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=1:mem=4gb |
Get a copy of the scripts to be used in this module
Use the terminal to log into the HPC and create a scripts folder for the nf-core/smrnaseq exercises. For example:
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq/scripts
cp /work/training/smallRNAseq/scripts/* $HOME/workshop/small_RNAseq/scripts/
ls -l $HOME/workshop/small_RNAseq/scripts/ |
Line 1: The -p option creates parent directories as required. Thus, the line 1 command creates $HOME/workshop/, $HOME/workshop/small_RNAseq/ and the subfolder $HOME/workshop/small_RNAseq/scripts/ if they do not already exist
Line 2: Copies all files (as indicated by the asterisk) from /work/training/smallRNAseq/scripts/ to the newly created folder $HOME/workshop/small_RNAseq/scripts/
Line 3: Lists the files in the scripts folder
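The effect of the -p option can be tried safely in a throwaway directory. The short sketch below uses a temporary folder created with mktemp; the folder names are placeholders and nothing is written to your workshop directories.

```shell
# Demonstration in a throwaway directory; the folder names are placeholders.
base=$(mktemp -d)

# Without -p, mkdir fails because the parent folder "workshop" does not exist yet
mkdir "$base/workshop/scripts" 2>/dev/null || echo "mkdir without -p failed"

# With -p, the missing parent directories are created as well
mkdir -p "$base/workshop/scripts"

# Running it again is not an error: -p also accepts directories that already exist
mkdir -p "$base/workshop/scripts" && echo "mkdir -p succeeded"
ls -d "$base/workshop/scripts"
```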
Copy multiple subdirectories and files using rsync
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq/data/
rsync -rv /work/training/smallRNAseq/data/ $HOME/workshop/small_RNAseq/data/ |
Line 1: Creates the folder $HOME/workshop/small_RNAseq/data/
Line 2: rsync copies all subfolders and files from the specified source folder to the selected destination folder. The -r option copies directories recursively; -v prints verbose messages about the files being transferred
Create a folder for running the nf-core small RNA-seq pipeline
Let’s create the run folders for the nf-core/smrnaseq pipeline:
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq
mkdir $HOME/workshop/small_RNAseq/run1_test
mkdir $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/ |
Lines 1-3: create sub-folders for each exercise
Line 4: change the directory to the folder “small_RNAseq”
Exercise 1: Running a test with nf-core sample data
First, let’s assess the execution of the nf-core/smrnaseq pipeline by running a test using sample data.
Copy the launch_nf-core_smallRNAseq_test.pbs script to the working directory
Code Block |
---|
cd $HOME/workshop/small_RNAseq/run1_test
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_test.pbs . |
View the content of the script as follows:
Code Block |
---|
cat launch_nf-core_smallRNAseq_test.pbs |
#!/bin/bash -l
#PBS -N nfsmrnaseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#work on current directory (folder)
cd $PBS_O_WORKDIR
#load java and set up memory settings to run nextflow
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
# run the test
nextflow run nf-core/smrnaseq -profile test,singularity --outdir results -r 2.1.0 |
---|
where:
nextflow command: nextflow run
pipeline name: nf-core/smrnaseq
pipeline version: -r 2.1.0
container type and sample data: -profile test,singularity
output directory: --outdir results
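The parts listed above can be put back together as follows. This is only an illustrative sketch: the helper build_test_command is hypothetical, and it prints the command instead of running it, so it is safe to try outside a PBS job.

```shell
# Hypothetical helper: assemble the test command from the parts listed above
# and print it without running it.
build_test_command() {
    local pipeline="nf-core/smrnaseq"    # pipeline name
    local version="$1"                   # pipeline version, passed to -r
    local profile="test,singularity"     # sample data and container type
    local outdir="results"               # output directory
    echo "nextflow run $pipeline -r $version -profile $profile --outdir $outdir"
}

build_test_command 2.1.0
```

Passing the version as an argument makes it easy to see which part of the command changes between pipeline releases.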
Submitting the job
Now we can submit the small RNAseq test job to the HPC scheduler:
Code Block |
---|
qsub launch_nf-core_smallRNAseq_test.pbs |
Monitoring the Run
Code Block |
---|
qjobs |
Exercise 2: Running the small RNA pipeline using public human data
The pipeline requires preparing at least 2 files:
...
PBS Pro script (launch_nf-core_smallRNAseq_human.pbs) with instructions to run the pipeline
Create the metadata file (samplesheet.csv):
Change to the data folder directory:
Code Block |
---|
cd $HOME/workshop/small_RNAseq/data/human
pwd |
Copy the bash script to the working folder
Code Block |
---|
cp /work/training/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/small_RNAseq/data/human |
Note: you could replace ‘$HOME/workshop/small_RNAseq/data/human’ with “.” A dot indicates the ‘current directory’, so the file is copied to the directory where you are currently located
View the content of the script:
Code Block |
---|
cat create_nf-core_smallRNAseq_samplesheet.sh |
#!/bin/bash -l
#User defined variables.
##########################################################
#double quotes are needed so that $HOME is expanded
DIR="$HOME/workshop/small_RNAseq/data/human"
INDEX='samplesheet.csv'
##########################################################
#load python module
module load python/3.10.8-gcccore-12.2.0
#fetch the script to create the sample metadata table
wget -L https://raw.githubusercontent.com/nf-core/rnaseq/master/bin/fastq_dir_to_samplesheet.py
chmod +x fastq_dir_to_samplesheet.py
#generate initial sample metadata file
./fastq_dir_to_samplesheet.py $DIR index.csv \
--strandedness auto \
--read1_extension .fastq.gz
#format index file
cat index.csv | awk -F "," '{print $1 "," $2}' > ${INDEX}
#Remove intermediate files:
rm index.csv fastq_dir_to_samplesheet.py |
---|
Let’s generate the metadata file by running the following command:
Code Block |
---|
sh create_nf-core_smallRNAseq_samplesheet.sh |
Check the newly created samplesheet.csv file:
Code Block |
---|
ls -l
cat samplesheet.csv |
sample,fastq_1
SRR20753704,/work/training/smallRNAseq/data/SRR20753704.fastq.gz
SRR20753705,/work/training/smallRNAseq/data/SRR20753705.fastq.gz
SRR20753706,/work/training/smallRNAseq/data/SRR20753706.fastq.gz
SRR20753707,/work/training/smallRNAseq/data/SRR20753707.fastq.gz
SRR20753708,/work/training/smallRNAseq/data/SRR20753708.fastq.gz
SRR20753709,/work/training/smallRNAseq/data/SRR20753709.fastq.gz
SRR20753716,/work/training/smallRNAseq/data/SRR20753716.fastq.gz
SRR20753717,/work/training/smallRNAseq/data/SRR20753717.fastq.gz
SRR20753718,/work/training/smallRNAseq/data/SRR20753718.fastq.gz
SRR20753719,/work/training/smallRNAseq/data/SRR20753719.fastq.gz
SRR20753720,/work/training/smallRNAseq/data/SRR20753720.fastq.gz
SRR20753721,/work/training/smallRNAseq/data/SRR20753721.fastq.gz |
---|
Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_human.pbs)
Copy and paste the code below to the terminal:
Code Block |
---|
cp $HOME/workshop/small_RNAseq/data/human/samplesheet.csv $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cp $HOME/workshop/small_RNAseq/scripts/launch_nf-core_smallRNAseq_human.pbs $HOME/workshop/small_RNAseq/run2_smallRNAseq_human
cd $HOME/workshop/small_RNAseq/run2_smallRNAseq_human |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: copy the launch_nf-core_smallRNAseq_human.pbs submission script to the working directory
Line 3: move to the working directory
View the content of the launch_nf-core_smallRNAseq_human.pbs script:
Code Block |
---|
cat launch_nf-core_smallRNAseq_human.pbs |
#!/bin/bash -l
#PBS -N nfsmallRNAseq
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#PBS -m abe
#run the tasks in the current working directory
cd $PBS_O_WORKDIR
#load java and assign up to 4GB RAM memory for nextflow to use
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
#run the small RNAseq pipeline
nextflow run nf-core/smrnaseq -r 2.1.0 \
-profile singularity \
--outdir results \
--input samplesheet.csv \
--genome GRCh38-local \
--mirtrace_species hsa \
--three_prime_adapter 'TGGAATTCTCGGGTGCCAAGG' \
--fastp_min_length 18 \
--fastp_max_length 30 \
--hairpin /work/training/smallRNAseq/data/mirbase/hairpin.fa \
--mature /work/training/smallRNAseq/data/mirbase/mature.fa \
--mirna_gtf /work/training/smallRNAseq/data/mirbase/hsa.gff3 \
-resume |
---|
Submit the job to the HPC cluster:
Code Block |
---|
qsub launch_nf-core_smallRNAseq_human.pbs |
Monitor the progress:
Code Block |
---|
qjobs |
The job will take several hours to run, hence we will use precomputed results for the statistical analysis in the next section.
Precomputed results:
We ran the small RNA seq samples and the results can be found at:
Code Block |
---|
/work/training/smallRNAseq/runs/run2_smallRNAseq_human |
The results of the miRNA profiling can be found in the folder called “edger”:
Code Block |
---|
results/
├── edger
├── fastp
├── fastqc
├── genome
├── index
├── mirdeep
├── mirdeep2
├── mirtop
├── mirtrace
├── multiqc
├── pipeline_info
├── samtools
└── unmapped |
Inside the “edger” folder, find the “mature_counts.csv” file:
Code Block |
---|
hairpin_counts.csv
hairpin_CPM_heatmap.pdf
hairpin_edgeR_MDS_distance_matrix.txt
hairpin_edgeR_MDS_plot_coordinates.txt
hairpin_edgeR_MDS_plot.pdf
hairpin_log2CPM_sample_distances_dendrogram.pdf
hairpin_log2CPM_sample_distances_heatmap.pdf
hairpin_log2CPM_sample_distances.txt
hairpin_logtpm.csv
hairpin_logtpm.txt
hairpin_normalized_CPM.txt
hairpin_unmapped_read_counts.txt
mature_counts.csv <-- we will use this file for the statistical analysis in the next section
mature_counts.txt
mature_CPM_heatmap.pdf
mature_edgeR_MDS_distance_matrix.txt
mature_edgeR_MDS_plot_coordinates.txt
mature_edgeR_MDS_plot.pdf
mature_log2CPM_sample_distances_dendrogram.pdf
mature_log2CPM_sample_distances_heatmap.pdf
mature_log2CPM_sample_distances.txt
mature_logtpm.csv
mature_logtpm.txt
mature_normalized_CPM.txt
mature_unmapped_read_counts.txt |
Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either with the R script or with a script called “transpose_csv.py”.
Let’s initially create a “DESeq2” folder and copy the files needed for the statistical analysis:
Code Block |
---|
mkdir -p $HOME/workshop/small_RNAseq/DESeq2
cp $HOME/workshop/small_RNAseq/scripts/transpose_csv.py $HOME/workshop/small_RNAseq/DESeq2
cp $HOME/workshop/small_RNAseq/data/human/metadata_microRNA.txt $HOME/workshop/small_RNAseq/DESeq2
cp /work/training/smallRNAseq/runs/deprecated/run2_smallRNAseq_human/results/edger/mature_counts.csv $HOME/workshop/small_RNAseq/DESeq2
cd $HOME/workshop/small_RNAseq/DESeq2 |
To transpose the initial “mature_counts.csv” file do the following:
...
Overview
Create a metadata “samplesheet.csv” for small RNAseq datasets.
Learn to use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).
Learn how to prepare a PBS script to run the expression profiling of small RNAs against the reference miRBase database annotated microRNAs.
Preparing the pipeline inputs
The pipeline requires preparing at least 3 files:
Metadata file (samplesheet.csv) that specifies the name of the samples and the location of the FASTQ files. The samplesheet used in this session has two columns: ‘sample’ and ‘fastq_1’
PBS Pro script (launch_nf-core_smallRNAseq_miRBase.pbs) with instructions to run the pipeline
nextflow.config: revision 2.3.1 of the nf-core/smrnaseq pipeline may not be able to locate the reference adapter sequences, so we will use a local nextflow.config file to tell Nextflow where to find the adapter sequences needed to trim the raw small RNA-seq data
A. Create the metadata file (samplesheet.csv):
Change to the data folder directory:
Code Block |
---|
cd $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Copy the bash script to the working folder
Code Block |
---|
cp /work/training/2024/smallRNAseq/scripts/create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Note: you could replace ‘$HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease’ with “.” A dot indicates the ‘current directory’, so the file is copied to the directory where you are currently located
View the content of the script:
Code Block |
---|
cat create_nf-core_smallRNAseq_samplesheet.sh |
...
NOTE: modify ‘read1_extension’ as appropriate for your data, for example _1.fastq.gz, _R1_001.fastq.gz or _R1.fq.gz
Let’s generate the metadata file by running the following command:
Code Block |
---|
sh create_nf-core_smallRNAseq_samplesheet.sh $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease |
Check the newly created samplesheet.csv file:
Code Block |
---|
cat samplesheet.csv |
sample,fastq_1
ERR409878,/work/training/2024/smallRNAseq/data/human_disease/ERR409878.fastq.gz
ERR409879,/work/training/2024/smallRNAseq/data/human_disease/ERR409879.fastq.gz
ERR409880,/work/training/2024/smallRNAseq/data/human_disease/ERR409880.fastq.gz
ERR409881,/work/training/2024/smallRNAseq/data/human_disease/ERR409881.fastq.gz
ERR409882,/work/training/2024/smallRNAseq/data/human_disease/ERR409882.fastq.gz
ERR409883,/work/training/2024/smallRNAseq/data/human_disease/ERR409883.fastq.gz
ERR409884,/work/training/2024/smallRNAseq/data/human_disease/ERR409884.fastq.gz
ERR409885,/work/training/2024/smallRNAseq/data/human_disease/ERR409885.fastq.gz
ERR409886,/work/training/2024/smallRNAseq/data/human_disease/ERR409886.fastq.gz
ERR409887,/work/training/2024/smallRNAseq/data/human_disease/ERR409887.fastq.gz
ERR409888,/work/training/2024/smallRNAseq/data/human_disease/ERR409888.fastq.gz
ERR409889,/work/training/2024/smallRNAseq/data/human_disease/ERR409889.fastq.gz
ERR409890,/work/training/2024/smallRNAseq/data/human_disease/ERR409890.fastq.gz
ERR409891,/work/training/2024/smallRNAseq/data/human_disease/ERR409891.fastq.gz
ERR409892,/work/training/2024/smallRNAseq/data/human_disease/ERR409892.fastq.gz
ERR409893,/work/training/2024/smallRNAseq/data/human_disease/ERR409893.fastq.gz
ERR409894,/work/training/2024/smallRNAseq/data/human_disease/ERR409894.fastq.gz
ERR409895,/work/training/2024/smallRNAseq/data/human_disease/ERR409895.fastq.gz
ERR409896,/work/training/2024/smallRNAseq/data/human_disease/ERR409896.fastq.gz
ERR409897,/work/training/2024/smallRNAseq/data/human_disease/ERR409897.fastq.gz
ERR409898,/work/training/2024/smallRNAseq/data/human_disease/ERR409898.fastq.gz
ERR409899,/work/training/2024/smallRNAseq/data/human_disease/ERR409899.fastq.gz
ERR409900,/work/training/2024/smallRNAseq/data/human_disease/ERR409900.fastq.gz |
---|
B. Prepare PBS Pro script to run the nf-core/smrnaseq pipeline
Copy the PBS Pro script for running the full small RNAseq pipeline (launch_nf-core_smallRNAseq_miRBase.pbs)
Copy and paste the code below to the terminal:
Code Block |
---|
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/human_disease/samplesheet.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/launch_nf-core_smallRNAseq_miRBase.pbs $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cp /work/training/2024/smallRNAseq/scripts/nextflow.config $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run1_human_miRBase |
Line 1: Copy the samplesheet.csv file to the working directory
Line 2: Copy the launch_nf-core_smallRNAseq_miRBase.pbs submission script to the working directory
Line 3: Copy the nextflow.config file from the shared folder to the working directory
Line 4: Move to the working directory
View the content of the launch_nf-core_smallRNAseq_miRBase.pbs script:
Code Block |
---|
cat launch_nf-core_smallRNAseq_miRBase.pbs |
...
TIP: release 2.3.1 of the nf-core/smrnaseq pipeline is not able to find the reference adapter sequences used to trim the raw small RNA-seq reads, so we need to specify the folder where the adapter sequences file is located. To do this, we prepare a “nextflow.config” file. This file should already be in your working directory. Print its content as follows:
Code Block |
---|
cat nextflow.config |
|
---|
Note: if a config file is placed in the working folder, it can override parameters defined by the global ~/.nextflow/config file or by the config file defined as part of the pipeline.
Submit the job to the HPC cluster:
Code Block |
---|
qsub launch_nf-core_smallRNAseq_miRBase.pbs |
Monitor the progress:
Code Block |
---|
qjobs |
The job will take several hours to run, hence we will use precomputed results for the statistical analysis in the next section.
Outputs
The pipeline will produce two folders, one called “work,” where all the processing is done, and another called “results,” where we can find the pipeline's outputs. The content of the results folder is as follows:
Code Block |
---|
results/
├── bowtie_index
│ ├── mirna_hairpin
│ └── mirna_mature
├── fastp
│ └── on_raw
├── fastqc
│ ├── raw
│ └── trimmed
├── mirna_quant
│ ├── bam
│ ├── edger_qc <-- Expression counts for mature miRNAs (mature_counts.csv) and precursor miRNAs (hairpin_counts.csv) can be found in this subfolder.
│ ├── mirtop
│ ├── reference
│ └── seqcluster
├── mirtrace
│ ├── mirtrace-report.html
│ ├── mirtrace-results.json
│ ├── mirtrace-stats-contamination_basic.tsv
│ ├── mirtrace-stats-contamination_detailed.tsv
│ ├── mirtrace-stats-length.tsv
│ ├── mirtrace-stats-mirna-complexity.tsv
│ ├── mirtrace-stats-phred.tsv
│ ├── mirtrace-stats-qcstatus.tsv
│ ├── mirtrace-stats-rnatype.tsv
│ ├── qc_passed_reads.all.collapsed
│ └── qc_passed_reads.rnatype_unknown.collapsed
├── multiqc
│ ├── multiqc_data
│ ├── multiqc_plots
│ └── multiqc_report.html
└── pipeline_info
    ├── execution_report_2024-08-20_16-55-53.html
    ├── execution_timeline_2024-08-20_16-55-53.html
    ├── execution_trace_2024-08-20_16-55-53.txt
    ├── nf_core_smrnaseq_software_mqc_versions.yml
    ├── params_2024-08-20_16-56-04.json
    └── pipeline_dag_2024-08-20_16-55-53.html |
The quantification of the mature miRNA and hairpin expression can be found in the results/mirna_quant/edger_qc directory (relative to the working directory).
Code Block |
---|
cd results/mirna_quant/edger_qc |
Code Block |
---|
hairpin_counts.csv
hairpin_CPM_heatmap.pdf
hairpin_edgeR_MDS_distance_matrix.txt
hairpin_edgeR_MDS_plot_coordinates.txt
hairpin_edgeR_MDS_plot.pdf
hairpin_log2CPM_sample_distances_dendrogram.pdf
hairpin_log2CPM_sample_distances_heatmap.pdf
hairpin_log2CPM_sample_distances.txt
hairpin_logtpm.csv
hairpin_logtpm.txt
hairpin_normalized_CPM.txt
hairpin_unmapped_read_counts.txt
mature_counts.csv <-- Expression mature miRNAs. This file will be used to identify differentially expressed miRNAs (Session 7)
mature_CPM_heatmap.pdf
mature_edgeR_MDS_distance_matrix.txt
mature_edgeR_MDS_plot_coordinates.txt
mature_edgeR_MDS_plot.pdf
mature_log2CPM_sample_distances_dendrogram.pdf
mature_log2CPM_sample_distances_heatmap.pdf
mature_log2CPM_sample_distances.txt
mature_logtpm.csv
mature_logtpm.txt
mature_normalized_CPM.txt
mature_unmapped_read_counts.txt |
Let’s inspect the mature_counts.csv file. Let’s use the ‘cat’ command to print it on the screen:
Code Block |
---|
cat mature_counts.csv |
Code Block |
---|
"","hsa-let-7a-5p","hsa-let-7a-3p","hsa-let-7a-2-3p","hsa-let-7b-5p","hsa-let-7b-3p","hsa-let-7c-5p","hsa-let-7c-3p","hsa-let-7d-5p","hsa-let-7d-3p","hsa-
"ERR409882",364608,341,16,59417,1998,68342,44,14861,3790,29486,207,211184,228,1462,7002,2,49664,1,1091,174,326,43,6,468,7,1482,1615,9,17256,534,573,6526,0
"ERR409879",305651,184,6,52115,1476,58425,30,12397,2659,23604,201,198778,151,1013,5486,1,48381,4,945,202,194,40,7,368,3,1097,1317,6,12662,561,372,3693,2,1
"ERR409881",712880,165,9,83857,2335,162724,83,30556,4503,68044,385,456864,348,1818,9893,0,111712,5,1495,259,174,48,6,318,2,1466,2220,4,17865,466,551,10360
"ERR409884",182178,111,3,27892,913,39989,21,7751,1886,13902,159,127386,132,743,3651,3,40311,0,629,117,97,21,11,305,2,1147,902,2,8313,368,242,2276,0,1146,4
"ERR409889",568269,257,13,92339,2239,100021,45,20819,3511,44172,207,276474,259,1376,12407,5,83908,5,1971,467,403,70,30,1082,7,3082,3172,14,24112,819,421,6
"ERR409894",314053,137,9,44708,1220,74145,74,12313,2827,25295,196,196866,158,896,4681,3,43677,1,806,138,131,22,7,296,3,1181,1169,5,11145,611,360,3742,5,12
"ERR409887",178201,48,4,25678,733,41506,27,7833,1613,15724,121,123391,98,497,3288,0,39434,1,445,97,65,15,3,150,2,539,461,3,5837,186,161,2958,2,847,3,1544,
"ERR409880",318121,136,3,46347,1260,65606,39,11095,2269,24585,200,191072,194,1118,5599,2,67420,3,1242,155,168,22,2,505,6,1708,1836,3,11293,482,359,3652,1,
"ERR409890",332579,105,7,40131,955,73537,38,13528,2029,31807,158,207846,175,962,5146,0,42402,0,659,149,102,20,4,219,3,964,1086,4,11957,423,385,6017,4,1556 |
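Because each row of mature_counts.csv is very wide, printing the whole file with cat is hard to read. As an optional alternative, the sketch below shows only the first few columns and rows using the standard cut and head tools. The helper name preview_csv is hypothetical, and the demo uses a small excerpt of the values shown above.

```shell
# Hypothetical helper: show the first N columns and the first M rows of a CSV file.
preview_csv() {
    # $1 = CSV file, $2 = number of columns to keep, $3 = number of rows to show
    cut -d',' -f"1-$2" "$1" | head -n "$3"
}

# Demo on a small excerpt of the counts table written to a temporary file
tmp=$(mktemp -d)
printf '"","hsa-let-7a-5p","hsa-let-7a-3p","hsa-let-7a-2-3p"\n"ERR409882",364608,341,16\n"ERR409879",305651,184,6\n' > "$tmp/excerpt.csv"
preview_csv "$tmp/excerpt.csv" 3 2
```

On the real file the equivalent call would be preview_csv mature_counts.csv 5 10, which prints the sample names and the first four miRNAs for the first nine samples.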
Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either with the R script or with a script called “transpose_csv.py”.
Let’s copy the transpose_csv.py script to the working folder:
Code Block |
---|
cp /work/training/2024/smallRNAseq/scripts/transpose_csv.py . |
To check how to use the script, do the following:
Code Block |
---|
python transpose_csv.py --help |
Code Block |
---|
usage: transpose_csv.py [-h] --input INPUT --output OUTPUT
Transpose a CSV file and generate a tab-delimited TXT file.
optional arguments:
-h, --help show this help message and exit
--input INPUT Input CSV file containing mature miRNA counts.
--output OUTPUT Output tab-delimited TXT file. |
To transpose the initial “mature_counts.csv” file do the following:
Code Block |
---|
python transpose_csv.py --input mature_counts.csv --output mature_counts.txt |
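To illustrate what the transposition does, here is a minimal awk sketch that turns a small samples-by-miRNA CSV into a tab-delimited miRNA-by-samples table. It is not the workshop's transpose_csv.py, whose internals are not shown here. The function name transpose_csv is hypothetical, and labelling the empty first header cell as microRNA is an assumption made to mirror the transposed table shown in this section.

```shell
# Illustrative only: transpose a small comma-separated table into a
# tab-delimited one with awk. This is not the workshop's transpose_csv.py.
transpose_csv() {
    awk -F',' '
        {
            gsub(/"/, "")                                  # drop the double quotes around names
            if (NR == 1 && $1 == "") $1 = "microRNA"       # assumption: label the empty corner cell
            for (i = 1; i <= NF; i++) cell[NR, i] = $i     # store every value by (row, column)
            if (NF > ncol) ncol = NF
        }
        END {
            # each input column becomes one output row
            for (i = 1; i <= ncol; i++) {
                line = cell[1, i]
                for (j = 2; j <= NR; j++) line = line "\t" cell[j, i]
                print line
            }
        }
    ' "$1"
}

# Demo on a small excerpt of mature_counts.csv written to a temporary file
tmp=$(mktemp -d)
printf '"","hsa-let-7a-5p","hsa-let-7a-3p"\n"ERR409882",364608,341\n"ERR409879",305651,184\n' > "$tmp/mature_counts.csv"
transpose_csv "$tmp/mature_counts.csv"
```

The sketch holds the whole table in memory, which is fine for a counts table of this size.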
Let’s now print the transposed mature counts table:
Code Block |
---|
cat mature_counts.txt |
Code Block |
---|
microRNA ERR409882 ERR409879 ERR409881 ERR409884 ERR409889 ERR409894 ERR409887 ERR409880 ERR409890 ERR409878 ERR409885 ERR409886 ERR409891 ERR409899 ERR409893 ERR409896 ERR409895 ERR409888 ERR409892 ERR409898 ERR409897 ERR409883 ERR409900
hsa-let-7a-5p 364608 305651 712880 182178 568269 314053 178201 318121 332579 432950 546049 208284 351586 289926 417695 421395 531417 320229 249354 186910 242774 287209 1258946
hsa-let-7a-3p 341 184 165 111 257 137 48 136 105 288 205 85 100 47 114 102 106 167 88 85 101 262 439
hsa-let-7a-2-3p 16 6 9 3 13 9 4 3 7 17 12 4 7 2 5 9 9 123 13
hsa-let-7b-5p 59417 52115 83857 27892 92339 44708 25678 46347 40131 61357 59795 27498 37230 43602 62467 45870 75630 56350 28694 25897 28396 100340 174494
hsa-let-7b-3p 1998 1476 2335 913 2239 1220 733 1260 955 2535 1771 894 1118 999 1404 1294 1670 2293 1343 764 793 1180 4682
hsa-let-7c-5p 68342 58425 162724 39989 100021 74145 41506 65606 73537 69994 128501 47783 75857 66085 86159 84951 118227 53364 62697 41282 46267 117557 271459
hsa-let-7c-3p 44 30 83 21 45 74 27 39 38 41 70 33 34 38 54 55 77 290 23 28 108 155
hsa-let-7d-5p 14861 12397 30556 7751 20819 12313 7833 11095 13528 15326 22746 7913 14405 14122 19831 17317 27318 10690 9305 8698 10109 9076 55680
hsa-let-7d-3p 3790 2659 4503 1886 3511 2827 1613 2269 2029 3311 3875 1999 2370 2978 3305 2460 5372 3015 2857 1747 1800 3443 8451
hsa-let-7e-5p 29486 23604 68044 13902 44172 25295 15724 24585 31807 29935 46550 18482 29525 26036 39737 33998 51987 20686 20070 15501 19495 50124 114605
hsa-let-7e-3p 207 201 385 159 207 196 121 200 158 229 393 195 198 150 248 154 285 199 185 141 168 512 686
hsa-let-7f-5p 211184 198778 456864 127386 276474 196866 123391 191072 207846 250192 376484 129113 231751 223462 288225 276677 425085 154352 133796 126870 155630 150954 802971
hsa-let-7f-1-3p 228 151 348 132 259 158 98 194 175 233 261 116 156 118 187 174 243 159 192 124 103 226 560
hsa-let-7f-2-3p 1462 1013 1818 743 1376 896 497 1118 962 1387 1999 783 1062 701 1100 1127 1215 642 1024 578 799 1847 3221
hsa-miR-15a-5p 7002 5486 9893 3651 12407 4681 3288 5599 5146 9213 7931 3166 4055 5151 5804 5614 12518 4751 4126 3567 3704 4998 16596
hsa-miR-15a-3p 2 1 0 3 5 3 0 2 0 6 0 0 1 0 1 0 1 09
hsa-miR-16-5p 49664 48381 111712 40311 83908 43677 39434 67420 42402 73300 73543 36662 45433 50446 54609 55368 106992 56124 49628 34812 35133 42637 177920
hsa-miR-16-1-3p 1 4 5 0 5 1 1 3 0 3 1 2 1 1 3 1 3 011 15
hsa-miR-17-5p 1091 945 1495 629 1971 806 445 1242 659 1301 1079 662 666 548 805 681 822 934 590 537 561 1311 2793
hsa-miR-17-3p 174 202 259 117 467 138 97 155 149 247 216 153 110 157 161 134 311 132 113 136 133 380 420
hsa-miR-18a-5p 326 194 174 97 403 131 65 168 102 327 137 104 104 76 115 98 101 222 69 85 74 649 455
hsa-miR-18a-3p 43 40 48 21 70 22 15 22 20 38 44 23 18 25 40 22 62 421 24 34 17 86
hsa-miR-19a-5p 6 7 6 11 30 7 3 2 4 12 3 4 7 2 4 2 6 010
hsa-miR-19a-3p 468 368 318 305 1082 296 150 505 219 542 473 399 298 232 247 237 307 358 346 253 259 817 772
hsa-miR-19b-1-5p 7 3 2 2 7 3 2 6 3 5 2 3 1 0 1 1 113 10
hsa-miR-19b-3p 1482 1097 1466 1147 3082 1181 539 1708 964 1773 1656 1216 1054 766 884 878 1046 1284 1312 788 912 5402 3276
hsa-miR-20a-5p 1615 1317 2220 902 3172 1169 461 1836 1086 2163 1632 884 1199 855 1156 1097 1036 1176 649 673 869 1269 4785
hsa-miR-20a-3p 9 6 4 2 14 5 3 3 4 6 6 7 3 1 2 5 7 96
hsa-miR-21-5p 17256 12662 17865 8313 24112 11145 5837 11293 11957 21365 20088 14302 13403 11423 12353 15500 17497 10842 8743 7668 9397 21234 33370
hsa-miR-21-3p 534 561 466 368 819 611 186 482 423 803 398 595 296 540 464 376 510 470 306 261 267 490 1435
hsa-miR-22-5p 573 372 551 242 421 360 161 359 385 483 562 219 293 160 357 346 231 406 363 229 229 998 1543
hsa-miR-22-3p 6526 3693 10360 2276 6428 3742 2958 3652 6017 4830 7321 4210 4172 4826 6177 5011 9282 3041 5602 3335 2297 10176 16041
hsa-miR-23a-5p 0 2 6 0 7 5 2 1 4 1 1 2 1 9 4 1 7 012 10
hsa-miR-23a-3p 1785 1388 3082 1146 2153 1286 847 2136 1556 1966 2835 2553 1797 1339 1637 1648 2208 1208 2269 894 1054 4353 5098
hsa-miR-24-1-5p 24 15 11 4 23 2 3 11 7 25 14 6 7 2 9 6 8 111 7 2 60 36
hsa-miR-24-3p 5206 4549 6172 2715 6773 3471 1544 4085 3320 5937 4608 3329 2943 2039 3436 2830 2510 3465 3630 3148 1886 23971 13975
hsa-miR-24-2-5p 1 0 9 3 2 3 2 3 1 3 7 6 3 3 11 2 1 114 7
hsa-miR-25-5p 6 2 5 1 6 5 2 1 1 4 1 2 0 2 1 1 4 07
hsa-miR-25-3p 10678 8254 14145 4943 16696 7599 4914 9122 7057 11815 10600 5989 5872 7239 8869 7667 14988 8566 5127 3997 5421 12219 27213
hsa-miR-26a-5p 942607 674879 1353549 460648 1173421 1104148 481683 1026301 905383 1514327 1284627 1028272 954763 588897 719907 838025 918553 861348 1194165 539098 578238 1940621 2160732
hsa-miR-26a-1-3p 33 13 38 11 23 15 5 13 19 12 19 21 11 7 26 15 114 4 11 92 67
hsa-miR-26b-5p 8873 8470 12404 6004 16179 7036 4039 10340 7269 13698 16293 6699 6824 5658 6723 8201 10241 8101 6964 5297 5465 12834 21565
hsa-miR-26b-3p 117 120 161 55 191 64 47 86 88 139 165 86 73 81 85 74 176 886 52 62 93 260
hsa-miR-27a-5p 10 5 6 3 3 47 3 6 4 13 5 12 4 5 12 2 7 347 5 3 15 22
hsa-miR-27a-3p 6316 6048 11563 4314 8880 4538 3946 7333 7578 8045 10429 11134 5353 7110 7001 7213 11208 4270 8606 4711 4313 23050 18861
hsa-miR-28-5p 1189 1003 2766 714 2114 963 798 1139 1316 1423 2938 1438 1285 1523 1462 1706 3236 894 1527 826 943 1703 3740
hsa-miR-28-3p 9931 8512 22673 5850 17423 10445 6004 12452 11139 12577 20995 14581 12398 12654 12736 12438 25137 8456 9078 4997 7356 19199 36389
hsa-miR-29a-5p 139 119 221 78 197 133 48 154 137 129 278 95 129 56 115 114 136 6165 76 76 402 302
hsa-miR-29a-3p 47851 40318 86030 31781 67338 40611 24682 43362 52719 43606 87923 43894 39369 32490 41001 40846 60529 26618 64033 30538 27862 171798 127958
hsa-miR-30a-5p 53741 58354 125902 46942 92089 61011 33253 92094 76030 69153 138612 76104 54456 41981 58085 62219 82200 31718 61622 27745 49837 268845 163743
hsa-miR-30a-3p 5559 4462 11786 3361 6961 6289 3173 7023 5990 6561 12540 4597 5335 4929 6342 5095 9863 3229 6463 2249 3918 16800 17563
hsa-miR-31-5p 203 200 445 273 239 161 134 182 235 193 407 107 191 189 205 135 240 9180 210 140 572 859
hsa-miR-31-3p 16 9 19 1 5 5 3 5 6 3 7 1 6 0 10 4 6 153 15
hsa-miR-32-5p 824 539 651 344 1123 336 205 575 356 895 663 293 333 324 431 450 604 354 267 305 247 491 1286
hsa-miR-32-3p 45 34 52 21 61 21 17 29 28 51 45 19 34 27 28 46 37 321 12 13 22 146
hsa-miR-33a-5p 652 573 664 327 1196 541 185 494 498 841 478 339 222 276 526 325 525 181 341 285 309 3858 1399
hsa-miR-33a-3p 114 83 116 44 123 62 31 67 70 109 109 31 39 38 83 57 72 753 34 49 115 233
hsa-miR-92a-1-5p 13 2 13 4 4 14 1 3 3 2 5 2 5 3 4 4 711 2 2 3 31
hsa-miR-92a-3p 42246 38723 50223 20238 68324 35052 19661 34443 24237 42942 41887 22290 22354 41671 37906 29984 79714 41615 17825 19775 19460 39084 108578
hsa-miR-93-5p 6901 5416 7821 3750 9831 4443 2768 5608 3821 7290 5275 3226 3474 3914 5811 4263 8262 6879 2346 2739 2781 10365 19463
hsa-miR-93-3p 28 10 38 8 42 9 6 15 18 19 20 22 15 6 25 10 19 122 12 9 47 47
hsa-miR-95-5p 84 53 122 46 126 58 26 68 88 83 113 62 76 39 65 82 77 386 32 48 546 217
hsa-miR-95-3p 1110 969 2741 778 1742 1128 644 1520 1625 1217 2296 994 1364 889 1325 1264 1222 735 1296 818 824 3110 4329
hsa-miR-96-5p 26 9 41 12 23 9 47 35 36 17 18 217 27 37 16 24 97 119 9 9 56 58
hsa-miR-98-5p 50824 30986 80693 21549 45532 38319 21471 33600 39626 48053 58479 21535 42277 31044 46855 45473 54780 35260 25804 22597 25577 51970 142624
hsa-miR-98-3p 719 484 911 349 492 393 221 427 478 485 1016 320 645 568 656 668 1103 258 449 299 332 292 1828
hsa-miR-99a-5p 15965 13513 33096 12728 25485 14486 9571 20179 18562 13461 23895 22426 15968 14763 15562 12327 16006 11674 24577 8407 8378 64336 54743
hsa-miR-99a-3p 302 365 621 291 422 335 185 376 358 436 591 258 303 210 318 287 439 240 294 178 251 1491 1198
hsa-miR-100-5p 79481 37070 63987 30825 107620 30986 18025 44661 40955 44899 60082 52166 39781 40535 41072 28735 28319 53908 38033 20461 14628 91004 131887
hsa-miR-100-3p 154 89 184 37 166 144 36 62 62 180 168 40 89 68 93 111 101 101 86 29 48 65 319
hsa-miR-101-5p 449 636 966 420 894 557 310 586 529 627 901 538 456 481 574 496 829 243 728 374 412 753 1422
hsa-miR-101-3p 19431 18440 29284 14575 31144 18471 7449 17823 18165 24323 34198 9800 13607 10769 15929 17565 21584 9448 9053 8167 13225 57684 46489
hsa-miR-29b-1-5p 126 99 298 100 157 143 45 118 143 112 198 62 135 94 151 129 119 83 119 80 62 417 520
hsa-miR-29b-3p 9231 10383 16612 7804 18075 9233 3548 12501 11360 9461 18227 6548 9338 4033 8918 6990 7204 3688 7076 5152 6062 35669 28289
hsa-miR-29b-2-5p 43 33 86 31 59 46 23 61 47 34 48 29 46 31 53 27 223 41 26 21 180 164
hsa-miR-103a-2-5p 47 32 74 21 36 45 22 46 53 36 64 26 25 20 25 30 218 56 21 18 244 149
hsa-miR-103a-3p 57341 56545 120575 45558 74052 60887 33067 65870 58842 65720 82673 35489 52492 42411 73148 55872 72334 50436 50157 29897 36824 208477 252692
hsa-miR-103a-1-5p 6 2 4 0 1 4 2 3 4 4 0 0 1 0 3 4 222 4
hsa-miR-105-5p 94 67 195 23 93 96 64 50 103 74 177 43 95 93 129 124 188 3109 59 41 370 360
hsa-miR-105-3p 7 5 16 4 14 6 7 5 11 13 12 5 7 2 8 5 11 650 44
hsa-miR-106a-5p 99 71 304 88 191 114 43 249 146 128 125 164 83 83 130 104 159 682 63 128 292 575
hsa-miR-106a-3p 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 00
hsa-miR-107 12864 12924 25287 10703 16544 12143 8198 13385 13381 13838 20856 7270 10120 10615 15478 12161 22185 9413 10800 6737 8071 37333 47477
hsa-miR-16-2-3p 13 13 23 3 28 11 11 22 15 22 19 11 8 21 15 14 68 110 10 32
hsa-miR-192-5p 10850 10721 16893 7812 16012 8565 5717 11078 9463 14552 16540 8147 7275 8285 9508 10099 11859 7911 7783 5007 6835 23833 26345
hsa-miR-192-3p 3 1 5 1 1 1 1 3 2 7 5 4 1 2 1 1 1 24
hsa-miR-196a-5p 0 0 19 3 11 11 0 9 6 0 5 12 1 8 0 0 30 011 0 1
hsa-miR-196a-1-3p 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 00
hsa-miR-197-5p 2 7 6 3 2 5 4 1 2 1 5 4 2 4 3 4 2 112 14
hsa-miR-197-3p 4641 5186 9789 4043 6037 4628 3665 6083 4062 5562 7040 6654 4797 5322 6538 4461 8605 6048 7292 3844 2793 15747 20189
hsa-miR-199a-5p 140 82 1008 145 269 206 165 646 153 249 885 715 383 207 241 300 643 9596 93 173 723 520
hsa-miR-199a-3p 939 837 5648 918 1410 1120 1047 3302 915 1580 6687 3161 2028 1693 1651 2319 4909 433 2284 496 1079 2042 3434
hsa-miR-208a-3p 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 1 01
hsa-miR-129-5p 6973 6508 13303 3986 6530 7258 3774 5453 7559 7194 7962 3709 6279 4931 10167 6221 6809 4274 4919 3070 4283 64162 33485
hsa-miR-129-1-3p 1734 1327 2143 840 1159 857 773 1283 1321 1454 1387 546 1278 486 1733 870 749 1014 1408 642 784 4986 4616
hsa-miR-148a-5p 89 66 372 67 142 114 63 176 85 120 403 132 115 70 100 135 181 576 41 72 266 286
hsa-miR-148a-3p 6130 5176 25339 4577 8570 6976 5268 12690 6384 7454 37987 11049 9217 4590 5783 8804 11497 2906 10277 2773 5167 16279 19534
hsa-miR-30c-5p 8513 6090 11549 4502 9475 6710 3768 8687 7097 7538 13893 6697 6948 4477 6579 5918 6878 5295 9006 3720 4738 18333 20643
hsa-miR-30c-2-3p 361 296 668 213 521 294 199 357 322 374 798 339 353 324 535 380 823 241 238 149 293 2099 1116 |
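Each row of the counts table should hold one value per sample listed in samplesheet.csv (23 in this dataset). The following is a minimal sketch, not part of the pipeline, for flagging rows whose number of values differs from the expected sample count. It assumes the table is whitespace- or tab-delimited with the miRNA name in the first column. The demo file and the `expected` variable are hypothetical and for illustration only: the demo rows reuse the first few values of three rows from the table above, with one row deliberately left a value short.

```shell
# Hypothetical demo file: the first few values of three rows from the counts
# table, trimmed to 3 count columns for brevity. The middle row is
# deliberately one value short.
demo=$(mktemp)
cat > "$demo" <<'EOF'
hsa-miR-17-5p 1091 945 1495
hsa-miR-15a-3p 2 1
hsa-miR-17-3p 174 202 259
EOF

# Expected number of count columns (one per sample; use 23 for the real table).
expected=3

# For every row that does not have exactly $expected values after the miRNA
# name, print the name and the number of values actually found.
short_rows=$(awk -v n="$expected" 'NF - 1 != n { print $1, NF - 1 }' "$demo")
echo "$short_rows"

rm -f "$demo"
```

To check a real counts file, point `awk` at that file instead of the demo file and set `expected` to the number of samples. An empty result means every row has the expected width.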