Overview
As in exercise 6.4, we will:
Use the “samplesheet.csv” metadata file created for the small RNA-seq datasets in exercise 6.4.
Use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., to specify where to find the pipeline assets).
Use a PBS script to run expression profiling of miRNAs against MirGeneDB, a curated database that includes experimentally validated miRNAs.
Prepare pipeline inputs
Let’s move to the working directory:
...
Code Block |
---|
results/ ├── bowtie_index │ ├── mirna_hairpin │ └── mirna_mature ├── fastp │ └── on_raw ├── fastqc │ ├── raw │ └── trimmed ├── mirna_quant │ ├── bam │ ├── edger_qc <----- Expression counts for mature miRNAs (mature_counts.csv) and precursor miRNAs (hairpin_counts.csv) can be found in this subfolder. │ ├── mirtop │ ├── reference │ └── seqcluster ├── mirtrace │ ├── mirtrace-report.html │ ├── mirtrace-results.json │ ├── mirtrace-stats-contamination_basic.tsv │ ├── mirtrace-stats-contamination_detailed.tsv │ ├── mirtrace-stats-length.tsv │ ├── mirtrace-stats-mirna-complexity.tsv │ ├── mirtrace-stats-phred.tsv │ ├── mirtrace-stats-qcstatus.tsv │ ├── mirtrace-stats-rnatype.tsv │ ├── qc_passed_reads.all.collapsed │ └── qc_passed_reads.rnatype_unknown.collapsed ├── multiqc │ ├── multiqc_data │ ├── multiqc_plots │ └── multiqc_report.html └── pipeline_info ├── execution_report_2024-08-26_14-38-10.html ├── execution_report_2024-08-26_17-44-50.html ├── execution_timeline_2024-08-26_14-38-10.html ├── execution_timeline_2024-08-26_17-44-50.html ├── execution_trace_2024-08-26_17-44-50.txt ├── nf_core_smrnaseq_software_mqc_versions.yml ├── params_2024-08-26_17-45-00.json ├── pipeline_dag_2024-08-26_14-38-10.html └── pipeline_dag_2024-08-26_17-44-50.html |
The quantification results for mature miRNA and hairpin (precursor) expression can be found in the /results/mirna_quant/edger_qc directory.
Code Block |
---|
cd /results/mirna_quant/edger_qc |
Code Block |
---|
├── hairpin_counts.csv ├── hairpin_CPM_heatmap.pdf ├── hairpin_edgeR_MDS_distance_matrix.txt ├── hairpin_edgeR_MDS_plot_coordinates.txt ├── hairpin_edgeR_MDS_plot.pdf ├── hairpin_log2CPM_sample_distances_dendrogram.pdf ├── hairpin_log2CPM_sample_distances_heatmap.pdf ├── hairpin_log2CPM_sample_distances.txt ├── hairpin_logtpm.csv ├── hairpin_logtpm.txt ├── hairpin_normalized_CPM.txt ├── hairpin_unmapped_read_counts.txt ├── mature_counts.csv <----- Expression profile of mature miRNAs. This file will be used to identify differentially expressed miRNAs (Session 7). ├── mature_CPM_heatmap.pdf ├── mature_edgeR_MDS_distance_matrix.txt ├── mature_edgeR_MDS_plot_coordinates.txt ├── mature_edgeR_MDS_plot.pdf ├── mature_log2CPM_sample_distances_dendrogram.pdf ├── mature_log2CPM_sample_distances_heatmap.pdf ├── mature_log2CPM_sample_distances.txt ├── mature_logtpm.csv ├── mature_logtpm.txt ├── mature_normalized_CPM.txt └── mature_unmapped_read_counts.txt |
...
Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either with an R script or with a script called “transpose_csv.py”.
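To make the transposition step concrete, here is a minimal Python sketch of what transposing a CSV involves. It is not the course's “transpose_csv.py” script, whose contents are not shown here. The function names `transpose_rows` and `transpose_csv` and the toy table (made-up sample and miRNA names) are assumptions for illustration only.

```python
import csv


def transpose_rows(rows):
    """Swap rows and columns of a rectangular table (a list of equal-length lists)."""
    if any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("all rows must have the same number of columns")
    return [list(column) for column in zip(*rows)]


def transpose_csv(in_path, out_path):
    """Read a CSV file, transpose it, and write the result to out_path."""
    with open(in_path, newline="") as handle:
        rows = list(csv.reader(handle))
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle).writerows(transpose_rows(rows))


# Hypothetical toy table: invented names and counts, not real pipeline output.
example = [
    ["", "miR-A", "miR-B", "miR-C"],
    ["sample1", "10", "0", "5"],
    ["sample2", "7", "3", "1"],
]

transposed = transpose_rows(example)
for row in transposed:
    print(",".join(row))
```

Under these assumptions, a call such as `transpose_csv("mature_counts.csv", "mature_counts_transposed.csv")` would write the transposed table to a new file (the output name is hypothetical). Writing to a separate file leaves the original pipeline output untouched.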
Let’s copy the transpose_csv.py script to the working folder:
...