Overview

  • Similar to exercise 6.4, we will:

    • Use the “samplesheet.csv” metadata file created for the small RNA-seq datasets in exercise 6.4.

    • Use a “nextflow.config” file in the working directory to override Nextflow parameters (e.g., specify where to find the pipeline assets).

    • Use a PBS script to run the expression profiling of miRNAs against MirGeneDB, a curated database that includes experimentally validated miRNAs.

Prepare pipeline inputs

Let’s move to the working directory:

...

Code Block
results/
├── bowtie_index
│   ├── mirna_hairpin
│   └── mirna_mature
├── fastp
│   └── on_raw
├── fastqc
│   ├── raw
│   └── trimmed
├── mirna_quant
│   ├── bam
│   ├── edger_qc    <----- Expression counts for mature miRNAs (mature_counts.csv) and precursor miRNAs (hairpin_counts.csv) can be found in this subfolder.
│   ├── mirtop
│   ├── reference
│   └── seqcluster
├── mirtrace
│   ├── mirtrace-report.html
│   ├── mirtrace-results.json
│   ├── mirtrace-stats-contamination_basic.tsv
│   ├── mirtrace-stats-contamination_detailed.tsv
│   ├── mirtrace-stats-length.tsv
│   ├── mirtrace-stats-mirna-complexity.tsv
│   ├── mirtrace-stats-phred.tsv
│   ├── mirtrace-stats-qcstatus.tsv
│   ├── mirtrace-stats-rnatype.tsv
│   ├── qc_passed_reads.all.collapsed
│   └── qc_passed_reads.rnatype_unknown.collapsed
├── multiqc
│   ├── multiqc_data
│   ├── multiqc_plots
│   └── multiqc_report.html
└── pipeline_info
    ├── execution_report_2024-08-26_14-38-10.html
    ├── execution_report_2024-08-26_17-44-50.html
    ├── execution_timeline_2024-08-26_14-38-10.html
    ├── execution_timeline_2024-08-26_17-44-50.html
    ├── execution_trace_2024-08-26_17-44-50.txt
    ├── nf_core_smrnaseq_software_mqc_versions.yml
    ├── params_2024-08-26_17-45-00.json
    ├── pipeline_dag_2024-08-26_14-38-10.html
    └── pipeline_dag_2024-08-26_17-44-50.html

The quantification of mature miRNA and hairpin (precursor miRNA) expression can be found in the results/mirna_quant/edger_qc directory:

Code Block
cd results/mirna_quant/edger_qc
Code Block
├── hairpin_counts.csv
├── hairpin_CPM_heatmap.pdf
├── hairpin_edgeR_MDS_distance_matrix.txt
├── hairpin_edgeR_MDS_plot_coordinates.txt
├── hairpin_edgeR_MDS_plot.pdf
├── hairpin_log2CPM_sample_distances_dendrogram.pdf
├── hairpin_log2CPM_sample_distances_heatmap.pdf
├── hairpin_log2CPM_sample_distances.txt
├── hairpin_logtpm.csv
├── hairpin_logtpm.txt
├── hairpin_normalized_CPM.txt
├── hairpin_unmapped_read_counts.txt
├── mature_counts.csv      <----- Expression profile of mature miRNAs. This file will be used to identify differentially expressed miRNAs (Session 7).
├── mature_CPM_heatmap.pdf
├── mature_edgeR_MDS_distance_matrix.txt
├── mature_edgeR_MDS_plot_coordinates.txt
├── mature_edgeR_MDS_plot.pdf
├── mature_log2CPM_sample_distances_dendrogram.pdf
├── mature_log2CPM_sample_distances_heatmap.pdf
├── mature_log2CPM_sample_distances.txt
├── mature_logtpm.csv
├── mature_logtpm.txt
├── mature_normalized_CPM.txt
└── mature_unmapped_read_counts.txt

...

Note: “mature_counts.csv” needs to be transposed prior to running the statistical analysis. This can be done either with the R script or with a script called “transpose_csv.py”.
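Transposing simply swaps rows and columns, so each miRNA becomes a row and each sample becomes a column (or vice versa). The contents of “transpose_csv.py” are not shown in this section, so the following is only a minimal sketch of the idea using Python's standard csv module, not the course script. The function name transpose_csv and the output file name in the usage note are hypothetical.

```python
import csv


def transpose_csv(in_path: str, out_path: str) -> None:
    """Hypothetical helper (not the course's transpose_csv.py): swap rows and columns of a CSV."""
    with open(in_path, newline="") as fh:
        rows = list(csv.reader(fh))

    if not rows:
        raise ValueError(f"{in_path} is empty")

    # Every row must have the same number of fields; zip() below would
    # otherwise stop at the shortest row and silently drop data.
    width = len(rows[0])
    for i, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} fields, expected {width}")

    # zip(*rows) yields the columns of the original table,
    # which are exactly the rows of the transposed table.
    with open(out_path, "w", newline="") as fh:
        csv.writer(fh).writerows(zip(*rows))
```

For example, running transpose_csv("mature_counts.csv", "mature_counts_transposed.csv") from the edger_qc directory would write a transposed copy and leave the original untouched. The sketch makes no assumption about which orientation the input has; it only swaps the axes, and it stops with an error instead of producing a truncated table if the rows have different lengths.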

Let’s copy the transpose_csv.py script to the working folder:

...