
Save a copy of the DESeq2.R script into the run3_MirGeneDB folder and edit it as described below…

Exercises for you to try:

We have also analysed this dataset against a different microRNA database, MirGeneDB. MirGeneDB is a database of manually curated microRNA genes that have been validated and annotated as initially described in Fromm et al. 2015 and Fromm et al. 2020. MirGeneDB 2.1 includes more than 16,000 microRNA gene entries representing more than 1,500 miRNA families from 75 metazoan species, and was published in the 2022 NAR database issue.

The output of the MirGeneDB analysis can be found at /work/training/2024/smallRNAseq/runs/run3_MirGeneDB. If you want to practise editing the R scripts we’ve given you, use this output to reproduce the plots above for the MirGeneDB analysis. This is good preparation for analysing your own data.

Precomputed results from session 6:

We ran the small RNA-seq samples against the MirGeneDB database. The resulting mature miRNA counts and the sample metadata can be found at:

Code Block
/work/training/2024/smallRNAseq/runs/run3_MirGeneDB/results/mirna_quant/edger_qc/mature_counts.csv
/work/training/2024/smallRNAseq/data/human_disease/metadata_microRNA.txt

Let’s create a “DESeq2” folder and copy the files needed for the statistical analysis:

Code Block
# Create the DESeq2 working folder (-p: no error if it already exists)
mkdir -p $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2
# Copy the transpose script, the sample metadata and the MirGeneDB mature counts into it
cp $HOME/workshop/2024-2/session6_smallRNAseq/scripts/transpose_csv.py $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/metadata_microRNA.txt $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2
cp /work/training/2024/smallRNAseq/runs/run3_MirGeneDB/results/mirna_quant/edger_qc/mature_counts.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2
# Move into the new folder
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2

To transpose the “mature_counts.csv” file (swap its rows and columns) and save the result as “mature_counts.txt”, run:

Code Block
python transpose_csv.py --input mature_counts.csv --out mature_counts.txt
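The workshop's transpose_csv.py is not reproduced here. As a rough illustration of what "transposing" the counts table means, here is a minimal standard-library Python sketch that swaps the rows and columns of CSV text. The function names and the tab-delimited output are assumptions for illustration only. They are not a description of the provided script.

```python
import csv
import io


def transpose_csv_text(text, out_delimiter="\t"):
    """Return the transpose of CSV text as delimited text (tab by default).

    Hypothetical helper for illustration; assumes every row has the same
    number of fields (zip() stops at the shortest row otherwise).
    """
    rows = list(csv.reader(io.StringIO(text)))
    # zip(*rows) turns field i of every input row into output row i
    transposed = zip(*rows)
    out = io.StringIO()
    csv.writer(out, delimiter=out_delimiter, lineterminator="\n").writerows(transposed)
    return out.getvalue()


def transpose_csv_file(in_path, out_path, out_delimiter="\t"):
    """Hypothetical file wrapper: read in_path, write its transpose to out_path."""
    with open(in_path, newline="") as f:
        text = f.read()
    with open(out_path, "w", newline="") as f:
        f.write(transpose_csv_text(text, out_delimiter))


# A toy table with samples as rows becomes a table with miRNAs as rows
print(transpose_csv_text("sample,miR-1,miR-2\nS1,5,0\nS2,7,3\n"), end="")
```

Whatever the orientation of the real file, transposing twice returns the original table. This makes it easy to check that no rows or columns were dropped.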

Differential expression analysis using RStudio

Run analysis script in RStudio

Pre-steps: open RStudio and create a new R script ('File' -> 'New File' -> 'R Script'). Click the save button and save the file as ‘DESeq2.R’ in the working directory you created above (H:\workshop\2024-2\session6_smallRNAseq\runs\run2_human_MirGeneDB\DESeq2).

Step 1: LOAD PACKAGES

Step 2: IMPORT DATA

Step 3: OUTLIERS AND BATCH EFFECTS - TRANSFORM DATA: remove low-coverage transcripts with counts below 20
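DESeq2 is an R package, so this Python sketch only illustrates the idea behind a low-count filter. It assumes the threshold applies to each transcript's total count across all samples. That is a common choice, but check how the provided script defines it. In R, such a filter is often written as `dds[rowSums(counts(dds)) >= 20, ]`. The dict layout and miRNA names below are hypothetical.

```python
def filter_low_counts(counts, min_total=20):
    """Keep features whose summed count across samples is at least min_total.

    counts: dict mapping feature name -> list of per-sample counts
    (hypothetical layout chosen for illustration).
    """
    return {name: vals for name, vals in counts.items() if sum(vals) >= min_total}


example = {
    "miR_A": [10, 12, 5],  # total 27 -> kept
    "miR_B": [3, 4, 2],    # total 9  -> removed
    "miR_C": [0, 20, 0],   # total 20 -> kept (not below 20)
}
print(sorted(filter_low_counts(example)))  # ['miR_A', 'miR_C']
```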

Step 4: OUTLIERS AND BATCH EFFECTS - VISUALISE DATA (PCA): change the confidence ellipse on the PCA plot to 99%
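If the PCA script draws its ellipses with ggplot2's `stat_ellipse()`, the confidence level is set by its `level` argument (default 0.95). This exercise would then become `level = 0.99`. The Python sketch below shows roughly how much larger the ellipse gets. It computes the radius multiplier for a 2-D normal ellipse from the chi-square quantile with 2 degrees of freedom, which has the closed form -2 ln(1 - level). This is a large-sample approximation. ggplot2's own calculation also depends on the number of points in each group, so treat the numbers as a guide to relative size only.

```python
import math


def ellipse_scale(level):
    """Radius multiplier (in standard deviations) for a 2-D normal ellipse.

    For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - level),
    so no statistics library is needed.
    """
    if not 0 < level < 1:
        raise ValueError("level must be between 0 and 1")
    return math.sqrt(-2.0 * math.log(1.0 - level))


for level in (0.95, 0.99):
    print(level, round(ellipse_scale(level), 3))
# 0.95 2.448
# 0.99 3.035
```

Under this approximation, the 99% ellipse is about 24% wider than the 95% one along each axis.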

Step 5: OUTLIERS AND BATCH EFFECTS - VISUALISE DATA (HEATMAP): change the heatmap colours to ones you prefer (hex colour codes can be looked up at https://www.colorhexa.com/11dd66)
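Heatmap colours are usually specified as hex codes such as the #11dd66 on the linked page. In R, a palette between two colours is commonly built with `colorRampPalette()`, for example `colorRampPalette(c("white", "#11dd66"))(50)`. If the heatmap is drawn with pheatmap, such a palette is passed through its `color` argument. The provided script may set its colours differently. The Python sketch below shows the same idea: linear interpolation between two hex colours in RGB space. The function name is hypothetical.

```python
def hex_gradient(start, end, n):
    """Return n hex colours evenly spaced from start to end (inclusive)."""
    if n < 2:
        raise ValueError("n must be at least 2")

    def to_rgb(h):
        h = h.lstrip("#")
        return [int(h[i:i + 2], 16) for i in (0, 2, 4)]

    a, b = to_rgb(start), to_rgb(end)
    colours = []
    for k in range(n):
        t = k / (n - 1)  # 0.0 at start, 1.0 at end
        # round() rounds exact halves to the nearest even integer
        rgb = [round(x + (y - x) * t) for x, y in zip(a, b)]
        colours.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colours


print(hex_gradient("#ffffff", "#11dd66", 3))  # ['#ffffff', '#88eeb2', '#11dd66']
```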

Step 6: DIFFERENTIAL EXPRESSION ANALYSIS: change the p-value cut-off used to call genes significantly differentially expressed to 0.01
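The sketch below shows the effect of tightening the cut-off on a hypothetical results table. It assumes the threshold is applied to the adjusted p-value (`padj`, reported by DESeq2's `results()` alongside `pvalue`). It also assumes that missing values, which DESeq2 reports as NA for genes excluded by its filtering, are never counted as significant. In R, a comparable filter on a DESeq2 results object is `res[which(res$padj < 0.01), ]`. If the script calls `results()`, its `alpha` argument (default 0.1) is intended to match the adjusted p-value cut-off, so it may be worth changing too.

```python
import math


def significant(results, cutoff=0.01):
    """Return names of genes with padj below cutoff, skipping missing values.

    results: list of (gene, padj) pairs; padj may be None or NaN, standing in
    for DESeq2's NA values (hypothetical layout for illustration).
    """
    hits = []
    for gene, padj in results:
        if padj is None or math.isnan(padj):
            continue  # untested genes are never called significant
        if padj < cutoff:
            hits.append(gene)
    return hits


toy = [("miR_A", 0.003), ("miR_B", 0.04), ("miR_C", None), ("miR_D", 0.0009)]
print(significant(toy, 0.05))  # ['miR_A', 'miR_B', 'miR_D']
print(significant(toy, 0.01))  # ['miR_A', 'miR_D']
```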

Step 7: DIFFERENTIAL EXPRESSION ANALYSIS - VISUALISATION (VOLCANO PLOT): Label the top 30 DE genes instead of top
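Labelling the top 30 genes first requires a ranking. The sketch below assumes "top" means the smallest adjusted p-values; ranking by absolute log2 fold change is another common choice. It uses a hypothetical list of (gene, padj) pairs and leaves out genes with a missing padj. In R, a similar ranking is often written as `head(res[order(res$padj), ], 30)`.

```python
import heapq
import math


def top_genes(results, n=30):
    """Return up to n gene names with the smallest padj, most significant first.

    results: list of (gene, padj) pairs (hypothetical layout for illustration);
    pairs whose padj is None or NaN are skipped.
    """
    valid = [(padj, gene) for gene, padj in results
             if padj is not None and not math.isnan(padj)]
    # nsmallest sorts by padj first, then by gene name for exact ties
    return [gene for padj, gene in heapq.nsmallest(n, valid)]


toy = [("miR_A", 0.003), ("miR_B", 0.04), ("miR_C", None), ("miR_D", 0.0009)]
print(top_genes(toy, 2))  # ['miR_D', 'miR_A']
```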

Step 8: DIFFERENTIAL EXPRESSION ANALYSIS - VISUALISATION (HEATMAP)

Step 9: DIFFERENTIAL EXPRESSION ANALYSIS - OUTLIER REMOVAL AND REPEATING OF CODE ABOVE


Version	Old Version 13	New Version 14
Changes made by	Vicki Thomson	Vicki Thomson
Saved on	Oct 28, 2024	Oct 28, 2024
