
Save a copy of the DESeq2.R script into the run3_MirGeneDB folder and edit it as below…

Exercises for you to try:

We have also analysed this dataset against a different microRNA database, MirGeneDB. MirGeneDB is a database of manually curated microRNA genes that have been validated and annotated, as initially described in Fromm et al. 2015 and Fromm et al. 2020. MirGeneDB 2.1, published in the 2022 NAR database issue, includes more than 16,000 microRNA gene entries representing more than 1,500 miRNA families from 75 metazoan species.

The output of the MirGeneDB analysis can be found at /work/training/2024/smallRNAseq/runs/run3_MirGeneDB. Use it if you want to practise editing the R scripts we've given you to reproduce the plots above for this analysis. This is good preparation for analysing your own data.

Precomputed results from session 6:

We ran the small RNA seq samples against the MirGeneDB database and the results can be found at:

Code Block
/work/training/2024/smallRNAseq/runs/run3_MirGeneDB/results/mirna_quant/edger_qc/mature_counts.csv
/work/training/2024/smallRNAseq/data/human_disease/metadata_microRNA.txt

Let’s create a “DESeq2” folder and copy the files needed for the statistical analysis:

Code Block
mkdir -p $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2
cp $HOME/workshop/2024-2/session6_smallRNAseq/scripts/transpose_csv.py $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2
cp $HOME/workshop/2024-2/session6_smallRNAseq/data/metadata_microRNA.txt $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2
cp /work/training/2024/smallRNAseq/runs/run3_MirGeneDB/results/mirna_quant/edger_qc/mature_counts.csv $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2
cd $HOME/workshop/2024-2/session6_smallRNAseq/runs/run2_human_MirGeneDB/DESeq2

To transpose the initial “mature_counts.csv” file, run:

Code Block
python transpose_csv.py --input mature_counts.csv --out mature_counts.txt
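Transposing swaps rows and columns, so each sample row becomes a column and each miRNA column becomes a row (or the reverse). Below is a minimal stand-in for what a script like transpose_csv.py might do. It assumes a comma-separated input and a tab-separated output. The real script's output delimiter is not shown here, and the helper name and feature names are hypothetical.

```python
import csv
import io


def transpose_table(text: str, in_delim: str = ",", out_delim: str = "\t") -> str:
    """Swap rows and columns of a delimited table (hypothetical helper).

    Assumes every row has the same number of fields; zip() would silently
    truncate ragged rows.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=in_delim))
    # zip(*rows) groups the i-th field of every row, so column i becomes row i
    transposed = zip(*rows)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=out_delim, lineterminator="\n")
    writer.writerows(transposed)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up counts: samples as rows, two hypothetical miRNAs as columns
    example = ",miR_A,miR_B\nsample1,120,45\nsample2,98,60\n"
    print(transpose_table(example), end="")
```

After transposing, the first line holds the sample names and each following line holds one miRNA's counts.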

Differential expression analysis using RStudio

Run analysis script in RStudio


Pre-steps: Open RStudio and create a new R script ('File' -> 'New File' -> 'R Script'). Click the save button and save the file in the working directory you created above (H:\workshop\2024-2\session6_smallRNAseq\runs\run2_human_MirGeneDB\DESeq2). Name the R script 'DESeq2.R'.

Step 1: LOAD PACKAGES

Step 2: IMPORT DATA

Step 3: OUTLIERS AND BATCH EFFECTS - TRANSFORM DATA: change the low-coverage filter so that transcripts with fewer than 20 counts are removed
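A low-count filter keeps only features with enough reads to be informative. The sketch below assumes the rule is a total of at least 20 counts summed across all samples. Your copy of DESeq2.R may apply the threshold per sample or to the mean instead, so check which rule it uses before changing the number. In R, a total-count filter of this kind is commonly written as `keep <- rowSums(counts(dds)) >= 20`. The feature names below are made up.

```python
from __future__ import annotations


def filter_low_counts(counts: dict[str, list[int]], min_total: int = 20) -> dict[str, list[int]]:
    """Keep features whose summed count across samples is at least min_total.

    counts maps a feature name (e.g. a miRNA) to its per-sample read counts.
    The 'total across samples' rule is an assumption for illustration.
    """
    return {name: row for name, row in counts.items() if sum(row) >= min_total}


example = {
    "miR_A": [15, 10, 3],  # total 28: kept
    "miR_B": [2, 5, 1],    # total 8: removed
    "miR_C": [10, 6, 4],   # total 20: kept (threshold is inclusive)
}
print(sorted(filter_low_counts(example)))
```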

Step 4: OUTLIERS AND BATCH EFFECTS - VISUALISE DATA (PCA): change the confidence interval ellipse on the PCA to 99%
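The confidence level controls how far the ellipse extends from each group's centre. For a bivariate normal distribution with known covariance, the ellipse containing a fraction p of the points has Mahalanobis radius sqrt(chi2_2(p)). With 2 degrees of freedom this quantile has the closed form -2 ln(1 - p).

The sketch below uses that large-sample approximation to compare a 99% ellipse with the usual 95% one. The radius grows by roughly 24% in each direction. If your script draws the ellipses with ggplot2's `stat_ellipse()`, the confidence level is its `level` argument (default 0.95). ggplot2 derives the radius from an F distribution, which approaches this chi-square value as the number of samples grows, so exact sizes will differ slightly for small groups.

```python
import math


def chi2_2df_quantile(level: float) -> float:
    """Quantile of a chi-square distribution with 2 degrees of freedom.

    With 2 df the CDF is 1 - exp(-x / 2), so it inverts in closed form.
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    return -2.0 * math.log(1.0 - level)


def ellipse_scale(level: float) -> float:
    """Mahalanobis radius of a bivariate-normal ellipse covering `level` of the mass."""
    return math.sqrt(chi2_2df_quantile(level))


for level in (0.95, 0.99):
    print(f"{level:.2f}: radius {ellipse_scale(level):.3f}")
print(f"99% ellipse is {ellipse_scale(0.99) / ellipse_scale(0.95):.2f}x wider")
```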

Step 5: OUTLIERS AND BATCH EFFECTS - VISUALISE DATA (HEATMAP): change the colours in the heatmap to something you like better - https://www.colorhexa.com/11dd66
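A heatmap palette is usually a gradient interpolated between a few anchor colours given as hex codes, such as #11dd66 from the linked page. In R, `colorRampPalette(c("white", "#11dd66"))(n)` builds such a gradient. The Python sketch below reproduces the idea with linear RGB interpolation so you can preview the intermediate hex codes. Its rounding may not match R's output exactly. The helper names and the white-to-green choice are illustrative only.

```python
from __future__ import annotations


def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (r, g, b) tuple of 0-255 integers."""
    code = code.lstrip("#")
    if len(code) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb: tuple[int, int, int]) -> str:
    """Convert an (r, g, b) tuple back to a lowercase '#rrggbb' string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def gradient(start: str, end: str, n: int) -> list[str]:
    """Return n hex colours evenly spaced in RGB space from start to end."""
    if n < 2:
        raise ValueError("need at least two colours")
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    colours = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the start colour, 1.0 at the end colour
        colours.append(rgb_to_hex(tuple(round(x + (y - x) * t) for x, y in zip(a, b))))
    return colours


print(gradient("#ffffff", "#11dd66", 5))
```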

Step 6: DIFFERENTIAL EXPRESSION ANALYSIS: change the p-value threshold for calling genes significantly differentially expressed to 0.01
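DESeq2's `results()` table reports both a raw `pvalue` column and a Benjamini-Hochberg adjusted `padj` column. `padj` can be `NA`, for example for genes removed by independent filtering. The sketch below shows the filtering logic with a 0.01 cut-off on `padj`. Which column your script thresholds is an assumption to check. The list of dictionaries is a simplified, hypothetical stand-in for the real results object.

```python
import math


def significant(results, alpha=0.01, column="padj"):
    """Return rows whose value in `column` is below alpha, skipping missing values.

    None or NaN stands in for R's NA here (an illustrative assumption).
    """
    keep = []
    for row in results:
        p = row.get(column)
        if p is None or math.isnan(p):
            continue  # NA: cannot be called significant
        if p < alpha:
            keep.append(row)
    return keep


res = [
    {"name": "miR_A", "log2FoldChange": 2.1, "padj": 0.003},
    {"name": "miR_B", "log2FoldChange": -1.4, "padj": 0.02},
    {"name": "miR_C", "log2FoldChange": 0.8, "padj": None},
]
print([r["name"] for r in significant(res)])
```

Lowering `alpha` from 0.05 to 0.01 can only shrink the list, because every gene passing the stricter cut-off also passes the looser one.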

Step 7: DIFFERENTIAL EXPRESSION ANALYSIS - VISUALISATION (VOLCANO PLOT): Label the top 30 DE genes instead of top
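Labelling the "top" genes on a volcano plot means ranking the results and keeping the first N names. The sketch assumes "top" means the smallest adjusted p-value, ignoring `NA` values. Some scripts rank by absolute log2 fold change instead, so match the ordering your copy of the script uses before changing N to 30. The helper and the example names are hypothetical.

```python
import math


def top_n(results, n=30, column="padj"):
    """Names of the n rows with the smallest values in `column`, ignoring NA (None/NaN)."""
    valid = [r for r in results if r.get(column) is not None and not math.isnan(r[column])]
    valid.sort(key=lambda r: r[column])  # smallest p-value first
    return [r["name"] for r in valid[:n]]


res = [{"name": f"miR_{i}", "padj": p} for i, p in enumerate([0.2, 1e-5, None, 0.03, 1e-8])]
print(top_n(res, n=2))
```

If fewer than N genes have a valid p-value, the function simply returns all of them.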

Step 8: DIFFERENTIAL EXPRESSION ANALYSIS - VISUALISATION (HEATMAP)

Step 9: DIFFERENTIAL EXPRESSION ANALYSIS - OUTLIER REMOVAL AND REPEATING OF CODE ABOVE
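Removing an outlier sample means dropping it from both the count table and the sample metadata, so the two stay matched, before re-running Steps 3 to 8. The Python sketch below illustrates that bookkeeping with plain dictionaries. The sample names, the `condition` column and the data layout are hypothetical. In R, the equivalent is subsetting the count matrix columns and the metadata rows with the same vector of sample names.

```python
def drop_samples(counts, metadata, outliers):
    """Remove outlier samples from the count table and the metadata together.

    counts:   feature name -> {sample name: read count}
    metadata: sample name  -> {column name: value}
    outliers: sample names to exclude (must all exist in metadata)
    """
    unknown = set(outliers) - set(metadata)
    if unknown:
        raise KeyError(f"not in metadata: {sorted(unknown)}")
    kept_meta = {s: info for s, info in metadata.items() if s not in outliers}
    # Keep only the samples that survived in the metadata, for every feature
    kept_counts = {
        feature: {s: c for s, c in per_sample.items() if s in kept_meta}
        for feature, per_sample in counts.items()
    }
    return kept_counts, kept_meta


counts = {"miR_A": {"S1": 10, "S2": 12, "S3": 400}, "miR_B": {"S1": 5, "S2": 7, "S3": 1}}
metadata = {"S1": {"condition": "control"}, "S2": {"condition": "control"}, "S3": {"condition": "disease"}}
new_counts, new_meta = drop_samples(counts, metadata, {"S3"})
print(sorted(new_meta), new_counts["miR_A"])
```

Raising an error for unknown sample names catches typos early, before they silently leave the outlier in the analysis.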