2024-2: 7a.1 DE analysis for smallRNAseq against mirBase


Public small RNA-seq data

Species: Human

ENA link: https://www.ebi.ac.uk/ena/browser/view/PRJEB5212?show=publications

Description: RNA-seq of microRNAs (miRNAs) in human prefrontal cortex to identify differentially expressed miRNAs between Huntington's Disease and control brain samples

1. Connect to an rVDI virtual desktop machine

To access and run an rVDI virtual desktop:

Go to https://rvdi.qut.edu.au/

Click on ‘VMware Horizon HTML Access’

Log on with your QUT username and password

*NOTE: you need to be connected to the QUT network first, either on campus or remotely via VPN.

2. Open PuTTY terminal

  • Click on the PuTTY icon

  • Double-click on “Lyra”

  • Enter your password and connect to the HPC

Precomputed results from session 6:

We ran the small RNA-seq samples against the miRBase database. The mature miRNA count table and the sample metadata file can be found at:

/work/training/2024/smallRNAseq/runs/run2_human/results/mirna_quant/edger_qc/mature_counts.csv

/work/training/2024/smallRNAseq/data/human_disease/metadata_microRNA.txt

The results of the miRNA profiling can be found in the folder called “mirna_quant/edger_qc”:

├── results
│   ├── bowtie_index
│   ├── fastp
│   ├── fastqc
│   ├── mirna_quant
│   ├── mirtrace
│   ├── multiqc
│   └── pipeline_info

Inside the “mirna_quant/edger_qc” folder, find the “mature_counts.csv” file:

hairpin_counts.csv
hairpin_CPM_heatmap.pdf
hairpin_edgeR_MDS_distance_matrix.txt
hairpin_edgeR_MDS_plot_coordinates.txt
hairpin_edgeR_MDS_plot.pdf
hairpin_log2CPM_sample_distances_dendrogram.pdf
hairpin_log2CPM_sample_distances_heatmap.pdf
hairpin_log2CPM_sample_distances.txt
hairpin_logtpm.csv
hairpin_logtpm.txt
hairpin_normalized_CPM.txt
hairpin_unmapped_read_counts.txt
mature_counts.csv    <-- we will use this file for the statistical analysis in the next section
mature_counts.txt
mature_CPM_heatmap.pdf
mature_edgeR_MDS_distance_matrix.txt
mature_edgeR_MDS_plot_coordinates.txt
mature_edgeR_MDS_plot.pdf
mature_log2CPM_sample_distances_dendrogram.pdf
mature_log2CPM_sample_distances_heatmap.pdf
mature_log2CPM_sample_distances.txt
mature_logtpm.csv
mature_logtpm.txt
mature_normalized_CPM.txt
mature_unmapped_read_counts.txt

Note: the “mature_counts.csv” file needs to be transposed before running the statistical analysis. This can be done either in the R script or with a script called “transpose_csv.py”.

Let’s initially create a “DESeq2” folder and copy the files needed for the statistical analysis:
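The staging step described here can be sketched in Python as follows. This is a minimal illustration and not the workshop's own commands: the helper name `stage_inputs` is hypothetical, and the destination folder in the commented usage line is an assumption that you should replace with your own working directory. The two source paths are the ones listed earlier on this page.

```python
from pathlib import Path
import shutil


def stage_inputs(sources, dest_dir):
    """Create dest_dir (including parents) if needed and copy each source file into it.

    Returns the list of copied file paths. `stage_inputs` is a hypothetical
    helper name used only for this sketch.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    copied = []
    for src in sources:
        target = dest / Path(src).name  # keep the original file name
        shutil.copy2(src, target)       # copy contents and timestamps
        copied.append(target)
    return copied


# Example call (ASSUMPTION: the destination path is a placeholder):
# stage_inputs(
#     [
#         "/work/training/2024/smallRNAseq/runs/run2_human/results/mirna_quant/edger_qc/mature_counts.csv",
#         "/work/training/2024/smallRNAseq/data/human_disease/metadata_microRNA.txt",
#     ],
#     Path.home() / "DESeq2",
# )
```

Copying the inputs into a dedicated “DESeq2” folder keeps the precomputed pipeline results untouched while you work on your own copies.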

To transpose the initial “mature_counts.csv” file do the following:
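As a rough illustration of what transposing the count table means, the sketch below swaps the rows and columns of a CSV file using only the Python standard library. It is not the workshop's “transpose_csv.py” (its contents are not shown on this page); the function name `transpose_table` and the output file name in the commented usage line are assumptions.

```python
import csv


def transpose_table(in_path, out_path):
    """Write a copy of the CSV at in_path with rows and columns swapped."""
    with open(in_path, newline="") as handle:
        # Skip completely blank lines, which csv.reader returns as empty lists
        rows = [row for row in csv.reader(handle) if row]

    # zip() would silently truncate to the shortest row, so check widths first
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have differing lengths: {sorted(widths)}")

    with open(out_path, "w", newline="") as handle:
        # zip(*rows) yields the i-th field of every row, i.e. the columns
        csv.writer(handle).writerows(zip(*rows))


# Example call (ASSUMPTION: the output file name is a placeholder):
# transpose_table("mature_counts.csv", "mature_counts_transposed.csv")
```

After transposing, what used to be the header row becomes the first column and vice versa, so it is worth opening the output once to confirm the orientation matches what the analysis script expects.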

Differential expression analysis using RStudio

Differential expression analysis for smRNA-Seq is similar to regular RNA-Seq.

As with the previous RNA-Seq session, we will run this smRNA-Seq differential expression analysis in RStudio on an rVDI virtual machine. The reason is the same as before: the required R packages are pre-installed on these virtual machines, which saves time. As before, you can also copy and paste this script into RStudio on your local computer and adapt it to your own dataset.

3. Run analysis script in RStudio

a. Open RStudio

b. Create a new R script (‘File’ → ‘New File’ → ‘R script’)

c. Hit the save button and save this file in the working directory you created above (H:\workshop\2024-2\session6_smallRNAseq\runs\run1_human_miRBase\DESeq2). Name the R script ‘DESeq2.R’.

d. Copy and paste the entire script from the code window below into your R script.

e. Run the entire script (‘Code’ → ‘Run region’ → ‘Run all’)

LOAD PACKAGES

IMPORT DATA

OUTLIERS AND BATCH EFFECTS - TRANSFORM DATA

Q1. What line would you change if you wanted to change the threshold for low-coverage transcripts that you are filtering out?
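To make the idea of a low-coverage filter concrete, here is a small language-independent illustration written in Python. It is not taken from the R script: the criterion (summed count across all samples) and the default threshold of 10 are assumptions chosen for the sketch, and the function name `filter_low_counts` is hypothetical.

```python
def filter_low_counts(counts, min_total=10):
    """Keep features whose summed count across all samples is >= min_total.

    counts maps a feature name (e.g. a miRNA ID) to a list of per-sample
    integer counts. ASSUMPTION: both the summed-count rule and the default
    threshold are illustrative, not necessarily what the workshop script uses.
    """
    return {
        feature: values
        for feature, values in counts.items()
        if sum(values) >= min_total
    }
```

Raising `min_total` removes more features; the equivalent change in the R script is a single number in the filtering line.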

OUTLIERS AND BATCH EFFECTS - VISUALISE DATA (PCA)

Q2. What line would you change if you wanted to show the 99% confidence level ellipses on the PCA?

OUTLIERS AND BATCH EFFECTS - VISUALISE DATA (HEATMAP)

Q3. What line would you change if you wanted to change the colours of the heatmap from purple to green?

DIFFERENTIAL EXPRESSION ANALYSIS

Q4. What line(s) would you change if you wanted to adjust the p-value threshold for significantly differentially expressed genes?

DIFFERENTIAL EXPRESSION ANALYSIS - VISUALISATION (VOLCANO PLOT)

Q5. What line would you change if you wanted to change the dot point size on the Volcano plot?

Q6. What line would you change if you wanted to show different cutoff lines, both vertically and horizontally? See the EnhancedVolcano manual: https://bioconductor.org/packages/release/bioc/manuals/EnhancedVolcano/man/EnhancedVolcano.pdf

DIFFERENTIAL EXPRESSION ANALYSIS - VISUALISATION (HEATMAP)

Q7. What happens when you change annotation_names_col to T?

DIFFERENTIAL EXPRESSION ANALYSIS - OUTLIER REMOVAL AND REPEATING OF CODE ABOVE (you can run all this at once but it will overwrite the plots in the RStudio plot pane)

Q8. Were many outliers removed? How can you tell?

All_DEG_Heatmap_normal_Vs_Huntingtons_disease_outliers.tiff
Figure 1. All DEG with outliers included
All_DEG_Heatmap_normal_Vs_Huntingtons_disease.tiff
Figure 2. All DEG without outliers (i.e. removed)