2024-2: 7a.1 DE analysis for small RNA-seq against miRBase
Public small RNA-seq data
Species | ENA link | Description |
---|---|---|
Human | https://www.ebi.ac.uk/ena/browser/view/PRJEB5212?show=publications | RNA-seq of micro RNAs (miRNAs) in Human prefrontal cortex to identify differentially expressed miRNAs between Huntington's Disease and control brain samples |
1. Connect to an rVDI virtual desktop machine
To access and run an rVDI virtual desktop:
Go to https://rvdi.qut.edu.au/
Click on ‘VMware Horizon HTML Access’
Log on with your QUT username and password
NOTE: you need to be connected to the QUT network first, either by being on campus or by connecting remotely via VPN.
2. Open PuTTY terminal
Click on the PuTTY icon
Double-click on “Lyra”
Enter your password and connect to the HPC
Precomputed results from session 6:
We ran the small RNA-seq samples against the miRBase database. The resulting count table and the sample metadata can be found at:
/work/training/2024/smallRNAseq/runs/run2_human/results/mirna_quant/edger_qc/mature_counts.csv
/work/training/2024/smallRNAseq/data/human_disease/metadata_microRNA.txt
The results of the miRNA profiling can be found in the folder called “mirna_quant/edger_qc”:
├── results
│ ├── bowtie_index
│ ├── fastp
│ ├── fastqc
│ ├── mirna_quant
│ ├── mirtrace
│ ├── multiqc
│ └── pipeline_info
Inside the “mirna_quant/edger_qc” folder, find the “mature_counts.csv” file:
hairpin_counts.csv
hairpin_CPM_heatmap.pdf
hairpin_edgeR_MDS_distance_matrix.txt
hairpin_edgeR_MDS_plot_coordinates.txt
hairpin_edgeR_MDS_plot.pdf
hairpin_log2CPM_sample_distances_dendrogram.pdf
hairpin_log2CPM_sample_distances_heatmap.pdf
hairpin_log2CPM_sample_distances.txt
hairpin_logtpm.csv
hairpin_logtpm.txt
hairpin_normalized_CPM.txt
hairpin_unmapped_read_counts.txt
mature_counts.csv <-- we will use this file for the statistical analysis in the next section
mature_counts.txt
mature_CPM_heatmap.pdf
mature_edgeR_MDS_distance_matrix.txt
mature_edgeR_MDS_plot_coordinates.txt
mature_edgeR_MDS_plot.pdf
mature_log2CPM_sample_distances_dendrogram.pdf
mature_log2CPM_sample_distances_heatmap.pdf
mature_log2CPM_sample_distances.txt
mature_logtpm.csv
mature_logtpm.txt
mature_normalized_CPM.txt
mature_unmapped_read_counts.txt
Note: the “mature_counts.csv” file needs to be transposed prior to running the statistical analysis. This can be done either in the R script or with a script called “transpose_csv.py”.
Let’s initially create a “DESeq2” folder and copy the files needed for the statistical analysis:
To transpose the initial “mature_counts.csv” file do the following:
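The workshop's own “transpose_csv.py” is not reproduced on this page. As a rough stand-in, the transposition step could be sketched in Python as below. The function name `transpose_csv` and the output file name in the usage comment are assumptions made for illustration, not the workshop script itself.

```python
import csv
from pathlib import Path


def transpose_csv(src: Path, dst: Path) -> None:
    """Write the transpose of the CSV at `src` to `dst` (rows become columns)."""
    with src.open(newline="") as handle:
        rows = list(csv.reader(handle))

    # Every row must have the same number of cells; otherwise zip() below
    # would silently truncate to the shortest row.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"ragged CSV, row widths found: {sorted(widths)}")

    with dst.open("w", newline="") as handle:
        # zip(*rows) turns the list of rows into a sequence of columns.
        csv.writer(handle).writerows(zip(*rows))


# Example call (hypothetical output name; run from the folder holding the file):
# transpose_csv(Path("mature_counts.csv"), Path("mature_counts_transposed.csv"))
```

The sketch reads the whole table into memory, which should be fine for a count table of this kind but would not suit very large files.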
Differential expression analysis using RStudio
Differential expression analysis for smRNA-Seq is similar to regular RNA-Seq.
As with the previous RNA-Seq session, we will run this smRNA-Seq differential expression analysis in RStudio on an rVDI virtual machine. As before, this saves time because the required R packages are pre-installed on these virtual machines. You can also copy and paste this script into RStudio on your local computer and adapt it to your own dataset.
3. Run analysis script in RStudio
a. Open RStudio
b. Create a new R script ('File'->'New File'-> ‘R script’)
c. Hit the save button and save this file in the working directory you created above (H:\workshop\2024-2\session6_smallRNAseq\runs\run1_human_miRBase\DESeq2). Name the R script ‘DESeq2.R’.
d. Copy and paste the entire script from the code window below into your R script.
e. Run the entire script ('Code'->'Run region'->'Run all')
LOAD PACKAGES
IMPORT DATA
OUTLIERS AND BATCH EFFECTS - TRANSFORM DATA
Q1. What line would you change if you wanted to change the threshold for low-coverage transcripts that you are filtering out?
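To make the idea behind Q1 concrete, low-coverage filtering can be pictured as dropping every miRNA whose total count across all samples falls below a cutoff. The workshop script is written in R and is not reproduced here. The Python sketch below only illustrates the concept, and the function name `filter_low_counts`, the `min_total` cutoff of 10, the miRNA names, and the counts are all made up for the example.

```python
def filter_low_counts(counts, min_total=10):
    """Keep features whose summed count across samples is at least `min_total`.

    `counts` maps a feature name to its list of per-sample raw counts.
    """
    return {
        name: values
        for name, values in counts.items()
        if sum(values) >= min_total
    }


# Made-up example data: three samples, three hypothetical miRNAs.
example_counts = {
    "miRNA_A": [120, 98, 143],  # well covered, kept
    "miRNA_B": [0, 2, 1],       # total of 3, removed at the default cutoff
    "miRNA_C": [4, 3, 3],       # total of 10, kept (the cutoff is inclusive)
}

kept = filter_low_counts(example_counts, min_total=10)
print(sorted(kept))
```

Raising `min_total` removes more weakly expressed miRNAs. In the R script, the equivalent change would be to whichever line sets that cutoff.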
OUTLIERS AND BATCH EFFECTS - VISUALISE DATA (PCA)
Q2. What line would you change if you wanted to show the 99% confidence level ellipses on the PCA?
OUTLIERS AND BATCH EFFECTS - VISUALISE DATA (HEATMAP)
Q3. What line would you change if you wanted to change the colours of the heatmap from purple to green?
DIFFERENTIAL EXPRESSION ANALYSIS
Q4. What line(s) would you change if you wanted to adjust the p-value threshold for significantly differentially expressed genes?
DIFFERENTIAL EXPRESSION ANALYSIS - VISUALISATION (VOLCANO PLOT)
Q5. What line would you change if you wanted to change the point size on the Volcano plot?
Q6. What line would you change if you wanted to show different cutoff lines, both vertical and horizontal? See the EnhancedVolcano manual: https://bioconductor.org/packages/release/bioc/manuals/EnhancedVolcano/man/EnhancedVolcano.pdf
DIFFERENTIAL EXPRESSION ANALYSIS - VISUALISATION (HEATMAP)
Q7. What happens when you change the annotation_names_col to T?
DIFFERENTIAL EXPRESSION ANALYSIS - OUTLIER REMOVAL AND REPEATING OF CODE ABOVE (you can run all this at once but it will overwrite the plots in the RStudio plot pane)
Q8. Are there many outliers removed? How would you tell?