Overview of today’s session:

...

  • inspect the results from Session 2

  • run an advanced RNA-seq pipeline to measure the expression of genes

  • (optional) run statistical analysis to identify differentially expressed genes

Task 1: Evaluation of RNA-seq results using a basic (generic) nextflow pipeline

The nextflow/RNA-seq pipeline automatically generates two output folders:

...

Code Block
fastqc/
trimgalore/
multiqc/
star_salmon/
pipeline_info/
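A quick way to confirm that a run finished is to check that these subfolders are present. The following is a minimal Python sketch and not part of the pipeline: the function name `missing_outputs` is hypothetical, and the folder names are simply copied from the listing above.

```python
from pathlib import Path

# Subfolder names copied from the listing above.
EXPECTED_SUBFOLDERS = ["fastqc", "trimgalore", "multiqc", "star_salmon", "pipeline_info"]


def missing_outputs(results_dir, expected=EXPECTED_SUBFOLDERS):
    """Return the expected subfolder names that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in expected if not (root / name).is_dir()]
```

Calling `missing_outputs(...)` on the results folder of your own run returns an empty list when all five subfolders exist.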

FastQC Report - assessing the quality of input reads

For example, Read 1:

File: control_r1_1_fastqc.html

Read 2:

...

Connect to the work folder via HPC-FS (see Session 2). Browse to the fastqc output folder: run1_star_salmon\results\fastqc. Then open the HTML report for each file to assess the quality of the raw data. You may also copy the files to your laptop by dragging and dropping them into a relevant folder.
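If the folder contains many samples, the reports can also be listed with a short script. This is a minimal Python sketch: the function name `find_fastqc_reports` is hypothetical, and it assumes that report files end in `_fastqc.html`, as in the example file name shown earlier.

```python
from pathlib import Path


def find_fastqc_reports(fastqc_dir):
    """Return the FastQC HTML reports in fastqc_dir, sorted by file name."""
    # Assumption: every report is named <sample>_fastqc.html.
    return sorted(Path(fastqc_dir).glob("*_fastqc.html"), key=lambda p: p.name)
```

Pass the path of the fastqc output folder as it appears on your own system. The function returns `Path` objects, so `report.name` gives the file name of each report.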

The main items to check are listed below.

  • Per base sequence quality:

    • Inspect the overall quality of the generated data per nucleotide position.

    • Bases with a quality score of 20 (Q20) are called with 99% accuracy, and those with a score of Q30 or higher with at least 99.9% accuracy.

    • For most applications, a quality trimming threshold of Q30 is recommended. Note that by default the pipeline removes poor-quality reads and trims bases below Q20.

  • Per base sequence content:

    • Determine whether biases in the distribution of A, T, C, and G nucleotides are present at either the 5'-end or the 3'-end of the reads.

    • Recommendation: remove the first 10 nucleotides from the 5'-end (random hexamer priming bias introduced during library preparation) and 2 nucleotides from the 3'-end of reads (these bases can interfere with the proper mapping of reads onto reference genomes/transcriptomes).

  • Check the other items in the FastQC report, such as the level of duplication, highly abundant sequences, and the presence of adapter sequences.
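The relationship between a quality score and accuracy, and the effect of the recommended end clipping, can be illustrated with a small Python sketch. It is for illustration only and is not how the pipeline performs trimming. The function names `phred_to_accuracy` and `clip_read` are hypothetical. The conversion uses the standard Phred definition, in which the error probability is 10^(-Q/10).

```python
def phred_to_accuracy(q):
    """Convert a Phred quality score Q to base-call accuracy, 1 - 10**(-Q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def clip_read(sequence, qualities, five_prime=10, three_prime=2):
    """Remove a fixed number of bases from each end of a read and its quality string."""
    # Defaults follow the recommendation above: 10 bases at the 5'-end, 2 at the 3'-end.
    end = len(sequence) - three_prime
    return sequence[five_prime:end], qualities[five_prime:end]


if __name__ == "__main__":
    for q in (20, 30):
        print(f"Q{q}: {phred_to_accuracy(q):.1%}")
```

Clipping a 20-base read with the default settings leaves the 8 bases between position 11 and position 18, together with the matching quality characters.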

MultiQC Report - provides an overview of quality, trimming, mapping, PCA, and many other informative statistics for all files in the experiment in a single report.

Connect to the work folder via HPC-FS (see Session 2). Browse to the multiqc output folder: run1_star_salmon\results\multiqc.