...
In this section we’ll be revisiting the variant calling workflows from sessions 2 and 3. Specifically, we’ll be doing some further analysis of the nfcore/sarek output.
In the ‘2.Setup’ section, you downloaded 3 nfcore/sarek runs to H:/workshop/sarek/runs. These are: run2_trio, run3_liver and run4_trio2.
...
represent sarek variant calling results from the 3 exercises run in session 2:
Exercise 2: Running the sarek variant calling pipeline with family trio data (NA12878, NA12891, NA12892)
Exercise 3: Running the sarek variant calling pipeline with liver samples
Exercise 4: Use learned skills to prepare and run variant analysis for a second family trio (HG005, HG006, HG007)
See ‘Session 2 - Variant calling analysis’ for more details on the experimental datasets.
SNPeff and SNPSift
There are various ways to analyse variant data, but we’ll specifically be looking at the results from SNPeff, a variant annotation and functional effect prediction tool that is run as part of the sarek pipeline.
https://pcingola.github.io/SnpEff/
SNPeff annotates each variant using the annotation information available for the genome: for example, whether the variant is an insertion or deletion, and which genomic region it falls in (introns, exons, splice junctions, promoter regions, etc.). It also categorises variants as HIGH, MODERATE or LOW impact, depending on how the variant affects codons, amino acids and protein structure. A fourth category, MODIFIER, is used for variants (typically non-coding ones) whose impact is hard to predict.
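To make the impact categories concrete, here is a minimal sketch of how they can be read from an annotated VCF. It assumes the standard SNPeff ANN entry in the INFO column, where each annotation is a list of subfields separated by '|' (allele, effect, impact, gene name, ...) and several annotations for the same variant are separated by commas. The two INFO strings, the gene and transcript names, and the function names are made up for illustration and are not taken from the workshop data. Python is used only to keep the example self-contained; the workshop analysis itself is done in R.

```python
from collections import Counter

# Impact levels used by SNPeff, ordered from most to least severe.
IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]


def parse_ann(info):
    """Return (effect, impact, gene) tuples from the ANN entry of a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            records = []
            # One variant can carry several annotations, separated by commas.
            for ann in entry[len("ANN="):].split(","):
                fields = ann.split("|")
                # ANN subfields: allele | effect | impact | gene name | ...
                records.append((fields[1], fields[2], fields[3]))
            return records
    return []  # no ANN entry present


def worst_impact(info):
    """Return the most severe impact among a variant's annotations, or None."""
    impacts = {impact for _, impact, _ in parse_ann(info)}
    for level in IMPACT_ORDER:
        if level in impacts:
            return level
    return None


# Hypothetical INFO strings (illustrative only, not real workshop variants).
example_infos = [
    "AC=1;ANN=A|missense_variant|MODERATE|GENE1|GENE1|transcript|T1|protein_coding|2/5|c.10G>A|p.Val4Ile|10/900|10/600|4/199||,"
    "A|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|T2|protein_coding||c.-100G>A|||||100|",
    "AC=2;ANN=T|stop_gained|HIGH|GENE3|GENE3|transcript|T3|protein_coding|1/3|c.4C>T|p.Gln2*|4/300|4/300|2/99||",
]

# Count variants by their most severe predicted impact.
impact_counts = Counter(worst_impact(info) for info in example_infos)
print(impact_counts)
```

The first example variant has two annotations (a missense change in one gene and an upstream variant of another), so it is counted once under its most severe impact. Counting every annotation separately would be an equally reasonable choice and gives larger totals.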
SNPeff creates its own variant analysis reports (html reports) for each sample, which can be seen here:
H:/workshop/sarek/runs/run2_trio/results/reports/snpeff/haplotypecaller
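If you would rather list the report files than browse to them, the following sketch collects every HTML file under that folder. It assumes the reports may sit in per-sample subfolders, so it searches recursively; the helper name find_html_reports is made up for this example. Python is used only to keep the snippet self-contained.

```python
from pathlib import Path


def find_html_reports(report_dir):
    """Return a sorted list of HTML files found anywhere under report_dir."""
    return sorted(Path(report_dir).rglob("*.html"))


# Path from the workshop setup; adjust it if your runs were downloaded elsewhere.
report_dir = "H:/workshop/sarek/runs/run2_trio/results/reports/snpeff/haplotypecaller"

if Path(report_dir).is_dir():
    for report in find_html_reports(report_dir):
        print(report)
else:
    print(f"Folder not found: {report_dir}")
```

The same function can be pointed at the equivalent folder of the other two runs, assuming they follow the same results layout.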
We’ll be creating our own reports, with some additional functional information, using R.
R Markdown
We’ll be running
In RStudio
Go to:
H:/workshop/sarek/runs/run2_trio
Setup
G:\Other computers\Home PC\Desktop\QUT\Liver_exome_variants\sarek
...