For this exercise we will use the epi2me-labs/wf-human-variation pipeline. Find information on the pipeline at https://labs.epi2me.io/workflows/wf-human-variation/
...
Approximate run time: variable, depending on whether the data come from targeted sequencing or whole genome sequencing, as well as on coverage and the individual analyses requested. For instance, a 90X human sample run (options: --snp --sv --mod --str --cnv --phased --sex male) takes less than 8 h with the recommended resources.
NOTE: unlike the nf-core/sarek pipeline that we used in session 2, the epi2me-labs/wf-human-variation pipeline runs in ‘local’ mode, which needs a large number of CPUs and a large amount of RAM. The nf-core/sarek pipeline instead uses ‘pbspro’ mode, where the pipeline submits individual jobs to the HPC cluster and defines the CPUs and memory for each task separately.
...
Copy the script for the exercise:
Code Block |
---|
cp /work/training/ONTvariants/scripts/launch_ONTvariants_epi2me-labs_WF-HV.pbs . |
Print the content of the script:
...
Once the pipeline has completed you will see the following set of output files in the ‘results’ folder:
Code Block |
---|
.
├── execution
│   ├── report.html
│   ├── timeline.html
│   └── trace.txt
├── jbrowse.json
├── OPTIONAL_FILE
├── SRR17138639.flagstat.tsv
├── SRR17138639.mosdepth.global.dist.txt
├── SRR17138639.mosdepth.summary.txt
├── SRR17138639.readstats.tsv.gz
├── SRR17138639.regions.bed.gz
├── SRR17138639.stats.json
├── SRR17138639.thresholds.bed.gz
├── SRR17138639.wf-human-alignment-report.html
├── SRR17138639.wf-human-snp-report.html
├── SRR17138639.wf-human-sv-report.html
├── SRR17138639.wf_snp_clinvar.vcf
├── SRR17138639.wf_snp.vcf.gz
├── SRR17138639.wf_snp.vcf.gz.tbi
├── SRR17138639.wf_sv.vcf.gz
└── SRR17138639.wf_sv.vcf.gz.tbi |
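Before opening the reports, it can help to confirm that the main outputs were actually written. The following is a minimal sketch, assuming the results folder is called `results` and the sample prefix is `SRR17138639`; the helper name `missing_outputs` and the choice of which files to check are illustrative assumptions, not part of the pipeline.

```python
from pathlib import Path

# File suffixes taken from the listing above; which ones to check is an
# assumption made for this sketch (reports plus the SNP and SV VCFs).
EXPECTED_SUFFIXES = [
    "wf-human-alignment-report.html",
    "wf-human-snp-report.html",
    "wf-human-sv-report.html",
    "wf_snp.vcf.gz",
    "wf_snp.vcf.gz.tbi",
    "wf_sv.vcf.gz",
    "wf_sv.vcf.gz.tbi",
]


def missing_outputs(results_dir, sample):
    """Return the expected '<sample>.<suffix>' files that are absent from results_dir."""
    results = Path(results_dir)
    expected = [f"{sample}.{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [name for name in expected if not (results / name).is_file()]


if __name__ == "__main__":
    # An empty list means every checked file is present.
    print(missing_outputs("results", "SRR17138639"))
```

Run it from the folder that contains `results`; any file names it prints are the ones to look for in the pipeline log.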
Let’s inspect the HTML reports wf-human-alignment-report.html, wf-human-snp-report.html and wf-human-sv-report.html.
NOTE: To proceed, you need to be on QUT’s WiFi network or connected via VPN.
To browse the working folder on the HPC, type the following in the file finder:
...
Now browse to the runs → run3_variant_calling folder → results folder and open the HTML reports.
Code Block |
---|
├── SRR17138639.wf-human-alignment-report.html
├── SRR17138639.wf-human-snp-report.html
├── SRR17138639.wf-human-sv-report.html |
Notes:
Transitions (Ti or Ts) vs transversions (Tv): a Whole Genome Sequencing (WGS) study typically finds a Ti/Tv ratio of about 2.1, while exome studies typically detect a Ti/Tv ratio of about 2.8.
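To make the Ti/Tv ratio concrete, here is a minimal sketch of how it can be computed from a list of single-nucleotide changes. A transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); every other substitution is a transversion. The function names and the toy variants below are illustrative assumptions, not output from the workshop data or the pipeline's own method.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_snv(ref, alt):
    """Return 'Ti' for a transition or 'Tv' for a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    # Both bases in the same chemical class means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "Ti" if same_class else "Tv"


def titv_ratio(snvs):
    """Compute transitions / transversions for an iterable of (ref, alt) pairs."""
    counts = Counter(classify_snv(ref, alt) for ref, alt in snvs)
    if counts["Tv"] == 0:
        raise ValueError("no transversions, ratio is undefined")
    return counts["Ti"] / counts["Tv"]


# Toy input (made up for illustration): 4 transitions and 2 transversions.
toy_snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(titv_ratio(toy_snvs))  # 4 / 2 = 2.0
```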
...
ClinVar: the pipeline reports mutations that overlap known clinical variants of interest (see: wf-human-snp-report.html).
Structural variants: the dataset used in this workshop does not contain real SVs; instead, the pipeline reports insertions or deletions in regions where there are “N” bases on chromosome 20. For example:
Code Block |
---|
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SRR17138639
chr20 71803 Sniffles2.DEL.10B2S0 TGAAAAGCTAAATTAAACTAATTAAGCTAAAG N 39.0 PASS PRECISE;SVTYPE=DEL;SVLEN=-32;END=7183
chr20 72147 Sniffles2.INS.7S0 N AAAAGTCAAAAAATAACAGACACTGGTATACAGAAGAAAAGGAACACTTATACAC 40.0 PASS PRECISE;SVTYPE=INS;SV
chr20 97733 Sniffles2.DEL.10B8S0 TAAGTCCCGCATGCATTAGCTATTTGTCTTAATGCTCTG N 39.0 PASS PRECISE;SVTYPE=DEL;SVLEN=-39;END=9777
chr20 105784 Sniffles2.DEL.10BBS0 TGTTGGGCCTGGAGAGAGTGGGACCACCTTTGCCATGGGACTGGGGTGCATCTGTTCTGCAGGCCCTCCTACCTGTAGCCCCTCCGAAGGCCCCTGCCTAG
chr20 161017 Sniffles2.DEL.10C3S0 ATCCATTTCTTCTAGATTTTCTAGTTTATTTGTGTAGAGGTGTTTGTAGTATTCTCTGATGGTAGTTTGTATTTCTGTGGGATCGGTGGTGATATCCCCTT
chr20 173685 Sniffles2.INS.28S0 N GGCAAGTACGGCACTGGGGGGCAGAACCCCCAACTTCTCCATGTCTCTACCCCTTCTCTTTTCTTGGGGAGACTGGCTTTTCCCAACCCCTTC
chr20 173712 Sniffles2.INS.27S0 N TGTCTCTGCCCCTTCTCCACTTTTCTGGGGGCGAGAACCCCCAACCCCTTCTCCTTCACCCTTAGTGGCAATTACCGCTTTTCTGAGGGGCAA
chr20 174783 Sniffles2.INS.2BS0 N GGAGCTTGCTACAAGCGCCAGAAATCTGGCCACCAGGCCAAGAATGTCCGCAGCCTGGGATTCCTCCTAAGCCGCGTCCCATCTGTGAAGGAC
chr20 175777 Sniffles2.INS.2DS0 N GGATACTTTTTGACTTCGAAACCTGGTTTTGCCATCCTAATAAAACCATTATATAAACTCACAAAAAGGAAACCTAGCTGACCCCATAGATCC
chr20 176457 Sniffles2.INS.30S0 N AATTGACTTTACTCACATGCCCCGGATCAGAAAACTAAAATACCTCTTAGTCTAGGTAGACACTTTCACTGGATAGGTAGGGCCTTTCCCACA
chr20 176457 Sniffles2.INS.2FS0 N TGAGATGCTACAGGAGTGGTCCATTTGAACTTTTATATGGACACTTTCTTGCTTGGCCCCAACCTCATCCCAGACACCAGCCCTCTAGGTGAC
chr20 177062 Sniffles2.DEL.10CAS0 GCCCAACTACACACATCACTGAAACAATAGGAGCCTTCCAGCTACATATTACAGACAAGCCCTCTATCAATACTGGCAAACTTAAAAACATTAGCTGTAAT
chr20 178476 Sniffles2.DEL.10CDS0 GAAGTAACTGAAGAATCACCAAAGAAGTGAAAGTGGCCT N 59.0 PASS PRECISE;SVTYPE=DEL;SVLEN=-39;END=1785
chr20 178495 Sniffles2.INS.37S0 N AAAAGAATGAATATGCCCTGCCCCACCTTAACTGATGACATTCCACCACAAAGAAGTGTAAATGGCCGGTCATGCACCTTAACTGATGACATT
chr20 183223 Sniffles2.INS.38S0 N ATCAAAAAGCCATTCAAATGGATTCACAGCTGAATTCTACCAGATGTATAAAGAACTGATACCAACTTATTGAAACTATTCCAAAATACGGAG
chr20 184725 Sniffles2.INS.3DS0 N AAAGCATTGAGATGTTTATGTGTATGCATATCCAAAAAGCACAGCATAATCCTTTACATTGTCTATGATGCCAAGACCTTTGTTCACGTGTTT
chr20 185082 Sniffles2.INS.3BS0 N AAGGAAGAAAACCAGGCTGGGCACAACGGCTCATGCCTCAAATCTCAATACATTGGCAAGCCAAGTAGAGGATCATTTGTTTCTCAGTTGTTC
chr20 190692 Sniffles2.DUP.2114S0 N <DUP> 53.0 PASS PRECISE;SVTYPE=DUP;SVLEN=25400936;END=25591628;SUPPORT=8;RNAMES=SRR17 |
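The SV type of each record is stored in the INFO column as `SVTYPE=...`, so the calls can be summarised by counting those values. The sketch below assumes tab-separated VCF records with INFO in the eighth column; the helper names and the shortened toy records are made up for illustration and are not pipeline output.

```python
from collections import Counter


def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def count_sv_types(vcf_lines):
    """Count SVTYPE values over VCF records, skipping headers and short lines."""
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 8:  # no INFO column on this line
            continue
        svtype = parse_info(columns[7]).get("SVTYPE")
        if svtype:
            counts[svtype] += 1
    return counts


# Toy records modelled on the excerpt above (sequences shortened, values invented).
toy_records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr20\t100\ttoy.DEL.1\tTGA\tN\t39.0\tPASS\tPRECISE;SVTYPE=DEL;SVLEN=-2",
    "chr20\t200\ttoy.INS.1\tN\tAAG\t40.0\tPASS\tPRECISE;SVTYPE=INS;SVLEN=3",
    "chr20\t300\ttoy.DEL.2\tTAAG\tN\t39.0\tPASS\tPRECISE;SVTYPE=DEL;SVLEN=-3",
]
print(count_sv_types(toy_records))
```

For the compressed workshop file, the same function should accept a file handle opened in text mode, for example `gzip.open("results/SRR17138639.wf_sv.vcf.gz", "rt")` (path assumed from the results listing).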