For this exercise we will use the epi2me-labs/wf-human-variation pipeline. Find information on the pipeline at https://labs.epi2me.io/workflows/wf-human-variation/
...
Approximate run time: variable, depending on whether the data come from targeted or whole genome sequencing, on the coverage, and on the individual analyses requested. For instance, a 90X human sample run (options: --snp --sv --mod --str --cnv --phased --sex male) takes less than 8h with the recommended resources.
NOTE: in contrast to the nf-core/sarek pipeline that we used in session 2, the epi2me-labs/wf-human-variation pipeline runs in ‘local’ mode, so the whole run takes place inside a single job that needs a large number of CPUs and a large amount of RAM. The nf-core/sarek pipeline instead uses ‘pbspro’ mode, where the pipeline submits individual jobs to the HPC cluster and defines the CPUs and memory for each task individually.
...
Copy the script for the exercise:
Code Block |
---|
cp /work/training/ONTvariants/scripts/launch_ONTvariants_epi2me-labs_WF-HV.pbs . |
Print the content of the script:
...
Line 1: Defines the script as a bash script.
Lines 2-5: Start with “#”, so bash treats them as comments and ignores them. However, these PBS lines tell the scheduler (PBS Pro) the name of the job (line 2), the number of CPUs and amount of RAM to use (line 3), the time allowed to run the script (line 4), and to report any errors (line 5).
Line 7: Loads Java, which is required to run Nextflow pipelines.
Line 8: Assigns up to 4GB of memory to the initial Nextflow script.
Line 9: Tells the job to run in the current directory.
Lines 11-22: Parameters to run the epi2me-labs/wf-human-variation pipeline (refer above for details on each parameter)
...
Monitor the progress of the job:
Code Block |
---|
qjobs |
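While qjobs shows whether the job itself is running, it does not show how far the pipeline has progressed. Nextflow records each finished task in a tab-separated trace file, so one way to get a rough task-level summary is to count the entries in its status column. The sketch below is an illustration only: the function name trace_status_counts is hypothetical, and the default path assumes the run writes its trace to results/execution/trace.txt.

```shell
# Count tasks per status in a Nextflow trace file (tab-separated, with a header row).
# Hypothetical helper; the default path is an assumption about where the trace is written.
trace_status_counts() {
  local trace="${1:-results/execution/trace.txt}"
  awk -F'\t' '
    # Header row: find which column is named "status"
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
    # Data rows: tally the value in that column (skipped if no status column was found)
    col     { count[$col]++ }
    END     { for (s in count) print s "\t" count[s] }' "$trace" | sort
}
```

Running trace_status_counts with no argument from the run folder prints one line per status (for example COMPLETED or FAILED) followed by the number of tasks in that state.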
Once the pipeline has completed you will see the following set of output files in the ‘results’ folder:
Code Block |
---|
.
├── execution
│ ├── report.html
│ ├── timeline.html
│ └── trace.txt
├── jbrowse.json
├── OPTIONAL_FILE
├── SRR17138639.flagstat.tsv
├── SRR17138639.mosdepth.global.dist.txt
├── SRR17138639.mosdepth.summary.txt
├── SRR17138639.readstats.tsv.gz
├── SRR17138639.regions.bed.gz
├── SRR17138639.stats.json
├── SRR17138639.thresholds.bed.gz
├── SRR17138639.wf-human-alignment-report.html
├── SRR17138639.wf-human-snp-report.html
├── SRR17138639.wf-human-sv-report.html
├── SRR17138639.wf_snp_clinvar.vcf
├── SRR17138639.wf_snp.vcf.gz
├── SRR17138639.wf_snp.vcf.gz.tbi
├── SRR17138639.wf_sv.vcf.gz
└── SRR17138639.wf_sv.vcf.gz.tbi |
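In the listing above, each compressed VCF (.vcf.gz) is accompanied by a .tbi index. A quick sanity check after a run is to confirm that no index is missing. The following is a minimal sketch, not part of the pipeline: the function name missing_vcf_indexes is hypothetical, and the default folder name results is taken from the listing.

```shell
# Print every *.vcf.gz in a results folder that has no matching .tbi index.
# Hypothetical helper; prints nothing when all indexes are present.
missing_vcf_indexes() {
  local dir="${1:-results}"
  local vcf
  for vcf in "$dir"/*.vcf.gz; do
    [ -e "$vcf" ] || continue          # no matches: the glob pattern stays literal
    [ -e "$vcf.tbi" ] || echo "$vcf"   # report a VCF whose index is absent
  done
}
```

Calling missing_vcf_indexes results from the run folder should print nothing for a complete run.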
Let’s inspect the HTML reports wf-human-alignment-report.html, wf-human-snp-report.html and wf-human-sv-report.html.
NOTE: To proceed, you need to be on QUT’s WiFi network or signed in via VPN.
To browse the working folder on the HPC, type the following path into your file browser:
Windows PC
Code Block |
---|
\\hpc-fs\work\training\ONTvariants |
Mac
Code Block |
---|
smb://hpc-fs/work/training/ONTvariants |
Now browse to the runs → run3_variant_calling folder → results folder and open the HTML reports.
Code Block |
---|
├── SRR17138639.wf-human-alignment-report.html
├── SRR17138639.wf-human-snp-report.html
├── SRR17138639.wf-human-sv-report.html |
Notes:
Transition (Ti or Ts) vs transversion (Tv) mutations: a Whole Genome Sequencing (WGS) study typically finds a Ti/Tv ratio of about 2.1, while exome studies typically find a Ti/Tv ratio of about 2.8.
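As a rough cross-check, a Ti/Tv ratio can be recomputed directly from a SNP VCF. The sketch below is an illustration only and may not match the report exactly: the function name titv_ratio is hypothetical, it counts only records with a single-base REF and a single-base ALT (indels and multi-allelic sites are skipped), and it ignores the FILTER column.

```shell
# Compute a Ti/Tv ratio for single-base substitutions in an uncompressed VCF read from stdin.
# Hypothetical helper. Example use (path taken from the results listing):
#   gzip -dc results/SRR17138639.wf_snp.vcf.gz | titv_ratio
titv_ratio() {
  awk -F'\t' '
    /^#/ { next }                                   # skip header lines
    length($4) == 1 && length($5) == 1 {            # single-base REF and ALT only
      pair = toupper($4 $5)
      if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC")
        ti++                                        # purine<->purine or pyrimidine<->pyrimidine
      else if (pair ~ /^[ACGT][ACGT]$/ && substr(pair, 1, 1) != substr(pair, 2, 1))
        tv++                                        # any other base change
    }
    END {
      if (tv > 0) printf "Ti=%d Tv=%d Ti/Tv=%.2f\n", ti, tv, ti / tv
      else        printf "Ti=%d Tv=%d Ti/Tv=NA\n", ti, tv
    }'
}
```

The function prints the transition count, the transversion count and their ratio on one line, or NA when no transversions were counted.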
...
ClinVar = The pipeline reports mutations overlapping known clinical variants of interest (see: wf-human-snp-report.html).
Structural variants: The dataset used in this workshop does not contain real SVs; rather, the pipeline reports insertions or deletions in regions where there are “N” bases on chromosome 20. For example:
Code Block |
---|
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SRR17138639
chr20 71803 Sniffles2.DEL.10B2S0 TGAAAAGCTAAATTAAACTAATTAAGCTAAAG N 39.0 PASS PRECISE;SVTYPE=DEL;SVLEN=-32;END=7183
chr20 72147 Sniffles2.INS.7S0 N AAAAGTCAAAAAATAACAGACACTGGTATACAGAAGAAAAGGAACACTTATACAC 40.0 PASS PRECISE;SVTYPE=INS;SV
chr20 97733 Sniffles2.DEL.10B8S0 TAAGTCCCGCATGCATTAGCTATTTGTCTTAATGCTCTG N 39.0 PASS PRECISE;SVTYPE=DEL;SVLEN=-39;END=9777
chr20 105784 Sniffles2.DEL.10BBS0 TGTTGGGCCTGGAGAGAGTGGGACCACCTTTGCCATGGGACTGGGGTGCATCTGTTCTGCAGGCCCTCCTACCTGTAGCCCCTCCGAAGGCCCCTGCCTAG
chr20 161017 Sniffles2.DEL.10C3S0 ATCCATTTCTTCTAGATTTTCTAGTTTATTTGTGTAGAGGTGTTTGTAGTATTCTCTGATGGTAGTTTGTATTTCTGTGGGATCGGTGGTGATATCCCCTT
chr20 173685 Sniffles2.INS.28S0 N GGCAAGTACGGCACTGGGGGGCAGAACCCCCAACTTCTCCATGTCTCTACCCCTTCTCTTTTCTTGGGGAGACTGGCTTTTCCCAACCCCTTC
chr20 173712 Sniffles2.INS.27S0 N TGTCTCTGCCCCTTCTCCACTTTTCTGGGGGCGAGAACCCCCAACCCCTTCTCCTTCACCCTTAGTGGCAATTACCGCTTTTCTGAGGGGCAA
chr20 174783 Sniffles2.INS.2BS0 N GGAGCTTGCTACAAGCGCCAGAAATCTGGCCACCAGGCCAAGAATGTCCGCAGCCTGGGATTCCTCCTAAGCCGCGTCCCATCTGTGAAGGAC
chr20 175777 Sniffles2.INS.2DS0 N GGATACTTTTTGACTTCGAAACCTGGTTTTGCCATCCTAATAAAACCATTATATAAACTCACAAAAAGGAAACCTAGCTGACCCCATAGATCC
chr20 176457 Sniffles2.INS.30S0 N AATTGACTTTACTCACATGCCCCGGATCAGAAAACTAAAATACCTCTTAGTCTAGGTAGACACTTTCACTGGATAGGTAGGGCCTTTCCCACA
chr20 176457 Sniffles2.INS.2FS0 N TGAGATGCTACAGGAGTGGTCCATTTGAACTTTTATATGGACACTTTCTTGCTTGGCCCCAACCTCATCCCAGACACCAGCCCTCTAGGTGAC
chr20 177062 Sniffles2.DEL.10CAS0 GCCCAACTACACACATCACTGAAACAATAGGAGCCTTCCAGCTACATATTACAGACAAGCCCTCTATCAATACTGGCAAACTTAAAAACATTAGCTGTAAT
chr20 178476 Sniffles2.DEL.10CDS0 GAAGTAACTGAAGAATCACCAAAGAAGTGAAAGTGGCCT N 59.0 PASS PRECISE;SVTYPE=DEL;SVLEN=-39;END=1785
chr20 178495 Sniffles2.INS.37S0 N AAAAGAATGAATATGCCCTGCCCCACCTTAACTGATGACATTCCACCACAAAGAAGTGTAAATGGCCGGTCATGCACCTTAACTGATGACATT
chr20 183223 Sniffles2.INS.38S0 N ATCAAAAAGCCATTCAAATGGATTCACAGCTGAATTCTACCAGATGTATAAAGAACTGATACCAACTTATTGAAACTATTCCAAAATACGGAG
chr20 184725 Sniffles2.INS.3DS0 N AAAGCATTGAGATGTTTATGTGTATGCATATCCAAAAAGCACAGCATAATCCTTTACATTGTCTATGATGCCAAGACCTTTGTTCACGTGTTT
chr20 185082 Sniffles2.INS.3BS0 N AAGGAAGAAAACCAGGCTGGGCACAACGGCTCATGCCTCAAATCTCAATACATTGGCAAGCCAAGTAGAGGATCATTTGTTTCTCAGTTGTTC
chr20 190692 Sniffles2.DUP.2114S0 N <DUP> 53.0 PASS PRECISE;SVTYPE=DUP;SVLEN=25400936;END=25591628;SUPPORT=8;RNAMES=SRR17 |
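The records above carry the variant class in the SVTYPE key of the INFO column (DEL, INS, DUP). To get an overview of how many calls of each class the SV VCF contains, the INFO field can be split and tallied. This is a minimal sketch under stated assumptions: the function name count_svtypes is hypothetical, and INFO is assumed to be the eighth tab-separated column, as in the VCF header line shown above.

```shell
# Tally structural variants by SVTYPE from an uncompressed VCF read from stdin.
# Hypothetical helper. Example use (path taken from the results listing):
#   gzip -dc results/SRR17138639.wf_sv.vcf.gz | count_svtypes
count_svtypes() {
  awk -F'\t' '
    /^#/ { next }                                   # skip header lines
    {
      n = split($8, info, ";")                      # INFO entries are separated by ";"
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^SVTYPE=/) {
          sub(/^SVTYPE=/, "", info[i])              # keep only the value, e.g. DEL
          count[info[i]]++
        }
    }
    END { for (t in count) print t "\t" count[t] }' | sort
}
```

The output has one line per SV class with its count, sorted by class name.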