For this exercise we will use the epi2me-labs/wf-human-variation pipeline. Find information on the pipeline at https://labs.epi2me.io/workflows/wf-human-variation/

...

Approximate run time: variable, depending on whether the data come from targeted or whole genome sequencing, on the coverage, and on the individual analyses requested. For instance, a 90X human sample run (options: --snp --sv --mod --str --cnv --phased --sex male) takes less than 8 h with the recommended resources.

NOTE: in contrast to the nf-core/sarek pipeline that we used in session 2, the epi2me-labs/wf-human-variation pipeline runs in ‘local’ mode, so the whole pipeline runs inside a single job that needs a large number of CPUs and a large amount of RAM. The nf-core/sarek pipeline uses ‘pbspro’ mode instead, where the pipeline submits individual jobs to the HPC cluster and defines the CPUs and memory for each task individually.
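As a rough illustration of the two modes, the executor is a setting that Nextflow reads from its configuration. The sketch below writes two small config files; the file names, and the idea of keeping the setting in a separate file, are assumptions made for illustration and are not the configuration used in this course.

```shell
# Hypothetical config for 'local' mode: every task runs inside the one PBS job,
# sharing the CPUs and memory requested for that job.
cat > executor_local.config <<'EOF'
process {
    executor = 'local'
}
EOF

# Hypothetical config for 'pbspro' mode: Nextflow submits each task to the
# scheduler as its own job, with its own CPU and memory request.
cat > executor_pbspro.config <<'EOF'
process {
    executor = 'pbspro'
}
EOF

# A config file like these can be passed to Nextflow with the -c option, e.g.
#   nextflow run <pipeline> -c executor_pbspro.config
```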

...

Copy the script for the exercise:

Code Block
cp /work/training/ONTvariants/scripts/launch_ONTvariants_epi2me-labs_wfWF-human-variationHV.pbs .

Print the content of the script:

...

  • Line 1: Declares the script as a bash script.

  • Lines 2-5: Start with “#”, so bash treats them as comments and ignores them. However, these PBS lines tell the scheduler (PBS Pro) the name of the job (line 2), the number of CPUs and amount of RAM to use (line 3), the time allowed to run the script (line 4), and to report if there are any errors (line 5).

  • Line 7: Loads Java, which is required to run Nextflow pipelines.

  • Line 8: Assigns up to 4 GB of memory for the initial Nextflow script to use.

  • Line 9: Tells the job to run in the current directory.

  • Lines 11-22: Parameters to run the epi2me-labs/wf-human-variation pipeline (refer above for details on each parameter).
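The layout described above can be pictured with a minimal skeleton. This is a sketch only, not the course script: the job name, CPU, memory and walltime values are placeholders, the `-m abe` mail option is an assumption about what line 5 does, and the guards around `module` and `PBS_O_WORKDIR` are added here so the sketch also runs outside the HPC.

```shell
#!/bin/bash
#PBS -N example_job
#PBS -l select=1:ncpus=8:mem=32gb
#PBS -l walltime=08:00:00
#PBS -m abe

# Load Java if the 'module' command exists (it does on the HPC, it may not elsewhere)
if type module >/dev/null 2>&1; then module load java; fi
# Limit the Java process that launches Nextflow to 4 GB of memory
export NXF_OPTS='-Xms1g -Xmx4g'
# Move to the directory the job was submitted from (falls back to the current one)
cd "${PBS_O_WORKDIR:-$PWD}"

# The 'nextflow run epi2me-labs/wf-human-variation ...' command and its
# parameters would follow here.
```

On the cluster, PBS sets `PBS_O_WORKDIR` to the directory from which `qsub` was called, which is why the real script can `cd` into it directly.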

...

Monitor the progress of the job:

Code Block
qjobs
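If `qjobs` is not available, a similar check can be sketched with the standard PBS `qstat` command and the `.nextflow.log` file that Nextflow writes in the launch directory. The function name `show_progress` is hypothetical, and the 20-line tail is an arbitrary choice.

```shell
# Show the user's jobs and the most recent Nextflow log lines, when available
show_progress() {
    # List queued and running jobs for the current user
    if command -v qstat >/dev/null 2>&1; then
        qstat -u "${USER:-$(id -un)}"
    fi
    # Print the last lines of the Nextflow log, if the pipeline has started
    if [ -f .nextflow.log ]; then
        tail -n 20 .nextflow.log
    fi
}

# Usage, from the folder where the job was submitted:
#   show_progress
```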

Once the pipeline has completed you will see the following set of output files in the ‘results’ folder:

Code Block
.
├── execution
│   ├── report.html
│   ├── timeline.html
│   └── trace.txt
├── jbrowse.json
├── OPTIONAL_FILE
├── SRR17138639.flagstat.tsv
├── SRR17138639.mosdepth.global.dist.txt
├── SRR17138639.mosdepth.summary.txt
├── SRR17138639.readstats.tsv.gz
├── SRR17138639.regions.bed.gz
├── SRR17138639.stats.json
├── SRR17138639.thresholds.bed.gz
├── SRR17138639.wf-human-alignment-report.html
├── SRR17138639.wf-human-snp-report.html
├── SRR17138639.wf-human-sv-report.html
├── SRR17138639.wf_snp_clinvar.vcf
├── SRR17138639.wf_snp.vcf.gz
├── SRR17138639.wf_snp.vcf.gz.tbi
├── SRR17138639.wf_sv.vcf.gz
└── SRR17138639.wf_sv.vcf.gz.tbi
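Before opening the reports, the compressed VCF files can be checked quickly from the command line. The helper below is a minimal sketch (the function name `count_variants` is hypothetical) that counts the non-header lines, that is, the variant records, in a compressed VCF.

```shell
# Count the variant records (lines not starting with '#') in a gzip/bgzip VCF
count_variants() {
    gzip -dc "$1" | grep -vc '^#'
}

# Usage, from inside the results folder:
#   count_variants SRR17138639.wf_snp.vcf.gz
#   count_variants SRR17138639.wf_sv.vcf.gz
```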

Let’s inspect the HTML reports wf-human-alignment-report.html, wf-human-snp-report.html and wf-human-sv-report.html.

NOTE: To proceed, you need to be on QUT’s WiFi network or signed in via VPN.

To browse the working folder on the HPC, type the following path into your file browser:

Windows PC

Code Block
\\hpc-fs\work\training\ONTvariants

Mac

Code Block
smb://hpc-fs/work/training/ONTvariants

Now browse to the runs run3_variant_calling folder → results folder and open the HTML reports.

Code Block
├── SRR17138639.wf-human-alignment-report.html
├── SRR17138639.wf-human-snp-report.html
├── SRR17138639.wf-human-sv-report.html

Notes:

  • Transitions (Ti or Ts) vs transversions (Tv): a whole genome sequencing (WGS) study typically finds a Ti/Tv ratio of about 2.1, while exome studies typically find a Ti/Tv ratio of about 2.8.

...

  • ClinVar: the pipeline reports variants that overlap known clinical variants of interest (see: wf-human-snp-report.html).

  • Structural variants: the dataset used in this workshop does not contain real SVs. Instead, the pipeline reports insertions or deletions in regions of chromosome 20 that contain “N” bases. For example:

Code Block
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SRR17138639
chr20   71803   Sniffles2.DEL.10B2S0    TGAAAAGCTAAATTAAACTAATTAAGCTAAAG        N       39.0    PASS    PRECISE;SVTYPE=DEL;SVLEN=-32;END=7183
chr20   72147   Sniffles2.INS.7S0       N       AAAAGTCAAAAAATAACAGACACTGGTATACAGAAGAAAAGGAACACTTATACAC 40.0    PASS    PRECISE;SVTYPE=INS;SV
chr20   97733   Sniffles2.DEL.10B8S0    TAAGTCCCGCATGCATTAGCTATTTGTCTTAATGCTCTG N       39.0    PASS    PRECISE;SVTYPE=DEL;SVLEN=-39;END=9777
chr20   105784  Sniffles2.DEL.10BBS0    TGTTGGGCCTGGAGAGAGTGGGACCACCTTTGCCATGGGACTGGGGTGCATCTGTTCTGCAGGCCCTCCTACCTGTAGCCCCTCCGAAGGCCCCTGCCTAG
chr20   161017  Sniffles2.DEL.10C3S0    ATCCATTTCTTCTAGATTTTCTAGTTTATTTGTGTAGAGGTGTTTGTAGTATTCTCTGATGGTAGTTTGTATTTCTGTGGGATCGGTGGTGATATCCCCTT
chr20   173685  Sniffles2.INS.28S0      N       GGCAAGTACGGCACTGGGGGGCAGAACCCCCAACTTCTCCATGTCTCTACCCCTTCTCTTTTCTTGGGGAGACTGGCTTTTCCCAACCCCTTC
chr20   173712  Sniffles2.INS.27S0      N       TGTCTCTGCCCCTTCTCCACTTTTCTGGGGGCGAGAACCCCCAACCCCTTCTCCTTCACCCTTAGTGGCAATTACCGCTTTTCTGAGGGGCAA
chr20   174783  Sniffles2.INS.2BS0      N       GGAGCTTGCTACAAGCGCCAGAAATCTGGCCACCAGGCCAAGAATGTCCGCAGCCTGGGATTCCTCCTAAGCCGCGTCCCATCTGTGAAGGAC
chr20   175777  Sniffles2.INS.2DS0      N       GGATACTTTTTGACTTCGAAACCTGGTTTTGCCATCCTAATAAAACCATTATATAAACTCACAAAAAGGAAACCTAGCTGACCCCATAGATCC
chr20   176457  Sniffles2.INS.30S0      N       AATTGACTTTACTCACATGCCCCGGATCAGAAAACTAAAATACCTCTTAGTCTAGGTAGACACTTTCACTGGATAGGTAGGGCCTTTCCCACA
chr20   176457  Sniffles2.INS.2FS0      N       TGAGATGCTACAGGAGTGGTCCATTTGAACTTTTATATGGACACTTTCTTGCTTGGCCCCAACCTCATCCCAGACACCAGCCCTCTAGGTGAC
chr20   177062  Sniffles2.DEL.10CAS0    GCCCAACTACACACATCACTGAAACAATAGGAGCCTTCCAGCTACATATTACAGACAAGCCCTCTATCAATACTGGCAAACTTAAAAACATTAGCTGTAAT
chr20   178476  Sniffles2.DEL.10CDS0    GAAGTAACTGAAGAATCACCAAAGAAGTGAAAGTGGCCT N       59.0    PASS    PRECISE;SVTYPE=DEL;SVLEN=-39;END=1785
chr20   178495  Sniffles2.INS.37S0      N       AAAAGAATGAATATGCCCTGCCCCACCTTAACTGATGACATTCCACCACAAAGAAGTGTAAATGGCCGGTCATGCACCTTAACTGATGACATT
chr20   183223  Sniffles2.INS.38S0      N       ATCAAAAAGCCATTCAAATGGATTCACAGCTGAATTCTACCAGATGTATAAAGAACTGATACCAACTTATTGAAACTATTCCAAAATACGGAG
chr20   184725  Sniffles2.INS.3DS0      N       AAAGCATTGAGATGTTTATGTGTATGCATATCCAAAAAGCACAGCATAATCCTTTACATTGTCTATGATGCCAAGACCTTTGTTCACGTGTTT
chr20   185082  Sniffles2.INS.3BS0      N       AAGGAAGAAAACCAGGCTGGGCACAACGGCTCATGCCTCAAATCTCAATACATTGGCAAGCCAAGTAGAGGATCATTTGTTTCTCAGTTGTTC
chr20   190692  Sniffles2.DUP.2114S0    N       <DUP>   53.0    PASS    PRECISE;SVTYPE=DUP;SVLEN=25400936;END=25591628;SUPPORT=8;RNAMES=SRR17
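The notes above on the Ti/Tv ratio and on structural variant types can also be checked directly from the VCF files. The two helpers below are a sketch under simple assumptions: the function names are hypothetical, only single-base substitutions with a single ALT allele are counted for Ti/Tv, and SV types are read from the SVTYPE field of the INFO column. The values may differ slightly from those in the HTML reports, which can apply their own filters.

```shell
# Ti/Tv ratio from single-base substitutions in a gzip/bgzip VCF
titv_ratio() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                                  # skip header lines
        length($4) == 1 && length($5) == 1 {           # REF and ALT are one base each
            pair = toupper($4 $5)
            if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC")
                ti++                                   # purine<->purine or pyrimidine<->pyrimidine
            else if (pair ~ /^[ACGT][ACGT]$/ && substr(pair, 1, 1) != substr(pair, 2, 1))
                tv++                                   # any other base change
        }
        END { if (tv > 0) printf "%.2f\n", ti / tv; else print "NA" }'
}

# Number of records per SVTYPE (DEL, INS, DUP, ...) in a gzip/bgzip VCF
sv_type_counts() {
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }
        match($8, /SVTYPE=[^;]+/) {
            type = substr($8, RSTART + 7, RLENGTH - 7) # text after "SVTYPE="
            count[type]++
        }
        END { for (t in count) print t "\t" count[t] }' | sort
}

# Usage, from inside the results folder:
#   titv_ratio SRR17138639.wf_snp.vcf.gz
#   sv_type_counts SRR17138639.wf_sv.vcf.gz
```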