Session 3 - Variant analysis using Nanopore data

Data

| Experiment Accession | Sample | FASTQ | Experiment Title | Organism Name | Instrument | Submitter | Study Accession | Study Title | Sample Accession | Total Size (Mb) | Total Spots | Total Bases | Library Strategy | Library Source | Library Selection |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SRX14748451 | S1 | SRR18645307 | | Homo sapiens | MinION | Drexel University | SRP367676 | Multiplex structural variant detection by whole-genome mapping and nanopore sequencing. | SRS12509856 | 821.1 | 348226 | 972620520 | OTHER | GENOMIC | other |
| SRX19406878 | S2 | SRR23513621 | NA12878 DNA sequencing from nanopore WSG consortium - basecalled sequences (Guppy 6.1.3 super accuracy) | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP421403 | Curated publicly available nanopore datasets | SRS16801715 | 78526.8 | 11173458 | 97545895593 | WGS | GENOMIC | RANDOM |
| ERX8211413 | S3 | ERR8578833 | MinION sequencing | Homo sapiens | MinION | the university of hong kong | ERP135493 | Target enrichment sequencing and variant calling on medical exome using ONT MinION | ERS10590135 | 8961.02 | 9636172 | 10382057986 | Targeted-Capture | GENOMIC | PCR |
| ERX8211414 | S4 | ERR8578834 | MinION sequencing | Homo sapiens | MinION | the university of hong kong | ERP135493 | Target enrichment sequencing and variant calling on medical exome using ONT MinION | ERS10590135 | 10669.72 | 10644000 | 12212807287 | Targeted-Capture | GENOMIC | PCR |
| SRX13322984 | S5 | SRR17138639 | Nanopore targeted sequencing (ReadUntil/ReadFish) of NA12878-HG001- basecalled sequences | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP349335 | Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing | SRS11230712 | 6629.97 | 5513156 | 7815960904 | WGS | GENOMIC | other |
| SRX13323057 | S6 | SRR17138566 | Nanopore targeted sequencing (ReadUntil/ReadFish) of NA12878-HG001- basecalled sequences | Homo sapiens | MinION | Garvan Institute of Medical Research | SRP349335 | Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing | SRS11230747 | 17107.98 | 12278391 | 20238395479 | WGS | GENOMIC | other |
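
Before mapping, the Total Spots and Total Bases values give a quick feel for each run: mean read length is bases divided by reads, and a rough genome-wide depth is bases divided by genome size. The sketch below uses a genome size of 3.1 Gb, which is an assumption and not a value from the table, and the helper name `summarise` is made up for illustration. The depth figure is only meaningful for whole-genome runs; targeted runs concentrate their reads in the target regions.

```shell
#!/usr/bin/env bash
# Mean read length and rough genome-wide depth per sample, computed from
# the "Total Spots" (reads) and "Total Bases" values in the Data table.
# GENOME_SIZE is an assumed approximate human genome size (3.1 Gb).
GENOME_SIZE=3100000000

# summarise LABEL TOTAL_SPOTS TOTAL_BASES   (hypothetical helper)
summarise() {
  awk -v s="$1" -v n="$2" -v b="$3" -v g="$GENOME_SIZE" \
    'BEGIN { printf "%s mean_read_len=%.0f approx_depth=%.2fx\n", s, b / n, b / g }'
}

summarise S1 348226 972620520
summarise S2 11173458 97545895593
summarise S3 9636172 10382057986
summarise S4 10644000 12212807287
summarise S5 5513156 7815960904
summarise S6 12278391 20238395479
```

For S2, the whole-genome sample, this works out to a mean read length of roughly 8.7 kb and a raw depth of about 31x under the assumed genome size.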

Mapping

Let’s run the wf-alignment pipeline with the --help option to list the available parameters:

module load java
nextflow run epi2me-labs/wf-alignment -profile singularity --help

N E X T F L O W  ~  version 23.12.0-edge
Launching `https://github.com/epi2me-labs/wf-alignment` [nostalgic_galileo] DSL2 - revision: e1fd7a51dc [master]
WARN: Config setting `prov.formats` is not defined, no provenance reports will be produced

||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-alignment v1.1.2-ge1fd7a5
--------------------------------------------------------------------------------
Typical pipeline command:

  nextflow run epi2me-labs/wf-alignment \ 
        --fastq 'wf-alignment-demo/fastq' \ 
        --references 'wf-alignment-demo/references'

Input Options
  --fastq                [string]  FASTQ files to use in the analysis.
  --bam                  [string]  BAM or unaligned BAM (uBAM) files to use in the analysis.
  --analyse_unclassified [boolean] Analyse unclassified reads from input directory. By default the workflow will not process reads in the unclassified 
                                   directory. 
  --references           [string]  Path to a directory containing FASTA reference files.
  --reference_mmi_file   [string]  Path to an MMI index file to be used as reference.
  --counts               [string]  Path to a CSV file containing expected counts as a control.

Sample Options
  --sample_sheet         [string]  A CSV file used to map barcodes to sample aliases. The sample sheet can be provided when the input data is a directory 
                                   containing sub-directories with FASTQ files. 
  --sample               [string]  A single sample name for non-multiplexed data. Permissible if passing a single .fastq(.gz) file or directory of .fastq(.gz) 
                                   files. 

Output Options
  --out_dir              [string]  Directory for output of all workflow results. [default: output]
  --prefix               [string]  Optional prefix attached to each of the output filenames.

Advanced options
  --depth_coverage       [boolean] Calculate depth coverage statistics and include them in the report. [default: true]
  --minimap_preset       [choice]  Pre-defined parameter sets for `minimap2`, covering most common use cases. [default: dna]
                                   * dna
                                   * rna
  --minimap_args         [string]  String of command line arguments to be passed on to `minimap2`.

Miscellaneous Options
  --threads              [integer] Number of CPU threads to use for the alignment step. [default: 4]
  --disable_ping         [boolean] Enable to prevent sending a workflow ping.

Other parameters
  --monochrome_logs      [boolean] null
  --validate_params      [boolean] null [default: true]
  --show_hidden_params   [boolean] null

!! Hiding 4 params, use --show_hidden_params to show them !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-alignment for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x
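
With the parameters listed above, a mapping run for a single sample could look like the following sketch. It only uses flags that appear in the help output (--fastq, --references, --sample, --out_dir, --threads). The file paths, the output directory and the helper names `run` and `align_sample` are hypothetical placeholders, and the thread count of 8 is an arbitrary choice. By default the script just prints the command so it can be checked first; setting RUN=1 launches it.

```shell
#!/usr/bin/env bash
# Sketch only: paths, sample name and helper names are hypothetical.

# Print the command by default; set RUN=1 in the environment to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s ' "$@"
    printf '\n'
  fi
}

# align_sample SAMPLE FASTQ REF_DIR OUT_DIR
align_sample() {
  run nextflow run epi2me-labs/wf-alignment \
    -profile singularity \
    --fastq "$2" \
    --references "$3" \
    --sample "$1" \
    --out_dir "$4" \
    --threads 8   # the help output lists 4 as the default
}

# Hypothetical locations for the S2 reads and a directory of reference FASTA files.
align_sample S2 data/SRR23513621.fastq.gz references results/S2_alignment
```

Keeping one --out_dir per sample avoids results from different samples overwriting each other, since the default output directory is simply `output`.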

Variant calling

nextflow run epi2me-labs/wf-human-variation -profile singularity --help

N E X T F L O W  ~  version 23.12.0-edge
Launching `https://github.com/epi2me-labs/wf-human-variation` [amazing_fourier] DSL2 - revision: 5651930a05 [master]
WARN: Config setting `prov.formats` is not defined, no provenance reports will be produced

||||||||||   _____ ____ ___ ____  __  __ _____      _       _
||||||||||  | ____|  _ \_ _|___ \|  \/  | ____|    | | __ _| |__  ___
|||||       |  _| | |_) | |  __) | |\/| |  _| _____| |/ _` | '_ \/ __|
|||||       | |___|  __/| | / __/| |  | | |__|_____| | (_| | |_) \__ \
||||||||||  |_____|_|  |___|_____|_|  |_|_____|    |_|\__,_|_.__/|___/
||||||||||  wf-human-variation v2.2.0-g5651930
--------------------------------------------------------------------------------
Typical pipeline command:

  nextflow run epi2me-labs/wf-human-variation \ 
        --bam 'wf-human-variation-demo/demo.bam' \ 
        --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_hac_prom' \ 
        --mod \ 
        --ref 'wf-human-variation-demo/demo.fasta' \ 
        --sample_name 'DEMO' \ 
        --snp \ 
        --sv

Workflow Options
  --sv                         [boolean] Call for structural variants.
  --snp                        [boolean] Call for small variants
  --cnv                        [boolean] Call for copy number variants.
  --str                        [boolean] Enable Straglr to genotype STR expansions.
  --mod                        [boolean] Enable output of modified calls to a bedMethyl file [requires input BAM with Ml and Mm tags]

Main options
  --sample_name                [string]  Sample name to be displayed in workflow outputs. [default: SAMPLE]
  --bam                        [string]  BAM or unaligned BAM (uBAM) files for the sample to use in the analysis.
  --ref                        [string]  Path to a reference FASTA file.
  --basecaller_cfg             [choice]  Name of the model to use for selecting a small variant calling model. [default: 
                                         dna_r10.4.1_e8.2_400bps_sup@v4.1.0] 
                                         * dna_r10.4.1_e8.2_260bps_fast@v4.1.0
                                         * dna_r10.4.1_e8.2_260bps_hac@v4.1.0
                                         * dna_r10.4.1_e8.2_260bps_sup@v4.1.0
                                         * dna_r10.4.1_e8.2_400bps_fast@v4.1.0
                                         * dna_r10.4.1_e8.2_400bps_fast@v4.2.0
                                         * dna_r10.4.1_e8.2_400bps_fast@v4.3.0
                                         * dna_r10.4.1_e8.2_400bps_hac@v4.1.0
                                         * dna_r10.4.1_e8.2_400bps_hac@v4.3.0
                                         * dna_r10.4.1_e8.2_400bps_sup@v4.1.0
                                         * dna_r10.4.1_e8.2_400bps_sup@v4.3.0
                                         * dna_r9.4.1_e8_fast@v3.4
                                         * dna_r9.4.1_e8_hac@v3.3
                                         * dna_r9.4.1_e8_sup@v3.3
                                         * dna_r9.4.1_e8_sup@v3.6
                                         * custom
                                         * dna_r10.4.1_e8.2_260bps_hac@v4.0.0
                                         * dna_r10.4.1_e8.2_260bps_sup@v4.0.0
                                         * dna_r10.4.1_e8.2_400bps_hac
                                         * dna_r10.4.1_e8.2_400bps_hac@v3.5.2
                                         * dna_r10.4.1_e8.2_400bps_hac@v4.0.0
                                         * dna_r10.4.1_e8.2_400bps_hac@v4.2.0
                                         * dna_r10.4.1_e8.2_400bps_hac_prom
                                         * dna_r10.4.1_e8.2_400bps_sup@v3.5.2
                                         * dna_r10.4.1_e8.2_400bps_sup@v4.0.0
                                         * dna_r10.4.1_e8.2_400bps_sup@v4.2.0
                                         * dna_r9.4.1_450bps_hac
                                         * dna_r9.4.1_450bps_hac_prom
  --bam_min_coverage           [number]  Minimum read coverage required to run analysis. [default: 20]
  --bed                        [string]  An optional BED file enumerating regions to process for variant calling.
  --annotation                 [boolean] SnpEff annotation. [default: true]
  --phased                     [boolean] Perform phasing.
  --include_all_ctgs           [boolean] Call for variants on all sequences in the reference, otherwise small and structural variants will only be called on 
                                         chr{1..22,X,Y,MT}. 
  --output_gene_summary        [boolean] If set to true, the workflow will generate gene-level coverage summaries.
  --out_dir                    [string]  Directory for output of all workflow results. [default: output]

Structural variant calling options
  --tr_bed                     [string]  Input BED file containing tandem repeat annotations for the reference genome.

Structural variant benchmarking options
  --sv_benchmark               [boolean] Benchmark called structural variants.

Copy number variant calling options
  --use_qdnaseq                [boolean] Use QDNAseq for CNV calling.
  --qdnaseq_bin_size           [choice]  Bin size for QDNAseq in kbp. [default: 500]
                                         * 1
                                         * 5
                                         * 10
                                         * 15
                                         * 30
                                         * 50
                                         * 100
                                         * 500
                                         * 1000

Modified base calling options
  --force_strand               [boolean] Require modkit to call strand-aware modifications.

Short tandem repeat expansion genotyping options
  --sex                        [choice]  Sex (XX or XY) to be passed to Straglr-genotype.
                                         * XY
                                         * XX

Advanced Options
  --depth_intervals            [boolean] Output a bedGraph file with entries for each genomic interval featuring homogeneous depth.
  --GVCF                       [boolean] Enable to output a gVCF file in addition to the VCF outputs (experimental).
  --downsample_coverage        [boolean] Downsample the coverage to along the genome.
  --downsample_coverage_target [number]  Average coverage or reads to use for the analyses. [default: 60]

Multiprocessing Options
  --threads                    [integer] Set max number of threads to use for more intense processes (limited by config executor cpus) [default: 4]
  --ubam_map_threads           [integer] Set max number of threads to use for aligning reads from uBAM (limited by config executor cpus) [default: 8]
  --ubam_sort_threads          [integer] Set max number of threads to use for sorting and indexing aligned reads from uBAM (limited by config executor cpus) 
                                         [default: 3] 
  --ubam_bam2fq_threads        [integer] Set max number of threads to use for uncompressing uBAM and generating FASTQ for alignment (limited by config executor 
                                         cpus) [default: 1] 
  --merge_threads              [integer] Set max number of threads to use for merging alignment files (limited by config executor cpus) [default: 4]
  --modkit_threads             [integer] Total number of threads to use in modkit modified base calling (limited by config executor cpus) [default: 4]

Miscellaneous Options
  --disable_ping               [boolean] Enable to prevent sending a workflow ping.

Other parameters
  --monochrome_logs            [boolean] null
  --validate_params            [boolean] null [default: true]
  --show_hidden_params         [boolean] null

!! Hiding 28 params, use --show_hidden_params to show them !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-human-variation for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x
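
Putting the options above together, a small-variant and structural-variant run on one aligned sample could be sketched as follows. Only flags from the help output are used (--bam, --ref, --sample_name, --basecaller_cfg, --snp, --sv, --out_dir). The BAM and FASTA paths and the helper names `run` and `call_variants` are hypothetical. The model `dna_r9.4.1_e8_sup@v3.3` is taken from the --basecaller_cfg list, but choosing it for S2 is an assumption (an R9.4.1 flow cell with super-accuracy basecalling); check the run metadata before relying on it. As written the script prints the command, and RUN=1 executes it.

```shell
#!/usr/bin/env bash
# Sketch only: file paths and helper names are hypothetical.

# Print the command by default; set RUN=1 in the environment to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s ' "$@"
    printf '\n'
  fi
}

# call_variants SAMPLE BAM REF_FASTA MODEL OUT_DIR
call_variants() {
  run nextflow run epi2me-labs/wf-human-variation \
    -profile singularity \
    --bam "$2" \
    --ref "$3" \
    --sample_name "$1" \
    --basecaller_cfg "$4" \
    --snp \
    --sv \
    --out_dir "$5"
}

# The model name is an assumption; pick the entry from the
# --basecaller_cfg list that matches how the reads were basecalled.
call_variants S2 results/S2.bam references/genome.fasta \
  'dna_r9.4.1_e8_sup@v3.3' results/S2_variants
```

For the targeted samples, the --bed option can restrict calling to the captured regions, and samples whose coverage falls below the --bam_min_coverage default of 20 may need that threshold lowered.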