For this exercise we will use the epi2me-labs wf-human-variation pipeline. Information on the pipeline is available at https://labs.epi2me.io/workflows/wf-human-variation/
The pipeline is designed to analyse human genomic data. Specifically, the workflow can perform the following:
diploid variant calling
structural variant calling
analysis of modified base calls
copy number variant calling
short tandem repeat (STR) expansion genotyping
Compute requirements
Recommended requirements:
CPUs = 32
Memory = 128GB
Minimum requirements:
CPUs = 16
Memory = 32GB
Approximate run time: variable, depending on whether the data come from targeted sequencing or whole genome sequencing, on the coverage, and on the individual analyses requested. For instance, a 90X human sample run (options: --snp --sv --mod --str --cnv --phased --sex male) takes less than 8h with the recommended resources.
NOTE: in contrast to the nf-core/sarek pipeline that we used in session 2, the epi2me-labs/wf-human-variation pipeline runs in ‘local’ mode, so all tasks run on the node where Nextflow is launched, and that node needs a large number of CPUs and a large amount of memory. The nf-core/sarek pipeline instead uses ‘pbspro’ mode, in which the pipeline submits individual jobs to the HPC cluster and defines the CPUs and memory for each task individually.
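Because the pipeline runs in ‘local’ mode, it can be useful to check that the node (or interactive job) you launch it from meets the minimum requirements listed above. The following is a minimal sketch, assuming a Linux node where `nproc` and `/proc/meminfo` are available; the function name `check_resources` is hypothetical and not part of the pipeline.

```shell
#!/usr/bin/env bash
# Minimum requirements quoted above for wf-human-variation.
MIN_CPUS=16
MIN_MEM_GB=32

# check_resources CPUS MEM_GB
# Prints "ok" if both values meet the minimum, otherwise "insufficient".
# (Hypothetical helper, for illustration only.)
check_resources() {
    if [ "$1" -ge "$MIN_CPUS" ] && [ "$2" -ge "$MIN_MEM_GB" ]; then
        echo "ok"
    else
        echo "insufficient"
    fi
}

# Detect what the current node offers (Linux only).
# MemTotal is reported in kB and is slightly below the installed RAM,
# so a node with exactly the minimum may be reported as just under it.
cpus=$(nproc)
mem_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1048576 + 0.5}' /proc/meminfo)

echo "Detected ${cpus} CPUs and ${mem_gb:-0} GB RAM: $(check_resources "$cpus" "${mem_gb:-0}")"
```

Inside a scheduler job the values seen by `nproc` and `/proc/meminfo` may reflect the whole node rather than the resources allocated to the job, so treat the result as a rough guide only.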
epi2me-labs/wf-human-variation
Like many Nextflow pipelines, this pipeline has a --help option that prints the parameters available to the user for analysing input data.
First we need to load Java:
Code Block |
---|
module load java |
Now run the following command to see the pipeline options:
Code Block |
---|
nextflow run epi2me-labs/wf-human-variation -profile singularity --help |
Code Block |
---|
N E X T F L O W ~ version 23.12.0-edge
Launching `https://github.com/epi2me-labs/wf-human-variation` [amazing_fourier] DSL2 - revision: 5651930a05 [master]
WARN: Config setting `prov.formats` is not defined, no provenance reports will be produced
|||||||||| _____ ____ ___ ____ __ __ _____ _ _
|||||||||| | ____| _ \_ _|___ \| \/ | ____| | | __ _| |__ ___
||||| | _| | |_) | | __) | |\/| | _| _____| |/ _` | '_ \/ __|
||||| | |___| __/| | / __/| | | | |__|_____| | (_| | |_) \__ \
|||||||||| |_____|_| |___|_____|_| |_|_____| |_|\__,_|_.__/|___/
|||||||||| wf-human-variation v2.2.0-g5651930
--------------------------------------------------------------------------------
Typical pipeline command:
nextflow run epi2me-labs/wf-human-variation \
--bam 'wf-human-variation-demo/demo.bam' \
--basecaller_cfg 'dna_r10.4.1_e8.2_400bps_hac_prom' \
--mod \
--ref 'wf-human-variation-demo/demo.fasta' \
--sample_name 'DEMO' \
--snp \
--sv
Workflow Options
--sv [boolean] Call for structural variants.
--snp [boolean] Call for small variants
--cnv [boolean] Call for copy number variants.
--str [boolean] Enable Straglr to genotype STR expansions.
--mod [boolean] Enable output of modified calls to a bedMethyl file [requires input BAM with Ml and Mm tags]
Main options
--sample_name [string] Sample name to be displayed in workflow outputs. [default: SAMPLE]
--bam [string] BAM or unaligned BAM (uBAM) files for the sample to use in the analysis.
--ref [string] Path to a reference FASTA file.
--basecaller_cfg [choice] Name of the model to use for selecting a small variant calling model. [default:
dna_r10.4.1_e8.2_400bps_sup@v4.1.0]
* dna_r10.4.1_e8.2_260bps_fast@v4.1.0
* dna_r10.4.1_e8.2_260bps_hac@v4.1.0
* dna_r10.4.1_e8.2_260bps_sup@v4.1.0
* dna_r10.4.1_e8.2_400bps_fast@v4.1.0
* dna_r10.4.1_e8.2_400bps_fast@v4.2.0
* dna_r10.4.1_e8.2_400bps_fast@v4.3.0
* dna_r10.4.1_e8.2_400bps_hac@v4.1.0
* dna_r10.4.1_e8.2_400bps_hac@v4.3.0
* dna_r10.4.1_e8.2_400bps_sup@v4.1.0
* dna_r10.4.1_e8.2_400bps_sup@v4.3.0
* dna_r9.4.1_e8_fast@v3.4
* dna_r9.4.1_e8_hac@v3.3
* dna_r9.4.1_e8_sup@v3.3
* dna_r9.4.1_e8_sup@v3.6
* custom
* dna_r10.4.1_e8.2_260bps_hac@v4.0.0
* dna_r10.4.1_e8.2_260bps_sup@v4.0.0
* dna_r10.4.1_e8.2_400bps_hac
* dna_r10.4.1_e8.2_400bps_hac@v3.5.2
* dna_r10.4.1_e8.2_400bps_hac@v4.0.0
* dna_r10.4.1_e8.2_400bps_hac@v4.2.0
* dna_r10.4.1_e8.2_400bps_hac_prom
* dna_r10.4.1_e8.2_400bps_sup@v3.5.2
* dna_r10.4.1_e8.2_400bps_sup@v4.0.0
* dna_r10.4.1_e8.2_400bps_sup@v4.2.0
* dna_r9.4.1_450bps_hac
* dna_r9.4.1_450bps_hac_prom
--bam_min_coverage [number] Minimum read coverage required to run analysis. [default: 20]
--bed [string] An optional BED file enumerating regions to process for variant calling.
--annotation [boolean] SnpEff annotation. [default: true]
--phased [boolean] Perform phasing.
--include_all_ctgs [boolean] Call for variants on all sequences in the reference, otherwise small and structural variants will only be called on
chr{1..22,X,Y,MT}.
--output_gene_summary [boolean] If set to true, the workflow will generate gene-level coverage summaries.
--out_dir [string] Directory for output of all workflow results. [default: output]
Structural variant calling options
--tr_bed [string] Input BED file containing tandem repeat annotations for the reference genome.
Structural variant benchmarking options
--sv_benchmark [boolean] Benchmark called structural variants.
Copy number variant calling options
--use_qdnaseq [boolean] Use QDNAseq for CNV calling.
--qdnaseq_bin_size [choice] Bin size for QDNAseq in kbp. [default: 500]
* 1
* 5
* 10
* 15
* 30
* 50
* 100
* 500
* 1000
Modified base calling options
--force_strand [boolean] Require modkit to call strand-aware modifications.
Short tandem repeat expansion genotyping options
--sex [choice] Sex (XX or XY) to be passed to Straglr-genotype.
* XY
* XX
Advanced Options
--depth_intervals [boolean] Output a bedGraph file with entries for each genomic interval featuring homogeneous depth.
--GVCF [boolean] Enable to output a gVCF file in addition to the VCF outputs (experimental).
--downsample_coverage [boolean] Downsample the coverage to along the genome.
--downsample_coverage_target [number] Average coverage or reads to use for the analyses. [default: 60]
Multiprocessing Options
--threads [integer] Set max number of threads to use for more intense processes (limited by config executor cpus) [default: 4]
--ubam_map_threads [integer] Set max number of threads to use for aligning reads from uBAM (limited by config executor cpus) [default: 8]
--ubam_sort_threads [integer] Set max number of threads to use for sorting and indexing aligned reads from uBAM (limited by config executor cpus)
[default: 3]
--ubam_bam2fq_threads [integer] Set max number of threads to use for uncompressing uBAM and generating FASTQ for alignment (limited by config executor
cpus) [default: 1]
--merge_threads [integer] Set max number of threads to use for merging alignment files (limited by config executor cpus) [default: 4]
--modkit_threads [integer] Total number of threads to use in modkit modified base calling (limited by config executor cpus) [default: 4]
Miscellaneous Options
--disable_ping [boolean] Enable to prevent sending a workflow ping.
Other parameters
--monochrome_logs [boolean] null
--validate_params [boolean] null [default: true]
--show_hidden_params [boolean] null
!! Hiding 28 params, use --show_hidden_params to show them !!
--------------------------------------------------------------------------------
If you use epi2me-labs/wf-human-variation for your analysis please cite:
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x |
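The options listed in the help text can be combined into a run command. The sketch below assembles one as a bash array and only prints it, so it can be inspected before anything is launched. The BAM and reference paths are the demo values from the "Typical pipeline command" in the help text, and the `--threads` value of 8 is an illustrative assumption (the default is 4); replace them with values suited to your data and node.

```shell
#!/usr/bin/env bash
# Demo inputs taken from the "Typical pipeline command" in the help text;
# replace these with your own BAM and reference FASTA.
bam='wf-human-variation-demo/demo.bam'
ref='wf-human-variation-demo/demo.fasta'

# Assemble the command as an array so each argument stays correctly quoted.
# Only flags that appear in the --help output above are used.
cmd=(
    nextflow run epi2me-labs/wf-human-variation
    -profile singularity
    --bam "$bam"
    --ref "$ref"
    --sample_name DEMO
    --snp
    --sv
    --threads 8
    --out_dir output
)

# Dry run: print the command instead of executing it.
printf '%s ' "${cmd[@]}"
echo

# To launch the pipeline, run the array itself:
# "${cmd[@]}"
```

Keeping the command in an array makes it straightforward to add or remove analyses (for example `--cnv` or `--str`) without retyping the whole line.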