For this exercise we will use the epi2me-labs/wf-human-variation pipeline. Find information on the pipeline at https://labs.epi2me.io/workflows/wf-human-variation/
The pipeline is designed to analyse human genomic data. Specifically the workflow can perform the following:
diploid variant calling
structural variant calling
analysis of modified base calls
copy number variant calling
short tandem repeat (STR) expansion genotyping
Compute requirements
Recommended requirements:
CPUs = 32
Memory = 128GB
Minimum requirements:
CPUs = 16
Memory = 32GB
Approximate run time: Variable depending on whether it is targeted sequencing or whole genome sequencing, as well as coverage and the individual analyses requested. For instance, a 90X human sample run (options: --snp --sv --mod --str --cnv --phased --sex male
) takes less than 8h with recommended resources.
NOTE: in contrast to the nf-core/sarek pipeline that we used in session 2, the epi2me-labs/wf-human-variation pipeline runs in ‘local’ mode (needs large amount of CPUs and RAM memory), while the nf-core/sarek pipeline will use a ‘pbspro’ mode, where the pipeline will submit individual jobs to the HPC cluster and define the CPUs and memory for each task individually.
epi2me-labs/wf-human-variation
Nextflow pipelines have the --help option to print the parameter options that are available to the user to analyse input data.
First we need to load java:
module load java
Now run the following command to see the pipeline options:
nextflow run epi2me-labs/wf-human-variation -profile singularity --help
N E X T F L O W ~ version 23.12.0-edge Launching `https://github.com/epi2me-labs/wf-human-variation` [amazing_fourier] DSL2 - revision: 5651930a05 [master] WARN: Config setting `prov.formats` is not defined, no provenance reports will be produced |||||||||| _____ ____ ___ ____ __ __ _____ _ _ |||||||||| | ____| _ \_ _|___ \| \/ | ____| | | __ _| |__ ___ ||||| | _| | |_) | | __) | |\/| | _| _____| |/ _` | '_ \/ __| ||||| | |___| __/| | / __/| | | | |__|_____| | (_| | |_) \__ \ |||||||||| |_____|_| |___|_____|_| |_|_____| |_|\__,_|_.__/|___/ |||||||||| wf-human-variation v2.2.0-g5651930 -------------------------------------------------------------------------------- Typical pipeline command: nextflow run epi2me-labs/wf-human-variation \ --bam 'wf-human-variation-demo/demo.bam' \ --basecaller_cfg 'dna_r10.4.1_e8.2_400bps_hac_prom' \ --mod \ --ref 'wf-human-variation-demo/demo.fasta' \ --sample_name 'DEMO' \ --snp \ --sv Workflow Options --sv [boolean] Call for structural variants. --snp [boolean] Call for small variants --cnv [boolean] Call for copy number variants. --str [boolean] Enable Straglr to genotype STR expansions. --mod [boolean] Enable output of modified calls to a bedMethyl file [requires input BAM with Ml and Mm tags] Main options --sample_name [string] Sample name to be displayed in workflow outputs. [default: SAMPLE] --bam [string] BAM or unaligned BAM (uBAM) files for the sample to use in the analysis. --ref [string] Path to a reference FASTA file. --basecaller_cfg [choice] Name of the model to use for selecting a small variant calling model. [default: dna_r10.4.1_e8.2_400bps_sup@v4.1.0] * dna_r10.4.1_e8.2_260bps_fast@v4.1.0 * dna_r10.4.1_e8.2_260bps_hac@v4.1.0 * dna_r10.4.1_e8.2_260bps_sup@v4.1.0 * dna_r10.4.1_e8.2_400bps_fast@v4.1.0 * dna_r10.4.1_e8.2_400bps_fast@v4.2.0 * dna_r10.4.1_e8.2_400bps_fast@v4.3.0 * dna_r10.4.1_e8.2_400bps_hac@v4.1.0 * dna_r10.4.1_e8.2_400bps_hac@v4.3.0 * dna_r10.4.1_e8.2_400bps_sup@v4.1.0 * dna_r10.4.1_e8.2_400bps_sup@v4.3.0 * dna_r9.4.1_e8_fast@v3.4 * dna_r9.4.1_e8_hac@v3.3 * dna_r9.4.1_e8_sup@v3.3 * dna_r9.4.1_e8_sup@v3.6 * custom * dna_r10.4.1_e8.2_260bps_hac@v4.0.0 * dna_r10.4.1_e8.2_260bps_sup@v4.0.0 * dna_r10.4.1_e8.2_400bps_hac * dna_r10.4.1_e8.2_400bps_hac@v3.5.2 * dna_r10.4.1_e8.2_400bps_hac@v4.0.0 * dna_r10.4.1_e8.2_400bps_hac@v4.2.0 * dna_r10.4.1_e8.2_400bps_hac_prom * dna_r10.4.1_e8.2_400bps_sup@v3.5.2 * dna_r10.4.1_e8.2_400bps_sup@v4.0.0 * dna_r10.4.1_e8.2_400bps_sup@v4.2.0 * dna_r9.4.1_450bps_hac * dna_r9.4.1_450bps_hac_prom --bam_min_coverage [number] Minimum read coverage required to run analysis. [default: 20] --bed [string] An optional BED file enumerating regions to process for variant calling. --annotation [boolean] SnpEff annotation. [default: true] --phased [boolean] Perform phasing. --include_all_ctgs [boolean] Call for variants on all sequences in the reference, otherwise small and structural variants will only be called on chr{1..22,X,Y,MT}. --output_gene_summary [boolean] If set to true, the workflow will generate gene-level coverage summaries. --out_dir [string] Directory for output of all workflow results. [default: output] Structural variant calling options --tr_bed [string] Input BED file containing tandem repeat annotations for the reference genome. Structural variant benchmarking options --sv_benchmark [boolean] Benchmark called structural variants. Copy number variant calling options --use_qdnaseq [boolean] Use QDNAseq for CNV calling. --qdnaseq_bin_size [choice] Bin size for QDNAseq in kbp. [default: 500] * 1 * 5 * 10 * 15 * 30 * 50 * 100 * 500 * 1000 Modified base calling options --force_strand [boolean] Require modkit to call strand-aware modifications. Short tandem repeat expansion genotyping options --sex [choice] Sex (XX or XY) to be passed to Straglr-genotype. * XY * XX Advanced Options --depth_intervals [boolean] Output a bedGraph file with entries for each genomic interval featuring homogeneous depth. --GVCF [boolean] Enable to output a gVCF file in addition to the VCF outputs (experimental). --downsample_coverage [boolean] Downsample the coverage to along the genome. --downsample_coverage_target [number] Average coverage or reads to use for the analyses. [default: 60] Multiprocessing Options --threads [integer] Set max number of threads to use for more intense processes (limited by config executor cpus) [default: 4] --ubam_map_threads [integer] Set max number of threads to use for aligning reads from uBAM (limited by config executor cpus) [default: 8] --ubam_sort_threads [integer] Set max number of threads to use for sorting and indexing aligned reads from uBAM (limited by config executor cpus) [default: 3] --ubam_bam2fq_threads [integer] Set max number of threads to use for uncompressing uBAM and generating FASTQ for alignment (limited by config executor cpus) [default: 1] --merge_threads [integer] Set max number of threads to use for merging alignment files (limited by config executor cpus) [default: 4] --modkit_threads [integer] Total number of threads to use in modkit modified base calling (limited by config executor cpus) [default: 4] Miscellaneous Options --disable_ping [boolean] Enable to prevent sending a workflow ping. Other parameters --monochrome_logs [boolean] null --validate_params [boolean] null [default: true] --show_hidden_params [boolean] null !! Hiding 28 params, use --show_hidden_params to show them !! -------------------------------------------------------------------------------- If you use epi2me-labs/wf-human-variation for your analysis please cite: * The nf-core framework https://doi.org/10.1038/s41587-020-0439-x
Running the pipeline
The epi2me-labs/wf-human-variation pipeline uses as input aligned BAM files. We will use the BAM file we generated in the “ONTvariants - mapping” exercise as the input for this pipeline.
Let’s change directory to the run3_variant_calling folder:
cd $HOME/workshop/ONTvariants/runs/run3_variant_calling
Copy the script for the exercise:
cp /work/traing/ONTvariants/scripts/launch_epi2me-labs_wf-human-variation.pbs .
Print the content of the script:
#!/bin/bash -l #PBS -N run3_epi2me_WF-HV #PBS -l select=1:ncpus=16:mem=32gb #PBS -l walltime=48:00:00 #PBS -m abe module load java export NXF_OPTS='-Xms1g -Xmx4g' cd $PBS_O_WORKDIR nextflow run epi2me-labs/wf-human-variation \ -r v2.2.0 \ -profile singularity \ --bam "/training/ONTvariants/runs/run2_mapping/SRR17138639_mapped_chr20.bam" \ --basecaller_cfg "dna_r10.4.1_e8.2_400bps_hac_prom" \ --mod \ --ref "/training/ONTvariants/data/chr20.fasta" \ --sample_name "SRR17138639" \ --snp \ --sv \ --bam_min_coverage 15 \ --out_dir "results"