ONT (Oxford Nanopore Technologies) fast5 processing
Software Requirements
an initial python venv
bonito pins PyTorch to version 1.10.0, so depending on the base CUDA version we'll set up one of these two:
# CUDA 11.1
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
# CUDA 10.2
pip install torch==1.10.0+cu102 torchvision==0.11.0+cu102 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
Since we want to use the A100s, I'm assuming they're running CUDA 11, so the CUDA 11.1 build is the likely choice; the CUDA 10.2 command is kept here for reference.
bonito
guppy (I see there is a nightly build of IGV that supports the new modbam tags, so worth testing)
samtools
bedtools
minimap2
nextflow
clair3 (as implemented below)
the specific workflow: nextflow run epi2me-labs/wf-human-snp --help
whatshap
cuteSV
sniffles2
IGV (though this might need some work as it's GUI based)
could look at the web server version
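The CUDA-dependent PyTorch choice in the venv notes above can be sketched as a small shell helper. This is only an illustration: torch_cuda_suffix and torch_install_cmd are hypothetical names, the version-to-wheel mapping is an assumption based on the two commands listed above, and the CUDA version has to be read from the GPU node first (for example from the nvidia-smi header).

```shell
# Sketch only: hypothetical helpers, not part of bonito or PyTorch.
# Map a CUDA version string to the wheel suffix used in the notes above.
torch_cuda_suffix() {
    case "$1" in
        11|11.*)     echo "cu111" ;;   # assumption: any CUDA 11.x node uses the cu111 build
        10.2|10.2.*) echo "cu102" ;;
        *)           echo "unsupported CUDA version: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) the matching pip command so it can be reviewed first.
torch_install_cmd() {
    local suffix
    suffix="$(torch_cuda_suffix "$1")" || return 1
    echo "pip install torch==1.10.0+${suffix} torchvision==0.11.0+${suffix} torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html"
}

# Example (inside a fresh venv, with the CUDA version reported by the node):
# python3 -m venv bonito-venv && . bonito-venv/bin/activate
# torch_install_cmd 11.4
```

Printing the command instead of executing it keeps the helper safe to run on a login node that has no GPU.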
Bonito
singularity exec -B /work/ont --nv docker://ghcr.io/eresearchqut/bonito:v0.0.3 bonito basecaller dna_r9.4.1_e8_sup@v3.3 \
. \
--modified-bases 5mC \
--reference /work/ont/reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.mmi \
--recursive \
--alignment-threads 4 > basecalls_mod_ref_S.bam
This example runs bonito from the container. It reads fast5 files from the current folder (line 2)
Modified bases parameter 5mC (line 3)
Path to the reference index (line 4)
Search the folder recursively (line 5)
Use 4 alignment threads and redirect STDOUT to basecalls_mod_ref_S.bam (line 6)
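To make the parameters described above easier to change per run, the same command could be wrapped in a function. This is a minimal sketch: run_bonito and the DRY_RUN switch are hypothetical, and every flag and default value is copied from the example rather than taken from the bonito documentation.

```shell
# Sketch only: run_bonito is a hypothetical wrapper around the example above.
# Set DRY_RUN=1 to print the command instead of running it.
run_bonito() {
    local fast5_dir="${1:-.}"
    local reference="${2:-/work/ont/reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.mmi}"
    local threads="${3:-4}"
    # Reuse the positional parameters as the argument list (works in sh and bash).
    set -- singularity exec -B /work/ont --nv docker://ghcr.io/eresearchqut/bonito:v0.0.3 \
        bonito basecaller dna_r9.4.1_e8_sup@v3.3 \
        "$fast5_dir" \
        --modified-bases 5mC \
        --reference "$reference" \
        --recursive \
        --alignment-threads "$threads"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example: basecall ./example_fast5 and capture the BAM from STDOUT.
# run_bonito ./example_fast5 > basecalls_mod_ref_S.bam
```

Running once with DRY_RUN=1 is a cheap way to check the assembled paths before spending GPU time.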
Bonito Standalone
#!/bin/bash -l
#PBS -N bonito-gpu
#PBS -l select=1:ncpus=8:mem=32gb:ngpus=1:gputype=A100
#PBS -l walltime=72:00:00
# Change to the folder where the job was submitted
cd "$PBS_O_WORKDIR"
# Run Bonito - check paths and parameters are correct
singularity exec -B /work/ont --nv docker://ghcr.io/eresearchqut/bonito:v0.0.5 bonito \
basecaller dna_r9.4.1_e8_sup@v3.3 \
./example_fast5 \
--modified-bases 5mC \
--reference ./reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.mmi \
--recursive \
--alignment-threads 4 | \
samtools view -u | samtools sort -@ 4 > ./bonito_test/bam/HG002_as_test.bam
# Run samtools index on the sorted BAM written above
singularity exec -B /work/ont --nv docker://ghcr.io/eresearchqut/bonito:v0.0.5 \
samtools index ./bonito_test/bam/HG002_as_test.bam
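Because the job can sit in the queue for a while and then run for up to 72 hours, it may be worth checking the inputs before bonito starts. The following is a sketch of such a check, assuming the folder layout used in the script above; preflight is a hypothetical helper and its messages are illustrative.

```shell
# Sketch only: preflight is a hypothetical helper for the PBS script above.
# Arguments: fast5 folder, reference index, output folder.
preflight() {
    local fast5_dir="$1" reference="$2" out_dir="$3"
    if [ ! -d "$fast5_dir" ]; then
        echo "fast5 folder not found: $fast5_dir" >&2
        return 1
    fi
    # --recursive means reads may sit in sub-folders, so search the whole tree.
    if [ -z "$(find "$fast5_dir" -type f -name '*.fast5' | head -n 1)" ]; then
        echo "no .fast5 files under: $fast5_dir" >&2
        return 1
    fi
    if [ ! -s "$reference" ]; then
        echo "reference index missing or empty: $reference" >&2
        return 1
    fi
    # The shell redirect in the script fails if the output folder does not exist.
    mkdir -p "$out_dir"
}

# Example, placed after the cd line in the PBS script:
# preflight ./example_fast5 ./reference/GCA_000001405.15_GRCh38_no_alt_analysis_set.mmi ./bonito_test/bam || exit 1
```

Failing early with a clear message avoids a job that starts on the GPU node only to stop on a missing path.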
Clair3
Clair3 is available in a nextflow pipeline:
epi2me-labs/wf-human-snp: Small variant calling for human samples (github.com)
By default the pipeline is configured to run its processes locally rather than submit them as jobs. We still need to test how the overhead of submitting each process through PBS compares with running “locally” inside a single job.
Using test data:
nextflow run epi2me-labs/wf-human-snp -profile singularity \
--bam /work/ont/clair3/sample_data/chr6_chr20.bam \
--bed /work/ont/clair3/sample_data/chr6_chr20.bed \
--ref /work/ont/clair3/sample_data/chr6_chr20.fasta \
--out_dir "results" \
-process.executor "pbspro"
Line 1: use singularity for the pipeline software
Lines 2-4: input BAM, BED regions and reference FASTA (test data)
Line 5: save results in the results folder
Line 6: use PBS (the pbspro executor) to run the processes
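The open question about PBS overhead versus running locally could be tested by timing the same test-data run under each executor. The sketch below assumes the "local" and "pbspro" Nextflow executor names; time_wf, the DRY_RUN switch and the per-executor output folder names are hypothetical, and wall-clock time is only a rough measure that also includes queue wait.

```shell
# Sketch only: time_wf is a hypothetical helper built from the command above.
# Set DRY_RUN=1 to print the command instead of running it.
time_wf() {
    local executor="$1"
    local start end
    # Reuse the positional parameters as the argument list (works in sh and bash).
    set -- nextflow run epi2me-labs/wf-human-snp -profile singularity \
        --bam /work/ont/clair3/sample_data/chr6_chr20.bam \
        --bed /work/ont/clair3/sample_data/chr6_chr20.bed \
        --ref /work/ont/clair3/sample_data/chr6_chr20.fasta \
        --out_dir "results_${executor}" \
        -process.executor "$executor"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
        return 0
    fi
    start="$(date +%s)"
    "$@" || return 1
    end="$(date +%s)"
    echo "${executor}: $((end - start)) s"
}

# Example: run both variants and compare the reported wall-clock times.
# time_wf local
# time_wf pbspro
```

Writing each run to its own output folder keeps the two result sets separate so they can also be compared for identical variant calls.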