ONT de novo genome assembly
Aim:
Implement a de novo assembly pipeline for ONT data
Sources:
https://nanoporetech.com/sites/default/files/s3/literature/Bacterial-assembly-workflow.pdf
https://github.com/fenderglass/Flye/blob/flye/docs/INSTALL.md
Methodology
a) Create a conda environment with Flye and its dependencies
Prepare a file called ‘environment.yml’ with the following content:
name: flye
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- python=3.7
- flye=2.9.1
- seqkit=2.3.1
- blast=2.13.0
- nanoq=0.9.0
- minimap2
- samtools
Run the following command to generate the ‘flye’ conda environment:
conda env create -f environment.yml
Activate the environment to access the installed tools:
conda activate flye
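After activation, it can help to confirm that every tool is actually on the PATH before submitting a long job. The following is a minimal sketch; `check_tools` is a hypothetical helper, not part of Flye or conda. Note that the `blast` package provides the `blastn` executable.

```shell
# Hypothetical helper: report any tool that is not on PATH.
# Returns the number of missing tools (0 means everything was found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Usage, inside the activated flye environment:
# check_tools flye seqkit blastn nanoq minimap2 samtools && echo "all tools found"
```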
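Merge the FASTQ files from the run. Concatenated gzip files form a valid gzip stream, so the reads do not need to be decompressed first:
cat FAU10290_pass_barcode96_0cf303ee_*.gz > merge_NC483.fastq.gz
Get the path of the merged file (needed for the ONT variable):
pwd
Modify the USER DEFINED VARIABLES (SAMPLEID, REFNAME, REF, ONT) in launch_flye_genome_assembly.pbs, shown in step b).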
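As a quick sanity check on the merged file, the sketch below counts reads and bases and the number of reads whose mean quality is at least Q20, which is relevant to the >=Q20 settings used in STEP1. It assumes standard four-line FASTQ records with Phred+33 qualities, and it averages per-base error probabilities rather than raw Phred scores (a common convention for ONT reads). `fastq_summary` is a hypothetical name. The per-base awk loop is slow on large runs; `seqkit stats` (installed in the environment) is much faster if only counts are needed.

```shell
# Hypothetical helper: summarise one or more gzipped FASTQ files.
# Assumes 4-line FASTQ records and Phred+33 quality encoding.
fastq_summary() {
  gzip -cd "$@" | awk '
    BEGIN {
      # Lookup table: quality character -> Phred score (Phred+33)
      for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
    }
    NR % 4 == 2 { bases += length($0) }   # sequence line
    NR % 4 == 0 {                         # quality line
      n++
      len = length($0)
      perr = 0
      # Average the per-base error probabilities, then convert back to Phred
      for (j = 1; j <= len; j++) perr += 10 ^ (-phred[substr($0, j, 1)] / 10)
      if (len > 0 && -10 * log(perr / len) / log(10) >= 20) hq++
    }
    END { printf "reads\t%d\nbases\t%d\nreads_ge_Q20\t%d\n", n, bases, hq }'
}

# Usage: the merged file should report the same totals as its parts
# fastq_summary merge_NC483.fastq.gz
# fastq_summary FAU10290_pass_barcode96_0cf303ee_*.gz
```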
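Instead of editing the script by hand, the variables can also be set from the command line. This is a minimal sketch with a hypothetical helper, `set_pbs_var`; it assumes each variable sits on its own `NAME=...` line and that the new value contains no `|`, `&`, backslash or single quote, since those would break the sed expression or the quoting. It writes to a temporary file instead of using `sed -i`, so it behaves the same with GNU and BSD sed.

```shell
# Hypothetical helper: replace the whole "VAR=..." line in a PBS script
# with VAR='value'. Other lines are left untouched.
set_pbs_var() {
  local file=$1 var=$2 value=$3
  sed "s|^${var}=.*|${var}='${value}'|" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Usage (NC483 is only an example sample ID):
# set_pbs_var launch_flye_genome_assembly.pbs SAMPLEID NC483
# set_pbs_var launch_flye_genome_assembly.pbs ONT "$(pwd)/merge_NC483.fastq.gz"
```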
b) Run a de novo assembly, compare it against a reference genome, and map the reads to the reference
#!/bin/bash -l
#PBS -N FlyeAssembly
#PBS -l walltime=24:00:00
#PBS -l mem=16gb
#PBS -l ncpus=8
## Usage: qsub launch_flye_genome_assembly.pbs
cd $PBS_O_WORKDIR
################################################################################################################################
# USER DEFINED VARIABLES
################################################################################################################################
SAMPLEID=MyGenome
REFNAME=Accession_RefGenome
REF='/path/to/REF/genome.fasta'
ONT='/path/to/ONT/reads.fastq.gz'
################################################################################################################################
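#activate flye conda environment (load conda's shell functions first, because batch jobs may not read ~/.bashrc)
eval "$(conda shell.bash hook)"
conda activate flye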
#ASSEMBLY
#STEP1: Run de novo genome assembly; for >=Q20 data, combine --nano-hq with --read-error 0.03 (expected read error rate of 3%)
flye --out-dir outdir_nano --threads 8 --read-error 0.03 --nano-hq $ONT
#STEP2: Compare sequence similarity of assembled genome with reference sequence
blastn -query outdir_nano/assembly.fasta -subject $REF -evalue 1e-5 -out blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt -outfmt '6 qseqid sacc length pident mismatch gapopen qstart qend qlen sstart send slen evalue bitscore qcovhsp qcovs'
#STEP3: format output BLASTN table (tab-separated header, matching the tab-separated columns from -outfmt 6)
echo "qseqid sacc length pident mismatch gapopen qstart qend qlen sstart send slen evalue bitscore qcovhsp qcovs" | tr ' ' '\t' > header
#add header to blast output
cat header blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt > BLASTN_${REFNAME}_vs_${SAMPLEID}_assembly.txt
#remove intermediate files
rm header blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt
#MAPPING
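#generic read mapping (-a: output alignments in SAM format)
minimap2 -t 8 -a $REF $ONT > ${SAMPLEID}_aln.sam
#mapping noisy ONT reads with the Oxford Nanopore preset (-x requires a preset name, here map-ont)
#minimap2 -t 8 -ax map-ont $REF $ONT > ${SAMPLEID}_aln2.sam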
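#Samtools: convert SAM to BAM (minimap2 -a already writes the @SQ header, so no reference list is needed), then sort and index
samtools view -b -o ${SAMPLEID}_aln.bam ${SAMPLEID}_aln.sam
samtools sort -T /tmp/${SAMPLEID}_aln.sorted -o ${SAMPLEID}_aln.sorted.bam ${SAMPLEID}_aln.bam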
samtools index ${SAMPLEID}_aln.sorted.bam
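Once the job has finished, the sorted BAM can be summarised as mean depth and breadth of coverage per reference sequence. This sketch post-processes the output of `samtools depth -a` (tab-separated columns: reference name, position, depth); `-a` also reports zero-depth positions, so breadth is computed over every reported position (`-aa` additionally includes reference sequences with no mapped reads). `depth_summary` is a hypothetical helper.

```shell
# Hypothetical helper: per-reference mean depth and % of positions with depth > 0,
# reading `samtools depth -a` output from files or stdin.
depth_summary() {
  awk -F'\t' '
    { n[$1]++; sum[$1] += $3; if ($3 > 0) cov[$1]++ }
    END {
      print "reference\tlength\tmean_depth\tbreadth_pct"
      for (r in n) printf "%s\t%d\t%.2f\t%.2f\n", r, n[r], sum[r] / n[r], 100 * cov[r] / n[r]
    }' "$@"
}

# Usage, inside the activated flye environment (SAMPLEID=MyGenome):
# samtools depth -a MyGenome_aln.sorted.bam | depth_summary
```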
Submit the job:
qsub launch_flye_genome_assembly.pbs
Monitor the progress of the assembly:
qjobs
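When the job completes, the BLASTN table can be reduced to one line per assembled contig by keeping the HSP with the highest bitscore. This is a sketch that assumes the column order of the `-outfmt` string in STEP2 ($1 qseqid, $4 pident, $9 qlen, $14 bitscore, $16 qcovs) and skips the header line added in STEP3 whether it is tab- or space-separated. `blast_best_hits` is a hypothetical helper; qcovs is the query coverage across all HSPs against the reference, so it is the same for every row of a contig.

```shell
# Hypothetical helper: best HSP (highest bitscore) per contig from the STEP3 table.
blast_best_hits() {
  awk -F'\t' '
    /^qseqid/ { next }   # skip the header line
    !($1 in best) || $14 + 0 > best[$1] {
      best[$1] = $14 + 0; pid[$1] = $4; qlen[$1] = $9; qcovs[$1] = $16
    }
    END {
      print "contig\tqlen\tqcovs\tbest_hsp_pident\tbest_hsp_bitscore"
      for (q in best) print q "\t" qlen[q] "\t" qcovs[q] "\t" pid[q] "\t" best[q]
    }' "$@"
}

# Usage (with SAMPLEID=MyGenome and REFNAME=Accession_RefGenome):
# blast_best_hits BLASTN_Accession_RefGenome_vs_MyGenome_assembly.txt
```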
Related content
nf-eresearch/ONTprocessing - NextFlow pipeline for Oxford Nanopore de novo assembly and ref guided consensus
EBV genome integration data analysis
2024 eResearch - Session 4: nf-core-RNAseq pipeline
nf-core/rnaseq - Gene Expression Analysis
2024 - Semester One: Hands-on variant calling and metagenomics analyses using QUT's HPC and Nextflow
ONT Oxford Nanopore fast5 processing