Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 4 Next »

Aim:

Implement a de novo assembly pipeline for ONT data

Source:

https://nanoporetech.com/sites/default/files/s3/literature/Bacterial-assembly-workflow.pdf

https://github.com/fenderglass/Flye/blob/flye/docs/INSTALL.md

Methodology

a) create a conda environment with Flye and its dependencies

Prepare a file called ‘environment.yml’ that contains the following information:

name: flye
channels:
 - defaults
 - anaconda
 - bioconda
 - conda-forge
dependencies:
 - python=3.7
 - flye=2.9.1
 - seqkit=2.3.1
 - blast=2.13.0

Run the following command to generate the ‘flye’ conda environment:

conda env create -f environment.yml

Activate the environment to access the installed tools:

conda activate flye 

b) Run a de novo assembly and sequence comparison against a Reference genome

#!/bin/bash -l
#PBS -N FlyeAssembly
#PBS -l walltime=24:00:00
#PBS -l mem=16gb
#PBS -l ncpus=8

## Usage: qsub launch_flye_genome_assembly.pbs

cd $PBS_O_WORKDIR

################################################################################################################################
# USER DEFINED VARIABLES
################################################################################################################################
SAMPLEID=MyGenome
REFNAME=Accession_RefGenome
REF='/path/to/REF/genome.fasta'
ONT='/path/to/ONT/reads.fastq.gz'
################################################################################################################################

#activate flye conda environment 
conda activate flye

#STEP1: Run de novo genome assembly for >=Q20 data, use a combination of --nano-hq and --read-error 0.03
flye  --out-dir outdir_nano --threads 8 --read-error 0.03 --nano-hq $ONT  

#STEP2: Compare sequence similarity of assembled genome with reference sequence
blastn -query outdir_nano/assembly.fasta -subject $REF -evalue 1e-5 -out blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt -outfmt '6 qseqid sacc length pident mismatch gapopen qstart qend qlen sstart send slen evalue bitscore qcovhsp qcovs'

#STEP3: format output BLASTN table
echo "qseqid sacc length pident mismatch gapopen qstart qend qlen sstart send slen evalue bitscore qcovhsp qcovs" > header

#add header to blast output
cat header blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt > BLASTN_${REFNAME}_vs_${SAMPLEID}_assembly.txt 

#remove intermediate files
rm header blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt

monitor progress of the assembly:

qjobs

  • No labels