Page Comparison

...

a) create a conda environment with Flye and its dependencies

Create a 'python 3.7' environment called flye

Code Block
conda create --name flye python=3.7

activate the conda environment

Code Block
conda activate flye

install flye

Code Block
conda install flye

b) Convert FASTQ to FASTA

Code Block
conda install -c bioconda seqkit

c) install blast

Prepare a file called ‘environment.yml’ that contains the following information:

Code Block
name: flye channels: - defaults - anaconda - bioconda - conda-forge dependencies: - python=3.7 - flye=2.9.1 - seqkit=2.3.1 - blast=2.13.0 - nanoq=0.9.0 - minimap2 - samtools

Run the following command to generate the ‘nanoQ’ ‘flye’ conda environment:

Code Block
conda env create -f environment.yml

...

Code Block
conda activate flye

merge FASTQ files

Code Block
cat FAU10290_pass_barcode96_0cf303ee_*.gz

...

> merge_NC483.fastq.gz

get path of merged file

Code Block
pwd

modify launch*.pbs file

b) Run a de novo assembly test runand sequence comparison against a Reference genome

Code Block

#!/bin/bash -l
#PBS -N FlyeAssembly
#PBS -l walltime=24:00:00
#PBS -l mem=16gb
#PBS -l ncpus=8

## Usage: qsub launch_flye_genome_assembly.pbs

cd $PBS_O_WORKDIR

ONTDATA################################################################################################################################
# USER DEFINED VARIABLES
################################################################################################################################
SAMPLEID=MyGenome
REFNAME=Accession_RefGenome
REF='/path/to/ONTREF/readsgenome.fasta'
REF='ET300_MT921572_reference_sequence.fasta'

conda activate flye

#run assembly
flye --nano-raw $ONTDATAONT='/path/to/ONT/reads.fastq.gz'
################################################################################################################################

#activate flye conda environment 
conda activate flye

#ASSEMBLY

#STEP1: Run de novo genome assembly for >=Q20 data, use a combination of --nano-hq and --read-error 0.03
flye  --out-dir outoutdir_nano --threads 8 --read-error 0.03 --nano-hq $ONT  

#STEP2: Compare sequence similarity of assembled genome with reference sequence
blastn -query outdir_nano/assembly.fasta -subject $REF -evalue 1e-5 -out blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt -outfmt '6 qseqid sacc length pident mismatch gapopen qstart qend qlen sstart send slen evalue bitscore qcovhsp qcovs'

#STEP3: format output BLASTN table
echo "qseqid sacc length pident mismatch gapopen qstart qend qlen sstart send slen evalue bitscore qcovhsp qcovs" > header

#add header to blast output
cat header blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt > BLASTN_${REFNAME}_vs_${SAMPLEID}_assembly.txt 

#remove intermediate files
rm header blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt

#MAPPING

#generic mapping reads
minimap2 -a $REF $ONT > ${SAMPLEID}_aln.sam

#mapping noisy reads
#minimap2 -ax $REF $ONT > ${SAMPLEID}_aln2.sam  

#Samtools
samtools view -bt ${LIST} -o ${SAMPLEID}_aln.bam ${SAMPLEID}_aln.sam

samtools sort -T /tmp/aln.sorted -o ${SAMPLEID}_aln.sorted.bam ${SAMPLEID}_aln.bam

samtools index ${SAMPLEID}_aln.sorted.bam

submit the job

Code Block
qsub launch_flye_genome_assembly.pbs

monitor progress of the assembly:

Code Block
qjobs

Versions Compared

Old Version 2

New Version Current

Key