...
a) create a conda environment with Flye and its dependencies
Create a 'python 3.7' environment called flye
Code Block |
---|
conda create --name flye python=3.7 |
activate the conda environment
Code Block |
---|
conda activate flye |
install flye
Code Block |
---|
conda install flye |
b) Convert FASTQ to FASTA
Code Block |
---|
conda install -c bioconda seqkit |
c) install blast
Prepare a file called ‘environment.yml’ that contains the following information:
Code Block |
---|
name: flye channels: - defaults - anaconda - bioconda - conda-forge dependencies: - python=3.7 - flye=2.9.1 - seqkit=2.3.1 - blast=2.13.0 - nanoq=0.9.0 - minimap2 - samtools |
Run the following command to generate the ‘nanoQ’ ‘flye’ conda environment:
Code Block |
---|
conda env create -f environment.yml |
...
Code Block |
---|
conda activate flye |
merge FASTQ files
Code Block |
---|
cat FAU10290_pass_barcode96_0cf303ee_*.gz |
...
> merge_NC483.fastq.gz |
get path of merged file
Code Block |
---|
pwd |
modify launch*.pbs file
b) Run a de novo assembly test runand sequence comparison against a Reference genome
Code Block |
---|
#!/bin/bash -l #PBS -N FlyeAssembly #PBS -l walltime=24:00:00 #PBS -l mem=16gb #PBS -l ncpus=8 ## Usage: qsub launch_flye_genome_assembly.pbs cd $PBS_O_WORKDIR ONTDATA################################################################################################################################ # USER DEFINED VARIABLES ################################################################################################################################ SAMPLEID=MyGenome REFNAME=Accession_RefGenome REF='/path/to/ONTREF/readsgenome.fasta' REF='ET300_MT921572_reference_sequence.fasta' conda activate flye #run assembly flye --nano-raw $ONTDATAONT='/path/to/ONT/reads.fastq.gz' ################################################################################################################################ #activate flye conda environment conda activate flye #ASSEMBLY #STEP1: Run de novo genome assembly for >=Q20 data, use a combination of --nano-hq and --read-error 0.03 flye --out-dir outoutdir_nano --threads 8 --read-error 0.03 --nano-hq $ONT #STEP2: Compare sequence similarity of assembled genome with reference sequence blastn -query outdir_nano/assembly.fasta -subject $REF -evalue 1e-5 -out blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt -outfmt '6 qseqid sacc length pident mismatch gapopen qstart qend qlen sstart send slen evalue bitscore qcovhsp qcovs' #STEP3: format output BLASTN table echo "qseqid sacc length pident mismatch gapopen qstart qend qlen sstart send slen evalue bitscore qcovhsp qcovs" > header #add header to blast output cat header blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt > BLASTN_${REFNAME}_vs_${SAMPLEID}_assembly.txt #remove intermediate files rm header blastn_${REFNAME}_vs_${SAMPLEID}_assembly.txt #MAPPING #generic mapping reads minimap2 -a $REF $ONT > ${SAMPLEID}_aln.sam #mapping noisy reads #minimap2 -ax $REF $ONT > ${SAMPLEID}_aln2.sam #Samtools samtools view -bt ${LIST} -o ${SAMPLEID}_aln.bam ${SAMPLEID}_aln.sam samtools sort -T /tmp/aln.sorted -o ${SAMPLEID}_aln.sorted.bam ${SAMPLEID}_aln.bam samtools index ${SAMPLEID}_aln.sorted.bam |
submit the job
Code Block |
---|
qsub launch_flye_genome_assembly.pbs |
monitor progress of the assembly:
Code Block |
---|
qjobs |