nf-eresearch/ONTprocessing - Nextflow pipeline for Oxford Nanopore de novo assembly and reference-guided consensus

Aim:

This pipeline uses raw Oxford Nanopore (ONT) data to run the following processes:

  1. De novo genome assembly of ONT reads using Flye https://github.com/fenderglass/Flye

  2. Sequence comparison of the assembled genome to a provided reference genome

  3. Nano-Q: conservative cleaning of ONT reads from BAM files and estimation of variant frequencies https://github.com/PrestonLeung/Nano-Q

Prerequisites:

Install Nextflow using the following User Guide: https://eresearchqut.atlassian.net/wiki/spaces/EG/pages/862028236

Inputs:

  • ONT raw data in compressed FASTQ format (.fastq.gz). If multiple .fastq.gz files are available for the same sample, they all need to be in the same folder. DO NOT place raw files for different samples in the same folder.

  • An index file that provides information about the ONT data, namely the sample ID, the location of the ONT raw files and a reference genome, for example:

    sampleid,sample_files,reference
    NC483,/mnt/work/phylo/OxfordNanopore/NC483_barcode96/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta

The index file (i.e., index.csv) starts with the header line and can list one or more samples, one sample per line:

sampleid,sample_files,reference
ET300,/mnt/work/phylo/OxfordNanopore/ET300_barcode95/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/ET300_MT921572_reference_sequence.fasta
NC483,/mnt/work/phylo/OxfordNanopore/NC483_barcode96/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta
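Before submitting a run, it can be worth sanity-checking index.csv. The shell function below is a minimal sketch and not part of ONTprocessing; the name `check_index` is hypothetical. It assumes the three-column layout shown above with no commas or spaces inside paths, and checks that each row has three fields, that the sample_files pattern matches at least one file, that the reference file exists, and that no two samples use the same sample_files pattern (it only catches identical patterns, not two different spellings of the same folder).

```bash
# Hypothetical helper (not part of ONTprocessing): sanity-check index.csv.
# Assumes the header "sampleid,sample_files,reference", three comma-separated
# fields per row and no commas or spaces inside paths. Requires bash >= 4.
check_index() {
    local index=${1:-index.csv}
    local errors=0 line_no=0
    local sampleid sample_files reference extra
    local -A seen=()    # sample_files pattern -> first sample ID using it

    if [ ! -f "$index" ]; then
        echo "ERROR: $index not found" >&2
        return 1
    fi
    if [ "$(head -n 1 "$index")" != "sampleid,sample_files,reference" ]; then
        echo "ERROR: unexpected header in $index" >&2
        return 1
    fi

    # "|| [ -n ... ]" keeps a last line that has no trailing newline
    while IFS=, read -r sampleid sample_files reference extra || [ -n "$sampleid" ]; do
        line_no=$((line_no + 1))
        [ "$line_no" -eq 1 ] && continue    # skip the header
        [ -z "$sampleid" ] && continue      # skip blank lines
        if [ -z "$reference" ] || [ -n "$extra" ]; then
            echo "ERROR: line $line_no does not have exactly 3 fields" >&2
            errors=$((errors + 1))
            continue
        fi
        # Unquoted on purpose: let the shell expand the *.fastq.gz pattern.
        # With no match, the pattern stays literal and the -e test fails.
        local matches=( $sample_files )
        if [ ! -e "${matches[0]}" ]; then
            echo "ERROR: $sampleid: no files match $sample_files" >&2
            errors=$((errors + 1))
        fi
        if [ ! -f "$reference" ]; then
            echo "ERROR: $sampleid: reference not found: $reference" >&2
            errors=$((errors + 1))
        fi
        # Raw files for different samples must not share a folder
        if [ -n "${seen[$sample_files]}" ]; then
            echo "ERROR: $sampleid and ${seen[$sample_files]} use the same files: $sample_files" >&2
            errors=$((errors + 1))
        fi
        seen["$sample_files"]=$sampleid
    done < "$index"

    [ "$errors" -eq 0 ]
}
```

For example, `check_index index.csv && echo 'index.csv looks OK'` prints the message only when every row passes.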

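Create an index.csv file using the following ‘run_create_index_ONT.sh’ script. Usage: bash run_create_index_ONT.sh SampleID /path/to/ONT/fastq_files/

The second argument is the folder that holds the sample's .fastq.gz files. The script writes index.csv in the current folder (overwriting any existing index.csv) with a single sample and the reference genome path stored in the REFERENCE variable, so edit REFERENCE to match your sample before running it.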

#!/bin/bash
## eResearch, QUT
## Script: Generates an index.csv file for input to the ONTprocessing Nextflow workflow
## Usage: bash run_create_index_ONT.sh SampleID /path/to/ONT/fastq_files/
##################################################################
if [ "$#" -ne 2 ]; then
    echo "Usage: bash run_create_index_ONT.sh SampleID /path/to/ONT/fastq_files/" >&2
    exit 1
fi
SAMPLEID=$1
READSDIR=${2%/}/   # make sure the folder path ends with '/' before the *.fastq.gz pattern
# Reference genome for this sample: edit this path to match your sample
REFERENCE='/mnt/work/phylo/OxfordNanopore/ET300_MT921572_reference_sequence.fasta'
##################################################################
# Create index.csv: header line, then one line for this sample
echo 'sampleid,sample_files,reference' > index.csv
printf '%s,%s*.fastq.gz,%s\n' "$SAMPLEID" "$READSDIR" "$REFERENCE" >> index.csv
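Because run_create_index_ONT.sh overwrites index.csv with a single sample and a fixed REFERENCE path, an index for several samples has to be assembled some other way, for example by editing the file by hand. The function below is one hypothetical way to do it: `build_index` and its input format (one whitespace-separated line per sample: sample ID, FASTQ folder, reference FASTA) are assumptions, not part of the eResearch tooling. It writes the same `sampleid,sample_files,reference` rows as the script above.

```bash
# Hypothetical multi-sample variant of run_create_index_ONT.sh (not part of ONTprocessing).
# Input ($1): one line per sample, whitespace-separated:
#   SampleID /path/to/fastq_folder /path/to/reference.fasta
# Output ($2, default index.csv): overwritten with a header plus one row per sample.
build_index() {
    local list=$1 out=${2:-index.csv}
    local sampleid readsdir reference
    echo 'sampleid,sample_files,reference' > "$out"
    # "|| [ -n ... ]" keeps a last line that has no trailing newline
    while read -r sampleid readsdir reference || [ -n "$sampleid" ]; do
        [ -z "$sampleid" ] && continue              # skip blank lines
        [ "${sampleid:0:1}" = "#" ] && continue     # skip comment lines
        readsdir=${readsdir%/}/   # ensure a '/' before the *.fastq.gz pattern
        printf '%s,%s*.fastq.gz,%s\n' "$sampleid" "$readsdir" "$reference" >> "$out"
    done < "$list"
}
```

For example, `build_index samples.txt` writes index.csv in the current folder. Paths containing spaces are not supported by this sketch.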

Running the ONTprocessing Nextflow pipeline:

Prepare a PBS Pro submission script to run the ONTprocessing pipeline. An example launch.pbs script is the following:
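As a stand-in illustration, the function below writes a minimal PBS Pro job script of the kind described above. It is a hypothetical sketch, not the eResearch launch.pbs: the resource requests (4 CPUs, 16 GB, 24 hours), the pipeline location passed as the first argument and the `--index` parameter name are all assumptions, so check the ONTprocessing documentation for the real values.

```bash
# Hypothetical sketch (not the eResearch launch.pbs): write a minimal PBS Pro
# job script called launch.pbs in the current folder.
# ASSUMPTIONS: the resource requests below, the pipeline location passed as $1,
# and the --index parameter name; replace them with the documented values.
write_launch_pbs() {
    if [ -z "$1" ]; then
        echo "Usage: write_launch_pbs /path/to/ONTprocessing" >&2
        return 1
    fi
    local pipeline=$1
    # Unquoted EOF: ${pipeline} is filled in now; \$PBS_O_WORKDIR stays literal
    cat > launch.pbs <<EOF
#!/bin/bash -l
#PBS -N ONTprocessing
#PBS -l select=1:ncpus=4:mem=16gb
#PBS -l walltime=24:00:00

# PBS starts the job in your home folder; move to the folder qsub was run from
cd "\$PBS_O_WORKDIR"

# Hypothetical invocation: check the pipeline documentation for its parameters
nextflow run "${pipeline}" --index index.csv
EOF
}
```

Run it in the analysis folder, e.g. `write_launch_pbs /path/to/ONTprocessing`. On PBS Pro the job is then typically submitted with `qsub launch.pbs` and monitored with `qstat -u $USER`.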

Create a folder where your analyses will be run, and place a copy of both launch.pbs and index.csv in that folder. Then submit the job to the HPC cluster:

Monitor progress of the job:

Outputs:

A subfolder called ‘work’ contains all the intermediate files generated while running the ONTprocessing workflow. This folder is not typically used to assess the outputs but rather to debug any issues with the pipeline. Key intermediate files and outputs of specific ‘processes’ (analysis tasks) are saved to a subfolder called ‘results’, where data for the following analyses are presented:
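When a run fails, Nextflow leaves each task's files in its own folder under work/, including the task's exit status in a `.exitcode` file and its standard error in `.command.err`. The helper below is a hypothetical sketch, not part of ONTprocessing: it lists the task folders whose `.exitcode` is non-zero and prints the last few lines of their error output. Tasks killed before finishing (for example by the scheduler) may have no `.exitcode` file at all and will not appear here; `nextflow log` and the `.nextflow.log` file in the launch folder are other places to look.

```bash
# Hypothetical helper (not part of ONTprocessing): list Nextflow task folders
# under work/ whose .exitcode file holds a non-zero exit status, and show the
# last lines of each task's .command.err.
failed_tasks() {
    local workdir=${1:-work}
    local codefile code taskdir
    while IFS= read -r codefile; do
        code=$(cat "$codefile")
        if [ "$code" != "0" ]; then
            taskdir=$(dirname "$codefile")
            echo "$taskdir (exit $code)"
            if [ -f "$taskdir/.command.err" ]; then
                tail -n 5 "$taskdir/.command.err"
            fi
        fi
    done < <(find "$workdir" -type f -name .exitcode)
    return 0
}
```

For example, run `failed_tasks work` from the folder the pipeline was launched in.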

 

Related content

2024 - Semester One: Hands-on variant calling and metagenomics analyses using QUT's HPC and Nextflow
Anacapa - eDNA analysis toolkit
nf-eresearch/ConsGenome: Nextflow based Genome Assembly, Variant Calling and building a Consensus Genome workflow
ONT Oxford Nanopore fast5 processing
ONT de novo genome assembly
VSD-1.0 (Virus Surveillance and Diagnosis)