Aim:

This pipeline uses raw Oxford Nanopore (ONT) data to run the following processes:

...

  • ONT raw data in compressed FASTQ format (FASTQ.gz). If a sample has multiple FASTQ.gz files, they must all be in the same folder. DO NOT place raw files for different samples in the same folder.

  • Index file that provides information about the ONT data, including the sample ID, the location of the ONT raw files, and a reference genome:

    Code Block
    sampleid,sample_files,reference
    NC483,/mnt/work/phylo/OxfordNanopore/NC483_barcode96/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta

The index file (i.e., index.csv) can contain information for one or more samples, one sample per line:

Code Block
sampleid,sample_files,reference
ET300,/mnt/work/phylo/OxfordNanopore/ET300_barcode95/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/ET300_MT921572_reference_sequence.fasta
NC483,/mnt/work/phylo/OxfordNanopore/NC483_barcode96/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta
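
Before launching the workflow, it can help to confirm that every line of the index file points at files that actually exist. The following is a minimal sketch of such a check, not part of the ONTprocessing pipeline: the function name validate_index is hypothetical, and it assumes a bash shell and an index file with the three-column layout shown above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the pipeline): sanity-check an index file.
# Returns 0 if the header and every sample line look usable, 1 otherwise.
validate_index() {
    local index_file=$1
    local status=0
    local header sampleid sample_files reference

    {
        # The first line must be the expected header.
        IFS= read -r header
        if [ "$header" != "sampleid,sample_files,reference" ]; then
            echo "ERROR: unexpected header: $header" >&2
            return 1
        fi

        # Check each sample line; the "|| [ -n ... ]" part also handles a
        # last line that has no trailing newline.
        while IFS=, read -r sampleid sample_files reference || [ -n "$sampleid" ]; do
            [ -z "$sampleid" ] && continue   # skip blank lines
            if [ -z "$sample_files" ] || [ -z "$reference" ]; then
                echo "ERROR: $sampleid: expected three comma-separated fields" >&2
                status=1
                continue
            fi
            # compgen -G succeeds only if the glob matches at least one path.
            if ! compgen -G "$sample_files" > /dev/null; then
                echo "ERROR: $sampleid: no files match $sample_files" >&2
                status=1
            fi
            if [ ! -f "$reference" ]; then
                echo "ERROR: $sampleid: reference not found: $reference" >&2
                status=1
            fi
        done
    } < "$index_file"

    return "$status"
}
```

After sourcing the function, it could be called as validate_index index.csv; a non-zero exit status means at least one line needs attention.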

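Create an index.csv file using the following ‘run_create_index.sh’ script. The second argument is the folder that contains the sample's FASTQ.gz files, and the reference genome path is set inside the script through the REFERENCE variable. Usage: bash run_create_index.sh SampleID /path/to/ONT/fastq_files/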

Code Block
#!/bin/bash
  
## eResearch,QUT,3 November 2022
## Script: Generates an index.csv file for input to the ONTprocessing nextflow workflow

#Usage: bash run_create_index.sh SampleID /path/to/ONT/fastq_files/

##################################################################
SAMPLEID=$1
READSDIR=$2
REFERENCE='/mnt/work/phylo/OxfordNanopore/ET300_MT921572_reference_sequence.fasta'
##################################################################

#Create index.csv file
echo 'sampleid,sample_files,reference' > index.csv

#Remove any trailing slash so the glob is always <dir>/*.fastq.gz
READSDIR="${READSDIR%/}"
echo "${SAMPLEID},${READSDIR}/*.fastq.gz,${REFERENCE}" >> index.csv
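
The script above writes a single sample and overwrites index.csv on every run. For an index with several samples, each with its own reference, one possible approach is sketched below. The function name add_sample and its argument order are assumptions made for illustration; they are not part of the pipeline or of run_create_index.sh.

```shell
#!/bin/bash
# Hypothetical helper: append one sample to an index file, creating the
# file with its header first if it is missing or empty.
add_sample() {
    local index_file=$1 sampleid=$2 readsdir=$3 reference=$4

    # Write the header only when the index file is missing or empty.
    if [ ! -s "$index_file" ]; then
        echo 'sampleid,sample_files,reference' > "$index_file"
    fi

    # Drop any trailing slash so the glob is always <dir>/*.fastq.gz.
    readsdir=${readsdir%/}
    echo "${sampleid},${readsdir}/*.fastq.gz,${reference}" >> "$index_file"
}
```

Calling add_sample index.csv ET300 /mnt/work/phylo/OxfordNanopore/ET300_barcode95 /mnt/work/phylo/OxfordNanopore/ET300_MT921572_reference_sequence.fasta, and then calling it again for NC483 with its own folder and reference, would produce a two-sample index in the format shown earlier.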

Running the ONTprocessing nextflow pipeline:

Prepare a PBS Pro submission script to run the ONTprocessing pipeline. An example launch.pbs script is the following:

Code Block
#!/bin/bash -l
#PBS -N ontprocessing
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00

#Use the current directory to run the workflow
cd "$PBS_O_WORKDIR"

module load java
#Export the JVM settings so that the nextflow process can see them
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run eresearchqut/ontprocessing --samplesheet index.csv

Create a folder where your analyses will be run, and place a copy of both launch.pbs and index.csv in that folder. Then submit the job to the HPC cluster:

...