Aim:
This pipeline uses raw Oxford Nanopore (ONT) data to run the following processes:
...
ONT raw data in compressed FASTQ format (FASTQ.gz). If a sample has multiple FASTQ.gz files, place them all in the same folder. DO NOT place raw files for different samples in the same folder.
Index file that provides information about the ONT data, including the sample ID, the location of the ONT raw files, and a reference genome:
Code Block |
---|
sampleid,sample_files,reference
NC483,/mnt/work/phylo/OxfordNanopore/NC483_barcode96/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta |
The index file (i.e., index.csv) can describe one or more samples, one sample per line:
Code Block |
---|
sampleid,sample_files,reference
ET300,/mnt/work/phylo/OxfordNanopore/ET300_barcode95/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/ET300_MT921572_reference_sequence.fasta
NC483,/mnt/work/phylo/OxfordNanopore/NC483_barcode96/*.fastq.gz,/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta |
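Before launching the workflow it can help to confirm that index.csv follows the layout above. The following is a minimal sketch only: check_index is a hypothetical helper and not part of the ONTprocessing pipeline. It assumes the header is exactly sampleid,sample_files,reference and that every data row has three comma-separated fields.

```shell
#!/bin/bash
# Hypothetical helper (not part of the pipeline): sanity-check an index.csv.
check_index() {
    local index_file="$1"
    # The header must match the three expected column names exactly
    if [ "$(head -n 1 "$index_file")" != "sampleid,sample_files,reference" ]; then
        echo "ERROR: unexpected header in $index_file" >&2
        return 1
    fi
    # Every non-empty data row must have exactly three comma-separated fields
    awk -F',' '
        NR > 1 && NF > 0 && NF != 3 {
            printf("ERROR: line %d has %d fields, expected 3\n", NR, NF) > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$index_file"
}
```

For example, `check_index index.csv && echo "index.csv looks OK"` prints the message only when both checks pass. The sketch does not verify that the FASTQ.gz files or the reference FASTA actually exist.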
Create an index.csv file using the following 'run_create_index.sh' script. The second argument is the folder that holds the sample's FASTQ.gz files. Usage: bash run_create_index.sh SampleID /path/to/ONT/fastq_dir/
Code Block |
---|
#!/bin/bash
## eResearch,QUT,3 November 2022
## Script: Generates an index.csv file for input to the ONTprocessing nextflow workflow
#Usage: bash run_create_index.sh SampleID /path/to/ONT/fastq_dir
##################################################################
SAMPLEID=$1
# Strip any trailing slash so the glob below is built the same way either way
READSDIR=${2%/}
# Reference genome; edit this path to match the sample being processed
REFERENCE='/mnt/work/phylo/OxfordNanopore/ET300_MT921572_reference_sequence.fasta'
##################################################################
#Create index.csv file (overwrites any existing index.csv in the current folder)
echo 'sampleid,sample_files,reference' > index.csv
# The glob stays literal inside double quotes; the workflow expands it later
echo "${SAMPLEID},${READSDIR}/*.fastq.gz,${REFERENCE}" >> index.csv |
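The script above writes a single-sample index.csv and overwrites the file on every run. For a multi-sample index, one option is to append rows instead. The following is a rough sketch, not part of the pipeline: add_sample is a hypothetical helper that takes the index file, sample ID, reads folder, and reference path as arguments.

```shell
#!/bin/bash
# Hypothetical helper (not part of the pipeline): append one sample row to an
# index file, writing the header first if the file is missing or empty.
add_sample() {
    local index_file="$1"
    local sampleid="$2"
    local readsdir="${3%/}"   # drop a trailing slash, if present
    local reference="$4"
    if [ ! -s "$index_file" ]; then
        echo 'sampleid,sample_files,reference' > "$index_file"
    fi
    # The glob is kept literal here; it is not expanded by the shell
    echo "${sampleid},${readsdir}/*.fastq.gz,${reference}" >> "$index_file"
}
```

Calling `add_sample index.csv ET300 /mnt/work/phylo/OxfordNanopore/ET300_barcode95 /mnt/work/phylo/OxfordNanopore/ET300_MT921572_reference_sequence.fasta` and then repeating the call for NC483 with its own folder and reference would produce a two-sample index in the format shown earlier. Unlike the script, this sketch lets each sample use a different reference.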
Running the ONTprocessing nextflow pipeline:
Prepare a PBS Pro submission script to run the ONTprocessing pipeline. An example launch.pbs script is the following:
Code Block |
---|
#!/bin/bash -l
#PBS -N ontprocessing
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
#Use the current directory to run the workflow
cd "$PBS_O_WORKDIR"
module load java
#Export the JVM memory settings so that nextflow picks them up
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow run eresearchqut/ontprocessing --samplesheet index.csv |
Create a folder where your analyses will be run, and place a copy of both launch.pbs and index.csv in that folder. Then submit the job to the HPC cluster:
...