nf-eresearch/ConsGenome: a Nextflow-based workflow for genome assembly, variant calling and consensus genome generation

Goal

Enable the study of new strains of dengue virus by producing de novo assembled genomic scaffolds, comparing them to reference genomes, calling variants and generating a reference-guided consensus genome.

Pre-requisites

Nextflow (see the NextFlow quick start)
Basic Unix command-line knowledge (for example, Learning Resources: The Linux Command Line; The Unix Shell: Summary and Setup)
Familiarity with one of the Unix text editors (for example, Vi/Vim or Nano):
- Vim (https://bioinformatics.uconn.edu/vim-guide/ ; https://missing.csail.mit.edu/2020/editors/)
- Nano (Basic tutorial for Nano users ; https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/)

ConsGenome workflow

Installing

The Nextflow-based ConsGenome workflow is available on the HPC. To download a copy of the workflow to your home folder, run the following commands on the HPC:

module load java
nextflow pull file:///work/pipelines/eresearch/consgenome.git

Running the pipeline

Step 1:

Create a folder to hold the output of the pipeline. The results folder will be created here.
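
As a minimal sketch of this step, the helper below creates a run folder and moves into it. The function name make_run_dir and the folder name in the usage comment are placeholders, not part of the pipeline.

```shell
#!/bin/bash
# Sketch: create a run folder (including any missing parents) and enter it.
# make_run_dir is a hypothetical helper, not something the pipeline provides.
make_run_dir() {
    local run_dir="$1"
    # -p: do not fail if the folder already exists
    mkdir -p "$run_dir" && cd "$run_dir"
}

# Usage (placeholder path):
#   make_run_dir "$HOME/consgenome_run"
```

Running the later steps from inside this folder means the results folder is created there.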

Step 2:

Prepare the index.csv file. This file lists each sample ID together with the full paths to its paired read files. Read 1 and Read 2 must be given in separate columns.

Example:

sampleid,read1,read2
sample01,/full/path/to/sample01_r1.fq.gz,/full/path/to/sample01_r2.fq.gz
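
For many samples, writing index.csv by hand is error-prone. The sketch below builds it from a directory of reads, assuming the files follow the naming used in the example above (<sampleid>_r1.fq.gz and <sampleid>_r2.fq.gz). The function name make_index is hypothetical; adjust the suffixes if your files are named differently.

```shell
#!/bin/bash
# Sketch: print an index.csv for every complete read pair in a directory.
# Assumes files are named <sampleid>_r1.fq.gz and <sampleid>_r2.fq.gz.
make_index() {
    local reads_dir r1 r2 sample
    # Resolve the directory to a full path, as index.csv requires.
    reads_dir="$(cd "$1" && pwd)" || return 1
    echo "sampleid,read1,read2"
    for r1 in "$reads_dir"/*_r1.fq.gz; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$r1" ] || continue
        r2="${r1%_r1.fq.gz}_r2.fq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no read 2 file for $r1, skipping" >&2
            continue
        fi
        sample="$(basename "$r1" _r1.fq.gz)"
        echo "$sample,$r1,$r2"
    done
}

# Usage (placeholder path):
#   make_index /full/path/to/reads > index.csv
```

Samples with a missing Read 2 file are reported on standard error and left out of the index.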

Step 3:

Create a nextflow.config file if you need to override any of the pipeline defaults, and put the overrides in this file. For example, to set the reference genome:

params {
    genome = "/path/to/genome/reference.fa"
}
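
A mistyped reference path may only surface once the job is running. As a rough sketch, the hypothetical helper below writes the same params block, but only after confirming that the reference file exists. The function name write_config and its arguments are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: write a nextflow.config that overrides params.genome,
# refusing to do so if the reference file cannot be found.
# write_config is a hypothetical helper, not part of the pipeline.
write_config() {
    local genome="$1"
    local out="${2:-nextflow.config}"
    if [ ! -f "$genome" ]; then
        echo "reference not found: $genome" >&2
        return 1
    fi
    cat > "$out" <<EOF
params {
    genome = "$genome"
}
EOF
}

# Usage (placeholder path):
#   write_config /path/to/genome/reference.fa
```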

Step 4:

Create a PBS launch script by copying the following into launch.pbs:

#!/bin/bash -l
#PBS -N jobname
#PBS -l walltime=48:00:00
#PBS -l select=1:ncpus=1:mem=5gb
#PBS -m abe
cd $PBS_O_WORKDIR
# export so that the nextflow process inherits the JVM memory settings
export NXF_OPTS='-Xms1g -Xmx4g'
module load java
singularity remote use QUT
nextflow run consgenome -resume
singularity remote use SylabsCloud
chmod -R g+rwX results/
chmod -R g+rwX work/

If you wish to use the conda version of the pipeline, use this script instead:

#!/bin/bash -l
#PBS -N jobname
#PBS -l walltime=48:00:00
#PBS -l select=1:ncpus=1:mem=5gb
#PBS -m abe
cd $PBS_O_WORKDIR
# export so that the nextflow process inherits the JVM memory settings
export NXF_OPTS='-Xms1g -Xmx4g'
module load java
nextflow run consgenome -profile conda -resume
chmod -R g+rwX results/
chmod -R g+rwX work/
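
Before submitting the job, it can help to confirm that every read file listed in index.csv actually exists, since a wrong path would otherwise only show up after the job has waited in the queue. The following is a minimal sketch. The function name check_index is hypothetical, and it assumes the three-column layout shown in Step 2 with no commas inside paths.

```shell
#!/bin/bash
# Sketch: verify that every read1/read2 path in index.csv points to a file.
# Returns 0 if all files exist, non-zero otherwise.
check_index() {
    local index_file="${1:-index.csv}"
    local missing=0 _header sampleid read1 read2 f
    {
        read -r _header    # skip the header row
        # The "|| [ -n ... ]" part handles a last line without a newline.
        while IFS=, read -r sampleid read1 read2 || [ -n "$sampleid" ]; do
            [ -n "$sampleid" ] || continue    # ignore blank lines
            for f in "$read1" "$read2"; do
                if [ ! -f "$f" ]; then
                    echo "missing file for $sampleid: $f" >&2
                    missing=$((missing + 1))
                fi
            done
        done
    } < "$index_file"
    [ "$missing" -eq 0 ]
}

# Usage, from the folder that holds index.csv and launch.pbs:
#   check_index index.csv && qsub launch.pbs
```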

Step 5:

Launch the pipeline

qsub launch.pbs

Check the progress of the job:

qjobs

# or
qstat -u $USER

Step 6:

Monitor the run on Nextflow Tower

If you have previously enabled Nextflow Tower, visit:

https://nftower.qut.edu.au

To review your Tower settings, open your user-level Nextflow config file:

vi ~/.nextflow/config

# or, equivalently
vi $HOME/.nextflow/config

Step 7:

Once finished, examine the results in the results folder.