Aims:

  • Implement an end-to-end bioinformatics workflow that is reproducible, robust, scalable and compute infrastructure agnostic

  • Leverage the host plant antiviral response pathway to increase the sensitivity and specificity of pathogen detection

  • Prevent or minimise the reporting of cross-sample contaminations owing to index hopping events (false positive detections)

Pre-requisites

Install nextflow: Nextflow

Method

nf nextflow quick start

Database

Custom virus database. Please do not distribute it to third parties. Location:

Code Block
/work/img/databases/

Creating a local blast database

Code Block
makeblastdb -in test.fasta -parse_seqids -dbtype nucl
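
With -parse_seqids, makeblastdb stops with an error when two sequences share the same identifier, so it can help to check the FASTA file first. The following is a minimal sketch, not part of the original workflow; the function name find_duplicate_ids is hypothetical, and it assumes the identifier is the text between '>' and the first whitespace.

```shell
#!/bin/bash
# List sequence identifiers that occur more than once in a FASTA file.
# Assumption: the identifier is the text between '>' and the first whitespace.
find_duplicate_ids() {
    local fasta="$1"
    grep '^>' "$fasta" | awk '{ sub(/^>/, "", $1); print $1 }' | sort | uniq -d
}

# Example (test.fasta is the file name used in the makeblastdb command above):
#   find_duplicate_ids test.fasta
```

If the function prints nothing, no duplicated identifiers were found.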

Method

We will use two nextflow pipelines to process the Virome data. First, we run trimgalore to filter out poor-quality reads/bases and remove adapter sequences. Then we run VirReport to assess the presence of viruses and viroids.

1) Quality Control of Raw Files

First generate an ‘index.csv’ file that contains the Sample ID and path to the raw data file:

Code Block
sampleId,read1
CB,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/CB_H52LJDRX2_TCATGCGT_L001_R1.fastq.gz
CM,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/CM_H52LJDRX2_CTGCATCA_L001_R1.fastq.gz
CP,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/CP_H52LJDRX2_TCAGACTT_L001_R1.fastq.gz
TB1,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/TB1_H52LJDRX2_TCACTACG_L001_R1.fastq.gz
TBG,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/TBG_H52LJDRX2_CTTCACGA_L001_R1.fastq.gz
TM,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/TM_H52LJDRX2_CGTTCTGC_L001_R1.fastq.gz
TP,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/TP_H52LJDRX2_AAGTTATC_L001_R1.fastq.gz
TPS,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/TPS_H52LJDRX2_CTTCTTAA_L001_R1.fastq.gz
TR1,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/TR1_H52LJDRX2_TCAGTGAG_L001_R1.fastq.gz
TR2,/work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2/TR2_H52LJDRX2_TGACCGCG_L001_R1.fastq.gz
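
For many samples, the index can be generated rather than typed by hand. The sketch below is one possible way to do this; the function name make_index is hypothetical, and it assumes that the sample ID is the part of the file name before the first underscore, as in the listing above.

```shell
#!/bin/bash
# Write a 'sampleId,read1' index for every *_R1.fastq.gz file in a directory.
# Assumption: the sample ID is the part of the file name before the first '_'.
make_index() {
    local raw_dir="$1"
    local f name
    echo "sampleId,read1"
    for f in "$raw_dir"/*_R1.fastq.gz; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        name=$(basename "$f")
        echo "${name%%_*},$f"
    done
}

# Example:
#   make_index /work/img/raw_data/AGRF_CAGRF22029755_H52LJDRX2 > index.csv
```

Check the generated file before use, since file names that do not follow this pattern will produce wrong sample IDs.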

Create a PBS Pro submission script:

Code Block
#!/bin/bash -l
#PBS -N nftrimgalore 
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=1:mem=5gb

cd $PBS_O_WORKDIR
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

# run nextflow pipeline
nextflow run trimgalore --indexfile index.csv --singleEnd --trim_qual 30

Submit the job to the HPC scheduler:

Code Block
qsub launch.pbs

Check progress of the job:

Code Block
qjobs
Code Block
qstat -u USERNAME

2) Diagnosis of plant viruses and viroids

Installing VirReport

The open-source VirReport code is available at https://github.com/eresearchqut/VirReport

1. Fetch a copy of VirReport

At the HPC, run the following command to get a copy of the source code:

Code Block
git clone https://github.com/eresearchqut/VirReport.git

Alternatively, run the following command to fetch and also test VirReport:

Code Block
nextflow run eresearchqut/VirReport -profile singularity --indexfile index_example.csv

Note: the above command will store a cached copy of VirReport at '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'

Running VirReport

  1. Sample index file

To run VirReport, create an 'index_samples.csv' file that specifies the sample ID, the path to the raw data, and the minimum and maximum length of the reads to be used for diagnosis. For example:

...

You can modify the above template with your own samples. Note that the files listed can be the trimgalore-processed files.

2. Run VirReport test

As an alternative to cloning VirReport (above), the 'nextflow run eresearchqut/VirReport' command given under "Installing VirReport" both downloads the tool and runs a test.

Example index file that points to the trimgalore-processed files:

Code Block
sampleid,samplepath,minlen,maxlen
CB,/work/img/test/trimgalore/results/Trim_Galore/CB_trimmed.fq.gz,21,22
CM,/work/img/test/trimgalore/results/Trim_Galore/CM_trimmed.fq.gz,21,22
CP,/work/img/test/trimgalore/results/Trim_Galore/CP_trimmed.fq.gz,21,22
TB1,/work/img/test/trimgalore/results/Trim_Galore/TB1_trimmed.fq.gz,21,22
TBG,/work/img/test/trimgalore/results/Trim_Galore/TBG_trimmed.fq.gz,21,22
TM,/work/img/test/trimgalore/results/Trim_Galore/TM_trimmed.fq.gz,21,22
TPS,/work/img/test/trimgalore/results/Trim_Galore/TPS_trimmed.fq.gz,21,22
TP,/work/img/test/trimgalore/results/Trim_Galore/TP_trimmed.fq.gz,21,22
TR1,/work/img/test/trimgalore/results/Trim_Galore/TR1_trimmed.fq.gz,21,22
TR2,/work/img/test/trimgalore/results/Trim_Galore/TR2_trimmed.fq.gz,21,22
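
A wrong path or a swapped length range in the index is easier to catch before the job is queued. The following sketch checks an index with the four columns shown above; the function name check_index is hypothetical and the checks are only a suggestion, not something VirReport itself requires.

```shell
#!/bin/bash
# Check a 'sampleid,samplepath,minlen,maxlen' index before submitting a job.
# Prints one message per problem and returns non-zero if any were found.
check_index() {
    local index="$1" errors=0
    local header sampleid samplepath minlen maxlen
    {
        read -r header   # skip the header line
        while IFS=, read -r sampleid samplepath minlen maxlen || [ -n "$sampleid" ]; do
            [ -n "$sampleid" ] || continue   # ignore blank lines
            if [ ! -r "$samplepath" ]; then
                echo "$sampleid: cannot read $samplepath"
                errors=$((errors + 1))
            fi
            # A non-numeric value also fails this test and is reported.
            if ! [ "$minlen" -le "$maxlen" ] 2>/dev/null; then
                echo "$sampleid: minlen ($minlen) must be a number no larger than maxlen ($maxlen)"
                errors=$((errors + 1))
            fi
        done
    } < "$index"
    [ "$errors" -eq 0 ]
}

# Example:
#   check_index index_samples.csv && qsub launch.pbs
```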

3. Run VirReport using a PBS Pro script

Define the nextflow configuration if it differs from the provided template:

Code Block
includeConfig 'conf/base.config'

params {
  outdir = 'results'
  indexfile = 'index.csv'
  blast_db_dir = '/lustre/work-lustre/hia_mt18005/blastDB/30112021'
  blast_local_db_path = '/work/img/databases/PVirDB/PVirDB_ver20211109.fasta'
  targets = false
  targets_file = 'Targetted_Viruses_Viroids.txt'
  help = false
  cap3_len = '20'
  orf_minsize = '150'
  orf_circ_minsize = '150'
  blastn_evalue = '0.0001'
  blastp_evalue = '0.0001'
  blastn_method = 'megablast'
  blastp = false
  spades = false
  spadeskmer = '9 11 13 15 17 19 21'
  blastlocaldb = false
  ictvinfo = 'ICTV_taxonomy_MinIdentity_Species.tsv'
  contamination_detection = false
  contamination_flag = '0.01'
  contamination_detection_method = 'FPKM'
}

process.container = "ghcr.io/eresearchqut/virreport:v1.0.0"

manifest {
  name          = "eresearchqut/VirReport"
  author        = "Roberto Barrero, Maely Gauthier, Desmond Schmidt, Craig Windell"
  defaultBranch = "main"
  description   = "VirReport is designed to help phytosanitary diagnostics of viruses and viroid pathogens in quarantine facilities. It takes small RNA-Seq samples as input."
  version       = "v1.0.0"
}
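
Rather than editing the full template, a small override file can be passed to nextflow with the -c option. The sketch below writes such a file; the file name custom.config and the function name write_custom_config are hypothetical, and it assumes that setting blastlocaldb to true is what selects the local database given in blast_local_db_path.

```shell
#!/bin/bash
# Write a small nextflow config that overrides selected VirReport parameters.
# Assumption: the values below are illustrative; parameter names are taken
# from the configuration template above.
write_custom_config() {
    local out="$1"
    cat > "$out" <<'EOF'
params {
  indexfile = 'index_samples.csv'
  blastlocaldb = true
  blast_local_db_path = '/work/img/databases/PVirDB/PVirDB_ver20211109.fasta'
}
EOF
}

# Example:
#   write_custom_config custom.config
#   nextflow run eresearchqut/VirReport -profile singularity -c custom.config
```

Only the parameters listed in the override file change; all others keep the values from the pipeline's own configuration.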

Prepare a PBS Pro submission script:

Code Block
#!/bin/bash -l
#PBS -N nfvirreport
#PBS -l walltime=24:00:00
#PBS -l select=1:ncpus=1:mem=5gb

cd $PBS_O_WORKDIR
export NXF_OPTS='-Xms1g -Xmx4g'
module load java

# run nextflow pipeline
nextflow run eresearchqut/VirReport -profile singularity --indexfile index_example.csv