...

To derive a summary of detections for all the samples included in the index file, specify the --contamination_detection_viral_db or the --contamination_detection_ncbi option. This will create a summary text file under the Summary tab with a column called contamination_flag.

...

The contamination flag rests on the following assumption: if a pest is present at high titer in a given sample and reads matching this pathogen are detected in other samples at a significantly lower abundance, there is a risk that this lower signal is due to contamination (e.g. index hopping from the high-titer sample). We first calculate the maximum FPKM value recorded for each virus and viroid identified on a run. If, for a given virus, the FPKM value reported for a sample is less than a set percentage of this maximum FPKM value, the detection is flagged as a contamination event. The default threshold is 0.1%. The flag is only indicative: the method cannot discriminate between false positives and viruses present at very low titer in a plant. It is therefore recommended to compare the sequences obtained, check the SNPs and validate through an independent method.
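The flagging rule described above can be sketched in shell with awk. This is only an illustration and not the pipeline's own code: the function name flag_contamination, the three-column tab-separated input (sample, virus, FPKM, with a header line) and the TRUE/FALSE values are assumptions made for the example. Only the 0.1% default threshold and the contamination_flag column name come from the text.

```shell
flag_contamination() {
  # $1: tab-separated file with a header line and columns: sample, virus, FPKM
  # $2: threshold as a fraction of the per-virus maximum FPKM (default 0.001 = 0.1%)
  awk -F'\t' -v thr="${2:-0.001}" '
    # First pass over the file: record the maximum FPKM seen for each virus
    NR == FNR { if (FNR > 1 && $3 + 0 > max[$2]) max[$2] = $3 + 0; next }
    # Second pass: extend the header, then flag each detection
    FNR == 1 { print $0 "\tcontamination_flag"; next }
    {
      flag = ($3 + 0 < thr * max[$2]) ? "TRUE" : "FALSE"
      print $0 "\t" flag
    }
  ' "$1" "$1"
}
```

The file is read twice because the per-virus maximum must be known before any single detection can be compared against it. A call such as `flag_contamination detections.tsv 0.001` (detections.tsv being a hypothetical input) prints the input rows with the extra column appended.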

Running in diagnostic mode (internal use by the SSG team only)

To run VirReport in diagnostic mode, specify the --diagno option. The pipeline will then also add an evidence category (i.e. KNOWN, KNOWN_FRAGMENT or PUTATIVE_NOVEL) to each detection, based on av-pident and % bases 10X.

If you are running homology searches against the NCBI NT database, you will also need to provide a list of pests of interest in the Targetted_Virus_Viroid.txt file located in the bin folder. Detections that match this pest list will be categorised as Quarantinable (as opposed to Higher_plant_viruses) in the final summary.

Finally, if sample information is provided (--sampleinfo --sampleinfo_path /path/to/sampleinfo.txt), this will be added to the final summary.

Sampleinfo.txt file example:

Code Block
Sample	PEQ_index_number	LIMS_ID_RAMACIOTTI	Host_species	Host_common_name	Plant_tissue_collected
MT498	P30	ELL110002A1	Allium sativum	Garlic	50 mg leaf
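Because the sample information file is tab-delimited, a stray space or a missing field is easy to overlook. A minimal sketch of a consistency check follows, assuming only that every row should have as many tab-separated fields as the header line. The function name check_sampleinfo is hypothetical and not part of VirReport.

```shell
check_sampleinfo() {
  # $1: path to the tab-delimited sample information file
  awk -F'\t' '
    # Remember how many fields the header line has
    NR == 1 { n = NF; next }
    # Report any row whose field count differs from the header
    NF != n { printf "line %d: expected %d fields, found %d\n", NR, n, NF; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

The function returns a non-zero status when at least one row is inconsistent, so it can be used as a guard in front of the nextflow command, for example `check_sampleinfo /path/to/sampleinfo.txt || exit 1`.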
Running VirusDetect

VirusDetect version 1.8 can also be run in parallel.

See http://virusdetect.feilab.net/cgi-bin/virusdetect/index.cgi for details about this separate pipeline.

Example of PBS script to run on an HPC with torque batch system

Make sure to either specify the full path to your index.csv file in the PBS script or place a copy of the index.csv file in the folder you will run the PBS script in.

Example 1:

The PBS script example below (VirReport_nextflow.sh) will run on raw fastq files that need to be merged and then quality filtered.

An RNA source profile will also be derived for each sample during the quality filtering step.

Blastn (using the megablast algorithm) and tblastn homology searches will be run against the PVirDB.

Finally, the reads will be de-duplicated after mapping.

Code Block
#!/bin/bash -l
#PBS -N VirReport
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=05:00:00


cd "$PBS_O_WORKDIR"
module load java
# Export the JVM options so that they are visible to the nextflow process
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run eresearchqut/VirReport -profile singularity -resume --indexfile index.csv \
                                    --merge_lane --qualityfilter --rna_source_profile \
                                    --bowtie_db_dir /path_to_bowtie_indices \
                                    --dedup \
                                    --virreport_viral_db --blast_viral_db_path /path_to_local_viral_database --contamination_detection_viral_db

Example 2:

In the PBS job below, blastn (using the megablast algorithm) and blastx homology searches will be run against NCBI NT and NR respectively.

Code Block
#!/bin/bash -l
#PBS -N VirReport
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=05:00:00


cd "$PBS_O_WORKDIR"
module load java
# Export the JVM options so that they are visible to the nextflow process
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run eresearchqut/VirReport -profile singularity -resume --indexfile index.csv \
                                    --merge_lane --qualityfilter --rna_source_profile \
                                    --bowtie_db_dir /path_to_bowtie_indices \
                                    --dedup \
                                    --virreport_ncbi --blast_viral_db_path /path_to_ncbi_databases --contamination_detection

Example 3:

In the PBS job below, homology searches will be run against NCBI and the PVirDB. The pipeline will also run VirusDetect in parallel.

Code Block
#!/bin/bash -l
#PBS -N VirReport
#PBS -l select=1:ncpus=2:mem=8gb
#PBS -l walltime=05:00:00


cd "$PBS_O_WORKDIR"
module load java
# Export the JVM options so that they are visible to the nextflow process
export NXF_OPTS='-Xms1g -Xmx4g'

nextflow run eresearchqut/VirReport -profile singularity -resume --indexfile index.csv \
                                    --merge_lane --qualityfilter --rna_source_profile \
                                    --bowtie_db_dir /path_to_bowtie_indices \
                                    --dedup \
                                    --virreport_ncbi --blast_viral_db_path /path_to_ncbi_databases --contamination_detection \
                                    --virreport_viral_db --blast_viral_db_path /path_to_local_viral_database --contamination_detection_viral_db \
                                    --virusdetect --virusdetect_db_path /path_to_virusdetect_database

Submit your job using the qsub command:

...

Alternatively, use the following command to check on the jobs you are running:

qjobs

You can also check the .nextflow.log file for details on progress.
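As a sketch of how the log can be inspected from the folder the job was launched in: nf_log_errors is a hypothetical helper, and the assumption that problems appear on lines containing the level ERROR or WARN is based on Nextflow's usual log layout rather than on anything specific to VirReport.

```shell
# Follow the log live while the job runs (Ctrl-C to stop):
#   tail -f .nextflow.log

nf_log_errors() {
  # $1: path to the log file (defaults to .nextflow.log in the current folder)
  log="${1:-.nextflow.log}"
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
  # Print matching lines prefixed with their line number;
  # grep returns 1 when no ERROR or WARN line is present
  grep -n -E ' (ERROR|WARN) ' "$log"
}
```

Running `nf_log_errors` without an argument checks the .nextflow.log file in the current folder.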

...

◦ Under the QC_report folder, a read size distribution PDF file and a read RNA source PDF file are created. The folder also includes a run_qc_report text file.

...

01_VirReport folder content:

For each sample:

  • assembly: results associated with de novo assembly

  • blastn: megablast results (NCBI NT or viral database PVirDB)

  • blastx: blastx results against NR

  • tblastn: tblastn results against viral database PVirDB

  • alignments: alignment against top reference hit and associated statistic derivation

  • Summary

...

Definitions of terms used in summary report: 

  • sacc  Accession number of best homology match recovered

  • av-pident  Average per cent identity of all de novo assembled contigs to the same top reference hit

  • Mean read depth  The mean coverage in bases to the genome/sequence of the best homology match

  • Dedup read count  Read counts after PCR duplicates sharing UMIs are collapsed

  • Dup %  Duplication rate detected using UMIs

  • FPKM  Fragments Per Kilobase of transcript per Million mapped reads, a normalised unit of transcript expression. It scales by transcript length to compensate for the fact that most RNA-seq protocols generate more sequencing reads from longer RNA molecules. The formula is: [deduplicated read count x 10^3 x 10^6]/[total quality filtered reads x genome length]

  • % bases 5X  The fraction of bases that attained at least 5X sequence coverage

  • % bases 10X  The fraction of bases that attained at least 10X sequence coverage

  • Contamination flag.
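As a worked example of the FPKM formula given in the definitions above, here is a minimal shell sketch. The function name fpkm and the input numbers are illustrative only. With 500 deduplicated reads, 2,000,000 quality filtered reads and a genome length of 10,000 bases, the formula gives (500 x 10^3 x 10^6)/(2,000,000 x 10,000) = 25.

```shell
fpkm() {
  # $1: deduplicated read count
  # $2: total quality filtered reads
  # $3: genome length in bases
  awk -v n="$1" -v total="$2" -v len="$3" 'BEGIN {
    # Guard against division by zero or negative inputs
    if (total <= 0 || len <= 0) { print "NA"; exit 1 }
    printf "%.2f\n", (n * 1e3 * 1e6) / (total * len)
  }'
}

fpkm 500 2000000 10000
```

The result is rounded to two decimal places for display only; the rounding is a choice made for this sketch.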