...

Install local NCBI BLAST databases (NT and NR)

Detailed information on how to download these databases is available at https://www.ncbi.nlm.nih.gov/books/NBK569850/

Make sure the taxdb.btd and the taxdb.bti files are also present in the directory.
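
A quick check along these lines can confirm that both taxonomy files are in place once the download has finished. This is only a sketch: the function name check_taxdb is hypothetical, and it assumes the files sit directly in the folder passed as the first argument.

```shell
# Hypothetical helper (not part of BLAST+): verify that the taxonomy files
# taxdb.btd and taxdb.bti exist and are non-empty in a database folder.
check_taxdb() {
    local db_dir="$1"
    local missing=0
    local f
    for f in taxdb.btd taxdb.bti; do
        if [ ! -s "$db_dir/$f" ]; then
            # Report each absent (or empty) file on stderr
            echo "missing: $db_dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 when both files are present, 1 otherwise
    return "$missing"
}
```

For example, check_taxdb blastDB/30112021 prints nothing and returns 0 when both files are there.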

Create a folder to store your NCBI databases and include the download date in its name. For instance:

Code Block
mkdir -p blastDB/30112021
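
If you prefer not to type the date by hand, the folder name can be derived from the current date. The sketch below assumes the example name follows a DDMMYYYY pattern (30112021 read as 30 November 2021); the function name make_dated_db_dir and the default base folder blastDB are illustrative only.

```shell
# Hypothetical helper: create <base>/<DDMMYYYY> and print the resulting path.
# The DDMMYYYY format is an assumption inferred from the example folder name.
make_dated_db_dir() {
    local base="${1:-blastDB}"
    local stamp
    stamp=$(date +%d%m%Y)
    # -p also creates the base folder if it does not exist yet
    mkdir -p "$base/$stamp" && printf '%s\n' "$base/$stamp"
}
```

Calling make_dated_db_dir without arguments creates the dated folder under blastDB in the current directory.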

Run the following PBS script in the newly created folder. Use the update_blastdb.pl script from the blast+ version you will use with your pipeline.

Code Block
#!/bin/bash -l
#PBS -N blastdb_download
#PBS -l walltime=24:00:00
#PBS -l mem=60gb
#PBS -l ncpus=2

cd $PBS_O_WORKDIR
perl update_blastdb.pl --decompress nt
perl update_blastdb.pl --decompress nr
perl update_blastdb.pl taxdb
tar -xzf taxdb.tar.gz

The VSD workflow

The VSD workflow will perform the following steps by default:

...

  • 01_read_size_selection (cutadapt log file and fastq file containing only the reads that match the size specified in the index.csv file) MT020_21-22nt_cutadapt.log & MT020_21-22nt.fastq

  • 02_velvet (velvet results and the fasta file containing the velvet-assembled contigs) MT020_velvet_assembly_21-22nt.fasta

  • 02a_spades (if spades is additionally run)

  • 03_cap3 (fasta file of the scaffolds produced by CAP3 as well as the singletons) MT020_velvet_cap3_21-22nt_rename.fasta

  • 04_blastn (all blastn results, results filtered to the top 5 hits matching viruses and viroids, and their taxonomy) MT020_velvet_21-22nt_megablast_vs_NT.bls, MT020_velvet_21-22nt_megablast_vs_NT_top5Hits.txt, MT020_velvet_21-22nt_megablast_vs_NT_top5Hits_virus_viroids_final.txt, MT020_velvet_21-22nt_megablast_vs_NT_top5Hits_virus_viroids_seq_ids_taxonomy.txt

  • 05_blastoutputs (BlastTools.jar summary output, which clusters all the contigs matching a specific hit) summary_MT029_velvet_21-22nt_megablast_vs_NT_top5Hits_virus_viroids_final.txt

  • 06_blastp (blastp outputs)

  • 07_filternstats (filtered blast summary with various coverage statistics for each virus and viroid hit, and associated consensus fasta file and vcf file) MT020_21-22nt_top_scoring_targets_with_cov_stats.txt, MT020_21-22nt_MK929590_Peach_latent_mosaic_viroid.consensus.fasta, MT020_21-22nt_MK929590_Peach_latent_mosaic_viroid_sequence_variants.vcf.gz

  • 08_report (summary of results for all samples included in the index.csv file, including a cross-contamination prediction) run_top_scoring_targets_with_cov_stats_with_cont_flag_21-22nt_0.01.txt
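
After a run, the output folders listed above can be checked with a short shell function. This is a rough sketch under two assumptions: that all step folders sit side by side in a single results folder, and that every folder except the optional 02a_spades is expected. The function name check_vsd_outputs is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: report which of the step folders named in the list above
# exist in a results folder. 02a_spades is skipped because it is optional.
check_vsd_outputs() {
    local results_dir="$1"
    local missing=0
    local step
    for step in 01_read_size_selection 02_velvet 03_cap3 04_blastn \
                05_blastoutputs 06_blastp 07_filternstats 08_report; do
        if [ -d "$results_dir/$step" ]; then
            echo "found    $step"
        else
            echo "missing  $step"
            missing=1
        fi
    done
    # Exit status 0 only when every listed folder was found
    return "$missing"
}
```

Run it as check_vsd_outputs followed by the path to your results folder; adjust the folder list if your run skips some steps.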

To do:

  • Include a deduplication step for fastq files that have UMIs incorporated

  • Make QC filtering optional

  • Work on final report

  • Add coverage statistics to local db blast results

  • Incorporate VirusDetect in the pipeline