...
Install a local NCBI blast directory (NT and NR)
Detailed information on how to download these databases is available at https://www.ncbi.nlm.nih.gov/books/NBK569850/
Make sure the taxdb.btd and the taxdb.bti files are also present in the directory.
Create a folder in which to store your NCBI database, and include the download date in the folder name. For instance:
Code Block |
---|
mkdir -p blastDB/30112021 |
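If you prefer not to type the date by hand, the folder name can be built from the current date. This is a minimal sketch, assuming the DDMMYYYY pattern of the example above (30112021) and a `blastDB` parent folder in the current directory; the variable names are illustrative only.

```shell
#!/bin/bash
# Build the folder name from today's date in DDMMYYYY form (e.g. 30112021).
download_date=$(date +%d%m%Y)
db_dir="blastDB/${download_date}"

# -p creates the blastDB parent folder too if it does not exist yet.
mkdir -p "$db_dir"
echo "Created $db_dir"
```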
Run the following PBS script in the newly created folder. Use the update_blastdb.pl script that ships with the BLAST+ version you will use with your pipeline.
Code Block |
---|
#!/bin/bash -l
#PBS -N blastdb_download
#PBS -l walltime=24:00:00
#PBS -l mem=60gb
#PBS -l ncpus=2
cd $PBS_O_WORKDIR
perl update_blastdb.pl --decompress nt [*]
perl update_blastdb.pl --decompress nr [*]
perl update_blastdb.pl taxdb
tar -xzf taxdb.tar.gz |
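Once the job has finished, it is worth confirming that the two taxonomy files mentioned above (taxdb.btd and taxdb.bti) are actually in the database folder. The following is a minimal sketch of such a check, not part of the pipeline itself; the function name `check_taxdb` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if both taxonomy files exist
# and are non-empty in the given database folder.
check_taxdb() {
    local db_dir="$1"
    local missing=0
    local f
    for f in taxdb.btd taxdb.bti; do
        if [ ! -s "${db_dir}/${f}" ]; then
            echo "missing or empty: ${db_dir}/${f}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the folder the script is run from.
if check_taxdb "."; then
    echo "taxdb files found"
else
    echo "taxdb files not found; re-run the taxdb download and extraction"
fi
```

Run it from inside the dated database folder, or pass another folder to `check_taxdb`.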
The VSD workflow
The VSD workflow will perform the following steps by default:
...
01_read_size_selection (cutadapt log file and fastq file containing only the reads that match the size specified in the index.csv file) MT020_21-22nt_cutadapt.log, MT020_21-22nt.fastq
02_velvet (velvet results and the fasta file containing the velvet-assembled contigs) MT020_velvet_assembly_21-22nt.fasta
02a_spades (if spades is additionally run)
03_cap3 (fasta file of the scaffolds produced by CAP3, together with the singletons) MT020_velvet_cap3_21-22nt_rename.fasta
04_blastn (all blastn results, the results filtered to the top 5 hits matching viruses and viroids, and the taxonomy of those hits) MT020_velvet_21-22nt_megablast_vs_NT.bls, MT020_velvet_21-22nt_megablast_vs_NT_top5Hits.txt, MT020_velvet_21-22nt_megablast_vs_NT_top5Hits_virus_viroids_final.txt, MT020_velvet_21-22nt_megablast_vs_NT_top5Hits_virus_viroids_seq_ids_taxonomy.txt
05_blastoutputs (BlastTools.jar summary output which clusters all the contigs matching to a specific hit) summary_MT029_velvet_21-22nt_megablast_vs_NT_top5Hits_virus_viroids_final.txt
06_blastp (blastp outputs)
07_filternstats (filtered blast summary with various coverage statistics for each virus and viroid hit, and associated consensus fasta file and vcf file) MT020_21-22nt_top_scoring_targets_with_cov_stats.txt, MT020_21-22nt_MK929590_Peach_latent_mosaic_viroid.consensus.fasta, MT020_21-22nt_MK929590_Peach_latent_mosaic_viroid_sequence_variants.vcf.gz
08_report (summary of results for all samples included in the index.csv file. This includes a cross-contamination prediction) run_top_scoring_targets_with_cov_stats_with_cont_flag_21-22nt_0.01.txt.
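To see at a glance which steps have produced output, the numbered names above can be checked one by one. This is a minimal sketch, assuming the numbered names are sub-folders of a single results folder; the function name `list_step_folders` and the results path are hypothetical, and the optional 02a_spades folder is left out.

```shell
#!/bin/bash
# Hypothetical helper: report which numbered step folders exist
# under the given results folder.
list_step_folders() {
    local results_dir="$1"
    local step
    for step in 01_read_size_selection 02_velvet 03_cap3 04_blastn \
                05_blastoutputs 06_blastp 07_filternstats 08_report; do
        if [ -d "${results_dir}/${step}" ]; then
            echo "found   ${step}"
        else
            echo "missing ${step}"
        fi
    done
}

# Example: check the folder given as first argument, or the current folder.
list_step_folders "${1:-.}"
```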
To do:
Include a deduplication step for fastq files that have UMIs incorporated
Make QC filtering optional
Work on final report
Add coverage statistics to local db blast results
Incorporate VirusDetect in the pipeline