Aim:
Identify sequence variants using the outputs from the Nextflow ONTprocessing pipeline. The minimap2 alignments generated by that pipeline are processed with PEPPER (https://github.com/kishwarshafin/pepper) to identify highly reliable sequence variants (i.e., SNPs).
Preparing a samplesheet.csv file
The Nextflow ‘eresearch/deepvariant’ pipeline requires a sample metadata file with four comma-separated columns: sample ID, BAM alignment, BAI index, and reference genome. For example:
sampleid,sample_files,sample_files_index,reference
NC483,/ontprocessing/NC483/run1/results/samtools/NC483_aln.sorted.bam,/ontprocessing/NC483/run1/results/samtools/NC483_aln.sorted.bam.bai,/data/ref/NC483_NC001477_reference_sequence.fasta
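To make the column order concrete, the sketch below writes the example above to a file and reads the data row back, printing each field under its column name. The file name samplesheet_example.csv is hypothetical, and the parsing assumes no path contains a comma.

```shell
#!/bin/bash
# Sketch: write the example samplesheet, then print each column by name.
# Assumes no field contains a comma (plain comma-separated values).
cat > samplesheet_example.csv <<'EOF'
sampleid,sample_files,sample_files_index,reference
NC483,/ontprocessing/NC483/run1/results/samtools/NC483_aln.sorted.bam,/ontprocessing/NC483/run1/results/samtools/NC483_aln.sorted.bam.bai,/data/ref/NC483_NC001477_reference_sequence.fasta
EOF

# Skip the header line, then split each data row on commas.
tail -n +2 samplesheet_example.csv | while IFS=, read -r sampleid bam bai ref; do
  echo "sampleid:           $sampleid"
  echo "sample_files:       $bam"
  echo "sample_files_index: $bai"
  echo "reference:          $ref"
done
```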
Use the following script (saved here as run_create_deepvariant_index.sh) to generate the index file. Note: modify the sample ID (SAMPLEID) and reference sequence location (REF) as appropriate.
#!/bin/bash
#eResearch, QUT
#Usage: ./run_create_deepvariant_index.sh
########################################################################################
SAMPLEID='NC483'
#absolute paths to the BAM and BAI (assumes exactly one of each in ./results/samtools)
BAM=$(readlink -f ./results/samtools/*.bam)
BAI=$(readlink -f ./results/samtools/*.bam.bai)
REF='/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta'
########################################################################################
#header for index file
echo "sampleid,sample_files,sample_files_index,reference" > header
#create sample metadata (awk prints one row per line of 'header', i.e. exactly one row)
awk -v sampleid2="$SAMPLEID" -v bam2="$BAM" -v bai2="$BAI" -v ref2="$REF" '{print sampleid2 "," bam2 "," bai2 "," ref2}' header > index_deepvariant
#merge header and location of files
cat header index_deepvariant > index_deepvariant.csv
#remove intermediate files
rm header index_deepvariant
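The script assumes that ./results/samtools holds exactly one *.bam and one *.bam.bai. If a glob matches no file or several files, readlink returns an empty or multi-line value and the CSV row is malformed. A minimal defensive variant might check this first and write the file with printf, without intermediate files. This is a sketch only: the function name make_index and its argument order are hypothetical and not part of the original workflow.

```shell
#!/bin/bash
# Hypothetical defensive variant of run_create_deepvariant_index.sh.
# Usage: make_index SAMPLEID REF [OUTFILE]
# Run from the sample's 'ontprocessing' folder (the one containing 'results').
make_index() {
  local sampleid="$1" ref="$2" out="${3:-index_deepvariant.csv}"
  local bams bais
  # nullglob: an unmatched glob expands to nothing instead of the literal pattern.
  shopt -s nullglob
  bams=(./results/samtools/*.bam)
  bais=(./results/samtools/*.bam.bai)
  shopt -u nullglob
  if [ "${#bams[@]}" -ne 1 ] || [ "${#bais[@]}" -ne 1 ]; then
    echo "Expected exactly one .bam and one .bam.bai in ./results/samtools" >&2
    return 1
  fi
  # Write the header and the single data row directly; absolute BAM/BAI paths.
  {
    echo "sampleid,sample_files,sample_files_index,reference"
    printf '%s,%s,%s,%s\n' "$sampleid" "$(readlink -f "${bams[0]}")" \
      "$(readlink -f "${bais[0]}")" "$ref"
  } > "$out"
}

# Example call, using the values from the script above:
# make_index 'NC483' '/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta'
```

Collecting the matches into an array makes the "exactly one file" check explicit, so a stray second BAM causes an error instead of silently producing a broken index file.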
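Run the script from within the ‘ontprocessing’ folder of the sample of interest, i.e., the folder that contains the ‘results’ and ‘work’ folders, because the BAM and BAI paths are resolved relative to ./results/samtools. Once all the variables have been adjusted, make the script executable (chmod +x run_create_deepvariant_index.sh) and run the following command: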
./run_create_deepvariant_index.sh
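Check that index_deepvariant.csv has been generated correctly: it should contain the header line followed by one row with the sample ID, the absolute BAM and BAI paths, and the reference path.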
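One way to automate this check is sketched below. It confirms the header, requires at least one data row, checks that each row has exactly four fields, and checks that the BAM, BAI, and reference files exist. The helper name check_index is hypothetical, and it assumes the comma-separated layout shown above with no commas inside paths.

```shell
#!/bin/bash
# Hypothetical helper: sanity-check the generated index/samplesheet file.
# Usage: check_index [CSV]   (defaults to index_deepvariant.csv)
check_index() {
  local csv="${1:-index_deepvariant.csv}" status=0 rows
  local expected="sampleid,sample_files,sample_files_index,reference"
  if [ "$(head -n 1 "$csv")" != "$expected" ]; then
    echo "Unexpected header in $csv" >&2
    return 1
  fi
  # Count non-empty lines after the header.
  rows=$(tail -n +2 "$csv" | grep -c .)
  if [ "$rows" -lt 1 ]; then
    echo "No data rows in $csv" >&2
    return 1
  fi
  # 'extra' catches a fifth field; an empty 'ref' means fewer than four fields.
  # The '|| [ -n ... ]' also handles a last line without a trailing newline.
  while IFS=, read -r sampleid bam bai ref extra || [ -n "$sampleid" ]; do
    if [ -z "$ref" ] || [ -n "$extra" ]; then
      echo "Row for '$sampleid' does not have exactly four fields" >&2
      status=1
      continue
    fi
    for f in "$bam" "$bai" "$ref"; do
      [ -f "$f" ] || { echo "Missing file for $sampleid: $f" >&2; status=1; }
    done
  done < <(tail -n +2 "$csv")   # process substitution keeps 'status' in this shell
  return "$status"
}

# Example: check_index index_deepvariant.csv && echo "index OK"
```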