Aim:
Identify sequence variants from the outputs of the Nextflow ‘ONTprocessing’ pipeline. The minimap2 alignments it generates are processed with PEPPER (https://github.com/kishwarshafin/pepper) to identify highly reliable sequence variants (i.e., SNPs).
...
Preparing a samplesheet.csv file
The Nextflow ‘eresearchqut/deepvariant’ pipeline requires a sample metadata file (samplesheet) with four comma-separated columns: sample ID, BAM alignment, BAI index, and genome reference. For example:
Code Block |
---|
sampleid,sample_files,sample_files_index,reference
NC483,/ontprocessing/NC483/run1/results/samtools/NC483_aln.sorted.bam,/ontprocessing/NC483/run1/results/samtools/NC483_aln.sorted.bam.bai,/data/ref/NC483_NC001477_reference_sequence.fasta |
Use the following script (called run_create_deepvariant_index.sh) to generate the index file. Note: modify the sample ID and reference sequence location as appropriate.
Code Block |
---|
#!/bin/bash
#eResearch,QUT
#Usage: ./run_create_deepvariant_index.sh
########################################################################################
SAMPLEID='NC483'
BAM=`readlink -f ./results/samtools/*.bam`
BAI=`readlink -f ./results/samtools/*.bam.bai`
REF='/mnt/work/phylo/OxfordNanopore/NC483_NC001477_reference_sequence.fasta'
########################################################################################
#header for index file
echo "sampleid,sample_files,sample_files_index,reference" > header
#create sample metadata
awk -v sampleid2="$SAMPLEID" -v bam2="$BAM" -v bai2="$BAI" -v ref2="$REF" '{print sampleid2 "," bam2 "," bai2 "," ref2}' header > index_deepvariant
#merge header and location of files
cat header index_deepvariant > index_deepvariant.csv
#remove intermediate files
rm header index_deepvariant |
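The `*.bam` and `*.bam.bai` globs in the script above assume that `results/samtools` holds exactly one BAM file and one index. If several files match, `readlink -f` would return several paths and the CSV row would be malformed. A minimal sketch of a guard follows; `resolve_single` is a hypothetical helper name, not part of the pipeline.

```shell
#!/bin/bash
# Hypothetical helper (assumption, not part of the pipeline):
# print the absolute path of the single file in DIR whose name ends in SUFFIX,
# or fail if zero or several files match.
resolve_single() {
    local dir="$1" suffix="$2"
    local matches=()
    local f
    for f in "$dir"/*"$suffix"; do
        # An unmatched glob stays literal, so test that the file really exists
        [ -f "$f" ] && matches+=("$f")
    done
    if [ "${#matches[@]}" -ne 1 ]; then
        echo "ERROR: expected 1 file matching $dir/*$suffix, found ${#matches[@]}" >&2
        return 1
    fi
    readlink -f "${matches[0]}"
}
```

In the script, the BAM line could then read `BAM=$(resolve_single ./results/samtools .bam) || exit 1`, and likewise for the `.bam.bai` index.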
Run the above script from within the ‘ONTprocessing’ folder for the sample of interest, just outside the ‘results’ and ‘work’ folders. Once all the variables have been adjusted, run the following command:
Code Block |
---|
./run_create_deepvariant_index.sh |
Check that the index file has been properly generated.
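One way to make this check concrete is sketched below. It assumes the four-column layout shown in the example samplesheet; `check_samplesheet` is a hypothetical helper name, not a command shipped with the pipeline. It verifies the header line, the field count of each row, and that the BAM, BAI and reference paths exist.

```shell
#!/bin/bash
# Hypothetical helper (assumption, not part of the pipeline):
# sanity-check a samplesheet before launching the deepvariant run.
check_samplesheet() {
    local sheet="$1"
    local expected="sampleid,sample_files,sample_files_index,reference"
    local header status=0
    local sampleid bam bai ref extra f

    [ -f "$sheet" ] || { echo "ERROR: $sheet not found" >&2; return 1; }

    # The first line must be the expected header
    header=$(head -n 1 "$sheet")
    if [ "$header" != "$expected" ]; then
        echo "ERROR: unexpected header: $header" >&2
        return 1
    fi

    # Each data row needs exactly four fields; fields 2-4 must be existing files
    while IFS=, read -r sampleid bam bai ref extra || [ -n "$sampleid" ]; do
        # Skip blank lines
        [ -z "$sampleid" ] && [ -z "$bam" ] && continue
        if [ -n "$extra" ] || [ -z "$ref" ]; then
            echo "ERROR: row '$sampleid' does not have exactly 4 fields" >&2
            status=1
            continue
        fi
        for f in "$bam" "$bai" "$ref"; do
            if [ ! -f "$f" ]; then
                echo "ERROR: missing file: $f" >&2
                status=1
            fi
        done
    done < <(tail -n +2 "$sheet")

    return "$status"
}
```

For example, `check_samplesheet index_deepvariant.csv && echo "samplesheet OK"` prints the confirmation only when every check passes.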
Running the ‘deepvariant’ analysis
Create a folder for the deepvariant analysis and copy the ‘index_deepvariant.csv’ file to it.
Prepare the following PBS Pro script to run the ‘deepvariant’ analysis using the minimap2 BAM files produced by the ‘ONTprocessing’ pipeline.
Code Block |
---|
#!/bin/bash -l
#PBS -N deepvariant
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
#run the deepvariant pipeline
nextflow run eresearchqut/deepvariant --samplesheet index_deepvariant.csv -resume
#allow access to others in the group
chmod -R g+rwX results
chmod -R g+rwX work |
Create a folder where your analyses will be run, and place a copy of both launch.pbs (the PBS script above) and index_deepvariant.csv in it. Then submit the job to the HPC cluster:
Code Block |
---|
qsub launch.pbs |
Monitor progress of the job:
Code Block |
---|
qjobs |
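Once the job has started, Nextflow writes a `.nextflow.log` file in the folder it was launched from, which gives more detail than the queue status alone. The sketch below prints the tail of that log; `nf_log_tail` is a hypothetical helper name introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical helper (assumption, not part of the pipeline):
# print the last N lines (default 20) of the Nextflow log in a run folder.
nf_log_tail() {
    local run_dir="${1:-.}" n="${2:-20}"
    local log="$run_dir/.nextflow.log"
    if [ ! -f "$log" ]; then
        echo "No .nextflow.log in $run_dir yet (the job may still be queued)" >&2
        return 1
    fi
    tail -n "$n" "$log"
}

# On clusters where 'qjobs' is not available, the standard PBS Pro
# equivalent for listing your own jobs is:
#   qstat -u "$USER"
```

Run `nf_log_tail .` from the deepvariant analysis folder to see the most recent pipeline messages.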