Aims

  • Implement a bioinformatics pipeline for detecting EBV integration sites in the human genome

Generate ONT simulated data

squigulator is a tool for simulating nanopore raw signal data. It is still under development, so its interface and default parameters may change. Read more here: https://github.com/hasindu2008/squigulator

Install a precompiled copy:

VERSION=0.3.0
wget https://github.com/hasindu2008/squigulator/releases/download/v${VERSION}/squigulator-v${VERSION}-x86_64-linux-binaries.tar.gz
tar xf squigulator-v${VERSION}-x86_64-linux-binaries.tar.gz  && cd squigulator-v${VERSION}
./squigulator --help

Location: /work/GRC_collaborations/EBV/tools/squigulator-v0.3.0

PHASE 1: No mutations are introduced into the reference genomes prior to simulating the ONT data.

Genomes:

  • GRCh38.p14

  • EBV_ASM240226v1

  • GRCh38.p14+EBV_ASM240226v1 (EBV integrated into the human genome)
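
This page does not record how the integrated reference (custom_genome_one_GCF.fna) was built. Purely as an illustration, the sketch below shows one way a viral sequence could be spliced into a host contig with awk. The function name insert_virus, the contig name, and the insertion position are hypothetical and are not taken from the actual custom genome. It writes each sequence on a single line and has only been reasoned through on toy input, so treat it as a starting point rather than the method used here.

```bash
# Sketch: insert the sequence from a single-record FASTA (virus) into one
# contig of a multi-record FASTA (host), after 1-based position POS.
# All names and coordinates are illustrative assumptions.
insert_virus() {
  local host_fa=$1 contig=$2 pos=$3 virus_fa=$4
  awk -v contig="$contig" -v pos="$pos" '
    # Print the record collected so far, splicing in the virus if it is the target contig
    function emit() {
      if (name == "") return
      if (name == contig) seq = substr(seq, 1, pos) ins substr(seq, pos + 1)
      print header
      print seq
    }
    # First file (virus): collect sequence lines, skip the header
    NR == FNR { if ($0 !~ /^>/) ins = ins $0; next }
    # Second file (host): a header line starts a new record
    /^>/ { emit(); header = $0; name = substr($1, 2); seq = ""; next }
    { seq = seq $0 }
    END { emit() }
  ' "$virus_fa" "$host_fa"
}
```

For example, insert_virus host.fna NC_000001.11 1000000 virus.fna > custom.fna would place the viral sequence after base 1,000,000 of that contig (both values made up for the example).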

GRCh38.p14 ONT simulated data:

Location:

/work/GRC_collaborations/EBV/analysis/1.squigulator_simulated_data_NoVariants/run1_GRCh38.p14_human_genome

Squigulator simulation:

#!/bin/bash -l
#PBS -N squigulator_GRCh38.p14
#PBS -l walltime=48:00:00
#PBS -l mem=64gb
#PBS -l ncpus=32

#use the directory the job was submitted from; stop if it is unavailable
cd "$PBS_O_WORKDIR" || exit 1

#################################################
## user defined variables
#################################################
SQUIGULATOR='/work/GRC_collaborations/EBV/tools/squigulator-v0.3.0/squigulator'
SAMPLEID='GRCh38.p14'
GENOME='/work/GRC_collaborations/EBV/Genomes/GCF_000001405.40_GRCh38.p14_genomic.fna'
COVERAGE=30
PROFILE='dna-r10-prom'
#################################################


#STEP1: Create simulated reads at 30X genome coverage
#example code: 
#squigulator hg38noAlt.fa -x dna-r10-prom -o reads.blow5 -f 30
#we use the user defined variables above to modify the example code:

$SQUIGULATOR $GENOME -x $PROFILE \
	-o ${SAMPLEID}_ONT_${PROFILE}_reads.blow5 \
	-f $COVERAGE \
	-t 32 \
	-q ${SAMPLEID}_ONT_${PROFILE}_reads.fasta \
	-c ${SAMPLEID}_ONT_${PROFILE}_reads_aln.paf \
	-a ${SAMPLEID}_ONT_${PROFILE}_reads_aln.sam

EBV_ASM240226v1 ONT simulated data:

Location:

/work/GRC_collaborations/EBV/analysis/1.squigulator_simulated_data_NoVariants/run2_ASM240226v1_viral_genome

Squigulator simulation:

#!/bin/bash -l
#PBS -N squigulator_EBV
#PBS -l walltime=24:00:00
#PBS -l mem=32gb
#PBS -l ncpus=16

#use the directory the job was submitted from; stop if it is unavailable
cd "$PBS_O_WORKDIR" || exit 1

#################################################
## user defined variables
#################################################
SQUIGULATOR='/work/GRC_collaborations/EBV/tools/squigulator-v0.3.0/squigulator'
SAMPLEID='EBV_ASM240226v1'
GENOME='/work/GRC_collaborations/EBV/Genomes/GCF_002402265.1_ASM240226v1_genomic.fna'
COVERAGE=30
PROFILE='dna-r10-prom'
#################################################


#STEP1: Create simulated reads at 30X genome coverage
#example code: 
#squigulator hg38noAlt.fa -x dna-r10-prom -o reads.blow5 -f 30
#we use the user defined variables above to modify the example code:

$SQUIGULATOR $GENOME -x $PROFILE \
	-o ${SAMPLEID}_ONT_${PROFILE}_reads.blow5 \
	-f $COVERAGE \
	-t 16 \
	-q ${SAMPLEID}_ONT_${PROFILE}_reads.fasta \
	-c ${SAMPLEID}_ONT_${PROFILE}_reads_aln.paf \
	-a ${SAMPLEID}_ONT_${PROFILE}_reads_aln.sam
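
The per-genome scripts on this page differ only in the sample ID, the genome path, and the thread count. A minimal sketch of a shared helper is shown below; the function name run_squigulator and its argument order are assumptions made for this example, while the squigulator options are the ones already used in the scripts.

```bash
# Sketch: one function for all simulations; only the arguments change.
# SQUIGULATOR can be overridden from the environment (e.g. for a dry run).
SQUIGULATOR=${SQUIGULATOR:-/work/GRC_collaborations/EBV/tools/squigulator-v0.3.0/squigulator}

run_squigulator() {
  # usage: run_squigulator SAMPLEID GENOME THREADS [COVERAGE] [PROFILE]
  local sample=$1 genome=$2 threads=$3
  local coverage=${4:-30} profile=${5:-dna-r10-prom}
  local prefix="${sample}_ONT_${profile}_reads"

  "$SQUIGULATOR" "$genome" -x "$profile" \
    -o "${prefix}.blow5" \
    -f "$coverage" \
    -t "$threads" \
    -q "${prefix}.fasta" \
    -c "${prefix}_aln.paf" \
    -a "${prefix}_aln.sam"
}
```

Inside a PBS script this could be called as run_squigulator EBV_ASM240226v1 /work/GRC_collaborations/EBV/Genomes/GCF_002402265.1_ASM240226v1_genomic.fna 16. Setting SQUIGULATOR=echo prints the arguments instead of running the simulation, which is a cheap way to check the file names before submitting.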

GRCh38.p14+EBV_ASM240226v1 ONT simulated data:

Location:

/work/GRC_collaborations/EBV/analysis/1.squigulator_simulated_data_NoVariants/run3_custom_genome_human+virus

Squigulator simulation:

#!/bin/bash -l
#PBS -N squigulator_GRCh38.p14_EBV
#PBS -l walltime=48:00:00
#PBS -l mem=64gb
#PBS -l ncpus=32

#use the directory the job was submitted from; stop if it is unavailable
cd "$PBS_O_WORKDIR" || exit 1

#################################################
## user defined variables
#################################################
SQUIGULATOR='/work/GRC_collaborations/EBV/tools/squigulator-v0.3.0/squigulator'
SAMPLEID='GRCh38.p14_EBV_custom_genome'
GENOME='/work/GRC_collaborations/EBV/Genomes/custom_genome_one_GCF.fna'
COVERAGE=30
PROFILE='dna-r10-prom'
#################################################


#STEP1: Create simulated reads at 30X genome coverage
#example code: 
#squigulator hg38noAlt.fa -x dna-r10-prom -o reads.blow5 -f 30
#we use the user defined variables above to modify the example code:

$SQUIGULATOR $GENOME -x $PROFILE \
	-o ${SAMPLEID}_ONT_${PROFILE}_reads.blow5 \
	-f $COVERAGE \
	-t 32 \
	-q ${SAMPLEID}_ONT_${PROFILE}_reads.fasta \
	-c ${SAMPLEID}_ONT_${PROFILE}_reads_aln.paf \
	-a ${SAMPLEID}_ONT_${PROFILE}_reads_aln.sam
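
For the custom genome, the SAM file written by -a records where each simulated read came from, so it can act as ground truth when evaluating integration-site detection. The sketch below lists reads that span a given junction. It assumes the SAM has standard tab-separated RNAME, POS, and CIGAR fields (not verified against squigulator v0.3.0 output), and the function name reads_spanning, the contig, and the junction coordinate are placeholders for illustration.

```bash
# Sketch: print the names of reads that cover both base POS and base POS+1
# (1-based) on CONTIG, i.e. reads that cross the junction after POS.
reads_spanning() {
  local sam=$1 contig=$2 pos=$3
  awk -F'\t' -v contig="$contig" -v pos="$pos" '
    /^@/ { next }              # skip SAM header lines
    $3 != contig { next }      # keep only reads placed on the contig of interest
    {
      start = $4; cigar = $6; span = 0
      # Walk the CIGAR string; M, D, N, = and X consume reference bases
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) span += n
        cigar = substr(cigar, RLENGTH + 1)
      }
      end = start + span - 1
      if (start <= pos && end > pos) print $1
    }
  ' "$sam"
}
```

With a known insertion start, reads_spanning GRCh38.p14_EBV_custom_genome_ONT_dna-r10-prom_reads_aln.sam CONTIG POS would give the read IDs that a caller is expected to report as human-virus junction reads, where CONTIG and POS stand for the real integration site.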

Each run produces the following outputs:

  • BLOW5: simulated ONT raw signal data, generated with the dna-r10-prom profile at 30X genome coverage (*reads.blow5)

  • FASTA: the simulated reads as sequences, with no errors (*reads.fasta)

  • PAF: alignments of the simulated reads in PAF format (*reads_aln.paf)

  • SAM: alignments of the simulated reads in SAM format (*reads_aln.sam)
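
As a quick sanity check on a finished run, the realised coverage can be estimated as the total number of bases in the reads FASTA divided by the total number of bases in the reference. The sketch below does this with awk; the function names are made up for this example, and the result is only expected to be close to the requested -f value, not identical.

```bash
# Sketch: estimate fold coverage as (bases in reads) / (bases in genome).

# Total sequence length of a FASTA file (header lines are ignored)
fasta_bases() {
  awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Number of records (reads or contigs) in a FASTA file
fasta_records() {
  grep -c '^>' "$1"
}

# usage: estimate_coverage READS_FASTA GENOME_FASTA
estimate_coverage() {
  local reads=$1 genome=$2
  awk -v r="$(fasta_bases "$reads")" -v g="$(fasta_bases "$genome")" \
    'BEGIN { if (g == 0) exit 1; printf "%.1f\n", r / g }'
}
```

For the EBV run this would be estimate_coverage EBV_ASM240226v1_ONT_dna-r10-prom_reads.fasta /work/GRC_collaborations/EBV/Genomes/GCF_002402265.1_ASM240226v1_genomic.fna, and fasta_records on the same reads file gives the number of simulated reads.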

Converting BLOW5 to FAST5 data

First, let’s install ‘slow5tools’ from GitHub:

VERSION=v1.1.0
wget "https://github.com/hasindu2008/slow5tools/releases/download/$VERSION/slow5tools-$VERSION-x86_64-linux-binaries.tar.gz" && tar xvf slow5tools-$VERSION-x86_64-linux-binaries.tar.gz && cd slow5tools-$VERSION/
./slow5tools 

Optionally, copy the ‘slow5tools’ executable to a bin directory in your home (it must be on your PATH to be found):

mkdir -p "$HOME/bin"
cp slow5tools "$HOME/bin/"

Example data:

/work/GRC_collaborations/EBV/analysis/1.squigulator_simulated_data_NoVariants/coverage_5x/run2_ASM240226v1_viral_genome/EBV_ASM240226v1_ONT_dna-r10-prom_reads_5x.blow5

slow5tools parameters:

./slow5tools --help
Usage: ./slow5tools [OPTIONS] [COMMAND] [ARG]
Tools for using slow5 files.

OPTIONS:
    -h, --help       Display this message and exit.
    -v, --verbose    Verbosity level.
    -V, --version    Output version information and exit.
    --cite           Prints the citation.

COMMANDS:
    f2s or fast5toslow5   convert fast5 file(s) to SLOW5/BLOW5
    s2f or slow5tofast5   convert SLOW5/BLOW5 file(s) to fast5
    merge                 merge SLOW5/BLOW5 files
    split                 split SLOW5/BLOW5 files
    index                 create a SLOW5/BLOW5 index file
    get                   display the read entry for each specified read id
    view                  view the contents of a SLOW5/BLOW5 file or convert between different SLOW5/BLOW5 formats and compressions
    stats                 prints statistics of a SLOW5/BLOW5 file to the stdout
    cat                   quickly concatenate SLOW5/BLOW5 files of same type (same header, extension, compression)]
    quickcheck            quickly checks if a SLOW5/BLOW5 file is intact
    skim                  skims through requested components in a SLOW5/BLOW5 file

ARGS:    Try './slow5tools [COMMAND] --help' for more information.
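
Based on the command list above, converting a BLOW5 file to FAST5 could look like the sketch below. The quickcheck and s2f commands appear in the help output; the -o option for a single output file is taken from the slow5tools usage text and should be confirmed with ./slow5tools s2f --help for the installed version. The function name blow5_to_fast5, the SLOW5TOOLS variable, and the output file name are assumptions for this example, and the conversion has not been verified here on squigulator output.

```bash
# Sketch: check a BLOW5 file, then convert it to a single FAST5 file.
# SLOW5TOOLS can be overridden from the environment (e.g. for a dry run).
SLOW5TOOLS=${SLOW5TOOLS:-slow5tools}

# usage: blow5_to_fast5 INPUT.blow5 OUTPUT.fast5
blow5_to_fast5() {
  local blow5=$1 fast5=$2

  # Refuse to continue if the input is missing or empty
  [ -s "$blow5" ] || { echo "missing or empty input: $blow5" >&2; return 1; }

  # Make sure the BLOW5 file is intact before spending time on conversion
  "$SLOW5TOOLS" quickcheck "$blow5" || return 1

  "$SLOW5TOOLS" s2f "$blow5" -o "$fast5"
}
```

For the example data, the call would be blow5_to_fast5 EBV_ASM240226v1_ONT_dna-r10-prom_reads_5x.blow5 EBV_ASM240226v1_ONT_dna-r10-prom_reads_5x.fast5. The small 5X EBV file is a sensible first test before attempting the much larger human data sets.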
