Create working folder and copy data

Let’s create an interactive session on the HPC:

qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=2:mem=4gb

Let’s create working folders for today. Copy and paste the following block of code into your terminal and hit enter for the last command:

mkdir -p $HOME/workshop/ONTvariants
mkdir -p $HOME/workshop/ONTvariants/data
mkdir -p $HOME/workshop/ONTvariants/scripts
mkdir -p $HOME/workshop/ONTvariants/runs/run1_QC
mkdir -p $HOME/workshop/ONTvariants/runs/run2_mapping
mkdir -p $HOME/workshop/ONTvariants/runs/run3_variant_calling

Now, let copy the scripts and data for today’s session:

cp /work/training/ONTvariants/data/* $HOME/workshop/ONTvariants/data
cp /work/training/ONTvariants/scripts/* $HOME/workshop/ONTvariants/scripts

Install tools using conda

Create a conda environments

Create a conda environment called ONTvariants_nanoplot

conda create -n ONTvariants_QC
Collecting package metadata (current_repodata.json): done
Solving environment: done


## Package Plan ##

  environment location: /home/barrero/miniconda3/envs/ONTvariants_nanoplot


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate ONTvariants_nanoplot
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Let’s activate the conda environment:

conda activate ONTvariant_nanoplot

Next, we need to install few tools for today’s exercises. Now let’s go the https://anaconda.org and search for the following tools and instructions on how to install them:

nanoplot, porechop_abi, chopper, seqkit

For example, search for nanoplot:

image-20240517-045848.png

If the tool you are looking is available in conda, a list of options will be presented. Typically choose the option at the top with most downloads and compatible for your system:

image-20240517-050025.png

Click on the link to the tool of interest and you will be presented with the conda command line to run in your system to install the tool:

image-20240517-050058.png

Copy and paste the first command shown above in your terminal where you have activated the ‘ONTvariant’ conda environment (Approx. 5-10min):

conda install bioconda::nanoplot

Now repeat the process for porechop_abi (Approx. 1min)’, then install it (Approx. 1min):

conda install bioconda::porechop_abi

Next, let’s install chopper (Approx. 1min):

conda install bioconda::chopper

Finally, let’s install the seqkit suite of tools:

conda install bioconda::seqkit

Running QC

Now that we have installed all the tools needed for the QC of Nanopore reads, let’s run the preprocessing of reads.

Let’s initially move to the run1_QC working directory:

cd $HOME/workshop/ONTvariants/runs/run1_QC

Now let’s copy the script for the exercise:

cp /work/training/ONTvariants/scripts/launch_ONTvariants_QC.pbs .

Note: the above script copies the launch script for the scripts folder to the current directory denoted by the full stop “ . “ at the end of the command.

Let’s print the content of the script:

cat launch_ONTvariants_QC.pbs 
#!/bin/bash -l
#PBS -N run1_QC
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=48:00:00
#PBS -m abe

cd $PBS_O_WORKDIR

conda activate ONTvariants_QC

###############################################################
# Variables
###############################################################
FASTQ='/work/training/ONTvariants/data/SRR17138639_1.fastq.gz'
GENOME='/work/training/ONTvariants/data/chr20.fasta'
SAMPLEID='SRR17138639'
###############################################################

#STEP1: NanoPlot - overall QC report
NanoPlot -t 8 --fastq $FASTQ --prefix ${SAMPLEID}_QC_ --plots dot --N50 --tsv_stats

#STEP2: porechop_abi - remove adapters
porechop_abi -abi -t 8 --input ${SAMPLEID}.fastq.gz --discard_middle --output ${SAMPLEID}_trimmed.fastq

#STEP3: chopper - retain reads with >Q10 and length>300b
chopper -q 10 -l 300 -i ${SAMPLEID}_trimmed.fastq > ${SAMPLEID}_trimmed_q10.fastq

Note:

Submit the QC job to the HPC cluster:

qsub launch_ONTvariants_QC.pbs

Monitor the progress of the job:

qjobs

The job will take ~30 - 40 min to complete.

Outputs

The following list of outputs will be generated once the job has completed:

.
├── launch_ONTvariants_QC.pbs
├── SRR17138639_1_porechop_abi_chopper_q10_300b.fa
├── SRR17138639_1_porechop_abi_chopper_q10_300b.fa_stats.txt
├── SRR17138639_1_porechop_abi_chopper_q10_300b.fastq
├── SRR17138639_1_porechop_abi.fastq
├── SRR17138639.fastq.gz
├── SRR17138639_QC_LengthvsQualityScatterPlot_dot.html
├── SRR17138639_QC_LengthvsQualityScatterPlot_dot.png
├── SRR17138639_QC_NanoPlot_20240517_2037.log
├── SRR17138639_QC_NanoPlot-report.html
├── SRR17138639_QC_NanoStats.txt
├── SRR17138639_QC_Non_weightedHistogramReadlength.html
├── SRR17138639_QC_Non_weightedHistogramReadlength.png
├── SRR17138639_QC_Non_weightedLogTransformed_HistogramReadlength.html
├── SRR17138639_QC_Non_weightedLogTransformed_HistogramReadlength.png
├── SRR17138639_QC_WeightedHistogramReadlength.html
├── SRR17138639_QC_WeightedHistogramReadlength.png
├── SRR17138639_QC_WeightedLogTransformed_HistogramReadlength.html
├── SRR17138639_QC_WeightedLogTransformed_HistogramReadlength.png
├── SRR17138639_QC_Yield_By_Length.html
└── SRR17138639_QC_Yield_By_Length.png

As outputs find the porechop_abi processed file (SRR17138639_1_porechop_abi.fastq) and the chopper output (SRR17138639_1_porechop_abi_chopper_q10_300b.fastq). To visualise the QC reports, let’s connect to the HPC via file finder (see below).

NOTE: To proceed, you need to be on QUT’s WiFi network or signed via VPN.

To browse the working folder in the HPC type in the file finder:

Windows PC

\\hpc-fs\work\training\ONTvariants

Mac

smb://hpc-fs/work/training/ONTvariants

Once connected, browse to the “/ONTvariants/runs/run1_QC” folder.

Let’s open the “SRR17138639_QC_NanoPlot-report.html“ report:

Summary statistics

Metrics

dataset

number_of_reads

5513156

number_of_bases

7815960904.0

median_read_length

586.0

mean_read_length

1417.7

read_length_stdev

2997.2

n50

4054.0

mean_qual

11.4

median_qual

13.4

longest_read_(with_Q):1

199230 (3.7)

longest_read_(with_Q):2

169532 (3.9)

longest_read_(with_Q):3

134047 (3.6)

longest_read_(with_Q):4

133337 (3.6)

longest_read_(with_Q):5

115232 (3.3)

highest_Q_read_(with_length):1

26.1 (290)

highest_Q_read_(with_length):2

25.4 (202)

highest_Q_read_(with_length):3

24.9 (331)

highest_Q_read_(with_length):4

24.5 (232)

highest_Q_read_(with_length):5

24.5 (243)

Reads >Q5:

5417207 (98.3%) 7502.8Mb

Reads >Q7:

5275906 (95.7%) 6978.9Mb

Reads >Q10:

4853447 (88.0%) 6056.6Mb

Reads >Q12:

3905370 (70.8%) 4809.9Mb

Reads >Q15:

1324999 (24.0%) 1571.9Mb

Next, let’s inspect the “SRR17138639_QC_LengthvsQualityScatterPlot_dot.png“ file. Alternatively for high resolution image open instead “SRR17138639_QC_LengthvsQualityScatterPlot_dot.html“

image-20240517-233321.png