Create working folder and copy data

...

Code Block

mkdir -p $HOME/workshop/ONTvariants
mkdir -p $HOME/workshop/ONTvariants/data
mkdir -p $HOME/workshop/ONTvariants/scripts
mkdir -p $HOME/workshop/ONTvariants/runs/run1_QC
mkdir -p $HOME/workshop/ONTvariants/runs/run2_mapping
mkdir -p $HOME/workshop/ONTvariants/runs/run3_variant_calling
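
The same folder layout can also be built with a short loop, which is easier to extend when more run folders are needed. This is only a sketch: the WORKDIR variable is an assumption introduced here for illustration and defaults to the workshop folder used above.

```shell
# WORKDIR is a hypothetical helper variable; it defaults to the workshop folder.
WORKDIR="${WORKDIR:-$HOME/workshop/ONTvariants}"

# Create every sub-folder; -p makes parent folders and ignores existing ones.
for sub in data scripts runs/run1_QC runs/run2_mapping runs/run3_variant_calling; do
    mkdir -p "$WORKDIR/$sub"
done

# List what was created.
find "$WORKDIR" -type d
```

Because mkdir -p does not fail on existing folders, the loop is safe to run more than once.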

Now, let’s copy the scripts and data for today’s session:

Code Block
cp /work/training/ONTvariants/data/* $HOME/workshop/ONTvariants/data
cp /work/training/ONTvariants/scripts/* $HOME/workshop/ONTvariants/scripts
cd $HOME/workshop/ONTvariants

Install tools using conda

Approach 1: Create a conda environment and install tools one at a time

Create a conda environment called ONTvariants_QC:

Code Block
conda create -n ONTvariants_QC

Code Block

Collecting package metadata (current_repodata.json): done
Solving environment: done


## Package Plan ##

  environment location: /home/barrero/miniconda3/envs/ONTvariants_QC


Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate ONTvariants_QC
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Let’s activate the conda environment:

Code Block
conda activate ONTvariants_QC

Next, we need to install a few tools for today’s exercises. Go to https://anaconda.org and search for the following tools and their installation instructions:

...

Code Block
conda install bioconda::seqkit

Approach 2: Create environment and install tools all at once

This is a slower option, but it is convenient when installing many tools.

Prepare the following environment_QC.yml file:

Code Block
name: ONTvariants_QC
channels:
  - conda-forge
  - defaults
  - bioconda
dependencies:
  - nanoplot
  - porechop_abi
  - chopper

Create a new environment:

Code Block
cd $HOME/workshop/ONTvariants/scripts
conda env create -f environment_QC.yml
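
Whichever approach is used, it can help to confirm that the tools are actually on the PATH once the environment is active, before submitting a job. The sketch below is one way to do that. The check_tools function is a hypothetical helper written for this page, not part of conda, and the tool names are the executables called in the QC script further down. Note that seqkit is installed in Approach 1 but is not listed in the environment file above, so it may be reported as missing if only Approach 2 was followed.

```shell
# check_tools: hypothetical helper that reports which commands are on the PATH.
# Returns 0 if every tool is found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Run after: conda activate ONTvariants_QC
check_tools NanoPlot porechop_abi chopper seqkit \
    || echo "Install the missing tools before running the QC job."
```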

Running QC

Now that we have installed all the tools needed for the QC of Nanopore reads, let’s run the preprocessing of reads.

...

Code Block

#!/bin/bash -l
#PBS -N run1_QC
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=48:00:00
#PBS -m abe

cd $PBS_O_WORKDIR

conda activate ONTvariants_QC

###############################################################
# Variables
###############################################################
FASTQ='/work/training/ONTvariants/data/SRR17138639_1.fastq.gz'
GENOME='/work/training/ONTvariants/data/chr20.fasta'
SAMPLEID='SRR17138639'
###############################################################

#STEP1: NanoPlot - overall QC report
NanoPlot -t 8 --fastq $FASTQ --prefix ${SAMPLEID}_QC_ --plots dot --N50 --tsv_stats

#STEP2: porechop_abi - remove adapters
porechop_abi -abi -t 8 --input $FASTQ --discard_middle --output ${SAMPLEID}_trimmed.fastq

#STEP3: chopper - retain reads with >Q10 and length>300b
chopper -q 10 -l 300 -i ${SAMPLEID}_trimmed.fastq > ${SAMPLEID}_trimmed_q10.fastq

#STEP4: get stats of trimmed FASTQ files
seqkit stats *.fastq > Report_trimmed_FASTQ_stats.txt

Note:

Line 1: Declares that the script is a bash script.
Lines 2-5: These lines start with “#”, so bash ignores them as comments. The scheduler (PBS Pro), however, reads them as directives: the name of the job (line 2), the number of CPUs and amount of RAM to use (line 3), the maximum time the script may run (line 4), and email notifications when the job aborts, begins or ends (line 5).
Line 7: Tells the job to run in the directory from which it was submitted.
Line 9: Activates the conda environment where the QC tools were installed.
Lines 11-17: User-defined variables. Modify the FASTQ, genome and/or sample ID as appropriate for the job. Note: in the lines below, the variable names are used instead of the actual names or locations of the files (e.g., $FASTQ).
Line 20: Runs a Quality Control (QC) overview of the raw Nanopore reads using NanoPlot.
Line 23: Removes adapter sequences from the 5'- and 3'-ends of the raw reads and discards reads with adapters in the middle (--discard_middle).
Line 26: Removes reads with a quality score below Q10 (90% accuracy; -q 10) and reads shorter than 300 bases (-l 300).
Line 29: Collects the stats for the trimmed FASTQ files processed using porechop_abi and chopper.
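
The “Q10 = 90% accuracy” figure comes from the Phred scale, where the error probability is 10^(-Q/10). As a rough illustration, the conversion can be checked on the command line. The q_to_accuracy function below is a hypothetical helper written for this page, not part of any of the QC tools.

```shell
# q_to_accuracy: hypothetical helper converting a Phred quality score
# to per-base accuracy in percent, using P(error) = 10^(-Q/10).
q_to_accuracy() {
    awk -v q="$1" 'BEGIN { printf "%.2f\n", (1 - 10 ^ (-q / 10)) * 100 }'
}

q_to_accuracy 10   # prints 90.00
q_to_accuracy 20   # prints 99.00
```

This can help when choosing a different -q threshold for chopper: each increase of 10 in Q reduces the expected error rate tenfold.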

Submit the QC job to the HPC cluster:

...

The outputs include the porechop_abi trimmed file (SRR17138639_trimmed.fastq) and the chopper filtered file (SRR17138639_trimmed_q10.fastq). To visualise the QC reports, let’s connect to the HPC via the file finder (see below).

NOTE: To proceed, you need to be on QUT’s WiFi network or signed in via VPN.

To browse the working folder on the HPC, type the following in the file finder:

...

Next, let’s inspect the “SRR17138639_QC_LengthvsQualityScatterPlot_dot.png” file. Alternatively, for a high-resolution image, open “SRR17138639_QC_LengthvsQualityScatterPlot_dot.html” instead.

...

Next: ONTvariants - mapping

Version	Old Version 12	New Version Current
Changes made by	Roberto Barrero Gumiel	Roberto Barrero Gumiel
Saved on	18/05/2024	21/05/2024
