...
Code Block |
---|
conda install bioconda::chopper |
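Before moving on, it can be useful to confirm that the tools are visible on your PATH. The following is a minimal sketch, assuming the conda environment in which you installed the tools is currently active; the helper function name check_tool is hypothetical and only used for illustration.

```shell
#!/bin/bash
# Return success (0) if the given command is available on the PATH.
# check_tool is a hypothetical helper name used only in this sketch.
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Tool names taken from the QC steps used in this workshop.
for tool in NanoPlot porechop_abi chopper; do
    if check_tool "$tool"; then
        echo "OK: $tool found at $(command -v "$tool")"
    else
        echo "MISSING: $tool is not on the PATH (is the conda environment active?)"
    fi
done
```

If a tool is reported as missing, activating the conda environment and re-running the check is a reasonable first step.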
Running QC
Now that we have installed all the tools needed for the QC of Nanopore reads, let’s run the preprocessing of reads.
First, let’s move to the run1_QC working directory:
Code Block |
---|
cd $HOME/workshop/ONTvariants/runs/run1_QC |
Now let’s copy the script for the exercise:
Code Block |
---|
cp /work/training/ONTvariants/scripts/launch_ONTvariants_QC.pbs . |
Note: the above command copies the launch script from the scripts folder to the current directory, which is denoted by the full stop “ . “ at the end of the command.
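To see how the full stop works as a destination, here is a small self-contained sketch that can be run anywhere without touching the workshop files. The temporary folders and the file name example.pbs are made up for illustration only.

```shell
#!/bin/bash
# Create a throwaway folder structure (hypothetical names, for illustration only).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/scripts" "$demo_dir/run"
echo 'echo hello' > "$demo_dir/scripts/example.pbs"

# Run in a subshell so your own working directory is left unchanged.
(
    cd "$demo_dir/run" || exit 1
    # "." means "the directory I am currently in", here $demo_dir/run
    cp "$demo_dir/scripts/example.pbs" .
    # List the current directory to confirm the copy arrived.
    ls -l
)
```

The same pattern applies to the cp command above: the first argument is the file to copy and the full stop is the destination.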
Let’s print the content of the script:
Code Block |
---|
cat launch_ONTvariants_QC.pbs |
Code Block |
---|
#!/bin/bash -l
#PBS -N run1_QC
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=72:00:00
#PBS -m abe

cd $PBS_O_WORKDIR

conda activate ONTvariants_QC

###############################################################
# Variables
###############################################################
FASTQ='/work/training/ONTvariants/data/SRR17138639_1.fastq.gz'
GENOME='/work/training/ONTvariants/data/chr20.fasta'
SAMPLEID='SRR17138639'
###############################################################

#STEP1: NanoPlot - overall QC report
NanoPlot -t 8 --fastq $FASTQ --prefix ${SAMPLEID}_QC_ --plots dot --N50 --tsv_stats

#STEP2: porechop_abi - remove adapters
porechop_abi -abi -t 8 --input $FASTQ --discard_middle --output ${SAMPLEID}_trimmed.fastq

#STEP3: chopper - retain reads with quality >=Q10 and length >=300 bases
chopper -q 10 -l 300 -i ${SAMPLEID}_trimmed.fastq > ${SAMPLEID}_trimmed_q10.fastq |
Note:
Line 1: Defines that the script is a bash script.
Lines 2-5: Start with “#”, so bash treats them as comments and ignores them. However, the scheduler (PBS Pro) reads these #PBS lines as directives: the name of the job (line 2), the number of CPUs and amount of RAM to request (line 3), the maximum time the job is allowed to run (line 4) and email notifications when the job aborts, begins or ends (line 5).
Line 7: Changes to the directory from which the job was submitted ($PBS_O_WORKDIR), so the job runs there.
Line 9: Activates the conda environment in which the QC tools were installed.
Lines 11-17: User-defined variables. Modify the FASTQ file, genome and/or sample ID as appropriate for your job. Note: in the lines below, the variable names (e.g., $FASTQ) are used instead of the actual names or locations of the files.
Line 20: Run a Quality Control (QC) overview of the raw Nanopore reads using NanoPlot
Line 23: Remove adapter sequences from the 5’- and 3’-ends of the raw reads, and discard reads that contain an adapter in the middle (--discard_middle)
Line 26: Remove reads with a mean quality score below Q10 (90% accuracy; -q 10) or a length shorter than 300 bases (-l 300)
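The link between Q10 and 90% accuracy comes from the Phred scale, where the error probability is P = 10^(-Q/10) and the accuracy is 1 - P. The sketch below shows this arithmetic for a few quality scores using awk; the function name phred_to_accuracy is hypothetical and is not part of chopper or any other tool used here.

```shell
#!/bin/bash
# Convert a Phred quality score (Q) into the expected accuracy in percent.
# Error probability: P = 10^(-Q/10); accuracy = (1 - P) * 100
# phred_to_accuracy is a hypothetical helper name used only in this sketch.
phred_to_accuracy() {
    awk -v q="$1" 'BEGIN { printf "%.2f\n", (1 - 10^(-q/10)) * 100 }'
}

# Print the expected accuracy for a few commonly quoted quality scores.
for q in 7 10 20 30; do
    echo "Q${q}: $(phred_to_accuracy "$q")% accuracy"
done
```

Raising the -q threshold therefore keeps only more accurate reads, at the cost of discarding more data.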