Let’s create an interactive session on the HPC:

...

Code Block
conda install bioconda::samtools

Now repeat the process for ‘sniffles’, then install it:

Code Block
conda install bioconda::sniffles

Next, let’s install minimap2 and seqkit:

Code Block
conda install bioconda::minimap2
conda install bioconda::seqkit

Now we are done installing all the tools that we need for today.
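
A quick way to confirm that the installs worked is to check that each tool can be found on the PATH while the environment is still active. The sketch below is only an illustration: the helper name missing_tools is hypothetical (it is not part of conda), and it only checks that the commands exist, not which versions were installed.

```shell
# Hypothetical helper: print the commands from the argument list
# that cannot be found on PATH (prints an empty line if all are found).
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space left by the concatenation above
    echo "${missing# }"
}

# Check the tools installed in this section
not_found=$(missing_tools samtools sniffles minimap2)
if [ -z "$not_found" ]; then
    echo "All tools found"
else
    echo "Not found on PATH: $not_found"
fi
```

If a tool is reported as not found, check that the right conda environment is active before reinstalling.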

Approach #2 (we are not doing this - this is just for your information) - installing all tools at once (slower option!)

Prepare the following environment.yml file:

Code Block
name: ONTvariants_mapping
channels:
  - conda-forge
  - defaults
  - bioconda
dependencies:
  - samtools=1.20
  - sniffles=1.0.12
  - minimap2=2.28

Create a new environment:

...

As you have seen, we can search anaconda.org for other tools that we might be interested in using.

Remember, if you run into compatibility issues or errors, you can always create a new conda environment for the tool of interest. NOTE: you can switch between conda environments as follows:

Code Block
conda activate ONTvariants_QC
...
...
...
conda deactivate
conda activate ONTvariants_mapping
...
...
...
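
When switching between environments it is easy to lose track of which one is active. As a minimal sketch, the helper below (active_env is a hypothetical name) reads the CONDA_DEFAULT_ENV variable that conda sets on activation and prints "none" when no environment is active.

```shell
# Print the name of the active conda environment, or "none" if
# no environment is active. conda sets CONDA_DEFAULT_ENV on activation.
active_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

echo "Active conda environment: $(active_env)"
```

Running conda env list gives the same information: the active environment is marked with an asterisk.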

...

Running mapping

Now that we have installed all the tools needed for mapping Nanopore reads, let’s map the preprocessed reads onto the reference genome.

Let’s first move to the run2_mapping working directory:

Code Block
cd $HOME/workshop/ONTvariants/runs/run2_mapping

Now let’s copy the script for the exercise:

Code Block
cp /work/training/ONTvariants/scripts/launch_ONTvariants_mapping.pbs .

Note: the above command copies the launch script from the scripts folder to the current directory, which is denoted by the full stop “.” at the end of the command.

Let’s print the content of the script:

Code Block
cat launch_ONTvariants_mapping.pbs

Code Block
#PBS -N run2_mapping
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=72:00:00
#PBS -m abe

cd $PBS_O_WORKDIR

#conda activate ONTvariants_QC
conda activate porechop

###############################################################
# Variables
###############################################################
FASTQ='/work/training/ONTvariants/data/runs/run1_QC/SRR17138639_1_porechop_abi_chopper_q10_300b.fastq'
GENOME='/work/training/ONTvariants/data/chr20.fasta'
SAMPLEID='SRR17138639'
GENOMEID='chr20'
###############################################################

#STEP1: Mapping preprocessed reads with minimap2 onto reference genome
minimap2 -t 8 -a $GENOME $FASTQ | awk '$3!="*"'  > ${SAMPLEID}_mapped_${GENOMEID}.sam

##STEP2: samtools - SAM to sorted BAM
samtools view -bS ${SAMPLEID}_mapped_${GENOMEID}.sam > ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools sort -o ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools index ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam
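
The awk filter in STEP1 can be tried on its own. In SAM format, column 3 holds the reference name, and unmapped reads have “*” there, so awk '$3!="*"' keeps the header lines and the mapped records and drops the unmapped ones. The sketch below applies the same filter to a toy SAM snippet; the records and the function names keep_mapped and toy_sam are made up for illustration and are not taken from the workshop data.

```shell
# Keep SAM header lines and mapped records; drop records whose
# reference name (column 3) is "*", i.e. unmapped reads.
keep_mapped() {
    awk '$3!="*"'
}

# Toy SAM text, made up for illustration (not real alignments):
# one header line, one mapped read and one unmapped read.
toy_sam() {
    printf '@SQ\tSN:chr20\tLN:1000\n'
    printf 'read1\t0\tchr20\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n'
    printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII\n'
}

# Prints the @SQ header line and read1; read2 is dropped
toy_sam | keep_mapped
```

An alternative that should give a similar result is to filter on the SAM flag instead, for example with samtools view -F 4, which excludes reads flagged as unmapped.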

Exercise #1: running a test using a sample dataset

...