Let’s create a new interactive session on the HPC:

Code Block
qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=2:mem=4gb

Install tools using conda

...

We will now create a new conda environment to install the tools needed for mapping. We need a separate environment because the QC and mapping tools have incompatible dependencies.

Create a conda environment called ONTvariants_mapping:

Code Block
conda create -n ONTvariants_mapping
Code Block
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.12.0
  latest version: 24.5.0

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/barrero/miniconda3/envs/ONTvariant



Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate ONTvariants_mapping
#
# To deactivate an active environment, use
#
#     $ conda deactivate

...

Next, we need to install a few tools for today’s exercises. Go to https://anaconda.org and search for the following tools and the instructions on how to install them:

Code Block
samtools, sniffles, minimap2

For example, search for samtools:

...

Click on the link to the tool of interest and you will be presented with the conda command line to run in your system to install the tool:

...

Copy and paste the first command line shown above into the terminal session where you have activated the ‘ONTvariants_mapping’ environment. Install samtools (version 1.20):

Code Block
conda install bioconda::samtools

Next, let’s install minimap2 (version 2.28):

Code Block
conda install bioconda::minimap2

Now we are done installing all the tools that we need for today.
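Before moving on, it can be worth confirming that both tools are visible from the active environment. The following is a minimal sketch, not part of the original exercise; it assumes the ‘ONTvariants_mapping’ environment is activated in the current terminal, and the variable name `missing` is just an illustrative counter.

```shell
# Check that each mapping tool is on the PATH of the active environment.
missing=0
for tool in samtools minimap2; do
    if command -v "$tool" >/dev/null 2>&1; then
        # Print where the tool lives and the first line of its version report.
        echo "$tool found at: $(command -v "$tool")"
        "$tool" --version | head -n 1
    else
        echo "$tool NOT found - is the ONTvariants_mapping environment active?"
        missing=$((missing + 1))
    fi
done
echo "Tools missing: $missing"
```

If either tool is reported as missing, activate the environment and repeat the corresponding conda install command.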

...

Alternative approach to create a conda environment and install tools (we are not doing this; it is just for your information): installing all tools at once (slower option!)

Prepare the following environment.yml file:

Code Block
name: ONTvariants_mapping
channels:
  - conda-forge
  - defaults
  - bioconda
dependencies:
  - samtools=1.20
  - minimap2=2.28

Create a new environment:

Code Block
conda env create -f environment.yml

Installing more tools or dealing with compatibility issues between tools

As you have seen, we can search anaconda.org for other tools that we might be interested in using.

Remember, if you run into compatibility issues or errors, you can always create a new conda environment for the tool of interest. NOTE: you can switch between conda environments as follows:

Code Block
conda activate ONTvariants_QC
...
...
...
conda deactivate
conda activate ONTvariants_mapping
...
...
...
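When switching back and forth it is easy to lose track of which environment is active. As a small sketch (not part of the original exercise), conda sets the `CONDA_DEFAULT_ENV` variable when an environment is activated, so it can be inspected directly; the variable name `active_env` below is only illustrative.

```shell
# Report the currently active conda environment, or "none" if conda has not set one.
active_env="${CONDA_DEFAULT_ENV:-none}"
echo "Active conda environment: $active_env"

# Remind the user how to switch if the mapping environment is not the active one.
if [ "$active_env" != "ONTvariants_mapping" ]; then
    echo "To switch, run: conda activate ONTvariants_mapping"
fi
```

Running `conda env list` also shows all available environments, with the active one marked by an asterisk.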

Running mapping

Now that we have installed all the tools needed for mapping the Nanopore reads, let’s map the preprocessed reads onto the reference genome.

...

Code Block
#!/bin/bash -l
#PBS -N run2_mapping
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=72:00:00
#PBS -m abe

cd $PBS_O_WORKDIR

# Activate the environment that contains minimap2 and samtools
conda activate ONTvariants_mapping

###############################################################
# Variables
###############################################################
FASTQ='/work/training/ONTvariants/data/runs/run1_QC/SRR17138639_1_porechop_abi_chopper_q10_300b.fastq'
GENOME='/work/training/ONTvariants/data/chr20.fasta'
SAMPLEID='SRR17138639'
GENOMEID='chr20'
###############################################################

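#STEP1: Mapping preprocessed reads with minimap2 onto reference genome
# Alternative (commented out): the awk filter drops reads with no reference name, i.e. unmapped reads
#minimap2 -t 8 -a $GENOME $FASTQ | awk '$3!="*"'  > ${SAMPLEID}_mapped_${GENOMEID}.sam
# -t 8: use 8 threads (matches ncpus above); -a: write alignments in SAM format
minimap2 -t 8 -a $GENOME $FASTQ > ${SAMPLEID}_mapped_${GENOMEID}.sam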

#STEP2: samtools - convert SAM to BAM, sort by coordinate, then index
samtools view -bS ${SAMPLEID}_mapped_${GENOMEID}.sam > ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools sort -o ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools index ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam
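The script above keeps the intermediate SAM and unsorted BAM files, which is handy for teaching but uses extra disk space. As an optional variation (not what this exercise runs), minimap2 output can be piped straight into `samtools sort`. The sketch below reuses the variable names from the script; the fallback default values, the `OUT` variable and the `-@ 4` sort thread count are assumptions for illustration only.

```shell
# Fallback values mirroring the variables in the PBS script (used only if unset).
FASTQ="${FASTQ:-/work/training/ONTvariants/data/runs/run1_QC/SRR17138639_1_porechop_abi_chopper_q10_300b.fastq}"
GENOME="${GENOME:-/work/training/ONTvariants/data/chr20.fasta}"
SAMPLEID="${SAMPLEID:-SRR17138639}"
GENOMEID="${GENOMEID:-chr20}"
OUT="${SAMPLEID}_mapped_${GENOMEID}.sorted.bam"

# Only run when the tools and the input files are actually available.
if command -v minimap2 >/dev/null 2>&1 && command -v samtools >/dev/null 2>&1 \
   && [ -r "$FASTQ" ] && [ -r "$GENOME" ]; then
    # Stream SAM from minimap2 into samtools sort ("-" means read from stdin),
    # writing a coordinate-sorted BAM without intermediate files.
    minimap2 -t 8 -a "$GENOME" "$FASTQ" | samtools sort -@ 4 -o "$OUT" -
    samtools index "$OUT"
else
    echo "Skipping: minimap2/samtools or the input files are not available here."
fi
```

This produces the same sorted BAM and index as the three-step version, at the cost of no longer having the SAM file available for inspection.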

...

Monitor the progress of the job:

Code Block
qjobs
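On PBS systems where the `qjobs` helper is not available, the standard `qstat` command gives similar information. This is a hedged sketch that assumes a PBS installation providing `qstat`; the variable name `me` is illustrative.

```shell
# Work out the current user name, falling back to id if USER is not set.
me="${USER:-$(id -un)}"

# List this user's queued and running jobs if qstat is available.
if command -v qstat >/dev/null 2>&1; then
    qstat -u "$me"
else
    echo "qstat not found - run this on the HPC login node."
fi
```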

Once the run has completed, the following files will be in the “run2_mapping” folder:

Code Block
.
├── launch_ONTvariants_mapping.pbs
├── SRR17138639_mapped_chr20.bam
├── SRR17138639_mapped_chr20.sam
├── SRR17138639_mapped_chr20.sorted.bam
└── SRR17138639_mapped_chr20.sorted.bam.bai
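Before visualising, a quick sanity check of the sorted BAM can catch truncated or empty output. The following is a minimal sketch, assuming it is run inside the “run2_mapping” folder with the mapping environment active; the `BAM` and `status` variable names are illustrative.

```shell
# Sorted BAM produced by the mapping job (file name taken from the listing above).
BAM="SRR17138639_mapped_chr20.sorted.bam"

if command -v samtools >/dev/null 2>&1 && [ -r "$BAM" ]; then
    # quickcheck verifies the file has a valid header and an intact end-of-file marker.
    samtools quickcheck "$BAM" && echo "BAM file looks intact"
    # flagstat summarises how many reads are mapped, secondary, supplementary, etc.
    samtools flagstat "$BAM"
    status="checked"
else
    echo "Skipping: samtools or $BAM is not available in this folder."
    status="skipped"
fi
```

The "mapped" line in the flagstat report gives the proportion of reads aligned to the reference.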

To visualise the mapped reads (sorted BAM file) using IGV (see below), first we need to connect to the HPC using FileFinder:

NOTE: To proceed, you need to be on QUT’s WiFi network or signed in via VPN.

To browse the working folder on the HPC, type the following path into the file finder:

Windows PC

Code Block
\\hpc-fs\work\training\ONTvariants\runs\run2_mapping

Mac

Code Block
smb://hpc-fs/work/training/ONTvariants/runs/run2_mapping

Visualisation of the alignment using IGV

We can visualise the mapped reads using a web-based IGV genome browser tool at https://igv.org

...

Zoom in and visualise the alignments:

...

Next: ONTvariants - epi2me WF-Human Variation pipeline