Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Let’s create an interactive session on the HPC:

...

Now we are done installing all the tools that we need for today.

Approach #2 (we are not doing this - this just for your information) - installing all tools at once (slower option!)

Prepare the following environment.yml file:

Code Block
name: ONTvariants_mapping
channels:
  - conda-forge
  - defaults
  - bioconda
dependencies:
  - samtools=1.20
  - minimap=2-2.28

...

As you have seen, we can search at anaconda.org for other tools that we might be interested to use.

Remember, if you run into compatibility issues or errors, you can always create a new conda environment for the tool of interest. NOTE: you can switch between conda environements as follows:

Code Block
conda activate ONTvariants_QC
...
...
...
conda deactivate
conda activate ONTvariants_mapping
...
...
...

...

Code Block
cat launch_ONTvariants_mapping.pbs 
Code Block
#!/bin/bash -l
#PBS -N run2_mapping
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=72:00:00
#PBS -m abe

cd $PBS_O_WORKDIR

#conda activate ONTvariants_QC
conda activate porechop

###############################################################
# Variables
###############################################################
FASTQ='/work/training/ONTvariants/data/runs/run1_QC/SRR17138639_1_porechop_abi_chopper_q10_300b.fastq'
GENOME='/work/training/ONTvariants/data/chr20.fasta'
SAMPLEID='SRR17138639'
GENOMEID='chr20'
###############################################################

#STEP1: Mapping preprocessed reads with minimap2 onto reference genome
minimap2 -t 8 -a $GENOME $FASTQ | awk '$3!="*"'  > ${SAMPLEID}_mapped_${GENOMEID}.sam

##STEP2: samtools - SAM to sorted BAM
samtools view -bS ${SAMPLEID}_mapped_${GENOMEID}.sam > ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools sort -o ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools index ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam

Exercise #1: running a test using a sample dataset

Code Block
Convert the sam file to bam (a binary sam format) using samtools’ view command
Sort the bam file (needed for fast access) using samtools sort command
Create an index of the bam file (needed by IGV) using samtools index command

Ecoli

https://www.ebi.ac.uk/ena/browser/view/SRR17645344 Note:

  • Line 1: Defines that the script is a bash script.

  • Lines 2-5: Are commented out with “#” at the beginning and are ignored by bash, however, these PBS lines tell the scholar (PBS Pro) the name of the job (line 2), the number of CPUs and RAM memory to use (line 3), the time to run the script (line 4) and report if there are any errors (line 5).

  • Line 7: Tells the job to run on the current directory.

  • Line 9: Activate the conda environment where the QC tools were installed using conda.

  • Lines 15-18: User defined variables. Modify the FASTQ, genome and/or sample ID to use to run the job as appropriate. Note: in the lines below, the variable names are used instead of the actual names or locations of the files (e.g., $FASTQ)

  • Line 22: Map quality processed reads onto reference genome using minimap2. The output alignment file is in SAM format.

  • Line 25: Convert the SAM file to BAM (a binary SAM format) using samtools' view command.

  • Line 26: Sort the BAM file (required for fast access) using samtools' sort command

  • Line 27: Create an index of the sorted BAM file (needed for IGV) using the samtools' index command.

Submit the QC job to the HPC cluster:

Code Block
qsub launch_ONTvariants_mapping.pbs

Monitor the progress of the job:

Code Block
qjobs

Visualisation of the alignment using IGV

We can visualise the mapped reads using a web-based IGV genome browser tool at https://igv.org

Click on IGV Web App:

...

Next you will be presented with a page with the human genome hg38 (GRCh38) loaded for all chromosomes including RefSeq (reference) genes:

...

Let’s now upload the BAM and the index BAM.BAI file by selecting Tracks → Local File → browse to the BAM and BAI files and select them both at the same time to upload:

...

Next select chromosome 20 (chromosome from the scroll down menu on the top left corner, click on ‘all’ then select ‘chr20’:

...

Zoom in a visualise the alignments:

...