Let’s create an interactive session on the HPC:
qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=2:mem=4gb
Install tools using conda
Approach #1 - installing tools one at a time (faster option)
Create a conda environment called ONTvariants_mapping:
conda create -n ONTvariants_mapping
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.12.0
  latest version: 24.5.0

Please update conda by running

    $ conda update -n base -c defaults conda

## Package Plan ##

  environment location: /home/barrero/miniconda3/envs/ONTvariant

Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate ONTvariants_mapping
#
# To deactivate an active environment, use
#
#     $ conda deactivate
Let’s activate the conda environment:
conda activate ONTvariants_mapping
Next, we need to install a few tools for today’s exercises. Go to https://anaconda.org and search for the following tools and the instructions on how to install them:
samtools, sniffles, minimap2
For example, search for samtools:
If the tool you are looking for is available in conda, a list of options will be presented. Typically, choose the option at the top with the most downloads that is compatible with your system:
Click on the link to the tool of interest and you will be presented with the conda command line to run in your system to install the tool:
Copy and paste the first command shown above into your terminal, where you have activated the ‘ONTvariants_mapping’ conda environment:
conda install bioconda::samtools
Next, let’s install minimap2:
conda install bioconda::minimap2
Now we are done installing all the tools that we need for today.
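Before moving on, it can be worth confirming that the tools are actually visible in the active environment. The snippet below is a minimal sketch and not part of the workshop scripts: `check_tools` is a hypothetical helper name, and it only tests whether each command can be found on the PATH, not whether the installed version is the one you expect.

```shell
# Hypothetical helper (not part of the workshop scripts):
# report whether each named command can be found on the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Non-zero exit status if at least one tool was not found
    return "$missing"
}
```

With the environment activated, running `check_tools samtools minimap2` should report both tools as found; a "missing" line suggests the install step for that tool needs to be repeated.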
Approach #2 (we are not doing this - this is just for your information) - installing all tools at once (slower option!)
Prepare the following environment.yml file:
name: ONTvariants_mapping
channels:
  - conda-forge
  - defaults
  - bioconda
dependencies:
  - samtools=1.20
  - minimap2=2.28
Create a new environment:
conda env create -f environment.yml
Installing more tools or dealing with compatibility issues between tools
As you have seen, we can search anaconda.org for other tools that we might be interested in using.
Remember, if you run into compatibility issues or errors, you can always create a new conda environment for the tool of interest. NOTE: you can switch between conda environments as follows:
conda activate ONTvariants_QC
...
...
...
conda deactivate
conda activate ONTvariants_mapping
...
...
...
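When switching between several environments it is easy to lose track of which one is active. The following is a small sketch under one assumption: that conda exports the `CONDA_DEFAULT_ENV` variable when an environment is activated. The helper names `active_env` and `require_env` are hypothetical and not part of conda.

```shell
# Hypothetical helper: print the name of the active conda environment.
# Relies on CONDA_DEFAULT_ENV, which conda sets on activation;
# prints "none" when no environment is active.
active_env() {
    echo "${CONDA_DEFAULT_ENV:-none}"
}

# Hypothetical helper: succeed only if the expected environment is active.
require_env() {
    [ "$(active_env)" = "$1" ]
}
```

For example, `require_env ONTvariants_mapping || echo "wrong environment: $(active_env)"` prints a warning if you forgot to switch. Running `conda env list` is another way to see all environments, with the active one marked by an asterisk.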
Running mapping
Now that we have installed all the tools needed for mapping Nanopore reads, let’s map the preprocessed reads onto the reference genome.
Let’s first move to the run2_mapping working directory:
cd $HOME/workshop/ONTvariants/runs/run2_mapping
Now let’s copy the script for the exercise:
cp /work/training/ONTvariants/scripts/launch_ONTvariants_mapping.pbs .
Note: the above command copies the launch script from the scripts folder to the current directory, denoted by the full stop “.” at the end of the command.
Let’s print the content of the script:
cat launch_ONTvariants_mapping.pbs
#PBS -N run2_mapping
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=72:00:00
#PBS -m abe

cd $PBS_O_WORKDIR

#conda activate ONTvariants_QC
conda activate porechop

###############################################################
# Variables
###############################################################
FASTQ='/work/training/ONTvariants/data/runs/run1_QC/SRR17138639_1_porechop_abi_chopper_q10_300b.fastq'
GENOME='/work/training/ONTvariants/data/chr20.fasta'
SAMPLEID='SRR17138639'
GENOMEID='chr20'
###############################################################

#STEP1: Mapping preprocessed reads with minimap2 onto reference genome
minimap2 -t 8 -a $GENOME $FASTQ | awk '$3!="*"' > ${SAMPLEID}_mapped_${GENOMEID}.sam

##STEP2: samtools - SAM to sorted BAM
samtools view -bS ${SAMPLEID}_mapped_${GENOMEID}.sam > ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools sort -o ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools index ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam
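The script above writes an intermediate SAM file and an unsorted BAM file before producing the sorted BAM. As an illustration only, the same steps can be chained with pipes so that no intermediate files are kept. This is a sketch, not the workshop's script: the function names `sorted_bam_name` and `map_and_sort` are hypothetical, and the awk filter is swapped for `samtools view -F 4`, which excludes reads flagged as unmapped and should have roughly the same effect for these single-end reads.

```shell
# Hypothetical helper: build the output name the same way the launch script does.
sorted_bam_name() {
    printf '%s_mapped_%s.sorted.bam\n' "$1" "$2"
}

# Hypothetical helper: map, drop unmapped reads, sort and index in one pass.
# Arguments: FASTQ GENOME SAMPLEID GENOMEID
map_and_sort() {
    out=$(sorted_bam_name "$3" "$4")
    # -a      : minimap2 writes SAM to standard output
    # -b -F 4 : samtools view writes BAM and excludes unmapped reads
    #           (swapped in here for the awk filter used in the launch script)
    minimap2 -t 8 -a "$2" "$1" \
        | samtools view -b -F 4 - \
        | samtools sort -o "$out" -
    # The index is needed by viewers such as IGV
    samtools index "$out"
}
```

Inside the launch script this could be called as `map_and_sort "$FASTQ" "$GENOME" "$SAMPLEID" "$GENOMEID"`. Keeping the separate steps, as the workshop script does, makes each stage easier to inspect, which is useful when learning.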
Exercise #1: running a test using a sample dataset
Convert the SAM file to BAM (a binary SAM format) using the samtools view command
Sort the BAM file (needed for fast access) using the samtools sort command
Create an index of the BAM file (needed by IGV) using the samtools index command
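Once the three steps above have run, a quick check that the outputs exist can save time before opening them in IGV. The sketch below assumes that samtools index writes the index next to the BAM file with a `.bai` suffix, which is its default behaviour for BAM files; `check_bam_outputs` is a hypothetical helper name.

```shell
# Hypothetical helper: confirm the sorted BAM and its index exist and are non-empty.
# Assumes the index was written as <file>.bam.bai (the samtools index default).
check_bam_outputs() {
    bam=$1
    for f in "$bam" "$bam.bai"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            return 1
        fi
    done
    echo "ok: $bam"
}
```

For the workshop sample this would be `check_bam_outputs SRR17138639_mapped_chr20.sorted.bam`. This only checks that the files are present; `samtools quickcheck` and `samtools flagstat` can be used for a closer look at the BAM file itself.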
Ecoli