Let’s create an interactive session on the HPC:
...
Code Block |
---|
conda install bioconda::samtools |
Now repeat the process for ‘sniffles’, then install it:
Code Block |
---|
conda install bioconda::sniffles |
Next, let’s install minimap2 and seqkit:
Code Block |
---|
conda install bioconda::minimap2
conda install bioconda::seqkit |
Now we are done installing all the tools that we need for today.
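Before moving on, it can help to confirm that each tool is actually available in the active environment. The sketch below uses a hypothetical helper, `check_tools` (not part of the workshop scripts), built only on the bash builtin `command -v`. It prints any tool it cannot find and returns the number of missing tools, so 0 means everything was found.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workshop material): report any of the
# named tools that cannot be found on PATH in the currently active environment.
check_tools() {
  local missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status = number of missing tools (0 means all were found)
  return "$missing"
}

# Example: run this inside the environment where the tools were installed
check_tools samtools sniffles minimap2 && echo "all tools found"
```

If a tool is reported as missing, check that the right environment is active before reinstalling anything.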
Approach #2 (we are not doing this; it is just for your information): installing all tools at once (slower option!)
Prepare the following environment.yml file:
Code Block |
---|
name: ONTvariants_mapping
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - samtools=1.20
  - sniffles=1.0.12
  - minimap2=2.28 |
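If you prefer to create the file from the terminal rather than in a text editor, a heredoc works. This is a minimal sketch that writes the same three pinned packages, with the minimap2 package spelled `minimap2=2.28` and the channels listed conda-forge first, then bioconda, which is the order Bioconda's setup instructions ask for.

```shell
#!/usr/bin/env bash
# Write environment.yml from the terminal; the quoted 'EOF' stops the shell
# from expanding anything inside the heredoc.
cat > environment.yml <<'EOF'
name: ONTvariants_mapping
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - samtools=1.20
  - sniffles=1.0.12
  - minimap2=2.28
EOF

# Print the file to check what was written
cat environment.yml
```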
Create a new environment:
...
As you have seen, we can search anaconda.org for other tools that we might be interested in using.
Remember, if you run into compatibility issues or errors, you can always create a new conda environment for the tool of interest. NOTE: you can switch between conda environments as follows:
Code Block |
---|
conda activate ONTvariants_QC
...
conda deactivate
conda activate ONTvariants_mapping
... |
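When switching back and forth, it is easy to launch a step from the wrong environment (for example, running minimap2 while the QC environment is active). A minimal guard, assuming bash and relying on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets to the active environment's name, could look like this. `require_env` is a hypothetical name, not something conda provides.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not from the workshop scripts): fail if the active conda
# environment is not the one we expect. conda sets CONDA_DEFAULT_ENV to the
# name of the environment activated with `conda activate <name>`.
require_env() {
  local expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
    echo "expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}

# Example, after `conda activate ONTvariants_mapping`:
# require_env ONTvariants_mapping || exit 1
```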
...
Running mapping
Now that we have installed all the tools needed for mapping Nanopore reads, let’s map the preprocessed reads onto the reference genome.
Let’s first move to the run2_mapping working directory:
Code Block |
---|
cd $HOME/workshop/ONTvariants/runs/run2_mapping |
Now let’s copy the script for the exercise:
Code Block |
---|
cp /work/training/ONTvariants/scripts/launch_ONTvariants_mapping.pbs . |
Note: the above command copies the launch script from the scripts folder to the current directory, denoted by the full stop “.” at the end of the command.
Let’s print the content of the script:
Code Block |
---|
cat launch_ONTvariants_mapping.pbs |
Code Block |
---|
#!/bin/bash
#PBS -N run2_mapping
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=72:00:00
#PBS -m abe
# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Activate the environment that provides minimap2 and samtools
conda activate ONTvariants_mapping
###############################################################
# Variables
###############################################################
FASTQ='/work/training/ONTvariants/data/runs/run1_QC/SRR17138639_1_porechop_abi_chopper_q10_300b.fastq'
GENOME='/work/training/ONTvariants/data/chr20.fasta'
SAMPLEID='SRR17138639'
GENOMEID='chr20'
###############################################################
# STEP1: map preprocessed reads onto the reference genome with minimap2
# (-x map-ont: Nanopore preset; awk keeps header lines and drops unmapped reads, whose RNAME is "*")
minimap2 -t 8 -ax map-ont "$GENOME" "$FASTQ" | awk '$3!="*"' > ${SAMPLEID}_mapped_${GENOMEID}.sam
# STEP2: samtools - convert SAM to BAM, sort by coordinate, then index the sorted BAM
samtools view -b -o ${SAMPLEID}_mapped_${GENOMEID}.bam ${SAMPLEID}_mapped_${GENOMEID}.sam
samtools sort -@ 8 -o ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam ${SAMPLEID}_mapped_${GENOMEID}.bam
samtools index ${SAMPLEID}_mapped_${GENOMEID}.sorted.bam |
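To see what the `awk '$3!="*"'` filter in STEP1 does, here is a toy example on a few hand-written, tab-separated SAM lines (the records are made up for illustration). Column 3 of a SAM record is RNAME, which is `*` for an unmapped read, so the filter drops unmapped reads. Header lines such as `@SQ` are kept because their third field is a TAG:VALUE pair rather than `*`.

```shell
#!/usr/bin/env bash
# Made-up SAM content: one header line, one mapped read, one unmapped read.
printf '@SQ\tSN:chr20\tLN:1000\n' > toy.sam
printf 'read1\t0\tchr20\t100\t60\t4M\t*\t0\t0\tACGT\t!!!!\n' >> toy.sam
printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t!!!!\n' >> toy.sam

# Same filter as in the launch script: keep lines whose 3rd field is not "*"
awk '$3!="*"' toy.sam > toy_mapped.sam
cat toy_mapped.sam   # prints the @SQ header and read1 only
```

For single-end Nanopore reads this should have the same effect as excluding records with FLAG 4 (read unmapped), which `samtools view -F 4` would also do.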
Exercise #1: running a test using a sample dataset
...