BLAST is a program that compares a DNA/RNA or protein sequence (a string of letters) against other reference sequences, for example the whole non-redundant nucleic acid database, also known as NT.
Pre-requisites
An HPC account on QUT’s Lyra. Apply for a new HPC account here.
Anaconda3 or Miniconda3 installed ( https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html ). We recommend installing Miniconda3 in your /home directory on the HPC.
(optional) Basic Unix command-line knowledge (for example: https://researchcomputing.princeton.edu/education/external-online-resources/linux ; https://swcarpentry.github.io/shell-novice/ )
(optional) Familiarity with a Unix text editor (for example Vi/Vim or Nano):
Vim ( https://bioinformatics.uconn.edu/vim-guide/ ; https://missing.csail.mit.edu/2020/editors/ )
Nano (https://engineering.purdue.edu/ECN/Support/KB/Docs/BasictutorialforNanou ; https://www.howtogeek.com/howto/42980/the-beginners-guide-to-nano-the-linux-command-line-text-editor/ )
Installing BLAST
Once Miniconda3 has been installed, you may need to restart your terminal session the first time to make the ‘conda’ command available. To do this, log out and log in again.
To install BLAST or other bioinformatics tools, go to https://anaconda.org and search for the tool of interest. You can also search the bioconda channel from the command line:

```shell
conda search -c bioconda blast
```
If the tool is available, click on its link. This opens a page showing the command needed to install it. For example, for BLAST the suggested command is:
...
We use a PBS Pro submission script to submit jobs to the HPC cluster. Create a file called ‘launch_blastN.pbs’ and fill it with the following content. Replace email@host with your email address, and replace the input files passed to blastn with your own:
```shell
#!/bin/bash -l
#PBS -N blastN
#PBS -l walltime=10:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4
#PBS -q testvm
#PBS -m bae
#PBS -M email@host
#PBS -j oe

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"

# Define variables, e.g. the names of the FASTA files to use.
# Note: the suffix can be .fa, .fasta or another extension.
QUERY=query_sample.fa
REFERENCE=MsiMV_genome.fasta
EVALUE=1e-10

# Run the blastn search (tabular output, requesting 4 threads to match ncpus)
blastn -query "$QUERY" \
    -subject "$REFERENCE" \
    -out "blastN_${QUERY}_vs_${REFERENCE}.out" \
    -outfmt 6 \
    -evalue "$EVALUE" \
    -num_threads 4
```
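Before submitting, it can help to confirm that the input files exist and look like FASTA. As the script's comment notes, the file suffix does not matter; the content does. The sketch below is a hypothetical helper (`check_fasta` is our own name, not part of BLAST+ or PBS Pro). It treats a file as FASTA if the file is non-empty and its first line starts with `>`. This is a rough heuristic, not a full format validator.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of BLAST+ or PBS Pro):
# a file "looks like FASTA" if it exists, is non-empty and its first
# line starts with '>'. This is a heuristic, not a full validator.
check_fasta() {
    local file="$1"
    if [ ! -s "$file" ]; then
        echo "ERROR: '$file' is missing or empty" >&2
        return 1
    fi
    local first_line
    # Read only the first line of the file
    IFS= read -r first_line < "$file"
    case "$first_line" in
        '>'*) return 0 ;;
        *)
            echo "ERROR: '$file' does not start with a FASTA header ('>')" >&2
            return 1
            ;;
    esac
}

# Example usage with the file names from launch_blastN.pbs:
# check_fasta query_sample.fa && check_fasta MsiMV_genome.fasta && qsub launch_blastN.pbs
```

Chaining the checks with `&&` means `qsub` only runs if both files pass, so an obviously wrong input is caught before the job waits in the queue.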
Submit the job:
```shell
qsub launch_blastN.pbs
```
Check the progress of the submitted job:

```shell
qjobs
# alternatively use:
qstat -u $USER
```
For help interpreting the results, check this tutorial.
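With `-outfmt 6`, blastn writes one tab-separated line per hit. By default each line has 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The minimal sketch below uses awk and sort to keep hits above a percent-identity cutoff and list them from highest to lowest bit score. The `best_hits` function name and the default 90 % cutoff are illustrative assumptions, not recommendations.

```shell
#!/bin/bash
# Filter BLAST tabular output (-outfmt 6) and sort by bit score.
# Default outfmt 6 columns:
#  1 qseqid   2 sseqid  3 pident  4 length  5 mismatch  6 gapopen
#  7 qstart   8 qend    9 sstart 10 send   11 evalue   12 bitscore
# best_hits is a hypothetical helper; the default cutoff of 90 is only an example.
best_hits() {
    local file="$1"
    local min_pident="${2:-90}"
    # Keep rows whose percent identity (column 3) is at least the cutoff,
    # then sort numerically on bit score (column 12), highest first.
    awk -F'\t' -v min="$min_pident" '$3 >= min' "$file" \
        | sort -t $'\t' -k12,12gr
}

# Example usage with the output name produced by launch_blastN.pbs:
# best_hits blastN_query_sample.fa_vs_MsiMV_genome.fasta.out 95
```

The sort uses `g` (general numeric) so that values written in scientific notation compare correctly. This matters if you adapt the sketch to sort on the evalue column (11) instead.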
...