Introduction to using the HPC
Pre-requisites
Have an HPC account on QUT’s lyra. Apply for a new HPC account here.
Introduction to the HPC: https://qutvirtual4.qut.edu.au/group/research-students/conducting-research/specialty-research-facilities/advanced-research-computing-storage/supercomputing/getting-started-with-hpc
Have conda3 or miniconda3 installed ( https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html ). We recommend installing miniconda3 in your /home directory on the HPC.
(optional) Basic unix command line knowledge (example: https://researchcomputing.princeton.edu/education/external-online-resources/linux ; https://swcarpentry.github.io/shell-novice/ )
(optional) Familiarity with one of the Unix text editors (for example Vi/Vim or Nano)
Installing PuTTY and connecting to the HPC
Install PuTTY:
Installing PuTTY - QUT Media Hub
Connect to the HPC:
Connecting to the HPC with PuTTY - QUT MediaHub
Log into the HPC
ssh userID@lyra.qut.edu.au
Brief basic Unix commands
Once you log into the HPC, you will land in your personal home space (i.e. /home/myStudentID/). This space is only accessible to you. To work in collaboration with others we use workspaces (i.e. /work/myProjectName/).
To go to the shared directory of a project named “speight_team”, type the following command and hit enter:
cd /work/speight_team/
Display list of files in a directory
ls -lh
Print working directory
pwd
Create a folder
mkdir myfolder
Enter new folder
cd myfolder
Move back to the previous folder
cd ..
Make a backup copy of the file
cp myfile.txt myfile_copy.txt
Move the copy of the file into the newly created folder. Note: it is recommended to make a copy of important files before modifying them or running commands on them.
mv myfile_copy.txt myfolder/
View the content of a file (note: a hash symbol # at the start of a line marks a comment that describes the command underneath it)
#hash symbols are used to add comments on what a command does
#several commands can be used including cat, less, more, head and tail
cat myfile_copy.txt
#example: less -S displays very wide files without wrapping long lines
less -S myfile_copy.txt
#to stop viewing a file opened with less (or more), press “q”
#to interrupt a command that is still printing to the screen (for example cat on a large file), press “Control” and “c” at the same time
#print the first 50 lines of a file
head -50 myfile_copy.txt
#print the last 20 lines of a file
tail -20 myfile_copy.txt
Go back to your personal home space. Type 'cd' and hit enter. This will move you to /home/myStudentID/
cd
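The commands above can be tried safely in a throwaway directory. The following is a minimal practice sketch, assuming a Linux shell with the standard `mktemp` and `seq` utilities; the file and folder names are only illustrative, and nothing outside the temporary directory is touched.

```shell
# Work in a throwaway directory so none of your real files are modified.
# All file and folder names below are illustrative.
practice_dir=$(mktemp -d)
cd "$practice_dir"

# Create a small text file containing 100 numbered lines.
seq 1 100 > myfile.txt

# Back up the file, create a folder, and move the backup into it.
cp myfile.txt myfile_copy.txt
mkdir myfolder
mv myfile_copy.txt myfolder/

# Inspect the result.
pwd
ls -lh myfolder
head -n 5 myfolder/myfile_copy.txt
tail -n 3 myfolder/myfile_copy.txt
```

When you are done, `cd` returns you to your home space and the practice directory can be deleted.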
Installing CONDA
https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html
Follow the links on the page above and find the ‘Miniconda3’ installer script for your system. For the HPC, select a Linux installer and right-click to copy the link to the file. Then use the ‘wget’ command on the HPC to download the file:
Step1: Download miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.11.0-Linux-x86_64.sh
Step2: Verify the integrity of the downloaded file:
sha256sum Miniconda3-py39_4.11.0-Linux-x86_64.sh
The above should print the following hash:
4ee9c3aa53329cd7a63b49877c0babb49b19b7e5af29807b793a76bdb1d362b4 Miniconda3-py39_4.11.0-Linux-x86_64.sh
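Comparing a 64-character hash by eye is error-prone. As a hedged alternative, the check can be delegated to `sha256sum -c`, which recomputes the hash and reports whether it matches. The helper name `verify_sha256` is hypothetical and introduced here only for illustration; the installer name and hash in the usage comment are the ones quoted above.

```shell
# verify_sha256 FILE EXPECTED_HASH
# Hypothetical helper: succeeds only if the SHA-256 of FILE equals EXPECTED_HASH.
verify_sha256() {
    # sha256sum -c reads lines of the form "HASH  FILENAME" (two spaces)
    # and recomputes the hash of each listed file.
    printf '%s  %s\n' "$2" "$1" | sha256sum -c - >/dev/null 2>&1
}

# Usage with the installer and hash quoted above:
# verify_sha256 Miniconda3-py39_4.11.0-Linux-x86_64.sh \
#     4ee9c3aa53329cd7a63b49877c0babb49b19b7e5af29807b793a76bdb1d362b4 \
#     && echo "checksum OK" || echo "checksum MISMATCH: do not install"
```

If the check reports a mismatch, download the installer again before running it.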
Step3: Install ‘miniconda3’ as follows:
bash Miniconda3-py39_4.11.0-Linux-x86_64.sh
Follow the prompts and accept all the suggested options.
Step4: Close and reopen the terminal so that the conda command becomes available.
Step5: Check that conda is installed
conda
#list all tools and dependencies installed
conda list
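If `conda` is not recognised, the shell usually has not picked up the new installation yet. A small sketch for checking whether a command is visible to the current shell follows; the function name `check_tool` is hypothetical, and the same check works for any tool installed later (for example `blastn`).

```shell
# check_tool NAME
# Hypothetical helper: reports whether NAME is available in the current shell.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found ($(command -v "$1"))"
    else
        echo "$1: not found; try closing and reopening the terminal"
        return 1
    fi
}

# Check for conda; '|| true' keeps a script running even if it is missing.
check_tool conda || true
```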
Installing BLAST
Once miniconda3 has been installed, you will need to log out and log in again to enable the conda command.
To install BLAST or other bioinformatics tools go to https://anaconda.org and search for the tool of interest. You can also use the command
conda search -c bioconda blast
to search for tools in the bioconda channel.
If the tool is available, click on the tool link. This opens a new window showing the command line needed to install the tool. For example, for BLAST the suggested command is:
conda install -c bioconda blast
Run the above command to install BLAST. Conda will check whether the tool and its dependencies are available and will automatically install everything needed to run it.
Note: Follow a similar process as above to install other tools.
Interactive Job
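#request an interactive session (-I) running bash (-S /bin/bash) for up to 10 hours, with 2 CPUs and 4 GB of memory
qsub -I -S /bin/bash -l walltime=10:00:00 -l select=1:ncpus=2:mem=4gb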
Sample Data
Demo sample data for comparing DNA sequences generated by an RNA-seq approach against a reference Miscanthus sinensis mosaic virus (MsiMV) genome can be found at:
/work/eresearch_bio/sandpit/blast
-rw-rw---- 1 barrero 36K Jan 20 12:43 query_sample.fa
-rw-rw---- 1 barrero 9.7K Jan 20 12:44 MsiMV_genome.fasta
-rw-rw----+ 1 barrero 419 Jan 20 12:50 launch_blastN.pbs
We want to measure the similarity (from 0 to 100%) between the sequences (also called ‘reads’) inside the query_sample.fa file and the reference MsiMV_genome.fasta sequence. Note: RNA/DNA (and protein) sequences can be stored in ‘fasta format’. Each record starts with a header line: a “>” symbol followed by a sequence identifier. The DNA/RNA (or protein) sequence follows from the second line onwards.
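To make the fasta layout concrete, here is a small sketch that writes a two-record fasta file and counts its sequences. The read names and sequences are made up for illustration and are not taken from the demo data.

```shell
# Write a tiny fasta file with two made-up reads (hypothetical names and sequences).
fasta_demo=$(mktemp)
cat > "$fasta_demo" <<'EOF'
>read_1
ATGCGTACGTTAGC
>read_2
GGCATCGATCGATT
EOF

# Every record starts with ">", so counting those lines counts the sequences.
grep -c '^>' "$fasta_demo"

# List only the sequence identifiers, without the ">" symbol.
grep '^>' "$fasta_demo" | sed 's/^>//'
```

The same `grep -c '^>'` command can be run on query_sample.fa to see how many reads it contains.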
Running blast on the HPC
We use a PBS Pro submission script to submit jobs to the HPC cluster. Create a file called ‘launch_blastN.pbs’ and fill it with the content below, replacing email@host with your email address and the QUERY and REFERENCE file names with the files you want to pass to blastn:
#!/bin/bash -l
#PBS -N blastN
#PBS -l walltime=10:00:00
#PBS -l mem=8gb
#PBS -l ncpus=4
#PBS -m bae
#PBS -M email@host
#PBS -j oe
cd $PBS_O_WORKDIR
#define variables, for example the names of the fasta files to use. Note: the file suffix can be .fa, .fasta or other.
QUERY=query_sample.fa
REFERENCE=MsiMV_genome.fasta
EVALUE=1e-10
#run blastn search
blastn -query $QUERY \
-subject $REFERENCE \
-out blastN_${QUERY}_vs_${REFERENCE}.out \
-outfmt 6 \
-evalue $EVALUE \
-num_threads 4
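The script above builds the output file name from the QUERY and REFERENCE variables, so the file extensions end up inside the name. The sketch below prints that name and shows an optional variant (not part of the original script) that strips the extensions with shell parameter expansion.

```shell
QUERY=query_sample.fa
REFERENCE=MsiMV_genome.fasta

# Name produced by the script above (file extensions are kept).
echo "blastN_${QUERY}_vs_${REFERENCE}.out"

# Optional variant: ${VAR%.*} removes the shortest trailing ".suffix".
OUT="blastN_${QUERY%.*}_vs_${REFERENCE%.*}.out"
echo "$OUT"
```

The first `echo` prints blastN_query_sample.fa_vs_MsiMV_genome.fasta.out and the second prints blastN_query_sample_vs_MsiMV_genome.out.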
Submit the job:
qsub launch_blastN.pbs
Check the progress of the submitted job:
qjobs
#alternatively use:
qstat -u USERNAME
#check load of queues
qstat -q
How to interpret the result? Check this tutorial.
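As a starting point for reading the output: with -outfmt 6 and no custom column list, BLAST+ writes one hit per line with 12 tab-separated columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The sketch below filters hits by percent identity using `awk`. The two result rows are made up for illustration, and the 95% threshold is an arbitrary assumption.

```shell
# Two made-up result rows in the default 12-column tabular layout (values are hypothetical).
blast_demo=$(mktemp)
printf 'read_1\tMsiMV\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-95\t350\n' >  "$blast_demo"
printf 'read_2\tMsiMV\t87.00\t150\t19\t1\t1\t150\t1200\t1349\t2e-40\t160\n' >> "$blast_demo"

# Keep hits with at least 95% identity (column 3) and
# print the query name, percent identity and e-value.
awk -F'\t' '$3 >= 95 {print $1, $3, $11}' "$blast_demo"
```

On the real output, replace "$blast_demo" with the file written by the job, for example blastN_query_sample.fa_vs_MsiMV_genome.fasta.out.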