https://mediahub.qut.edu.au/media/t/0_d0bsv333
Requesting Resources
Before you can run your software analysis on the HPC, suitable resources must be reserved for you. You must determine the resources needed to run your software before you submit your job. If you do not request enough CPUs, your analysis may run slower than intended. If you do not request enough memory, the PBS scheduler will kill your job when it tries to allocate more than the limit. The PBS scheduler will also kill your job if it has not finished within the time (walltime) you requested.
This can be a chicken-and-egg situation, as you may not know what resources you need to request. At the end of your job, PBS will create a summary of the resources you used. You can use this summary to tune your job requests. Asking for more resources than you need helps ensure your job finishes successfully, but the more resources you ask for, the longer it usually takes before your job starts. So if you overestimate, use the summary to lower your request for the next job. If you have already run the analysis on your laptop, the laptop's specifications are a good place to start, typically 4 CPUs and 16gb of memory.
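As a rough sketch, the laptop-sized starting point above can be written down once and adjusted after each run. The variable names, the two-hour walltime, and the script name myjob.pbs are illustrative assumptions; the snippet only prints the qsub command so you can check it before running it yourself.

```shell
# Starting point taken from a laptop-sized run: 4 CPUs and 16gb of memory.
NCPUS=4
MEM=16gb
# Assumed walltime, for illustration only; adjust it after reading the job summary.
WALLTIME=02:00:00

# Build the resource request in the select=... form used by qsub.
RESOURCES="select=1:ncpus=${NCPUS}:mem=${MEM}"

# Print the command instead of running it (myjob.pbs is a hypothetical script name).
SUBMIT_CMD="qsub -l ${RESOURCES} -l walltime=${WALLTIME} myjob.pbs"
echo "${SUBMIT_CMD}"
```

If the summary from the first run shows far less memory or time used than requested, lower MEM or WALLTIME and print the command again.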
Types of Resources
PBS tracks the following resources which you can request in your job:
...
ngpus: Select one or more GPUs for your job. Only select if you know your software can use a GPU
cputype: Select the type of CPU you want to use. You can use Intel CPUs or AMD CPUs
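A minimal sketch of a job script that asks for a GPU is shown here. The CPU, memory and walltime values are illustrative, and the check assumes NVIDIA GPUs with the nvidia-smi tool installed on the GPU nodes. The accepted values for cputype are site-specific and are not shown.

```shell
#!/bin/bash -l
# One chunk with 4 CPUs, 16gb of memory and 1 GPU (sizes are illustrative).
#PBS -l select=1:ncpus=4:mem=16gb:ngpus=1
#PBS -l walltime=01:00:00

# Report whether GPU tools are visible.
# Assumption: nvidia-smi is installed on NVIDIA GPU nodes only.
if command -v nvidia-smi >/dev/null 2>&1; then
    GPU_STATUS="gpu tools found"
    nvidia-smi
else
    GPU_STATUS="no gpu tools found"
    echo "nvidia-smi not found on $(uname -n)"
fi
```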
Types of Jobs
When running software on the HPC, we do NOT run the apps on the Login Node. Since the Login node is shared amongst all the connected HPC users, we don’t want you slowing everyone else down, and we don’t want others slowing you down. To run any software, we need to submit a job.
...
Sometimes it may be difficult to find the right software that will run on the HPC, or you might need to experiment with command line options. An Interactive job can help here. When you submit an Interactive job, your terminal stops accepting input until the job is allocated to a node and starts. When it starts, you will be transferred to the node. This session is not shared, so you can run apps without affecting others.
Helpful Tools
The command to submit jobs is qsub. qsub has many options, and you can find out about all of them in qsub’s man page (man qsub).
To check on the status of your jobs, use the qstat command. Another tool written by QUT HPC staff is qjobs.
Launching an Interactive job
To launch an Interactive job, we need to supply the necessary options to qsub. Let's launch a small Interactive job, with 1 CPU and 1gb of memory for 3 hours.
Code Block |
---|
qsub -I -S /bin/bash -l select=1:ncpus=1:mem=1gb -l walltime=03:00:00 |
You will see:
Code Block |
---|
qsub: waiting for job {job id} to start
qsub: job {job id} ready
{username}@{node}:~> |
You are now connected to the node assigned to your job and you can run commands. Notice how the server name (after the @ symbol) has changed.
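Once the prompt changes, a couple of quick commands can confirm where you are. This is a small sketch: PBS_JOBID is an environment variable PBS sets inside a job, and the fallback text is only there so the snippet also runs outside a job.

```shell
# Name of the machine you are on; inside the job this is the compute node.
hostname

# PBS sets PBS_JOBID inside a job; the fallback text is printed anywhere else.
JOB_LABEL="${PBS_JOBID:-not inside a PBS job}"
echo "Job id: ${JOB_LABEL}"

# To leave the interactive session (and end the job), type: exit
```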
Launching a Batch Job - Constructing a Job Script
While it is possible to submit a batch job completely from the command line, saving the job parameters and commands in a text file is very handy for documenting your use of the HPC. In a job script, lines starting with #PBS provide instructions to PBS; they are not run as commands. A small script is:
Code Block |
---|
#!/bin/bash -l
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=00:10:00
echo $(hostname) |
This job will request one node (select=1), one CPU (ncpus=1) and 2gb of memory (mem=2gb), and run for a maximum of 10 minutes (walltime=00:10:00).
Notice how the options after #PBS are the same as the qsub command line?
This script is very basic. It runs the command hostname, which outputs the name of the computer the job is running on, then echoes that name to standard output.
While the name of the file is not important, I like to save my PBS job scripts as {name}.pbs to easily identify them in the file list. Use training01.pbs here.
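If you prefer not to open a text editor, one way to create the file is a shell here-document. This is a sketch of one option, not the only way to do it; it writes the same four lines shown above into training01.pbs in the current folder.

```shell
# Write the example job script to training01.pbs in the current folder.
# The quoted 'EOF' stops the shell expanding $(hostname) while writing the file.
cat > training01.pbs <<'EOF'
#!/bin/bash -l
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=00:10:00
echo $(hostname)
EOF

# Show the file to confirm it contains what we expect.
cat training01.pbs
```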
Launching a Batch Job - Submitting a Job Script
Since all the options are contained in the job script, the qsub line is short:
Code Block |
---|
qsub training01.pbs |
And you will see a job number printed on the screen. Use qjobs to check on the status of the job.
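Because qsub prints the job id, it can be captured in a variable for later use. The helper name submit_and_report is hypothetical and only wraps qsub; treat this as a sketch rather than a required step.

```shell
# Submit a job script and report the job id that qsub prints.
# submit_and_report is a hypothetical helper name, not a PBS command.
submit_and_report() {
    local script="$1"
    local jobid
    # qsub prints the new job id on standard output.
    jobid=$(qsub "$script") || return 1
    echo "Submitted ${script} as job ${jobid}"
}

# Usage on the HPC:
#   submit_and_report training01.pbs
```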
Checking on the Job Status
To quickly check on your jobs that are queued and running, use the qjobs command:
Code Block |
---|
qjobs |
You will get a summary of each queued job and the running ones. The finished ones are not displayed.
An alternative way to list your jobs:
Code Block |
---|
qstat -u $USER |
Get more details about a particular job:
Code Block |
---|
qstat -f {jobid} |
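The full output of qstat -f is long. A small filter can show just the requested and used resources, which is handy when tuning a request. The helper name job_resources is hypothetical; the Resource_List and resources_used attribute names are the ones PBS prints in its full job listing, and usage figures normally appear only once the job has started.

```shell
# Show only the requested and used resources for one job.
# job_resources is a hypothetical helper name; qstat -f supplies the data.
job_resources() {
    qstat -f "$1" | grep -E 'Resource_List|resources_used'
}

# Usage on the HPC, replacing {jobid} with your job id:
#   job_resources {jobid}
```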
Checking the Output
Since we told the job to print the name of the node the job was running on, how do we see it? PBS will save the output of the commands run in the job into two files by default. The format is {job name}.o{job id} and {job name}.e{job id}
Let's examine these files:
Code Block |
---|
# find the files by listing the contents of the folder sorted by reverse date
ls -ltr
# the 'e' file is empty
cat training01.pbs.o{tab}
cl4n018
PBS Job 5228698.pbs
CPU time : 00:00:00
Wall time : 00:00:02
Mem usage : 0b |
We can see that, in this case, the job ran on the cl4n018 node, used no measurable CPU time or memory, and lasted for 2 seconds. The two files hold the standard output and the error output of the commands. Further options let you rename these files or merge them into one.
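To pull just the summary lines out of an output file, a small grep filter can help. The helper name job_summary is hypothetical, and the labels it looks for are copied from the example output above, so adjust them if your summary is worded differently.

```shell
# Print only the PBS summary lines from one or more job output files.
# job_summary is a hypothetical helper; the labels come from the example output.
job_summary() {
    grep -E '^[[:space:]]*(CPU time|Wall time|Mem usage)' "$@"
}

# Usage, replacing {job id} with your job id:
#   job_summary training01.pbs.o{job id}
```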
More options in job scripts
We have just scratched the surface of what you can specify when you submit and run jobs. A few useful ones are:
Be notified about the job: Use the -m option to receive email, eg to be sent an email if the job is aborted (a), when it begins (b), and when it ends (e): #PBS -m abe
Give the job a name: To find your job in a long list, give it a meaningful name with the -N option: #PBS -N MyJob01
Merge the error file into the standard output file: #PBS -j oe
Override the email address: If you want to send the job notification email to another address, use the -M option, eg #PBS -M bob@bob.com
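Put together, a job script using these options might look like the sketch below. The email address is a placeholder and the resource values are copied from the earlier example. PBS_JOBNAME is an environment variable PBS sets inside a job; the fallback is only there so the script also runs outside PBS.

```shell
#!/bin/bash -l
#PBS -N MyJob01
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=00:10:00
# Merge the error file into the standard output file.
#PBS -j oe
# Send email on abort (a), begin (b) and end (e).
#PBS -m abe
# Placeholder address; replace it with your own.
#PBS -M you@example.com

# PBS_JOBNAME is set by PBS inside a job; the fallback is used anywhere else.
JOB_NAME="${PBS_JOBNAME:-MyJob01}"
echo "Job ${JOB_NAME} running on $(hostname)"
```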
Tricks and Tips
When the job starts, PBS will log on to the node as you, and your working directory will be your home folder. If your data is in a sub folder or in a shared folder, you can add this line to your job script to change to that folder automatically:
Code Block |
---|
cd $PBS_O_WORKDIR |
$PBS_O_WORKDIR is a special environment variable created by PBS. It is set to the folder where you ran the qsub command.
We can now run commands that use all of the resources we requested. Unlike on the Login Node, these resources are not shared with other users.
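In a batch job script, the change of folder usually sits right after the #PBS lines. The sketch below reuses the resource values from the earlier example. The fallback to the current folder is an addition for illustration, so the script can also be tried outside PBS where the variable is not set.

```shell
#!/bin/bash -l
#PBS -l select=1:ncpus=1:mem=2gb
#PBS -l walltime=00:10:00

# Move to the folder qsub was run from. Outside PBS the variable is unset,
# so this sketch falls back to the current folder. Stop if the folder is missing.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Commands below now see files relative to the submission folder.
WORKDIR="$(pwd)"
echo "Working in ${WORKDIR}"
ls
```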
We shall keep this job running for the next section.