The KU Community Cluster uses the TORQUE resource manager and the Moab Workload Manager to manage and schedule jobs.
Use the Moab msub command to submit non-interactive or interactive batch jobs for execution on the compute nodes.
NOTE: The maximum number of jobs a user can have submitted at one time is 5000.
Non-interactive jobs: To run a job in batch mode, first prepare a Moab job script that specifies the application you want to run and the resources required to run it, and then submit it to Moab with the msub command.
A submission script is simply a text file that contains your job parameters and the commands you wish to execute as part of your job. You can also load modules, set environment variables, or perform other tasks inside your submission script.
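As a minimal sketch of such a submission script, the snippet below writes a small job script and submits it only if msub is available. The file name hello_job.sh, the job name, and the resource values are illustrative assumptions, not site recommendations.

```shell
#!/bin/bash
# Write a minimal job script. The file name hello_job.sh is hypothetical.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#MSUB -N hello_job
#MSUB -q sixhour
#MSUB -l nodes=1:ppn=1,mem=1gb,walltime=00:10:00
#MSUB -j oe

# Commands to run as part of the job go below the #MSUB lines.
echo "Hello World from $(hostname)"
EOF

# Submit only where msub exists (i.e. on a cluster submit node).
if command -v msub >/dev/null 2>&1; then
    msub hello_job.sh
else
    echo "msub not found; hello_job.sh was written but not submitted"
fi
```

The quoted heredoc delimiter ('EOF') keeps $(hostname) from being expanded while the file is written, so it is evaluated on the compute node when the job runs.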
If you do not specify a queue, your job will run in your default queue. Run mystats to see your default queue and the resources in those queues.
You may also submit simple jobs from the command line:
echo "echo Hello World!" | msub
Note: Command-line options will override msub flags in your job script.
Interactive jobs: An interactive job allows you to open a shell on the compute node itself, as if you had ssh'd into it. It is usually used for debugging.
To submit an interactive job, use the msub command with the -I flag (to specify an interactive job). Again, if you do not specify a queue, your job will run in your default queue.
msub -I -q sixhour -l nodes=1:ppn=4,mem=4gb,walltime=4:00:00
In the example above, the following was requested:
-I: Run an interactive job
-q sixhour: Run in the sixhour queue
-l nodes=1:ppn=4,mem=4gb,walltime=4:00:00: Indicates the job requires 1 node, 4 processors per node, 4 GB of memory, and 4 hours of wall-clock time
-I, -q, and -l are called flags.
If you have ssh'd to the submit nodes with X11 forwarding enabled and wish to have X11 for an interactive job, then supply the -X flag:
msub -I -X -q sixhour -l nodes=1:ppn=4,mem=4gb,walltime=4:00:00
To run a job in batch mode on a high-performance computing system using Moab, first prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to Moab using the msub command.
A very basic job script might contain just a tcsh shell script. However, Moab job scripts most commonly contain at least one executable command preceded by a list of directives that specify resources and other attributes needed to execute the command (e.g., wall-clock time, the number of nodes and processors, and filenames for job output and errors). These directives are listed in flag lines (lines beginning with #MSUB), which should precede any executable lines in your job script.
Additionally, your Moab job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.
- A Moab job script for an MPI job might look like this:
#!/bin/bash
#MSUB -l nodes=2:ppn=6:ib,walltime=30:00
#MSUB -M email@example.com
#MSUB -m abe
#MSUB -N JobName
#MSUB -j oe
mpirun ~/bin/binaryname
The last line in the example is the executable line. It tells the operating system to use the mpirun command to execute the ~/bin/binaryname binary. The scheduler knows that 2 nodes with 6 cores each were requested, so it runs the job on the nodes assigned to it using all 12 cores. You do not need to specify the old way of
Attributes are requested under the -l msub flag. Because the cluster is a consortium of hardware, attributes allow users to specify which type of node they wish to use (e.g. ib, gpu, intel).
#MSUB -l nodes=2:ppn=4:ib
#MSUB -l nodes=1:ppn=1:gpus=1
|ib||Nodes with InfiniBand connections|
||Nodes without Infiniband connections|
The bigmem, interactive, and gpu queues no longer exist.
Each owner group has its own queue (e.g. bi, compbio, crmda). You can view your default queue by running mystats.
Your default queue is the queue your job runs in if you do not specify a queue in your job submission. You can specify your default queue explicitly (for example, #MSUB -q bi) if you wish, but it is not necessary. You are not allowed to run in anybody else's queue.
Other than the owner group queues, there is a sixhour queue. This queue will allow your jobs to go across all nodes in the cluster, but is limited to a wall time of 6 hours.
To run in the sixhour queue, specify #MSUB -q sixhour in your job script.
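Because the sixhour queue is limited to 6 hours of wall time, it can help to check a walltime request before choosing that queue. The helpers below are a hypothetical sketch (walltime_seconds and fits_sixhour are not cluster tools); they assume walltime is written as [[HH:]MM:]SS.

```shell
#!/bin/bash
# Convert a walltime string such as 4:00:00 or 30:00 to seconds.
walltime_seconds() {
    local secs=0 field rest="$1"
    while [ -n "$rest" ]; do
        field=${rest%%:*}            # text before the first ':'
        case $rest in
            *:*) rest=${rest#*:} ;;  # drop that field and its ':'
            *)   rest= ;;            # last field consumed
        esac
        field=${field#0}             # "08" -> "8", avoids octal parsing
        secs=$(( secs * 60 + ${field:-0} ))
    done
    echo "$secs"
}

# Succeed if the requested walltime fits within the sixhour queue limit.
fits_sixhour() {
    [ "$(walltime_seconds "$1")" -le $(( 6 * 3600 )) ]
}

if fits_sixhour 4:00:00; then
    echo "4:00:00 fits in the sixhour queue"
fi
if ! fits_sixhour 8:00:00; then
    echo "8:00:00 needs an owner queue"
fi
```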
All flags below are prefixed with #MSUB. For example:
#MSUB -q sixhour
#MSUB -M firstname.lastname@example.org
#MSUB -m e
This is a brief list of the most commonly used msub flags.
|Flag||Description|
|-a||Execute the job only after the specified date and time|
|-d||The directory in which the job should begin executing|
|-e||Defines the file name to be used for stderr|
|-I||Run the job interactively|
|-j oe||Combine stdout and stderr into the same output file. If you want to give the combined file a name, specify the -o flag as well|
|-l||Resource request list. Request nodes, CPUs, memory, walltime, etc.|
|-m||Mail a job summary when the job aborts (a), begins (b), or ends (e)|
|-M||Email address you wish to send the job summaries to|
|-o||Defines the file name to be used for stdout|
|-q||Specify the queue for the job. If not specified, defaults to the user's default queue|
|-t||Starts a job array with the jobs in the index list. The limit variable specifies how many jobs may run at a time.|
|-V||Declares that all environment variables in your msub environment are exported to the job|
|-X||Forwards your X11 connection for an interactive job. Can only be used with -I|
The most important resources are nodes and cores. Attributes must be specified on the same line as nodes and cores, separated by a colon (:):
#MSUB -l nodes=5:ppn=20:gpus=1:ib
Everything else can either be separated with a comma (,) or put onto a new line with an additional #MSUB -l line.
#MSUB -l nodes=5:ppn=20,pmem=5gb,walltime=4:00:00
#MSUB -l nodes=5:ppn=20
#MSUB -l pmem=5gb
#MSUB -l walltime=4:00:00
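To sanity-check a request like the ones above, the total core count is the number of nodes multiplied by ppn. The helper below is a hypothetical sketch (total_cores is not a cluster command) that extracts both values from a -l resource string, assuming it starts with nodes= and contains ppn=.

```shell
#!/bin/bash
# Print nodes * ppn for a resource string such as nodes=5:ppn=20:ib
total_cores() {
    local spec="$1" nodes ppn
    nodes=${spec#nodes=}     # drop the leading "nodes="
    nodes=${nodes%%:*}       # keep the digits before the first ':'
    ppn=${spec#*ppn=}        # drop everything up to and including "ppn="
    ppn=${ppn%%[:,]*}        # keep the digits before the next ':' or ','
    echo $(( nodes * ppn ))
}

total_cores "nodes=5:ppn=20:gpus=1:ib"          # prints 100
total_cores "nodes=2:ppn=6:ib,walltime=30:00"   # prints 12
```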
Defaults: If you do not specify mem, you will receive the default settings.
walltime=8:00:00 for owner queues and the max is
walltime=1:00:00 (1 hour) for the
Below are the most commonly used resources. See attributes for all the attributes you can specify. All examples are prefixed with #MSUB -l.
|Resource||Description|
|walltime||Wall-clock time. How long the job is allowed to run before being terminated|
|nodes, ppn||Total number of nodes and processors per node you wish your job to run on. The total number of cores equals nodes multiplied by ppn.|
|mem||Memory allocated for the entire job. If the job runs over several nodes or CPUs, this memory is divided equally among them. Only 97% of total memory may be requested.|
|pmem||Memory allocated per task. If running an MPI job, this is the memory allocated for each task.|
Memory may be allocated in bytes (no suffix), kilobytes ("k"), megabytes ("m"), or gigabytes ("g").
As stated above for the mem resource request, the total amount of memory you may request is 97% of the total memory of the node. This limit prevents processes from using 100% of the node's memory, which would starve essential services on the node and cause it to crash.
The same limit applies to pmem. Multiply the number of cores requested per node by your pmem value and make sure the result does not exceed the allowed limit.
|Total amount of memory on node||Amount allowed to request|
|32 GB||30 GB|
|64 GB||60 GB|
|128 GB||122 GB|
|256 GB||244 GB|
|512 GB||489 GB|
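The pmem check described above is simple arithmetic: cores per node times pmem must not exceed the allowed amount for the node type. The function below is a hypothetical sketch (check_pmem is not a cluster command); the 122 GB limit in the usage lines is the table entry for a 128 GB node.

```shell
#!/bin/bash
# check_pmem PPN PMEM_GB ALLOWED_GB
# Succeeds if PPN * PMEM_GB stays within ALLOWED_GB, fails otherwise.
check_pmem() {
    local ppn="$1" pmem_gb="$2" allowed_gb="$3" total
    total=$(( ppn * pmem_gb ))
    if [ "$total" -le "$allowed_gb" ]; then
        echo "ok: ${ppn} x ${pmem_gb} GB = ${total} GB (limit ${allowed_gb} GB)"
    else
        echo "too large: ${ppn} x ${pmem_gb} GB = ${total} GB (limit ${allowed_gb} GB)"
        return 1
    fi
}

check_pmem 20 5 122          # ok: 20 x 5 GB = 100 GB (limit 122 GB)
check_pmem 20 8 122 || true  # too large: 20 x 8 GB = 160 GB (limit 122 GB)
```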
Below are some common, useful Moab commands:
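|Command||Description|
|showq||Display the jobs in the Moab job queue. (Jobs may be in a number of states; "running" and "idle" are the most common.)|
|showq -u username||Display the jobs submitted by the specified user (username)|
|showq -w class=queue||Display the jobs in the specified queue|
|checkjob jobid||Check the status of a job (jobid)|
|showstart jobid||Show an estimate of when your job (jobid) will start|
|checknode nodename||Check the status of a node (nodename)|
|canceljob jobid||Cancel a job.|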
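As a sketch of how submission and monitoring fit together, the function below submits a script and then queries the new job. It assumes the standard Moab client commands msub, checkjob, and showstart are on your PATH and that msub prints the new job ID on standard output; submit_and_check is a hypothetical helper name.

```shell
#!/bin/bash
# Submit a job script, then report its status and estimated start time.
submit_and_check() {
    local script="$1" jobid
    if ! command -v msub >/dev/null 2>&1; then
        echo "msub not found; run this on a cluster submit node" >&2
        return 1
    fi
    # Strip the blank lines and whitespace around the job ID.
    jobid=$(msub "$script" | tr -d '[:space:]')
    if [ -z "$jobid" ]; then
        echo "submission failed" >&2
        return 1
    fi
    echo "submitted job $jobid"
    checkjob "$jobid"     # current status of the job
    showstart "$jobid"    # estimated start time
}
```

On a submit node it would be called as submit_and_check myjob.sh, where myjob.sh is your own job script.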
Viewpoint is a web portal that allows you to submit jobs, build application templates, view job, node, and queue details, manage files, and use remote visualization. It is accessible to any user who has access to the KU Community Cluster.
Log in with your KU Online ID. After logging in, you will see all jobs you have running and all jobs that you have run in the past. On the right, you can see usage of the entire cluster.
- Submit Jobs:
- Application Templates:
- Job and Node Details: You are able to see various details of all your current and past jobs. By clicking on the node, you can see how many processors, how much memory, what features it has, and what jobs are currently running on it.
- Job: On the WORKLOAD tab, click on the
From here, you can see how much CPU your job is using, what node it is running on, and the resources you requested.
- File Manager:
The Remote Desktop application being used is called RemoteViz. This allows a job to be submitted through the Viewpoint web portal and run an interactive graphical session.
We only have 10 licenses, meaning only 10 people can have a RemoteViz session open at a time.
Default walltime: 7 days
To submit a RemoteViz job, you must first be signed into Viewpoint.
Either on the
Remote Viz Application
Geometry you wish for the window to be
A new page will load. In the top right you will see a screenshot of what the session looks like. Click on the Play emblem. You must allow pop-ups.
- Gnome Desktop: The Gnome Desktop will give you the normal Linux desktop where you can open multiple GUI applications side-by-side.
- Graphical Terminal: This opens an Xterm session, which is much like a terminal window with X11 enabled. You can launch applications from the command line, like firefox, but are limited to one application at a time.
Suspend and Resume Session
You can leave the session running by closing the browser window. This will leave the RemoteViz job running in the queue and leave the desktop just as you have it.
To resume a suspended session, simply go to the Viewpoint web portal, sign in, and then click on the Job ID of the Remote_viz_job in the Workload screen, then click the
- Log Out: On the desktop screen, go to System in the top toolbar, and then click Log Out "KU ID"...
- Viewpoint: You can cancel the job from the Viewpoint web portal by going to the Job Details page, and clicking the red
- Command line: A RemoteViz session is a job like any other job you submit. Use qdel "Job ID" to cancel the Remote_viz_job.