Submitting Jobs

The KU Community Cluster uses the TORQUE resource manager and the Moab Workload Manager to manage and schedule jobs.

Use the Moab msub command to submit non-interactive or interactive batch jobs for execution on the compute nodes:

NOTE: A user can have at most 5000 jobs submitted at one time.

  • Non-interactive jobs: To run a job in batch mode, first prepare a Moab job script that specifies the application you want to run and the resources required to run it, and then submit it to Moab with the msub command.

    A submission script is simply a text file that contains your job parameters and the commands you wish to execute as part of your job. You can also load modules, set environment variables, or perform other tasks inside your submission script.

    If you do not specify a queue, your job will run in your default queue. Run mystats to see your default queue and the resources available in your queues.

    msub example.sh

    You may also submit simple jobs from the command line:

    echo "echo Hello World!" | msub

    Note: Command-line options will override msub flags in your job script.

  • Interactive jobs: An interactive job allows you to open a shell on the compute node itself as if you had ssh'd into it. It is usually used for debugging purposes.

    To submit an interactive job, use msub with the -I flag (which specifies an interactive job). Again, if you do not specify a queue, your job will run in your default queue.

    msub -I -q sixhour -l nodes=1:ppn=4,mem=4gb,walltime=4:00:00

    The example above requests:

    • -I Run an interactive job
    • -q sixhour Run in the sixhour queue
    • -l nodes=1:ppn=4,mem=4gb,walltime=4:00:00 Request 1 node, 4 processors per node, 4GB of memory, and 4 hours of wall-clock time

    The -I, -q, and -l options are called flags.

    If you have ssh'd to the submit nodes with X11 forwarding enabled and want X11 in an interactive job, add the -X flag:

    msub -I -X -q sixhour -l nodes=1:ppn=4,mem=4gb,walltime=4:00:00

Submission Script

To run a job in batch mode on a high-performance computing system using Moab, first prepare a job script that specifies the application you want to run and the resources required to run it, and then submit the script to Moab using the msub command.

A very basic job script might contain just a bash or tcsh shell script. However, Moab job scripts most commonly contain at least one executable command preceded by a list of directives that specify resources and other attributes needed to execute the command (e.g., wall-clock time, the number of nodes and processors, and filenames for job output and errors). These directives are listed in flag lines (lines beginning with #MSUB), which should precede any executable lines in your job script.

Additionally, your Moab job script (which will be executed under your preferred login shell) should begin with a line that specifies the command interpreter under which it should run.

For example:

  • A Moab job script for an MPI job might look like this:
     #!/bin/bash 
     #MSUB -l nodes=2:ppn=6:ib,walltime=30:00
     #MSUB -M jayhawk@ku.edu
     #MSUB -m abe 
     #MSUB -N JobName 
     #MSUB -j oe 
    
     mpirun ~/bin/binaryname
    

    In the above example, the first line indicates that the script should be read using the bash command interpreter. Then, several MSUB Flags and Node Attributes are included.

    The last line in the example is the executable line. It tells the operating system to use the mpirun command to execute the ~/bin/binaryname binary. Because the scheduler knows that 2 nodes with 6 cores each were requested, mpirun runs on the nodes assigned to the job using all 12 cores. You do not need to specify the core count the old way with -np.
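The 12 cores in the example come from multiplying nodes by ppn. As a minimal sketch of that arithmetic, the function below reads the two values out of a resource string. total_cores is a hypothetical helper written for illustration, not a Moab or TORQUE command, and it assumes the nodes= and ppn= fields are written the way they appear in this document.

```shell
#!/bin/bash
# Hypothetical helper (not part of Moab): total cores for a -l resource string.
# Missing fields fall back to 1, the documented default for nodes and ppn.
total_cores() {
    local spec="$1" nodes=1 ppn=1 field
    local IFS=':,'                  # fields are separated by ':' or ','
    for field in $spec; do
        case "$field" in
            nodes=*) nodes="${field#nodes=}" ;;
            ppn=*)   ppn="${field#ppn=}" ;;
        esac
    done
    echo "$(( nodes * ppn ))"
}

total_cores nodes=2:ppn=6:ib        # prints 12
```

Running it on the request from the MPI example gives 12, which is the number of cores mpirun uses without -np.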

Node Attributes

Attributes are requested under the -l msub flag. Because the cluster is a consortium of hardware, attributes allow you to specify which type of node you wish to use (e.g. ib, gpu, intel).

#MSUB -l nodes=2:ppn=4:ib
#MSUB -l nodes=1:ppn=1:gpus=1

Attribute   Description
:intel      Intel CPUs
:amd        AMD CPUs
:ib         Nodes with InfiniBand connections
:noib       Nodes without InfiniBand connections
:gpus=X     Request X GPUs
:mics=X     Request X Phi coprocessors (Many Integrated Core, MIC)

Queues

The bigmem, interactive, and gpu queues no longer exist.

Each owner group has its own queue (e.g. bi, compbio, crmda). You can view your default queue by running mystats.

Your default queue is the queue your job runs in if you do not specify a queue in your job submission. You may specify your default queue explicitly (for example, #MSUB -q bi), but it is not necessary. You are not allowed to run in anybody else's queue.

Six Hour

Other than the owner group queues, there is a sixhour queue. Jobs in this queue can run on any node in the cluster, but are limited to a wall time of 6 hours.

To run in the sixhour queue, specify #MSUB -q sixhour in your job script.
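Before choosing the sixhour queue, it can help to confirm that the requested walltime fits within the 6-hour limit. The sketch below is one way to check; walltime_seconds and fits_sixhour are hypothetical helpers, not Moab commands, and they assume walltime is written as [[DD:]HH:]MM:SS, the forms used in this document.

```shell
#!/bin/bash
# Hypothetical helper: convert a walltime string to seconds.
walltime_seconds() {
    local IFS=':' total=0 part
    local -a parts
    read -r -a parts <<< "$1"
    # Units counted from the rightmost field: seconds, minutes, hours, days.
    local -a units=(1 60 3600 86400)
    local i n=${#parts[@]}
    for (( i = 0; i < n; i++ )); do
        part="${parts[n-1-i]}"
        total=$(( total + 10#$part * units[i] ))   # 10# avoids octal parsing of "08"
    done
    echo "$total"
}

# Succeeds when the walltime is at most 6 hours.
fits_sixhour() {
    [ "$(walltime_seconds "$1")" -le $(( 6 * 3600 )) ]
}

fits_sixhour 4:00:00 && echo "4:00:00 fits in the sixhour queue"
```

A request such as walltime=8:00:00 fails this check and would need an owner group queue instead.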

Msub Flags

All flags below are prefixed with #MSUB. For example:

#MSUB -q sixhour
#MSUB -M jayhawk@ku.edu
#MSUB -m e

This is a brief list of the most commonly used msub flags.

Flag                              Function
-a <date_time>                    Execute the job only after the specified date and time (<date_time>)
-d <path>                         The directory in which the job should begin executing
-e <filename>                     Defines the file name to be used for stderr
-I                                Run the job interactively
-j oe                             Combine stdout and stderr into the same output file. To name the combined file, also specify the -o <path> flag
-l                                Resource request list: nodes, CPUs, memory, walltime, etc.
-m abe                            Mail a job summary when the job aborts (a), begins (b), or ends (e)
-M <email_address>                Email address to send the job summaries to
-o <filename>                     Defines the file name to be used for stdout
-q <queue_name>                   Specify the queue for the job. If not specified, defaults to the user's default queue
-t <name>[<indexlist>]%<limit>    Starts a job array with the jobs in the index list. The limit variable specifies how many jobs may run at a time.
-V                                Export all environment variables in your current environment to the batch job
-X                                Forwards your X11 connection for an interactive job. Can only be used with -I

List of all flags for msub

Resource Request (-l)

The most important resources are nodes and cores. Attributes must be specified on the same line as nodes and cores, separated by a colon (:).

#MSUB -l nodes=5:ppn=20:gpus=1:ib

Everything else can either be separated with a comma (,) or placed on a new line with an additional #MSUB -l. The one-line request and the three-line request below are equivalent:

#MSUB -l nodes=5:ppn=20,pmem=5gb,walltime=4:00:00
#MSUB -l nodes=5:ppn=20
#MSUB -l pmem=5gb
#MSUB -l walltime=4:00:00

  • Defaults: If you do not specify a resource, you will receive its default setting.

    • nodes=1
    • ppn=1
    • pmem=2gb
    • walltime=8:00:00 (8 hours) for owner queues; the maximum is 60:00:00:00 (60 days).
    • walltime=1:00:00 (1 hour) for the sixhour queue.

Below are the most commonly used resource requests. See attributes for all the attributes you can specify. All examples are prefixed with #MSUB -l.

Example              Resource
walltime=HH:MM:SS    Wall-clock time. How long the job may run before being terminated
nodes=X:ppn=Y        Number of nodes and processors per node you wish your job to run on. The total number of cores equals nodes multiplied by ppn.
mem=X                Memory allocated for the entire job. If the job runs over several nodes or CPUs, this memory is divided equally among them. Only 97% of total memory may be requested.
pmem=X               Memory allocated per task. For an MPI job, this is the memory allocated for each task.

Memory may be allocated in bytes (no suffix), kilobytes ("k"), megabytes ("m"), or gigabytes ("g").
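To compare memory requests written with different suffixes, it can be convenient to normalize them to bytes. The sketch below is illustrative only: mem_to_bytes is a hypothetical helper, and it assumes 1024-based units and that a trailing "b" (as in 4gb from the examples above) is equivalent to the bare suffix. Neither assumption is stated in this document.

```shell
#!/bin/bash
# Hypothetical helper: convert a memory request such as 4gb, 512m, or 2048 to bytes.
# Assumes 1024-based units; check the scheduler documentation before relying on it.
mem_to_bytes() {
    local value
    value="$(printf '%s' "$1" | tr 'A-Z' 'a-z')"   # accept 4GB as well as 4gb
    value="${value%b}"                             # treat "gb" like "g"
    case "$value" in
        *k) echo $(( ${value%k} * 1024 )) ;;
        *m) echo $(( ${value%m} * 1024 * 1024 )) ;;
        *g) echo $(( ${value%g} * 1024 * 1024 * 1024 )) ;;
        *)  echo "$value" ;;                       # no suffix: already bytes
    esac
}

mem_to_bytes 4gb    # prints 4294967296
```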

Memory Limits

As stated above for the mem resource request, the total amount of memory you may request is 97% of the total memory for that node. This limit prevents processes from using 100% of the node's memory, which would starve the necessary services on the node and cause the node to crash.

The same limit applies to pmem. Multiply the number of cores requested per node by your pmem value and make sure the result does not exceed the allowed limit.

Total amount of memory on node    Amount allowed to request
32 GB                             30 GB
64 GB                             60 GB
128 GB                            122 GB
256 GB                            244 GB
512 GB                            489 GB
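The pmem check described above (cores per node multiplied by pmem, compared against the node's limit) can be sketched as a small lookup. max_request_gb and pmem_fits are hypothetical helpers, not cluster commands. The limits are copied from the table, and the sketch assumes pmem is given in whole gigabytes.

```shell
#!/bin/bash
# Hypothetical helper: requestable memory (GB) for a node size, from the table above.
max_request_gb() {
    case "$1" in
        32)  echo 30 ;;
        64)  echo 60 ;;
        128) echo 122 ;;
        256) echo 244 ;;
        512) echo 489 ;;
        *)   return 1 ;;            # node size not listed in the table
    esac
}

# Succeeds when ppn * pmem (GB) fits within the limit for the node size.
# Usage: pmem_fits <node_total_gb> <ppn> <pmem_gb>
pmem_fits() {
    local node_gb="$1" ppn="$2" pmem_gb="$3" limit
    limit="$(max_request_gb "$node_gb")" || return 1
    [ $(( ppn * pmem_gb )) -le "$limit" ]
}

pmem_fits 128 20 5 && echo "ppn=20,pmem=5gb fits on a 128 GB node"
```

With ppn=20 and pmem=5gb the per-node request is 100 GB, which fits on a 128 GB node (limit 122 GB) but not on a 64 GB node (limit 60 GB).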

Moab Commands

Below are some common, useful Moab commands:

Moab Command              Function
showq                     Display the jobs in the Moab job queue. (Jobs may be in a number of states; "running" and "idle" are the most common.)
showq -u <username>       Display the jobs submitted by the specified <username>
showq -w class=<queue>    Display the jobs in the specified <queue>. (Will not show jobs running in the sixhour queue that may be running on nodes in your queue)
checkjob <jobid>          Check the status of a job (<jobid>). For verbose mode, add -v (e.g., checkjob -v <jobid>).
showstart <jobid>         Show an estimate of when your job (<jobid>) might start.
checknode <node_name>     Check the status of a node (<node_name>).
canceljob <jobid>         Cancel a job.

Viewpoint

Viewpoint is a web portal that lets you submit jobs, build application templates, view job, node, and queue details, manage files, and use remote visualization. It is accessible to any user who has access to the KU Community Cluster.

Log in with your KU Online ID. After logging in, you will see all jobs you have running and all jobs that you have run in the past. On the right, you can see usage of the entire cluster.

  • Submit Jobs:
  • Application Templates:
  • Job and Node Details: You are able to see various details of all your current and past jobs. By clicking on the node, you can see how many processors, how much memory, what features it has, and what jobs are currently running on it.
    • Job: On the HOME or WORKLOAD tab, click on the Job ID number.

      From here, you can see how much CPU your job is using, what node it is running on, and the resources you requested.

    • Node:
  • File Manager:

GUI and Remote Desktop

The Remote Desktop application is called RemoteViz. It lets you submit a job through the Viewpoint web portal that runs an interactive graphical session.

We only have 10 licenses, meaning only 10 people can have a RemoteViz session open at a time.

  • Default walltime: 7 days

Open Session

To submit a RemoteViz job, you must first be signed into Viewpoint.

  • Either on the HOME or the WORKLOAD tab, select CREATE JOB

  • Select the Remote Viz Application

  • Select the Application and the Geometry (window size) you want

    A new page will load. In the top right you will see a screenshot of what the session looks like. Click on the Play emblem. You must allow pop-ups.

  • Gnome Desktop: The Gnome Desktop gives you the normal Linux desktop, where you can open multiple GUI applications side by side.
  • Graphical Terminal: This opens an Xterm session, which is much like a terminal window with X11 enabled. You can launch applications such as firefox from the command line, but are limited to one application at a time.

Suspend and Resume Session

You can leave the session running by closing the browser window. This leaves the RemoteViz job running in the queue and the desktop just as you left it.

To resume a suspended session, go to the Viewpoint web portal, sign in, click on the Job ID of the Remote_viz_job in the Workload screen, and then click the Play emblem.

Exiting Session

  • Log Out: On the desktop screen, go to System at the top toolbar, and then click Log Out "KU ID"...
  • Viewpoint: You can cancel the job from Viewpoint web portal by going to the Job Details page, and clicking the red X button.
  • Command line: A RemoteViz session is a job like any other job you submit. Use qdel "Job ID" to cancel the Remote_viz_job.

CRC Help

If you need any help with the cluster or have general questions related to the cluster, please contact crchelp@ku.edu.

In your email, please include your submission script, any relevant log files, and the steps you took to produce the problem.
