A job array is a collection of jobs that differ from each other by only a single index parameter. Creating a job array provides an easy way to group related jobs together. For example, if you have a parameter study that requires you to run your application five times, each with a different input parameter, you can use a job array instead of creating five separate MSUB scripts and submitting them separately.
The syntax for submitting job arrays is:
msub -t [<jobname>]<indexlist>[%<limit>] arrayscript.sh
The <jobname> and <limit> are optional.
To create a job array, use a single MSUB script and use the -t flag to specify a range for the index parameter, either on the msub command line or within your MSUB script. For example, if you submit the following script, a job array with five sub-jobs will be created:
#MSUB -l nodes=4:ppn=20:ib
#MSUB -l walltime=8:00:00
#MSUB -t 1-5

# The array index range above can start at any value, including 0

# For each sub-job, a value provided by the -t
# option is used for MOAB_JOBARRAYINDEX

mkdir dir.$MOAB_JOBARRAYINDEX
cd dir.$MOAB_JOBARRAYINDEX
mpiexec ../a.out < ../input.$MOAB_JOBARRAYINDEX
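To see what each sub-job of the script above would work with, the index substitution can be tried locally before anything is submitted. This is only a dry-run sketch: describe_subjob is a hypothetical helper, and the loop stands in for the values Moab would assign to MOAB_JOBARRAYINDEX for the range 1-5.

```shell
#!/bin/bash
# Local dry run only: nothing is submitted to Moab.
# describe_subjob is a hypothetical helper that prints the working directory
# and input file the script above would use for a given array index.
describe_subjob() {
    local index=$1
    echo "dir.$index ../input.$index"
}

# Imitate the five indices produced by "#MSUB -t 1-5".
for MOAB_JOBARRAYINDEX in 1 2 3 4 5; do
    echo "sub-job $MOAB_JOBARRAYINDEX: $(describe_subjob "$MOAB_JOBARRAYINDEX")"
done
```

A check like this makes it easy to confirm that every input.N file exists before the array is queued.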
Submitting the script to MOAB will return the parent MOAB_JOBID.
% msub job_array_script.sh
516540
Each sub-job in this job array will have a MOAB_JOBID that includes both the parent MOAB_JOBID and a unique MOAB_JOBARRAYINDEX value within the brackets.
516540[1]
516540[2]
516540[3]
516540[4]
516540[5]
To specify that only a certain number of sub-jobs in the array can run at a time, use the percent sign (%) delimiter. In this example, only five sub-jobs in the array can run at a time.
% msub -t myarray[1-1000]%5
To submit a specific set of array sub-jobs, use the comma delimiter in the array index list.
% msub -t myarray[1,2,3,4]
% msub -t myarray[1-5,7,10]
To submit a job with a step size, use a colon (:) in the array range and specify how many jobs to step. In the example below, a step size of 2 is requested. The sub-jobs will be numbered according to the step size inside the index limit.
% msub -t myarray[2-10:2] job.sh
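The index-list forms described above (ranges, comma-separated values, and stepped ranges) can be previewed locally. The sketch below uses expand_indexlist, a hypothetical helper that is not part of Moab; it takes only the part inside the brackets and assumes a stepped range counts up from its first value.

```shell
#!/bin/bash
# expand_indexlist is a hypothetical helper, not a Moab command.
# It prints, one per line, the indices described by a list such as
# "1-5,7,10" or "2-10:2" (the text inside the brackets of the -t argument).
expand_indexlist() {
    # Split on commas, then treat each piece as N, N-M, or N-M:S.
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r part; do
        step=1
        range=$part
        case $part in
            *:*) step=${part##*:}; range=${part%%:*} ;;
        esac
        case $range in
            *-*) first=${range%%-*}; last=${range##*-} ;;
            *)   first=$range; last=$range ;;
        esac
        seq "$first" "$step" "$last"
    done
}

# Join the indices onto one line for easier reading.
expand_indexlist "1-5,7,10" | paste -sd' ' -
expand_indexlist "2-10:2"   | paste -sd' ' -
```

Counting the lines the helper prints (for example with wc -l) gives the number of sub-jobs an index list would create under these assumptions.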
Since the KU Community Cluster utilizes MOAB job arrays, it is best to use the checkjob command to view the status of a job. The status of the sub-jobs is not displayed by default. For example, the following checkjob command shows the job array summary:
% checkjob 516540
job 516540

AName: job_array_script
Job Array Info:
  Name: 516540

  Sub-jobs:    5
    Active:    5 ( 100.0% )
    Eligible:  0 ( 0.0% )
    Blocked:   0 ( 0.0% )
    Completed: 0 ( 0.0% )
To check the status of the sub-jobs, use the -v parameter along with checkjob:
% checkjob -v 516540
job 516540

AName: job_array_script
Job Array Info:
  Name: 516540
  1 : 516540[1] : Running
  2 : 516540[2] : Running
  3 : 516540[3] : Running
  4 : 516540[4] : Running
  5 : 516540[5] : Running

  Sub-jobs:    5
    Active:    5 ( 100.0% )
    Eligible:  0 ( 0.0% )
    Blocked:   0 ( 0.0% )
    Completed: 0 ( 0.0% )
You can also check the sub-jobs individually by using the MOAB_JOBID of the array sub-job.
% checkjob 516540[1]
To delete a job array or a sub-job, use the canceljob command and specify the array or sub-job.
% canceljob 516540
% canceljob 516540[1]
Suppose you have a file listing paths to files on which you want to perform the same action. You could submit a single job that loops through the file on one node, or you could submit a job array whose index range matches the number of lines in the file, so that each sub-job handles one line. In this example, the file contains 1000 lines.
#MSUB -l nodes=1:ppn=20
#MSUB -l walltime=6:00:00
#MSUB -q sixhour
#MSUB -t [1-1000]

# Read the line of File.txt whose number matches this sub-job's index
LINE=$(sed -n "${MOAB_JOBARRAYINDEX}p" File.txt)
echo "$LINE"
call-program-name-here "$LINE"
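The line lookup in this script can be tried outside the scheduler. The following is a dry-run sketch under stated assumptions: the three paths are made up, a temporary file stands in for File.txt, line_for_index is a hypothetical helper, and MOAB_JOBARRAYINDEX is set by hand because Moab only provides it inside a real sub-job.

```shell
#!/bin/bash
# Local dry run: build a small stand-in for File.txt and read one line from it
# with the same sed idiom a sub-job would use. The paths are made up.
listfile=$(mktemp)
printf '%s\n' /data/a.dat /data/b.dat /data/c.dat > "$listfile"

# line_for_index is a hypothetical helper: print line number $1 of file $2.
line_for_index() {
    sed -n "${1}p" "$2"
}

MOAB_JOBARRAYINDEX=2   # set by hand here; Moab sets it inside a real sub-job
LINE=$(line_for_index "$MOAB_JOBARRAYINDEX" "$listfile")
echo "index $MOAB_JOBARRAYINDEX -> $LINE"

# The array range should end at the number of lines in the file.
TOTAL=$(wc -l < "$listfile" | tr -d ' ')
echo "array range: 1-$TOTAL"
rm -f "$listfile"
```

Note that sed needs the -n option here; without it, sed prints every line of the file and the selected line a second time.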
Now suppose the program needs two paths at a time: lines 1 and 2, then lines 3 and 4, then lines 5 and 6, and so on. Each sub-job computes its pair of line numbers from its array index.
#MSUB -l nodes=1:ppn=20
#MSUB -l walltime=6:00:00
#MSUB -q sixhour
#MSUB -t [1-500]

# Job array size will be half of the number of lines in the file
SECOND=$((MOAB_JOBARRAYINDEX*2))
FIRST=$((SECOND - 1))

# -n makes sed print only the requested line instead of the whole file
LINE1=$(sed -n "${FIRST}p" File.txt)
LINE2=$(sed -n "${SECOND}p" File.txt)

call-program-name-here "$LINE1" "$LINE2"
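The pairing arithmetic is easy to check by hand for a few indices. The sketch below is illustrative only; pair_for_index and pairs_needed are hypothetical helpers, and the rounding rule in pairs_needed is an assumption for files with an odd number of lines (the last sub-job would then receive an empty second line).

```shell
#!/bin/bash
# pair_for_index is a hypothetical helper: it prints the two line numbers
# that array index $1 is responsible for (2*index - 1 and 2*index).
pair_for_index() {
    local second=$(( $1 * 2 ))
    local first=$(( second - 1 ))
    echo "$first $second"
}

# pairs_needed is a hypothetical helper: it prints the array size for a file
# with $1 lines, rounding up so an odd final line still gets a sub-job.
pairs_needed() {
    echo $(( ($1 + 1) / 2 ))
}

for index in 1 2 500; do
    echo "index $index -> lines $(pair_for_index "$index")"
done
echo "1000 lines -> array range 1-$(pairs_needed 1000)"
```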
Now suppose the file has 30,000 lines and you want to process every line. If the program takes only about 30 seconds per line, there is little point in dedicating a node to a single line and then moving on to the next job. Instead, have each sub-job loop through 100 lines of the file: rather than 30,000 jobs that each process 1 line, you submit 300 jobs that each process 100 lines.
#MSUB -l nodes=1:ppn=20
#MSUB -l walltime=6:00:00
#MSUB -q sixhour
#MSUB -t [1-300]

# Job array size will be number of lines in file divided by
# number of lines chosen below
NUMLINES=100
STOP=$((MOAB_JOBARRAYINDEX*NUMLINES))
START=$((STOP - NUMLINES + 1))

echo "START=$START"
echo "STOP=$STOP"

for (( N = START; N <= STOP; N++ ))
do
    # -n makes sed print only line N instead of the whole file
    LINE=$(sed -n "${N}p" File.txt)
    call-program-name-here "$LINE"
done
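The chunk boundaries and the array size can be checked locally before submitting. The sketch below rests on several assumptions: a made-up 250-line file stands in for File.txt, chunk_lines and process_chunk are hypothetical helpers, and the array size is rounded up so a final partial chunk still gets a sub-job. It also reads each chunk with a single sed range rather than one sed call per line, which is one possible variation on the loop above.

```shell
#!/bin/bash
# Local dry run with a made-up 250-line file and 100 lines per sub-job.
listfile=$(mktemp)
seq 1 250 | sed 's/^/path-/' > "$listfile"

NUMLINES=100
TOTAL=$(wc -l < "$listfile" | tr -d ' ')
# Round up so a final partial chunk still gets its own sub-job.
ARRAYSIZE=$(( (TOTAL + NUMLINES - 1) / NUMLINES ))
echo "array range: 1-$ARRAYSIZE"

# chunk_lines (hypothetical): print the lines owned by array index $1.
chunk_lines() {
    stop=$(( $1 * NUMLINES ))
    start=$(( stop - NUMLINES + 1 ))
    # One sed call prints the whole chunk; a range past end-of-file is harmless.
    sed -n "${start},${stop}p" "$listfile"
}

# process_chunk (hypothetical): loop over a chunk and report how many lines
# were handled. A real job would run its program on "$LINE" inside the loop.
process_chunk() {
    chunk_lines "$1" | {
        count=0
        while IFS= read -r LINE; do
            count=$(( count + 1 ))
        done
        echo "$count"
    }
}

MOAB_JOBARRAYINDEX=3   # set by hand here; Moab sets it inside a real sub-job
COUNT=$(process_chunk "$MOAB_JOBARRAYINDEX")
echo "index $MOAB_JOBARRAYINDEX handled $COUNT lines"
rm -f "$listfile"
```

With 250 lines and chunks of 100, the third sub-job owns lines 201 through 300 but only finds 50 of them, which is why the loop reads whatever the range returns instead of assuming a full chunk.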