Euler: Job Priority and Preemption
Overview §
Euler offers several priority levels when submitting jobs. This mechanism allows certain essential services, such as critical instruments and instructional computing, to run while still allowing researchers who contribute hardware to the cluster to use their machines as much as they need. This document aims to explain the different priority levels, who has access to them, and how to mitigate the inconvenience of being interrupted by a higher-priority job.
Priority Levels §
At the time of this writing, there are 5 priority levels for Euler jobs. They are, in descending order of priority: critical, Hardware Owner, instruction, research, and interactive.
critical §
The priority level with the highest numerical value is implemented only for a small number of systems on Euler's critical partition. It is reserved exclusively for pre-approved, time-sensitive jobs. Usually, these jobs run at regular intervals and process data coming from instruments, or they relay data to critical systems outside of Euler.
Hardware Owner §
The highest level priority available on most systems is made available only to the research lab which owns that particular hardware. This allows researchers to preempt low-priority jobs that are running on their hardware and quickly get access to their system when they need it.
instruction §
The instruction priority level is implemented only for a subset of systems and is only available on the Slurm partition of the same name. Access to the instruction partition is restricted to students enrolled in courses which have been approved for Euler access, and it is intended to be used for coursework conducted on Euler. This partition is limited to systems owned by CAE and a few other systems whose owners have been generous enough to share their hardware with students. The time limits available to instructional jobs are much lower than those afforded to research jobs.
research §
The most commonly used priority level on Euler is available on the general-purpose research partition. The partition contains all Euler nodes and is available to any researcher within the College of Engineering; as such, it is something of a "default" level.
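If you want to be explicit about where a job runs, it can be directed to this partition with the --partition flag; job.sh below is just a placeholder for your own batch script:

    sbatch --partition=research job.sh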
interactive §
For tasks which don't make efficient use of Euler resources, the interactive partition provides the lowest level of scheduling priority available on Euler. Low-priority jobs include those which run interactively (in that they wait for user input) and those which need to run for an indefinite amount of time and require manual cancellation when they are no longer in use. Such jobs may ONLY be scheduled on this partition. To ensure that Euler remains available for jobs which make the most of the available resources, interactive jobs may be preempted by any other type of job. As such, they should be used sparingly and at one's own risk.
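As a rough illustration, one common way to start an interactive session through Slurm is srun with a pseudo-terminal; the partition name follows the description above, and the time limit shown is only an example:

    # Request an interactive shell on the interactive partition for up to 2 hours.
    srun --partition=interactive --time=2:00:00 --pty bash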
Mitigating the Impacts of Preemption §
A job being interrupted before completion can mean the loss of hours or even days of computational time. This can be frustrating, so users understandably look for ways to avoid preemption. Unfortunately, no single measure can guarantee that a lower-priority job won't be preempted by a higher-priority one. Instead, we can suggest a few ways to mitigate the impacts of preemption as well as some ways to reduce the likelihood of interruption under normal use.
MITIGATION 1: Use Checkpointing §
One of the most effective mitigations for preemption is to implement some kind of checkpointing. Checkpointing works by having the job write its current state to disk at some regular interval (such as every hour or every N iterations). The job is then configured to always start from the latest saved state. When the job is interrupted, it only loses whatever progress was made since the last checkpoint, and it can restart from that checkpoint as soon as its resources are available again.
This approach has been used successfully by researchers who need access to CPU and GPU models which are in very high demand.
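As a rough sketch of what this can look like in a batch script, the example below resumes from the most recent checkpoint if one exists. The application name simulate and its --resume-from and --checkpoint-dir options are placeholders for whatever checkpointing mechanism your own code provides, and the partition and time limit are only examples:

    #!/bin/bash
    #SBATCH --partition=research      # example partition; use whichever fits your work
    #SBATCH --time=24:00:00
    #SBATCH --requeue                 # let Slurm return the job to the queue if it is preempted
    #SBATCH --open-mode=append        # keep appending to the same output file across restarts

    # Directory where the (hypothetical) application writes its checkpoints.
    CKPT_DIR="$SLURM_SUBMIT_DIR/checkpoints"
    mkdir -p "$CKPT_DIR"

    # Find the most recent checkpoint, if any.
    LATEST=$(ls -t "$CKPT_DIR"/state_*.ckpt 2>/dev/null | head -n 1)

    if [ -n "$LATEST" ]; then
        # A checkpoint exists: resume from it rather than starting over.
        srun ./simulate --resume-from "$LATEST" --checkpoint-dir "$CKPT_DIR"
    else
        # First run: start from scratch, writing checkpoints as the job progresses.
        srun ./simulate --checkpoint-dir "$CKPT_DIR"
    fi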
MITIGATION 2: Prefer Resources NOT in High Demand §
One benefit of Euler's heterogeneity is that not all cluster resources are equally useful to all researchers at any given time. Some hardware is in higher demand than the rest, and if your job is scheduled on it, it runs a higher risk of being preempted when the owner needs to use it. If your job doesn't strictly need those high-demand resources, you may benefit from telling Slurm to prefer nodes with different features. Two flags, --constraint and --prefer, can be used to influence scheduling decisions based on certain hardware features. On Euler, these "features" include information about a system's hardware such as the CPU manufacturer and generation, whether it has a GPU, the generation of that GPU, and whether it has an onboard SSD that can be used for scratch storage. Certain flags passed to the sinfo command can enumerate the available combinations of features; a list enumerated by partition can be obtained using sinfo -o "%P %f".
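For example (job.sh is a placeholder for your own batch script, and the feature name shown is only illustrative; use the values that sinfo reports on Euler):

    # List the feature tags advertised for each partition.
    sinfo -o "%P %f"

    # Prefer nodes with a given feature, but still allow others if none are free.
    sbatch --prefer=ssd job.sh

    # Require a feature: the job will only be scheduled on matching nodes.
    sbatch --constraint=ssd job.sh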
MITIGATION 3: Prevent Job Restart After Interruption §
By default, Euler's Slurm scheduler will requeue preempted jobs at the front of the queue so that they are able to restart as quickly as possible. If restarting a job after it has begun is undesirable, adding --no-requeue to your sbatch flags will ensure that your job is not returned to the queue after it is interrupted.
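For example, either on the sbatch command line or as a directive inside the batch script (job.sh is a placeholder for your own script):

    # On the command line:
    sbatch --no-requeue job.sh

    # Or inside the batch script:
    #SBATCH --no-requeue    # if preempted, do not return this job to the queue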
MITIGATION 4: Contribute Hardware §
The only way to fully prevent job preemption is to run your jobs at a higher priority level. Euler offers the highest level of priority on a node to the lab which owns that system. For PIs who are able to do so, contributing hardware to Euler is the most direct way to ensure that their researchers are able to get access to computing time when they need it.
If you are interested in this option, please note that Euler has certain minimum configuration requirements to ensure compatibility. Reach out to CAE to discuss your group's computational needs before purchasing computing hardware.