Using the SSCC Slurm Cluster

Slurm (Simple Linux Utility for Resource Management) is a powerful system for managing and scheduling computing jobs that is very popular at large-scale research computing centers. The SSCC is currently running a pilot with a small Slurm cluster as we all learn more about it, but we anticipate that it will become the main way SSCC researchers run large research computing jobs. This article will show you how to use the SSCC's Slurm cluster.

Slurm Quick Start 

To submit a job to Slurm, the basic syntax is:

ssubmit --cores=C --mem=Mg "command"

Where C is the number of cores or threads to use, M is the amount of memory to use in gigabytes, and command is the command you'd normally use to run the job directly on the server. Note that the job must be able to run in the background, but do not put an ampersand at the end of the command. For example, to run a Stata do file called my_do_file.do using 32 cores and 20GB of memory, use:

ssubmit --cores=32 --mem=20g "stata -b do my_do_file"

You can submit multi-node jobs to Slurm by specifying --nodes, in which case --cores and --mem specify the cores and memory to use on each node. You can submit to partitions other than the default with --partition (which can be abbreviated --part). 
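
For example, a two-node job sent to the short partition might look like this (the numbers are placeholders, and the command in quotes must be software that can actually use more than one node, such as an MPI program):

ssubmit --nodes=2 --cores=16 --mem=50g --part=short "command"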

You can check the status of your job and see what else is in the Slurm queue with squeue. You can cancel a job with scancel plus the job identifier, which you can get from squeue.
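
For example, if squeue shows your job's JOBID as 12345 (a made-up number), you could cancel it with:

scancel 12345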

Introducing Slurm

Slurm is a job management and scheduling utility for clusters of Linux servers. You submit jobs to it much as you would with HTCondor, but Slurm makes it easier to specify the resources you need, can run multi-server jobs, and makes it easier to set priorities.

The SSCC is currently running a small pilot Slurm cluster containing five servers. The goals of the pilot are:

  1. Ensure that most or all of the Linux research computing that SSCC members do can run through Slurm, and figure out how
  2. Help SSCC members learn to use Slurm
  3. Observe how SSCC members use Slurm and craft policies that successfully balance giving everyone access to computing resources and allowing heavy users to do resource-intensive work

All of these will be a work in progress during the pilot period, so expect some problems to arise.

Keep in mind that SSCC staff have limited expertise in some of the software used by SSCC researchers, including Matlab, Julia, FORTRAN, and software used for biomedical research. Slurm is heavily used for this kind of computing and we're confident solutions exist for most or all of the problems you may encounter, but we may not be able to help you find them. Send your questions to the Help Desk and we'll help as much as we can. Also send us your solutions, and we'll share them with others.

The Slurm Cluster

The pilot cluster currently consists of five servers (slurm001-slurm005) with 44 cores and 384GB of memory each. These servers were originally purchased by SMPH, so SMPH researchers have priority on them. We anticipate adding more servers to the cluster as the pilot continues.

Slurm Partitions

When you submit a job to Slurm, you submit it to a partition. Partitions can have different priorities, time limits, and other properties. While a job is only submitted to one partition, servers can belong to (take jobs from) multiple partitions. The SSCC's pilot Slurm cluster currently has the following partitions:

Partition Name   Servers             Max Job Length   Notes
sscc             slurm001-slurm003   7 days           Default partition
short            slurm001-slurm005   6 hours          slurm005 is reserved for short jobs
smph             slurm001-slurm003   7 days           Usable by SMPH researchers; preempts jobs in the sscc and short partitions
smph-short       slurm001-slurm005   6 hours          Usable by SMPH researchers; preempts jobs in the sscc and short partitions; slurm005 is reserved for short jobs

If you do not specify a partition your job will go to sscc. To specify a different partition use --partition or the abbreviation --part (e.g. --partition=short or --part=smph).

One of the goals of the pilot is to learn what partitions and partition settings will work well for SSCC's needs, so please tell us if the current settings don't work well for you. For example, we anticipate needing a partition for jobs that take longer than seven days. We will also evaluate the need for partitions containing only servers where no one has elevated priority and jobs are never preempted.

Slurm Priorities

Jobs submitted to high priority partitions (smph, smph-short) will preempt jobs submitted to normal priority partitions (sscc, short) if they cannot run otherwise. We hope this will be rare. Preempted jobs are put back in the queue to run again, but lose any unsaved progress. SMPH researchers have priority on all the servers in the pilot cluster because they were purchased by SMPH, but we anticipate moving servers purchased by the SSCC into the cluster in the near future.

Jobs submitted to partitions with equal priority will not preempt each other. In choosing which job to run next, Slurm first considers the amount of Slurm computing time each user has used recently (past use decays exponentially with a half life of one week), with users with less past usage getting higher priority. It then considers how long each job has been waiting in the queue, with jobs that have been waiting longer getting higher priority. 

Note that when people submit many jobs to Linstat it tries to run all of them at once, and the busier it gets the slower they all run. With Slurm, jobs always get the cores and memory they request and run at full speed, but if the cluster is busy they may need to wait their turn before they start. Both systems will get about the same amount of work done in a given amount of time, but with Slurm the jobs that run first finish sooner while the jobs that run last finish no later than they would have otherwise: a Pareto improvement.

Submitting Jobs

You can submit a job to Slurm using the ssubmit command. The general syntax is:

ssubmit resources "command"

where resources is a description of the computing resources your job needs, and command is the Linux command that will run your job.

The ssubmit command is just a wrapper written by SSCC staff for the standard Slurm sbatch command: it takes your command, turns it into a submit file, and then sends it to Slurm with sbatch. Advanced users are welcome to use sbatch directly; documentation can be found at https://slurm.schedmd.com/documentation.html. Note that the --cores switch used by ssubmit translates to --cpus-per-task in standard Slurm, and --partition must be spelled out.
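
For illustration, a submit file roughly equivalent to the Stata example above might look like the following. This is a sketch using standard sbatch syntax, not the exact file ssubmit generates:

#!/bin/bash
#SBATCH --partition=sscc
#SBATCH --cpus-per-task=32
#SBATCH --mem=20g

stata -b do my_do_file

You would then submit it with sbatch my_job.sh, where my_job.sh is whatever you named the file.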

Computing Resources

--partition or --part tells ssubmit which partition to send your job to. If it is not specified the job will be sent to the sscc partition.

--nodes tells ssubmit how many nodes (servers) to use. If it is not specified the job will use one node. If you use multiple nodes, then --cores and --mem specify the cores and memory to be used on each node.

--cores tells ssubmit how many logical cores your job needs. If it is not specified the job will get one core.

--mem tells ssubmit how much memory your job needs. Put a g after the number to specify that the unit is gigabytes. If it is not specified the job will get one gigabyte of memory.

Your job will always get the cores and memory it asks for, but no more. If you request too few cores, your job will run more slowly than it could. But if you request too little memory, your job will crash when it runs out. So consider including a safety margin in your memory request; perhaps 25% of the total (e.g. if you know your job needs 40GB, ask for 50GB). On the other hand, as long as your job is running no one else can use the cores and memory you requested. Do not request more cores or memory than your job will actually use (plus a safety margin).
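
For example, if you estimate that an R job needs about 40GB of memory and can use 8 cores (both numbers are made up for illustration), a request with a memory safety margin might be:

ssubmit --cores=8 --mem=50g "R CMD BATCH --no-save my_R_script.R"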

One exception: if your job will use all the cores in a server, it might as well use all the memory too as Slurm will not assign any other jobs to that server.

Please be considerate of other users, but think more in terms of "taking turns" than of "sharing." If your job can take advantage of all the cores in a server, ask for all of them so your job runs quickly and gets out of the way of others.

We realize that many SSCC researchers are not used to identifying the resources used by their jobs. Identifying the Computing Resources Used by a Linux Job has more guidance, but here are a few rules of thumb:
  • If a job can use multiple cores but does not ask you to specify how many to use, it probably defaults to all the cores in the server.
  • Stata always uses 32 cores; SAS uses 4 by default.
  • Most statistical software loads the data it will use into memory (SAS is a notable exception). The amount of memory needed by a data set is usually similar to or smaller than the amount of disk space the data set uses (in Windows, right-click on the file and choose Properties to see how big it is). Keep in mind what you will then do with the data: making a second copy of the data will double the memory used, for example. Combining data sets will often result in a data set that's bigger than the sum of the sizes of the original data.
  • You can run the Linux top command while your job is running on Linstat to monitor what computing resources it is using.
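
For example, while logged into Linstat you can limit top to your own processes with the following, replacing your_username with your SSCC username:

top -u your_username

Press q to quit top.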

Note for advanced users: the servers in the Slurm cluster have "hyperthreading" turned off so there is just one logical core per physical core.

Linux Commands for Slurm

Your job needs to be able to run in the background with no graphics or interaction with the user. The commands to do that for some popular programs are:

Stata

stata -b do my_do_file

R

R CMD BATCH --no-save my_R_script.R 

Python

python my_python_script.py

Matlab

matlab -nodisplay < my_matlab_script.m

SAS

sas my_sas_program.sas

Julia

julia my_julia_script.jl

Slurm will send any text output that would normally go to your screen to a file called slurm-N.out, where N is the JOBID of the job. If this output is important, we recommend you send it to a file of your choice instead by adding > outputfile to the end of the command, where outputfile should be replaced by the desired file name. For example:

matlab -nodisplay < my_matlab_script.m > my_matlab_script.log

Managing Jobs

To check the status of your job and/or see what else is in the Slurm queue, use squeue. In the output, NODELIST tells you which node(s) your job is running on. If under NODELIST it says (Resources), that means the computing resources needed to run your job are not available at the moment. Your job will wait in the queue until other jobs finish, and then start as soon as possible. (BeginTime) means your job was preempted by a higher priority job and put back in the queue; it will be run again shortly if resources are available.

To cancel a job use scancel. The squeue output also tells you the JOBID of each job; you can cancel one of your jobs with:

scancel JOBID

where JOBID should be replaced by the number of the job you want to cancel.
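
If the queue is long, you can limit squeue to your own jobs with its --user switch, replacing your_username with your SSCC username:

squeue --user=your_username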



