Platform R: Using Containers in Slurm

Container Overview

Platform R's Slurm installation includes a plugin that allows the use of containers. The Slurm options all contain the name "singularity", but Docker images can be used as well. Note that the open-source version of Singularity is called Apptainer.

Container Parameters

The Singularity plugin adds the following command-line options to salloc, srun, and sbatch:

  • --singularity-container - The path to the container you want to use. This can be a local SIF file or a registry URL such as docker://IMAGE_NAME:VERSION
  • --singularity-args - Sets additional command-line arguments for singularity run.
  • --singularity-bind - User-defined bind paths, appended to the defaults specified in the plugin configuration. Equivalent to setting the environment variable SLURM_SINGULARITY_BIND. (See the example after this list.)
  • --singularity-no-bind-defaults - Disables the bind mount defaults specified in the plugin configuration.
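
For example, to make a project directory visible inside an interactive container session (the path /project/mylab below is hypothetical; substitute a directory you actually have access to):

srun --cpus-per-task=1 --mem-per-cpu=4G \
  --singularity-container=docker://r-base:4.5.1 \
  --singularity-bind=/project/mylab \
  --pty R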

Registries

Docker Hub

Docker Hub is the default registry on Platform R. A fully qualified domain name isn't required when pulling images from Docker Hub within Slurm srun or sbatch commands. Docker Hub is a good source for official base images such as Python and R, for example docker://r-base:4.5.1.
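
Because Docker Hub is the default registry, short image references resolve to it. Both of the references below name the same official R image (official images live in Docker Hub's "library" namespace):

docker://r-base:4.5.1
docker://docker.io/library/r-base:4.5.1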

Nvidia Container Registry

Platform R supports container images from the Nvidia Container Registry (NGC). NGC provides container images that support GPU-accelerated data processing, machine learning frameworks such as TensorFlow, and large language models (LLMs).
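
Unlike Docker Hub images, NGC images are referenced by their full registry path. For example:

docker://nvcr.io/nvidia/tensorflow:25.02-tf2-py3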

Running Container Jobs

Running an Interactive Container Job Using The Docker Hub Registry

This command will start an interactive R session within the container, allowing you to run R commands directly.

srun --cpus-per-task=4 --mem-per-cpu=16G \
  --singularity-container=docker://r-base:4.5.1 --pty R

(The backslashes above allow you to continue a command across several lines. They are used here for readability.)

Command breakdown:

  • --cpus-per-task=4: Requests 4 CPU cores for the job
  • --mem-per-cpu=16G: Requests 16GB of memory per CPU
  • --singularity-container=docker://r-base:4.5.1: Specifies the R container image from Docker Hub
  • --pty: Allocates a pseudo-terminal for interactive use
  • R: Launches the R interpreter
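
Once the R prompt appears, you can confirm that you are running the containerized version of R (the exact output below is illustrative):

> R.version.string
[1] "R version 4.5.1 (2025-06-13)"

Type q() to end the session and release the allocated resources.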

The first time you use a new container image, it will be downloaded and cached.
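
Apptainer's default cache location is ~/.apptainer/cache. If your home directory quota is limited, you can point the cache elsewhere, assuming the plugin honors the standard Apptainer environment variable (the /scratch path below is hypothetical):

export APPTAINER_CACHEDIR=/scratch/$USER/apptainer-cache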

Running a Batch Container Job Using The Docker Hub Registry

First, create a directory to hold the R script, the Slurm batch file, and the output for this example.

Create the file nodename.R. For demonstration purposes it contains just one comment and one line, which prints the name of the node the job runs on.

# What node was the job scheduled on? 
print(Sys.info()[["nodename"]])

Then the batch script, submit.sh:

#!/bin/bash

#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=16G
#SBATCH --output=nodename.out

srun \
  --singularity-container=docker://r-base:4.5.1 \
  Rscript nodename.R 

(The backslashes above allow you to continue a command across several lines. They are used here for readability.)

The options here are the same as for interactive jobs, with the addition of the --output option, which names the file where printed output will go.

Finally, submit your batch job with sbatch submit.sh. Shortly afterwards, the file nodename.out will appear with the output from the R code above.
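
sbatch prints the job ID when the job is accepted. A complete session might look like this (the job ID and node name below are illustrative):

$ sbatch submit.sh
Submitted batch job 123456
$ cat nodename.out
[1] "node042"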

Running an Interactive TensorFlow Container Job Using The Nvidia Container Registry

These commands will start an interactive Python session within the container with TensorFlow available.

First, you must specify where the CUDA libraries are installed:

setenv LD_LIBRARY_PATH /usr/local/cuda/lib64
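
The setenv form above is for csh/tcsh shells. If your login shell is bash, use export instead:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64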

Then you can connect to an interactive Python session:

srun --gres gpu:1 --cpus-per-task=4 \
  --mem-per-cpu=16G \
  --singularity-container=docker://nvcr.io/nvidia/tensorflow:25.02-tf2-py3 \
  --singularity-args="--nv" --pty python


(The backslashes above allow you to continue a command across several lines. They are used here for readability.)

Command breakdown:

  • --gres gpu:1: Requests 1 GPU for the job
  • --cpus-per-task=4: Requests 4 CPU cores for the job
  • --mem-per-cpu=16G: Requests 16GB of memory per CPU
  • --singularity-container=docker://nvcr.io/nvidia/tensorflow:25.02-tf2-py3: Specifies the TensorFlow container image from Nvidia
  • --singularity-args="--nv": Enables access to the GPU inside the container
  • --pty: Allocates a pseudo-terminal for interactive use
  • python: Starts a Python shell

Within the Python session you can now ask about the GPU resources available.

>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices()) 
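
Alternatively, the TensorFlow 2 API reports the same information more concisely:

>>> import tensorflow as tf
>>> print(tf.config.list_physical_devices('GPU'))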

Running a Batch TensorFlow Container Job Using The Nvidia Container Registry

First, create a directory to hold the script, the Slurm batch file, and the output for this example.

Next, create the file gpuinfo.py, which uses TensorFlow to get information about the GPU.

#!/usr/bin/env python
from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())

Then create the Slurm batch script, batch.sh:

#!/bin/bash

#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=16G
#SBATCH --output=gpuinfo.out

# So the container can find the CUDA libraries.
export LD_LIBRARY_PATH=/usr/local/cuda/lib64

srun \
  --singularity-container=docker://nvcr.io/nvidia/tensorflow:25.02-tf2-py3 \
  --singularity-args="--nv" \
  python gpuinfo.py

(The backslashes above allow you to continue a command across several lines. There are used here for readability.)

Finally, submit your job with sbatch batch.sh.

Once the job has completed, the file gpuinfo.out will contain the output from gpuinfo.py.


Related Documentation