Slurm accounting association

Set up Slurm to use accounting via slurmdbd, configured according to the accounting guide, and enforce accounting associations.

# grep -e AccountingStorage slurm.conf

Once configured, accounts need to be properly associated in SlurmDBD in order to submit jobs.
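For reference, the relevant slurm.conf lines on such a setup would look roughly like this (the host name is a placeholder; the parameter names are standard slurm.conf options):

```
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbhost            # placeholder: node running slurmdbd
AccountingStorageEnforce=associations   # reject jobs lacking a valid association
```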

Setup account association:

sacctmgr: add cluster Name=shannon
sacctmgr: add account Name=em
sacctmgr: add user Name=elton Cluster=shannon account=em
sacctmgr: list association
   Cluster    Account       User  Partition     Share GrpJobs       GrpTRES 
---------- ---------- ---------- ---------- --------- ------- ------------- 
   shannon       root                               1                       
   shannon       root       root                    1                       
   shannon         em                               1                       
   shannon         em      elton                    1                       
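With AccountingStorageEnforce=associations in place, a user who has no association is rejected at submission time; the transcript below is illustrative (exact wording can vary between Slurm versions):

```
$ sbatch job.sh
sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
```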

Setup Slurm PAM plugin on CentOS 7

Update 2017-07-05: Instructions for disabling
Update 2017-06-06: Instructions for using

Setting this up will help block users from casually accessing the compute nodes.

# ssh alicia@worker1
alicia@worker1's password: 
Access denied: user alicia (uid=1450) has no active jobs on this node.
Connection closed by

First make sure Slurm’s PAM module has been installed; it’s supplied by the slurm-pam_slurm package:

# ls -l /usr/lib64/security/pam_slurm.so
-rwxr-xr-x. 1 root root 26368 May 25 14:27 /usr/lib64/security/pam_slurm.so

Enable PAM module in Slurm:

# grep UsePAM /etc/slurm/slurm.conf
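With the PAM module enabled, the grep should show something like the following (UsePAM is a standard slurm.conf parameter; 1 enables it):

```
UsePAM=1
```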

We have to bypass pam_systemd if using pam_slurm_adopt, otherwise processes would not be adopted into the correct cgroup.
Prepare a password-auth PAM stack that does not include pam_systemd:

# grep -v pam_systemd /etc/pam.d/password-auth > /etc/pam.d/password-auth-no-systemd
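The filter itself is easy to sanity-check on a scratch copy before touching /etc/pam.d; the stack below is a minimal stand-in and the /tmp paths are illustrative:

```shell
# Build a systemd-free copy of a tiny PAM stack, then confirm that
# pam_systemd is gone while the other modules survive.
cat > /tmp/password-auth <<'EOF'
session     optional      pam_systemd.so
session     required      pam_unix.so
EOF

grep -v pam_systemd /tmp/password-auth > /tmp/password-auth-no-systemd

grep -c pam_unix /tmp/password-auth-no-systemd      # expect: 1
! grep -q pam_systemd /tmp/password-auth-no-systemd && echo "pam_systemd removed"
```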

Configure PAM for sshd (/etc/pam.d/sshd), see example below:

auth       required     pam_sepermit.so
auth       substack     password-auth
auth       include      postlogin
-auth      optional     pam_reauthorize.so prepare
account    required     pam_nologin.so
# - PAM config for Slurm - BEGIN
account    sufficient   pam_access.so
account    required     pam_slurm.so
# - PAM config for Slurm - END
account    include      password-auth
password   include      password-auth
session    required     pam_selinux.so close
session    required     pam_loginuid.so
session    required     pam_selinux.so open env_params
session    required     pam_namespace.so
session    optional     pam_keyinit.so force revoke
session    include      password-auth
# session    include      password-auth-no-systemd
# Use this instead if you are using pam_slurm_adopt
session    include      postlogin
-session   optional     pam_reauthorize.so prepare

“account sufficient pam_access.so” is used to allow users from specific groups into compute nodes regardless of whether they have jobs on the node; this is useful for letting admins into the nodes.

Configure the pam_access module to always allow the admin group (hpcadmins). You may have other rules in access.conf; be careful of their ordering, as only the first matching rule applies:

# cat /etc/security/access.conf
+ : root (hpcadmins) : ALL
- : ALL : ALL

pam_slurm_adopt works similarly, but with the capability of adopting ssh sessions into the job, so accounting works “more correctly” (it’s still off when there are multiple jobs from the same user on the same node, but this is already much better). Switching is done by changing the PAM module name in pam.d/sshd.
However, I encountered errors when launching jobs via mpirun and srun:

Jun  6 13:48:48 cpt15112 slurmd[2039]: launch task 21.4 request from 525.526@ (port 64138)
Jun  6 13:48:48 cpt15112 slurmstepd[23893]: error: pam_setcred ESTABLISH: Failure setting user credentials
Jun  6 13:48:48 cpt15112 slurmstepd[23893]: error: error in pam_setup
Jun  6 13:48:48 cpt15112 slurmstepd[23893]: error: job_manager exiting abnormally, rc = 4020
Jun  6 13:48:48 cpt15112 slurmstepd[23893]: done with job

Turns out you have to add a pam.d file for slurm…

# cat /etc/pam.d/slurm
auth    required   pam_localuser.so
account required   pam_unix.so
session required   pam_limits.so

Build & install notes for BLCR & SLURM on CentOS 7.3

Update 2017-06-02: Previous compilation steps caused blcr support to be left out during the rpmbuild process. Section updated to reflect what’s needed for Slurm to support blcr.

Install build tools & dependencies; EPEL required for munge-devel:

# yum -y groupinstall "Development Tools"
# yum -y install epel-release
# yum -y install freeipmi-devel gtk2-devel hwloc-devel libibmad-devel libibumad-devel lz4-devel mariadb-devel munge-devel ncurses-devel numactl-devel openssl-devel pam-devel perl-ExtUtils-MakeMaker readline-devel rrdtool-devel

BLCR (optional manual dependency):

The official web site provides releases up to 0.8.5. Later versions that work with CentOS 7 (kernel 3.10.0) can be found in the beta and snapshot distribution directory.

# yum -y install glibc-devel.i686 kernel-devel-`uname -r` libgcc.i686
# wget
# tar zxvf blcr-0.8.6_b4.tar.gz

acinclude.m4 needs to be patched because ./configure ignores weak symbols in System.map, and because it fails to locate a working `rpmbuild`.

You may see what changes are needed in this GitHub commit. (Ah yes… You may actually clone / download ZIP and build from there. One more copy of the diff is backed up at this paste.)
After patching acinclude.m4, prepare build environment by running

# ./
# ./configure
# make rpms

Upon successful build, you’ll have RPMs at rpm/RPMS/x86_64; install all RPMs before we go on to test it:

# service blcr start
# /usr/libexec/blcr-testsuite/RUN_ME

You have successfully installed BLCR once all tests pass.
The blcr kernel modules will be loaded automatically by its init script on the next boot.

Build SLURM from tar ball:

Before we begin, we need to add a definition in ~/.rpmmacros so that Slurm will build with blcr support:

# grep with_blcr ~/.rpmmacros 
%_with_blcr %_lib

Download latest tar ball from official web site:

# wget
# rpmbuild -ta slurm-17.02.3.tar.bz2
# cd rpmbuild/RPMS/x86_64/
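Before installing, you can sanity-check that blcr support actually made it into the build. In Slurm 17.02 the checkpoint plugins are shipped in the slurm-plugins RPM; the exact plugin file name below is my assumption:

```
# rpm -qpl slurm-plugins-*.rpm | grep -i blcr
/usr/lib64/slurm/checkpoint_blcr.so
```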

Install RPMs:

  • slurm
  • slurm-devel
  • slurm-munge
  • slurm-perlapi
  • slurm-plugins
  • slurm-sjobexit (only prior to version 17.02)
  • slurm-sjstat (only prior to version 17.02)
  • slurm-torque

See RPMS INSTALLED section for more details.


Generate slurm.conf using the web configurator.
Note that StateSaveLocation defaults to /var/spool; this is not ideal, since the slurm user needs to write to that directory. I personally use /var/spool/slurmctld.
For details: man slurm.conf
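If you go with /var/spool/slurmctld, create it and hand it to the slurm user before starting slurmctld (a sketch, run as root; assumes the slurm user already exists):

```shell
# Create the StateSaveLocation directory and make it writable by slurm
mkdir -p /var/spool/slurmctld
chown slurm:slurm /var/spool/slurmctld
chmod 0755 /var/spool/slurmctld
```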

And a few manual steps before you have a working Slurm installation:

  1. Populate /etc/munge/munge.key on all nodes, then enable and start the munge service; the key should be owned by munge:munge, mode 0600
  2. Disable firewalld, or open the necessary ports (6817-6818)
  3. Set CgroupAutomount=yes in /etc/slurm/cgroup.conf
  4. Enable and start the slurmctld & slurmd services
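A rough sketch of those four steps as commands (run as root; node names are illustrative and the firewalld defaults are assumed):

```shell
# 1. Distribute the munge key, fix ownership/permissions, start munge
scp /etc/munge/munge.key worker1:/etc/munge/munge.key   # repeat for each node
chown munge:munge /etc/munge/munge.key
chmod 0600 /etc/munge/munge.key
systemctl enable munge && systemctl start munge

# 2. Open the Slurm ports on firewalld (or disable firewalld entirely)
firewall-cmd --permanent --add-port=6817-6818/tcp
firewall-cmd --reload

# 3. Let Slurm mount cgroups automatically
echo 'CgroupAutomount=yes' >> /etc/slurm/cgroup.conf

# 4. Enable and start the daemons
systemctl enable slurmctld && systemctl start slurmctld   # controller node
systemctl enable slurmd && systemctl start slurmd         # compute nodes
```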

Basic slurm.conf example:

NodeName=worker[1-8] Sockets=4 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN 
PartitionName=debug Nodes=worker[1-8] Default=YES MaxTime=INFINITE State=UP

See full example of above config file as generated by configurator.html.

Build SLURM from GitHub repo (advanced / incomplete):

# git clone

Switch to release tag:

# cd slurm
# git checkout slurm-17-02-3-1

Configure for build:

# ./configure --enable-pam

You’ll have to populate some values in slurm.spec before it can be built using rpmbuild.