Slurm Installation and Configuration

Slurm Introduction
Slurm is a highly scalable cluster manager and job scheduling system for large clusters of compute nodes. Slurm maintains a queue of pending work, manages the overall resource utilization of that work, and dispatches each job to a set of allocated nodes for execution.
In short, Slurm is a robust cluster manager that is highly portable, scalable to large clusters, fault tolerant, and, importantly, open source.
The Slurm architecture is documented at http://slurm.schedmd.com/
Installing Slurm
The installation described here is an example on CentOS 6.5. Since Slurm runs across a cluster, we assume three machines with identical Linux installations, named mycentos6x, mycentos6x1, and mycentos6x2, with mycentos6x as the control node.
Installing Munge
Slurm uses Munge for authentication, so we have to install Munge first.
Download the installation package from the Munge project page (https://github.com/dun/munge); here we use munge-0.5.11.tar.bz2. Run the following commands as the root user.
Compiling and installing the Munge package
# rpmbuild -tb --clean munge-0.5.11.tar.bz2
# cd /root/rpmbuild/RPMS/x86_64
# rpm --install munge*.rpm
While building and installing the RPM packages you may be prompted about missing third-party packages, which can be installed with "yum install -y xxx". In my case the following packages had to be installed first:
# yum install -y rpm-build rpmdevtools bzip2-devel openssl-devel zlib-devel
After the installation is complete, you need to adjust the permissions of the following directories:
# chmod -Rf 700 /etc/munge
# chmod -Rf 711 /var/lib/munge
# chmod -Rf 700 /var/log/munge
# chmod -Rf 0755 /var/run/munge
Also check the /etc/munge/munge.key file: its owner and group must be munge, otherwise the service will fail to start.
Once the installation is complete, you can start the Munge service.
# /etc/init.d/munge start
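Once the daemon is running, it is worth verifying that Munge can actually encode and decode a credential. This is a standard sanity check using the `munge` and `unmunge` tools shipped with the package (it assumes only the installation above):

```shell
# Create a credential and immediately decode it on the same host.
# A working setup prints the decoded metadata ending with a success status.
munge -n | unmunge
```

If decoding fails here, fix the key file ownership and directory permissions before moving on to Slurm itself.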
Finally, copy /etc/munge/munge.key to the other two machines and make sure the file permissions and owner are the same as on the control node.
Installing Slurm
First create the Slurm user
# useradd slurm
# passwd slurm
Visit the Slurm site (http://slurm.schedmd.com/) to download the installation package; here we use slurm-14.11.8.tar.bz2.
Compiling and installing the Slurm package
# rpmbuild -ta --clean slurm-14.11.8.tar.bz2
# cd /root/rpmbuild/RPMS/x86_64
# rpm --install slurm*.rpm
During the build and installation of the RPM packages I was prompted to install the following packages:
# yum install -y readline-devel pam-devel perl-DBI perl-ExtUtils-MakeMaker
After the installation is complete, change the owner of the following directory:
# sudo chown slurm:slurm /var/spool
At this point the installation of Slurm is complete, but the service is not yet running; some configuration is needed before we can start Slurm and submit jobs.
Configure Slurm
Go to the /etc/slurm/ directory, copy the slurm.conf.example file to slurm.conf, and edit /etc/slurm/slurm.conf.
Here are the parts of my file that have been modified
ControlMachine=mycentos6x
ControlAddr=192.168.145.100
SlurmUser=slurm
SelectType=select/cons_res
SelectTypeParameters=CR_Core
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
NodeName=mycentos6x,mycentos6x1,mycentos6x2 CPUs=4 RealMemory= Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=IDLE
PartitionName=control Nodes=mycentos6x Default=YES MaxTime=INFINITE State=UP
PartitionName=compute Nodes=mycentos6x1,mycentos6x2 Default=NO MaxTime=INFINITE State=UP
Note: This configuration file needs to be deployed to each machine in the cluster.
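Deploying the file by hand gets tedious; the copy can be scripted. A minimal sketch (the hostnames are the ones used in this article; adjust to your own cluster) that prints the commands so you can review them before piping the output to sh:

```shell
#!/bin/sh
# Print the scp commands that push slurm.conf to the other nodes.
# Review the output, then run:  sh push_conf.sh | sh
for host in mycentos6x1 mycentos6x2; do
    echo "scp -p /etc/slurm/slurm.conf $host:/etc/slurm/slurm.conf"
done
```

Remember that the log files named in slurm.conf must also be writable by the slurm user on every node.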
Save the file, and then use the following command to start the Slurm service
# /etc/init.d/slurm start
Test
After starting the Slurm service, we can use the following commands to view the cluster status and submit jobs.
# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
control*     up   infinite      1   idle mycentos6x
compute      up   infinite      2   idle mycentos6x1,mycentos6x2
# scontrol show slurmd
Active Steps             = NONE
Actual CPUs              = 2
Actual Boards            = 1
Actual sockets           = 1
Actual cores             = 2
Actual threads per core  = 1
Actual real memory       = 1464 MB
Actual temp disk space   = 29644 MB
Boot time                = ...
Hostname                 = mycentos6x
Last slurmctld msg time  = ...
Slurmd PID               = 27755
Slurmd Debug             = 3
Slurmd Logfile           = /var/log/slurmd.log
Version                  = 14.11.8
# scontrol show config
# scontrol show partition
# scontrol show node
# scontrol show jobs
Submit Job
# srun hostname
mycentos6x
# srun -N 3 -l hostname
0: mycentos6x
1: mycentos6x1
2: mycentos6x2
# srun sleep 60 &
Query Job
# squeue -a
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   77     debug    sleep   kongxx  R       0:06      1 mycentos6x
Cancel Job
# scancel <job_id>
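The one-off srun calls shown earlier can also be packaged as a batch script and submitted with sbatch, which is the more common way to run longer jobs. A minimal sketch (the job name and output file name are arbitrary choices, not from the original setup):

```shell
#!/bin/sh
#SBATCH --job-name=hello        # arbitrary job name
#SBATCH --output=hello-%j.out   # %j expands to the job ID
#SBATCH -N 1                    # run on a single node
hostname
sleep 60
```

Submit it with "sbatch hello.sh", then track or cancel it with squeue and scancel as above.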
Reference:
Slurm: http://slurm.schedmd.com/
Munge: https://github.com/dun/munge
Please cite this address as a link when referencing.
Original article: http://blog.csdn.net/kongxx/article/details/48173829
Copyright notice: this is the author's original article; do not reproduce without permission.