Slurm Installation and Configuration


Slurm Introduction

Slurm is a highly scalable cluster manager and job scheduling system for large clusters of compute nodes. Slurm maintains a queue of pending work and manages the overall resource utilization of that work. It distributes each job to a set of allocated nodes for execution.

Essentially, Slurm is a robust cluster manager that is highly portable, scalable to large clusters, fault tolerant, and, more importantly, open source.

The architecture of Slurm is described at http://slurm.schedmd.com/

Installing Slurm

The installation here uses CentOS 6.5 as an example. Since Slurm runs on a cluster, we assume three machines running the same version of Linux, named mycentos6x, mycentos6x1 and mycentos6x2, with mycentos6x as the control node.

Installing Munge

Slurm uses Munge for authentication, so Munge has to be installed first.

Download the installation package from the official Munge site (https://github.com/dun/munge); the munge-0.5.11.tar.bz2 file is used here. Run the following commands as the root user.

Compiling and installing the Munge package

# rpmbuild -tb --clean munge-0.5.11.tar.bz2
# cd /root/rpmbuild/RPMS/x86_64
# rpm --install munge*.rpm

While building and installing the RPM packages, you may be prompted for missing third-party packages; these can be installed with "yum install -y xxx". I installed the following packages first

# yum install -y rpm-build rpmdevtools bzip2-devel openssl-devel zlib-devel

After the installation is complete, you need to modify the permissions of the following directories

# chmod -Rf 700 /etc/munge
# chmod -Rf 711 /var/lib/munge
# chmod -Rf 700 /var/log/munge
# chmod -Rf 0755 /var/run/munge

Also check the /etc/munge/munge.key file: its owner and group must both be munge, otherwise the startup will fail.
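The ownership check can be scripted. The following is a minimal sketch, assuming GNU stat (as shipped with CentOS); the helper name check_owner is hypothetical and is not part of Munge.

```shell
#!/bin/sh
# check_owner FILE USER GROUP
# Hypothetical helper: succeeds only if FILE is owned by USER:GROUP.
# Assumes GNU stat (coreutils), which supports -c '%U:%G'.
check_owner() {
    actual="$(stat -c '%U:%G' "$1")" || return 2
    [ "$actual" = "$2:$3" ]
}

# Example use on a Munge node (run as root):
#   check_owner /etc/munge/munge.key munge munge \
#       || chown munge:munge /etc/munge/munge.key
```

The function only reports the result; whether to repair the ownership automatically, as in the commented example, is left to the caller.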

Once the installation is complete, you can start the Munge service.

# /etc/init.d/munge start

Finally, you need to copy /etc/munge/munge.key to the other two machines and make sure that the file's permissions and owner there are the same as on the control node.
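One way to do the copy is a small loop over the other two nodes. This is only a sketch: it assumes root SSH access from the control node to mycentos6x1 and mycentos6x2, which the article does not state, and the run and distribute_munge_key helpers and the DRY_RUN variable are hypothetical names. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: copy the Munge key from the control node to the other nodes.
# Assumptions (not from the article): root SSH access to each node.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
NODES="mycentos6x1 mycentos6x2"
KEY=/etc/munge/munge.key

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

distribute_munge_key() {
    for node in $NODES; do
        # scp -p preserves the file mode and timestamps
        run scp -p "$KEY" "root@$node:$KEY"
        # scp does not preserve ownership, so set it explicitly
        run ssh "root@$node" "chown munge:munge $KEY"
    done
}

distribute_munge_key
```

Running the script with DRY_RUN=0 executes the commands instead of printing them.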

Installing Slurm

First create the Slurm user

# useradd slurm
# passwd slurm

Visit the Slurm site (http://slurm.schedmd.com/) to download the installation package; the slurm-14.11.8.tar.bz2 package is used here.

Compiling and installing the Slurm package

# rpmbuild -ta --clean slurm-14.11.8.tar.bz2
# cd /root/rpmbuild/RPMS/x86_64
# rpm --install slurm*.rpm

While building and installing the RPM packages, I was prompted to install the following packages

# yum install -y readline-devel pam-devel perl-DBI perl-ExtUtils-MakeMaker

After the installation is complete, change the owner and group of the following directory

# sudo chown slurm:slurm /var/spool

At this point Slurm is installed but not yet running; it has to be configured before the Slurm service can be started and jobs can be submitted.

Configure Slurm

Go to the /etc/slurm/ directory, copy the slurm.conf.example file to slurm.conf, and edit the /etc/slurm/slurm.conf file.
Here are the parts of my file that have been modified

ControlMachine=mycentos6x
ControlAddr=192.168.145.100
SlurmUser=slurm
SelectType=select/cons_res
SelectTypeParameters=CR_Core
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
NodeName=mycentos6x,mycentos6x1,mycentos6x2 CPUs=4 RealMemory= Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=IDLE
PartitionName=control Nodes=mycentos6x Default=YES MaxTime=INFINITE State=UP
PartitionName=compute Nodes=mycentos6x1,mycentos6x2 Default=NO MaxTime=INFINITE State=UP
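On the node definition line, the CPU count normally equals sockets times cores per socket times threads per core (here 4 = 2 x 2 x 1). A minimal sketch of a check for that relationship follows; get_field and check_node_line are hypothetical helper names, and the sample line leaves out the memory field because its value is missing in the listing above.

```shell
#!/bin/sh
# Sketch: check that CPUs equals Sockets * CoresPerSocket * ThreadsPerCore
# on a node definition line. Both helpers are hypothetical, not Slurm tools.

# get_field KEY LINE: print the value of KEY=value (key match ignores case).
get_field() {
    printf '%s\n' "$2" | tr ' ' '\n' |
        awk -F= -v k="$1" 'tolower($1) == tolower(k) { print $2; exit }'
}

# check_node_line LINE: succeed if the CPU count matches the topology.
check_node_line() {
    cpus="$(get_field CPUs "$1")"
    sockets="$(get_field Sockets "$1")"
    cores="$(get_field CoresPerSocket "$1")"
    threads="$(get_field ThreadsPerCore "$1")"
    [ "$cpus" -eq $((sockets * cores * threads)) ]
}

line="NodeName=mycentos6x,mycentos6x1,mycentos6x2 CPUs=4 Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=IDLE"
if check_node_line "$line"; then
    echo "consistent"
else
    echo "mismatch"
fi
```

A check like this can be run before the file is deployed, since a mistake in the node line affects every machine that receives the configuration.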

Note: This configuration file needs to be deployed to each machine in the cluster.

Save the file, and then use the following command to start the Slurm service

# /etc/init.d/slurm start

Test

After starting the Slurm service, we can use the following commands to view the cluster status and submit jobs

# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
control*     up   infinite      1   idle mycentos6x
compute      up   infinite      2   idle mycentos6x1,mycentos6x2

# scontrol show slurm reports
Active Steps             = NONE
Actual CPUs              = 2
Actual Boards            = 1
Actual sockets           = 1
Actual cores             = 2
Actual threads per core  = 1
Actual real memory       = 1464 MB
Actual temp disk space   = 29644 MB
Boot time                = ....-..-..T09:..:..
Hostname                 = mycentos6x
Last slurmctld msg time  = ....-..-..T09:..:..
Slurmd PID               = 27755
Slurmd Debug             = 3
Slurmd Logfile           = /var/log/slurmd.log
Version                  = 14.11.8

# scontrol show config
# scontrol show partition
# scontrol show node
# scontrol show jobs

Submit Job

# srun hostname
mycentos6x

# srun -N 3 -l hostname
0: mycentos6x
1: mycentos6x1
2: mycentos6x2

# srun sleep 60 &

Query job

# squeue -a
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                77     debug    sleep   kongxx  R       0:06      1 mycentos6x
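The JOBID column of this listing is the value that scancel expects. As a rough sketch, the job IDs for a given job name can be pulled out of the table with awk. The helper job_ids_by_name is hypothetical, and it assumes the default column order (JOBID, PARTITION, NAME, USER, ...), so the job name is the third column; sample text stands in for a live squeue call.

```shell
#!/bin/sh
# Sketch: extract job IDs from squeue's default table by job name.
# job_ids_by_name is a hypothetical helper; it assumes the job name is
# column 3 and skips the header line.
job_ids_by_name() {
    awk -v name="$1" 'NR > 1 && $3 == name { print $1 }'
}

# Sample text in the squeue format, used in place of a live squeue call.
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
77 debug sleep kongxx R 0:06 1 mycentos6x'

printf '%s\n' "$sample" | job_ids_by_name sleep

# On a live cluster the same filter would read from squeue directly:
#   squeue -a | job_ids_by_name sleep
```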

Cancel Job

# scancel <job_id>

Reference:

Slurm: http://slurm.schedmd.com/
Munge: https://github.com/dun/munge

When reproducing this article, please cite the original address as a link.
Original address: http://blog.csdn.net/kongxx/article/details/48173829

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
