Slurm Installation and Configuration

Slurm Introduction

Slurm is a highly scalable cluster manager and job scheduling system for large compute-node clusters. Slurm maintains a queue of pending work and manages the overall resource utilization of that work, distributing each job to a set of assigned nodes for execution.

Essentially, Slurm is a robust cluster manager that is highly portable, scalable to large node clusters, fault tolerant, and, more importantly, open source.

Slurm's architecture is described at http://slurm.schedmd.com/.

Installing Slurm

The installation here uses CentOS 6.5 as an example. Because Slurm runs on a cluster, we assume three machines with identical Linux installations, named mycentos6x, mycentos6x1 and mycentos6x2, with mycentos6x as the control node.

Install Munge

Slurm uses Munge for authentication, so we need to install Munge first.

Download the installation package from the Munge site (https://github.com/dun/munge); munge-0.5.11.tar.bz2 is used here. Run the following commands as root.

Compile and install the Munge packages:

# rpmbuild -tb --clean munge-0.5.11.tar.bz2
# cd /root/rpmbuild/RPMS/x86_64
# rpm --install munge*.rpm

During the rpmbuild and installation steps you may be prompted for some third-party packages, which can be installed with "yum install -y xxx". The packages I had to install first were:

# yum install -y rpm-build rpmdevtools bzip2-devel openssl-devel zlib-devel

After the installation is complete, adjust the permissions of the following directories:

# chmod -Rf 700 /etc/munge
# chmod -Rf 711 /var/lib/munge
# chmod -Rf 700 /var/log/munge
# chmod -Rf 0755 /var/run/munge

Also note: check the /etc/munge/munge.key file; its owner and group must be munge, otherwise startup will fail.
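If the ownership is wrong, it can be checked and fixed as follows (a sketch; the 400 mode is a common restrictive choice for the key, not taken from the original text):

```shell
# Show owner, group and mode of the key file
ls -l /etc/munge/munge.key

# If the owner/group is not munge, fix it (run as root)
chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
```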

Once the installation is complete, you can start the Munge service.

# /etc/init.d/munge start
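Before moving on, it can be worth confirming that Munge authentication actually works; a quick check might look like this (the remote test assumes SSH access and that the key has already been copied to that node):

```shell
# Encode and immediately decode a credential on the local node
munge -n | unmunge

# Decode a locally generated credential on another node; this only
# succeeds when both nodes share the same /etc/munge/munge.key
munge -n | ssh mycentos6x1 unmunge
```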

Finally, copy /etc/munge/munge.key to the other two machines and make sure the file permissions and owner are the same.

Install Slurm

First, create the slurm user:

# useradd slurm
# passwd slurm
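Because Slurm identifies the SlurmUser by numeric UID, the slurm user should have the same UID and GID on every node. One way to guarantee this is to create the account with explicit IDs on all three machines (990 is an arbitrary example value, not from the original text):

```shell
# Create the slurm group and user with fixed IDs so they match
# across the cluster (990 is an arbitrary example value)
groupadd -g 990 slurm
useradd -u 990 -g slurm -m slurm
passwd slurm
```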

Download the installation package from the Slurm site (http://slurm.schedmd.com/); slurm-14.11.8.tar.bz2 is used here.

Compile and install the Slurm packages:

# rpmbuild -ta --clean slurm-14.11.8.tar.bz2
# cd /root/rpmbuild/RPMS/x86_64
# rpm --install slurm*.rpm

During the rpmbuild and installation steps I was prompted to install the following packages:

# yum install -y readline-devel pam-devel perl-DBI perl-ExtUtils-MakeMaker

After the installation is complete, change the owner of the following directory:

# chown slurm:slurm /var/spool

At this point the installation of Slurm is complete, but it cannot be started yet; we still need to configure it before starting the Slurm services and submitting jobs.

Configure Slurm

Enter the /etc/slurm/ directory, copy the slurm.conf.example file to slurm.conf, and edit /etc/slurm/slurm.conf. Here are the changes in my file:

ControlMachine=mycentos6x
ControlAddr=192.168.145.100
SlurmUser=slurm
SelectType=select/cons_res
SelectTypeParameters=CR_Core
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurmd.log
NodeName=mycentos6x,mycentos6x1,mycentos6x2 CPUs=4 RealMemory=500 Sockets=2 CoresPerSocket=2 ThreadsPerCore=1 State=IDLE
PartitionName=control Nodes=mycentos6x Default=YES MaxTime=INFINITE
PartitionName=compute Nodes=mycentos6x1,mycentos6x2 Default=NO MaxTime=INFINITE State=UP

Note: This configuration file needs to be deployed to every machine in the cluster.
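The copy can be scripted from the control node; a minimal sketch, assuming root SSH access and the hostnames used above:

```shell
# Push the configuration from the control node to the other nodes
for host in mycentos6x1 mycentos6x2; do
    scp /etc/slurm/slurm.conf root@"${host}":/etc/slurm/slurm.conf
done
```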

Save the file, and then start the Slurm service with the following command

# /etc/init.d/slurm start
Test

After starting the Slurm services, we can use the following commands to view cluster status and submit jobs:

# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
control*     up   infinite      1   idle mycentos6x
compute      up   infinite      2   idle mycentos6x1,mycentos6x2
# scontrol show slurmd
Active Steps             = NONE
Actual CPUs              = 2
Actual Boards            = 1
Actual sockets           = 1
Actual cores             = 2
Actual threads per core  = 1
Actual real memory       = 1464 MB
Actual temp disk space   = 29644 MB
Boot time                = 2015-07-22T09:50:34
Hostname                 = mycentos6x
Last slurmctld msg time  = 2015-07-22T09:50:37
Slurmd PID               = 27755
Slurmd Debug             = 3
Slurmd Logfile           = /var/log/slurmd.log
Version                  = 14.11.8
# scontrol show config
# scontrol show partition
# scontrol show node
# scontrol show job

Submit Job

# srun hostname
mycentos6x
# srun -N 3 -l hostname
0: mycentos6x
1: mycentos6x1
2: mycentos6x2
# srun sleep 60 &
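Besides srun, jobs are commonly submitted as batch scripts with sbatch. A minimal sketch (the script name and #SBATCH options are illustrative, not from the original text):

```shell
# Write a minimal batch script (file name and options are illustrative)
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --partition=compute
#SBATCH --ntasks=2
srun hostname
EOF

# Submit it; output goes to slurm-<jobid>.out by default
sbatch job.sh
```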

Query job

# squeue -a
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 \     debug    sleep   kongxx  R       0:06      1 mycentos6x

Cancel Job

# scancel <job_id>
References:

Slurm: http://slurm.schedmd.com/
Munge: https://github.com/dun/munge

This article was originally published at http://blog.csdn.net/kongxx/article/details/48173829.
