Building a distributed computing cluster with OpenMPI, NFS, and NIS

Source: Internet
Author: User
Tags: node, server

1. Configure the firewall

Configure the firewall filtering rules correctly; otherwise NFS mounts, NIS account authentication, and mpirun remote process launch will all fail. A compute cluster generally runs on an internal LAN, so you can simply turn off the firewall on every node without worrying too much about security.

The relevant commands are as follows:

service iptables stop        # or: /etc/init.d/iptables stop
# Both commands above take effect immediately, but the firewall comes back after a reboot.
chkconfig iptables off       # keeps the firewall disabled across reboots

2. Configure cluster LAN IPs and host name mappings

For convenience, you may want to rename the nodes to a uniform scheme such as node1, node2, node3, and so on. The command to change the host name is:

hostname node1                     # sets the host name to node1, but the change is lost after a reboot

To make the change permanent, edit the HOSTNAME line in the /etc/sysconfig/network file:

HOSTNAME=node1

Edit /etc/hosts on every node and list the IP address and host name of each node in the cluster.
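
As a minimal sketch, the /etc/hosts entries could be generated instead of typed by hand. The function name gen_hosts, the 192.168.0 network prefix, and the starting address 31 are assumptions made for illustration; substitute the addresses of your own LAN.

```shell
#!/bin/sh
# Print /etc/hosts entries for consecutively numbered nodes.
# Usage: gen_hosts <network-prefix> <first-host-octet> <node-count>
# The prefix, first octet, and "nodeN" naming are illustrative assumptions.
gen_hosts() {
    prefix=$1
    first=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # Node numbering starts at 1: node1, node2, ...
        printf '%s.%d %s\n' "$prefix" "$((first + i))" "node$((i + 1))"
        i=$((i + 1))
    done
}

# Example: three nodes starting at 192.168.0.31.
gen_hosts 192.168.0 31 3
```

After reviewing the output, it can be appended to /etc/hosts on each node as root, for example with `gen_hosts 192.168.0 31 3 >> /etc/hosts`.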

3. Configuring the NFS shared file system

Distributed parallel computing generally requires every node to have the same application software and working directory environment, which is tedious to configure node by node. Deploying the software and working directories in a shared NFS directory solves this: a single deployment is accessible from all nodes.

First install the NFS packages on all nodes with the command:

yum install nfs-utils

Then choose a node with plenty of disk space, such as node0, and configure it as the NFS server. First, add the following line to /etc/exports:

/share        node*(rw,no_root_squash)                  # allows hosts whose names match node* (* is a wildcard) to mount the /share directory read-write

Then execute the following command on the NFS server node:

exportfs -ar         # run this every time /etc/exports is modified
service nfs start    # start the NFS service
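
In /etc/exports, whitespace between a client pattern and its option list changes the meaning: the pattern then gets default options and the parenthesized options apply to all hosts. The following is a small sketch of a check for that pitfall; the function name check_exports is hypothetical, and the pattern match is deliberately simple.

```shell
#!/bin/sh
# Report exports-style lines where whitespace precedes an "(options)" list.
# $1: path to an exports file (defaults to /etc/exports).
# Returns 0 if no suspicious line is found, 1 otherwise.
check_exports() {
    file=${1:-/etc/exports}
    # Strip comments, then look for "<non-space><spaces>(".
    # Note: this also flags a bare "/path (opts)" world export.
    if sed 's/#.*//' "$file" | grep '[^[:space:]][[:space:]][[:space:]]*('; then
        echo "warning: whitespace before an option list in $file" >&2
        return 1
    fi
    return 0
}

# Example usage on the NFS server:
#   check_exports /etc/exports && exportfs -ar
```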

The other nodes are configured as NFS clients and need to run the following commands:

service nfs start                     # start the NFS service
mount -t nfs node0:/share /share      # mount the /share directory of the NFS server (node0) on the local /share directory

To mount the share automatically at boot, add a line to /etc/fstab:

192.168.44.130:/share   /share                  nfs     defaults        0 0
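
A sketch of how the fstab entry could be added without creating duplicates when a setup script is run more than once. The function name add_nfs_fstab is hypothetical; the third argument lets you try it on a copy of the file before touching /etc/fstab.

```shell
#!/bin/sh
# Append an NFS entry to an fstab-style file unless the mount point
# already has an entry.
# $1: remote export (e.g. node0:/share)   $2: local mount point
# $3: fstab file to edit (defaults to /etc/fstab)
add_nfs_fstab() {
    remote=$1
    mountpoint=$2
    fstab=${3:-/etc/fstab}
    # Field 2 of every non-comment fstab line is the mount point.
    if awk -v mp="$mountpoint" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"; then
        echo "entry for $mountpoint already present in $fstab"
        return 0
    fi
    printf '%s\t%s\tnfs\tdefaults\t0 0\n' "$remote" "$mountpoint" >> "$fstab"
}

# Example usage (as root), followed by mounting everything in fstab:
#   add_nfs_fstab node0:/share /share && mount -a
```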

Other related commands:

showmount -e 192.168.0.30        # run on a client to list the directories exported by the NFS server
showmount -a                     # usually run on the NFS server; lists the clients that have mounted NFS directories
chkconfig --level 35 nfs on      # start the NFS service automatically at boot

4. Configuring NIS Services

Distributed parallel computing also requires consistent account information on every node. Configuring users separately on each node is repetitive and a lot of work. Setting up an NIS server solves this: every host looks up user information on the NIS server for account authentication. NIS (Network Information Service) is also called YP (Yellow Pages, as in a phone book).

First install the NIS packages on all compute nodes with the following commands:

yum install yp*
yum install xinetd

On all nodes, edit /etc/xinetd.d/time and set disable = no, then run the following commands:

service xinetd restart                     # start the xinetd service
nisdomainname cluster                      # set the NIS domain name, here cluster

On all nodes, add a line to the /etc/sysconfig/network file:

NISDOMAIN=cluster

Choose a node, such as node0, to act as the NIS server, and add three lines to its /etc/ypserv.conf file:

127.0.0.0/255.255.255.0      : * : * : none
192.168.0.0/255.255.255.0    : * : * : none
*                            : * : * : deny

Here 192.168.0.0 is the network segment; fill it in according to your own network configuration.
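
Since the network segment differs from site to site, the three access rules could be produced from parameters. This is only a sketch: the function name ypserv_rules is hypothetical, and the loopback rule is copied as given above.

```shell
#!/bin/sh
# Print the three ypserv.conf access rules for a given cluster network.
# $1: network address of the cluster LAN   $2: netmask
ypserv_rules() {
    net=$1
    mask=$2
    # Allow the local host and the cluster LAN, deny everyone else.
    printf '%s : * : * : none\n' "127.0.0.0/255.255.255.0"
    printf '%s : * : * : none\n' "$net/$mask"
    printf '%s : * : * : deny\n' "*"
}

# Example: print the rules for the network segment used in this article.
ypserv_rules 192.168.0.0 255.255.255.0
```

The output can be reviewed and then appended to /etc/ypserv.conf on the NIS server.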

Then create the account database with the command:

/usr/lib64/yp/ypinit -m     # when adding users, add them only on the NIS server, then run /usr/lib64/yp/ypinit -m again to update the database

After the database is created, start the ypserv and yppasswdd services:

service ypserv start
service yppasswdd start
chkconfig --level 35 ypserv on                 # start the service at boot
chkconfig --level 35 yppasswdd on              # start the service at boot

The other compute nodes are configured as NIS clients. First add two lines to /etc/yp.conf:

nisdomain cluster      # the NIS domain name, here cluster
ypserver node0         # the NIS server, here node0

In /etc/passwd, add one line:

+::::::                 # note the number of colons

In /etc/nsswitch.conf, set the following four lines:

passwd:     files nis nisplus
shadow:     files nis nisplus
group:      files nis nisplus
hosts:      files nis dns

Finally, run:

service ypbind restart            # start the service
chkconfig --level 35 ypbind on    # start ypbind automatically at boot
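
A quick way to confirm the client configuration is to check that each database in nsswitch.conf actually lists nis as a source. The helper below is a sketch with a hypothetical name (uses_nis); it only reads the file and does not contact the NIS server.

```shell
#!/bin/sh
# Succeed if the given nsswitch database lists "nis" as a source.
# $1: database name (passwd, shadow, group, hosts)
# $2: nsswitch file (defaults to /etc/nsswitch.conf)
uses_nis() {
    db=$1
    conf=${2:-/etc/nsswitch.conf}
    awk -v db="$db:" '
        $1 == db {
            # Scan the sources listed after "name:".
            for (i = 2; i <= NF; i++) if ($i == "nis") found = 1
        }
        END { exit !found }
    ' "$conf"
}

# Example usage on a client:
#   for db in passwd shadow group hosts; do
#       uses_nis "$db" || echo "$db does not use nis"
#   done
# Once ypbind is running, "ypwhich" prints the NIS server the client is
# bound to and "ypcat passwd" lists the accounts it serves.
```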

5. Configure passwordless SSH login

If the home directories are not on the shared file system, passwordless login has to be set up pair by pair. To let host A log in to host B without a password, work on host A: create the .ssh directory in the user's home directory, cd into it, and run:

ssh-keygen -t rsa                                       # press Enter at each prompt; by default the key is saved in .ssh/id_rsa
cp id_rsa.pub authorized_keys                           # after this step you can normally log in to the local machine without a password
scp authorized_keys [email protected]:/homename/.ssh    # copy the newly generated authorized_keys file to host B
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
chmod 600 authorized_keys                               # run in the .ssh directory on host B to fix the permissions of the authorized_keys file

These steps only allow A to log in to B without a password. For every node in the cluster to reach every other node without a password, each pair of nodes has to be configured this way, which is a lot of work.

If the home directories are on the shared file system, things are much simpler. Running the following commands once in the .ssh directory lets every node in the cluster reach every other node without a password:

ssh-keygen -t rsa
cp id_rsa.pub authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

Also add StrictHostKeyChecking no to the /etc/ssh/ssh_config file so that the first SSH login to a node does not prompt you to add the host to known_hosts.
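
Once the keys are in place, it may be worth checking every node before launching MPI jobs. The following sketch loops over node names and tries a non-interactive login to each. The function name check_ssh and the SSH_BIN override are assumptions for illustration; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Try a passwordless login to each node given as an argument.
# SSH_BIN may name an alternative ssh client (defaults to ssh).
# Returns 0 if every node accepted the login, 1 otherwise.
check_ssh() {
    ssh_bin=${SSH_BIN:-ssh}
    failed=0
    for node in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "$node" true </dev/null; then
            echo "$node: ok"
        else
            echo "$node: FAILED"
            failed=1
        fi
    done
    return "$failed"
}

# Example usage, run from each node in turn:
#   check_ssh node1 node2 node3
```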

6. Install and configure OpenMPI

Configure the OpenMPI build as follows. To use the Intel compilers, install them first and then run:

./configure CC=icc CXX=icpc FC=ifort --prefix=/opt/openmpi/ --enable-static --enable-mpi-cxx

To use the system's default compiler, run:

./configure --prefix=/opt/openmpi/ --enable-static --enable-mpi-cxx

Note: in both cases, be sure to use a newly created directory as the installation directory.

Finally, build and install OpenMPI:

make all install
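
After installation, jobs are typically launched with mpirun and a hostfile listing the nodes. The sketch below sets up the environment for the /opt/openmpi/ prefix used above and generates a hostfile. The function name gen_hostfile, the slot count, the /share/hostfile path, and the program name my_mpi_program are assumptions for illustration.

```shell
#!/bin/sh
# Make the OpenMPI install under /opt/openmpi visible to the shell.
export PATH=/opt/openmpi/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# Print an Open MPI hostfile: one "<node> slots=<n>" line per node.
# $1: slots (processes) per node; remaining arguments: node names.
gen_hostfile() {
    slots=$1
    shift
    for node in "$@"; do
        printf '%s slots=%d\n' "$node" "$slots"
    done
}

# Example usage (hypothetical program on the NFS share):
#   gen_hostfile 4 node1 node2 node3 > /share/hostfile
#   mpirun --hostfile /share/hostfile -np 12 /share/my_mpi_program
```

Keeping the hostfile and the program on the NFS share means every node sees the same paths.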

7. Install and configure a load balancing system (optional)

If you want job scheduling, you also need to install software such as LSF. Configuring such software is more involved, and small clusters generally do not need it, so it is not covered here.
