1. Configure the firewall
Configure the firewall filtering rules properly; otherwise NFS file system mounts, NIS account authentication, and mpirun remote task launching will fail. In general, a compute cluster runs on an internal LAN, so you can simply shut down the firewall on all node servers without worrying too much about security.
The relevant commands are as follows:
service iptables stop        # or: /etc/init.d/iptables stop
                             # both methods take effect immediately but are reverted after a reboot
chkconfig iptables off       # disables the firewall permanently, effective after a reboot
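If you prefer to keep the firewall running, a common alternative is to trust all traffic from the cluster's internal segment instead of disabling iptables entirely; the 192.168.0.0/24 segment below is only an example and must be replaced with the actual cluster network:
iptables -I INPUT -s 192.168.0.0/24 -j ACCEPT   # accept all traffic from the internal cluster segment (example address)
service iptables save                           # save the rule so it survives a reboot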
2. Configure the cluster LAN IP and hostname mappings
For convenience, you may want to change the hostnames of the nodes to a unified form such as node1, node2, node3, ... The command to change the hostname is:
hostname node1   # changes the hostname to node1, but the change is lost after the machine restarts
The permanent method is to modify the HOSTNAME line in the /etc/sysconfig/network file:
HOSTNAME=node1
Modify the /etc/hosts file on each node to record the mapping between the hostname and IP address of every node in the cluster.
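For example, /etc/hosts on every node might contain entries like the following (the addresses are placeholders and must match the actual LAN configuration):
192.168.0.30  node0
192.168.0.31  node1
192.168.0.32  node2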
3. Configure the NFS shared file system
Distributed parallel computing generally requires the application software environment and working directory environment to be identical on every node server, which is particularly troublesome to configure node by node. Using an NFS shared file system, with the application software and working directories placed in a common directory, solves this problem nicely: everything is deployed once and all node servers can access it.
First install the NFS suite on all nodes, using the command:
yum install nfs-utils
Then select a node server with a larger hard disk, such as node0, and configure it as the NFS server. As the first step, edit the /etc/exports file and write in it:
/share node*(rw,no_root_squash)   # allow servers whose hostname matches node* (* is a wildcard) to mount the /share directory read-write
Then execute the following commands on the NFS server node:
exportfs -ar        # run this every time the /etc/exports file is modified
service nfs start   # start the NFS service
The other node servers are configured as NFS clients and need to execute the following commands:
service nfs start                   # start the NFS service
mount -t nfs node0:/share /share    # mount the /share directory of the NFS server (i.e. node0) onto the local /share directory
By modifying the /etc/fstab file, the share can be mounted automatically at boot; add a line to this file:
192.168.44.130:/share /share nfs defaults 0 0
Other related commands:
showmount -e 192.168.0.30    # run on a client to check the exported directories on the NFS server
showmount -a                 # usually run on the NFS server; shows which client machines have mounted the NFS directories
chkconfig --level 35 nfs on  # configure the NFS service to start automatically at boot
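After mounting, a quick sanity check on a client might look like this:
mount | grep /share   # the NFS mount should appear in the list of mounted file systems
df -h /share          # shows the size and usage of the mounted share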
4. Configure the NIS service
Distributed parallel computing also requires a consistent account environment on every node server. Configuring user information on each node separately would be a large and repetitive workload. This problem can be solved by configuring one node as an NIS server, so that all hosts look up user information on the NIS server for account authentication. NIS (Network Information Service) is also called YP (Yellow Pages, as in a phone book).
First install the NIS related packages on all compute nodes, with the following commands:
yum install yp*
yum install xinetd
Modify /etc/xinetd.d/time on all nodes, setting disable = no, and then execute the following commands:
service xinetd restart   # start the xinetd service
nisdomainname cluster    # set the NIS domain name; here it is set to cluster
Modify the /etc/sysconfig/network file on all nodes, adding a line:
NISDOMAIN=cluster
Select a node server, such as node0, and configure it as the NIS server. Edit the /etc/ypserv.conf file, adding three lines:
127.0.0.0/255.255.255.0 : * : * : none
192.168.0.0/255.255.255.0 : * : * : none
* : * : * : deny
Here 192.168.0.0 represents the network segment and should be filled in according to the actual network configuration.
Then create the account database by executing the command:
/usr/lib64/yp/ypinit -m   # when adding a user, just add the account on the NIS server and then run /usr/lib64/yp/ypinit -m again to update the database
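As the comment above notes, new accounts only need to be created on the NIS server before the database is rebuilt. A minimal sketch of that workflow follows; the user name testuser is just an example, and running make in /var/yp is the conventional alternative to re-running ypinit:
useradd testuser     # create the account on the NIS server only
passwd testuser      # set its password
cd /var/yp && make   # rebuild the NIS maps so the clients see the new account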
After the database has been created, start the ypserv and yppasswdd services:
service ypserv start
service yppasswdd start
chkconfig --level 35 ypserv on     # start the service at boot
chkconfig --level 35 yppasswdd on  # start the service at boot
The other compute node servers are configured as NIS clients. First edit /etc/yp.conf, adding two lines:
nisdomain cluster   # set the NIS domain name; here it is set to cluster
ypserver node0      # set the NIS server; here it is set to node0
Edit /etc/passwd, adding one line:
+::::::   # note the number of colons
Edit /etc/nsswitch.conf, adding the following 4 lines:
passwd: files nis nisplus
shadow: files nis nisplus
group:  files nis nisplus
hosts:  files nis dns
Finally, execute the commands:
service ypbind restart          # start the service
chkconfig --level 35 ypbind on  # start ypbind automatically at boot
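Once ypbind is running, you can verify from a client that NIS lookups actually work, for example:
ypwhich       # should print the name of the NIS server (node0)
ypcat passwd  # lists the accounts served by the NIS server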
5. Configure passwordless SSH login
If the home directory is not on the shared file system, then to let host A log in to host B without a password, configure host A first: create the .ssh directory in the user's home directory on host A, cd into it, and then execute the following:
ssh-keygen -t rsa                  # keep pressing Enter; by default the generated key is saved in .ssh/id_rsa
cp id_rsa.pub authorized_keys      # after this step you can normally log in to the local machine without a password
scp authorized_keys [email protected]:/homename/.ssh   # copy the newly generated authorized_keys file to host B
chmod 700 ~/.ssh                   # run on host B: fix the permissions of host B's .ssh directory
chmod 600 ~/.ssh/authorized_keys   # run on host B: fix the permissions of the authorized_keys file
Following the above steps only lets host A access host B without a password, so to let every node in the cluster access every other node without a password, each pair of nodes would have to be configured this way, which is an enormous amount of work.
If the home directory is on a shared file system, things are much simpler: executing the following commands once lets every node in the cluster access every other node without a password.
ssh-keygen -t rsa
cp id_rsa.pub authorized_keys      # run inside ~/.ssh
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
Also add "StrictHostKeyChecking no" to the /etc/ssh/ssh_config file so that the system does not prompt to add the host to known_hosts on the first SSH login.
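A simple way to confirm that passwordless login works from the current node to every other node is a loop like the following; the node names and count are assumed to follow the node1, node2, ... convention used above:
for i in 1 2 3; do ssh node$i hostname; done   # each node should print its hostname without asking for a password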
6. Install and configure OpenMPI
Configure the OpenMPI build as follows. If you want to use the Intel compiler, install the Intel compiler first and then execute the command:
./configure CC=icc CXX=icc FC=ifort --prefix=/opt/openmpi/ --enable-static --enable-mpi-cxx   # PS: be sure to create a new directory as the installation directory
If you use the compiler that comes with the system, execute the following command instead:
./configure --prefix=/opt/openmpi/ --enable-static --enable-mpi-cxx   # PS: be sure to create a new directory as the installation directory
Finally, compile and install OpenMPI with the following command:
make all install
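After installation, each node (or each user, if the home directory is shared) needs the OpenMPI binaries and libraries on its search paths, and a small test run confirms that remote launching works. The paths below assume the /opt/openmpi prefix used above (the library directory may be lib64 on some systems), and the hostfile contents are only an example:
export PATH=/opt/openmpi/bin:$PATH                  # add to ~/.bashrc so it is set on every login
export LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
echo -e "node0 slots=4\nnode1 slots=4" > hostfile   # example hostfile: one line per node with its slot count
mpirun -np 8 --hostfile hostfile hostname           # should print each node's hostname four times, with no password prompts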
7. Install and configure a job scheduling system (optional)
If you want to add job scheduling functionality, you also need to install software such as LSF. Configuring such software is rather involved, and small clusters generally do not need it, so it is not covered here.
OpenMPI + NFS + NIS: building a distributed computing cluster