Installing and configuring an MPI parallel environment on multiple networked machines
 
The Linux installation requirements are the same as for the stand-alone environment described earlier. In addition, configure the TCP/IP network connections before starting the steps below. To avoid unnecessary trouble, do not enable any firewall while configuring the network.
To make it easy for the machines to reach each other, put the host names of all machines in the /etc/hosts file. The same /etc/hosts file can be used on all machines, with contents of the following form:
127.0.0.1 localhost.localdomain localhost
10.10.10.1 Node1.mydomain Node1
10.10.10.2 Node2.mydomain Node2
... ... ...
10.10.10.N NodeN.mydomain NodeN
(Replace the host names and IP addresses with the actual values.)
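For a larger cluster the entries can be generated rather than typed. The following is a minimal sketch, assuming the illustrative 10.10.10.x addresses, the domain "mydomain", and the Node1..NodeN naming shown above; gen_hosts is a hypothetical helper name, and the output should be reviewed before it is copied into /etc/hosts.

```shell
#!/bin/sh
# Print /etc/hosts entries for a cluster of N nodes.
# Assumptions (illustrative only): 10.10.10.x addresses, domain "mydomain",
# host names Node1..NodeN.
gen_hosts() {
    n=$1
    echo "127.0.0.1 localhost.localdomain localhost"
    i=1
    while [ "$i" -le "$n" ]; do
        echo "10.10.10.$i Node$i.mydomain Node$i"
        i=$((i + 1))
    done
}

# Example: entries for a four-node cluster, written to stdout.
gen_hosts 4
```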
The method described below uses NIS (Network Information Service, also known as Sun Yellow Pages) to manage user accounts and NFS (Network File System) to share user directories.
First select one machine as the server for both NIS and NFS; we call it the service node or master node, and the other machines slave nodes. The master node and the slave nodes are configured differently, so they are described separately below.
1 Setting up NFS
Master node
Create the directory:
mkdir -p /home/local
Link /usr/local to /home/local:
/bin/rm -rf /usr/local
ln -s /home/local /usr/.
(Note: if /usr/local already contains useful files, copy or move them elsewhere before executing the commands above.)
Verify that the nfs-utils package is installed on the master node. Enable the NFS services:
/sbin/chkconfig nfs on
/sbin/chkconfig nfslock on
/etc/init.d/nfslock restart
/etc/init.d/nfs restart
In the file /etc/exports, add the following line:
/home *(rw,no_root_squash)
This exports the /home directory to all machines. (Note: for security, you can restrict the export to specific nodes and change no_root_squash to root_squash. Run the command man 5 exports to see the relevant options.)
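As an illustration of the more restrictive variant mentioned in the note, the sketch below prints an /etc/exports line that exports a directory only to named hosts, with root_squash. gen_exports is a hypothetical helper and the host names are examples; there must be no space between a host name and its option list.

```shell
#!/bin/sh
# Print an /etc/exports line that exports a directory only to the listed
# hosts, read-write and with root_squash.
gen_exports() {
    dir=$1
    shift
    line=$dir
    for host in "$@"; do
        # No space between the host and "(": a space would change the meaning.
        line="$line $host(rw,root_squash)"
    done
    echo "$line"
}

# Example: export /home to two slave nodes only.
gen_exports /home Node2 Node3
```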
Export the specified directory (/home):
exportfs -a
(You can also reboot the system).
Slave nodes
Create the directory:
mkdir -p /home
In the file /etc/fstab, add the following line:
<master node name>:/home /home nfs defaults 0 0
(Replace <master node name> with the host name or IP address of the master node.)
Run the command:
/sbin/chkconfig netfs on
This makes the system mount the master node's /home directory automatically at boot.
Run the command:
mount /home
(You can also reboot the system).
Link /usr/local to /home/local:
/bin/rm -rf /usr/local
ln -s /home/local /usr/.
(Note: if /usr/local already contains useful files, copy or move them elsewhere before executing the commands above.)
All of the above actions must be performed as root. When this step is complete, the contents of the /home and /usr/local directories should be identical on all nodes. On a slave node, the df command can be used to check the mount.
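The df check can be scripted. The following is a rough sketch: mount_source is a hypothetical helper that reads `df -P` output and prints the source of a given mount point, and the sample output is made up so that the script runs without a cluster. An NFS mount shows a source of the form host:/path.

```shell
#!/bin/sh
# Print the source (first column) of the file system mounted on the given
# mount point, reading `df -P` output on stdin.
mount_source() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { print $1 }'
}

# On a real slave node one would run:  df -P /home | mount_source /home
# The sample below is invented so that the sketch can run anywhere.
sample='Filesystem 1024-blocks Used Available Capacity Mounted on
Node1:/home 10321208 2150400 7646520 22% /home'

src=$(printf '%s\n' "$sample" | mount_source /home)
case $src in
    *:*) echo "/home is NFS-mounted from $src" ;;
    *)   echo "/home is not an NFS mount" ;;
esac
```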
For more information on NFS, search online for keywords such as "NFS".
2 Setting up NIS
The following description assumes 'CLUSTER' as the NIS domain name.
Master node
Verify that the following packages are installed:
ypserv, ypbind, yp-tools
In the file /etc/sysconfig/network, add the following line:
NISDOMAIN=CLUSTER
Enable the NIS service:
/sbin/chkconfig ypserv on
/etc/init.d/ypserv start
Initialize the NIS database:
/usr/lib/yp/ypinit -m
When the program runs, press Ctrl-D, then type y and press Enter. This command generates the NIS database. Error messages such as
No rule to make target ...
can be ignored.
Start the NIS client:
/sbin/chkconfig ypbind on
/etc/init.d/ypbind start
Verify the NIS settings:
– the command "ypwhich" should display the host name of the master node.
– the command "ypcat passwd" should display the user accounts on the master node.
Slave nodes
Verify that the following packages are installed:
ypbind, yp-tools
In the file /etc/sysconfig/network, add the following line:
NISDOMAIN=CLUSTER
Start the NIS client:
/sbin/chkconfig ypbind on
/etc/init.d/ypbind start
Verify the NIS settings:
– the command "ypwhich" should display the host name of the master node.
– the command "ypcat passwd" should display the user accounts on the master node.
To be able to log in as an NIS user, you also need to modify the /etc/nsswitch.conf file to include the following settings:
passwd: files nis
shadow: files nis
group: files nis
hosts: files nis dns
Note that user information must not be defined twice on the slave nodes: all NIS users (as listed by the command "ypcat passwd") should be removed from the files /etc/passwd and /etc/shadow, and all NIS groups (as listed by the command "ypcat group") should be removed from the file /etc/group.
After NIS is configured, new user accounts need only be created on the master node (make sure the user's home directory is under /home), followed by running the command "cd /var/yp; make".
If you modify user account information on the master node, run the above command again to refresh the NIS database.
On a slave node, NIS users cannot change their password with the "passwd" command; they must use the "yppasswd" command instead.
All of the above actions must be performed as root.
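The rule that NIS users and groups must not also be defined locally on the slave nodes can be checked with a small script. This is only a sketch: dup_names is a hypothetical helper that compares the first field of two passwd-format (or group-format) files, and the sample files are invented so that it runs without NIS. On a real slave node the second file would come from, for example, "ypcat passwd > nis_passwd".

```shell
#!/bin/sh
# Print the names that appear in both files. Each file is in passwd or
# group format, i.e. the name is the first colon-separated field.
dup_names() {
    local_names=$(mktemp)
    nis_names=$(mktemp)
    cut -d: -f1 "$1" | sort -u > "$local_names"
    cut -d: -f1 "$2" | sort -u > "$nis_names"
    comm -12 "$local_names" "$nis_names"   # lines common to both sorted lists
    rm -f "$local_names" "$nis_names"
}

# Invented sample data: "zlb" is defined both locally and in NIS.
tmpdir=$(mktemp -d)
printf 'root:x:0:0::/root:/bin/sh\nzlb:x:500:500::/home/zlb:/bin/sh\n' > "$tmpdir/local"
printf 'zlb:x:500:500::/home/zlb:/bin/sh\n' > "$tmpdir/nis"
dups=$(dup_names "$tmpdir/local" "$tmpdir/nis")
echo "defined twice: $dups"
rm -rf "$tmpdir"
```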
For more information on NIS, search online for keywords such as "NIS".
3 Setting up rsh
Confirm that the packages listed in 3.4.1 are installed.
Add the host names of all nodes to the file /etc/hosts.equiv.
Enable the rsh service:
/sbin/chkconfig rsh on
Note: to allow the root user to execute remote commands with rsh, copy the file /etc/hosts.equiv to /root/.rhosts and add "rsh" to the file /etc/securetty.
The above operations must be performed as root on all nodes.
Once the above settings are complete, you should be able to execute remote commands on every node (including the local one) from any node. You can test this as follows:
Log in to a node as an NIS user and run the command:
rsh <name of another node> /bin/hostname
If the configuration is correct, this command displays the host name of the remote node. If an error occurs, check the error messages in the file /var/log/messages.
Note that the shell initialization files (.cshrc, .profile, .bashrc, etc.) must not print anything to stdout or stderr; that is, the output of the command above should contain nothing other than the host name. Otherwise the startup of MPI processes may be affected.
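The test can be repeated for every node with a short loop. The sketch below flags any node whose remote command fails or prints more than one line (for example because a shell initialization file prints a greeting). RSH is a hypothetical override variable introduced here, not an rsh feature; fake_rsh is a stub so that the sketch runs without a cluster.

```shell
#!/bin/sh
# Run /bin/hostname on each node and report nodes whose output is not
# exactly one non-empty line, or whose remote command could not be started.
# RSH is a hypothetical override variable; it defaults to the real rsh.
check_nodes() {
    for node in "$@"; do
        if out=$(${RSH:-rsh} "$node" /bin/hostname 2>&1) &&
           [ -n "$out" ] &&
           [ "$(printf '%s\n' "$out" | grep -c '')" -eq 1 ]; then
            echo "$node: ok"
        else
            echo "$node: check failed: $out"
        fi
    done
}

# Stub standing in for rsh so the sketch runs anywhere; on a real cluster,
# leave RSH unset so that rsh itself is used.
fake_rsh() { echo "$1"; }
RSH=fake_rsh
check_nodes Node1 Node2
```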
4 Installation of MPICH
The installation of MPICH is exactly the same as on a single machine, and needs to be done only on the master node, since the /usr/local directory is shared by all nodes. In addition, the files
/etc/profile.d/mpich.sh
/etc/profile.d/mpich.csh
must be copied to all nodes.
5 Compiling and running MPICH programs
MPICH programs can be compiled on any node using mpicc (C), mpif77 (Fortran), mpiCC (C++), and so on. These are shell scripts provided by MPICH and are used in the same way as ordinary C/C++/Fortran compilers.
The way an MPICH program is run depends on the underlying device selected when the MPICH system was built. The build described here uses ch_p4 as the underlying device, in which case there are two ways to specify the nodes and the number of processes used to run an MPI program:
1. mpirun -machinefile <file name> -np 4 <MPI program name> [MPI program arguments]
The file <file name> lists the names of the nodes to be used, one per line:
<node name 1>
<node name 2>
<node name 3>
<node name 4>
mpirun will start the specified number of processes (4 here) on the given nodes. When the number of processes exceeds the number of nodes, mpirun starts two or more processes on some of the nodes. The command "mpirun -help" displays a brief usage description of mpirun.
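A machine file of this kind can be written by hand or produced by a one-line helper. In the sketch below, make_machinefile is a hypothetical name and the node names are examples.

```shell
#!/bin/sh
# Print a machine file: one node name per line.
make_machinefile() {
    printf '%s\n' "$@"
}

# Example: list four nodes on stdout; redirect to a file for real use,
# e.g.  make_machinefile Node1 Node2 Node3 Node4 > machines
make_machinefile Node1 Node2 Node3 Node4
```

With the output saved in a file named machines (an example name), a program could then be started with "mpirun -machinefile machines -np 4" followed by the program name.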
2. <MPI program name> -p4pg <file name> [MPI program arguments]
This method gives precise control over the number of MPI processes started on each node and over their process numbers, and allows different executables to be started on different nodes (for master/slave parallel programs). The file <file name> lists the programs to be started on each node in the following format:
<node name 1> 0 <executable name 1>
<node name 2> 1 <executable name 2>
<node name 3> 1 <executable name 3>
... ...
<node name n> 1 <executable name n>
Here <node name 1> must be the node on which the command is run, and <executable name 1> must be the same file as the MPI program named on the command line. All executable names must be absolute paths (for example /home/zlb/test/cpi). Typically all the executable names are the same. When the same node name appears several times, several processes are started on that node.
For example, suppose the user has a compiled MPI program cpi in the /home/zlb/test directory on the node Node1. Create a file named p4file in that directory with the following contents:
Node1 0 /home/zlb/test/cpi
Node2 1 /home/zlb/test/cpi
Node1 1 /home/zlb/test/cpi
Node2 1 /home/zlb/test/cpi
The command "./cpi -p4pg p4file" will run four processes on Node1 and Node2: processes 0 and 2 on Node1, and processes 1 and 3 on Node2.
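A file in this format can also be generated from a list of node names. The sketch below is one possible way to do it, assuming the same executable on every node; make_p4pg is a hypothetical helper name. The first node listed gets 0 and every later entry gets 1, as in the format described above.

```shell
#!/bin/sh
# Print a ch_p4 process-group file for one executable.
# Usage: make_p4pg <absolute path of executable> <node name>...
# The first node must be the node on which the program will be launched.
make_p4pg() {
    exe=$1
    shift
    count=0                      # first line carries 0, all later lines 1
    for host in "$@"; do
        echo "$host $count $exe"
        count=1
    done
}

# Example: reproduces the four-line file from the example above.
make_p4pg /home/zlb/test/cpi Node1 Node2 Node1 Node2
```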