One, establishing the MPI parallel computing environment: preparation before configuration
Suppose the cluster has 3 nodes.
1. Install a Linux (CentOS 5.2) system on each node and make sure the sshd service starts normally on every node.
Instead of using 3 physical machines, the author uses a virtual machine (VMware Workstation 6.5) to simulate multiple Linux systems on a single machine running Windows XP.
Precautions:
(1) Because the author uses mpich2-1.3.2p1.tar.gz, which requires fairly recent versions of GCC, autoconf, and other packages, install the newest Linux system available to avoid build errors.
(2) Installing a Linux system under VMware Workstation may run into an incompatible-disk-type problem. The author's version had this problem; the main workaround is as follows:
A. When starting Workstation, choose to create a custom virtual machine;
B. For SCSI Adapter Type select LSI Logic (under a 2.4 Linux kernel, select BusLogic);
C. Select IDE as the virtual disk type.
(3) Install VMware Workstation tools.
After the Linux system starts, choose Virtual Machine -> Install VMware Tools from the menu bar, copy the installation package to a working directory as prompted, and then run the following commands:
tar zxvf vmware-tools.tar.gz
cd vmware-tools (enter the extracted directory)
./install.pl (the file name varies with the version; run whichever similarly named installer is present)
2. Assign an IP address to each node. The addresses are best allocated consecutively, such as 192.168.1.2, 192.168.1.3, 192.168.1.4, ... (do not assign 192.168.1.1).
3. Configure the /etc/hosts file, which maps IP addresses to machine names. On every node the file should be modified to contain the following:
192.168.1.2 Node1
192.168.1.3 Node2
192.168.1.4 Node3
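The three host entries above can also be generated in one short loop, which is handy for larger clusters. This is only a sketch: it assumes node IPs start at 192.168.1.2 and hostnames run Node1..NodeN, as in this article; adjust BASE, FIRST, and COUNT for your own network.

```shell
#!/bin/sh
# Generate /etc/hosts entries for a small cluster.
BASE=192.168.1   # network prefix (assumption: matches the article's example)
FIRST=2          # host part of the first node's IP
COUNT=3          # number of nodes
HOSTS_ENTRIES=""
i=1
while [ "$i" -le "$COUNT" ]; do
    # One "IP hostname" line per node, e.g. "192.168.1.2 Node1"
    line="$BASE.$((FIRST + i - 1)) Node$i"
    HOSTS_ENTRIES="$HOSTS_ENTRIES$line
"
    i=$((i + 1))
done
printf '%s' "$HOSTS_ENTRIES"
# To apply, append the output to /etc/hosts on every node (as root):
#   printf '%s' "$HOSTS_ENTRIES" >> /etc/hosts
```

The script only prints the entries; appending them to /etc/hosts is left as the commented-out last step so the output can be inspected first.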
With the above configuration, the nodes can reach each other by machine name. For example, you can test this by pinging Node2.
Precautions:
The test must be done with the Linux firewall turned off, or it may fail.
Permanently: chkconfig iptables on/off (takes effect after reboot)
Immediately: service iptables start/stop (does not survive a reboot)
Two, mounting the NFS file system
In parallel computing, the MPICH installation directory and the users' executable programs must have a copy on every node, and the directory layout must correspond across nodes; copying them to each node one at a time is very troublesome. With an NFS file system, the contents on all nodes stay synchronized with the master node automatically, and the directory correspondence is maintained for us. NFS lets every machine access files stored on the server in the same way it accesses local files. In general, we configure the MPICH installation directory and the directory holding the parallel programs as NFS shared directories, which eliminates the hassle of copying files to each node and greatly improves productivity.
An example NFS configuration follows (assume the NFS server's IP is 192.168.1.2; all steps must be done as root).
1. Server-side configuration (the following is done only on the master node).
(1)/etc/exports file configuration
Add the following lines to the file /etc/exports:
/usr/cluster 192.168.1.3(rw,sync,no_root_squash,no_subtree_check)
/usr/cluster 192.168.1.4(rw,sync,no_root_squash,no_subtree_check)
These lines state that the NFS server shares its /usr/cluster directory (the directory must already exist) with the two nodes at 192.168.1.3 and 192.168.1.4, and grants those nodes the listed permissions (consult the relevant documentation for the option meanings). More nodes can be added in the same way.
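For a cluster with more clients, the export lines can be generated in a loop. A minimal sketch, assuming the share stays /usr/cluster and the client IPs are the two from the article; extend CLIENTS for additional nodes:

```shell
#!/bin/sh
# Emit one /etc/exports line per client node.
SHARE=/usr/cluster
CLIENTS="192.168.1.3 192.168.1.4"   # assumption: the two client IPs above
EXPORTS=""
for ip in $CLIENTS; do
    # Note: no space between the client address and the option list.
    # "host (options)" with a space would export to the whole world
    # with those options instead of to that host.
    EXPORTS="$EXPORTS$SHARE ${ip}(rw,sync,no_root_squash,no_subtree_check)
"
done
printf '%s' "$EXPORTS"
# To apply on the server (as root):
#   printf '%s' "$EXPORTS" >> /etc/exports
```

The comment about the space before the parenthesis is worth noting: it is a classic /etc/exports mistake that silently changes the meaning of the line.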
(2) Start NFS Service
Starting the NFS service requires only the following two commands:
service portmap start
service nfs start
Note: in newer kernels the portmap daemon has been replaced by rpcbind, so on such systems the first command is service rpcbind start.
The server at 192.168.1.2 can now share the /usr/cluster directory with the other two nodes.
2. Client-side configuration (the same configuration is required on every child node).
(1) Create a shared directory.
Create the same shared directory as on the server, for receiving the server's shared files:
mkdir /usr/cluster
(2) View the shared directory that the server already has (this step can be omitted).
showmount -e 192.168.1.2
This command lists the directories that the server at 192.168.1.2 is sharing.
(3) Mount the shared directory.
mount -t nfs 192.168.1.2:/usr/cluster /usr/cluster
This command mounts the shared directory on the NFS server 192.168.1.2 onto the local /usr/cluster directory. Alternatively, add the following line to the /etc/fstab file on every child node so that the NFS share is mounted automatically at startup:
192.168.1.2:/usr/cluster /usr/cluster nfs defaults 0 0
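A malformed fstab line can keep a node from booting cleanly, so it is worth sanity-checking the six fields (device, mount point, filesystem type, options, dump, pass) before appending it. A small sketch of such a check, using the exact line from this article:

```shell
#!/bin/sh
# Sanity-check the NFS line before adding it to /etc/fstab.
FSTAB_LINE="192.168.1.2:/usr/cluster /usr/cluster nfs defaults 0 0"
# Split the line into its whitespace-separated fields.
set -- $FSTAB_LINE
NFIELDS=$#    # should be 6
FSTYPE=$3     # should be "nfs"
echo "fields=$NFIELDS type=$FSTYPE"
# Append only if the check passes (run as root on each child node):
#   [ "$NFIELDS" -eq 6 ] && [ "$FSTYPE" = "nfs" ] && echo "$FSTAB_LINE" >> /etc/fstab
```

The check would have caught the easy mistake of omitting the space between the remote path and the local mount point, which collapses the line into too few fields.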
At this point local access to the NFS share works: the /usr/cluster folder on every child node shows the contents of the folder of the same name on the NFS server, and the shared files can be accessed like local files. The MPICH installation directory and the folders holding the users' parallel programs should both be NFS-shared, so that a copy of the program no longer has to be sent to each node every time.
Three, configuring SSH for password-free access between MPI nodes
Since an MPI parallel program needs to transmit information between nodes, password-free access between all nodes is necessary. It is achieved by configuring SSH public-key authentication.
For example, to configure SSH public-key authentication for a new user, first do the following on Node1.
(1) Generate the private key id_dsa and the public key id_dsa.pub as follows.
mkdir ~/.ssh
cd ~/.ssh
ssh-keygen -t dsa
The system prints some information; just press Enter at every prompt to accept the defaults.
(2) Use the key for authentication and access authorization. Execute the following on Node1.
cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
chmod go-rwx ~/.ssh/authorized_keys
(3) Copy the files in the ~/.ssh directory to all nodes.
scp -r ~/.ssh Node2:
scp -r ~/.ssh Node3:
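On a larger cluster, one scp per node gets tedious; the copies can be driven by a loop. A sketch under the article's node names, with a DRY_RUN flag (my addition, not from the article) so the commands can be previewed before anything is copied:

```shell
#!/bin/sh
# Copy the ~/.ssh directory to every other node in one loop.
NODES="Node2 Node3"   # hostnames from the /etc/hosts entries above
DRY_RUN=1             # 1 = only print the commands; 0 = actually copy
CMDS=""
for node in $NODES; do
    # Build the command line that would be run for this node.
    cmd="scp -r \$HOME/.ssh $node:"
    CMDS="$CMDS$cmd
"
    if [ "$DRY_RUN" -eq 0 ]; then
        scp -r "$HOME/.ssh" "$node:"
    fi
done
printf '%s' "$CMDS"
```

Set DRY_RUN=0 once the printed commands look right; the trailing colon on each destination means "the user's home directory on that node", as in the two scp lines above.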
(4) Check to see if you can log on to other nodes directly (without a password).
ssh Node1
ssh Node2
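The login check above can be made non-interactive for every node at once. With BatchMode=yes, ssh fails immediately instead of falling back to a password prompt, so a missing key cannot stall a script. This sketch only builds the command lines; running them on the cluster is the commented final step:

```shell
#!/bin/sh
# Build one non-interactive login check per node.
NODES="Node1 Node2 Node3"   # hostnames from the /etc/hosts entries above
CHECKS=""
for node in $NODES; do
    # "true" runs a trivial remote command; exit status 0 means
    # the key-based login to that node works.
    CHECKS="${CHECKS}ssh -o BatchMode=yes $node true
"
done
printf '%s' "$CHECKS"
# On the cluster, run each printed line; any nonzero exit status
# points at a node whose authorized_keys setup needs fixing.
```

Checking all three nodes (including a self-login on Node1) matters because mpd and similar launchers will ssh to every host listed in the machine file, not just the remote ones.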