Newer versions of MPICH2 reportedly use Hydra as the default process manager. When I wrote these notes I was using MPD and did not know much about Hydra, so any advice is welcome.
Environment: Lenovo DeepComp 1800 cluster, 64-bit CentOS 5.4, root account (the configuration may differ for a non-root account, in particular the location of the mpd.conf file).
I. Make sure the nodes can reach each other over SSH without a password:
[root@c0104 ~]# ssh-keygen -t rsa    # press Enter at every prompt
[root@c0104 ~]# cp .ssh/id_rsa.pub .ssh/authorized_keys
[root@c0104 ~]# chmod go-rwx .ssh/authorized_keys
Repeat this on every node. Then collect the authorized_keys files from all nodes, merge them into a single authorized_keys file, and use scp to distribute it to the ~/.ssh directory of every node.
From one node, log in once to every node (including the node itself) with ssh nodeN. This creates a known_hosts file under $HOME/.ssh/ that holds the host key fingerprint of each host visited. Copy that known_hosts file to every node as well.
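The merge step above can be sketched in a few lines of shell. This is only an illustration: the helper name merge_keys and the layout (every node's public key already collected into one directory as <node>.pub) are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Sketch: merge per-node public keys into one authorized_keys file.
# Assumes the keys were already collected into one directory as
# <node>.pub (hypothetical layout).
merge_keys() {
    keydir=$1      # directory holding the collected *.pub files
    out=$2         # merged authorized_keys file to write
    # Concatenate all keys and drop exact duplicates.
    cat "$keydir"/*.pub | sort -u > "$out"
    # Keep the file private to its owner.
    chmod 600 "$out"
}

# Example (not executed here):
#   merge_keys ./collected_keys ./authorized_keys
#   scp ./authorized_keys nodeN:~/.ssh/      # repeat for each node
```

Sorting with -u changes the order of the keys, which does not matter for authorized_keys, and it keeps the file from growing if the same key is collected twice.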
II. Install MPICH2
I installed MPICH2 separately on every node; NFS is not used.
# tar zxvf mpich2-1.0.2p1.tar.gz
# cd mpich2-1.0.2p1
# ./configure
# make
# make install
Done. Check the result with which mpd and which mpdtrace.
III. Configuration
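[root@c0104 ~]# touch /etc/mpd.conf
Then edit the file (vi /etc/mpd.conf) so that it contains the line secretword=something, where "something" can be any string you choose.
[root@c0104 ~]# chmod 600 /etc/mpd.conf
Use scp to copy the file to the other nodes. For the root account it has to end up as /etc/mpd.conf on each node; a non-root user would use ~/.mpd.conf in the home directory instead.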
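The touch / edit / chmod sequence can also be done in one step. Below is a minimal sketch, assuming a hypothetical helper named write_mpd_conf that takes the target path and the secret as arguments; neither the helper nor its arguments come from MPICH2 itself.

```shell
#!/bin/sh
# Sketch: write an MPD password file readable only by its owner.
# write_mpd_conf is a hypothetical helper, not an MPICH2 command.
write_mpd_conf() {
    conf=$1        # e.g. /etc/mpd.conf when running as root
    secret=$2      # must be the same string on every node
    # Create the file under a restrictive umask so that a new file
    # is never readable by other users, even briefly.
    ( umask 077 && printf 'secretword=%s\n' "$secret" > "$conf" )
    # Also tighten permissions in case the file already existed.
    chmod 600 "$conf"
}

# Example (not executed here):
#   write_mpd_conf /etc/mpd.conf something
```

Running the same call on every node, or copying the resulting file with scp, keeps the secret identical across the cluster, which matters because mismatched secretwords are one of the causes MPD itself reports when a daemon fails to join the ring.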
[root@c0104 ~]# vi mpd.hosts    # I am going to test on 4 nodes
Contents (the name of each node in the cluster, one per line):
c0104
c0106
c0108
c0110
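Since the node count given to mpdboot has to agree with the hosts file, it can be derived from the file instead of typed by hand. The following is a sketch under two assumptions: the local node is listed in the file (as it is here), and blank lines or lines starting with # are notes to be skipped. count_hosts is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: count the host entries in an MPD hosts file.
# count_hosts is a hypothetical helper, not an MPICH2 command.
count_hosts() {
    # Skip blank lines and lines whose first non-space character is '#'
    # (treating those as comments is an assumption of this sketch).
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | wc -l | tr -d ' '
}

# Example (not executed here):
#   mpdboot -n "$(count_hosts mpd.hosts)" -f mpd.hosts
```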
IV. Test
[root@c0104 ~]# mpdboot -n 4 -f mpd.hosts    # -n is the number of nodes to start; -f names the hosts file, here mpd.hosts
[root@c0104 ~]# mpdtrace    # list the nodes that were started; add -l to also show the port numbers
[root@c0104 ~]# mpicc cpi.c -o cpi
For C++: g++ test.cpp -I/usr/include/mpich2 -lmpi -o test    # mpicxx test.cpp -o test is easier to use
[root@c0104 ~]# mpiexec -n num ./cpi    # num is the number of processes to launch
[root@c0104 ~]# mpdallexit    # shut down MPD on all nodes
V. Troubleshooting
1. A no_port error may be caused by the permissions on the mpd.conf file; running chmod 600 on mpd.conf should fix it. If it does not, paste the error message into Google and search.
2. Error message:
[root@c0104 ~]# mpdboot -n 3 -f mpd.hosts
mpdboot_c0104_0 (mpdboot 406): error trying to start mpd(boot) at 2 {'host': 'c0108', 'cpus': 1, 'ifhn': ''}; output:
mpdboot_c0108_2 (err_exit 415): mpd failed to start correctly on c0108
reason: 2: unable to ping local mpd;
invalid msg from mpd :{}:
** mpd may have disappeared, perhaps due to mismatched secretwords
** see msgs logged in syslog and /tmp/mpd2.logfile * on c0108
last printed output from mpd before becoming a daemon:
41819
mpdboot_c0108_2 (err_exit 421): contents of mpd logfile in /tmp:
logfile for mpd with pid 4894
c0108_41819: conn error in connect_rhs: No route to host
c0108_41819 (connect_rhs 602): failed to connect to rhs at 192.168.1.7 49518
c0108_41819 (enter_ring 513): rhs connect failed
c0108_41819 (run 215): failed to enter ring
mpdboot_c0104_0 (err_exit 415): mpd failed to start correctly on c0104
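Possible solution: disable the firewall and SELinux on the nodes. The "No route to host" line shows that the MPD on c0108 could not open a connection to its neighbour in the ring.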
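The two hints that appear in the output above ("No route to host" and "mismatched secretwords") can be checked for mechanically. This is a rough sketch only: diagnose_mpd_log is a hypothetical helper, and the strings it matches are taken from this one log, so other MPD versions may word them differently.

```shell
#!/bin/sh
# Sketch: give a first guess at why mpdboot failed, based on a saved
# copy of its output. diagnose_mpd_log is a hypothetical helper.
diagnose_mpd_log() {
    if grep -q 'No route to host' "$1"; then
        # Connection between nodes was refused or blocked.
        echo "network: check the firewall and SELinux on every node"
    elif grep -qi 'secretword' "$1"; then
        # Only the generic hint about secretwords is present.
        echo "auth: compare mpd.conf on every node"
    else
        echo "unknown: read the full logfile"
    fi
}

# Example (not executed here):
#   mpdboot -n 3 -f mpd.hosts > boot.log 2>&1 || diagnose_mpd_log boot.log
```

The network check comes first on purpose, because the secretword hint is printed by mpdboot even when the real cause is a blocked connection, as in the log above. On CentOS 5 the firewall is normally stopped with service iptables stop and SELinux switched to permissive mode with setenforce 0; both need root and both lower the security of the node, so they fit best on an isolated cluster network.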