One, establishing the MPI parallel computing environment: preparation before configuration
Suppose the cluster has 3 nodes.
1. Install a Linux (CentOS 5.2) system on each node and make sure the sshd service starts normally on every node.
Instead of using 3 physical machines, the author uses a virtual machine (VMware Workstation 6.5) to simulate multiple Linux systems on a single machine running Windows XP.
Precautions:
(1) Because the author uses mpich2-1.3.2p1.tar.gz, which requires fairly recent versions of GCC, autoconf, and other packages, install the newest Linux system available to avoid build errors.
(2) Installing a Linux system under VMware Workstation may run into an incompatible-disk-type problem. The author's version had this problem; the main workaround is as follows:
A. When starting Workstation, choose to create a custom virtual machine;
B. For SCSI Adapter Type select LSI Logic (under a 2.4 Linux kernel, select BusLogic);
C. Select IDE as the virtual disk type.
(3) Install VMware Workstation tools.
After the Linux system starts, choose Virtual Machine -> Install VMware Tools from the menu bar, copy the installation package to a working directory as prompted, and then run the following commands:
tar zxvf vmware-tools.tar.gz
cd vmware-tools (enter the extracted directory)
./install.pl (the file name varies with the version; run whichever similarly named installer is present)
2. Assign an IP address to each node. The addresses are best allocated consecutively, such as 192.168.1.2, 192.168.1.3, 192.168.1.4, ... (do not assign 192.168.1.1).
3. Configure the /etc/hosts file, which maps IP addresses to machine names. On every node the file should be modified to contain the following:
192.168.1.2 Node1
192.168.1.3 Node2
192.168.1.4 Node3
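The three host entries above can also be generated in one short loop, which is handy for larger clusters. This is only a sketch: it assumes node IPs start at 192.168.1.2 and hostnames run Node1..NodeN, as in this article; adjust BASE, FIRST, and COUNT for your own network.

```shell
#!/bin/sh
# Generate /etc/hosts entries for a small cluster.
BASE=192.168.1   # network prefix (assumption: matches the article's example)
FIRST=2          # host part of the first node's IP
COUNT=3          # number of nodes
HOSTS_ENTRIES=""
i=1
while [ "$i" -le "$COUNT" ]; do
    # One "IP hostname" line per node, e.g. "192.168.1.2 Node1"
    line="$BASE.$((FIRST + i - 1)) Node$i"
    HOSTS_ENTRIES="$HOSTS_ENTRIES$line
"
    i=$((i + 1))
done
printf '%s' "$HOSTS_ENTRIES"
# To apply, append the output to /etc/hosts on every node (as root):
#   printf '%s' "$HOSTS_ENTRIES" >> /etc/hosts
```

The script only prints the entries; appending them to /etc/hosts is left as the commented-out last step so the output can be inspected first.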
With the above configuration, the nodes can reach each other by machine name. For example, you can test this by pinging Node2.
Precautions:
The test must be done with the Linux firewall turned off, or it may fail.
Permanently: chkconfig iptables on/off (takes effect after reboot)
Immediately: service iptables start/stop (does not survive a reboot)
Two, mounting the NFS file system
In parallel computing, the MPICH installation directory and the users' executable programs must have a copy on every node, and the directory layout must correspond across nodes; copying them to each node one at a time is very troublesome. With an NFS file system, the contents on all nodes stay synchronized with the master node automatically, and the directory correspondence is maintained for us. NFS lets every machine access files stored on the server in the same way it accesses local files. In general, we configure the MPICH installation directory and the directory holding the parallel programs as NFS shared directories, which eliminates the hassle of copying files to each node and greatly improves productivity.
An example NFS configuration follows (assume the NFS server's IP is 192.168.1.2; all steps must be done as root).
1. Server-side configuration (the following is done only on the master node).
(1)/etc/exports file configuration
Add the following lines to the file /etc/exports:
/usr/cluster 192.168.1.3(rw,sync,no_root_squash,no_subtree_check)
/usr/cluster 192.168.1.4(rw,sync,no_root_squash,no_subtree_check)
These lines state that the NFS server shares its /usr/cluster directory (the directory must already exist) with the two nodes at 192.168.1.3 and 192.168.1.4, and grants those nodes the listed permissions (consult the relevant documentation for the option meanings). More nodes can be added in the same way.
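For a cluster with more clients, the export lines can be generated in a loop. A minimal sketch, assuming the share stays /usr/cluster and the client IPs are the two from the article; extend CLIENTS for additional nodes:

```shell
#!/bin/sh
# Emit one /etc/exports line per client node.
SHARE=/usr/cluster
CLIENTS="192.168.1.3 192.168.1.4"   # assumption: the two client IPs above
EXPORTS=""
for ip in $CLIENTS; do
    # Note: no space between the client address and the option list.
    # "host (options)" with a space would export to the whole world
    # with those options instead of to that host.
    EXPORTS="$EXPORTS$SHARE ${ip}(rw,sync,no_root_squash,no_subtree_check)
"
done
printf '%s' "$EXPORTS"
# To apply on the server (as root):
#   printf '%s' "$EXPORTS" >> /etc/exports
```

The comment about the space before the parenthesis is worth noting: it is a classic /etc/exports mistake that silently changes the meaning of the line.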
(2) Start NFS Service
Starting the NFS service requires only the following two commands:
service portmap start
service nfs start
Note: in newer kernels the portmap daemon has been replaced by rpcbind, so on such systems the first command is service rpcbind start.
The server at 192.168.1.2 can now share the /usr/cluster directory with the other two nodes.
2. Client-side configuration (the same configuration is required on every child node).
(1) Create a shared directory.
Create the same shared directory as on the server, for receiving the server's shared files:
mkdir /usr/cluster
(2) View the shared directory that the server already has (this step can be omitted).
showmount -e 192.168.1.2
This command lists the directories that the server at 192.168.1.2 is sharing.
(3) Mount the shared directory.
mount -t nfs 192.168.1.2:/usr/cluster /usr/cluster
This command mounts the shared directory on the NFS server 192.168.1.2 onto the local /usr/cluster directory. Alternatively, add the following line to the /etc/fstab file on every child node so that the NFS share is mounted automatically at startup:
192.168.1.2:/usr/cluster /usr/cluster nfs defaults 0 0
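A malformed fstab line can keep a node from booting cleanly, so it is worth sanity-checking the six fields (device, mount point, filesystem type, options, dump, pass) before appending it. A small sketch of such a check, using the exact line from this article:

```shell
#!/bin/sh
# Sanity-check the NFS line before adding it to /etc/fstab.
FSTAB_LINE="192.168.1.2:/usr/cluster /usr/cluster nfs defaults 0 0"
# Split the line into its whitespace-separated fields.
set -- $FSTAB_LINE
NFIELDS=$#    # should be 6
FSTYPE=$3     # should be "nfs"
echo "fields=$NFIELDS type=$FSTYPE"
# Append only if the check passes (run as root on each child node):
#   [ "$NFIELDS" -eq 6 ] && [ "$FSTYPE" = "nfs" ] && echo "$FSTAB_LINE" >> /etc/fstab
```

The check would have caught the easy mistake of omitting the space between the remote path and the local mount point, which collapses the line into too few fields.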
At this point local access to the NFS share works: the /usr/cluster folder on every child node shows the contents of the folder of the same name on the NFS server, and the shared files can be accessed like local files. The MPICH installation directory and the folders holding the users' parallel programs should both be NFS-shared, so that a copy of the program no longer has to be sent to each node every time.
Three, configuring SSH for password-free access between MPI nodes
Since an MPI parallel program needs to transmit information between nodes, password-free access between all nodes is necessary. It is achieved by configuring SSH public-key authentication.
For example, to configure SSH public-key authentication for a new user, first do the following on Node1.
(1) Generate the private key id_dsa and the public key id_dsa.pub as follows.
mkdir ~/.ssh
cd ~/.ssh
ssh-keygen -t dsa
The system prints some information; just press Enter at every prompt to accept the defaults.
(2) Use the key for authentication and access authorization. Execute the following on Node1.
cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
chmod go-rwx ~/.ssh/authorized_keys
(3) Copy the files in the ~/.ssh directory to all nodes.
scp -r ~/.ssh Node2:
scp -r ~/.ssh Node3:
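On a larger cluster, one scp per node gets tedious; the copies can be driven by a loop. A sketch under the article's node names, with a DRY_RUN flag (my addition, not from the article) so the commands can be previewed before anything is copied:

```shell
#!/bin/sh
# Copy the ~/.ssh directory to every other node in one loop.
NODES="Node2 Node3"   # hostnames from the /etc/hosts entries above
DRY_RUN=1             # 1 = only print the commands; 0 = actually copy
CMDS=""
for node in $NODES; do
    # Build the command line that would be run for this node.
    cmd="scp -r \$HOME/.ssh $node:"
    CMDS="$CMDS$cmd
"
    if [ "$DRY_RUN" -eq 0 ]; then
        scp -r "$HOME/.ssh" "$node:"
    fi
done
printf '%s' "$CMDS"
```

Set DRY_RUN=0 once the printed commands look right; the trailing colon on each destination means "the user's home directory on that node", as in the two scp lines above.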
(4) Check to see if you can log on to other nodes directly (without a password).
ssh Node1
ssh Node2
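The login check above can be made non-interactive for every node at once. With BatchMode=yes, ssh fails immediately instead of falling back to a password prompt, so a missing key cannot stall a script. This sketch only builds the command lines; running them on the cluster is the commented final step:

```shell
#!/bin/sh
# Build one non-interactive login check per node.
NODES="Node1 Node2 Node3"   # hostnames from the /etc/hosts entries above
CHECKS=""
for node in $NODES; do
    # "true" runs a trivial remote command; exit status 0 means
    # the key-based login to that node works.
    CHECKS="${CHECKS}ssh -o BatchMode=yes $node true
"
done
printf '%s' "$CHECKS"
# On the cluster, run each printed line; any nonzero exit status
# points at a node whose authorized_keys setup needs fixing.
```

Checking all three nodes (including a self-login on Node1) matters because mpd and similar launchers will ssh to every host listed in the machine file, not just the remote ones.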