Linux now has a major influence on the IT industry. Beyond being free, efficient, and reliable, it is a powerful tool for computer scientists and other researchers who need to perform large numbers of computations. Since Donald Becker and Thomas Sterling started Beowulf cluster computing at NASA's Goddard Space Flight Center, Linux has expanded steadily into high-performance parallel computing. Today, clusters built from ordinary PCs can be found in laboratories, industrial technology centers, universities, and even small colleges. If someone asks whether a scientific computing problem can be solved with loosely coupled computing resources, the answer is yes. A Beowulf cluster, assembled from many commodity PCs, can solve such problems, and its price advantage over traditional parallel computers is hard to match.
How to Create a Beowulf Cluster
In fact, anyone with existing or old PCs can build their own parallel system to practice parallel programming or run parallel computations. In a computer lab, PCs can be set up as dual-boot systems (booting into Windows or Linux as needed) so that they serve both purposes. Machines that are no longer in use can also be turned into a parallel computing system, as was done with the Stone SouperComputer.
No two Beowulf clusters are identical. Their hardware and software configurations are so flexible that they can easily be customized into different combinations. Although every Beowulf cluster is different and its configuration depends on the needs of the application, some basic requirements are always the same. Let's look at the basic issues to consider when creating a cluster.
Minimum Requirements for Creating a Cluster
To create a cluster, each node should contain at least an Intel 486 CPU and motherboard. An Intel 386 will work, but its performance is not worth the effort. Memory requirements depend on the target application, but each node needs at least 16 MB, and most applications call for more than 32 MB per node. With centralized disk space, nodes can boot from a floppy disk, a small hard disk, or a network file system. After booting, a node can access its root partition over the network, usually through NFS (Network File System). This configuration works very well in an environment with high bandwidth and a high-performance server. For better performance, install the operating system and swap partition on each node's local disk so that data is also available locally. Each node should have at least MB of disk space for operating system components and swap space, and MB or more should be reserved for running programs. Each node must contain at least one NIC (preferably a high-speed one). Finally, each node requires a video card, a hard drive, and a power supply. A keyboard and display are needed only for system installation and configuration.
Note that Linux drivers or the corresponding kernel modules must exist for all of the selected hardware. Unless the hardware is very old, this is usually not a problem. On the master node, which manages the entire cluster, it is convenient to install an X server. If a component causes problems or has no driver during installation, you can ask for help in the relevant forums.
Network Connection
If possible, the nodes should sit on their own separate LAN with a dedicated hub, which keeps network communication smooth. The first, or master, node in the cluster should have two NICs: one connected to the internal network and one connected to the public network. This is particularly useful for user logins and file transfers. The internal network must use IP addresses that are not used on the Internet. The simplest choice is usually the Class A 10.0.0.0 network, because these addresses are reserved for private networks that are not routed on the Internet. In this example, the /etc/hosts file on each node looks as follows:
    10.0.0.1 node1
    10.0.0.2 node2
    10.0.0.3 node3
    10.0.0.4 node4
The /etc/hosts.equiv file on each node should look like this:
    node1
    node2
    node3
    node4
For node 2, the ifcfg-eth0 configuration file under Red Hat Linux looks like this:
    DEVICE=eth0
    IPADDR=10.0.0.2
    NETMASK=255.0.0.0
    NETWORK=10.0.0.0
    BROADCAST=10.255.255.255
    ONBOOT=yes
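Because the three files above follow a regular pattern, they could be generated rather than typed by hand on every node. The following is only a minimal sketch under the example's own assumptions (nodes named node1, node2, and so on, numbered consecutively within 10.0.0.x); the function names are hypothetical and are not part of any cluster tool.

```python
def _check(node_number):
    # The simple scheme below only varies the last octet of the address.
    if not 1 <= node_number <= 254:
        raise ValueError("node number must be between 1 and 254")


def hosts_lines(node_count, prefix="node"):
    """Lines of /etc/hosts for nodes 1..node_count."""
    _check(node_count)
    return [f"10.0.0.{i} {prefix}{i}" for i in range(1, node_count + 1)]


def hosts_equiv_lines(node_count, prefix="node"):
    """Lines of /etc/hosts.equiv: one trusted host name per line."""
    _check(node_count)
    return [f"{prefix}{i}" for i in range(1, node_count + 1)]


def ifcfg_eth0_lines(node_number):
    """Lines of the Red Hat ifcfg-eth0 file for one node."""
    _check(node_number)
    return [
        "DEVICE=eth0",
        f"IPADDR=10.0.0.{node_number}",
        "NETMASK=255.0.0.0",
        "NETWORK=10.0.0.0",
        "BROADCAST=10.255.255.255",
        "ONBOOT=yes",
    ]


if __name__ == "__main__":
    print("\n".join(hosts_lines(4)))
    print("\n".join(ifcfg_eth0_lines(2)))
```

The sketch only builds the text; writing it to the right location on each node is left to whatever installation method the cluster uses.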
In addition, a DNS server is often useful, especially on internal networks where node names and addresses change frequently. It can run on the first node and provide name and address resolution for the nodes on the internal network.
Local Storage
Some storage configuration decisions have to be made before the operating system is loaded. Once installation is complete, changing them means reinstalling every node, so they deserve careful consideration. Although most Linux-based Beowulf clusters run the Red Hat Linux distribution, essentially any Linux distribution can support a basic cluster. Installing Red Hat is very simple: we can install from CD or from the first node of the cluster (provided that a copy of the distribution already exists on that node). In practice, many people find it better to load the operating system onto each node from the master node over FTP than to mount the root partition through NFS. This avoids unnecessary network traffic and preserves bandwidth for message passing while the application is running.
The Red Hat Linux operating environment itself requires only about MB of disk space on each node. In practice, however, each node also needs compilers and other tools, so the configuration should allow about MB of disk space for the operating system on each node. Although some clusters place swap on a shared file system, a dedicated swap partition on a local disk is more efficient. As a general rule, a node's swap space should be twice its memory size; when memory exceeds 64 MB, swap space equal to the memory size is enough. In practice, when the memory is 64 MB to 128 MB, we usually set the swap partition to MB. So if a node has 32 MB of memory and two hard disks, we should load Linux onto the primary drive and use the other disk for swap space (64 MB) and local run space (138 MB).
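The general swap rule stated above can be written down as a small function. This is only a sketch of that rule of thumb; the function name is hypothetical, and the practical figure for the 64 MB to 128 MB range is not encoded because it depends on local choices.

```python
def swap_size_mb(memory_mb):
    """Rule-of-thumb swap size from the text, in megabytes."""
    if memory_mb <= 0:
        raise ValueError("memory size must be positive")
    if memory_mb > 64:
        # Above 64 MB of memory, swap equal to memory is considered enough.
        return memory_mb
    # Otherwise swap should be twice the memory size.
    return 2 * memory_mb


if __name__ == "__main__":
    print(swap_size_mb(32))   # the 32 MB node from the example: 64
    print(swap_size_mb(128))  # 128
```

For the 32 MB node in the example, the function returns the same 64 MB of swap that the text assigns to the second disk.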
Cluster Management
System management and maintenance are tedious, especially for large clusters, but tools and scripts available on the Internet can simplify the work. For example, the nodes must be kept synchronized in both clock time and system files (/etc/passwd, /etc/group, /etc/hosts, /etc/hosts.equiv, and so on), and a simple script scheduled by cron can handle this synchronization.
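As an illustration of the file half of that synchronization, the sketch below builds one copy command per node and per system file. It makes several assumptions that are not in the original text: the master pushes its own copies to the other nodes, the copy program is rcp (chosen only because /etc/hosts.equiv implies rsh-style trust; scp could be substituted), and the function name is hypothetical. The script prints the commands instead of running them, and clock synchronization is not covered.

```python
SYSTEM_FILES = ["/etc/passwd", "/etc/group", "/etc/hosts", "/etc/hosts.equiv"]


def build_sync_commands(nodes, files=SYSTEM_FILES, copy_cmd="rcp"):
    """Return one copy command (as an argument list) per node and file."""
    commands = []
    for node in nodes:
        for path in files:
            # -p asks rcp/scp to preserve modification times and modes.
            commands.append([copy_cmd, "-p", path, f"{node}:{path}"])
    return commands


if __name__ == "__main__":
    # Dry run: show what a cron job on the master node would execute.
    for command in build_sync_commands(["node2", "node3", "node4"]):
        print(" ".join(command))
```

To actually perform the copies, each argument list could be passed to subprocess.run, and the script itself could then be scheduled from the master node's crontab.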
Once all nodes are loaded and configured, we can develop and design parallel applications to make full use of the computing power of the new system.
Developing Parallel Applications for Cluster Computing
On Linux we can use either commercial or free compilers. The gcc, g++, and g77 (FORTRAN) compilers are included in most Linux distributions. The C and C++ compilers are already very good, and the FORTRAN compiler is steadily improving. Commercial compilers are available from companies such as Absoft, the Portland Group, and The Numerical Algorithms Group. Configured properly, some commercial FORTRAN-90 compilers can parallelize computations automatically. In general, though, developing parallel code requires explicit message passing between processors through PVM (Parallel Virtual Machine), MPI (Message Passing Interface), or another communication library. PVM and MPI are both freely available, and they let nodes exchange messages during a computation through simple library calls.
Of course, not every computing task is suitable for parallel computing. To take full advantage of it, the task usually has to be tailored. Many scientific problems can be subdivided into relatively independent parts that can each be processed on a separate node. For example, an image-processing task can be subdivided so that each node processes one part of the image. This works best when each part can be processed independently, that is, without information from the other parts.
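The image example can be made concrete with a short sketch that divides the rows of an image into contiguous strips, one per node. The function name and the choice of horizontal strips are illustrative assumptions; other decompositions (tiles, columns) would serve equally well.

```python
def split_rows(row_count, node_count):
    """Divide rows 0..row_count-1 into contiguous (start, end) strips.

    Each strip covers rows start..end-1. Strip sizes differ by at most one
    row, with the larger strips assigned first.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    base, extra = divmod(row_count, node_count)
    strips = []
    start = 0
    for node in range(node_count):
        size = base + (1 if node < extra else 0)
        strips.append((start, start + size))
        start += size
    return strips


if __name__ == "__main__":
    # A 10-row image on 4 nodes.
    print(split_rows(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

If processing a row needs neighboring rows (for instance, a blur filter), the strips would have to overlap or the nodes would have to exchange border rows, which is exactly the kind of communication the next paragraphs warn about.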
The most dangerous pitfall in parallel computing is turning a computation problem into a communication problem, whether you are using existing parallel code or writing your own. This generally happens when the task is divided too finely, so that the time each node spends transmitting data to stay synchronized exceeds the time it spends computing. In that case, using fewer nodes may give a shorter running time and make better use of resources. Each parallel application therefore has to be tuned to balance the computation done on local nodes against the communication between them.
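A toy cost model shows why adding nodes can make a job slower. The model below is an assumption made purely for illustration, not a measurement: the computation divides perfectly across the nodes, while every additional node adds a fixed synchronization cost. The function names and the numbers are hypothetical.

```python
def run_time(nodes, compute_seconds, sync_seconds_per_node):
    """Estimated run time: perfectly divided work plus synchronization cost."""
    return compute_seconds / nodes + sync_seconds_per_node * (nodes - 1)


def best_node_count(max_nodes, compute_seconds, sync_seconds_per_node):
    """Node count (1..max_nodes) with the smallest estimated run time."""
    return min(
        range(1, max_nodes + 1),
        key=lambda n: run_time(n, compute_seconds, sync_seconds_per_node),
    )


if __name__ == "__main__":
    # 100 s of computation, 1 s of synchronization per extra node.
    for n in (1, 10, 32):
        print(n, run_time(n, 100.0, 1.0))
    print(best_node_count(32, 100.0, 1.0))  # 10
```

Under these assumed numbers, 10 nodes finish in 19 seconds while all 32 nodes need more than 34 seconds, because communication dominates once the work per node becomes small.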
Finally, when developing parallel algorithms for a cluster whose nodes differ from one another, those differences must be taken into account. The relative CPU speed of the nodes is critical when running a parallel application. In a heterogeneous cluster, if tasks are simply distributed evenly, the faster CPUs must wait for the slower ones to finish, which is clearly wasteful. A well-designed algorithm can handle this situation. Whatever algorithm is used, however, the communication overhead must also be fully considered.
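One simple way to handle nodes of different speeds is to hand out work in proportion to each node's speed. The sketch below assumes that relative speeds are known in advance and that the work consists of equal-sized units; both are simplifying assumptions, and the function names are hypothetical.

```python
def allocate_by_speed(total_units, speeds):
    """Split total_units of work in proportion to relative node speeds."""
    total_speed = sum(speeds)
    exact = [total_units * s / total_speed for s in speeds]
    shares = [int(x) for x in exact]
    # Hand leftover units to the nodes with the largest fractional parts.
    leftover = total_units - sum(shares)
    order = sorted(range(len(speeds)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def finish_time(shares, speeds):
    """Time until the slowest-finishing node is done (units / speed)."""
    return max(share / speed for share, speed in zip(shares, speeds))


if __name__ == "__main__":
    speeds = [1, 1, 2]  # the third node is twice as fast
    print(finish_time([34, 33, 33], speeds))                       # even split: 34.0
    print(finish_time(allocate_by_speed(100, speeds), speeds))     # proportional: 25.0
```

With an even split the fast node sits idle while the slow nodes finish; the proportional split lets all three nodes finish at the same time in this example. Dynamic schemes, in which nodes request more work when they become idle, are another option when speeds are not known beforehand.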
Parallel processing can be organized in many ways, but the master/slave organization is the easiest to understand and program. In this model, one node acts as the master and the others act as slaves. The master node decides how to split the work and sends out the instructions. Each slave node only processes the tasks assigned to it and reports back to the master when it finishes.
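The shape of a master/slave program can be sketched on a single machine. In the example below, threads and in-memory queues stand in for the slave nodes and for the messages that PVM or MPI would carry on a real cluster; this substitution, the function names, and the toy task (summing numbers) are all assumptions made for illustration.

```python
import queue
import threading


def worker(tasks, results):
    """Slave role: process assigned tasks and report each result."""
    while True:
        task = tasks.get()
        if task is None:  # the master signals that no work is left
            break
        task_id, numbers = task
        results.put((task_id, sum(numbers)))


def master(data, worker_count, chunk_size):
    """Master role: split the work, hand it out, and combine the reports."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for task_id, chunk in enumerate(chunks):
        tasks.put((task_id, chunk))
    for _ in threads:
        tasks.put(None)  # one stop signal per worker
    partial = [results.get() for _ in chunks]
    for thread in threads:
        thread.join()
    return sum(value for _, value in partial)


if __name__ == "__main__":
    print(master(list(range(1, 101)), worker_count=4, chunk_size=10))  # 5050
```

Because idle workers pull the next chunk from the queue, faster workers naturally take on more chunks, which also eases the load-balancing problem on mixed hardware.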
Summary
There are no strict rules for developing parallel code; it should be shaped by the situation at hand. Optimizing hardware configurations and algorithms presupposes detailed knowledge of the application to be run. In a cluster whose nodes are configured differently, load balancing and inter-node communication depend on the specific hardware. Where communication is fast, tasks can be divided more finely; where it is slow, tasks should not be split too finely.
The "Beowulf movement" is likely to make parallel computing widely accessible. Parallel code developed on a Beowulf system with the standard message-passing libraries can run on commercial supercomputers without changes, so a Beowulf cluster can serve as an entry point from which to move to larger systems as needed. In addition, cheap, general-purpose clusters mean that a parallel computing environment can be dedicated to a specific task, whereas large commercial supercomputers are too expensive to devote to a single application. As parallel environments are applied more and more to practical work, their use will spread further across many fields.