Linux Cluster Overview

Source: Internet
Author: User

Counting the cluster projects for Linux is like counting the startups in Silicon Valley. Unlike Windows NT, which has been held back by its closed environment, Linux offers a large number of cluster systems to choose from, suited to different purposes and needs. Determining which cluster to use, however, is not easy.

Part of the problem is that the term cluster is used in different contexts. IT managers may be concerned with keeping servers up longer or making applications run faster, while mathematicians may be more concerned with large-scale numerical computation on their servers. Both need clusters, but each needs a cluster with different features.

This article surveys the different forms of clusters and a selection of the many implementations, which can be purchased or obtained as free software. Although not all of the solutions listed are open source, most follow the common Linux practice of distributing source code, especially because those who implement clusters often want to tune system performance to meet their needs.

A cluster always involves hardware connections between machines. In most cases, this simply means Fast Ethernet NICs and hubs. At the cutting edge of scientific computing, however, there are a number of network interface cards designed specifically for clusters.

These include Myricom's Myrinet, Giganet's cLAN, and the IEEE 1596 standard Scalable Coherent Interface (SCI). These cards not only provide high bandwidth between the nodes of the cluster but also reduce latency (the time it takes to send a message). Latency is critical when nodes exchange state information to keep their operations synchronized.

Myricom provides network adapters and switches with unidirectional interconnect speeds of up to 1.28 Gbps. The cards come in two forms: copper and optical fiber. The copper version can communicate at full speed over distances of up to 10 feet and at half speed up to 60 feet. Fiber Myrinet can run at full speed over 6.25 miles of single-mode fiber or 340 feet of multi-mode fiber. Myrinet supports only point-to-point, hub-based, or switch-based network configurations, but there is no limit on the number of switches that can be connected together. Adding switches only increases the latency between nodes. The average latency between two directly connected nodes is 5 to 18 microseconds, much lower than Ethernet.
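Why latency matters as much as bandwidth can be seen with a simple first-order model, sketched here under stated assumptions: the time to deliver one message is taken as a fixed per-message latency plus the payload size divided by the link bandwidth. The function name `transfer_time_us` is hypothetical, and real interconnects have further costs (software overhead, switch hops) that this ignores.

```python
def transfer_time_us(payload_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """First-order estimate of one-way message time in microseconds.

    Assumes time = fixed latency + payload bits / bandwidth, nothing else.
    """
    bits = payload_bytes * 8
    # 1 Gbps equals 1000 bits per microsecond.
    serialization_us = bits / (bandwidth_gbps * 1000.0)
    return latency_us + serialization_us


if __name__ == "__main__":
    # Myrinet figures quoted above: 1.28 Gbps, 5 microseconds best-case latency.
    for size in (64, 1_000_000):
        print(size, "bytes:", transfer_time_us(size, 5.0, 1.28), "us")
```

For a 64-byte synchronization message the fixed latency dominates the total, while for a megabyte transfer the bandwidth term dominates, which is why both figures are quoted for cluster interconnects.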

Cluster types
The three most common cluster types are high-performance scientific clusters, load-balancing clusters, and high-availability clusters.

Scientific Cluster
Typically, the first type involves developing parallel applications for a cluster to solve complex scientific problems. This is the essence of parallel computing, although it does not use a dedicated parallel supercomputer made up of tens to tens of thousands of independent processors. Instead, it uses commodity systems, such as a group of single- or dual-processor PCs linked by high-speed connections and communicating over a common message-passing layer to run parallel applications. This is why you so often hear that yet another cheap Linux supercomputer has come out. It is actually a cluster of computers with the processing power of a real supercomputer. A decent cluster configuration commonly costs more than $100,000. That may seem expensive to the average person, but it is cheap compared with a dedicated supercomputer costing millions of dollars.

Load-Balancing Cluster
Load-balancing clusters provide a more practical system for business needs. As the name implies, the system spreads the load as evenly as possible across a group of computers. That load may be application processing load or network traffic load. Such a system is well suited to large numbers of users running the same set of applications. Each node handles part of the load, and the load is dynamically assigned between nodes to achieve balance. The same holds for network traffic. Often, a network server application receives more incoming traffic than it can process quickly enough, so the traffic must be sent to network server applications running on other nodes. The distribution can also be optimized according to the resources available on each node or the particular environment of the network.

High Availability Cluster
High-availability clusters exist to keep the overall services of the cluster available as much as possible, taking into account the fallibility of computing hardware and software. If the primary node in a high-availability cluster fails, it is replaced by a secondary node for the duration of the outage. The secondary node is usually a mirror image of the primary node, so that when it takes over, it can completely assume the primary's identity and keep the system environment consistent from the user's point of view.

These three basic cluster types are often mixed and hybridized. Thus, you can find a high-availability cluster that also load-balances users across its nodes while still attempting to maintain high availability. Similarly, you can find a parallel cluster that performs load balancing between nodes, separate from what was programmed into the application. Although the cluster system itself is independent of the software or hardware in use, hardware connections play a key role in running the system efficiently.

Giganet was the first vendor of Virtual Interface (VI) architecture cards for Linux, with its cLAN cards and switches. The VI architecture is a platform-neutral software and hardware system developed by Intel for building clusters. It uses its own network communication protocol rather than IP to exchange data directly between servers, and it is not intended to be WAN-routable. The future of VI now depends on the ongoing work of the System I/O Group, which is itself a merger of Intel's Next Generation I/O group and the Future I/O group led by IBM and Compaq. Giganet currently offers 1 Gbps unidirectional communication between nodes with a minimum latency of 7 microseconds.

The IEEE standard SCI has even lower latency (under 2.5 microseconds) and a unidirectional speed of 400 MB/s (3.2 Gbps). Unlike Ethernet, SCI is a ring-topology network system, which makes communication between nodes faster at larger scale. Even more useful is a torus topology network, with many rings between the nodes. A two-dimensional torus can be pictured as a grid of n by m nodes with a ring network at every row and every column. A three-dimensional torus is similar, with a three-dimensional grid of nodes and a ring network at every level. Massively parallel supercomputing systems use torus topology networks to provide the fastest paths for communication between hundreds of nodes.
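The two-dimensional torus described above can be sketched in a few lines. This is only an illustration of the topology, not SCI driver code; the function names and the (row, column) addressing scheme are assumptions made for the example.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (row, column) address, an illustrative convention


def torus_neighbors(row: int, col: int, n: int, m: int) -> List[Node]:
    """Return the four ring neighbors of a node in an n-by-m 2D torus."""
    return [
        ((row - 1) % n, col),  # previous node on the column ring
        ((row + 1) % n, col),  # next node on the column ring
        (row, (col - 1) % m),  # previous node on the row ring
        (row, (col + 1) % m),  # next node on the row ring
    ]


def torus_hops(a: Node, b: Node, n: int, m: int) -> int:
    """Minimum hop count between two nodes, going either way around each ring."""
    row_distance = abs(a[0] - b[0])
    col_distance = abs(a[1] - b[1])
    return min(row_distance, n - row_distance) + min(col_distance, m - col_distance)
```

Because every row and column wraps around, opposite corners of a 4-by-5 grid are only two hops apart, which is the property that keeps paths short as the node count grows.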

The limiting factor in most systems is not the operating system or the network interface but the server's internal PCI bus. Nearly all desktop PCs have basic 32-bit, 33-MHz PCI, and most low-end servers offer only 133 MB/s (about 1 Gbps), which limits what these NICs can do. Some expensive high-end servers, such as the Compaq Proliant 6500 and the IBM Netfinity 7000 series, have 64-bit, 66-MHz PCI that runs at four times that speed. Unfortunately, the paradox is that more organizations use low-end systems, so most vendors end up building and selling more low-end PCI NICs. Dedicated 64-bit, 66-MHz PCI NICs do exist, but at a much higher price. For example, Intel offers a Fast Ethernet NIC of this kind for about $400 to $500, almost five times the price of an ordinary PCI version.
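The bus figures above follow from simple arithmetic: the theoretical peak is the bus width in bytes multiplied by the clock rate, assuming one transfer per clock cycle. The sketch below reproduces that calculation; the function name is hypothetical, and the nominal clock values 33.33 and 66.66 MHz are assumed, with real sustained throughput being lower than these peaks.

```python
def pci_peak_mb_per_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak bandwidth in MB/s, assuming one transfer per clock cycle."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


if __name__ == "__main__":
    basic = pci_peak_mb_per_s(32, 33.33)     # desktop-class PCI
    high_end = pci_peak_mb_per_s(64, 66.66)  # high-end server PCI
    print(f"32-bit/33-MHz: {basic:.0f} MB/s ({basic * 8 / 1000:.2f} Gbps)")
    print(f"64-bit/66-MHz: {high_end:.0f} MB/s, {high_end / basic:.0f}x faster")
```

Doubling both the width and the clock gives the fourfold difference mentioned above, and it shows why a gigabit-class NIC can saturate a basic PCI bus on its own.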

Scientific Cluster
Some parallel cluster systems can achieve such high bandwidth and low latency because they usually bypass network protocols such as TCP/IP. Although the Internet Protocol is essential for wide area networks, it carries too much overhead that is unnecessary in a closed cluster network where the nodes are known to one another. In fact, some of these systems can use direct memory access (DMA) between nodes, similar to how graphics cards and other peripherals work within a single machine. Distributed shared memory can thus be accessed directly across the cluster by any processor on any node. These systems can also use a low-overhead messaging system for communication between nodes.

The Message Passing Interface (MPI) is the most common implementation of the message-passing layer in parallel cluster systems. MPI exists in several derivative versions, but in every case it gives developers a common API for writing parallel applications, so they do not have to work out by hand how to distribute pieces of code among the cluster's nodes. Beowulf, the first system discussed below, is one that uses MPI as a common programming interface.
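The message-passing style of programming can be illustrated without a cluster. The sketch below is not MPI and uses no MPI library: it stands in threads for nodes and in-memory queues for the network, purely to show the pattern of a root handing out work and combining partial results, loosely analogous to MPI's scatter and reduce collectives. The name `parallel_sum` and the mailbox layout are assumptions made for the example.

```python
import queue
import threading


def parallel_sum(values, num_workers=4):
    """Sum `values` by exchanging messages between a root and worker threads."""
    inboxes = [queue.Queue() for _ in range(num_workers)]  # one mailbox per worker
    results = queue.Queue()                                # mailbox of the root

    def worker(rank):
        chunk = inboxes[rank].get()      # receive a piece of the work
        results.put((rank, sum(chunk)))  # send the partial result back

    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(num_workers)]
    for thread in threads:
        thread.start()

    # "Scatter": the root sends each worker a strided slice of the input.
    for rank in range(num_workers):
        inboxes[rank].put(values[rank::num_workers])

    # "Reduce": the root collects one partial sum per worker and combines them.
    total = sum(results.get()[1] for _ in range(num_workers))
    for thread in threads:
        thread.join()
    return total
```

The workers never touch shared data directly; everything they know arrives in a message, which is what lets the same program structure span separate machines when a real message-passing library carries the messages.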

Deciding which high-performance cluster package to use is difficult. Many offer similar services, but the specific requirements of the computation are the deciding factor. In many cases, researching these systems is only half the solution, and using the software requires dedicated help and cooperation from the developers of the cluster package.

Beowulf
When Linux clusters come up, many people think first of Beowulf, the best-known scientific software cluster system for Linux. There is no single package called Beowulf. Rather, the term applies to a group of common software tools that run on the Linux kernel. These include popular message-passing APIs such as the Message Passing Interface (MPI) or Parallel Virtual Machine (PVM), modifications to the Linux kernel that allow several Ethernet interfaces to be combined, high-performance network drivers, changes to the virtual memory manager, and a distributed inter-process communication (DIPC) service. A common global process identifier space allows any process to be accessed from any node through the DIPC mechanisms. Beowulf also supports a range of hardware connectivity options between nodes.

Beowulf may have been the first high-performance cluster system for Linux to attract attention, simply because of its wide use and support. Many documents and books have been written on the subject. The differences between Beowulf and some of the scientific cluster systems below may be real, or may lie only in the product name. For example, Alta Technologies' AltaCluster is a Beowulf system despite the different name. Some vendors, such as the German company ParTec AG, offer derivatives of the Beowulf model that include other management interfaces and communication protocols.

Giganet cLAN
Giganet provides a custom hardware-based solution that uses a non-IP protocol to communicate between the nodes of a scientific cluster. As mentioned above, the Virtual Interface protocol supports faster communication between servers by eliminating much of the overhead of protocols such as IP. In addition, the hardware can run at gigabit speeds with very low latency, making it well suited to building scientific clusters of up to 256 nodes. The vendor supports MPI, so many parallel applications can run on it much as they would on a Beowulf system.

It also shares Beowulf's drawback: it cannot be used as a network load-sharing system unless you are willing to write an application that monitors and distributes the network packets passing between the servers.

Legion
Legion attempts to build a true multi-computer system: a cluster in which each node is an independent system, but the whole appears to the user as a single computer. Legion is designed to support a worldwide computer consisting of millions of hosts and trillions of software objects. Within Legion, users can create their own collaborative groups.

Legion provides high-performance parallelism, load balancing, distributed data management, and fault tolerance. It supports high availability through its fault-tolerance management and dynamic reconfiguration among member nodes. It also has an extensible core that can be dynamically replaced or upgraded as new improvements and advances appear. The system is not under a single point of control; it can be administered by any number of organizations, each supporting its own autonomous part of the whole. The Legion API provides high-performance computing through its built-in parallelism.

Legion requires specially written software that uses its API libraries. It sits on top of the user's computer operating system and negotiates between local and distributed resources. It automatically handles resource scheduling and security, and manages a context space for describing and accessing hundreds of millions of objects across the whole system. It does not need system administrator privileges to run on each node, however, and can work from an unprivileged user account. This increases the flexibility with which nodes and users can join Legion.

Cplant
The Computational Plant (Cplant) at Sandia National Laboratories is a large-scale, massively parallel cluster used for TeraFLOP (trillion floating-point operations per second) computing and built from commodity components. The whole system consists of "scalable units" that can be divided into partitions serving different purposes (compute, disk I/O, network I/O, and service management). Each node in the cluster is a Linux system with specially developed kernel-level modules that provide the partitioning services. The function of each partition can be changed by loading and unloading kernel-level modules.

The project is being completed in three phases. The initial phase was a prototype of 128 systems based on the 433-MHz DEC Alpha 21164, each with 192 MB of RAM and a 2 GB drive, connected to one another with Myrinet NICs and 8-port SAN switches. Phase 1 expanded this to 400 21164-based workstations running at 500 MHz with 192 MB of RAM and no storage, connected with 16-port SAN switches in a hypercube structure and running Red Hat 5.1. The current Phase 2 has 592 DEC 21264-based machines running at 500 MHz with 256 MB of RAM and no drives. Each node uses a 64-bit, 33-MHz PCI Myrinet NIC, and the nodes are still connected with 16-port switches in a hypercube structure.
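The hypercube structure mentioned above has a convenient property that a short sketch makes concrete: if each vertex is given a binary address, its neighbors are the addresses that differ in exactly one bit. The code below illustrates that general property only; how Cplant actually numbers or wires its switches is not described here, so the addressing and function names are assumptions.

```python
from typing import List


def hypercube_neighbors(node: int, dimensions: int) -> List[int]:
    """Neighbors of `node` in a hypercube: flip each address bit in turn."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hypercube_hops(a: int, b: int) -> int:
    """Minimum hops between two vertices: the number of differing address bits."""
    return bin(a ^ b).count("1")
```

A hypercube of dimension d connects 2^d vertices while keeping any two of them at most d hops apart, which is why the structure scales well for switch fabrics.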

Applications running on Cplant include optimizations for solving the sparse linear systems that arise in fluid mechanics and structural mechanics computations, molecular mechanics simulations, finite element analysis of linear structural mechanics, and a dynamic load-balancing library for parallel applications.

JESSICA
The systems research group at the University of Hong Kong has a Java-based cluster called the Java-Enabled Single-System-Image Computing Architecture (JESSICA), which acts as a middleware layer to create the illusion of a single system image. This layer provides a global thread space for all threads running on every node, which communicate through a distributed shared memory (DSM) system. The project currently uses the TreadMarks DSM but will eventually use the group's own JiaJia Using Migrating-home Protocol (JUMP). The group uses custom Java-based ClusterProbe software to manage the cluster's 50 nodes.

PARIS
The "Programming parallel and distributed systems for large-scale numerical simulation applications" (PARIS) project at the French IRISA research institute provides several tools for building clusters of Linux servers. The project has three parts: resource management software for clusters, a runtime environment for parallel programming languages, and software tools for distributed numerical simulation.

The resource management software includes the Globelins distributed system for sharing memory, disk, and processor resources, and the Dupleix and Mome distributed shared memory systems.

Load-Balancing Cluster
Load-balancing clusters distribute network or computational workload across multiple nodes. What sets them apart here is the absence of a single parallel program running across the nodes. In most cases, each node in such a cluster is an independent system running its own software. There is nevertheless a common relationship between the nodes, whether through direct communication between them or through a central load-balancing server that controls each node's load. Usually, a specific algorithm is used to distribute the load.

Network traffic load balancing is the process of examining a cluster's incoming traffic and distributing it to the individual nodes for appropriate handling. It is best suited to large network applications such as Web or FTP servers. Load balancing of application services requires the cluster software to check the current load on each node and determine which nodes can accept new jobs. It is best suited to running serial and batch jobs such as data analysis. These systems can also be configured to take account of the hardware or operating system features of a particular node, so the nodes in the cluster need not be identical.

Linux Virtual Server
The Linux Virtual Server (LVS) project has implemented a number of kernel patches that create a load-balancing system for incoming TCP/IP traffic. The LVS software examines incoming traffic and, according to a load-balancing algorithm, redirects it to a group of servers that act as a cluster. This allows network applications such as Web servers to run on a cluster of nodes and support large numbers of users.
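To make the idea of redirecting traffic "according to a load-balancing algorithm" concrete, here is a minimal sketch of round-robin selection, one simple policy a dispatcher can use. It is an illustration in user-space Python, not LVS kernel code; the class name and server labels are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out back-end servers in a fixed rotating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def pick(self):
        """Return the server that should receive the next connection."""
        return next(self._cycle)


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["web1", "web2", "web3"])  # hypothetical names
    for request_id in range(5):
        print(request_id, "->", balancer.pick())
```

Round-robin ignores how busy each server is, so it works best when the nodes are similar and requests cost roughly the same; policies that take current load into account trade that simplicity for better balance.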

LVS can act as a load balancer for cluster nodes directly connected to it on the same LAN, but it can also reach remote servers by tunneling IP packets. The latter method encapsulates the load-balanced requests in IP packets that are sent directly from the load-balancing server to the remote cluster node. Although LVS can thus load-balance a website remotely, the load-balancing algorithms it uses are currently ineffective for wide-area Web servers in a virtual cluster. LVS therefore works best as a load balancer when the Web servers are all on the same LAN.

Several hardware implementations of load-balancing systems run faster than a general-purpose operating system such as Linux. These include hardware from Alteon and Foundry, whose hardware logic and minimal operating system perform traffic management in hardware, faster than pure software can. They are also expensive, usually upward of $10,000. If you need a simple, inexpensive solution, a mid-range Linux system with plenty of memory (256 MB) makes a good load-balancing system.

TurboLinux TurboCluster and enFuzion
TurboLinux has a product called TurboCluster that was originally based on the kernel patches developed by the Linux Virtual Server project. It therefore gains most of that project's advantages, but also shares its drawbacks. TurboLinux has also developed tools for monitoring cluster behavior that increase the product's usefulness. Commercial support from a major vendor also makes it more attractive to large websites.

EnFuzion is a scientific cluster product that TurboLinux is about to launch. It is not based on Beowulf, but it can support hundreds of nodes and many non-Linux platforms, including Solaris, Windows NT, HP-UX, IBM AIX, SGI Irix, and Tru64. EnFuzion is interesting because it runs all existing software and does not require custom parallel applications to be written for the environment. It supports automatic load balancing and resource sharing between nodes, and can automatically reschedule failed jobs.

Platform Computing LSF Batch
A veteran of the cluster computing field, Platform Computing now offers its LSF Batch software on the Linux platform. LSF Batch lets a central controller schedule jobs to run on any number of nodes in the cluster. Conceptually, it is similar to TurboLinux's EnFuzion and supports running any type of application on the nodes.

This approach is very flexible with respect to cluster size, because you can explicitly choose the number of nodes, or even the specific nodes, that run an application. A 64-node cluster can thus be divided into smaller logical clusters, each running its own batch application. In addition, if an application or node fails, the job can be rescheduled on another server.
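The two ideas in this paragraph, carving a cluster into logical clusters and rescheduling a failed job elsewhere, can be sketched as follows. This is a toy model under stated assumptions, not LSF's scheduler or API: the function names are hypothetical, and `run_job` is an assumed callback that returns True when a job succeeds on a node.

```python
def split_into_logical_clusters(nodes, size):
    """Divide a list of nodes into consecutive logical clusters of `size` nodes."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


def run_batch(jobs, nodes, run_job):
    """Run each job on some node, retrying on the next node after a failure.

    run_job(job, node) is a hypothetical callback returning True on success.
    Returns a dict mapping each job to the node that completed it.
    """
    placement = {}
    for index, job in enumerate(jobs):
        # Start each job on a different node, then walk the ring on failure.
        for attempt in range(len(nodes)):
            node = nodes[(index + attempt) % len(nodes)]
            if run_job(job, node):
                placement[job] = node
                break
        else:
            raise RuntimeError(f"job {job!r} failed on every node")
    return placement
```

For example, `split_into_logical_clusters` turns 64 nodes into four 16-node groups, and `run_batch` can then be called once per group so that each logical cluster handles its own batch independently.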

Platform's products run on the major Unix systems and Windows NT. At present, only the LSF Batch product has been ported to Linux. The rest of the LSF Suite components are expected to follow onto Linux eventually.

Resonate Dispatch Series
Resonate offers a software-based load-balancing approach similar to Linux Virtual Server, but with more features and better load-balancing algorithms. For example, with Resonate you can install an agent on each cluster node to determine that node's current system load. The load-balancing server then checks each node's agent to determine which node is least loaded and sends new traffic to it. Resonate can also support geographically distributed servers more effectively with its Global Dispatch product.
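The agent-based scheme described above reduces to a simple selection step: ask every node's agent for its load and send the next request to the lowest. The sketch below shows only that step and is not Resonate's implementation; each agent is modeled as a hypothetical zero-argument callable returning a load figure, and the node names are made up.

```python
def pick_least_loaded(agents):
    """Poll every agent and return the name of the node reporting the lowest load.

    `agents` maps a node name to a zero-argument callable that returns that
    node's current load (an assumed stand-in for a real per-node agent).
    """
    if not agents:
        raise ValueError("no nodes to choose from")
    loads = {node: report() for node, report in agents.items()}
    return min(loads, key=loads.get)


if __name__ == "__main__":
    agents = {"web1": lambda: 0.9, "web2": lambda: 0.2, "web3": lambda: 0.5}
    print(pick_least_loaded(agents))
```

Compared with a fixed rotation, polling costs one query per node per decision, but it lets the dispatcher route around a node that is slow or overloaded.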

Resonate has thoroughly tested the software on Red Hat Linux and believes it should also run on other distributions. Resonate's software also runs on a variety of other platforms, including Solaris, AIX, and Windows NT, and can load-balance in mixed environments.

MOSIX
MOSIX uses a new version of the Linux kernel to implement a process load-balancing cluster system. In this cluster, any server or workstation can join or leave as it chooses, adding to or subtracting from the cluster's total processing power. According to its documentation, MOSIX uses adaptive process load-balancing and memory ushering algorithms to maximize overall performance. Application processes can be preemptively migrated between nodes to take advantage of the best resources, much as a symmetric multiprocessor system can switch applications between processors.

MOSIX is completely transparent at the application level, and applications do not need to be recompiled or relinked against new libraries, because everything happens at the kernel level. It can be configured as a multi-user shared-environment cluster in several ways: all servers can share a single pool, systems can be part of the cluster, or the cluster can be dynamically divided into several sub-clusters, each arrangement serving a different purpose. Linux workstations can also be part of the cluster, either permanently or temporarily, or act only as batch job submitters. As a temporary cluster node, a workstation can add to the cluster's processing power while it is idle. A cluster can also be used in batch mode only, in which it is configured to accept batch jobs through a queue. A daemon then takes jobs off the queue and sends them to cluster nodes for processing.

Beyond high-performance scientific computing, MOSIX offers an interesting option for creating a clustered environment in a general-purpose way. By using idle resources on servers and workstations, it allows applications to be built and run faster and more efficiently. Because multiple servers are available, the cluster size can be adjusted dynamically, and the load-balancing rules can be changed, it can also provide high server availability. The disadvantage of MOSIX is that it changes some core parts of the Linux kernel's behavior, so system-level applications may not run as expected. MOSIX is also usually limited when it comes to network applications that use socket connections based on a single server address. This means that once a network application starts running on a server node with an IP address bound to its socket, it must continue to run on that node. Apparently, MOSIX is beginning to work on migrating sockets as well, so this may soon become a moot point.

High Availability Cluster
High-availability (HA) clusters are dedicated to keeping server systems running and responsive as much of the time as possible. They typically use redundant nodes and services running on multiple machines that monitor one another. If a node fails, its replacement takes over its responsibilities within seconds or less. To the user, therefore, the cluster never goes down.

Some HA clusters can also maintain redundant applications across nodes, so a user's application keeps running even if the node he or she is using fails. The running application is migrated to another node within seconds, and all the user notices is a slightly slower response. Such application-level redundancy, however, requires software that is designed to be cluster-aware and knows what to do when a node fails. On Linux, most software cannot do this today, because there is no HA cluster standard for Linux and no common API that developers can use to build cluster-aware software.

HA clusters can perform load balancing, but typically the primary server runs the jobs while the secondary server is kept idle. The secondary server is usually a mirror of the primary server's operating system setup, even if the hardware itself differs slightly. The secondary node monitors the primary server's activity, or heartbeat, to check that it is still running. If the heartbeat timer expires without a response from the primary server, the secondary node takes over its network and system identity (on a Linux system, its IP hostname and address).
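The heartbeat logic on the secondary node can be sketched as a small state machine. This is a simplified illustration, not code from any HA package: the class name, the explicit `now` timestamps (passed in so the behavior is deterministic), and the `on_failover` callback that would claim the primary's identity are all assumptions.

```python
class HeartbeatMonitor:
    """Secondary-side watchdog: take over if the primary stays silent too long."""

    def __init__(self, timeout_s, on_failover):
        self.timeout_s = timeout_s
        self.on_failover = on_failover  # hypothetical hook, e.g. claim the IP address
        self.last_beat = None           # time of the most recent heartbeat
        self.active = False             # True once this node has taken over

    def beat(self, now):
        """Record a heartbeat received from the primary at time `now`."""
        self.last_beat = now

    def check(self, now):
        """Fail over once if the timeout has elapsed; return True if active."""
        if self.active or self.last_beat is None:
            # Already taken over, or never heard from the primary at all.
            return self.active
        if now - self.last_beat > self.timeout_s:
            self.active = True
            self.on_failover()
        return self.active
```

The timeout is the main tuning knob: a short value shortens the outage after a real failure, while a long value avoids a false takeover when heartbeats are merely delayed on a busy network.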

Linux, however, is still somewhat neglected in this area. The good news is that a well-known vendor is working hard to develop high-availability clustering as soon as possible, since it is a required feature for enterprise-class servers.

Linux-HA Project
The High-Availability Linux project aims, according to its goal statement, to provide a high-availability solution for Linux that promotes reliability, availability, and serviceability through a community development effort. It is an attempt to give Linux the same competitive features as leading Unix systems such as Solaris, AIX, and HP/UX when it comes to high-availability clustering. The goal of the project is therefore to reach, by 2001, the specific level of functionality described in the D. H. Brown analyst group's Unix cluster comparison report.

The project includes software that maintains a heartbeat between nodes and takes over the IP address of a failed node. If a node fails, it uses the "Fake Redundant IP" package to add the failed node's address to a working node, which then assumes its responsibilities. The failed node can thus be replaced automatically within milliseconds. In practice, the heartbeat interval is usually in the range of seconds unless there is a dedicated network connection between the nodes. User applications from the failed system still need to be restarted on the new node.

Ubiquitous Clusters
Many cluster systems are available for Linux. At the same time, several of these projects are non-commercial or even experimental. While this poses no problem for academia and some organizations, large companies usually prefer commercially supported platforms from well-known vendors. Vendors such as IBM, SGI, HP, and Sun offer products and services for building scientific clusters on Linux because clusters are popular and can sell large numbers of servers. Once commercial organizations come to regard other forms of clustering as reliable, those same server vendors may build their own products around open source cluster solutions.

Linux's importance as a server platform depends on its ability to support large servers and clusters of servers. This lets it compete at a higher level with the UNIX servers of Sun, HP, IBM, and others. Although Windows NT and 2000 do not support the range of clustering that Linux does, the availability of a formal method for HA clustering and of an API for building cluster-aware applications allows them to compete as well.

If you are considering building a cluster, examine the possibilities carefully and compare them with your needs. You may find that what you want to achieve is not yet available as a complete solution, or you may find a ready-made one. Either way, rest assured that many existing companies entrust their applications to clusters of Linux systems that perform deep computation and serve large numbers of Web pages. Clustering is an enterprise system service that has been successfully tested on Linux. Although new forms of clustering will emerge, diversity of choice is Linux's advantage over other systems such as Windows NT.
