1. Cluster
1.1 What is a cluster
Simply put, a cluster is a group of computers that provides users with a set of network resources as a single whole. The individual computer systems are the nodes of the cluster. In an ideal cluster, users never notice the nodes underlying the cluster system; to them, the cluster is one system, not multiple computer systems. And the administrators of the cluster system can add and remove nodes at will.
1.2 Why do I need a cluster
Clusters are not a new concept; in fact, as early as the 1970s computer manufacturers and research institutions began researching and developing cluster systems. Those systems were not well known because they were mainly used for scientific and engineering computation. It was not until the advent of Linux clusters that the concept of clustering became widely known.
Research on clusters stems from the good scalability of cluster systems. Increasing CPU frequency and bus bandwidth were the first means of improving computer performance, but this approach yields limited gains. People then increased the number of CPUs and the memory capacity, leading to vector machines, symmetric multiprocessors (SMP), and so on. But once the number of CPUs exceeds a certain threshold, the scalability of architectures like SMP becomes extremely poor. The main bottleneck is that the bandwidth of CPU access to memory does not grow as the number of CPUs increases. In contrast, the performance of a cluster system scales almost linearly as CPUs are added. Figure 1 illustrates this.
Figure 1. Scalability of several computer systems
Scalability is not the only advantage of cluster systems. Their main benefits are listed below:
High scalability: as described above.
High availability: if a node in the cluster fails, its tasks can be passed to other nodes, effectively preventing a single point of failure.
High performance: load-balancing clusters allow the system to serve more users at the same time.
High cost performance: A high-performance system can be constructed using inexpensive hardware that conforms to industry standards.
1.2.1 Classification of cluster systems
Although cluster systems can be classified in many ways according to their different characteristics, we generally divide them into two categories:
High-availability clusters, referred to as HA clusters. Such clusters are dedicated to providing highly reliable services.
High-performance computing clusters, referred to as HPC clusters. Such clusters are dedicated to providing the powerful computing power that a single computer cannot provide.
2. Highly available clusters
2.1 What is high availability
The availability of a computer system is measured by its reliability and maintainability. Mean time to failure (MTTF) is usually used to measure reliability, and mean time to repair (MTTR) to measure maintainability. Availability is then defined as:
MTTF / (MTTF + MTTR) × 100%
The industry divides computer systems into the following categories based on availability:
Table 1. System Availability Classification
Availability (%)   Annual downtime   Classification
99.5               3.7 days          Conventional system
99.9               8.8 hours         Available system
99.99              52.6 minutes      Highly available system
99.999             5.3 minutes       Fault-resilient system
99.9999            32 seconds        Fault-tolerant system
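The availability formula above, and the annual-downtime figures in Table 1, can be checked with a short calculation (the MTTF/MTTR values below are illustrative, not from the article):

```python
# Availability = MTTF / (MTTF + MTTR), expressed as a fraction.
def availability(mttf_hours, mttr_hours):
    return mttf_hours / (mttf_hours + mttr_hours)

# Minutes of downtime per year implied by a given availability.
def annual_downtime_minutes(avail):
    return (1 - avail) * 365 * 24 * 60

# Illustrative numbers: 5000 hours between failures, 0.5 hours to repair.
a = availability(5000, 0.5)
print(f"availability = {a * 100:.4f}%")                           # ~99.99%
print(f"annual downtime = {annual_downtime_minutes(a):.1f} min")  # ~52.6 min, matching Table 1
```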
Downtime is usually catastrophic for critical business, and the losses it causes are also huge. The following statistics illustrate the losses caused by application system downtime for different types of enterprises.
Table 2. Loss caused by downtime to business
Application system                          Loss per minute (USD)
Call center                                 27,000
Enterprise Resource Planning (ERP) system   13,000
Supply Chain Management (SCM) system        11,000
E-commerce system                           10,000
Customer service center system              27,000
As enterprises depend more and more on information technology, the losses caused by system downtime grow ever greater.
2.2 Highly available clusters
High-availability clustering uses cluster technology to achieve high availability of computer systems. High-availability clusters typically work in one of two ways:
Fault-tolerant systems: usually a master-slave (primary-standby) arrangement. The standby server monitors the state of the primary server and provides no service while the primary is working properly. Once the primary server fails, the standby takes over and serves clients in its place.
Load-balancing systems: all nodes in the cluster are active and share the system's workload. Common web server clusters, database clusters, and application server clusters all belong to this type.
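The master-slave mode can be sketched roughly as follows (the TCP probe, port, and takeover hook are illustrative assumptions, not the mechanism of any particular HA product):

```python
import socket

def primary_alive(host, port, timeout=1.0):
    # The standby probes a TCP port on the primary server; a failed
    # connection attempt is treated as a primary failure.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_and_failover(host, port, start_service):
    # If the primary is down, the standby takes over by invoking the
    # caller-supplied start_service hook; returns True on takeover.
    if not primary_alive(host, port):
        start_service()
        return True
    return False
```

A real fault-tolerant setup would run such a check periodically (a heartbeat) and also guard against split-brain, which this sketch ignores.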
There is already a great deal of discussion of highly available clusters elsewhere, so they are not elaborated further here.
3. High Performance Computing Cluster
3.1 What is a high-performance computing cluster
In short, high-performance computing (HPC) is a branch of computer science dedicated to developing supercomputers, researching parallel algorithms, and developing related software. High-performance computing focuses on two types of problems:
Large-scale scientific problems, such as weather forecasting, terrain analysis, and biopharmaceuticals;
Storing and processing massive data, such as data mining, image processing, and gene sequencing.
As the name suggests, a high-performance computing cluster uses cluster technology for high-performance computing.
3.2 High Performance Computing classification
There are many ways to classify high-performance computing. This article classifies it from the perspective of how its parallel tasks relate to each other.
3.2.1 High-throughput computing
One class of high-performance computing applications can be divided into several subtasks that run in parallel, with no dependency between the subtasks. SETI@home (Search for Extraterrestrial Intelligence at Home) is this type of application. The project uses idle computing resources on the Internet to search for extraterrestrial intelligence. The SETI server sends a set of data and a data pattern to SETI compute nodes on the Internet; each compute node searches the given data for the given pattern and sends the search results back to the server, which is responsible for aggregating the data returned from all compute nodes. Because a common feature of this type of application is searching massive amounts of data for certain patterns, it is referred to as high-throughput computing. So-called Internet computing falls into this category. According to Flynn's taxonomy, high-throughput computing belongs to the SIMD (Single Instruction/Multiple Data) category.
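The SETI@home workflow described above — independent chunks of data searched for a pattern, with the results aggregated by a server — can be sketched with Python's multiprocessing module (the data, pattern, and worker count are made up for illustration, and matches that straddle a chunk boundary are ignored for simplicity):

```python
from multiprocessing import Pool

def search_chunk(args):
    # Each "compute node" independently counts pattern occurrences
    # in its own chunk; no communication between workers is needed.
    chunk, pattern = args
    return chunk.count(pattern)

if __name__ == "__main__":
    data = "abxabyabzab" * 1000   # stand-in for the massive data set
    pattern = "ab"
    n_workers = 4
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partial = pool.map(search_chunk, [(c, pattern) for c in chunks])
    # The "server" aggregates the partial results from all nodes.
    print(sum(partial))
```

Because the subtasks share nothing, adding workers scales the search almost linearly, which is exactly the property that makes this class of problem suit a cluster of loosely connected machines.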
3.2.2 Distributed computing
Another class of computing is just the opposite of high-throughput computing: although it can also be divided into several parallel subtasks, the subtasks are closely related and require large amounts of data exchange. According to Flynn's taxonomy, this kind of distributed high-performance computing belongs to the MIMD (Multiple Instruction/Multiple Data) category.
3.3 Linux High Performance cluster system
When it comes to Linux high-performance clusters, many people's first reaction is Beowulf. Originally, Beowulf was just one famous scientific-computing cluster system, but many later clusters adopted Beowulf-like architectures, so Beowulf has in fact become a widely accepted type of high-performance cluster. Despite their different names, many cluster systems are derivatives of the Beowulf cluster. Of course, there are also cluster systems different from Beowulf: COW and MOSIX are two other well-known kinds.
3.3.1 Beowulf Cluster
Simply put, Beowulf is an architecture that allows multiple computers to be used for parallel computing. A Beowulf system typically consists of multiple compute nodes and a management node connected by Ethernet or another network. The management node controls the entire cluster system while providing file services and external network connectivity for the compute nodes. It uses common hardware devices such as ordinary PCs, Ethernet cards, and hubs, and rarely uses specially tailored hardware or special equipment. Beowulf cluster software is likewise commonplace: Linux, PVM, and MPI.
The next few sections of this article will detail the hardware, network, software, and application architecture of the Beowulf cluster system.
3.3.2 Beowulf cluster and cow cluster
Like Beowulf, a COW (Cluster of Workstations) is also built from the most common hardware devices and software systems. It usually consists of a control node and multiple compute nodes. The main differences between COW and Beowulf are:
COW compute nodes are mainly idle computing resources, such as desktop workstations in an office; they are ordinary PCs connected by a common LAN. Because these nodes are used as workstations during the day, the main cluster computation happens at night, on weekends, and at other idle times. The compute nodes in a Beowulf cluster, by contrast, are dedicated to parallel computing and are performance-optimized. They use message passing (PVM or MPI) over high-speed networks (Myrinet or Giganet) for interprocess communication (IPC).
Because the compute nodes in a COW are primarily intended for desktop applications, they all have peripherals such as monitors, keyboards, and mice. Beowulf compute nodes usually lack these peripherals; they are accessed from the management node via the network or a serial line.
Because the network connecting the compute nodes in a COW is typically a common LAN, high-performance applications on a COW are usually high-throughput computations such as SETI@home. Beowulf, however, is specially optimized in hardware, network, and software for MIMD applications that require frequent data exchange.
3.3.3 Mosix Cluster
In fact, it is somewhat far-fetched to count MOSIX as a high-performance cluster, but compared with clusters such as Beowulf, MOSIX is indeed a very special kind of cluster, dedicated to implementing a single system image (SSI, Single System Image) on Linux. A MOSIX cluster connects computers running Linux on a network into one cluster system, and the system automatically balances load between the nodes. Because MOSIX is implemented inside the Linux kernel, user-space applications can run on a MOSIX cluster without any modification. Users will typically notice little difference between Linux and MOSIX; to them, the MOSIX cluster is one PC running Linux. Although it still has many problems, MOSIX remains a compelling cluster system.
Excerpt from: http://hi.baidu.com/movieyouth/blog/item/bbe6d658bad56385800a18b6.html