1. Cluster
1.1 What is a cluster
Simply put, a cluster is a group of computers that together provide users with a set of network resources as a single system. The individual computers are the nodes of the cluster. An ideal cluster is one whose users never need to be aware of the underlying nodes: to them, the cluster is one system, not multiple computers. Administrators, meanwhile, can add and remove nodes at will.
1.2 Why clustering is required
Clustering is not a new concept; computer manufacturers and research institutions began developing cluster systems as early as the 1970s. Those systems were not widely known because they were used primarily for scientific computing. Only with the advent of Linux clusters did the concept of clustering become widely known.
Research on clusters grew out of the good performance scalability of cluster systems. Initially, raising CPU frequency and bus bandwidth was the primary means of improving computer performance, but this approach yields limited gains. The next step was to increase the number of CPUs and the memory capacity, which led to vector machines, symmetric multiprocessors (SMP), and so on. However, once the number of CPUs exceeds a certain threshold, the scalability of multiprocessor systems such as SMP becomes very poor. The main bottleneck is that the memory bandwidth available to the CPUs does not grow as CPUs are added. In contrast, the performance of a cluster scales almost linearly with the number of CPUs. Figure 1 illustrates this.
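The SMP saturation effect can be illustrated with a toy model (invented for illustration, not from the article): assume each CPU needs a fixed amount of memory bandwidth and the shared bus supplies a fixed total, while each cluster node brings its own memory.

```python
def smp_throughput(n_cpus, per_cpu_demand=1.0, bus_bandwidth=8.0):
    """Toy model: an SMP scales only until the shared memory bus saturates."""
    return min(n_cpus * per_cpu_demand, bus_bandwidth)

def cluster_throughput(n_nodes, per_node=1.0):
    """Toy model: each cluster node has its own memory, so aggregate
    throughput grows (almost) linearly with the number of nodes."""
    return n_nodes * per_node

# Past 8 CPUs, the toy SMP flatlines while the cluster keeps scaling.
for n in (4, 8, 16, 32):
    print(n, smp_throughput(n), cluster_throughput(n))
```

The specific bandwidth numbers are arbitrary; only the shape of the two curves matters, matching the qualitative comparison in Figure 1.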
Figure 1. Scalability of several computer systems
The strengths of cluster systems do not end there. The key strengths are:
High scalability: as described above.
High availability: when one node in a cluster fails, its tasks are taken over by other nodes, effectively preventing single points of failure.
High performance: a load-balancing cluster allows the system to serve many users simultaneously.
Cost-effectiveness: high-performance systems can be built from inexpensive, industry-standard hardware.
1.2.1 Classification of cluster systems
Although cluster systems can be classified in many ways according to their characteristics, they are generally divided into two categories:
High-availability (High Availability) clusters, or HA clusters for short. These clusters are dedicated to providing highly reliable services.
High-performance computing (High Performance Computing) clusters, or HPC clusters for short. These clusters are dedicated to providing the computing power that a single computer cannot.
2. High Availability Cluster
2.1 What is high availability
The availability of a computer system is measured by its reliability and maintainability. In engineering practice, reliability is usually measured by the mean time to failure (MTTF) and maintainability by the mean time to repair (MTTR). Availability is then defined as:
MTTF / (MTTF + MTTR) × 100%
Depending on availability, the industry classifies computer systems as follows:
Table 1. Classification of system availability
Availability (%)   Annual downtime   Classification
99.5               3.7 days          Conventional system (Conventional)
99.9               8.8 hours         Available system (Available)
99.99              52.6 minutes      Highly available system (Highly Available)
99.999             5.3 minutes       Fault-resilient system (Fault Resilient)
99.9999            32 seconds        Fault-tolerant system (Fault Tolerant)
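The annual downtime figures in Table 1 follow directly from the availability formula. A small sketch (the example MTTF/MTTR figures are illustrative, not from the article):

```python
# Availability = MTTF / (MTTF + MTTR), expressed as a percentage.
def availability(mttf_hours: float, mttr_hours: float) -> float:
    return mttf_hours / (mttf_hours + mttr_hours) * 100

# Annual downtime implied by a given availability percentage.
def annual_downtime_hours(availability_pct: float) -> float:
    return (1 - availability_pct / 100) * 365 * 24

# Example: a system that fails once every 999 hours on average and takes
# 1 hour to repair is 99.9% available ("three nines"),
# i.e. roughly 8.8 hours of downtime per year.
a = availability(999.0, 1.0)
print(a, annual_downtime_hours(a))
```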
For critical business systems, downtime is generally catastrophic, and the resulting losses are enormous. The following statistics list the losses caused by downtime for different kinds of enterprise application systems.
Table 2. The loss caused by the downtime to the enterprise
Application system                          Loss per minute (USD)
Call center                                 27,000
Enterprise Resource Planning (ERP) system   13,000
Supply Chain Management (SCM) system        11,000
E-commerce system                           10,000
Customer service center system              27,000
As businesses become more reliant on information technology, the loss of system downtime increases.
2.2 High-availability clusters
A high-availability cluster uses clustering technology to achieve high availability for a computer system. Highly available clusters typically work in one of two ways:
Fault-tolerant systems: usually a master/standby server arrangement. The standby server monitors the state of the primary server and provides no service while the primary is working correctly; once the primary fails, the standby takes over and serves clients in its place.
Load-balancing systems: all nodes in the cluster are active and share the system's workload. Web server clusters, database clusters, and application server clusters are generally of this type.
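The master/standby failover idea can be sketched as a simple heartbeat watchdog. This is an illustrative toy, not any particular HA product; the class name and timeout are invented:

```python
import time

class StandbyServer:
    """Toy standby node: watches heartbeats from the primary server and
    takes over the service once the primary is considered dead."""

    def __init__(self, timeout: float = 3.0):
        self.timeout = timeout            # seconds without a heartbeat = failure
        self.last_heartbeat = time.monotonic()
        self.active = False               # standby serves nothing while primary is up

    def on_heartbeat(self):
        """Called whenever a heartbeat arrives from the primary."""
        self.last_heartbeat = time.monotonic()

    def check(self, now: float = None):
        """Failover decision: become active if the primary has gone silent."""
        now = time.monotonic() if now is None else now
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.active = True            # failover: standby becomes the server
        return self.active

standby = StandbyServer(timeout=3.0)
standby.on_heartbeat()
print(standby.check())                             # primary still alive: False
print(standby.check(standby.last_heartbeat + 5))   # heartbeats missed: True
```

Real HA stacks add fencing and split-brain protection on top of this basic loop; the sketch only shows the detection-and-takeover state change.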
There is already a great deal of discussion of highly available clusters elsewhere; this article does not cover them in depth.
3. High-Performance Computing Cluster
3.1 What is a high-performance computing cluster
Simply put, high-performance computing (High-Performance Computing) is a branch of computer science dedicated to developing supercomputers, researching parallel algorithms, and developing related software. High-performance computing mainly addresses two kinds of problems:
Large-scale scientific problems, such as weather forecasting, terrain analysis, and pharmaceutical research;
Storing and processing massive amounts of data, such as data mining, image processing, and gene sequencing.
As the name implies, a high-performance computing cluster uses clustering technology for high-performance computing.
3.2 High Performance Computing classification
There are many ways to classify high-performance computing. Here we classify it by how the parallel sub-tasks relate to each other.
3.2.1 High-Throughput Computing
One class of high-performance computation can be divided into parallel sub-tasks that have no dependence on one another. SETI@home (Search for Extraterrestrial Intelligence at home) is this kind of application: the project uses idle computing resources on the Internet to search for extraterrestrial intelligence. The SETI@home server sends a set of data and a data pattern to each participating compute node on the Internet; the node searches the given data for the given pattern and sends its results back to the server, which pools the results returned by all compute nodes. Because a common feature of such applications is searching massive amounts of data for certain patterns, this kind of computation is called high-throughput computing. So-called Internet computing also belongs to this category. Under Flynn's taxonomy, high-throughput computing falls into the SIMD (Single Instruction/Multiple Data) category.
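The scatter/search/gather cycle described above can be mimicked on a single machine. In this sketch, each worker thread plays the role of a SETI@home compute node scanning its own chunk of data for a pattern (the data, pattern, and chunk size are invented; `multiprocessing.dummy` uses threads as stand-ins for networked nodes):

```python
from multiprocessing.dummy import Pool  # thread pool standing in for remote nodes

def search_chunk(args):
    """A 'compute node': scan one chunk of data for the given pattern
    and report the matching offsets back to the 'server'."""
    offset, chunk, pattern = args
    return [offset + i for i in range(len(chunk) - len(pattern) + 1)
            if chunk[i:i + len(pattern)] == pattern]

data = "abxxabyyab" * 3
pattern = "ab"
chunk_size = 10
# Server side: split the data into independent work units.
# (No overlap handling at chunk boundaries, to keep the sketch short.)
work = [(i, data[i:i + chunk_size], pattern)
        for i in range(0, len(data), chunk_size)]

with Pool(4) as pool:
    results = pool.map(search_chunk, work)                       # scatter + compute
matches = sorted(m for chunk_hits in results for m in chunk_hits)  # gather
print(matches)
```

The key property is that the work units never talk to each other, which is exactly why this style of computation tolerates slow, unreliable links such as the Internet.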
3.2.2 Distributed Computing
Another class of computation, in contrast to high-throughput computing, can also be divided into several parallel sub-tasks, but the sub-tasks are tightly coupled and require a great deal of data exchange. Under Flynn's taxonomy, this kind of distributed high-performance computing falls into the MIMD (Multiple Instruction/Multiple Data) category.
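What distinguishes this class is the constant data exchange between sub-tasks. A classic example is a stencil computation split across nodes, where each node must exchange boundary ("halo") values with its neighbor on every iteration. A single-process sketch (on a real cluster the halo exchange would happen over MPI or PVM; the array and two-way split are invented):

```python
# One smoothing step of a 1-D stencil, split across two "nodes".
# Each node owns half the array plus a one-cell halo from its neighbour.
def smooth_step(left, right):
    # Halo exchange: this is the per-iteration communication a real
    # MIMD cluster performs over the network.
    left_halo, right_halo = right[0], left[-1]
    # Left node averages each interior cell with its neighbours.
    new_left = [left[0]] + [(left[i - 1] + left[i] + left[i + 1]) / 3
                            for i in range(1, len(left) - 1)]
    new_left.append((left[-2] + left[-1] + left_halo) / 3)
    # Right node does the same, using the halo it received.
    new_right = [(right_halo + right[0] + right[1]) / 3]
    new_right += [(right[i - 1] + right[i] + right[i + 1]) / 3
                  for i in range(1, len(right) - 1)]
    new_right.append(right[-1])
    return new_left, new_right

left, right = [0.0, 0.0, 0.0, 9.0], [9.0, 9.0, 0.0, 0.0]
left, right = smooth_step(left, right)
print(left, right)
```

Because every iteration needs fresh halo values from the neighbor, network latency and bandwidth directly limit performance, which is why such applications favor fast interconnects rather than the Internet.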
3.3 Linux High-Performance Cluster Systems
When it comes to Linux high-performance clusters, many people's first reaction is Beowulf. At first, Beowulf was just a well-known scientific computing cluster. Later, many clusters adopted Beowulf-like architectures, so today Beowulf has in fact become a widely accepted class of high-performance cluster. Although the names vary, many cluster systems are derivatives of the Beowulf cluster. Of course, there are also cluster systems different from Beowulf; COW and MOSIX are two other famous ones.
3.3.1 Beowulf Cluster
Simply put, Beowulf is an architecture that allows multiple computers to be used for parallel computing. A Beowulf system typically consists of multiple compute nodes and a management node connected by Ethernet or another network. The management node controls the whole cluster and provides file services and the external network connection for the compute nodes. It uses common hardware such as ordinary PCs, Ethernet cards, and hubs; specially customized hardware and special equipment are rarely used. The software of a Beowulf cluster is equally ubiquitous, such as Linux, PVM, and MPI.
Later sections of this paper introduce the hardware, network, software, and application architecture of Beowulf cluster systems in detail.
3.3.2 Beowulf clusters and COW clusters
Like Beowulf, COW (Cluster Of Workstations) is also built from the most common hardware and software. It usually consists of a control node and multiple compute nodes. The main differences between COW and Beowulf are:
The compute nodes in a COW are mostly idle computing resources, such as desktop workstations in an office; they are ordinary PCs connected by an ordinary LAN. Because these compute nodes are used as workstations during the day, cluster computation basically happens in spare time such as nights and weekends. The compute nodes in a Beowulf cluster are dedicated to parallel computing, with performance optimized accordingly; they use message passing (PVM or MPI) over a fast network (Myrinet or Giganet) for interprocess communication (IPC).
Because compute nodes in a COW are primarily intended for desktop use, they all have peripherals such as displays, keyboards, and mice. Beowulf compute nodes usually have none of these; access to them is generally through the network or a serial line from the management node.
Because the compute nodes in a COW are connected by an ordinary local area network, high-performance applications on a COW are typically high-throughput computations such as SETI@home, i.e. SIMD applications. Beowulf, by contrast, is optimized in hardware, network, and software for MIMD applications that require frequent data exchange.
3.3.3 Mosix Cluster
Placing MOSIX among high-performance clusters is actually somewhat far-fetched, but compared with clusters such as Beowulf, MOSIX is indeed a very special cluster: it is dedicated to providing a single system image (SSI) for a cluster on top of Linux. MOSIX connects networked computers running Linux into a single cluster system, and the system automatically balances load among the nodes. Because MOSIX is implemented inside the Linux kernel, user-space applications can run on a MOSIX cluster without any modification. Ordinary users rarely notice the difference between Linux and MOSIX; to them, a MOSIX cluster is just a PC running Linux. Although MOSIX still has many problems today, it remains a compelling cluster system.
Excerpt from: http://hi.baidu.com/movieyouth/blog/item/bbe6d658bad56385800a18b6.html