Cluster principles

Source: Internet
Author: User
Tags: file system, time interval

I. What is a cluster

A cluster is a group of computers that together present users with a set of network resources as a single whole. The individual computer systems are the nodes of the cluster. In an ideal cluster, users are never aware of the underlying nodes: from their point of view, the cluster is one system, not many. Administrators of the cluster can add and remove nodes at will.

II. Advantages of a cluster

1. High scalability

2. High Availability (HA)

If one node in the cluster fails, its tasks can be passed to other nodes. This prevents a single point of failure.

3. High performance

A load balancing cluster allows the system to serve more users simultaneously.

4. High price/performance ratio

High-performance systems can be built from inexpensive, industry-standard hardware.

III. Classification of cluster systems

Cluster systems can be classified in various ways according to their characteristics, but they are generally divided into three categories:

1. High availability clusters, referred to as HA clusters.

Such clusters are dedicated to providing highly reliable services. They use the cluster's fault tolerance to provide uninterrupted 24/7 service for key applications such as high-availability file servers and database services.

2. Load balancing clusters

These clusters distribute tasks as evenly as possible across the computers in the cluster, making full use of the cluster's processing capacity and improving the efficiency with which tasks are processed. In real-world applications these cluster types may be combined to provide a more stable service, for example a highly available network file system and highly available network services running in a cluster that also load-balances network traffic.
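
The even distribution of tasks described above can be sketched as a greedy "least-loaded node first" dispatcher. This is only an illustration: the node names, task names and task costs are hypothetical, and real load balancers use many other policies (round robin, least connections, and so on).

```python
from collections import defaultdict

def distribute(tasks, nodes):
    """Assign each (name, cost) task to the node with the least total cost so far."""
    load = {node: 0 for node in nodes}
    assignment = defaultdict(list)
    for name, cost in tasks:
        # Pick the currently least-loaded node; ties go to the earlier node in the list.
        target = min(nodes, key=lambda n: load[n])
        assignment[target].append(name)
        load[target] += cost
    return dict(assignment), load

# Hypothetical tasks with made-up costs, spread over three hypothetical nodes.
tasks = [("t1", 4), ("t2", 2), ("t3", 3), ("t4", 1), ("t5", 2)]
assignment, load = distribute(tasks, ["node-a", "node-b", "node-c"])
print(assignment)
print(load)
```

The greedy rule does not guarantee a perfectly even split, but no node is left idle while another one accumulates work.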

3. High performance computing clusters, referred to as HPC clusters and also known as compute clusters.

These clusters run specially developed parallel applications that divide the data of a problem among multiple computers and use their combined resources to complete the task. This makes it possible to solve problems that a single machine cannot handle, for example because the problem is too large or a single machine would be too slow.

Such clusters are dedicated to providing computing power that a single computer cannot deliver, for workloads such as weather forecasting, oil exploration and reservoir simulation, molecular simulation, and bio-computing.
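
The "divide the data, compute the parts, combine the results" pattern behind such parallel applications can be sketched as follows. This is a toy illustration under clear assumptions: worker threads on one machine stand in for cluster nodes, and the sum-of-squares workload is made up. Real HPC applications typically run across machines using a message-passing framework.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Split data into `parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def partial_sum_of_squares(chunk):
    """The work one 'node' does on its share of the data."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    chunks = split(data, workers)
    # Each worker (standing in for a node) processes one chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Combine the partial results into the final answer.
    return sum(partials)

data = list(range(1, 11))
total = parallel_sum_of_squares(data, workers=4)
print(total)
```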

IV. What is high availability (HA)

The availability of a computer system is determined by its reliability and its maintainability. Reliability is usually measured by the mean time to failure (MTTF), and maintainability by the mean time to repair (MTTR). Availability is then defined as: MTTF / (MTTF + MTTR) * 100%
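
As a worked example of the availability formula, the sketch below computes availability from MTTF and MTTR and converts it into expected downtime per year. The input figures (999 hours between failures, 1 hour to repair) and the helper names are illustrative assumptions, not measurements.

```python
def availability(mttf_hours, mttr_hours):
    """Availability in percent: MTTF / (MTTF + MTTR) * 100."""
    return mttf_hours / (mttf_hours + mttr_hours) * 100.0

def downtime_hours_per_year(availability_percent, hours_per_year=8760):
    """Expected downtime per (non-leap) year for a given availability."""
    return (1 - availability_percent / 100.0) * hours_per_year

# Assumed figures: the system fails on average every 999 hours
# and takes on average 1 hour to repair.
a = availability(mttf_hours=999, mttr_hours=1)
d = downtime_hours_per_year(a)
print(f"availability: {a:.3f}%  downtime: {d:.2f} hours/year")
```

With these assumed inputs the availability is about 99.9%, which corresponds to roughly 8.76 hours of downtime per year. Shortening the repair time raises availability just as effectively as lengthening the time between failures.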

V. High availability of load balancing servers

To keep a failure of the load balancer from taking down the whole system, a backup machine is set up. A high-availability monitor runs on both the primary server and the backup machine, and the two monitor each other's health by exchanging messages such as "I am alive". When the backup machine receives no such message within a set period, it takes over the primary server's IP address and continues to provide the service. When the backup later receives an "I am alive" message from the primary again, it releases the IP address and the primary resumes cluster management. So that the system keeps working correctly after the primary fails, the load-balancing cluster's configuration is synchronized and backed up between the primary and the backup machine, which keeps the two systems broadly consistent.
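
The takeover and release behaviour described above can be sketched from the backup machine's side. This is a simulation under assumptions: time is passed in as plain numbers instead of using a real network or clock, the class and method names are hypothetical, and "holding the IP" is just a flag rather than an actual address change.

```python
class BackupMonitor:
    """Backup-side monitor: takes over the service IP when heartbeats stop."""

    def __init__(self, timeout):
        self.timeout = timeout        # seconds of silence tolerated (assumed value)
        self.last_heartbeat = 0.0
        self.holds_ip = False

    def on_heartbeat(self, now):
        """Called when an "I am alive" message arrives from the primary."""
        self.last_heartbeat = now
        if self.holds_ip:
            self.holds_ip = False     # primary is back: release the IP

    def tick(self, now):
        """Called periodically; takes over if the primary has been silent too long."""
        if not self.holds_ip and now - self.last_heartbeat > self.timeout:
            self.holds_ip = True      # take over the primary's IP
        return self.holds_ip

# Simulated timeline: the primary sends a heartbeat every second,
# except while it is down from t=3 to t=7.
monitor = BackupMonitor(timeout=3.0)
events = []
for now in range(12):
    primary_alive = not (3 <= now <= 7)
    if primary_alive:
        monitor.on_heartbeat(float(now))
    events.append((now, monitor.tick(float(now))))

takeover_times = [now for now, held in events if held]
print(takeover_times)
```

In this run the last heartbeat arrives at t=2, so the backup holds the IP only once more than 3 seconds of silence have passed, and gives it back as soon as the primary's heartbeat returns at t=8.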

VI. HA fault-tolerant redundancy process

1. Auto-detect stage

Software on each host uses the redundant detection lines and a listener program with fairly complex logic to check the other host's operation. The items checked are: host hardware (CPU and peripherals), host network, host operating system, database engine and other applications, and the connection between the host and the disk array. To keep detection accurate and avoid false positives, a safe detection time can be configured: the detection interval and the number of detections adjust the safety margin. The information collected over the hosts' redundant communication lines is recorded for maintenance reference.
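
One common way to use the "number of detections" setting is to declare a failure only after several consecutive failed rounds, so that a single glitch does not trigger a switch. The sketch below illustrates that idea. The class name, the check names and the scripted results are hypothetical, and a real detector would wait for the configured detection interval between rounds.

```python
class FailureDetector:
    """Declares the peer failed only after `max_failures` consecutive bad rounds."""

    def __init__(self, checks, max_failures):
        self.checks = checks              # name -> callable returning True if healthy
        self.max_failures = max_failures  # number of detections before declaring failure
        self.consecutive_failures = 0
        self.log = []                     # failed items per round, kept for maintenance

    def probe(self):
        """Run one detection round; return True once the peer is judged failed."""
        failed = [name for name, check in self.checks.items() if not check()]
        self.log.append(failed)
        if failed:
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0  # a healthy round resets the count
        return self.consecutive_failures >= self.max_failures

# Scripted network results standing in for real probes:
# one transient glitch, then a lasting outage.
network_results = iter([True, False, True, False, False, False])
checks = {
    "hardware": lambda: True,
    "network": lambda: next(network_results),
    "operating_system": lambda: True,
}
detector = FailureDetector(checks, max_failures=3)
verdicts = [detector.probe() for _ in range(6)]
print(verdicts)
print(detector.log)
```

The isolated failure in round 2 is recorded in the log but does not cause a failover. Only the three consecutive failures in rounds 4 to 6 do.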

2. Auto-switch stage

If one host confirms that the other has failed, the healthy host continues its original tasks and, according to the configured fault-tolerant redundancy mode, also takes over the predefined backup procedures and carries on the failed host's programs and services.

3. Auto-recovery stage

After the healthy host has taken over from the failed machine, the failed machine can be repaired offline. Once it is repaired, it reconnects to the host that took over through the redundant communication line, and services switch back automatically to the repaired host. The whole recovery process is completed automatically by EDI-HA. Depending on the pre-configured settings, recovery can also be made semi-automatic or disabled.

VII. Three working modes of HA

1. Master-Slave mode (asymmetric mode)

How it works: the primary host does the work while the standby host monitors it. When the primary goes down, the standby takes over all of its work. After the primary returns to normal, services are switched back to it automatically or manually, according to the user's settings. Data consistency is ensured by the shared storage system.
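
The switch and switch-back rules of this mode can be sketched as a small state machine. The class, its method names and the `auto_failback` setting are hypothetical stand-ins for "the user's settings"; shared storage is not modelled.

```python
class MasterSlavePair:
    """Tracks which host is serving in a master-slave (asymmetric) pair."""

    def __init__(self, auto_failback=True):
        self.auto_failback = auto_failback  # assumed user setting
        self.active = "master"

    def master_down(self):
        # The standby takes over all of the master's work.
        self.active = "slave"

    def master_recovered(self):
        # Switch back automatically only if the user configured it.
        if self.auto_failback:
            self.active = "master"

    def manual_switch_back(self):
        # Operator-initiated switch back to the master.
        self.active = "master"

auto = MasterSlavePair(auto_failback=True)
auto.master_down()
auto.master_recovered()

manual = MasterSlavePair(auto_failback=False)
manual.master_down()
manual.master_recovered()
still_on_slave = manual.active      # stays on the slave until an operator acts
manual.manual_switch_back()
print(auto.active, still_on_slave, manual.active)
```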

2. Dual-machine duplex mode (mutual backup)

How it works: two hosts each run their own services and monitor each other. When either host goes down, the other immediately takes over all of its work so that service continues in real time. The key data of the application service systems is stored in the shared storage system.
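
Mutual takeover between two hosts can be illustrated by computing where each service runs for a given set of healthy hosts. The host and service names are hypothetical, and the function is a sketch of the placement rule only, not of detection or data handling.

```python
def placement(services, up):
    """Return {host: [services]} for a two-host mutual-takeover pair.

    `services` maps each service to its home host; `up` is the set of healthy hosts.
    """
    hosts = sorted(set(services.values()))
    result = {h: [] for h in hosts if h in up}
    for service, home in services.items():
        if home in up:
            result[home].append(service)
        else:
            # Home host is down: the surviving peer takes the service over.
            peer = next((h for h in hosts if h != home and h in up), None)
            if peer is not None:
                result[peer].append(service)
    return result

services = {"web": "host-a", "db": "host-b"}
normal = placement(services, up={"host-a", "host-b"})
after_failure = placement(services, up={"host-b"})
print(normal)
print(after_failure)
```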

3. Cluster mode (multi-server mutual backup)

How it works: multiple hosts work together, each running one or more services. Each service is assigned one or more standby hosts, so when a host fails, the services running on it are taken over by other hosts.
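
With more than two hosts, each service needs an ordered list of hosts it may run on. A minimal sketch, assuming a hypothetical table in which the first entry is the service's usual host and the rest are its standby hosts in order of preference:

```python
def place_services(alternates, up):
    """Pick, for each service, the first healthy host in its candidate list.

    `alternates` maps service -> ordered hosts (usual host first, then standbys).
    A service with no healthy candidate is mapped to None.
    """
    return {
        service: next((h for h in candidates if h in up), None)
        for service, candidates in alternates.items()
    }

# Hypothetical services and hosts.
alternates = {
    "web":  ["host-1", "host-2", "host-3"],
    "db":   ["host-2", "host-3"],
    "mail": ["host-3", "host-1"],
}
all_up = place_services(alternates, up={"host-1", "host-2", "host-3"})
host2_down = place_services(alternates, up={"host-1", "host-3"})
print(all_up)
print(host2_down)
```

When host-2 fails in this example, only the service that was running there ("db") moves. The other services stay where they are.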
