The concept of cluster and high availability (HA)

Last Update:2017-10-20 Source: Internet

Author: User



Reposted from: http://www.cnblogs.com/blackwizard2016/p/5143816.html, for learning purposes only.

1.1 What is a Cluster

Put simply, a cluster is a group of computers that, as a whole, provide users with a set of network resources. Each individual computer system is a node of the cluster. In an ideal cluster, users are never aware of the underlying nodes: from their point of view the cluster is one system, not multiple computer systems. The administrator of the cluster system can also add or remove nodes at will.

In more detail, a cluster (a group of computers working collaboratively) is an important concept for making full use of computing resources, because it can migrate workloads from an overloaded system (or node) to another system in the cluster. Its processing capability is comparable to that of dedicated computers (minicomputers and mainframes), but its cost-effectiveness is higher. A cluster is commonly composed of hardware (nodes, network, storage) and software (the cluster system, the node operating systems, and application support software).

Cluster technology can be defined as follows: a group of independent servers appears as a single system on the network and is managed as a single system. This single system provides highly reliable services to client workstations. In most configurations, all the computers in the cluster share a common name, and services running on any system in the cluster can be used by all network clients. The cluster must be able to coordinate and manage the errors and failures of its separate components, and must allow components to be added transparently.

A cluster contains multiple (at least two) servers with shared data storage. When any server runs an application, the application data is stored in the shared data space, while the operating system and application files of each server are stored on its local storage. The node servers in the cluster communicate with each other over an internal LAN.

When a node server fails, the applications running on it are automatically taken over by another node server. When an application service fails, it is restarted or taken over by another server. In either case, clients can quickly connect to the new application service.

1.2 Main advantages of the cluster system:

(1) High scalability.

(2) High availability (HA): when a node in the cluster fails, its tasks can be passed to other nodes, which effectively prevents a single point of failure.

(3) High performance: a load-balanced cluster allows the system to serve more users at the same time.

(4) Cost-effectiveness: high-performance systems can be built from inexpensive hardware that meets industry standards.

2.1 Classification of Cluster Systems

Although cluster systems can be classified in several ways according to their different features, they are generally divided into two types:

(1) High availability cluster (HA cluster).

These clusters are dedicated to providing highly reliable services. They use the fault tolerance of the cluster system to provide uninterrupted service 7*24 hours, for example highly available file servers, database services, and other key applications.

Load balancing cluster: this type distributes tasks evenly among the computers in the cluster, making full use of the cluster's processing capacity to improve task processing efficiency.

In practical applications, these cluster types may be used together to provide more efficient and stable services. For example, a load balancing cluster for network traffic may also contain a highly available network file system and highly available network services.
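The even distribution of tasks described for load balancing clusters can be illustrated with a minimal round-robin sketch. Round-robin is only one possible policy and is an assumption here, since the article does not name a specific algorithm; the node names and the function `distribute_round_robin` are hypothetical.

```python
from collections import Counter
from itertools import cycle


def distribute_round_robin(tasks, nodes):
    """Assign each task to the next node in turn and return (task, node) pairs."""
    if not nodes:
        raise ValueError("at least one node is required")
    node_cycle = cycle(nodes)
    return [(task, next(node_cycle)) for task in tasks]


# Hypothetical node names, used only for illustration.
nodes = ["node-a", "node-b", "node-c"]
assignments = distribute_round_robin(range(9), nodes)

# Count how many tasks each node received.
load = Counter(node for _, node in assignments)
print(dict(load))  # {'node-a': 3, 'node-b': 3, 'node-c': 3}
```

A real load balancer would usually also take node health and current load into account, which this sketch leaves out.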

(2) High-performance computing (HPC) clusters, also known as scientific computing clusters.

This type of cluster runs specially developed parallel applications that distribute a problem's data across multiple computers and use the resources of all of them to complete the computation. This makes it possible to solve problems that a single machine cannot handle, for example because the problem is too large or a single machine is too slow.

These clusters are designed to provide computing power that a single computer cannot provide. Typical uses include weather forecasting, oil exploration and reservoir simulation, molecular simulation, and biological computing.
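The idea of splitting a problem's data across several workers and combining the partial results can be sketched as follows. This is only an illustration on one machine: threads stand in for cluster nodes, whereas a real HPC cluster would typically use message passing between machines. The functions `split`, `partial_sum_of_squares`, and `parallel_sum_of_squares` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done by one worker on its share of the data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Distribute chunks to workers, then combine the partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


data = list(range(1000))
total = parallel_sum_of_squares(data)
print(total)  # 332833500
```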

3.1 What is high availability (HA)

The availability of a computer system is determined by its reliability and its maintainability. Generally, MTTF (mean time to failure) is used to measure the reliability of the system, and MTTR (mean time to repair) is used to measure its maintainability. Availability is therefore defined as: MTTF / (MTTF + MTTR) * 100%
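As a worked example of the availability formula, the short sketch below computes the percentage for a given MTTF and MTTR. The figures (one failure every 1000 hours on average, one hour to repair) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Return availability as a percentage: MTTF / (MTTF + MTTR) * 100."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours) * 100.0


# Hypothetical figures: 1000 hours between failures, 1 hour to repair.
print(f"{availability(1000, 1):.3f}%")  # 99.900%
```

The formula shows that availability can be raised either by making failures rarer (larger MTTF) or by shortening repair and takeover time (smaller MTTR), and HA clusters mainly work on the latter.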

High availability for a load balancer requires a backup machine that guards against failure of the load balancer itself. The master server and the backup server each run a high availability monitoring program and monitor each other's status by exchanging messages such as "I am alive". When the backup server receives no such message within a certain period of time, it takes over the service IP address of the master server and continues to provide services. When the backup server again receives an "I am alive" message from the master server, it releases the service IP address, and the master server resumes managing the cluster. To ensure that the system keeps working properly when the master server fails, the configuration of the load balancing system can be synchronized and backed up between the master and backup servers so that the two systems stay essentially consistent.
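The heartbeat behaviour described above can be sketched from the backup server's point of view. This is a simplified model, not the code of any particular HA product: the class `BackupMonitor` and its methods are hypothetical, timestamps are passed in explicitly instead of being read from a clock, and "holding the service IP" is represented by a boolean flag.

```python
class BackupMonitor:
    """Backup server's view of the master, driven by explicit timestamps."""

    def __init__(self, timeout):
        self.timeout = timeout          # seconds of silence tolerated before takeover
        self.last_heartbeat = 0.0       # time of the last "I am alive" message
        self.holds_service_ip = False   # True while the backup serves the service IP

    def on_heartbeat(self, now):
        """The master said "I am alive": record it and release the service IP if held."""
        self.last_heartbeat = now
        self.holds_service_ip = False

    def tick(self, now):
        """Periodic check: take over the service IP once the master has been silent too long."""
        if not self.holds_service_ip and now - self.last_heartbeat > self.timeout:
            self.holds_service_ip = True
        return self.holds_service_ip


monitor = BackupMonitor(timeout=3.0)
monitor.on_heartbeat(now=1.0)
print(monitor.tick(now=2.0))   # False: the master is still considered alive
print(monitor.tick(now=5.0))   # True: 4 s of silence exceeds the 3 s timeout
monitor.on_heartbeat(now=6.0)  # the master is back, so the backup releases the IP
print(monitor.tick(now=6.5))   # False
```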

HA fault tolerance and backup operations

Auto-Detect phase: the software on each host uses redundant detection lines, together with a complex monitoring program and logical judgment, to detect the running status of the other host. The checked items include the host hardware (CPU and peripherals), the host network, the host operating system, the database engine and other applications, and the connection between the hosts and the disk array. To ensure correct detection and prevent misjudgment, you can configure a safe detection time, including the detection interval and the number of detection attempts, to adjust the safety margin. The redundant communication links between hosts also record the collected information for maintenance reference.
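The role of the detection interval and the number of detection attempts can be sketched as a detector that only declares a failure after several consecutive failed checks, so that one lost probe does not trigger a takeover. The class `FailureDetector`, the function `worst_case_detection_time`, and the sample values are hypothetical and not taken from any specific HA software.

```python
class FailureDetector:
    """Declare the peer failed only after `max_missed` consecutive failed checks."""

    def __init__(self, max_missed):
        self.max_missed = max_missed
        self.missed = 0

    def record(self, check_ok):
        """Record one probe result; return True once the peer is judged failed."""
        # A successful check resets the counter, a failed one increments it.
        self.missed = 0 if check_ok else self.missed + 1
        return self.missed >= self.max_missed


def worst_case_detection_time(interval_seconds, max_missed):
    """Rough time needed to confirm a failure: one interval per required miss."""
    return interval_seconds * max_missed


detector = FailureDetector(max_missed=3)
probes = [True, False, True, False, False, False]
verdicts = [detector.record(ok) for ok in probes]
print(verdicts)  # [False, False, False, False, False, True]
print(worst_case_detection_time(2, 3))  # 6
```

A longer interval or a higher attempt count reduces false alarms but delays the switchover, which is the trade-off the safety margin controls.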

Auto-Switch phase: if a host confirms that the other host has failed, the healthy host not only continues its original tasks but also takes over the preconfigured backup work, together with the subsequent procedures and services, according to the configured fault tolerance and backup mode.

Auto-Recovery phase: after the healthy host has taken over for the faulty host, the faulty host can be repaired offline. Once repaired, it reconnects to the healthy host over the redundant communication lines, and the work is automatically switched back to the repaired host. The entire recovery process is completed automatically by EDI-HA; depending on the pre-configuration, you can also choose semi-automatic recovery or no recovery at all.

3.2 Three working modes of HA:

(1) Master-slave mode (asymmetric mode)

Working principle: when the master host goes down, the slave host takes over all of its work. After the master host returns to normal, the service is switched back to it automatically or manually, according to user settings. Data consistency is ensured through a shared storage system.

(2) Dual-host mode (mutual backup)

Working principle: the two hosts run their respective services at the same time and monitor each other. When either host goes down, the other host immediately takes over all of its work so that service continues in real time. Key data of the application service system is stored in the shared storage system.

(3) Cluster mode (multi-server mutual backup)

Working principle: multiple hosts work together, each running one or more services, and one or more backup hosts are defined for each service. When a host fails, the services running on it can be taken over by other hosts.
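The cluster mode, in which each service has its own list of backup hosts, can be sketched as a simple placement table. The function `failover`, the host names, and the service names are hypothetical; choosing the first healthy backup in the list is an assumption made for this example, not a rule stated in the article.

```python
def failover(placement, backups, failed_host, healthy_hosts):
    """Return a new service -> host placement after `failed_host` goes down.

    Each affected service moves to the first healthy host in its backup list,
    or to None if no backup is currently available.
    """
    new_placement = dict(placement)
    for service, host in placement.items():
        if host != failed_host:
            continue  # services on other hosts are unaffected
        candidates = [h for h in backups.get(service, []) if h in healthy_hosts]
        new_placement[service] = candidates[0] if candidates else None
    return new_placement


# Hypothetical cluster: three hosts, three services, per-service backup lists.
placement = {"web": "host1", "db": "host2", "nfs": "host1"}
backups = {"web": ["host2", "host3"], "db": ["host3"], "nfs": ["host3", "host2"]}

after = failover(placement, backups, "host1", {"host2", "host3"})
print(after)  # {'web': 'host2', 'db': 'host2', 'nfs': 'host3'}
```

The master-slave and dual-host modes are special cases of the same table: with two hosts, every service simply lists the other host as its only backup.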


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
