Linux High Availability Cluster (HA) rationale (reproduced)

Source: Internet
Author: User
Tags: switches

I. What is a highly available cluster

In a highly available cluster, when one node or server fails, another node automatically and immediately takes over: the resources on the failed node are transferred to a surviving node, which then has everything it needs to keep providing the service. Because the cluster can automatically switch resources and services when a single node fails, the service stays online, and the whole process is transparent to the client.

II. Metrics for high-availability clusters

High availability is generally measured in terms of system reliability and system maintainability. Reliability is usually expressed as the mean time to failure (MTTF), and maintainability as the mean time to repair (MTTR). The availability of a cluster service can therefore be defined as: HA = MTTF / (MTTF + MTTR) × 100%.

Several availability levels are commonly quoted (a quick check of these figures follows the list):

99%: less than 4 days of downtime per year

99.9%: less than 10 hours of downtime per year

99.99%: less than 1 hour of downtime per year

99.999%: less than 6 minutes of downtime per year
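As a quick sanity check of these figures, the sketch below (plain Python, independent of any cluster software) converts each availability target into the downtime it allows per year.

```python
# Allowed downtime per year = (1 - availability) * one year.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

for availability in (0.99, 0.999, 0.9999, 0.99999):
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{availability:.3%} -> {downtime_hours:6.2f} h "
          f"({downtime_hours * 60:6.1f} min) of downtime per year")
```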

III. Three ways to build a high-availability cluster

There are three ways to implement a highly available cluster:

(1) Master/slave mode (asymmetric)

A highly available cluster built this way typically consists of two nodes running one or more services, with one node acting as the primary (active) node and the other as the backup (standby) node. The backup node continuously monitors the health of the primary node; when the primary fails, the services automatically switch to the backup so that they keep running normally.

In this arrangement the backup node normally does not run any services and only becomes useful when a failure occurs, so it can feel rather wasteful.

(2) Symmetric mode

This approach also typically consists of two nodes running one or more services. Each node runs different services and backs up the other, and the two nodes monitor each other's health, so that when one node fails its services automatically switch to the other node and continue to run correctly.

(3) Multi-machine mode

This kind of cluster contains multiple nodes and multiple services. Any given node may or may not be running a service at a given time, and each node monitors a specified set of services; when one node fails, its services automatically switch to another node in the designated set.

IV. Components of a highly available cluster

Implementing a highly available cluster requires the following components:

1. Messaging layer: the information layer. Its main job is to pass the current node's heartbeat information to the other nodes, so that each side knows whether the others are still online. If a node is found to be offline, resources can be transferred so that another node takes over as the primary node and provides the service as expected. Heartbeat information is usually carried over a dedicated heartbeat link, which can be a serial connection or an Ethernet connection. Every node runs an instance of the messaging layer.
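To make the heartbeat idea concrete, here is a minimal illustrative Python sketch of the two roles a messaging layer plays: announcing liveness and noticing when a peer goes silent. This is not the actual Heartbeat or Corosync wire protocol; the peer address, port, interval and dead time are made-up assumptions.

```python
import socket
import time

PEER = ("192.168.1.2", 6944)   # assumed address of the other node's heartbeat interface
PORT = 6944                    # assumed heartbeat port
INTERVAL = 1.0                 # send a heartbeat every second
DEADTIME = 5.0                 # declare the peer dead after 5 s of silence

def send_heartbeats(node_name: str) -> None:
    """Announce this node's liveness to the peer once per INTERVAL (runs forever)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(f"heartbeat {node_name} {time.time()}".encode(), PEER)
        time.sleep(INTERVAL)

def watch_peer() -> None:
    """Listen for the peer's heartbeats and report when they stop arriving."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    sock.settimeout(DEADTIME)
    while True:
        try:
            data, addr = sock.recvfrom(1024)
            print("peer alive:", data.decode(), "from", addr)
        except socket.timeout:
            print(f"no heartbeat for {DEADTIME} s -- peer presumed offline, start failover")
```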

The software that can provide this component is:

(1) Heartbeat

Heartbeat has three versions: Heartbeat v1, Heartbeat v2 and Heartbeat v3.

Heartbeat v1 is the older version; Heartbeat v2 is the current stable version and is the one used in the experiments.

(2) Corosync (an OpenAIS sub-project)

(3) Keepalived

(4) CMAN

Heartbeat is the most commonly used of these; Keepalived is relatively simple to configure; UltraMonkey does not seem to be in common use any more; and Corosync is more powerful and feature-rich than Heartbeat.
These will be used in the experiments that follow.

2. CRM (Cluster Resource Manager): this component provides high availability for services that are not highly available by themselves. It works on top of the messaging layer and relies on the health information that layer passes up to decide when to start, stop and transfer resources, as well as how resources are defined and allocated. Every node runs a CRM, and each CRM maintains a CIB (Cluster Information Base, the cluster's configuration repository); only the CIB on the master node can be modified, and the CIBs on the other nodes are copies of it. The CRM also contains sub-components such as the LRM and the DC.

The software that can provide CRM is:

Heartbeat v1 ships with its own resource manager, haresources.

Heartbeat v2 ships with two resource managers: haresources and CRM.

Because the CRM configuration file is in XML format, people unfamiliar with the syntax can easily make configuration mistakes. CRM therefore provides a listening port so that GUI tools can connect to it and manage the cluster.

From Heartbeat v3 onward, the resource manager was split out and is no longer part of Heartbeat itself. It is called Pacemaker; it is exceptionally powerful and also provides command-line tools for managing the cluster.

CMAN is a resource-management component developed by Red Hat that you may encounter in RHEL 5.x; from RHEL 6.x onward, Red Hat also adopted the more powerful Pacemaker.

3. LRM (Local Resource Manager): a sub-component of the CRM used to query the state of resources and manage resources on the local node. For example, when the relevant heartbeat information is received, it is the LRM that actually starts the local service.

4. DC (Designated Coordinator): this can be understood as the transaction coordinator. When nodes can no longer receive each other's heartbeat information, each side assumes the other has failed and the cluster splits into partitions; this is the split-brain condition. If every partition keeps its services running, resource contention follows, and the DC exists to prevent that. The DC decides which nodes start services and which nodes stop them based on each partition's quorum votes. For example, in a highly available cluster of three nodes, if two nodes can still exchange heartbeats while the third cannot, the three nodes split into two groups. Each group elects a DC, which collects the cluster's transaction information within its group, forms the CIB and synchronizes it to each node in the group. The DC also counts the group's quorum votes: if the group holds more than half of all configured votes, services are started on its nodes; otherwise the services on its nodes are stopped. More powerful nodes can be given more than one vote, so a node's vote count is not necessarily one and should be set according to server capacity. The DC normally resides on the primary node.
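The quorum rule just described can be illustrated with a small, self-contained Python sketch; the node names and vote counts are made up, and this is not a real cluster API.

```python
def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """True if this partition holds more than half of all configured votes."""
    return partition_votes * 2 > total_votes

# Example: a 3-node cluster with one vote per node, split into a 2-node group
# and an isolated single node.
votes = {"node1": 1, "node2": 1, "node3": 1}
total = sum(votes.values())

group_a = ["node1", "node2"]   # these two still see each other's heartbeats
group_b = ["node3"]            # this one is cut off

for name, group in (("group A", group_a), ("group B", group_b)):
    group_votes = sum(votes[n] for n in group)
    action = "keeps running services" if has_quorum(group_votes, total) else "stops its services"
    print(f"{name}: {group_votes}/{total} votes -> {action}")
```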

5. PE and TE

The PE and TE are also sub-components of the DC:

PE (Policy Engine): works out how resources should be transferred. It acts only as the strategist; it does not carry out the transfer itself but hands its plan to the TE for execution.

TE (Transition Engine): executes the plan produced by the PE. The PE and TE run only on the DC.

6. STONITHD component

STONITH (Shoot The Other Node In The Head, i.e. "headshot") works by operating the power switch directly: when a node fails and another node detects it, the surviving node sends a command over the network to the failed node's power switch, cutting the power and then restoring it (so the faulty node restarts) or simply leaving it powered off. This requires hardware support.

If the backup node stops receiving the primary node's heartbeat at some moment and immediately seizes the resources, while the primary node is in fact still performing a write operation, then the backup node issuing its own write at the same time can corrupt the file system or crash the server. Resource isolation (fencing) mechanisms are therefore used to prevent such events when resources are taken over; STONITHD ("headshot") is commonly used to make sure the old primary node cannot keep hold of the resources. A small sketch of this rule follows the list below.

Resource isolation can be applied at two levels:

(1) Node level

Implemented with STONITH devices

(2) Resource level

For example, an FC SAN switch can deny a failed node access at the storage level
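The sketch below illustrates the rule from the two sections above: fence first, take over second. The functions fence_node() and take_over_resources() are hypothetical placeholders, not a real STONITH or cluster-manager API.

```python
def fence_node(node: str) -> bool:
    """Pretend to power off the failed node via its fencing device; True on success."""
    print(f"sending power-off command to the fencing device for {node}")
    return True

def take_over_resources(node: str) -> None:
    """Placeholder for mounting shared storage, taking over the VIP and starting services."""
    print(f"taking over shared storage, VIP and services on {node}")

def on_peer_heartbeat_lost(failed_node: str, local_node: str) -> None:
    # Never touch shared resources until fencing has succeeded: if the "failed"
    # node is actually alive and still writing, seizing the filesystem now
    # could corrupt it.
    if fence_node(failed_node):
        take_over_resources(local_node)
    else:
        print("fencing failed -- refusing to take over, manual intervention needed")

on_peer_heartbeat_lost("node1", "node2")
```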

7. Shared storage

Some services, such as HTTP and MySQL, need to share data so that whichever node handles a request can return the correct information. Without shared storage, take the HTTP service as an example: if a picture a client wants lives only on one particular server and that server goes down, the HTTP service switches to another node that does not have the picture, and the user can no longer access it, which is obviously not what we want. To avoid this, the relevant data is placed on a shared storage device, so that no matter which server fails, user access is unaffected.

Shared storage commonly comes in three forms:

DAS: Direct Attached Storage

NAS: Network Attached Storage

SAN: Storage Area Network

Taken together, these components make up the overall architecture of a highly available cluster service.

8. Resources

Resources have been mentioned many times above, so what exactly is a resource, and which resources are needed to achieve high availability?

A resource is simply one of the items needed to start a service. For example, starting an httpd service requires an IP address, a service script and a file system (to store data); collectively we call these its resources. A highly available cluster therefore typically needs an IP address, a service (script) and a file system (for data), although some high-availability clusters do not require a storage device.

Resources can also be divided into the following types:

(1) primitive: the main resource (sometimes written native, which means the same thing); it runs only on the master node. (Of course, once the backup node takes over the resource it becomes the master node, so "master node" is relative.)

(2) group: a resource group; multiple resources are bound into one group and run on the same node.

(3) clone: the primitive resource is cloned N times and a copy runs on every node.

(4) master/slave: the primitive resource is cloned into 2 copies, one running on the master node and one on the slave node, and it can run only on those 2 nodes.

For some cluster services, the resources must be started in a particular order. For example, to start a MySQL cluster service you should first mount the shared storage device; otherwise, even if the MySQL service starts, users cannot access the data. So we generally need to constrain resources. There are several types of resource constraint:

(1) Location constraint (location): describes how strongly a resource prefers a node, usually expressed as a score. A positive score means the resource tends toward that node; a negative score means the resource tends to flee from it. The score can also be set to -inf (negative infinity) or inf (positive infinity). For example, with three nodes rs1, rs2 and rs3, if rs1 is the primary node and fails, the scores of rs2 and rs3 are compared and the resource is transferred to whichever node has the higher score.

(2) Colocation constraint (colocation): defines whether resources may run together, again usually expressed with a score. A positive score means the resources can run together; a negative score means they cannot. You can also bind resources together by defining them as a group resource.

(3) Order constraint (order): defines the order in which resources are started and stopped. For example, the shared storage should be mounted first, and only then should the httpd or mysqld service be started.

Resource stickiness: defines how much a resource prefers to stay on the node it is currently running on. It is also expressed as a score: a positive score means the resource prefers to remain on the current node, and a negative score means it would rather not stay there.

When a highly available cluster defines both resource stickiness and location constraints, the resource is transferred to another node when its node fails. When the original node comes back, the sum of the resource's stickiness is compared with the sum of the location-constraint scores, and the resource stays on (or returns to) whichever side is larger. A worked example of this comparison follows.
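Here is that worked example with made-up scores: a resource that prefers node1 via a location constraint has failed over to node2, and node1 has just come back online.

```python
# Whether the resource fails back depends on which sum is larger: the location
# score for node1, or the resource's stickiness on the node it currently runs on.
location_score_node1 = 100   # "prefer node1" location constraint
resource_stickiness = 200    # reluctance to leave the node it is running on

if location_score_node1 > resource_stickiness:
    print("node1 is back: resource fails back to node1")
else:
    print("node1 is back: stickiness wins, resource stays on node2")
```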

Resource transfer

Setting the failed node's VIP on another node, starting the corresponding services there, mounting the corresponding storage device and so on is what we call resource transfer. A sketch of these steps follows.
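As a rough sketch of what resource transfer amounts to, the Python snippet below performs the three steps by hand using standard Linux commands. In a real cluster, Heartbeat or Pacemaker performs these steps through resource agents; the interface name, virtual IP, block device and service here are made-up examples, and running them requires root.

```python
import subprocess

def run(cmd: list[str]) -> None:
    """Print and execute one step, stopping if it fails."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def take_over_resources() -> None:
    run(["ip", "addr", "add", "192.168.1.100/24", "dev", "eth0"])  # 1. take over the VIP
    run(["mount", "/dev/sdb1", "/var/www/html"])                   # 2. mount shared storage first...
    run(["systemctl", "start", "httpd"])                           # 3. ...then start the service (order constraint)

if __name__ == "__main__":
    take_over_resources()
```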

9. RA (Resource Agent)

The RA is what actually starts a resource. The LRM manages local resources but cannot start them itself; when a resource needs to be started, the LRM invokes an RA to do it. An RA is a script file, and a single node may have many RAs. On RHEL, for example, the init scripts used to start system services at boot are typical examples of such scripts. Common RA styles include the following; a toy sketch of the start/stop/status contract follows the list.

(1) LSB (Linux Standard Base): the familiar Linux init-script style found under /etc/init.d/.

(2) OCF (Open Cluster Framework): OCF scripts are more powerful than LSB scripts and support more parameters.
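The toy agent below sketches the contract an LSB-style script follows: accept start, stop or status as its first argument and report the outcome through its exit code (0 = success or running, 3 = not running for "status"). Real agents are normally shell scripts under /etc/init.d/ or OCF scripts under /usr/lib/ocf/resource.d/; the service name and PID-file path here are invented for illustration.

```python
#!/usr/bin/env python3
import os
import sys

PIDFILE = "/var/run/demo-service.pid"   # assumed location for this example

def is_running() -> bool:
    """Consider the service running if its PID file points at a live process."""
    try:
        with open(PIDFILE) as f:
            os.kill(int(f.read().strip()), 0)   # signal 0 = existence check only
        return True
    except (OSError, ValueError):
        return False

def main() -> int:
    action = sys.argv[1] if len(sys.argv) > 1 else ""
    if action == "start":
        print("starting demo service")          # a real agent would launch the daemon here
        return 0
    if action == "stop":
        print("stopping demo service")          # a real agent would terminate the daemon here
        return 0
    if action == "status":
        return 0 if is_running() else 3         # LSB: 3 means "program is not running"
    print("usage: demo-agent {start|stop|status}", file=sys.stderr)
    return 2                                    # LSB: 2 means "invalid argument"

if __name__ == "__main__":
    sys.exit(main())
```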

Generally speaking, building a highly available cluster service requires the components described above.
