Linux High Availability (HA) cluster basic concepts


Outline
I. Definition of a highly available cluster
II. Metrics for high-availability clusters
III. Hierarchical structure of highly available clusters
IV. Classification of high-availability clusters
V. Common high-availability cluster software
VI. Shared storage
VII. Cluster file systems and cluster LVM
VIII. How high-availability clusters work

Recommended reading:

DRBD+Heartbeat+NFS configuration notes on CentOS 6.3 http://www.linuxidc.com/Linux/2013-06/85599.htm

Heartbeat+ldirectord for HA and LB, with file sharing http://www.linuxidc.com/Linux/2013-06/85292.htm

Heartbeat+DRBD+NFS environment deployment http://www.linuxidc.com/Linux/2013-01/78619.htm

Installation and configuration of Heartbeat+DRBD on CentOS 6.3 http://www.linuxidc.com/Linux/2012-12/76141.htm

I. Definition of a highly available cluster

A high-availability cluster (High Availability Cluster, abbreviated HA Cluster) is, simply put, a group of computers that, as a whole, provides users with a set of network resources. Each individual computer system is a node of the cluster. High-availability clusters exist to keep the cluster's overall service available as much as possible, thereby reducing the losses caused by hardware and software failures. If one node fails, its redundant node takes over its responsibilities within seconds, so from the user's perspective the cluster never stops.
The main function of high-availability cluster software is to automate fault detection and service failover. A high-availability cluster with only two nodes is also known as dual-machine hot standby: two servers back each other up, and when one server fails, its service tasks are taken over by the other server automatically, so the system continues to provide services without human intervention. Dual-machine hot standby is only one kind of high-availability cluster; a high-availability cluster system can support more than two nodes and provide more advanced features than dual-machine hot standby, better meeting users' changing needs.

II. Metrics for high-availability clusters

HA (High Availability) clusters are measured by the system's reliability and maintainability. In engineering, reliability is usually measured by mean time to failure (MTTF), and maintainability by mean time to repair (MTTR). Availability is then defined as: HA = MTTF / (MTTF + MTTR) × 100%
Specific HA measurement criteria:
99%: downtime of less than 4 days a year

99.9%: downtime of less than 10 hours a year

99.99%: downtime of less than 1 hour a year

99.999%: downtime of less than 6 minutes a year
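These budgets follow directly from the downtime fraction 1 − availability. A quick sketch in Python to verify the table (the helper function name is ours, not from any HA tool):

```python
# Yearly downtime budget implied by each availability level
# (availability = MTTF / (MTTF + MTTR); downtime fraction = 1 - availability).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR

for label, avail in [("99%", 0.99), ("99.9%", 0.999),
                     ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    print(f"{label}: {downtime_minutes_per_year(avail):.1f} minutes/year")
# 99%     -> 5256.0 minutes/year (about 3.65 days)
# 99.9%   -> 525.6 minutes/year  (about 8.8 hours)
# 99.99%  -> 52.6 minutes/year
# 99.999% -> 5.3 minutes/year
```

Note that "five nines" leaves barely five minutes of downtime a year, which is why automated failover, rather than manual recovery, is required at that level.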

III. Hierarchical structure of highly available clusters

Description: A highly available cluster can be divided into three layers: the Messaging and Membership layer (shown in red), the Cluster Resource Manager (CRM) layer (shown in blue), and the Local Resource Manager (LRM) and Resource Agent (RA) layer (shown in green). Let's look at each in turn (see figure):
1. At the bottom is the Messaging and Membership layer. Messaging is mainly used to pass heartbeat information between nodes, so it is also called the heartbeat layer; heartbeat information can be transmitted between nodes by broadcast, multicast, unicast, and so on. The most important function of the Membership layer is to produce a complete membership view from the information provided by the Messaging layer, via the Cluster Consensus Membership service (CCM) coordinated by the DC (Designated Coordinator) node. This layer plays a connecting role: it passes the membership view produced from the lower layer's information up to the higher layers to report each node's working state, while the upper layers decide how to isolate a particular failed node.
2. The Cluster Resource Manager (CRM) layer is the tier that truly implements the cluster service. Each node in this layer runs a Cluster Resource Manager, which provides the core components for high availability, including resource definitions, attributes, and so on. On each node, the CRM maintains a CIB (Cluster Information Base, an XML document) and an LRM (Local Resource Manager) component. Only the CIB on the DC can be modified; the CIBs on the other nodes are copies of the DC's document. The LRM is the concrete executor: it starts and stops resources locally as instructed by the CRM. When a node fails, the DC decides, through the PE (Policy Engine) and TE (Transition Engine), whether and where the resources should be taken over.
3. The Resource Agent (RA) layer. A cluster resource agent is a script that can manage the start, stop, and status of a cluster resource on the local node. Resource agents are divided into: LSB (/etc/init.d/*), OCF (more capable and more portable than LSB), and Legacy Heartbeat (v1-style resource management).
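To make the agent idea concrete, here is a minimal sketch of an OCF-style agent. This is an illustration only: real agents are usually shell scripts and must also implement actions such as meta-data, and the PID-file path and managed "resource" here are hypothetical stand-ins.

```python
#!/usr/bin/env python3
# Minimal sketch of an OCF-style resource agent: an executable that
# responds to start / stop / monitor actions with OCF exit codes.
import os
import sys

OCF_SUCCESS = 0      # action succeeded / monitor: resource is running
OCF_ERR_GENERIC = 1  # generic error (e.g. unknown action)
OCF_NOT_RUNNING = 7  # monitor: resource is cleanly stopped

PIDFILE = "/tmp/demo-resource.pid"  # hypothetical path, illustration only

def monitor() -> int:
    """Report resource status via the presence of its PID file."""
    return OCF_SUCCESS if os.path.exists(PIDFILE) else OCF_NOT_RUNNING

def start() -> int:
    # A real agent would launch the managed daemon here.
    open(PIDFILE, "w").close()
    return OCF_SUCCESS

def stop() -> int:
    if os.path.exists(PIDFILE):
        os.remove(PIDFILE)
    return OCF_SUCCESS

def main(action: str) -> int:
    return {"start": start, "stop": stop, "monitor": monitor}.get(
        action, lambda: OCF_ERR_GENERIC)()

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

The LRM invokes such an agent with one action per call and interprets the exit code; that is the whole contract between the resource layer and the cluster manager.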

A more specific description of the core components (see figure):
1. CCM component (Cluster Consensus Membership service): links to and monitors the underlying heartbeat information. When heartbeat information from a node can no longer be detected, the whole cluster votes and converges on new membership state, and the result is forwarded to the upper layer so it can decide what to do. CCM can also generate a topology overview of each node's state, from the local node's perspective, so that the node can take appropriate action in special situations.
2. CRMd component (Cluster Resource Manager, also known as Pacemaker): implements resource allocation. Every resource-allocation action goes through the CRM, making it the core component. The CRM on each node maintains a CIB that defines resource-specific attributes and which resources are defined on which node.
3. CIB component (Cluster Information Base): an XML-format configuration file describing the cluster's resources. It is saved on disk but resident in memory while the cluster runs, and changes must be propagated to the other nodes. Only the CIB on the DC can be modified; the CIBs on the other nodes are copies of the DC's. The CIB can be configured either from the command line or through a graphical front end.
4. LRMd component (Local Resource Manager): obtains the state of local resources and carries out local resource management, for example starting or stopping a local service process as instructed by the CRM.
5. PEngine components:
PE (Policy Engine): computes the set of actions needed for a resource transition, but acts only as a strategist; it does not itself participate in moving resources, instead letting the TE execute its plan.

TE (Transition Engine): executes the plan made by the PE. PE and TE run only on the DC.

6. STONITHd component
STONITH (Shoot The Other Node In The Head, the "headshot") operates directly on the power switch. When a node fails and another node detects the failure, it issues a command over the network to control the failed node's power switch, power-cycling it so that the failed node reboots. This requires hardware support.
STONITH application case (master/standby servers): suppose the primary server is momentarily too busy serving requests to respond to heartbeat messages. If the standby server immediately grabs the service resources even though the primary has not actually gone down, resource contention results, and users can reach both the master and the standby. Read-only access is tolerable, but write operations from both sides will corrupt the file system, and everything falls apart. So before seizing resources, an isolation step is applied: the standby server first STONITHs the primary server (the "headshot" we often speak of), and only then takes over.
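The fence-before-takeover rule above can be sketched as follows. This is a simplified illustration; `fence_node` and `start_services` are hypothetical callbacks standing in for a real power-switch driver and resource start-up:

```python
# Sketch of the fence-before-takeover rule: the standby node may only
# take over resources after the peer has been successfully power-fenced,
# so that a merely busy (not dead) primary can never cause a split brain.
# fence_node and start_services are hypothetical stand-ins.

def failover(heartbeat_seen: bool, fence_node, start_services) -> str:
    """Decide what the standby node should do this heartbeat cycle."""
    if heartbeat_seen:
        return "standby"        # peer is alive; do nothing
    if not fence_node():        # "headshot" the peer first
        return "fence-failed"   # never take over unfenced resources
    start_services()
    return "active"

# Usage: with a reachable power switch, the takeover proceeds.
state = failover(False, fence_node=lambda: True,
                 start_services=lambda: None)
```

The key design point is the middle branch: if fencing fails, the standby refuses to start services, because a primary that is alive but unreachable could still be writing to the shared storage.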

IV. Classification of high-availability clusters

1. Two-machine hot standby (active/passive)
Official note: Two-node active/passive clusters using Pacemaker and DRBD are a cost-effective solution for many high availability situations.

2. Multi-node hot standby (N+1)
Official note: By supporting many nodes, Pacemaker can dramatically reduce hardware costs by allowing several active/passive clusters to be combined and share a common backup node.

3. Multi-node shared storage (N-to-N)
Official note: When shared storage is available, every node can potentially be used for failover. Pacemaker can even run multiple copies of services to spread out the workload.

4. Shared-storage hot standby (split site)
Official note: Pacemaker 1.2 will include enhancements to simplify the creation of split-site clusters.

V. Common high-availability cluster software


Messaging and Membership layer:
Heartbeat (v1, v2, v3); Heartbeat v3 was split into heartbeat, pacemaker, and cluster-glue

Corosync

Cman

Keepalived

UltraMonkey

Cluster Resource Manager layer (CRM for short):
haresources, CRM (Heartbeat v1/v2)

Pacemaker (Heartbeat V3/corosync)

Rgmanager (Cman)

Common combinations:
Heartbeat v2 + haresources (or CRM) (note: typically used on CentOS 5.x)

Heartbeat v3 + Pacemaker (note: usually used on CentOS 6.x)

Corosync + Pacemaker (note: the most commonly used combination now)

cman + rgmanager (note: components of Red Hat Cluster Suite, which also includes GFS2 and cLVM)

Keepalived + LVS (note: high availability for LVS)

Summary: In technical blogs we often see Heartbeat + Pacemaker used for MySQL high availability, or Corosync + Pacemaker used for MySQL high availability, and some readers ask which is better. After the explanation above, you should have a good idea!

VI. Shared storage

When it comes to clusters, we have to talk about shared storage, because whether it is web high availability or MySQL high availability, both sides share one copy of the data, which must be placed on shared storage that the master node can access and the slave nodes can access as well. Let's briefly go over the kinds of shared storage.
1. DAS (Direct Attached Storage)
Description: the device attaches directly to the host bus; distance is limited, the storage must be re-mounted on failover, and data transfer adds latency
RAID array

SCSI array

2. NAS (Network Attached Storage)
Description: file-level sharing
NFS

FTP

CIFS

3. SAN (Storage Area Network)
Description: block-level access, emulates the SCSI protocol
FC optical network (switch optical interfaces are very expensive, nearly 20,000 RMB each; using this drives the price up too high)

IP SAN (iSCSI): fast access, block-level, inexpensive

VII. Cluster file systems and cluster LVM (cLVM, Cluster Logical Volume Manager)

Cluster file systems: GFS2, OCFS2
Cluster LVM: cLVM
Note: typically used in highly available dual-master models (see figure)

VIII. How high-availability clusters work


Description: here a master/slave (active/passive) node pair is used to illustrate the working principle.
The primary server and the standby server form a dual-machine hot-standby pair sharing one copy of the storage; MySQL is a typical example. Normally the database files are mounted on the primary database server, and users connect to the primary server for database operations. When the primary server fails, the standby server automatically mounts the database files and takes over the primary server's work. Users connect to the database files on the standby server without noticing the switch. After the primary server is repaired, it can provide services again.
How, then, does the standby server know the primary has gone down? This requires a detection mechanism such as heartbeat detection: each node periodically announces its heartbeat to the other nodes, especially the primary server. If the standby server detects no heartbeat for several heartbeat periods (the period is configurable), it concludes the primary server is down. Heartbeat messages are not carried over TCP: TCP detection would require a three-way handshake and so on, wasting several heartbeat periods, so heartbeat information is transmitted over UDP port 694. If the primary server is momentarily too busy serving requests to respond to heartbeat messages, and the standby immediately grabs the service resources (the shared data files) even though the primary has not actually gone down, resource contention results, and users can reach both the master and the standby. Read-only access is tolerable, but write operations will corrupt the file system, and everything falls apart. So before seizing resources, an isolation method is applied: the standby server first applies STONITH to the primary server, the "headshot" we often speak of.
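The dead-peer detection described above can be sketched with UDP sockets. This is an illustration only: real Heartbeat exchanges packets on UDP port 694 between separate nodes, whereas here we use an ephemeral localhost port and a short, made-up deadtime.

```python
# Sketch of heartbeat detection over UDP (real Heartbeat uses UDP port
# 694 between nodes; here we bind an ephemeral localhost port instead).
import socket

DEADTIME = 3 * 0.2  # declare the peer dead after ~3 missed 200 ms beats

listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))  # stand-in for UDP/694
PEER_ADDR = listener.getsockname()

def send_heartbeat(sock: socket.socket) -> None:
    """One periodic 'I am alive' datagram from the primary."""
    sock.sendto(b"heartbeat", PEER_ADDR)

def peer_alive(sock: socket.socket, deadtime: float) -> bool:
    """Wait up to `deadtime` seconds for a heartbeat datagram."""
    sock.settimeout(deadtime)
    try:
        data, _ = sock.recvfrom(64)
        return data == b"heartbeat"
    except socket.timeout:
        return False  # beats stopped: presume the peer is down

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_heartbeat(sender)
print(peer_alive(listener, DEADTIME))  # True: a beat arrived in time
print(peer_alive(listener, DEADTIME))  # False: no beat within the deadtime
```

UDP fits here precisely because a lost datagram is itself the signal: the standby does not retransmit or handshake, it simply counts missed beats.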
So how is heartbeat information actually carried? Through a heartbeat line. Heartbeat running on the standby server can detect the running state of the primary server over an Ethernet connection, and automatically takes over the primary's resources once the primary's "heartbeat" can no longer be detected. Typically the heartbeat connection between primary and standby is a separate physical link: a serial cable, or an Ethernet connection made with a crossover cable. Heartbeat can even monitor the primary server over multiple physical links; as long as information about the primary's active state arrives over any one of them, the primary is considered to be in a normal state. From a practical point of view, it is recommended to configure multiple independent physical links for heartbeat, to avoid a single point of failure in the heartbeat channel itself.
We mentioned "isolation methods" above; there are two: node isolation and resource isolation. Node isolation is what we usually call STONITH (Shoot The Other Node In The Head, commonly known as the "headshot"), meaning the power is cut directly. A common arrangement is to connect every node to a power switch; on failure, the faulty node is powered down or power-cycled, forcing it to restart or shut down (see figure). Resource isolation, by contrast, is fencing that directly blocks access to specific resources.

Finally, the types of "heartbeat line": one is the serial cable, the other is the familiar Ethernet cable (crossover twisted pair). Each has advantages and disadvantages. The serial cable is considered somewhat more secure than the Ethernet connection, because an attacker cannot run programs such as telnet, ssh, or rsh over a serial link, which reduces the chance of a hijacked server being used to break into the standby server. However, the serial cable is limited in usable length, so the primary and standby servers must be very close together. An Ethernet connection removes that length limitation, and the same link can also be used to synchronize file systems between the primary and standby servers, reducing the bandwidth consumed on the normal communication link (see figure).

Source: http://www.linuxidc.com/Linux/2013-08/88522.htm

Reference Documentation:
http://www.linux-ha.org/wiki/Main_Page
http://clusterlabs.org/wiki/Main_Page
http://opencf.org/home.html
