HA cluster basic concepts and implementation of highly available clusters

Source: Internet
Author: User

The HA (high-availability) cluster series is divided into the following parts:

  • HA cluster basic concepts

  • Implementing HA with heartbeat

  • corosync in detail

  • pacemaker in detail

  • DRBD in detail

  • heartbeat resource management based on CRM

  • corosync + pacemaker + DRBD + MySQL to implement a highly available (HA) MySQL cluster

  • heartbeat + MySQL + NFS to implement a highly available (HA) MySQL cluster


This article covers the first part: HA cluster basic concepts.

I. What is an HA cluster
An HA Cluster (High Availability Cluster) is a highly available cluster.

Simply put, a cluster is a group of computers that, as a whole, provides users with a set of network resources; each individual computer is a node of the cluster.

A highly available cluster with only two nodes is also called dual-machine hot standby: two servers back each other up, and when one server fails, the other automatically takes over its service tasks, so the system keeps providing services without human intervention. Dual-machine hot standby is only one form of high-availability cluster; a high-availability cluster can contain more than two nodes and offer more advanced features than dual-machine hot standby, better meeting users' changing needs.


II. Metrics for high-availability clusters
High availability (HA) is measured by the system's reliability and maintainability. In engineering, reliability is usually measured by the mean time to failure (MTTF), and maintainability by the mean time to repair (MTTR). Availability is then defined as: HA = MTTF / (MTTF + MTTR) × 100%

Common HA availability levels:

99%: less than 4 days of downtime per year

99.9%: less than 10 hours of downtime per year

99.99%: less than 1 hour of downtime per year

99.999%: less than 6 minutes of downtime per year
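As an illustration of the formula above, here is a short Python sketch that computes availability from MTTF/MTTR and the annual downtime each "nines" level permits. The MTTF/MTTR numbers are hypothetical, chosen only for the example:

```python
# Availability from MTTF/MTTR, and the annual downtime a given
# availability level allows. Example numbers are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """HA = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_minutes(avail: float) -> float:
    """Downtime a given availability level permits per year, in minutes."""
    return (1 - avail) * MINUTES_PER_YEAR


# Example: a node that runs 2000 hours between failures
# and takes 2 hours to repair.
ha = availability(2000, 2)
print(f"availability = {ha:.4%}")  # roughly the 99.9% level

for nines in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{nines:.3%} -> {annual_downtime_minutes(nines):.1f} min/year")
```

Running this reproduces the table above: 99% allows about 3.65 days of downtime a year, 99.999% about 5.3 minutes.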


III. Layered architecture of high-availability clusters

A high-availability cluster can be divided into three layers: the Messaging & Membership layer, the Cluster Resource Manager (CRM) layer, and the Local Resource Manager (LRM) / Resource Agent (RA) layer.

Details of the core components:

1. CCM component (Cluster Consensus Membership service): links to and monitors the underlying heartbeat information. When heartbeat information is lost, the whole cluster votes on and converges to a consistent membership state, and the result is forwarded to the upper layer so it can decide what action to take. The CCM can also generate a topology overview of the state of every node, from the local node's point of view, so that the node can act appropriately in special situations.
2. CRMd component (Cluster Resource Manager daemon, e.g. pacemaker): carries out resource allocation; every resource-allocation action goes through the CRM, making it the core component. The CRM on each node maintains a CIB, which defines the properties of each resource and which resources are defined on which node.
3. CIB component (Cluster Information Base): an XML-format configuration file describing the cluster's resources. It is saved in a file on disk but resident in memory while the cluster is running, and changes must be propagated to the other nodes: only the CIB on the DC (Designated Coordinator) may be modified, and the CIBs on the other nodes are copies of the DC's. The CIB can be configured either from the command line or through a graphical front end.
4. LRMd component (Local Resource Manager daemon): obtains the state of local resources and carries out local resource management, for example starting a local service process when the heartbeat information calls for it.
5. PEngine component:
PE (Policy Engine): works out the complete set of steps needed for a resource transition, but acts only as a strategist; it does not take part in the transition itself, instead leaving the TE to carry out its plan.

TE (Transition Engine): executes the plan produced by the PE. The PE and TE run only on the DC.

6. STONITHd component
STONITH (Shoot The Other Node In The Head, a "headshot") operates the power switch directly. When a node fails and another node detects it, the surviving node issues a command over the network to the failed node's power switch: power off briefly, then power back on, so that the failed node is restarted. This requires hardware support.

A STONITH use case (master/standby servers): the primary server may at some point be so busy serving requests that it has no time to respond to heartbeat information. If the standby server immediately seizes the service resources while the primary is in fact still up, the result is resource contention: users can reach both servers. With read-only traffic this might be tolerable, but any write operation will corrupt the file system. So before seizing resources, a form of isolation is applied: the standby server STONITHs the primary first, the "headshot" we often speak of.
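The takeover ordering described above can be sketched in a few lines of Python. Every name here (heartbeat_alive, fence_node, start_resources) is a hypothetical stand-in, not a real cluster API; the point is only the ordering: fence first, take over second, so two nodes can never write to shared storage at the same time:

```python
# Sketch of a standby node's takeover logic with STONITH fencing.
# All functions are hypothetical stand-ins, not a real cluster API.

def heartbeat_alive(peer: str) -> bool:
    """Pretend heartbeat check; a real cluster asks the messaging layer."""
    return False  # simulate a peer that has stopped answering


def fence_node(peer: str) -> bool:
    """Power-cycle the peer via its power switch (STONITH)."""
    print(f"STONITH: power-cycling {peer}")
    return True  # True = fencing confirmed successful


def start_resources() -> str:
    """Claim the service resources on this (standby) node."""
    return "resources started on standby"


def take_over(peer: str) -> str:
    if heartbeat_alive(peer):
        return "peer healthy, nothing to do"
    # The peer may only be busy, not dead. Fence it first so it cannot
    # keep writing to shared storage while we also start writing
    # (the split-brain / resource-contention case described above).
    if not fence_node(peer):
        return "fencing failed, refusing to take over"
    return start_resources()


print(take_over("node1"))
```

If fencing fails, the standby refuses to take over: losing availability briefly is preferable to corrupting the shared file system.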


IV. Classification of high-availability clusters
1. Dual-machine hot standby (active/passive)

2. Multi-node hot standby (N+1)

3. Multi-node shared storage (N-to-N)

4. Multi-site hot standby (split site)


V. High-availability cluster software
Messaging and Membership layer:
Heartbeat (v1, v2, v3); Heartbeat v3 was split into heartbeat, pacemaker, and cluster-glue


Corosync

Cman

Keepalived

UltraMonkey

Cluster Resource Manager layer (CRM for short):
haresources, crm (Heartbeat v1/v2)

Pacemaker (Heartbeat V3/corosync)

Rgmanager (Cman)

Common combinations:
Heartbeat v2 + haresources (or crm) (note: typically used on CentOS 5.x)

Heartbeat v3 + pacemaker (note: typically used on CentOS 6.x)

corosync + pacemaker (note: the most commonly used combination today)

cman + rgmanager (note: components of the Red Hat Cluster Suite, which also includes GFS2 and cLVM)

Keepalived + LVS (note: high availability for LVS)

Summary: in technical blogs we often see heartbeat + pacemaker used to make MySQL highly available, or corosync + pacemaker used for the same purpose, and some readers ask which one to choose. After the explanation above, you should have a good idea!


VI. Shared storage
When it comes to clustering, we have to mention shared storage: whether it is web high availability or MySQL high availability, the nodes share a single copy of the data, which must live on shared storage that the master node and the slave nodes can both access. Let's briefly go over shared storage.
1. DAS (Direct Attached Storage)
Description: the device is attached directly to the host bus; distance is limited, the storage must be re-mounted on failover, and data transfer has some latency
RAID Array

SCSI Array

2. NAS (Network Attached Storage)
Description: file-level sharing
NFS

FTP

CIFS

3. SAN (Storage Area Network)
Description: block-level access; the SCSI protocol is carried over the network
FC SAN (fibre-channel switch optical ports are very expensive, nearly 20,000 apiece; using this drives the price way up)

IP SAN (iSCSI): fast access, block-level, inexpensive


VII. Cluster file systems and cluster LVM (cLVM)
Cluster file systems: GFS2, OCFS2
Cluster LVM: cLVM










