HA cluster basic concepts and implementation of highly available clusters

Source: Internet
Author: User

The HA (high-availability) cluster series is divided into the following parts:

  • HA cluster basic concepts

  • Implementing HA with heartbeat

  • corosync in detail

  • pacemaker in detail

  • DRBD in detail

  • heartbeat resource management based on CRM

  • corosync + pacemaker + DRBD + MySQL to implement a highly available (HA) MySQL cluster

  • heartbeat + MySQL + NFS to implement a highly available (HA) MySQL cluster


This article covers the first part: HA cluster basic concepts.

I. What is an HA cluster
An HA Cluster (High Availability Cluster) is a highly available cluster.

Simply put, a cluster is a group of computers that, as a whole, provides users with a set of network resources; each individual computer is a node of the cluster.

A highly available cluster with only two nodes is also called dual-machine hot standby: two servers back each other up, and when one server fails, the other automatically takes over its service tasks, so the system keeps providing services without human intervention. Dual-machine hot standby is only one form of high-availability cluster; a high-availability cluster can contain more than two nodes and offer more advanced features than dual-machine hot standby, better meeting users' changing needs.


II. Metrics for high-availability clusters
High availability (HA) is measured by the system's reliability and maintainability. In engineering, reliability is usually measured by the mean time to failure (MTTF), and maintainability by the mean time to repair (MTTR). Availability is then defined as: HA = MTTF / (MTTF + MTTR) × 100%

Common HA availability levels:

99%: less than 4 days of downtime per year

99.9%: less than 10 hours of downtime per year

99.99%: less than 1 hour of downtime per year

99.999%: less than 6 minutes of downtime per year
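As an illustration of the formula above, here is a short Python sketch that computes availability from MTTF/MTTR and the annual downtime each "nines" level permits. The MTTF/MTTR numbers are hypothetical, chosen only for the example:

```python
# Availability from MTTF/MTTR, and the annual downtime a given
# availability level allows. Example numbers are hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """HA = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def annual_downtime_minutes(avail: float) -> float:
    """Downtime a given availability level permits per year, in minutes."""
    return (1 - avail) * MINUTES_PER_YEAR


# Example: a node that runs 2000 hours between failures
# and takes 2 hours to repair.
ha = availability(2000, 2)
print(f"availability = {ha:.4%}")  # roughly the 99.9% level

for nines in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{nines:.3%} -> {annual_downtime_minutes(nines):.1f} min/year")
```

Running this reproduces the table above: 99% allows about 3.65 days of downtime a year, 99.999% about 5.3 minutes.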


III. Layered architecture of high-availability clusters

A high-availability cluster can be divided into three layers: the Messaging & Membership layer, the Cluster Resource Manager (CRM) layer, and the Local Resource Manager (LRM) / Resource Agent (RA) layer.

Details of the core components:

1. CCM component (Cluster Consensus Membership service): links to and monitors the underlying heartbeat information. When heartbeat information is lost, the whole cluster votes on and converges to a consistent membership state, and the result is forwarded to the upper layer so it can decide what action to take. The CCM can also generate a topology overview of the state of every node, from the local node's point of view, so that the node can act appropriately in special situations.
2. CRMd component (Cluster Resource Manager daemon, e.g. pacemaker): carries out resource allocation; every resource-allocation action goes through the CRM, making it the core component. The CRM on each node maintains a CIB, which defines the properties of each resource and which resources are defined on which node.
3. CIB component (Cluster Information Base): an XML-format configuration file describing the cluster's resources. It is saved in a file on disk but resident in memory while the cluster is running, and changes must be propagated to the other nodes: only the CIB on the DC (Designated Coordinator) may be modified, and the CIBs on the other nodes are copies of the DC's. The CIB can be configured either from the command line or through a graphical front end.
4. LRMd component (Local Resource Manager daemon): obtains the state of local resources and carries out local resource management, for example starting a local service process when the heartbeat information calls for it.
5. PEngine component:
PE (Policy Engine): works out the complete set of steps needed for a resource transition, but acts only as a strategist; it does not take part in the transition itself, instead leaving the TE to carry out its plan.

TE (Transition Engine): executes the plan produced by the PE. The PE and TE run only on the DC.

6. STONITHd component
STONITH (Shoot The Other Node In The Head, a "headshot") operates the power switch directly. When a node fails and another node detects it, the surviving node issues a command over the network to the failed node's power switch: power off briefly, then power back on, so that the failed node is restarted. This requires hardware support.

A STONITH use case (master/standby servers): the primary server may at some point be so busy serving requests that it has no time to respond to heartbeat information. If the standby server immediately seizes the service resources while the primary is in fact still up, the result is resource contention: users can reach both servers. With read-only traffic this might be tolerable, but any write operation will corrupt the file system. So before seizing resources, a form of isolation is applied: the standby server STONITHs the primary first, the "headshot" we often speak of.
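The takeover ordering described above can be sketched in a few lines of Python. Every name here (heartbeat_alive, fence_node, start_resources) is a hypothetical stand-in, not a real cluster API; the point is only the ordering: fence first, take over second, so two nodes can never write to shared storage at the same time:

```python
# Sketch of a standby node's takeover logic with STONITH fencing.
# All functions are hypothetical stand-ins, not a real cluster API.

def heartbeat_alive(peer: str) -> bool:
    """Pretend heartbeat check; a real cluster asks the messaging layer."""
    return False  # simulate a peer that has stopped answering


def fence_node(peer: str) -> bool:
    """Power-cycle the peer via its power switch (STONITH)."""
    print(f"STONITH: power-cycling {peer}")
    return True  # True = fencing confirmed successful


def start_resources() -> str:
    """Claim the service resources on this (standby) node."""
    return "resources started on standby"


def take_over(peer: str) -> str:
    if heartbeat_alive(peer):
        return "peer healthy, nothing to do"
    # The peer may only be busy, not dead. Fence it first so it cannot
    # keep writing to shared storage while we also start writing
    # (the split-brain / resource-contention case described above).
    if not fence_node(peer):
        return "fencing failed, refusing to take over"
    return start_resources()


print(take_over("node1"))
```

If fencing fails, the standby refuses to take over: losing availability briefly is preferable to corrupting the shared file system.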


IV. Classification of high-availability clusters
1. Dual-machine hot standby (active/passive)

2. Multi-node hot standby (N+1)

3. Multi-node shared storage (N-to-N)

4. Multi-site hot standby (split site)


V. High-availability cluster software
Messaging and Membership layer:
Heartbeat (v1, v2, v3); Heartbeat v3 was split into heartbeat, pacemaker, and cluster-glue


Corosync

Cman

Keepalived

UltraMonkey

Cluster Resource Manager layer (CRM for short):
haresources, crm (Heartbeat v1/v2)

Pacemaker (Heartbeat V3/corosync)

Rgmanager (Cman)

Common combinations:
Heartbeat v2 + haresources (or crm) (note: typically used on CentOS 5.x)

Heartbeat v3 + pacemaker (note: typically used on CentOS 6.x)

corosync + pacemaker (note: the most commonly used combination today)

cman + rgmanager (note: components of the Red Hat Cluster Suite, which also includes GFS2 and cLVM)

Keepalived + LVS (note: high availability for LVS)

Summary: in technical blogs we often see heartbeat + pacemaker used to make MySQL highly available, or corosync + pacemaker used for the same purpose, and some readers ask which one to choose. After the explanation above, you should have a good idea!


VI. Shared storage
When it comes to clustering, we have to mention shared storage: whether it is web high availability or MySQL high availability, the nodes share a single copy of the data, which must live on shared storage that the master node and the slave nodes can both access. Let's briefly go over shared storage.
1. DAS (Direct Attached Storage)
Description: the device is attached directly to the host bus; distance is limited, the storage must be re-mounted on failover, and data transfer has some latency
RAID Array

SCSI Array

2. NAS (Network Attached Storage)
Description: file-level sharing
NFS

FTP

CIFS

3. SAN (Storage Area Network)
Description: block-level access; the SCSI protocol is carried over the network
FC SAN (fibre-channel switch optical ports are very expensive, nearly 20,000 apiece; using this drives the price way up)

IP SAN (iSCSI): fast access, block-level, inexpensive


VII. Cluster file systems and cluster LVM (cLVM)
Cluster file systems: GFS2, OCFS2
Cluster LVM: cLVM










