Corosync + Pacemaker + MySQL + DRBD: High Availability for MySQL

Last Update:2015-10-31 Source: Internet

Author: User

Tags: failover, node, server


DRBD provides data synchronization. It normally runs as a primary/secondary (master/slave) pair: all reads, writes, and mounts can only be performed on the primary node, and the primary and secondary roles can be swapped.
With Corosync + Pacemaker, when the DRBD primary node fails, the cluster automatically switches to the secondary node, and service continues after the failover.
MySQL data can therefore be placed on a DRBD block device.

DRBD (Distributed Replicated Block Device) is a distributed, replicated block device. It works as follows: when host A receives a write request for its designated disk device, the data is passed to host A's kernel. A module in that kernel sends the same data to the kernel of host B, and host B writes it to its own designated disk device. The data on the two hosts is thereby kept in sync, which makes write operations highly available. DRBD is similar to RAID 1 in that it mirrors data. It generally runs as a primary/secondary pair: all reads, writes, and mounts can only be performed on the primary node, but the primary and secondary roles can be exchanged between the two DRBD servers.
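The write path described above can be sketched as a toy model in Python. This is only an illustration, not how DRBD is implemented: the class and function names are made up for this example, the "disk" is a dictionary, and acknowledging a write only after both copies exist is an assumption (DRBD also offers less strict replication modes).

```python
class DrbdNode:
    """Toy stand-in for one host; `disk` maps block numbers to data."""

    def __init__(self, name, primary=False):
        self.name = name
        self.primary = primary
        self.disk = {}
        self.peer = None

    def write(self, block, data):
        """Accept a write on the primary and mirror it to the peer."""
        if not self.primary:
            raise PermissionError(f"{self.name} is secondary: writes are refused")
        self.disk[block] = data           # write to the local disk
        if self.peer is not None:
            self.peer.disk[block] = data  # the peer writes the same data
        return True                       # acknowledged after both writes


def connect(a, b):
    """Make two nodes each other's replication peer."""
    a.peer, b.peer = b, a


def swap_roles(a, b):
    """Exchange the primary and secondary roles."""
    a.primary, b.primary = b.primary, a.primary


host_a = DrbdNode("A", primary=True)
host_b = DrbdNode("B")
connect(host_a, host_b)
host_a.write(0, b"mysql data page")
swap_roles(host_a, host_b)            # B is now primary, A is secondary
host_b.write(1, b"written after the switch")
print(host_a.disk == host_b.disk)     # True
```

After the role swap, a write sent to host A raises PermissionError, which mirrors the rule that only the primary node accepts reads, writes, and mounts.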

The relationship between the components: MySQL and DRBD are not directly related, but combining them is very useful. DRBD mirrors the data, so when the DRBD primary node goes down, the secondary node holds the same data and can provide service. However, DRBD does not switch from the primary node to the secondary node on its own. This is where the high-availability cluster comes in: once DRBD is defined as a highly available resource, the cluster automatically switches to the secondary node when the primary node fails, and service continues after the failover.

1. Definition of a high-availability cluster

A high-availability cluster is a server clustering technology designed to reduce service interruptions (for example, interruptions caused by a server going down). Simply put, a cluster is a group of computers that, as a whole, provide users with a set of network resources. The individual computer systems are the nodes of the cluster.

High-availability clusters exist to reduce the losses caused by hardware and software failures. They keep the user's business applications available without interruption, minimizing the impact of software, hardware, and human-caused failures on the business. If a node fails, its standby node takes over its responsibilities within a few seconds, so from the user's point of view the cluster does not stop. The main function of high-availability cluster software is to automate fault detection and service switchover.

2. Structure of a high-availability cluster

To implement and configure a high-availability cluster, it is important to understand its structure. From the bottom up, it has three layers.

[Figure 1: High-availability cluster structure]

1) Messaging Layer

The messaging layer transports heartbeat information. It is a process running on each host.

Corosync, the subject of this article, runs at this layer.
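As a rough illustration of what a heartbeat layer is for, the following Python sketch marks a node as failed when no heartbeat has arrived from it within a timeout. It is not how Corosync works internally; the function name, the timestamps, and the 3-second timeout are assumptions chosen for the example.

```python
def find_failed_nodes(last_seen, now, timeout):
    """Return nodes whose most recent heartbeat is older than `timeout` seconds."""
    return sorted(node for node, ts in last_seen.items() if now - ts > timeout)


# Time (in seconds) at which each node's last heartbeat was received.
last_seen = {"node1": 100.0, "node2": 96.5, "node3": 99.2}
failed = find_failed_nodes(last_seen, now=100.5, timeout=3.0)
print(failed)  # ['node2']
```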

2) CRM, Cluster Resource Manager

The cluster resource manager relies on the underlying messaging layer. This layer exists because software that is not HA-aware cannot provide clustered high availability by itself and can only achieve it through the CRM. Software that can use the underlying messaging layer directly to make its own cluster decisions is called HA-aware.

Within this layer there is in fact another layer, the LRM (Local Resource Manager), which actually carries out the decisions made by the CRM. By analogy, the CRM is the chairman of a company and the LRM is the general manager: the CRM is responsible for the company's overall vision and strategy and hands it to the general manager (LRM) to implement, and the general manager in turn assigns the work to the staff below (the RAs), who complete it. This can also be seen in the structure diagram.

Pacemaker belongs to this layer. Pacemaker's configuration interface is crmsh (from SUSE), so we need to install crmsh.
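The chain of delegation in this layer (the CRM decides, the LRM on each node executes, the RA acts on the resource) can be sketched in Python as follows. All class and method names are hypothetical and exist only for this illustration; they are not Pacemaker APIs.

```python
class ResourceAgent:
    """Stands in for an RA script: performs one action on one resource."""

    def run(self, resource, action):
        return f"{action} {resource}"


class LocalResourceManager:
    """Runs on one node and carries out what the CRM decided."""

    def __init__(self, node, agent):
        self.node = node
        self.agent = agent

    def execute(self, resource, action):
        return f"{self.node}: {self.agent.run(resource, action)}"


class ClusterResourceManager:
    """Makes cluster-wide decisions and hands them to the per-node LRMs."""

    def __init__(self, lrms):
        self.lrms = lrms  # maps node name -> LocalResourceManager

    def move(self, resource, source, target):
        # Decision: stop the resource on the source, then start it on the target.
        return [
            self.lrms[source].execute(resource, "stop"),
            self.lrms[target].execute(resource, "start"),
        ]


lrms = {name: LocalResourceManager(name, ResourceAgent()) for name in ("node1", "node2")}
crm = ClusterResourceManager(lrms)
steps = crm.move("mysql", "node1", "node2")
print(steps)  # ['node1: stop mysql', 'node2: start mysql']
```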

3) RA, Resource Agent

A resource agent is a tool, usually a script, that accepts scheduling from the CRM and manages a particular resource on a node.
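To make "usually a script" more concrete, here is a minimal Python sketch of how an agent might dispatch on the requested action and report the result as an exit code. The numeric codes follow the OCF convention (0 for success, 3 for an unimplemented action, 7 for "not running"); the `DummyResource` class and `handle` function are hypothetical, and real agents are normally shell scripts that manage an actual service.

```python
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


class DummyResource:
    """Stands in for a real service such as a mysqld process."""

    def __init__(self):
        self.active = False


def handle(resource, action):
    """Dispatch one action, as an agent script dispatches on its first argument."""
    if action == "start":
        resource.active = True
        return OCF_SUCCESS
    if action == "stop":
        resource.active = False
        return OCF_SUCCESS
    if action == "monitor":
        return OCF_SUCCESS if resource.active else OCF_NOT_RUNNING
    return OCF_ERR_UNIMPLEMENTED


res = DummyResource()
codes = [handle(res, a) for a in ("monitor", "start", "monitor", "stop", "monitor")]
print(codes)  # [7, 0, 0, 0, 7]
```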

(1) Heartbeat legacy

The traditional Heartbeat type, listening on UDP port 694.

(2) LSB, Linux Standard Base

The scripts under /etc/rc.d/init.d/* belong to the LSB class.

(3) OCF, Open Cluster Framework

The Open Cluster Framework. An organization that provides resource agent scripts is called a provider; pacemaker is one such provider.

(4) STONITH

Shoot The Other Node In The Head. This type of RA is mainly used for node isolation and is dedicated to configuring STONITH devices.

The main purpose of STONITH is to deal with the case where, for network reasons, the nodes can no longer all communicate with each other. Suppose the cluster is divided into two parts, with three nodes on the left and two on the right. The three on the left can receive each other's heartbeats, and so can the two on the right, but neither part receives heartbeats from the other. Each part therefore concludes that the other has failed and elects its own DC (Designated Coordinator), producing two clusters that contend for resources. If both sides write data to their shared storage, the file system is likely to be corrupted. This phenomenon is called cluster split-brain.
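The 3 + 2 split described above can be sketched in Python. Each partition only hears its own members, so each one elects a DC. The election rule used here (the lowest node name wins) is purely an assumption for the example and is not how Pacemaker chooses its DC.

```python
def elect_dcs(partitions):
    """Each partition that only hears itself elects its own DC (lowest name here)."""
    return {min(part): sorted(part) for part in partitions}


left = {"n1", "n2", "n3"}
right = {"n4", "n5"}
dcs = elect_dcs([left, right])
print(dcs)       # {'n1': ['n1', 'n2', 'n3'], 'n4': ['n4', 'n5']}
print(len(dcs))  # 2
```

Two DCs mean two independent clusters, each believing it owns the resources.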

To avoid split-brain there is the notion of quorum (legal votes): a partition has quorum when its votes exceed half of the total votes in the cluster. When cluster communication fails, one party must give up being the cluster so that resources are not contended. Which party should give up? The vote decides: only the party with quorum qualifies as the cluster, and the other party should leave it. Leaving the cluster does not by itself mean that its services have stopped, so it must be made to release its resources and power off. This is where a STONITH device is used: it makes the nodes that leave the cluster fail completely. A power switch works on this principle.

A cluster with only two nodes is a special case. If it splits, neither node has quorum, so resources will not be transferred and the whole service fails, because there is no arbitration device.
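The quorum rule and the two-node special case can be checked with a few lines of Python. This is a minimal sketch that assumes one vote per node; the function names are made up for the example.

```python
def has_quorum(partition_votes, total_votes):
    """A partition has quorum when it holds more than half of all votes."""
    return partition_votes > total_votes / 2


def nodes_to_fence(partitions):
    """Return the nodes in partitions without quorum (one vote per node assumed)."""
    total = sum(len(part) for part in partitions)
    losers = []
    for part in partitions:
        if not has_quorum(len(part), total):
            losers.extend(sorted(part))
    return losers


print(nodes_to_fence([{"n1", "n2", "n3"}, {"n4", "n5"}]))  # ['n4', 'n5']
print(nodes_to_fence([{"n1"}, {"n2"}]))                     # ['n1', 'n2']
```

In the five-node split, the two-node side loses and should be fenced. In the two-node split, one vote is not more than half of two, so neither side has quorum.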

All of this is to explain the following two important points.

①, Pacemaker enables STONITH by default. The cluster we are configuring has no STONITH device, so STONITH must be disabled when configuring the cluster's global properties.

②, When a cluster has no quorum, resources are not transferred normally: if a node fails, its resources are not moved to a healthy node, and all resources end up failed. The cluster should therefore be configured to ignore the loss of quorum rather than stop all resources.
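The two settings above correspond to the Pacemaker cluster properties stonith-enabled and no-quorum-policy. The article has not yet shown the actual commands, so the following Python sketch only builds and prints the crmsh command lines one would expect to use; it does not run them, and the helper name is hypothetical.

```python
def crm_property_commands(properties):
    """Build one `crm configure property` command line per cluster property."""
    return [
        ["crm", "configure", "property", f"{name}={value}"]
        for name, value in properties.items()
    ]


# Assumed values for a two-node cluster without a STONITH device.
settings = {"stonith-enabled": "false", "no-quorum-policy": "ignore"}
commands = crm_property_commands(settings)
for argv in commands:
    print(" ".join(argv))
# crm configure property stonith-enabled=false
# crm configure property no-quorum-policy=ignore
```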

Not finished, to be continued ...
