MySQL performance tuning and architecture design-the 17th chapter on the ideas and solutions of high-availability design

Source: Internet
Author: User
Tags database sharding

17th. Ideas and solutions for high-availability design


The database system is the core part of an application system, if the whole system availability is guaranteed, the database system can not have any problems. For an enterprise-class system, the availability of a database system is particularly important. Once a database system fails to provide services, all systems may not be able to continue to work, unlike some of the systems in the software that may affect only a feature that cannot continue to service. Therefore, a successful database architecture needs to be fully considered in terms of high-availability design. This chapter will focus on how to build a highly available MySQL database system to describe the comparisons between the various solutions and scenarios.

17.1 using Replication to achieve a highly available architecture

Maintenance readers should be aware that any equipment (or services), as long as the single point, there is a great security risk. Because once this device (or service) crash, it is difficult to have a spare device (or service) to replace its function in a short period of time. So a slightly more important server or application system, there will be at least one backup for the exception of the time can be quickly replaced to provide services.
For a database, the primary and standby configuration is a very common design idea. For MySQL, it can be said that there is a natural advantage to achieve this function, because its Replication function is widely used in practical applications to achieve the main configuration of the function.
I think that most of the MySQL environments in which we come into contact have the ability to replicate database replication between two (or more) MySQL servers via MySQL Replication. There may be some to enhance system extensibility to meet performance requirements for read/write separation. Or it is used for the design of the main standby machine, to ensure that when the host crash in a short period of time can be switched to the standby machine to continue to run.
The implementation of database replication through MySQL Replication is actually described in detail in the previous 13th chapter, and also introduces the implementation principle and implementation method of various Replication architectures. Prior to this, the Replication was introduced mainly from the extensibility aspect. Here, I'll focus on how to solve high-reliability problems from the high-availability aspects of the system and the many architectures that utilize Replication.

17.1.1 conventional Master-slave to solve basic design of primary and standby

As I mentioned in the previous chapters, the Common Master-slave architecture is one of the most common architectural approaches used in many systems today. The architecture design not only solves the expansibility problem of the system to a great extent, but also provides the improvement of the performance, and the guarantee of the system reliability.
In the design of a schema that replicates one or more Slave behind a common Master, we have at least one MySQL server (Master) that can provide services when one of our Slave fails to provide services. Not all business related to the database can be run down. If the Slave is more than one, the remaining Slave will still be able to continue to provide services without interruption.
Of course, one of the things to be aware of here is that the overall service capability of our MySQL database cluster is at least guaranteed to support the load of the system after it lacks a MySQL Server, otherwise everything is empty talk. However, from the system design point of view, the system processing capacity left a certain amount of residual space is a relatively basic requirement. As a result, it is normal for the system to have less than one MySQL Server in a short period of time that is still capable of supporting normal business.

As shown, when a Slave C in our Slave cluster fails crash, the entire system changes only the replication interrupt from Master to Slave C, and the client application's Read request can no longer access Slave C. All other MySQL servers were working properly without any adjustments. The client's request Read requests are all assumed by Slave A and Slave B.

the solution of 17.1.2 Master single point problem

The above architecture can easily resolve Slave failures, and can continue to serve without any adjustments. But what happens when our Master has a problem? When our Master has a problem, all client Write requests are not processed.
At this point we can have the following two solutions, one is to switch one of the Slave to Master to provide services, while all other Slave will be through the Change Master command to be copied through the new master. Another solution is to add a master, which is the Dual master solution.
Let's take a look at the first solution and switch one Slave to Master to solve the problem.

When Master fails crash, all Write requests from the original client to master will no longer be able to continue, and all copies of the original master to Slave will naturally break down. At this point, we select a Slave to switch it to Master. Assuming Slave A is selected, we will replace the other Slave B and Slave C with the change Master to command to continue copying from the new master, which is the original Slave a. At the same time, all write requests to the application end are shifted to the new Master. For the read request, we can remove the read request for the new Master, or we can continue to keep it.
One of the biggest drawbacks of this scheme is that the switching steps are much more complicated to achieve. Moreover, at the time of Master failure crash, all of our Slave's replication progress is not necessarily identical, there may be a small number of differences. At this point, choosing which Slave as Master is also a headache. So the controllability of this scheme is not particularly high.
Let's look at the second solution, which is to solve Master's point by Dual Master, in order to be straightforward, just draw the master part of the diagram, as follows:

We built the Dual Master environment with two MySQL servers, and normally all client write requests were written to master A, and then master A was copied to master B through Replication. Once the problem occurs with Master A, all Write requests are turned to master B. Under normal circumstances, when Master B has a problem, the actual database or client request will not be materially affected.
Here, perhaps a reader friend will think, when our Master A problem, how does the application automatically turn the request to Master B? In fact, we just need to set up a VIP through the corresponding hardware devices such as F5 or cluster management software such as Heartbeat, which normally points to master A, and once the exception crash in master A is automatically switched to Maste R B, the application of the front end has access to Master through this VIP. This solves the IP switching problem of the application and ensures that the application will only see one Master at any time, avoiding the possibility of data disturbances in the multi-point write.
The most important feature of this scheme is that after the failure of master, the processing is relatively simple and controllable. The disadvantage is the need to add a MySQL server, in the cost of more investment.

17.1.3 Dual Master with cascading replication to resolve high availability under exceptional failures

Through the previous architecture analysis, we got the solution of Slave failure, and solved the single point problem of master. Now we can get a whole solution to solve the problem of overall system reliability through the Dual Master and Cascade replication architecture.
The introduction of this architectural scheme has been described in detail in the previous chapters, where we mainly analyze the perfect and abnormal switching methods that are considered in high reliability.

As shown, first consider the case where the Slave is abnormal. In this architecture, Slave after the exception of the processing situation and the normal Master-slave architecture is handled exactly the same way, only need to be removed from the application Access Slave cluster access configuration to remove a Slave node can be resolved, whether through the application of their own judgment, or through hardware solutions such as F5 can be easily implemented.
Let's look at the processing scheme after Master A fails crash. Such as:

When master a fails crash, replication between Master A and Master B is interrupted, and all client Write requests to master a must be turned to master B. The implementation of this steering action can be achieved through the VIP approach described in the second scenario described in the previous section. Since all previous Slave are replicated from Master B, the Slave cluster will not be affected, and all Read requests from the client will be unaffected, and the process can be fully automated without any human intervention. But
One of the pitfalls here is that when master a crash, if Master B is a Slave IO thread, there is a problem with data loss if you haven't finished reading the binary log of master A. To completely solve this problem, we can only mirror the MySQL binary log to Master B through a third-party patch (Google Dev) to completely avoid losing any data.
So what happens when Master B fails crash? As shown in the following:

What if the failure is not master B but master A? The first thing to be sure is that all of our Write requests are unaffected and all Read requests are still accessible.
However, all Slave replication will be interrupted, and the data above Slave will begin to lag. All we need to do is change the master to operation and copy the Slave from Master A.
Since all Slave replication is unlikely to advance to the original data source, it is possible to find an accurate copy starting point based on the timestamp information in the Relay Log above Slave and the timestamp information in Master A, without causing any loss of data.

17.1.4 Dual Master in conjunction with cascading replication to resolve online DDL change issues

When we use the combined schema of Dual Master plus cascade replication, it can also be solved for a fatal wound of MySQL, the online DDL change. such as when we need to add a field to a Table tab, can be implemented as follows in the above schema:
1, a temporary stop in the Slave cluster to provide services, and then change it, and then put back to the cluster to continue to provide services;
2. Repeat the first step to complete all Slave Change,
3, pause the replication of Master B, and turn off the current session record binary log function, change it, and then start the replication;
4. Switch all requests for Master A to master via VIP switching B;
5, close the current session of Master A to record the function of the binary log, and then make changes;
6, and finally switch the VIP from master B back to master A, so that all changes are complete.
There are a few things to be aware of during the change process:
1. The entire Slave cluster needs to be able to sustain all the business while a single MySQL is missing;
2, the Slave cluster to increase or decrease the operation of a MySQL simple, can be achieved by adjusting the application configuration on-line; The VIP switch between
3, Dual Master is simple, and the switching time is short, because this switching process causes the application to not access the master database for a short period of time.
4, when the change Master B, there will be a short period of time Slave cluster data delay, so if a single host change time is longer, you need to change in the early hours of low traffic. If necessary, you may even need to change Master B before you switch all Slave to Master B as master.
Of course, even so, because there is a VIP switch between master A and master B, we still have a situation where the application cannot write in a short period of time. So, this solution can only solve the problem of MySQL online DDL to some extent. And when the number of clusters, and the node of each cluster is more, the entire operation process will be very complex and very long. There is no perfect solution for MySQL online DDL, and it is only expected that MySQL will be able to solve the problem as soon as possible in subsequent releases.

17.2 using MySQL Cluster for overall high availability

The related features of MySQL Cluster are described in detail in the previous chapter, as well as the contents of installation configuration maintenance. Here, the main point is to introduce how to use the features of MySQL Cluster to improve the overall usability of our system. Because MySQL Cluster itself is a complete distributed architecture system, but also supports multi-point redundancy of data storage, real-time data synchronization and other characteristics. So it can be said that he was born with the realization of high reliability conditions, whether it can meet the requirements in practical applications, mainly in the configuration of the system set up a reasonable.
Since the MySQL Cluster architecture consists of two tiers of two groups of clusters, including SQL node (mysqld) and NDB node (data node), all two levels need to be able to guarantee high reliability to ensure overall reliability.

Here we introduce the high reliability of MySQL Cluster from two aspects respectively.

17.2.1 High Reliability Assurance for SQL nodes

The SQL node in MySQL Cluster is actually a multi-node mysqld service that does not contain any data. So, the SQL node cluster, like any other normal application server, is a high-level alternative, as long as MySQL server with MySQL Cluster is installed.
When one of the SQL nodes in the cluster is crash out, it will not cause any data loss because it is simply an application service. Only the front-end application Data source configuration is compatible with a host in the cluster after crash automatically remove the host from the cluster. In fact, this is very easy for the application server, either by self-developing the proxy for the judgment function or through the hardware-level load balancing device, which can be very easy to do. Of course, the premise is that the rest of the SQL node can assume the overall load.

For example, after SQL 1 crash, it is actually just a break in one of the many ways to access the data, there are still many ways to get the data you need. Moreover, because of the high substitution of SQL, the replacement is also very simple, even if the replacement of the entire host, can be completed in a short period of time.

High Reliability Assurance for 17.2.2 NDB nodes

MySQL Cluster data redundancy is a prerequisite, first must ensure that there are enough nodes, in fact, at least 2 nodes to ensure that the data is redundant, because, MySQL Cluster when saving redundant data, is more redundant than the need to ensure that the same data is stored on different nodes. MySQL Cluster will partition the data only if the redundancy is guaranteed.
Suppose we have 4 NDB nodes, the data is divided into 4 partition storage, the data is redundantly stored, each data storage 2 parts, that is NDB configuration Noofreplicas parameter is set to 2, 4 nodes will be divided into 2 NDB Group.
The distribution of all data is similar to the following:

In this configuration, suppose we have a problem with one of the NDB nodes in group 0 (if it is NDB 1), some of the data (assuming Part 1) is broken, because there is a redundant copy of each piece of data, it does not have any effect on the system, and it does not even need to be NDB , MySQL will continue to provide the service as normal.
What would happen if we had a partial data corruption on all two nodes? Such as:

If, as shown, the two corrupted parts of the data node belong to the same NDB Group, as long as the damaged part does not contain the exact same data, the entire MySQL Cluster will still be able to provide the service normally. However, if there is the same data in the corrupted data, even if there are only a few parts, it will cause the problem of MySQL cluster, not fully normal service delivery. In addition, if the node that corrupted the data is in two different NDB Group, it is very fortunate that no matter what part of the data is damaged, it will not affect the normal service of MySQL Cluster.
Perhaps a reader friend would say, if our hardware fails, the whole NDB crash it? Yes, there is a real possibility of this problem, but we also don't have to worry about it:

Assuming that our entire NDB node is crash due to hardware (or software) failure, because MySQL Cluster ensures that every copy of the data is not on the same host, even if the entire host is crash, MySQL Cluster will still be able to provide services, like the As shown, even if the entire NDB 1 node is crash, each piece of data can also be retrieved through the NDB 2 node.
So what happens if you crash two nodes at the same time? The first thing to be sure is that if we crash two nodes in the same NDB Group, then there is no way to MySQL Cluster, because two redundant data are lost. However, as long as the crash two nodes are not in the same NDB Group, the MySQL Cluster will not be affected, or can continue to provide normal service. As shown in the situation:

From the situation listed above we can know that MySQL Cluster really can achieve very high reliability, after all, the same time to store the same data at the two NDB nodes have a large failure probability is too small, if this can be met, it can only natural unlucky.
Of course, since the old version of MySQL Cluster needs to Load all of the data into memory to function properly, there are very few people to use due to the limited size of the memory space. Now, although the new version has been supported only need all the index data Load into memory, but because the actual success story is not many, and the development time is not too long, so many user friends for MySQL Cluster is still cautious, most of them are still in the beta phase.

17.3 using DRBD to ensure high security and reliability of data

17.3.1 DRBD Introduction

For many of these friends, the use of DRBD may not be familiar, but there may be some understanding. After all, in the high availability and Scalability chapter of the official MySQL documentation manual, DRBD is introduced as a very important way for MySQL to achieve higher availability. Although this content was not added to the MySQL documentation manual until the middle of last year, DRBD itself has long been a high-reliability solution for many applications, and has long been used in many MySQL-use groups.
In short, DRBD is actually through the network to achieve the block device data mirroring synchronization of an open source cluster software, also known as the network RAID1. The official English presentation is: DRBD refers Toblock devicesdesigned ASA building block Toform High Availability (HA) clusters. This is do by mirroring a whole block device via an assigned network. It is Shownas network raid-1-DRBD. Here is an overview diagram of DRBD:

As we can see, DRBD is between the file system and the disk media, capturing all IO operations of the upper file system, and then invoking the IO module in the kernel to read and write the underlying disk media. When DRBD captures a write operation to the file system, it passes the IO to the remote host's network device via the local host's network device (NIC) at the same time as the local disk write operation, using the TCP/IP protocol. When the remote host's DRBD supervisor hears the IO information passed in, it immediately writes the data to the disk device maintained by the DRBD. At this point, the entire IO is done.
In fact, DRBD has three replication modes (or levels) to choose from when processing remote data writes, and different replication modes guarantee three reliability for remote data writes. Three levels of selection can be protocal through the common configuration section of DRBD. The different replication modes actually affect the actual meaning of an IO completion.
Because when we use DRBD, the identity of an IO completion (DRBD return IO completion) is both local and remote write that both concurrent processes return the completion identity. Let me explain in detail what these three replication modes mean:
Protocol A: This mode is the least reliable mode, and it is an asynchronous pattern. When we use this mode to configure, the process of writing remote data sends the data through the TCP/IP protocol into the TCP send buffer of the local host, which returns completion.
Protocol B: This mode is more reliable compared to Protocol A. Because the thread that writes to the remote waits for the network message to complete, that is, the data has been accepted by the remote DRBD and returned to completion.
Protocol c:protocol C Replication mode is a truly complete synchronous replication mode and will not return to completion until the remote DRBD writes the data to disk successfully.
In contrast to the above three replication modes, the C-mode ensures that the data at both ends can be 11-induced regardless of any anomalies. If you use B mode, you may lose some information that has not been fully written to the disk after the remote host suddenly loses power, and there is some inconsistency between the local and remote data. When we use A copy mode, there may be a greater risk. As long as the local network device suddenly fails to function (including a host power outage), it will lose data written to the remote host, resulting in inconsistent data.
The performance of Io writes is also directly determined by the timing of the remote write process required by different modes to return completion information. With the description of three modes, we can clearly know that IO write performance is inversely proportional to reliability. Therefore, when considering setting this parameter, readers ' friends need to evaluate the impact of each aspect carefully, and get as much as possible a performance requirement that satisfies the actual scene and satisfies the requirement of reliability.
The installation deployment of DRBD and the associated configuration instructions are described in the official MySQL documentation manual and in my Personal website blog (, and the most comprehensive description can be found on the official website of DRBD. So I'm not going to introduce you here. Below I mainly introduce some special features and limitations of DRBD, for your reference.

17.3.2 DRBD features and limitations

DRBD has been widely adopted, even by the official MySQL documentation manual as one of the most highly available, and highly reliable, high-availability solutions. Here are some of the more important features that DRBD has.
1, very rich configuration options, can be adapted to the situation in our application scenario. Whether it's a balance between reliability levels and performance, or data security, whether it's a local disk or a networked storage device, whether you want an exception to be resolved automatically, or if you want manual control, you can do so with a simple configuration file.
Of course, the rich configuration brings great flexibility, but also requires the user to have enough knowledge of it, otherwise it is difficult to decide how to configure in so many configuration parameters. Fortunately, the parameters of DRBD are actually satisfying most of the typical requirements by default, and do not require each of our parameters to be set to run;
2, for the data inconsistency between the nodes, DRBD can be re-synchronized through certain rules. It is also possible to configure the checksum of the data at a fixed point in time by using the relevant parameter configuration to determine whether the data is consistent and to mark inconsistent data. At the same time can also choose whether to solve the inconsistency of the problem or we manually decide how to synchronize;
3, in the course of operation if an exception causes one end to crash, and does not affect the other end of the normal operation. All data changes after an exception are recorded in the relevant log file. When the crash node recovers normally, the data that has changed over time can be automatically synchronized. In order to not affect the normal synchronization of new data, you can also set the speed of the recovery process to ensure that the network and other devices do not have performance problems;
4, a variety of file system types of support. In addition to supporting a variety of conventional file systems, DRBD can support distributed file systems such as GFS,OCFS2. Also, when using a distributed file system, you can implement all of the nodes while providing all IO operations.
5, to provide support for LVM. DRBD can either use the logical device provided by LVM, or it can become the physical device of LVM for the equipment provided by itself, which greatly facilitates the management of the storage device by the operation and maintenance personnel.
6, all IO operation, can absolutely guarantee the IO sequence. This is also a very important feature for databases, especially for database software that requires very demanding data consistency.
7, can support a variety of transport protocols, starting from the DRBD 8.2.7, DRBD began to support Ipv4,ipv6 and supersockets for data transmission.
8, Support Three-way copy. Starting with DRBD 8.3.0, DRBD can support replication between three nodes, somewhat similar to the characteristics of cascading replication.
Of course, while DRBD has a number of desirable features, there are some limitations, and here are some of the more important limitations that DRBD currently has:
1. For the use of regular file systems (non-Distributed file systems), DRBD can only support single primary mode. In single Primary mode, only one node of data is available for external IO service.
Dual Primary mode is supported only when using a distributed file system such as GFS or OCFS2.
2, Sblit Brain solution. Because of some special reasons, it is not too stable to deal with the Primary role after the two hosts have been disconnected after the DRBD connection is interrupted. Although the configuration file for DRBD can be configured to automatically resolve Split Brain, from my previous test situation, not every time the solution is very satisfying, in some cases, there may be a complete failure of a node possible.
3, the limit of the number of replication nodes, even the latest version of DRBD, can only support replication between three nodes at most.
These limitations are a few of the limitations that now seem likely to have a greater impact on users. There are, of course, many other limitations to DRBD, and before you decide to use it, you need to be tested enough to make sure it doesn't bother you after you get online.

17.4 Other highly available design options

In addition to the above several high-availability scenarios, there are many options available to you, such as RAIDB, shared storage solutions (SAN or NAS), and so on.
For Raidb, perhaps a lot of readers friends are also unfamiliar, it is all called redundant Arrays of inexpensive Databases. That is, the data of the database is managed by the Raid concept. So RAIDB also exists the concept of database sharding. As with disk raid, RAIDB also has multiple raid types, such as raidb-0,raidb-1,raidb-2,raidb-0-1 and raidb-1-0. The concept of RAIDB is leveraged in a database solution from the commercial MySQL database solution provider, continuent. You can get a clear picture of Raidb's various Raid modes through a few pictures.





Logo, you should be more clear raidb how the data in various RAID modes are distributed and how it works. As for why this is guaranteed to be high availability, it is the same as the disk through Raid to ensure high data reliability. Of course, to really understand, the premise is that we already have a clear Raid concept.
And for shared storage solutions, it's a relatively expensive solution. The host running MySQL Server does not hold data, except that the disk space above the remote Storage device is used as a local disk by the sharing protocol (FC or NFS).
Why are shared storage solutions capable of meeting high reliability requirements? First, the data does not exist on the MySQL server local host, so when there is any failure of the local MySQL server host, there will be no loss of data, it is completely possible to retrieve the data on the storage device through other hosts. Second, regardless of whether it is shared storage via SAN or NAS, the storage itself has a variety of highly available solutions that can be done without losing any data at all. Even though the connection channel between MySQL Server and shared storage, there are many mature high-availability solutions to ensure high availability of the connection. Because the shared storage solution itself violates my personal advocacy of building inexpensive, enterprise-class high-reliability solutions through MySQL, and taking into account the length of the issue, this is not covered in detail here.
If your readers are interested in this, you can learn more about SAN storage and NAS-related storage through other books.

17.5 comparison of the pros and cons of various high-availability scenarios

From the introduction of various high-availability designs, readers may have discovered that, regardless of the program, there is a unique advantage, but there are some limitations. In fact, this is also very normal, after all, nothing is perfect, we can only make full use of their strengths to solve their own problems, rather than hope to rely on a solution to solve all problems once and for all. This section will do a pros and cons analysis of the above major scenarios, for everyone to choose the process of reference.
1. MySQL Replication
Advantages: Simple deployment, ease of implementation, maintenance is not complex, is the nature of MySQL support features. It is easy to switch between the main and standby machines, and the primary and standby switch can be done automatically through the third-party software or by writing a simple script yourself.
Disadvantage: If the master host hardware fails and cannot be recovered, it may result in some data loss that is not transmitted to the Slave end;
2. MySQL Cluster
Advantage: Very high availability, very good performance. Each minute data has at least one copy on top of different hosts, and redundant copies of the data are synchronized in real time.
Disadvantage: Maintenance is more complex, the product is relatively new, there are some bugs, is not necessarily applicable to the core of the online system.
3. DRBD Disk Network mirroring scheme
Advantage: Software is powerful, data is mirrored across physical hosts at the underlying fast device level, and different levels of synchronization can be configured according to performance and reliability requirements. IO operations are kept in order to meet the database's demanding data consistency.
Disadvantage: Non-Distributed File system environments cannot support mirrored data at the same time, performance and reliability are contradictory and cannot be applied to environments where performance and reliability requirements are more demanding. Maintenance costs are higher than MySQL Replication.

17.6 Summary

This chapter focuses on two highly available solutions for MySQL itself, as well as a detailed description of MySQL's official recommended DRBD solution, and includes a comparison of the pros and cons of each solution. It is hoped that this information will bring some help to the readers. However, MySQL's high-availability solution is far more than the centralized solution described above, and there are a number of other options available for research and exploration by your fellow readers. The power of open source is huge, and the power of open source contributors is endless.

Excerpt from: "MySQL performance tuning and architecture design" Jane Chaoyang

Reprint please specify the source:


MySQL performance tuning and architecture design-the 17th chapter on the ideas and solutions of high-availability design

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.