MySQL Horizontal Split (Reading Notes)

Last Update:2018-06-05 Source: Internet

Author: User

Tags table definition

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Once the data is split and stored in multiple MySQLServer, no matter how perfect our splitting rules are (in fact there is no perfect splitting rule ),

Once the data is split and stored in multiple MySQL servers, no matter how perfect our splitting rules are (in fact there is no perfect splitting rule ),

1. Introduction to horizontal split

In general, a simple horizontal split mainly refers to distributing a table with extremely ordinary access to multiple tables according to certain rules of a field. Each table contains part of the data.

To put it simply, we can split the data horizontally to split some rows in the table into a database, some other rows are split into other databases. Of course, in order to be able to easily determine which database each row of data is split into, segmentation always requires certain rules.
For example, the modulo operation is based on a specific number of numeric fields, the range of a certain time type field, or the hash value of a certain character type field. If most of the core tables in the system can be associated using a certain field, this field is naturally the top choice for horizontal partitioning. Of course, if it is very special and cannot be used, you can only select another one.

2. Advantages and Disadvantages of Horizontal Split

Advantages of horizontal split:
◆ Table association can be completed at the database end;
◆ There will be no bottlenecks in tables with large data volumes and high loads;
◆ The overall architecture of the application is relatively small;
◆ Transaction processing is relatively simple;
◆ As long as the splitting rules can be defined, it is basically difficult to meet scalability restrictions;

Disadvantages of horizontal Splitting:
◆ Splitting rules are more complex and it is difficult to abstract a segmentation rule that can satisfy the entire database;
◆ Data maintenance becomes more difficult in the future, making it more difficult to manually locate data;
◆ The modules of the application system have a high degree of coupling, which may cause some difficulties for subsequent data migration and splitting.

3. sharding Rule 3.1 Horizontal sharding Based on Modulus

Generally, Internet companies, especially e-commerce and game businesses, which are currently very popular on the Internet, are able to relate most of their data through member user information, many core tables may be suitable for horizontal data sharding by member IDs. For example, the Forum community discussion system makes it easier to split data horizontally by Forum number.
After splitting, there is basically no interaction between databases.

Therefore, for our example database, most tables can be horizontally split based on the user ID.
Data related to different users is split and stored in different databases. For example, all user IDs are stored in two different databases through 5 modulo. Each table associated with the user ID can be split in this way. In this way, the data related to each user is basically in the same database. Even if it needs to be associated, it can be very simple
Connected.

3.2 horizontal segmentation by region

For example, China is divided into 10 major zones, Jiangsu, Zhejiang, and Shanghai are regarded as one brother, and qilu is regarded as one. The two lakes are regarded as one, the central plains are counted as one, the southwest is counted as one, the Inner Mongolia is, and the Northeast is counted as one, one in the northwest, one in the north, and one in the southeast.

Servers in North China, southeast China, Jiangsu, Zhejiang, and Shanghai regions with relatively large business volumes can allocate a large amount of server resources, such as cpu, io, and network resources, with relatively high-end configurations.

Servers in northwest China, Qilu, lake, and Northeast China with normal business volume can allocate medium and high-end server resources.
If the business volume is relatively small, servers in southwest China, Inner Mongolia, and central plains can be slightly more general servers.

Of course, the division of these resources cannot be made explicit to the outside, so we can consider it well when doing internal planning, so that we will not be criticized for being biased and not paying attention to it.

PS: This division is not qualitative. You can upgrade the resources of good business areas at any time according to the business.

As shown in:

4. integrated view of horizontal split and Application

5. Problems After horizontal split

Before implementing the data splitting scheme, we still need to analyze some possible problems. Generally,
We may encounter the following problems:
◆ Introduce distributed transactions;
◆ Cross-node Join;
◆ Cross-node merge and sorting paging;

5.1 Introduction of distributed transactions

Once the data is split and stored in multiple MySQL servers, no matter how perfect our splitting rules are (in fact there is no perfect splitting rule ), it is possible that the data involved in some previous transactions is no longer in the same MySQL Server.
In such a scenario, if our applications still follow the old solution, distributed transactions are required. In MySQL versions, only versions after MySQL 5.0 support distributed transactions. Currently, only Innodb supports distributed transactions. In addition, even if we use the MySQL version that supports distributed transactions and the Innodb Storage engine, distributed transactions consume a lot of system resources, the performance itself is not too high. In addition, the introduction of distributed transactions will bring more difficult factors for exception handling.
What should I do? In fact, we can solve this problem through a work und. The first thing to consider is: is the database the only one that can solve the transaction? In fact, this is not the case. We can work together with databases and applications. Each database solves the transactions on its own, and then controls the transactions on multiple databases through applications.
That is to say, as long as we are willing, we can split a distributed transaction across multiple databases into multiple small transactions that are only on a single database, and control each small transaction through the application. Of course, the requirement for this is that our Russian applications must be robust enough, and of course it will bring some technical difficulties to the applications.

5.2 cross-node Join

The preceding section describes the possible introduction of distributed transactions. Now let's look at the cross-node Join problem. After data splitting, some old Join statements may fail to be used, because the data source used for Join may be split into multiple MySQL servers.
What should I do? From the MySQL database perspective, if you have to solve this problem directly on the Database End, I am afraid it can only be solved through a special storage engine Federated of MySQL. The Federated storage engine is a solution for MySQL to solve problems similar to Oracle's DB Link. The main difference with OracleDB Link is that Federated stores a definition of the remote table structure locally. At first glance, Federated is indeed a very good solution for cross-node Join. However, we should also be clear that if the remote table structure is changed, the local table definition information will not change accordingly. If the local Federated table definition information is not updated when the remote table structure is updated, the Query operation may fail and the correct results may not be obtained.
To solve such problems, we recommend that you use an application to obtain the corresponding driver result set in the MySQL Server where the driver table is located, then, the data is retrieved from the MySQL Server where the driver table is located based on the driver result set. Many readers may think that this will have a certain impact on the performance. Yes, it does have a certain negative impact on the performance, but in addition to this method, basically, there are not many other better solutions. In addition, since the database has been well expanded, the load of each MySQL Server can be well controlled, simply for a single Query, the response time may be higher than before splitting, so the negative impact of performance is not too great. What's more, there are not many requirements for cross-node Join, which may be a small part of the overall performance. Therefore, for the sake of overall performance, it is worthwhile to sacrifice a little bit occasionally. After all, system optimization itself is a process of many trade-offs and balances.

5.3 cross-node Merge Sorting Paging

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More