MySQL Database slicing

Source: Internet
Author: User
Tags database join joins mysql version table definition


First, the vertical segmentation of data
Concept: Vertical segmentation of data can also be referred to as vertical segmentation. Spread the different tables across different database hosts.
An application system, the overall function is certainly composed of many functional modules, and each function module needs to correspond to the data in the database is one or more tables.
In the architecture design, the more unified the interaction point of each function module, the less the coupling degree of the system, the better the maintainability and expansibility of each module.
Such a system, it is easier to achieve vertical segmentation of data.

1) The data can be segmented according to the function module, the data of different function modules are stored in different database hosts, so it is easy to avoid cross-database join existence, and the system architecture is very clear.
2) It is very difficult for the system to be able to do all the function modules used by the table completely independent, do not need to access the other side of the table or a table of two modules to join operation. Evaluate tradeoffs based on actual application scenarios.
1. Indulge the application by storing the table associated with the join in the same database as soon as necessary.
2. Let the application do more things, the program completely through the module interface to obtain data from different databases, and then complete the join operation in the program.

After such vertical slicing, the service can be divided into four databases to provide service, and the service ability is increased several times.

Advantages of vertical slicing

The splitting of the database is simple and clear, and the splitting rule is explicit;

The application module is clear and easy to integrate.

Data maintenance convenient and easy to locate;

Disadvantages of vertical slicing

Some table associations cannot be completed at the database level and need to be completed in the program;

There is still a quiet performance for tables with extremely frequent access and large data volumes, and may not necessarily meet the requirements;

Transaction processing is relatively more complex;

When the segmentation reaches a certain extent, the extensibility will be limited;

Read-through segmentation can lead to complex system transitions and difficult maintenance.

Notice: In the database of vertical segmentation, how to slice, to what extent, is a comparative test of human problems. Only by balancing the costs and benefits of each aspect in a real-world scenario can you analyze a split plan that really suits you.
There may be a slight increase in the single response time for some operations, but the overall performance of the system is likely to be somewhat improved.

Second, the level of data segmentation
Concept: Horizontal slicing is the main way to spread an extremely mundane table into multiple tables, with a subset of the data in each table, followed by some rule of a field.
is to slice through rows of data, to slice some of the rows in a table into one database, and some other rows to be sliced into other databases.

1) Slicing according to a specific rule.
1. A numeric Type field is based on a specific number of modulo
2. Scope of a Time Type field
3. The hash value of a character type field.
2) so that the relevant data to be stored in the same database as far as possible, even if the need to join query, it can be very simple association, at the database level can be linked query.

Advantages of horizontal slicing

Table Association is basically able to complete on the database side;

There is no problem that some very large data volumes and high-load tables encounter bottlenecks;

Application-side Overall schema changes are relatively small;

Transaction processing is relatively simple;

As long as the segmentation rules can be defined, it is basically more difficult to meet the extensibility limit;

Disadvantages of horizontal slicing

The segmentation rules are relatively more complex, and it is difficult to abstract a segmentation rule that satisfies the whole database.

Later data maintenance difficulty has increased, manual positioning of data is more difficult;

The coupling degree of each module in the application system is high, which may cause some difficulties in the migration and splitting of the later data.


Three, vertical and horizontal segmentation of the joint use
The load on each application is growing step after time, and most architects and DBAs choose to start with a vertical split of the data at the beginning of a performance bottleneck, because the cost is first and foremost in line with the maximum input-output ratio pursued during this period. However, with the continuous expansion of the business, the system load continues to grow, after a period of stable system, after the vertical split of the database cluster may again overwhelmed, encountered a performance bottleneck. At this point, you need to slice horizontally.

In fact, in many large-scale application systems, the two data segmentation methods, vertical slicing and horizontal cutting, are basically co-existent, and constantly alternating, in order to continuously increase the system's ability to expand. When dealing with different scenarios, we also need to take into account the respective limitations of these two methods, as well as their respective advantages, and use different combinations at different times (load pressures).

Benefits of Joint Segmentation

Can make full use of vertical segmentation and horizontal segmentation of their respective advantages to avoid their own shortcomings;

Maximize system scalability;

Disadvantages of Joint segmentation

The database system architecture is more complex and more difficult to maintain.

The application architecture is also relatively more complex;


Iv. Data segmentation and integration solutions

After the data in the database is split vertically and/or horizontally and stored in different database hosts, the biggest problem facing the application system is how to make these data sources better integrated.

In general, there are two ways to solve this problem:

1. In each application module to configure the management of their own needs of one (or more) data sources, direct access to each database, within the module to complete the integration of data;

2. Through the intermediary agent layer to manage all the data sources uniformly, the backend database cluster is transparent to the front-end application;

Perhaps more than 90% of people in the face of the above two ways of thinking will be inclined to choose the second, especially when the system is constantly becoming large and complex. Indeed, this is a very correct choice, although the cost in the short term may be relatively larger, but it is very helpful for the overall system scalability.

V. Possible problems of data segmentation and integration

There are some possible problems that we need to do before we can implement a data-splitting scheme. In general, the problems we may encounter are mainly as follows:

The problem of introducing distributed transaction;

Cross-node join problem;

Cross-node merge sorting paging problem;


1. Problems with the introduction of distributed transactions

Once the data is sliced and stored in multiple mysqlserver, no matter how perfect our segmentation rules are (in fact, there is no perfect segmentation rule), it is possible that the data involved in some of the previous transactions is no longer in the same mysqlserver.

In such a scenario, if our application is still in accordance with the old solution, then it is bound to need to introduce a distributed transaction to solve. In each version of MySQL, the distributed transaction is supported only by the versions since MySQL5.0 began, and only INNODB provides distributed transaction support. Not only that, even if we are using a MySQL version that supports distributed transactions, but also a InnoDB storage engine, the distributed transaction itself consumes a lot of system resources, and the performance itself is not too high. And the introduction of distributed transaction itself in the exception processing will bring more difficult to control factors.

What to do? In fact, we can use a workaround to solve this problem, the first thing to consider is: whether the database is the only place to solve the transaction? This is not the case, we can completely combine the database and the application both to solve together. Each database resolves its own transactions, and then uses the application to control transactions on multiple databases.

That is, as long as we are willing, a distributed transaction spanning multiple databases can be split into multiple small transactions that are only on a single database, and the application is used to control small transactions. Of course, the requirement for this is that our application must be robust enough and, of course, bring some technical difficulties to the application.


2. Cross-node Join issues

The above describes the possible introduction of distributed transactions, and now let's look at the problem of cross-node join. Data segmentation may cause some old join statements to be unusable because the data source used by join may be sliced into multiple mysqlserver.

What to do? This problem from the MySQL database point of view, if it has to be directly resolved on the database side, I am afraid only through the MySQL a special storage engine federated to solve. The Federated storage Engine is a solution for MySQL to address issues such as Oracle-like Dblink. The main difference with Oracledblink is that federated will save a copy of the definition of the remote table structure locally. At first glance, Federated is really a great solution for cross-node joins. But we should also be clear, it seems that if the remote table structure is changed, the local table definition information will not follow the corresponding changes. If you update the remote table structure without updating the local federated table definition information, it is likely to cause query to run out of error and not get the correct results.

To deal with this kind of problem, I still recommend using the application to process, first in the driver table in the MySQLServer to take out the corresponding drive result set, and then based on the driving result set to the driver table is located in the MySQLServer to fetch the corresponding data. Many readers may think that this will have a certain impact on performance, yes, it does have a certain negative impact on performance, but in addition to this method, there are basically not many other better solutions. Moreover, because the database through the good expansion, each mysqlserver load can get better control, purely for a single query, its response time may be higher than the non-segmentation, so the performance of the negative impact is not too large. What's more, similar to the need for cross-node joins is not too much, and probably only a small part of the overall performance. So for the overall performance of the consideration, the occasional sacrifice a little bit, in fact, is worth it, after all, the system optimization itself is a lot of trade-offs and balance process.

3. Cross-node merge sort paging problem

Once the data has been sliced horizontally, it is possible that not only the cross-node join does not work, but the data source for some sort of paged query statement may also be sliced to multiple nodes, and the immediate consequence is that these sort paged queries cannot continue to function properly. In fact, this and cross-node join is a reason, the data source exists on multiple nodes, to be resolved through a query, and cross-node join is the same operation. The same federated can be partially solved, of course, the same risks exist.

Or the same problem, how to do? I also continue to recommend that the application be addressed.

How to solve? The solution is broadly similar to that of a cross-node join, but there is a point that joins the cross-node join, which often has a driver-driven relationship, so there is generally a sequential relationship between the data reads of multiple tables involved in the join itself. But sort of paging is not the same, the sort of paging data source can basically be said to be a table (or a result set), itself does not exist a sequential relationship, so the process of fetching data from multiple data sources can be completely parallel. In this way, the fetch efficiency of the sorted paging data can be higher than the cross-Library join, so the performance loss is relatively smaller and in some cases more efficient than the original data-splitting database. Of course, whether it's cross-node join or sorting across nodes, our application server consumes more resources, especially memory resources, because the process of reading access and merging result sets requires more data than was originally processed.

MySQL Database slicing

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.