MySQL performance tuning and Architecture Design-Chapter 1 Data splitting of scalability design and mysql Optimization

Source: Internet
Author: User
Tags database join

MySQL performance tuning and Architecture Design-Chapter 1 Data splitting of scalability design and mysql Optimization

Chapter 1 Data splitting for scalable design


The expansion implemented through the MySQL Replication function is always limited by the size of the database. Once the database is too large, especially when writing too frequently, it is difficult to support it by a host, we are still faced with expansion bottlenecks. At this time, we must seek other technical means to solve this bottleneck, that is, we will introduce the evil data segmentation technology in this chapter.

14.1 what is data segmentation

Many readers may have seen articles about data splitting on the Internet or in magazines for many times, but some articles refer to them as data Sharding. In fact, whether it is data Sharding or data Sharding, the concept is the same. Simply put, it refers to distributing data stored in the same database to multiple databases (hosts) through a specific condition to distribute the load of a single device. Data splitting can also improve the overall availability of the system, because after a single device Crash, only a part of the overall data is unavailable, rather than all the data.
Sharding can be divided into two Sharding modes based on the Sharding rule type. One is to split the data to different databases (hosts) based on different tables (or Schemas). Such splitting can be called vertical data splitting; the other is to split the data in the same table into multiple databases (hosts) based on the Logical Relationship of the data in the table. This split is called the data level (horizontal) split.
Vertical splitting is a system with simple rules and more convenient implementation. It is especially suitable for systems with low coupling, low mutual impact, and clear business logic. In such a system, it is easy to split the tables used by different business modules into different databases. Splitting Based on different tables has less impact on applications, and the splitting rules are simple and clear.
Compared with Vertical Split, horizontal split is relatively more complex. Because different data in the same table needs to be split into different databases, it is more complicated for applications to split the data according to the table name, later data maintenance will be more complex.
When the data volume and traffic volume of a table (or some tables) are very large, it cannot meet the performance requirements after being placed on an independent device through vertical splitting, at this time, we must combine Vertical Split and horizontal split, Vertical Split first, and then horizontal split to solve the performance problem of this super large table.
Next we will analyze the architecture Implementation of the vertical, horizontal, and combined data splitting methods and the data integration after splitting.

14.2 Vertical Split of data

Let's take a look at how the Vertical Split of data is split. Vertical Data splitting can also be called vertical data splitting. Think of a database as a "data block" (table) composed of many large blocks. We vertically cut these "data blocks, then they are distributed across multiple database hosts. This split method is a vertical data split.
The overall function of an application system with a well-designed architecture must be composed of many functional modules, the data required by each function module corresponds to one or more tables in the database. In architecture design, the less interaction points between functional modules, the lower the Coupling Degree of the system, and the better the maintainability and scalability of each module. Such a system makes it easier to achieve vertical data splitting.
The clearer the functional modules and the lower the coupling degree, the easier it is to define rules for Vertical Data splitting. Data can be split based on functional modules. Data of different functional modules is stored in different database hosts, which can easily avoid cross-database Join, the system architecture is also very clear.

Of course, it is difficult for a system to ensure that the tables used by all functional modules are completely independent, and there is no need to access the other table or the tables of the two modules need to be joined. In this case, we must evaluate and weigh based on actual application scenarios. The decision is to move applications to store all the tables to be joined in the same database, or let the applications do more, that is, the program completely acquires data from different databases through the module interface, and then completes the Join operation in the program.
Generally, if a system with relatively low load and frequent table associations, database concessions may be made, it is a feasible solution to combine several related modules to reduce the workload of applications.
Of course, the concession of the database allows multiple modules to share data sources in a centralized manner. In fact, the introduction acquiesce in the development of the Coupling Degree of each module architecture, which may make the future architecture worse and worse. Especially when the Development reaches a certain stage, it is found that the database is unable to bear the pressure of these tables and has to face further splitting, the cost of architecture transformation may be much higher than at the beginning.
Therefore, how to split and to what extent the database performs vertical segmentation is a challenge to test people. Only by balancing costs and benefits in practical application scenarios can we analyze a splitting solution that is truly suitable for you.
For example, in the example database of the example system used in this book, we will analyze it briefly and then design a simple splitting rule to perform a Vertical Split.
The system functions can be divided into four functional modules: user, group message, album, and event, which correspond to the following tables respectively:

1. user Module table: user, user_profile, user_group, user_photo_album
2. group discussion table: groups, group_message, group_message_content, top_message
3. album related tables: photo, photo_album, photo_album_relation, photo_comment
4. event info table: event

At first glance, no module exists independently from other modules, and there is a relationship between the module and the module, so it cannot be split?
Of course not. Let's further analyze it and find that although the tables used by each module are correlated, the Association is clear and simple.
◆ The group discussion module and the user module are mainly associated through the user or group relationship. Generally, the Association is performed by the user id, nick_name, and group id. Implementing the association through interfaces between modules will not cause too much trouble;

◆ The album module is only associated with the user module. The association between the two modules is basically the content associated with the user id, simple and clear, and the interface is clear;

◆ The event module and each module may be associated, but only focus on the ID information of objects in each module, which can also be easily split.

Therefore, the first step is to vertically split the database according to the tables related to the function module. The forms involved in each module are unique in a database, the table associations between modules are all handled by an excuse on the application system side.

As shown in:

After such vertical splitting, services that can only be provided by one database are split into four databases to provide services. The service capability is naturally increased several times.
Advantages of vertical splitting
◆ Database splitting is simple and clear, and sharding rules are clear;
◆ The application module is clear and clear, and integration is easy;
◆ Data maintenance is convenient and easy to locate;
Disadvantages of vertical splitting
◆ Some table associations cannot be completed at the database level and must be completed in the program;
◆ Tables with extremely frequent access and large data volumes still have stable performance and may not meet the requirements;
◆ Transaction processing is more complex;
◆ When the splitting reaches a certain degree, the scalability will be limited;
◆ Excessive splitting may result in excessive system complexity and difficulty in maintenance.
For vertical splitting, data splitting and transaction problems may occur. It is difficult to find a good solution at the database level. In actual application cases, the vertical split of databases mostly corresponds to the modules of the application system. The data sources of the same module are stored in the same database, which can solve the internal data association problem of the module. Between modules, applications provide the required data in the form of service interfaces. Although this will indeed increase the total number of database operations, it is beneficial in terms of overall system scalability and modular architecture.

The response time for some operations may increase slightly, but the overall performance of the system may be improved. The bottleneck of expansion can only be solved by the Data horizontal splitting architecture described in the next section.

14.3 horizontal split of data

The above section analyzes the Vertical Split of data, and then analyzes the horizontal split of data. Vertical Data splitting can be simply understood as splitting data by table by module, while horizontal data splitting is no longer based on tables or functional modules. In general, a simple horizontal split mainly refers to distributing a table with extremely ordinary access to multiple tables according to certain rules of a field. Each table contains part of the data.
To put it simply, we can split the data horizontally to split some rows in the table into a database, some other rows are split into other databases. Of course, in order to be able to easily determine which database each row of data is split into, segmentation always requires certain rules.

For example, the modulo operation is based on a specific number of numeric fields, the range of a certain time type field, or the hash value of a certain character type field. If most of the core tables in the system can be associated using a certain field, this field is naturally the top choice for horizontal partitioning. Of course, if it is very special and cannot be used, you can only select another one.
Generally, for websites of the Web2.0 type that are currently popular on the Internet, most of the data can be associated with member user information, many core tables may be suitable for horizontal data sharding by member IDs. For example, the Forum community discussion system makes it easier to split data horizontally by Forum number.

After splitting, there is basically no interaction between databases.
For example, in our example system, all the data is associated with the user, so we can perform horizontal split based on the user and split the data of different users into different databases. Of course, the only difference is that the groups table in the User Module has no direct relationship with the user, so the groups cannot be split horizontally based on the user. In this special case, tables can be independently stored in an independent database. In fact, this practice can be said to use the "vertical data segmentation" method described in the previous section, in the next section, I will introduce the joint Splitting Method Used for vertical splitting and horizontal splitting in more detail.
Therefore, for our example database, most tables can be horizontally split based on the user ID. Data related to different users is split and stored in different databases. For example, all user IDs are stored in two different databases through 2 modulo. Each table associated with the user ID can be split in this way. In this way, the data related to each user is basically in the same database. Even if it is necessary to associate, it can be very simple.
We can more intuitively display information about horizontal Splitting:

Advantages of horizontal splitting
◆ Table association can be completed at the database end;
◆ There will be no bottlenecks in tables with large data volumes and high loads;
◆ The overall architecture of the application is relatively small;
◆ Transaction processing is relatively simple;
◆ As long as the splitting rules can be defined, it is basically difficult to meet scalability restrictions;
Disadvantages of horizontal splitting
◆ Splitting rules are more complex and it is difficult to abstract a segmentation rule that can satisfy the entire database;
◆ Data maintenance becomes more difficult in the future, making it more difficult to manually locate data;
◆ The modules of the application system have a high degree of coupling, which may cause some difficulties for subsequent data migration and splitting.

14.4 use of vertical and horizontal joint splitting
In the preceding two sections, we learned the implementation of the vertical and horizontal splitting methods and the architecture information after splitting, the advantages and disadvantages of the two architectures are also analyzed. However, in actual application scenarios, in addition to the load that is not too large and the business logic is relatively simple, the system can solve the scalability problem through one of the above two splitting methods, i'm afraid most of the other business logic is a little more complicated. systems with a higher system load cannot achieve better scalability through any of the above data splitting methods, the preceding two splitting methods must be used in combination. Different scenarios use different splitting methods.

In this section, I will combine the advantages and disadvantages of vertical splitting and horizontal splitting to further improve our overall architecture and system scalability.
In general, it is difficult for all tables in our database to be joined by one (or a few) field. Therefore, it is difficult to simply solve all the problems simply by horizontal splitting of data. Vertical splitting can only solve some problems. For systems with very high loads, even if only a single table is unable to bear its load through a single database host. We must combine the vertical and horizontal splitting methods to take full advantage of the advantages of the two methods to avoid their shortcomings.
The load of each application system increases step by step. At the beginning of the performance bottleneck, most architects and DBAs will choose to vertically split data first, because the cost is the first, it is the most consistent with the maximum input-output ratio pursued in this period. However, as the business continues to expand and the system load continues to grow, after a period of system stability, the database cluster after vertical split may be overwhelmed again, encountered a performance bottleneck.
How should we choose this time? Are we going to further segment the modules, or are we looking for other solutions? If we continue to segment modules and vertically split data as we did at the beginning, we may face the same problems in the near future. With the continuous refinement of modules, the architecture of application systems will become more and more complex, and the entire system may be out of control.
At this time, we must solve the problems encountered here through the advantages of horizontal data splitting. In addition, we do not need to use the vertical data splitting results when using data horizontal splitting. Instead, we need to use the advantages of horizontal splitting to avoid the disadvantages of vertical splitting, solve the problem of increasing complexity of the system. The disadvantages of horizontal splitting (difficult to unify rules) have been solved by the previous vertical splitting, so that horizontal splitting can be easily implemented.
For our example database, we suppose we split the data vertically at the beginning. However, as the business grows, the database system encountered a bottleneck. We chose to reconstruct the database cluster architecture. How to refactor? Considering that the vertical data splitting has been completed before, and the module structure is clear and clear. However, the business growth momentum is becoming increasingly fierce. Even if the module is further split, it will not last long. We chose to perform horizontal Splitting Based on vertical splitting.
Each database cluster that has undergone Vertical Split has only one function module, and all the tables in each function module are basically associated with a field. For example, all user modules can be split by user ID, while all group discussion modules can be split by group ID, and the album module can be split by album ID, the last event notification information table considers the time limit of data (only the information of the last event segment is accessed), and considers time-based splitting.
Demonstrate the entire split architecture:

In fact, in many large application systems, the vertical and horizontal data Splitting Methods basically coexist, and are constantly alternating, to continuously increase the scalability of the system. When dealing with different application scenarios, we also need to fully consider the limitations and advantages of the two splitting methods, in different periods (load pressure) use different combinations.

Advantages of joint splitting
◆ The advantages of vertical splitting and horizontal splitting can be fully utilized to avoid their respective defects;
◆ Maximize system scalability;

Disadvantages of joint splitting
◆ Complex database system architecture and more difficult maintenance;
◆ The application architecture is also relatively more complex;

14.5 data splitting and integration solutions

Through the previous sections, we have learned that data splitting in databases can greatly improve system scalability. However, after the data in the database is divided vertically and (or) horizontally and stored on different database hosts, the biggest problem facing the application system is how to integrate these data sources, this may also be a concern of many readers. In this section, we mainly focus on analyzing various solutions that can help us achieve data splitting and data integration.
Data integration is hard to rely on the database itself to achieve this effect. Although MySQL has a Federated storage engine that can solve some similar problems, it is difficult to use it in practical application scenarios. How can we integrate these data sources scattered on various MySQL hosts?
In general, there are two solutions:

1. configure and manage one or more data sources required by each application module, directly access each database, and complete data integration in the module;
2. Use the intermediate proxy layer to manage all data sources. The backend database cluster is transparent to the front-end applications;

Maybe more than 90% of people tend to choose the second solution when facing the above two solutions, especially when the system keeps getting bigger and complicated. Indeed, this is a very correct choice. Although the cost in the short term may be relatively higher, it is very helpful for the scalability of the entire system.
Therefore, I will not prepare too many analyses for the first solution. Next I will focus on some solutions in the second solution.

★Self-developed intermediate proxy Layer
After deciding to use the intermediate proxy layer of the database to solve the architecture direction of data source integration, many companies (or enterprises) select the proxy layer application that meets the specific scenarios of the application by self-development.
Through self-development, the intermediate proxy layer can respond to the specific application to the maximum extent, customize many personalized requirements to the maximum extent, and flexibly respond to changes. This should be said to be the biggest advantage of self-developed proxy layer.

Of course, when you choose to develop on your own and enjoy the pleasure of maximizing personalized customization, you naturally need to invest more costs in the preliminary R & D and continuous upgrade and improvement work in the future, in addition, the technical threshold may be higher than that of simple Web applications. Therefore, it is better to make a comprehensive assessment before deciding on your own development.
Because self-developed systems are more often used to better adapt to their own application systems and cope with their own business scenarios, it is difficult to analyze too much here. Next we will mainly analyze several popular data source integration solutions.

★Use MySQL Proxy for data splitting and integration
MySQL Proxy is a database Proxy layer product officially provided by MySQL. Like MySQL Server, it is also an open source product based on the GPL open source protocol. It can be used to monitor, analyze, or transmit communication information between them. Its flexibility allows you to use it to the maximum extent. Currently, its main functions include connection routing, Query analysis, Query filtering and modification, Server Load balancer, and basic HA mechanisms.
In fact, MySQL Proxy does not have all of the above functions, but provides the basis for implementing the above functions. To implement these functions, we also need to compile the LUA script.
MySQL Proxy actually establishes a connection pool between the client request and the MySQL Server. All client requests are sent to the MySQL Proxy and then analyzed through the MySQL Proxy to determine whether the operation is read or write and distributed to the corresponding MySQL Server. Slave clusters with multiple nodes can also achieve load balancing. The basic architecture of MySQL Proxy is as follows:

Through the architecture diagram above, we can clearly see the position of MySQL Proxy in practical application and the basic things we can do. More detailed implementation rules for MySQL Proxy are described and examples in the MySQL official documentation. Interested readers can download them from the MySQL official website for free or read them online, I will not waste paper here.

★Use Amoeba for data splitting and integration
Amoeba is developed based on Java and focuses on the open-source framework for integrating distributed database data sources into the Proxy program, based on the GPL3 open-source protocol. Currently, Amoeba supports Query routing, Query filtering, read/write splitting, Server Load balancer, and HA mechanisms.

Amoeba mainly solves the following problems:
1. Integrate complex data sources after data splitting;
2. Provide data splitting rules and reduce the impact of data splitting rules on the database;
3. Reduce the number of connections between the database and the client;
4. read/write splitting route;
We can see that what Amoeba does is exactly what we need to improve database scalability through data splitting.

Amoeba is not a Proxy program at the Proxy layer, but a development framework For the Proxy program at the Proxy layer of the development database. Currently, the Proxy programs developed based on Amoeba include Amoeba For MySQL and Amoeba For Aladin.
Amoeba For MySQL is mainly a solution For MySQL databases. The Protocol requested by the front-end application and the data source database connected to the backend must be MySQL. For any client application, Amoeba For MySQL is no different from a MySQL database. Any client requests using the MySQL protocol can be parsed and processed by Amoeba For MySQL. The following figure shows the Architecture Information of Amoeba For MySQL (from the Amoeba developer blog ):

Amoeba For Aladin is a more widely used and powerful Proxy program. It can connect to data sources of different databases at the same time to provide services for front-end applications, but only accept client application requests that comply with the MySQL protocol. That is to say, as long as the front-end application is connected through the MySQL protocol, Amoeba For Aladin will automatically analyze the Query statement, automatically identifies the type of database on which the queried data source is located based on the data requested in the Query statement. Demonstrate the architecture details of Amoeba For Aladin (from the Amoeba developer blog ):

At first glance, the two seem to be exactly the same. After looking at the details, we will find that the main difference between the two is that after processing through the MySQL Protocal Adapter, we can determine the data source database based on the analysis results, and then select a specific JDBC driver and the corresponding protocol to connect to the backend database. In fact, you may have discovered the characteristics of Amoeba through the two architecture diagrams above. He is only a development framework, in addition to providing the For MySQL and For Aladin products, we can also conduct secondary development based on our own needs, get a Proxy program better suited to our own application features.
For MySQL databases, both Amoeba For MySQL and Amoeba For Aladin can be used well. Of course, considering the complexity of any system, the performance of the system will certainly suffer a certain loss, and the maintenance cost will naturally be relatively higher. Therefore, we recommend that you use Amoeba For MySQL when you only need to use the MySQL database.
Amoeba For MySQL is very simple to use. All configuration files are standard XML files, and there are four configuration files in total. They are:

◆ Amoeba. xml: The main configuration file that configures all data sources and parameter settings of Amoeba;
◆ Rule. xml: configure information about all Query routing rules;
◆ FunctionMap. xml: configure the Java Implementation class for parsing the functions in the Query;
◆ RullFunctionMap. xml: configure the implementation class of the specific function to be used in the routing rule;
If your rules are not too complex, you only need to use the first two of the above four configuration files to complete all the work. Common functions of Proxy programs, such as read/write splitting and Server Load balancer, are configured in amoeba. xml.
In addition, Amoeba supports automatic routing for vertical and horizontal data splitting. You can set routing rules in rule. xml.
At present, Amoeba is rarely lacking mainly in its online management function and support for transactions. It has put forward related suggestions during the communication with related developers, we hope to provide a command line management tool for online maintenance and management to facilitate online maintenance and use. The feedback is that the management module has been included in the development agenda. In addition, Amoeba does not support transactions for the time being. Even if the request submitted by the client application to Amoeba contains transaction information, Amoeba ignores transaction information. Of course, after continuous improvement, I believe that transaction support must be the added feature that Amoeba focuses on.
For more detailed use of Amoeba readers can get through the user manual provided above Amoeba developer blog (, here will not be detailed.

★Use HiveDB for data splitting and integration
Like MySQL Proxy and Amoeba, HiveDB is also an open-source framework that provides data splitting and integration for MySQL databases based on Java. Currently, HiveDB only supports horizontal data splitting.

It mainly solves the scalability of databases with large data volumes and high-performance access to data. It also supports data redundancy and basic HA mechanisms.
The implementation mechanism of HiveDB is different from that of MySQL Proxy and Amoeba. Instead of using the Replication function of MySQL to achieve data redundancy, HiveDB implements its own data redundancy mechanism, at the underlying layer, data splitting is implemented based on Hibernate Shards.

In HiveDB, data is distributed to multiple MySQL servers through various custom Partition keys. During access, when a Query request is run, the filter conditions are automatically analyzed, data is read from multiple MySQL servers in parallel, and the result set is merged and returned to the client application.
In terms of functions, HiveDB may not be as powerful as MySQL Proxy and Amoeba, but its data splitting idea is essentially different from the previous two. In addition, HiveDB is not only shared by open-source enthusiasts, but also an open-source project supported by commercial companies.
The following figure shows the basic information about how HiveDB organizes data on the official website of HiveDB. Although it cannot be detailed, however, it also shows its unique aspect of data splitting.

★Other solutions for data splitting and integration
In addition to the data splitting and integration solutions described above, there are also many other solutions that provide data splitting and integration. For example, HSCALE is further extended based on MySQL Proxy, Spock Proxy built through Rails, and Pyshards Based on Pathon.

No matter which solution you choose, the overall design concept should not be changed. That is, the overall database service capability should be enhanced through vertical and horizontal data splitting, improve the overall scalability of the application system as much as possible, and the scalability is as convenient as possible.
As long as we solve the problem of data splitting and Data Source Integration through the middle layer Proxy application, the linear scalability of the database will be as convenient as our application, by adding cheap PC Server servers, you can linearly increase the overall service capabilities of the Database Cluster, so that the database does not easily become the performance bottleneck of the application system.

14.6 possible problems in data splitting and integration

Here, you should have a certain understanding of the implementation of data splitting and integration, perhaps many readers have chosen solutions suitable for their application scenarios based on the advantages and disadvantages of various solutions. The next step is to prepare implementation.
Before implementing the data splitting scheme, we still need to analyze some possible problems. Generally, we may encounter the following problems:
◆ Introduce distributed transactions;
◆ Cross-node Join;
◆ Cross-node merge and sorting paging;

1. Introduction of distributed transactions
Once the data is split and stored in multiple MySQL servers, no matter how perfect our splitting rules are (in fact there is no perfect splitting rule ), it is possible that the data involved in some previous transactions is no longer in the same MySQL Server.
In such a scenario, if our applications still follow the old solution, distributed transactions are required. In MySQL versions, only versions after MySQL 5.0 support distributed transactions. Currently, only Innodb supports distributed transactions. In addition, even if we use the MySQL version that supports distributed transactions and the Innodb Storage engine, distributed transactions consume a lot of system resources, the performance itself is not too high. In addition, the introduction of distributed transactions will bring more difficult factors for exception handling.
What should I do? In fact, we can solve this problem through a work und. The first thing to consider is: is the database the only one that can solve the transaction? In fact, this is not the case. We can work together with databases and applications. Each database solves the transactions on its own, and then controls the transactions on multiple databases through applications.
That is to say, as long as we are willing, we can split a distributed transaction across multiple databases into multiple small transactions that are only on a single database, and control each small transaction through the application. Of course, the requirement for this is that our Russian applications must be robust enough, and of course it will bring some technical difficulties to the applications.

2. Cross-node Join
The preceding section describes the possible introduction of distributed transactions. Now let's look at the cross-node Join problem. After data splitting, some old Join statements may fail to be used, because the data source used for Join may be split into multiple MySQL servers.
What should I do? From the MySQL database perspective, if you have to solve this problem directly on the Database End, I am afraid it can only be solved through a special storage engine Federated of MySQL. The Federated storage engine is a solution for MySQL to solve problems similar to Oracle's DB Link. The main difference with Oracle DB Link is that Federated stores a definition of the remote table structure locally. At first glance, Federated is indeed a very good solution for cross-node Join. However, we should also be clear that if the remote table structure is changed, the local table definition information will not change accordingly. If the local Federated table definition information is not updated when the remote table structure is updated, the Query operation may fail and the correct results may not be obtained.
To solve such problems, we recommend that you use an application to obtain the corresponding driver result set in the MySQL Server where the driver table is located, then, the data is retrieved from the MySQL Server where the driver table is located based on the driver result set. Many readers may think that this will have a certain impact on the performance. Yes, it does have a certain negative impact on the performance, but in addition to this method, basically, there are not many other better solutions. In addition, since the database has been well expanded, the load of each MySQL Server can be well controlled, simply for a single Query, the response time may be higher than before splitting, so the negative impact of performance is not too great. What's more, there are not many requirements for cross-node Join, which may be a small part of the overall performance. Therefore, for the sake of overall performance, it is worthwhile to sacrifice a little bit occasionally. After all, system optimization itself is a process of many trade-offs and balances.

3. Cross-node Merge Sorting Paging
After the data is horizontally split, not only cross-node Join operations may fail, but some data sources of Query statements sorted by pages may also be split to multiple nodes, the direct consequence is that the sorting paging Query cannot continue to run normally. In fact, this is the same as cross-node Join. The data source exists on multiple nodes. To solve this problem through a Query, it is the same operation as cross-node Join. Federated can also be partially resolved, and of course there are also risks. What should I do with the same problem? I also continue to suggest using applications.
How can this problem be solved? The solution is basically similar to the solution of cross-node Join, but one thing is different from that of Cross-node Join. Join often has a drive-driven relationship, therefore, Data Reading between multiple tables involved in the Join operation usually has an ordered relationship. However, sorting pages are not the same. The data source of sorting pages is basically a table (or a result set) and there is no sequential relationship, therefore, the process of retrieving data from multiple data sources can be completely parallel. In this way, the efficiency of retrieving the paging data is higher than that of Cross-database Join, resulting in a lower performance loss, in some cases, it may be more efficient than in databases where data is not split. Of course, whether it is cross-node Join or cross-node sorting paging, our application server will consume more resources, especially memory resources, because we need to process more data than the original process in the Process of reading access and merging result sets.
After analysis, many readers may find that all of the above problems are basically solved by applications. Everyone may be confused, isn't it because I am a DBA, so I threw a lot of things to the application architects and developers? In fact, this is not the case at all. First, because of its particularity, the application can easily achieve good scalability, but the database is different. It must be expanded in many other ways, in addition, it is difficult to avoid some problems that can be solved in a centralized database but are split into a database cluster. To maximize the overall expansion of the system, we can only let the application do more things to solve the problem that the database cluster cannot better solve.

Conclusion 14.7

Data splitting technology is used to split a large MySQL Server into multiple small MySQL servers. This not only solves the write performance bottleneck problem, but also improves the scalability of the entire database cluster. Both vertical and horizontal splitting make the system less likely to encounter bottlenecks. Especially when we use the vertical and horizontal splitting methods, we will not encounter expansion bottlenecks theoretically.


From: MySQL performance tuning and architecture design, Jian Chaoyang

Reprinted please indicate the source:

Author: JesseLZJ

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.