MySQL performance tuning and architecture design-the 14th chapter of Scalable Design data segmentation

Source: Internet
Author: User
Tags database join joins mysql version table definition

The 14th chapter of Scalable Design data segmentation

Preface

The extension implemented by MySQL Replication functionality is always limited by the size of the database, and once the database is too large, especially if it is too frequent to be supported by a single host, we still face an expansion bottleneck. At this point, we have to find other technical means to solve this bottleneck, that is, this chapter is to introduce the bad data segmentation technology.

14.1 What is Data segmentation

Perhaps a lot of reader friends on the Internet or in magazines have seen the relevant articles on data segmentation, but in some articles called Data sharding. In fact, whether it is called data sharding or data segmentation, the concept is the same. Simply put, it means that the data we store in the same database is distributed to multiple databases (hosts) through a certain condition to achieve the effect of dispersing the load of a single device. Data segmentation can also improve the overall usability of the system, because after a single device Crash, only one part of the overall data is unavailable, not all data.
The segmentation of data (sharding) can be divided into two segmentation modes according to the type of its segmentation rules. One is to separate tables (or schemas) into different databases (hosts), which can be referred to as vertical (vertical) Segmentation of data, the other is based on the data in the table of the logical relationship, the data in the same table by a certain conditions split into multiple databases (hosts), This segmentation is called horizontal (horizontal) slicing of data.
The biggest feature of vertical slicing is the simple rules, the implementation is more convenient, especially suitable for each business between the coupling degree is very low, mutual influence is very small, business logic is very clear system. In this system, it is easy to split the tables used by different business modules into different databases. Depending on the table to split, the impact on the application is also smaller, the split rule will be relatively simple and clear.
Horizontal segmentation is relatively slightly more complex than vertical slicing. Because to split different data from the same table into different databases, the split rule itself is more complex than the table name for the application, and later data maintenance is more complex.
When one (or some) of our tables have a particularly large amount of data and access, and still can't meet performance requirements by vertically slicing them on separate devices, then we have to combine vertical and horizontal segmentation, and then slice vertically, then horizontally, to solve the performance problems of this very large table.
In this paper, we will analyze the implementation of the three kinds of data segmentation methods, such as vertical, horizontal and combined segmentation, and the integration of data after segmentation.

14.2 Vertical segmentation of data

Let's take a look at how the vertical segmentation of data is a segmentation method. Vertical segmentation of data can also be referred to as vertical segmentation. Think of the database as a large chunk of chunks of "data blocks" (tables), we cut these "chunks" vertically, and then scatter them across multiple database hosts. Such a segmentation method is a vertical (longitudinal) data segmentation.
A good architecture design of the application system, its overall function is certainly composed of many functional modules, and each function module needs to correspond to the database is one or more tables. In architecture design, the more unified the interaction points of each function module, the less the coupling degree of the system, the better the maintainability and expansibility of each module. Such a system, it is easier to achieve vertical segmentation of data.
The clearer the function module, the lower the coupling, the easier the rule definition for vertical segmentation of data. Can completely according to the function module to the data segmentation, the different function module's data holds in the different database host, can easily avoid the cross-database Join existence, simultaneously the system architecture is also very clear.

Of course, it is very difficult to have the system to do all the functions of the table is completely independent, do not need to access the other side of the table or a table of two modules to Join operation. In this case, we have to evaluate the tradeoffs based on the actual application scenario. The decision is that the application will have to store the tables related to the join in the same database, or let the application do more things, that is, the program completely through the module interface to obtain data from different databases, and then complete the join operation in the program.
In general, if the load is relatively not very large system, and the table association is very frequent, it may be the database concessions, a few related modules together to reduce the work of the application of the program can reduce more workload, is a feasible scenario.
Of course, through the concession of the database, so that multiple modules to centralize the sharing of data sources, in fact, the introduction of the tacit acquiescence of the modular architecture of the increase in the development of coupling, may make the future of the architecture more and more deteriorated. Especially when the development to a certain stage, found that the database can not bear the pressure of these tables, have to face the re-segmentation, the cost of the architecture can be far greater than the initial time.
Therefore, in the database to vertical segmentation, how to slice, to what extent, is a comparative test of people's problems. Only by balancing the costs and benefits of each aspect in a real-world scenario can you analyze a split plan that really suits you.
For example, in the example database of the sample system used in this book, we analyze it briefly and then design a simple segmentation rule to split vertically and vertically.
System functions can be basically divided into four functional modules: users, group messages, albums and events, respectively, corresponding to the following tables:

1. User Module table: user,user_profile,user_group,user_photo_album
2. Group Discussion table: Groups,group_message,group_message_content,top_message
3. Album Related table: Photo,photo_album,photo_album_relation,photo_comment
4. Event Information table: Events

At first glance, no module can be separated from the other modules, there is a relationship between the module and the module, can not be segmented?
Of course not, we have a little more in-depth analysis, we can find that although the various modules used by the table are related, but the relationship is relatively clear, but also relatively simple.
The main existence between group discussion module and user module is related by user or group relation. The General Association of the time will be through the user ID or nick_name and group ID to associate, through the interface between the modules do not bring too much trouble;

The album module exists only with the user module that is associated with the user. The correlation between the two modules is basically related to the content through the user ID, simple and clear, the interface is clear;

The event module may be associated with each module, but it can be easily split by focusing only on the ID information of the objects in each module.

Therefore, our first step can be the database according to the function module related to the table for a vertical split, each module involved in the table alone into a database, module and module of the table association between the application system side through an excuse to handle.

As shown in the following:

After such vertical slicing, the service can be divided into four databases to provide service, and the service ability is increased several times.
Advantages of vertical slicing
The splitting of the database is simple and clear, and the splitting rule is explicit;
The application module is clear and easy to integrate.
Data maintenance convenient and easy to locate;
Disadvantages of vertical slicing
Some table associations cannot be completed at the database level and need to be completed in the program;
There is still a quiet performance for tables with extremely frequent access and large data volumes, and may not necessarily meet the requirements;
Transaction processing is relatively more complex;
When the segmentation reaches a certain extent, the extensibility will be limited;
Over-slicing can lead to systems that are overly complex and difficult to maintain.
For vertical slicing, it is difficult to find a better solution to the data segmentation and transaction problems in the database level. In practical cases, the vertical segmentation of database is mostly corresponding to the module of application system, and the data source of the same module is stored in the same database, which can solve the problem of data association inside the module. Between modules, the application provides the required data to each other through the service interface. While this does increase in the overall number of operations on the database, it is beneficial in terms of overall system scalability and modularity of the architecture.

There may be a slight increase in the single response time for some operations, but the overall performance of the system is likely to be somewhat improved. The problem of scaling bottlenecks can only be solved by relying on the data-level segmentation architecture that will be introduced in the next section.

14.3 Horizontal segmentation of data

The above section analyzes the vertical segmentation of data, which is analyzed horizontally. The vertical segmentation of data can be easily understood as the partitioning of the data according to the table, and the horizontal segmentation is no longer based on the table or function module to slice. In general, simple horizontal slicing is the main way to spread an extremely mundane table into multiple tables, with a subset of the data in each table, according to some rule of a field.
To put it simply, we can understand the horizontal segmentation of the data as a segmentation of data rows, that is, some rows in a table are sliced into one database, and some other rows are sliced into other databases. Of course, in order to make it easier to decide which database the rows of data are being sliced into, the Shard always needs to follow a specific rule.

The range of a Time Type field, or the hash value of a field of a character type, based on a specific number of fields, depending on a number type field. If most of the core tables in the entire system can be associated with a field, then this field is naturally a choice for horizontal partitioning, and, of course, very special to use.
Generally speaking, as the internet is very popular Web2.0 type of Web site, basically most of the data can be linked through the member user information, many of the core tables may be very suitable for the membership ID for the horizontal segmentation of data. And like the Forum community discussion system, it is easier to slice, it is very easy to follow the forum number for the horizontal segmentation of data.

After slicing, there is basically no interaction between the libraries.
As with our example system, where all data is associated with the user, we can split the data from different users into different databases based on the user's horizontal splitting. Of course, the only difference is that the groups table in the user module is not directly related to the user, so the groups cannot be split horizontally according to the user. For this particular case table, we can completely stand alone and put it in a separate database. In fact, this approach can be described in the previous section of the "vertical segmentation of data" method, I will in the next section more detailed introduction of this vertical segmentation and horizontal segmentation of the simultaneous use of the joint segmentation method.
So, for our sample database, most of the tables can be sliced horizontally based on the user ID. Different user-related data are sliced and stored in different databases. For example, all user IDs are modeled by 2 and stored in two different databases respectively. Each table that is associated with a user ID can be sliced this way. In this way, basically each user-related data is in the same database, even if it needs to be associated, it can be very simple association.
We can display the information about horizontal segmentation more intuitively by:

Advantages of horizontal slicing
Table Association is basically able to complete on the database side;
There is no problem that some very large data volumes and high-load tables encounter bottlenecks;
Application-side Overall schema changes are relatively small;
Transaction processing is relatively simple;
As long as the segmentation rules can be defined, it is basically more difficult to meet the extensibility limit;
Disadvantages of horizontal slicing
The segmentation rules are relatively more complex, and it is difficult to abstract a segmentation rule that satisfies the whole database.
Later data maintenance difficulty has increased, manual positioning of data is more difficult;
The coupling degree of each module in the application system is high, which may cause some difficulties in the migration and splitting of the later data.

14.4 Use of vertical and horizontal joint segmentation
In the above two sections, we understand the implementation of the two methods of "vertical" and "horizontal" and the architecture information after the segmentation, and also analyze the advantages and disadvantages of the two architectures respectively. But in the actual application scenario, except that the load is not too large, the business logic is relatively simple system can solve the extensibility problem by one of the two methods above, I am afraid that most of the other business logic is slightly more complex, the system load a larger system, Can not be achieved by any of the above data segmentation method to achieve better scalability, and need to use the two methods of segmentation, different scenarios using different segmentation methods.

In this section, I will combine the pros and cons of vertical slicing and horizontal slicing to further refine our overall architecture and further enhance the scalability of the system.
In general, all the tables in our database are difficult to correlate with one (or a few) fields, so it is difficult to simply solve all the problems by just slicing the data horizontally. Vertical segmentation can only solve some problems, for those systems with very high load, even if only a single table can not be a single database host to bear its load. We must combine the two methods of "vertical" and "horizontal" to make full use of the advantages of both to avoid their shortcomings.
The load on each application is growing step after time, and most architects and DBAs choose to start with a vertical split of the data at the beginning of a performance bottleneck, because the cost is first and foremost in line with the maximum input-output ratio pursued during this period. However, with the continuous expansion of the business, the system load continues to grow, after a period of stable system, after the vertical split of the database cluster may again overwhelmed, encountered a performance bottleneck.
How do we decide this time? Is the module further subdivided, or is there another way to solve it? If we continue to subdivide the modules again as we did at the beginning of the data, we may in the near future encounter the same problems that we are facing now. And with the continuous refinement of the module, the application system architecture will become more and more complex, the whole system is likely to appear out of control situation.
At this point we have to solve the problem here through the advantage of the horizontal segmentation of data. Moreover, we do not have to use the data at the time of horizontal segmentation, the previous data vertical segmentation of the results, but on the basis of the advantage of horizontal segmentation to avoid the shortcomings of vertical segmentation, to solve the problem of increasing system complexity. The horizontal split of the drawbacks (the rule is difficult to unify) has been solved by the vertical segmentation, so that the horizontal split can be done handy.
For our sample database, let's say we started with vertical segmentation of the data, but as the business grew, the database system encountered bottlenecks, and we chose to refactor the database cluster's architecture. How to Refactor? Consider the vertical segmentation of the data before, and the structure of the module is clear and clear. And the growth of the business is getting more and more fierce, even now further split the module, will not persist for too long. We chose to split horizontally on the basis of vertical segmentation.
Each database cluster that has experienced a vertical split has only one function module, and all the tables in each feature module are basically associated with a field. If the user module all can be cut through the user ID, the group discussion module is divided by the group ID, the album module is based on the album ID to the segmentation, the final Event notification information table to take into account the time limit of the data (only access to a recent event segment information), then consider the time to slice.
Shows the entire architecture of the Shard:

In fact, in many large-scale application systems, the two data segmentation methods, vertical slicing and horizontal cutting, are basically co-existent, and constantly alternating, in order to continuously increase the system's ability to expand. When dealing with different scenarios, we also need to take into account the respective limitations of these two methods, as well as their respective advantages, and use different combinations at different times (load pressures).

Benefits of Joint Segmentation
Can make full use of vertical segmentation and horizontal segmentation of their respective advantages to avoid their own shortcomings;
Maximize system scalability;

Disadvantages of Joint segmentation
The database system architecture is more complex and more difficult to maintain.
The application architecture is also relatively more complex;

14.5 Data segmentation and integration solutions

Through the previous chapters, we have been very clear that the data segmentation through the database can greatly improve the scalability of the system. However, data in the database after vertical and/or horizontal segmentation is stored in different database hosts, the biggest problem of application system is how to make these data sources better integration, perhaps this is a lot of readers friends are very concerned about a problem. Our main focus in this section is to analyze the various solutions that can be used to help us achieve data segmentation and data integration.
Data integration is difficult to rely on the database itself to achieve this effect, although MySQL exists federated storage Engine, can solve some of the similar problems, but in the actual application scenario, but it is difficult to use better. So how do we integrate these data sources that are scattered across MySQL hosts?
In general, there are two ways to solve this problem:

1. In each application module to configure the management of their own needs of one (or more) data sources, direct access to each database, within the module to complete the integration of data;
2. Through the intermediary agent layer to manage all the data sources uniformly, the backend database cluster is transparent to the front-end application;

Perhaps more than 90% of people in the face of the above two ways of thinking will be inclined to choose the second, especially when the system is constantly becoming large and complex. Indeed, this is a very correct choice, although the cost in the short term may be relatively larger, but it is very helpful for the overall system scalability.
Therefore, for the first solution I am not prepared to analyze too much here, I would like to focus on the second solution to the idea of some of the solutions.

★ Self-developed intermediate agent layer
After deciding to choose an intermediate proxy layer through the database to address the architectural direction of data source consolidation, many companies (or enterprises) have chosen to develop their own agent-tier applications that meet their own application-specific scenarios. The
Self-developed intermediary agent layer can maximize the specific application of its own, maximize the customization of many personalized needs, in the face of change can also be flexible response. This should be said to be the biggest advantage of self-development agent layer.

Of course, choosing to develop on your own and enjoy maximizing personalization is also a natural need to invest more in pre-development and post-upgrade improvements, and its own technical thresholds may be higher than simple Web applications. Therefore, it is better to make a more comprehensive assessment before deciding to choose your own development.
Due to self-development more time to consider is how to better adapt to their own application system, to deal with their own business scenarios, so here is not too much analysis. In the following we mainly analyze the current more popular data source integration solutions.

★ Use MySQL Proxy for data segmentation and integration
MySQL Proxy is a database agent layer that is officially provided by MySQL, and, like MySQL Server, is an open source product based on the GPL Open source protocol. can be used to monitor, analyze, or transmit communications between them. His flexibility allows you to use it to the fullest, with features such as connection routing, query analysis, query filtering and modification, load balancing, and basic HA mechanisms.
In fact, the MySQL Proxy itself does not have all of these features, but rather provides the basis for implementing the functionality described above. To implement these features, we also need to write LUA scripts ourselves.
The MySQL Proxy actually establishes a connection pool between the client request and MySQL Server. All client requests are sent to the MySQL proxy and then analyzed by MySQL proxy to determine whether the read or write operation is distributed to the appropriate MySQL Server. For multi-node Slave clusters, load balancing can also be achieved. Here is the basic architecture diagram for MySQL Proxy:

Through the above architecture diagram, we can clearly see the MySQL Proxy in the actual application of the location, and can do the basic things. More detailed implementation details of MySQL Proxy in the official MySQL document has a very detailed introduction and examples, interested readers can be directly from the official MySQL Web site download or online reading, I am not tired of wasting paper.

★ Use Amoeba to achieve data segmentation and integration
Amoeba is a Java-based open source framework that focuses on solving distributed database data source integration Proxy programs, based on the GPL3 Open source protocol. At present, amoeba already has query routing, query filtering, read/write separation, load balancing and HA mechanism and other related content.

Amoeba mainly addresses the following issues:
1. Data segmentation after the integration of complex data sources;
2. Provide data segmentation rules and reduce the impact of data segmentation rules on the database;
3. Reduce the number of connections between the database and the client;
4. Read and write separation routes;
We can see that what amoeba is doing is exactly what we need to improve the scalability of the database through data segmentation.

Amoeba is not an agent layer of proxy program, but a development of the database proxy layer Proxy program development framework, currently based on amoeba developed a proxy program has amoeba for MySQL and amoeba for Aladin two.
Amoeba for MySQL is primarily a solution for MySQL databases, and the protocol requested by front-end applications and the data source database for back-end connections must be MySQL. For any application to the client, amoeba for MySQL and a MySQL database make no difference, any client requests that use the MySQL protocol can be parsed by amoeba for MySQL and processed accordingly. Below can tell us amoeba for MySQL architecture information (from the Amoeba Developer blog):

Amoeba for Aladin is a more extensive and powerful Proxy program. He can connect data sources from different databases at the same time to serve front-end applications, but only accept client application requests that meet the MySQL protocol. That is, as long as the front-end application is connected through the MySQL protocol, amoeba for Aladin automatically parses the query statement to automatically identify which physical host of the type database the data source of the query is on, based on the data requested in the query statement. Shows the architectural details of amoeba for Aladin (from the Amoeba Developer blog):

At first glance, the two seem to be exactly the same. After a closer look, you will find that the main difference is only after the MySQL protocal adapter processing, based on the results of the analysis of the data source database, and then select a specific JDBC driver and the corresponding protocol to connect to the backend database. In fact, through the above two frame composition you may have discovered the characteristics of the amoeba, he is just a development framework, we have to choose what he has provided for MySQL and for Aladin these two products, but also based on their own needs for the corresponding two development, To get more adapted to our own application characteristics of the Proxy program.
When it comes to using MySQL database, both amoeba for MySQL and amoeba for Aladin can be used very well. Of course, given the complexity of any one system, its performance will certainly have a certain loss, maintenance costs will naturally be relatively higher. So, I recommend using amoeba for MySQL only when I need to use MySQL database.
Amoeba for MySQL is very simple to use, all the configuration files are standard XML files, a total of four configuration files. The following were:

Amoeba.xml: Master configuration file, configure all data sources and amoeba parameter settings;
Rule.xml: Configure information for all Query routing rules;
Functionmap.xml: Configures the Java implementation class that is used to parse the function in Query;
Rullfunctionmap.xml: The implementation class that configures the specific functions that need to be used in a routing rule;
If your rules are not too complex, you basically need to use only the first two of the four configuration files above to do all the work. Proxy programs commonly used functions such as read-write separation, load balancing and other configurations are carried out in the amoeba.xml.
In addition, amoeba has supported automatic routing for vertical and horizontal segmentation of data, and routing rules can be set in Rule.xml.
At present, the main lack of amoeba is its online management function and support for the business, has been in the communication process with the relevant developers have made relevant recommendations, I hope to provide an online maintenance management of the command-line management tools, easy to use online maintenance, The feedback received is that management-specific management modules have been incorporated into the development agenda. In addition, the transaction support is temporarily or amoeba cannot do, even if the client application in the request submitted to amoeba contains transaction information, Amoeba will also ignore the transaction-related information. Of course, after continuous improvement, I believe that business support is definitely the amoeba of the feature.
More detailed usage of Amoeba reader friends can be obtained from the user manual provided on the Amoeba Developer blog (http://amoeba.sf.net), which is not described here.

★ Use HIVEDB to achieve data segmentation and integration
Like the previous MySQL Proxy and Amoeba, Hivedb is also an open source framework for data segmentation and consolidation based on Java for MySQL databases, but the current hivedb only supports horizontal segmentation of data.

It mainly solves the expansibility of database and high-performance access problem of data, and supports the redundancy of data and the basic HA mechanism.
Hivedb implementation mechanism and MySQL Proxy and amoeba have a certain difference, he did not use the MySQL Replication function to achieve data redundancy, but the implementation of the data redundancy mechanism, and its underlying mainly based on Hibernate shards to real Data segmentation work now.

In Hivedb, data is distributed across multiple MySQL servers through a user-defined set of Partition keys, in fact, a data segmentation rule. At the time of access, when the Query request is run, the filter conditions are automatically parsed, the data is read in parallel from multiple MySQL servers, and the result set is merged back to the client application.
In terms of functionality alone, Hivedb may not be as powerful as MySQL Proxy and amoeba, but the idea of data segmentation is not fundamentally different from the previous two. In addition, Hivedb is not just a content shared by open source enthusiasts, but an open source project supported by commercial companies.
The following is an image of the official website of Hivedb, which describes how Hivedb organizes the basic information of the data, although it can not show too much information about the architecture in detail, but also basically shows its unique aspect in data segmentation.

★ Other solutions for data segmentation and integration
In addition to the several data segmentation and integration solutions described above, there are many other solutions that also provide data segmentation and integration. such as the Hscale based on MySQL Proxy, the Spock proxy built through Rails, and pyshards based on Pathon, and so on.

Regardless of which solution you choose to use, the overall design approach should basically not have any change, that is, through the vertical and horizontal segmentation of data, enhance the overall service capabilities of the database, so that the overall expansion of the application system as far as possible, the expansion of the way as easy as possible.
As long as we can better solve the problem of data segmentation and data source integration through the middle-tier Proxy application, the linear scalability of the database will be easily as convenient as our application, only need to add a cheap PC server server, the overall service capacity of the database cluster is increased linearly, Make the database no longer an easy performance bottleneck for application systems.

14.6 possible problems in data segmentation and integration

Here, we should have a certain understanding of the implementation of data segmentation and integration, perhaps many readers have been based on various solutions to the merits of the respective characteristics of the basic selection of suitable for their own application scenarios, the work behind the main is the implementation of the preparation.
There are some possible problems that we need to do before we can implement a data-splitting scheme. In general, the problems we may encounter are mainly as follows:
The problem of introducing distributed transaction;
Cross-node Join problem;
Cross-node merge sorting paging problem;

1. Problems with the introduction of distributed Transactions
Once the data is segmented and stored in multiple MySQL servers, no matter how perfect our segmentation rules are (in fact, there is no perfect segmentation rule), the data involved in some of the previous firms may not be the same MySQL Server.
In such a scenario, if our application is still in accordance with the old solution, then it is bound to introduce a distributed transaction to solve. In each version of MySQL, the distributed transaction is supported only after the beginning of MySQL 5.0, and only INNODB provides distributed transaction support. Not only that, even if we are using a MySQL version that supports distributed transactions, but also a Innodb storage engine, the distributed transaction itself consumes a lot of system resources, and the performance itself is not too high. And the introduction of distributed transaction itself in the exception processing will bring more difficult to control factors. What about the
? In fact, we can use a workaround to solve this problem, the first thing to consider is: whether the database is the only place to solve the transaction? This is not the case, we can completely combine the database and the application both to solve together. Each database resolves its own transactions, and then uses the application to control transactions on multiple databases.
that is, we can split a distributed transaction across multiple databases into multiple small transactions that are only on a single database, as long as we want, and control each small transaction through the application. Of course, the requirement for this is that our Russian application must be robust enough to provide some technical difficulties for the application as well.

2. Problems with cross-node join
The problem that might introduce a distributed transaction is described above, and now let's look at the problem of cross-node join. Data segmentation may cause some old join statements to be unusable because the data source used by Join may be sliced into multiple MySQL servers. What about the
? This problem from the MySQL database point of view, if it has to be directly resolved on the database side, I am afraid only through the MySQL a special storage engine federated to solve. The Federated storage Engine is a solution for MySQL to address issues like Oracle's DB Link. The main difference from Oracle DB Link is that federated saves a copy of the remote table structure's definition information locally. At first glance, Federated is really a great solution for cross-node joins. But we should also be clear, it seems that if the remote table structure is changed, the local table definition information will not follow the corresponding changes. If you update the remote table structure without updating the local federated table definition information, it is likely to cause Query to run out of error and not get the correct results.
for this kind of problem, I still recommend the application for processing, first in the MySQL server where the driver table is located to remove the corresponding drive result set, and then based on the driving result set and the drive table is located in the MySQL server to fetch the corresponding data. Many readers may think that this will have a certain impact on performance, yes, it does have a certain negative impact on performance, but in addition to this method, there are basically not many other better solutions. Moreover, since the database through a good extension, the load of each MySQL Server can be better controlled, simply for a single Query, its response time may be higher than the non-segmentation, so the performance of the negative impact is not too large. What's more, similar to the need for cross-node joins is not too much, and probably only a small part of the overall performance. So for the overall performance of the consideration, the occasional sacrifice a little bit, in fact, is worth it, after all, the system optimization itself is a lot of trade-offs and balance process.

3. Cross-node merge sort paging problems
Once the data has been sliced horizontally, it may not be just that cross-node Join does not work, and the data source for some sort of paged query statements may also be sliced to multiple nodes, so the direct consequence of these sort paging Query cannot continue to function properly. In fact, this and cross-node join is a reason, the data source exists on multiple nodes, to be resolved through a Query, and cross-node join is the same operation. The same federated can be partially solved, of course, the same risks exist. Or the same problem, how to do? I also continue to recommend that the application be addressed.
How do i fix it? The solution is broadly similar to that of a cross-node join, but there is a point that joins the cross-node join, which often has a driver-driven relationship, so there is generally a sequential relationship between the data reads of multiple tables involved in the join itself. But sort of paging is not the same, the sort of paging data source can basically be said to be a table (or a result set), itself does not exist a sequential relationship, so the process of fetching data from multiple data sources can be completely parallel. In this way, the fetch efficiency of the sorted paging data can be higher than the cross-library Join, so the performance loss is relatively smaller and in some cases more efficient than the original data-splitting database. Of course, whether it's cross-node Join or sorting across nodes, our application server consumes more resources, especially memory resources, because the process of reading access and merging result sets requires more data than was originally processed.
Analysis Here, perhaps a lot of readers will find that all of these problems, I give the advice is basically through the application to solve. People may start to whispered, is it because I am a DBA, so many things are thrown to application architects and developers? In fact, this is not the case, the first application because of its particularity, it can be very easy to do very good extensibility, but the database is different, it must be used in many other ways to expand, and in this expansion process, It is difficult to avoid situations where some of the original problems can be solved in a centralized database, but split into a single DB cluster. In order to maximize the overall system, we can only let the application do more things to solve the problem that the database cluster can not be better solved.

14.7 Summary

Splitting a large MySQL server into smaller MySQL servers through data segmentation solves both write performance bottlenecks and, once again, the scalability of the entire database cluster. Whether it is through vertical slicing or horizontal slicing, the system is less likely to experience bottlenecks. Especially when we use the vertical and horizontal combination of the segmentation method, in theory will no longer encounter the expansion bottleneck.

Excerpt from: "MySQL performance tuning and architecture design" Jane Chaoyang

Reprint please specify the source:

Jesselzj
Source: http://jesselzj.cnblogs.com

MySQL performance tuning and architecture design-the 14th chapter of Scalable Design data segmentation

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.