MySQL Database Sharding (Splitting Databases and Tables): Problems and Solutions

Source: Internet
Author: User

MySQL Database Sharding


Scaling out through MySQL replication is always limited by the size of a single database. Once a database grows too large, and especially when writes are very frequent, a single host can hardly sustain the load, and we hit a scaling bottleneck again. At that point we need other techniques to remove the bottleneck. That is the subject of this chapter: data sharding.

What Is Data Sharding

Many readers have probably come across articles about data segmentation online or in magazines. Some of those articles call it data sharding. Whether it is called data sharding or data segmentation, the concept is the same.

Simply put, sharding takes data that would otherwise be stored in a single database and distributes it across multiple databases (hosts) according to some rule. This spreads the load that would otherwise fall on one device. Sharding can also improve the overall availability of the system. When a single device crashes, only the part of the data stored on it becomes unavailable, not all of the data.

Depending on the type of splitting rule, sharding can be divided into two modes.

The first distributes different tables (or schemas) to different databases (hosts). This is called vertical sharding of data. The second splits the rows of a single table across multiple databases (hosts) according to some condition derived from the logical relationships of the data in the table. This is called horizontal sharding of data.

The main characteristic of vertical sharding is that its rules are simple and it is easy to implement. It is especially suitable for systems whose business modules are loosely coupled, interact very little, and have very clear business logic. In such a system it is easy to split the tables used by different business modules into different databases. Because the split is done by table, the impact on the application is small and the splitting rules are simple and clear.

Horizontal sharding is somewhat more complicated than vertical sharding. Because rows of the same table are split into different databases, the splitting rules are more complex for the application than simply routing by table name. Later data maintenance is also more complex.

Sometimes one (or a few) of our tables have an especially large data volume and heavy traffic. If they still cannot meet performance requirements after vertical sharding has moved them to dedicated devices, we have to combine vertical and horizontal sharding: shard vertically first, then horizontally. Only then can the performance problems of such very large tables be solved.

The following sections analyze how to implement each of the three approaches (vertical, horizontal, and combined sharding). They also cover how to integrate the data after it has been sharded.

Vertical Sharding of Data

Let's first look at what vertical sharding is; it can also be called longitudinal splitting. Think of a database as consisting of many "data blocks" (tables). We cut these blocks apart vertically and spread them across multiple database hosts. This is vertical (longitudinal) sharding.

A well-architected application system is made up of many functional modules. The data each module needs corresponds to one or more tables in the database.

In architecture design, the more unified and well-defined the interaction points between modules are, the lower the coupling of the system. Lower coupling also means better maintainability and extensibility for each module. The easier it is for such a system to shard its data vertically.

The clearer the functional modules and the lower their coupling, the easier it is to define vertical sharding rules. Data can be split entirely along module boundaries, with each module's data stored on a different database host. This makes it easy to avoid cross-database joins, and the system architecture also stays very clear.

Of course, it is very hard to make every module's tables completely independent. In most systems, some module will need another module's tables, or a query will join tables from two modules. In that case we have to weigh the trade-offs for the actual application scenario. One option is to accommodate the application by storing the tables that must be joined in the same database. The other is to have the application do more work: it fetches data from the different databases entirely through module interfaces and then performs the join in the program.
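To make the second option concrete, here is a minimal Python sketch of a join done in the application. It assumes two hypothetical module interfaces, `get_users_by_ids` and `get_group_messages`. They are backed by in-memory dicts standing in for the user and group databases. In a real system each would issue its own query against its own database.

```python
# Hypothetical in-memory stand-ins for two vertically split databases.
USER_DB = {
    "user": [
        {"id": 1, "nick_name": "alice"},
        {"id": 2, "nick_name": "bob"},
    ]
}
GROUP_DB = {
    "group_message": [
        {"id": 10, "group_id": 100, "user_id": 1, "subject": "hello"},
        {"id": 11, "group_id": 100, "user_id": 2, "subject": "hi"},
        {"id": 12, "group_id": 101, "user_id": 1, "subject": "news"},
    ]
}


def get_users_by_ids(user_ids):
    """User-module interface: look up users by ID (like SELECT ... WHERE id IN (...))."""
    wanted = set(user_ids)
    return {u["id"]: u for u in USER_DB["user"] if u["id"] in wanted}


def get_group_messages(group_id):
    """Group-module interface: all messages of one group, read from the group database."""
    return [m for m in GROUP_DB["group_message"] if m["group_id"] == group_id]


def group_messages_with_authors(group_id):
    """Application-side join: fetch from each module, then join on user_id in code."""
    messages = get_group_messages(group_id)
    users = get_users_by_ids({m["user_id"] for m in messages})  # one batched lookup
    return [
        {**m, "nick_name": users[m["user_id"]]["nick_name"]}
        for m in messages
        if m["user_id"] in users  # inner-join semantics: skip messages without a user row
    ]


if __name__ == "__main__":
    for row in group_messages_with_authors(100):
        print(row["subject"], row["nick_name"])
```

Fetching all needed users in one batched call, rather than one call per message, keeps the extra round trips caused by the split to a minimum.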

Generally speaking, suppose the system's load is not very high and table joins are very frequent. Then it can be a viable solution for the database to compromise: combining several related modules in one database reduces the application's workload.

However, letting the database compromise so that several modules share one data source tacitly accepts more coupling in the modular architecture. This may make the architecture worse over time. In particular, at some later stage the database may no longer bear the load of these tables, and they will have to be split again. The cost of that architectural rework may then be far greater than it would have been at the start.

So, with vertical sharding, how to split and how finely to split is a difficult question that tests the designer. Only by balancing the costs and benefits of every aspect in the actual application scenario can we arrive at a splitting plan that truly fits our own system.

As an example, let's briefly analyze the sample database of the Demo system used in this book and design a simple vertical sharding rule for it.

The system can be divided into four functional modules: users, group discussions, photo albums, and events. They correspond to the following tables:

1. User module tables: user, user_profile, user_group, user_photo_album

2. Group discussion module tables: groups, group_message, group_message_content, top_message

3. Photo album module tables: photo, photo_album, photo_album_relation, photo_comment

4. Event information table: events

At first glance, no module can exist independently of the others, since every module is related to other modules. Does that mean the data cannot be split?

Of course not. A slightly closer look shows that the tables of the different modules are related, but the relationships are fairly clear and simple.

The group discussion module is related to the user module mainly through user and group membership. These associations are usually made through the user's ID or nick_name and the group's ID. Implementing them through interfaces between modules does not cause much trouble.

The photo album module is related to the user module only through the user, and the associations between the two modules are essentially all by user ID. The relationship is simple and clear, and so is the interface.

The event module may be related to every other module. However, it only cares about the IDs of the objects in those modules, so it is also easy to split off.

So, as a first step, we can shard the database vertically along the tables belonging to each functional module. The tables of each module are placed in a separate database, and associations between tables of different modules are handled through interfaces on the application side. The result is shown in the following figure:

After this vertical split, four databases provide the service that a single database used to provide. The total service capacity can therefore grow several times over.
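The vertical split above boils down to a routing rule keyed on the table name. The following minimal Python sketch uses the table lists from the Demo system's four modules. The database names (`user_db`, `group_db`, `album_db`, `event_db`) and the function name `route_vertical` are hypothetical.

```python
# Table-to-module mapping for the Demo system (table names from the module list above).
MODULE_TABLES = {
    "user_db": ["user", "user_profile", "user_group", "user_photo_album"],
    "group_db": ["groups", "group_message", "group_message_content", "top_message"],
    "album_db": ["photo", "photo_album", "photo_album_relation", "photo_comment"],
    "event_db": ["events"],
}

# Invert the mapping once so each routing decision is a single dict lookup.
TABLE_TO_DB = {table: db for db, tables in MODULE_TABLES.items() for table in tables}


def route_vertical(table):
    """Return the (hypothetical) database that holds `table` after the vertical split."""
    try:
        return TABLE_TO_DB[table]
    except KeyError:
        raise ValueError(f"no module owns table {table!r}") from None
```

Because the rule depends only on the table name, the application needs no knowledge of row contents to route a query. This is why vertical sharding has little impact on the application.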

Advantages of Vertical Sharding

The database split is simple and clear, and the splitting rules are explicit;

Application modules are clearly defined and easy to integrate;

Data maintenance is convenient, and data is easy to locate.

Disadvantages of Vertical Sharding

Some table joins cannot be done at the database level and must be done in the program;

Tables with extremely high traffic and large data volumes may still hit performance bottlenecks and may not meet requirements;

Transaction processing becomes more complex;

Once splitting reaches a certain level, extensibility hits a limit;

Over-splitting can make the system overly complex and hard to maintain.

With vertical sharding, it is hard to find a good database-level solution to cross-database joins and transactions. In practice, vertical sharding mostly follows the modules of the application system. The data of one module is stored in the same database, which solves data association within the module. Between modules, the required data is provided through the application's service interfaces.

This does increase the total number of database operations, but it is worthwhile for overall system scalability and modularity. The response time of individual operations may increase slightly, but overall system performance is likely to improve. The remaining scaling bottleneck can only be overcome by the horizontal sharding architecture introduced in the next section.

Horizontal Sharding of Data

The previous section analyzed vertical sharding; this section analyzes horizontal sharding. Vertical sharding can be understood simply as splitting data by table according to modules. Horizontal sharding no longer splits by table or functional module. In general, simple horizontal sharding spreads an extremely heavily accessed table across multiple tables according to a rule applied to one of its fields. Each table then holds a subset of the data.

Put simply, horizontal sharding splits data by rows: some rows of a table go to one database and other rows go to other databases. To make it easy to determine which database each row belongs to, sharding must always follow a specific rule.

Common rules include splitting by ranges of a time-type field, taking a numeric field modulo a specific number, or using the hash value of a character-type field. If most of the core tables in the system can be related through one field, that field is naturally the best choice for horizontal partitioning. In special cases where no such field exists, another field has to be chosen.
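The three kinds of rules mentioned above can be sketched as follows. This is a minimal illustration only. The function names, the choice of MD5 as a stable hash, and month-sized ranges are assumptions, not part of any particular product.

```python
import hashlib
from datetime import date


def shard_by_modulo(value: int, shard_count: int) -> int:
    """Numeric field: the shard is the value modulo the number of shards."""
    return value % shard_count


def shard_by_hash(value: str, shard_count: int) -> int:
    """Character field: hash the string, then take the hash modulo the shard count.
    MD5 is used because it is stable across processes; Python's built-in hash() is salted."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def shard_by_month_range(value: date, start: date, months_per_shard: int) -> int:
    """Time field: each shard holds a fixed range of months counted from `start`."""
    months = (value.year - start.year) * 12 + (value.month - start.month)
    if months < 0:
        raise ValueError("date is before the first shard's range")
    return months // months_per_shard
```

For example, with quarterly shards starting in January 2024, a row dated 15 July 2024 is six months after the start and lands in shard 2.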

For example, on the Web 2.0 sites that are so popular today, most data can be associated with member (user) information. Many core tables on such sites are therefore well suited to horizontal sharding by member ID.

Forum and community discussion systems are even easier to split: their data can be sharded horizontally by forum ID.

After such a split, there is essentially no interaction between the databases.

In our Demo system, almost all data is associated with users, so we can shard horizontally by user and place different users' data in different databases. The one exception is the groups table in the group discussion module. It has no direct relationship with users and therefore cannot be split horizontally by user. For this special case, we can take the table out entirely and place it in a separate database of its own.

In fact, this approach makes use of the "vertical sharding" method described in the previous section. The next section, on combining vertical and horizontal sharding, covers this in more detail.

So for our Demo database, most tables can be sharded horizontally by user ID, with data belonging to different users stored in different databases. For example, we can take each user ID modulo 2 and store the data in one of two databases accordingly.

Every table associated with the user ID can be split this way. As a result, essentially all data related to a given user lives in the same database. Even when that data needs to be joined, the join is simple.
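Here is a minimal sketch of this routing rule for the Demo database. User IDs are taken modulo 2, and the groups table goes to its own database, as described above. The database names `demo_user_0`, `demo_user_1`, and `demo_global` and the function `route` are hypothetical.

```python
from typing import Optional

SHARD_COUNT = 2  # user IDs modulo 2, as in the text

# Hypothetical database names for illustration.
USER_SHARDS = [f"demo_user_{i}" for i in range(SHARD_COUNT)]
GLOBAL_DB = "demo_global"        # holds tables with no direct user relationship
UNSHARDED_TABLES = {"groups"}    # the exception noted above


def route(table: str, user_id: Optional[int] = None) -> str:
    """Pick the database for a row: user-related tables by user ID, others to GLOBAL_DB."""
    if table in UNSHARDED_TABLES:
        return GLOBAL_DB
    if user_id is None:
        raise ValueError(f"table {table!r} is sharded by user ID; a user_id is required")
    return USER_SHARDS[user_id % SHARD_COUNT]
```

Because every user-related table uses the same rule, `route("user", 3)` and `route("photo", 3)` both return `demo_user_1`. All of user 3's rows can therefore still be joined inside one database.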

The following figure gives a more intuitive view of horizontal sharding:

Advantages of Horizontal Sharding

Table joins can mostly be done at the database level;

Very large, high-load tables no longer run into bottlenecks;

Changes to the overall application architecture are relatively small;

Transaction processing is relatively simple;

As long as the splitting rules are well defined, it is hard to hit the extensibility limit.

Disadvantages of Horizontal Sharding

The splitting rules are more complex, and it is hard to abstract a single rule that fits the whole database;

Later data maintenance becomes harder, and locating data manually is more difficult;

Coupling between application modules is high, which may make later data migration and splitting difficult.

Combining Vertical and Horizontal Sharding

The previous two sections showed how "vertical" and "horizontal" sharding are implemented and what the architecture looks like after each split. They also analyzed the pros and cons of each. In real applications, a system with a modest load and fairly simple business logic can solve its scalability problems with one of the two methods. Most other systems have more complex business logic and heavier load, and neither method alone gives them good scalability. The two methods must be combined, with different methods used in different scenarios.

In this section I combine the strengths and weaknesses of vertical and horizontal sharding to further improve our overall architecture and the scalability of the system.

Generally speaking, it is hard to relate all the tables in a database through one (or a few) fields, so horizontal sharding alone cannot solve every problem. Vertical sharding solves only part of the problem. In systems with very high load, even a single table may be more than one database host can handle.

We must use "vertical" and "horizontal" sharding together, making full use of the strengths of both while avoiding their weaknesses.

As an application's load grows step by step, most architects and DBAs start with vertical sharding when they first hit a performance bottleneck. Its cost is the lowest, and it best fits the goal of maximizing return on investment at that stage. However, the business keeps expanding and the system load keeps growing. After a period of stability, the vertically split database cluster may again be overwhelmed and hit another performance bottleneck.

How should we decide at this point? Should the modules be subdivided further, or is there another way? If we keep subdividing modules and splitting data vertically as we did at the beginning, we may soon face the same problem again. And as modules become ever finer, the application architecture grows more and more complex, and the whole system may get out of control.

At this point we must turn to the strengths of horizontal sharding. We do not need to tear down the results of the earlier vertical sharding. Instead, we build on them and use horizontal sharding to avoid the drawbacks of vertical sharding and stop the steady growth in system complexity.

Meanwhile, the vertical split has already addressed the weakness of horizontal sharding (rules that are hard to unify). This makes horizontal sharding straightforward.

Take our Demo database as an example. Suppose we sharded the data vertically at the very beginning. As the business kept growing, the database system hit a bottleneck again, and we decided to refactor the cluster architecture. How should we refactor it? The data has already been split vertically and the module structure is clear. Business growth keeps accelerating, so even if we split the modules further now, the gain would not last long.

We therefore chose to shard horizontally on top of the existing vertical split.

Each database cluster produced by the vertical split serves only one functional module, and essentially all tables in a module can be related through one field. The user module can be sharded by user ID, the group discussion module by group ID, and the photo album module by album ID. The event notification table is time-sensitive, since only recent events are of interest. It is therefore sharded by time.
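A minimal sketch of this two-level routing follows. The module name selects the vertically split cluster, and that module's own rule selects the horizontal shard inside it. The shard counts (4, 2, 4), the one-shard-per-month choice for events, and the database names are hypothetical.

```python
from datetime import date
from typing import Any, Callable, Dict

# One horizontal rule per vertical module cluster (counts and names are hypothetical).
RULES: Dict[str, Callable[[Any], str]] = {
    "user": lambda user_id: f"user_db_{user_id % 4}",        # user module: by user ID
    "group": lambda group_id: f"group_db_{group_id % 2}",    # group discussion: by group ID
    "album": lambda album_id: f"album_db_{album_id % 4}",    # photo album: by album ID
    "event": lambda day: f"event_db_{day:%Y%m}",             # events: by time, one shard per month
}


def route_combined(module: str, shard_key: Any) -> str:
    """Vertical step: choose the module's rule. Horizontal step: apply it to the shard key."""
    try:
        rule = RULES[module]
    except KeyError:
        raise ValueError(f"unknown module {module!r}") from None
    return rule(shard_key)
```

Each module can change its shard count or rule without affecting the others. This is the practical benefit of doing the vertical split first.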

The following figure shows the entire sharded architecture:

In fact, in many applications the two methods, vertical and horizontal sharding, coexist and are applied in alternation to keep increasing the system's scalability. When dealing with different scenarios, we also need to weigh the limitations and strengths of each method. Different combinations fit different stages (load levels).

Advantages of Combined Sharding

Makes full use of the strengths of vertical and horizontal sharding while avoiding their weaknesses;

Maximizes system scalability.

Disadvantages of Combined Sharding

The database architecture is more complex, and maintenance is harder;

The application architecture is also more complex.

Data Sharding and Integration Solutions

The previous sections made clear that sharding a database can greatly improve system scalability. However, once data has been split vertically and/or horizontally onto different database hosts, the application faces a new problem. The biggest challenge is how to integrate these data sources well, and many readers are probably concerned about it. In this section we analyze the range of solutions that can help with data sharding and data integration.

It is hard to rely on the database itself for data integration. MySQL has the FEDERATED storage engine, which can solve some similar problems, but it is hard to use well in practice. So how do we integrate these data sources scattered across MySQL hosts?

In general, there are two ways to solve this problem:

1. Configure, in each application module, the one (or more) data sources it needs, access each database directly, and integrate the data within the module;

2. Manage all data sources uniformly through an intermediate proxy layer, so that the backend database cluster is transparent to the front-end application.

Faced with these two options, perhaps more than 90% of people would lean toward the second, especially as the system keeps growing larger and more complex.

This is indeed the right choice. The short-term cost may be higher, but it helps the scalability of the whole system a great deal.

Therefore I will not analyze the first approach in much detail here. Instead, I focus on solutions that follow the second approach.

★ Self-developed intermediate proxy layer

After deciding to consolidate data sources through a database proxy layer, many companies have developed their own proxy layer. They tailor it to their specific application scenarios.

A self-developed proxy layer can handle the specifics of one's own application as well as possible. It can be customized for many individual needs and can adapt flexibly to change. This is arguably the biggest advantage of a self-developed proxy layer.

Building your own brings maximum customization, but it also has costs. You must invest considerable resources in initial development and in ongoing upgrades and improvements. The technical bar may also be higher than for a simple web application. A fairly thorough assessment is therefore needed before deciding to build your own.

Self-development is mostly a matter of fitting one's own application system and business scenarios, so I will not analyze it further here. Below we mainly analyze the currently popular data source integration solutions.

★ Using MySQL Proxy for data sharding and integration

MySQL Proxy is a database proxy product provided officially by MySQL. Like MySQL Server, it is open source under the GPL. It can be used to monitor, analyze, or transform the communication between clients and servers. It is very flexible. Its main current uses are connection routing, query analysis, query filtering and modification, load balancing, and basic HA mechanisms.

In fact, MySQL Proxy does not provide all of these features itself. Rather, it provides the foundation for implementing them.

To implement them, we have to write our own Lua scripts.

MySQL Proxy sits between client requests and MySQL Server and maintains a connection pool. All client requests are sent to MySQL Proxy, which analyzes each one. It determines whether the request is a read or a write and forwards it to the appropriate MySQL Server. For a cluster with multiple slave nodes, it can also balance load. The basic architecture of MySQL Proxy is shown in the following diagram:

From the diagram above we can clearly see where MySQL Proxy sits in a real application and what it can basically do.

The MySQL official documentation describes MySQL Proxy's implementation details in depth, with demonstration examples. Interested readers can download it from the MySQL official site or read it online, so I will not repeat it here.
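MySQL Proxy implements this kind of routing in Lua scripts. The following is only a Python sketch of the read/write splitting decision described above; it is not MySQL Proxy's Lua API. The class name `ReadWriteRouter`, the regular expressions that classify statements, and the round-robin policy are assumptions made for illustration.

```python
import itertools
import re

# Statements that may go to a slave; everything else is treated as a write.
_READ = re.compile(r"^\s*(SELECT|SHOW|DESCRIBE|EXPLAIN)\b", re.IGNORECASE)
# Locking reads must see current data, so they stay on the master.
_LOCKING = re.compile(r"\bFOR\s+UPDATE\b|\bLOCK\s+IN\s+SHARE\s+MODE\b", re.IGNORECASE)


class ReadWriteRouter:
    """Route writes to the master and spread plain reads over slaves round-robin."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql, in_transaction=False):
        # Inside a transaction every statement stays on the master,
        # so reads see the transaction's own uncommitted writes.
        if in_transaction or self._slaves is None:
            return self.master
        if not _READ.match(sql) or _LOCKING.search(sql):
            return self.master
        return next(self._slaves)
```

Keeping transactional and locking reads on the master avoids reading stale data from a slave that has not yet applied the latest writes.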

★ Using Amoeba for data sharding and integration

Amoeba is a Java-based open source framework, released under GPLv3. It focuses on providing a proxy that integrates the data sources of a distributed database. Amoeba currently supports query routing, query filtering, read/write splitting, load balancing, and HA mechanisms.

Amoeba mainly addresses the following issues:

1. Integrating the complex data sources that result from sharding;

2. Providing sharding rules and reducing the impact of those rules on the database;

3. Reducing the number of connections between the database and clients;

4. Routing for read/write splitting.

As we can see, Amoeba does exactly what we need to improve database scalability through sharding.

Amoeba is not a proxy program itself but a framework for developing database proxy programs. Two proxies have been built on top of it so far: Amoeba for MySQL and Amoeba for Aladin.

Amoeba for MySQL is aimed mainly at MySQL databases. Both the protocol used by front-end application requests and the protocol used for backend data source connections must be MySQL. To any client application, Amoeba for MySQL looks no different from a MySQL database. Any client request that uses the MySQL protocol can be parsed and handled by Amoeba for MySQL. The architecture of Amoeba for MySQL is shown below (from the Amoeba developer blog):

Amoeba for Aladin is a more broadly applicable and more powerful proxy.

It can connect to data sources in different kinds of databases at the same time and serve the front-end application. However, it only accepts client requests that use the MySQL protocol. Once the front-end application connects over the MySQL protocol, Amoeba for Aladin automatically parses each query. Based on the data requested, it identifies which type of database, on which physical host, holds that data. The architecture of Amoeba for Aladin is shown below (from the Amoeba developer blog):

At first glance the two look exactly alike. On closer inspection, the main difference is what happens after processing by mysqlprotocaladapter. Amoeba for Aladin determines the data source database from the analysis result, then selects the specific JDBC driver and protocol to connect to the backend database.

From the two architecture diagrams above, you may already have noticed Amoeba's nature: it is just a development framework. Besides the two products it already provides, for MySQL and for Aladin, we can do our own secondary development on it. That yields a proxy better suited to our application.

When only MySQL databases are used, both Amoeba for MySQL and Amoeba for Aladin work well. However, a more complex system inevitably loses some performance and costs more to maintain. I therefore recommend Amoeba for MySQL when only MySQL databases are involved.

Amoeba for MySQL is very simple to use. All of its configuration files are standard XML, four in total:

amoeba.xml: the main configuration file, which configures all data sources and Amoeba's own parameters;

rule.xml: configures all query routing rules;

functionMap.xml: configures the Java classes used to parse functions in queries;

ruleFunctionMap.xml: configures the implementation classes of the specific functions used in routing rules.

If your rules are not too complex, you basically only need the first two of these four files. Commonly used proxy features such as read/write splitting and load balancing are configured in amoeba.xml. Amoeba also supports automatic routing for both vertical and horizontal sharding, and the routing rules are set in rule.xml.

At present, the main things Amoeba lacks are online management and transaction support. In earlier discussions with the developers, I suggested a command-line tool for online maintenance and management. The response was that a dedicated management module had been added to the development roadmap. Amoeba cannot support transactions for now. Even if a client request sent to Amoeba contains transaction information, Amoeba ignores it. As Amoeba continues to improve, I expect transaction support to be one of the features it focuses on adding.

For more detailed usage, readers can consult the manual provided on the Amoeba developer blog (, so it is not covered further here.

★ Using HiveDB for data sharding and integration

Like MySQL Proxy and Amoeba, HiveDB provides data sharding and integration for MySQL databases. HiveDB is a Java-based open source framework, but it currently supports only horizontal sharding.

It mainly addresses database scalability and high-performance data access. It also supports data redundancy and basic HA mechanisms.

HiveDB's implementation differs somewhat from that of MySQL Proxy and Amoeba. It does not rely on MySQL replication for data redundancy but implements its own redundancy mechanism. At the lower level, it relies mainly on Hibernate Shards for the sharding work.

In HiveDB, data is spread across multiple MySQL servers according to user-defined partition keys, which in effect define the sharding rules. When a query is executed, HiveDB automatically analyzes its filter conditions and reads data from multiple MySQL servers in parallel. It then merges the result sets and returns them to the client application.
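The parallel read-and-merge step just described can be sketched as a scatter-gather query. This is not HiveDB's API. The shards are hypothetical in-memory lists, the query is a Python predicate, and the names `query_shard` and `scatter_gather` are illustrative. The sketch also shows why sorted paging across nodes is expensive. To return rows `offset` to `offset + limit`, every shard must return its first `offset + limit` rows, because any shard could hold rows belonging to the requested page.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands for the rows stored on one MySQL server.
SHARDS = [
    [{"id": 1, "score": 90}, {"id": 3, "score": 70}],
    [{"id": 2, "score": 85}, {"id": 4, "score": 60}],
    [{"id": 5, "score": 95}],
]


def query_shard(rows, predicate, sort_key, limit):
    """Per-shard query: filter, sort, keep at most `limit` rows (like WHERE ... ORDER BY ... LIMIT n)."""
    return sorted((r for r in rows if predicate(r)), key=sort_key)[:limit]


def scatter_gather(predicate, sort_key, offset=0, limit=10):
    """Query all shards in parallel, merge the sorted partial results, then cut out one page."""
    per_shard_limit = offset + limit  # each shard must cover the whole range up to the page end
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = list(pool.map(
            lambda rows: query_shard(rows, predicate, sort_key, per_shard_limit), SHARDS))
    merged = heapq.merge(*partials, key=sort_key)  # k-way merge of already-sorted lists
    return list(itertools.islice(merged, offset, offset + limit))
```

For example, `scatter_gather(lambda r: True, lambda r: -r["score"], offset=1, limit=2)` asks for the second and third highest scores across all shards and returns the rows with ids 1 and 2.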

Purely from the functional aspect, Hivedb may not be as strong as mysqlproxy and amoeba, but its data segmentation ideas and the previous two are not essential differences. In addition, Hivedb is not just a content shared by an open source enthusiast, but an open source project supported by a commercial company.

The following is the official site of Hivedb the above chapter picture, describes how hivedb to organize the basic information of the data, although not specific to show too much information on the architecture, but also basically can show its unique aspect of data segmentation.

★mycat Data integration: specific HTTP://WWW.SONGWIE.COM/ARTICLELIST/11

★ Other solutions to achieve data segmentation and integration

In addition to the above-mentioned data segmentation and integration of the overall solution, there are many other similar to provide data segmentation and integration of the solution. As Mysqlproxy based on the further expansion of the Hscale, the spockproxy built through rails. and pyshards based on Pathon and so on.

Whichever solution you choose, the overall design approach should be essentially the same: use vertical and horizontal segmentation of data to increase the overall service capacity of the database, maximize the scalability of the whole application system, and make scaling out as easy as possible.

As long as a middle-tier proxy application solves the problems of data segmentation and data source integration well, the database can scale linearly as conveniently as the application does. Simply adding an inexpensive PC server increases the overall service capacity of the database cluster linearly, so the database no longer easily becomes the performance bottleneck of the application system.

Possible problems with data segmentation and integration

By now, you should have a reasonable understanding of how data segmentation and integration are implemented. Many readers may have already weighed the strengths and weaknesses of the various solutions and chosen one that suits their own application scenario; the remaining work is mainly preparing for implementation.

Before implementing a data segmentation scheme, we still need to analyze some problems that may arise.

In general, the problems we may encounter are mainly as follows:

Introduction of distributed transactions;

Cross-node joins;

Cross-node merging, sorting, and pagination.

1. Problems with the introduction of distributed transactions

Once the data is segmented and stored across multiple MySQL Servers, no matter how well designed our segmentation rules are (in fact, there is no perfect segmentation rule), the data involved in some transactions that previously ran on a single server may no longer reside on the same MySQL Server.

In this scenario, if our application keeps its old design, the only option is to introduce distributed transactions. Among MySQL releases, only versions since MySQL 5.0 support distributed (XA) transactions, and at present only InnoDB provides distributed transaction support. Moreover, even if we happen to use a MySQL version that supports distributed transactions together with the InnoDB storage engine, distributed transactions consume a great deal of system resources and their performance is not high. Introducing distributed transactions also adds factors to exception handling that are difficult to control.
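To make this overhead concrete, here is a simulated two-phase commit that records the XA statements MySQL uses (`XA START`, `XA END`, `XA PREPARE`, `XA COMMIT`, `XA ROLLBACK`). The `Participant` class and `two_phase_commit` function are hypothetical stand-ins; no real connections are opened. Even in this toy form, every branch needs extra coordination statements, and no branch can commit until all branches have prepared.

```python
class Participant:
    """Toy stand-in for one MySQL Server taking part in an XA transaction."""

    def __init__(self, name: str, can_prepare: bool = True):
        self.name = name
        self.can_prepare = can_prepare  # False simulates a branch that votes "no"
        self.log = []  # statements this server would have received

    def run(self, stmt: str) -> None:
        self.log.append(stmt)


def two_phase_commit(xid: str, work: dict, participants: list) -> bool:
    """Coordinate one global transaction over several participants (simulation only)."""
    # Each branch runs its local statements between XA START and XA END.
    for p in participants:
        p.run(f"XA START '{xid}'")
        for stmt in work.get(p.name, []):
            p.run(stmt)
        p.run(f"XA END '{xid}'")
    # Phase 1: ask each branch to prepare; stop at the first "no" vote.
    all_prepared = True
    for p in participants:
        if not p.can_prepare:
            all_prepared = False
            break
        p.run(f"XA PREPARE '{xid}'")
    # Phase 2: commit everywhere only if every branch prepared, else roll back everywhere.
    decision = "COMMIT" if all_prepared else "ROLLBACK"
    for p in participants:
        p.run(f"XA {decision} '{xid}'")
    return all_prepared
```

A single local transaction needs only its own statements plus a commit; here each branch also needs `XA START`, `XA END`, and `XA PREPARE`, and the coordinator has to collect every vote before deciding.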

What can we do? We can work around this problem in a more flexible way. First, ask whether the database is the only place where transactions can be handled. It is not: the database and the application can handle them together. Each database handles its own local transactions, and the application coordinates the transactions across the multiple databases.

In other words, if we are willing, we can split a distributed transaction spanning multiple databases into several small transactions, each confined to a single database, and have the application control each of these small transactions.

Of course, this requires our application to be robust enough, and it also introduces some technical difficulties into the application.
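A minimal sketch of this idea, assuming two SQLite in-memory databases as stand-ins for two MySQL Servers and a hypothetical `accounts` table: a transfer is split into a local debit transaction on one node and a local credit transaction on the other. If the second step fails, the application issues a compensating transaction to undo the first. Compensation is only one possible way to control the small transactions; the text above does not prescribe a specific technique.

```python
import sqlite3


def make_node() -> sqlite3.Connection:
    """Stand-in for one MySQL Server holding a hypothetical 'accounts' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    return conn


def transfer(src, dst, src_id: int, dst_id: int, amount: int) -> bool:
    """Debit on one node, credit on another, each in its own local transaction."""
    # Small transaction 1: touches only the source node.
    try:
        with src:  # commits on success, rolls back if an exception escapes
            cur = src.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src_id)
            )
            if cur.rowcount != 1:
                raise LookupError("source account not found")
    except (sqlite3.IntegrityError, LookupError):
        return False  # nothing committed yet, so nothing to undo
    # Small transaction 2: touches only the destination node.
    try:
        with dst:
            cur = dst.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst_id)
            )
            if cur.rowcount != 1:
                raise LookupError("destination account not found")
    except (sqlite3.Error, LookupError):
        # Compensating transaction: the application undoes the committed debit.
        with src:
            src.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, src_id)
            )
        return False
    return True
```

A crash between the two steps would still leave the nodes inconsistent; handling that case (for example, by durably recording pending steps and retrying them) is part of the robustness the application has to provide.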

2. Cross-node join issues

Having discussed the possible introduction of distributed transactions, let's now look at problems involving cross-node joins.

After the data is segmented, some existing join statements may no longer work, because the tables a join reads from may have been split across multiple MySQL Servers.

What can we do? From the MySQL database's point of view, if the problem has to be solved directly on the database side, the only option is probably MySQL's special FEDERATED storage engine. The FEDERATED storage engine is MySQL's answer to the kind of problem Oracle solves with DBLINK.

The main difference from Oracle's DBLINK is that FEDERATED keeps a local copy of the remote table's structure definition. At first glance, FEDERATED looks like a great way to handle cross-node joins. But we should also be aware that if the remote table structure changes, the local table definition does not change with it. If the local FEDERATED table definition is not updated after the remote table structure changes, queries are likely to fail or return incorrect results.

To deal with this kind of problem, I still recommend handling it in the application: first retrieve the driving result set from the MySQL Server that holds the driving table, then use that result set to fetch the corresponding rows from the MySQL Server that holds the driven table. Many readers may think this will affect performance. It does have some negative impact, but apart from this approach there are basically few better ways to solve the problem.
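The two-step approach can be sketched as below, again with SQLite in-memory databases standing in for two MySQL Servers. The `orders` (driving) and `users` (driven) tables and the `app_join` helper are hypothetical. The application first reads the driving rows, collects their join keys, fetches the matching driven rows with a single `IN (...)` query, and then combines the two in memory.

```python
import sqlite3


def app_join(driving: sqlite3.Connection, driven: sqlite3.Connection, min_total: int) -> list:
    """Join orders (on one node) with users (on another) inside the application."""
    # Step 1: fetch the driving result set from the server holding the driving table.
    orders = driving.execute(
        "SELECT id, user_id, total FROM orders WHERE total >= ? ORDER BY id", (min_total,)
    ).fetchall()
    if not orders:
        return []
    # Step 2: use the join keys to fetch matching rows from the driven table's server.
    user_ids = sorted({user_id for _, user_id, _ in orders})
    placeholders = ",".join("?" for _ in user_ids)
    users = dict(driven.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
    ).fetchall())
    # Step 3: combine in memory as an inner join (orders without a user are dropped).
    return [(oid, users[uid], total) for oid, uid, total in orders if uid in users]
```

Batching the keys into one `IN` list keeps the extra cost to one additional round trip per join rather than one per driving row; a very large key set would need to be split into several batches.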

Moreover, because the database now scales well, the load on each MySQL Server can be kept under better control. Looking at a single query alone, its response time may even improve compared with before segmentation, so the negative impact on performance is not very large. Besides, queries that need cross-node joins are not common; they probably make up only a very small part of the overall workload. So, considering overall performance, an occasional small sacrifice is worthwhile. After all, system optimization is itself a process of many trade-offs and balances.

3. Cross-node merge, sort, and pagination problems

Once data has been segmented horizontally, not only may cross-node joins stop working, but the data source of some sorted, paginated queries may also be split across multiple nodes. The direct consequence is that these sorted, paginated queries can no longer run normally. This is really the same situation as a cross-node join: the data resides on more than one node, and we want to resolve it with a single query. FEDERATED can likewise solve part of the problem, with the same risks.

Again, what can we do? I still recommend solving it in the application.

How? The approach is broadly similar to that for cross-node joins, with one difference. A join usually has a driving/driven relationship, so reads from the tables involved generally have to happen in sequence. Sorted pagination is different: its data source is essentially a single table (or a single result set), and its parts have no ordering dependency on each other, so data can be fetched from the multiple data sources completely in parallel.
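A minimal sketch of cross-node sorted pagination, assuming each shard can return its rows already sorted. Here `fetch_top` stands in for an `ORDER BY ... LIMIT` query on one MySQL Server, and the shards are plain Python lists. Each shard is queried in parallel for its first offset + limit rows, and the sorted partial results are combined with a k-way merge before the requested page is sliced out.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor


def fetch_top(shard: list, n: int) -> list:
    """Stand-in for 'SELECT ... ORDER BY v LIMIT n' on one MySQL Server."""
    return sorted(shard)[:n]


def paginate(shards: list, offset: int, limit: int) -> list:
    """Return rows [offset, offset + limit) of the globally sorted data."""
    # No shard knows how many of its rows precede the page globally,
    # so each must return its first offset + limit rows.
    need = offset + limit
    # The shards have no ordering dependency on each other: query them in parallel.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial = list(pool.map(lambda s: fetch_top(s, need), shards))
    # Each partial list is already sorted, so a k-way merge gives the global order.
    merged = heapq.merge(*partial)
    return list(itertools.islice(merged, offset, offset + limit))
```

For instance, with shards `[1, 4, 7, 10]`, `[2, 5, 8, 11]`, and `[3, 6, 9, 12]`, asking for offset 3 and limit 4 yields the fourth through seventh values of the combined sorted data.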

As a result, fetching data for sorted pagination can be more efficient than for cross-database joins, so the performance loss is relatively small, and in some cases it may even be faster than on the original, unsegmented database.

Of course, both cross-node joins and cross-node sorted pagination make our application servers consume more resources, especially memory, because reading and merging the result sets involves much more data than before.
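To put a rough number on that memory cost, assume the straightforward scheme in which each of $N$ shards returns its first $o + l$ rows for a page at offset $o$ with page size $l$. The application then receives and merges

$$R = N\,(o + l)$$

rows to return only $l$ of them. For example, with $N = 4$ shards and 20 rows per page, page 100 has $o = 99 \times 20 = 1980$, so each shard returns 2000 rows and $R = 4 \times 2000 = 8000$ rows are merged to produce 20. Deep pages therefore become expensive quickly.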

Having read this far, many readers may notice that for all of the problems above, my advice is basically to solve them in the application. You may already be muttering to yourself: is it because I am a DBA that I am pushing so much work onto application architects and developers?

That is not the case at all. First, because of its nature, an application can easily be made highly scalable, but a database is different: it has to be scaled in other ways. And in the course of scaling it, it is hard to avoid cases where something a centralized database could handle becomes a problem once the data is split across a database cluster.

To maximize the scalability of the overall system, we have no choice but to let the application do more, handling the problems that the database cluster cannot solve well.


Splitting one large MySQL Server into several smaller MySQL Servers removes the write performance bottleneck and once again increases the scalability of the entire database cluster. Whether through vertical or horizontal segmentation, the system becomes less likely to hit a bottleneck. In particular, when vertical and horizontal segmentation are combined, in theory we will no longer encounter a scaling bottleneck.
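As an illustration of combining the two methods, the sketch below routes a row first by table (vertical segmentation, with each table owned by its own group of servers) and then by a sharding key within that group (horizontal segmentation). The server names, the `TABLE_GROUPS` layout, and the CRC32-modulo rule are hypothetical choices made for the example.

```python
import zlib

# Hypothetical layout: vertical segmentation assigns each table to its own group
# of servers; horizontal segmentation then spreads that table's rows across the
# servers in the group by a sharding key.
TABLE_GROUPS = {
    "users": ["user-db-0", "user-db-1"],
    "orders": ["order-db-0", "order-db-1", "order-db-2"],
    "products": ["product-db-0"],  # small table: vertically split only
}


def route(table: str, shard_key: str) -> str:
    """Return the server that stores the row of `table` identified by `shard_key`."""
    group = TABLE_GROUPS[table]  # vertical step: which group owns this table
    # Horizontal step: a stable hash of the key picks a server within the group.
    # crc32 is used because Python's built-in hash() of str varies between runs.
    return group[zlib.crc32(shard_key.encode("utf-8")) % len(group)]
```

With a plain modulo rule, adding a server to a group changes the target of most keys, so real deployments often use a lookup directory or consistent hashing instead; the modulo rule here just keeps the sketch short.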
