Original article: http://blog.csdn.net/bluishglc/article/details/7696085. Please credit the source when reposting. This article focuses on the sharding segmentation strategy. If you lack a basic understanding of database sharding, please refer to my other article, which gives a comprehensive introduction starting from the basic theory: The basic idea of database sharding and segmentation strategy
Part I: Implementation strategy
Figure 1. Database sharding implementation strategy diagram
1. Preparation phase
Before sharding a database, developers need a thorough understanding of the system's business logic and database schema. A good practice is to draw a database ER diagram or a domain model diagram and divide the shards on that diagram; this is intuitive and helps developers keep a clear picture throughout. Whether to use the ER diagram or the domain model diagram depends on the project. If the project follows a data-driven development approach and the team uses the database ER diagram as the basis for business communication, the ER diagram is the natural choice. If the project follows a domain-driven approach and has built a good domain model through O/R mapping, the domain model diagram is the better choice. Personally, I prefer the domain model diagram: segmentation relies mostly on business analysis and judgment, and the domain model is clearer and more intuitive for that purpose.
2. Analysis phase
1. Vertical segmentation
The principle of vertical segmentation is to group together tables that are close in business terms and closely related to each other, such as the tables of one module. Using the ER or domain model diagram prepared earlier, borrow the swim lane concept from activity diagrams: each lane represents one shard, and every table is placed in one of the lanes. The example later in this article shows this practice. Of course, you can also simply circle the groups in pencil on a printed ER or model diagram, whichever you prefer.
2. Horizontal segmentation
After vertical segmentation, analyze the data volume and growth rate of the tables in each shard to decide whether horizontal segmentation is also required.
2.1. If the tables grouped together grow slowly, and a single database can host them for a sufficiently long time after the product goes live, no horizontal segmentation is needed. All the tables reside in the same shard, all relationships between the tables are preserved, and SQL can be written freely, without restrictions on clauses such as JOIN, GROUP BY, and ORDER BY.
2.2. If the tables grouped together hold a large amount of data that grows rapidly, further horizontal segmentation is needed. It is done as follows:
2.2.1. Based on business logic and inter-table relationships, divide the current shard into smaller shards. Typically, each smaller shard contains exactly one primary table (the table whose ID is hashed to locate data) and several secondary tables that are directly or indirectly associated with it. A shard made up of one primary table and several secondary tables is the inevitable result of horizontal segmentation. Splitting this way makes the number of shards grow quickly. If each shard were a separate database, managing and maintaining them would be very cumbersome, and because these small shards often hold only two or three tables, giving each one its own database would be poor utilization. So after horizontal segmentation you can do a "reverse merge": place two or more shards that are close in business terms and have similar data growth rates (primary tables of the same order of magnitude) on the same database. Logically they remain separate shards, each with its own primary table and each hashed by the ID of its own primary table. The one constraint is that their hash modulus (that is, the number of nodes) must be the same. This keeps the number of tables on each database node roughly even.
2.2.2. Once all tables have been assigned to the appropriate shards, every relationship that crosses shards must be broken. When writing SQL, JOIN, GROUP BY, and ORDER BY across shards are not allowed; these operations have to be handled at the application level.
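As a rough illustration of what "handled at the application level" can mean, the sketch below merges per-shard results in Python. It assumes each shard has already run its own ORDER BY or GROUP BY locally; the function names and the row shape (dicts with a "total" field) are hypothetical and not part of any sharding framework.

```python
import heapq
from collections import Counter


def merged_order_by(shard_results, key):
    """Merge result lists that each shard has already sorted by `key`."""
    # heapq.merge performs a streaming k-way merge of pre-sorted inputs.
    return list(heapq.merge(*shard_results, key=key))


def merged_group_count(shard_counts):
    """Add up per-shard GROUP BY ... COUNT(*) results, given as dicts."""
    total = Counter()
    for counts in shard_counts:
        total.update(counts)  # Counter.update adds counts, it does not replace
    return dict(total)


# Hypothetical rows returned by two shards, each sorted by "total".
shard0 = [{"id": 1, "total": 10}, {"id": 4, "total": 30}]
shard1 = [{"id": 2, "total": 20}, {"id": 3, "total": 40}]
ordered = merged_order_by([shard0, shard1], key=lambda row: row["total"])

# Hypothetical per-shard counts grouped by order status.
counts = merged_group_count([{"paid": 2, "open": 1}, {"paid": 3}])
```

A cross-shard JOIN usually has to be replaced in a similar spirit: query one shard first, then look up the related rows on the other shard by ID.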
It is worth noting that after horizontal segmentation, shard granularity is usually finer than with vertical segmentation alone: the original single vertical shard is subdivided into one or more shards, each centered on a primary table with several directly or indirectly associated secondary tables. At this granularity a shard coincides with, and can be said to be exactly the same as, the "aggregate" concept in domain-driven design: the primary table of each shard is the aggregate root.
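Hashing by the primary table's ID, together with the "reverse merge" from 2.2.1, can be sketched roughly as follows. This is a minimal illustration that assumes integer IDs and plain modulo hashing; the class, function, and table names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalShard:
    primary_table: str
    secondary_tables: tuple
    modulus: int  # number of database nodes the shard is spread over

    def node_for(self, primary_id: int) -> int:
        """Node index for a primary-table row and its secondary rows."""
        return primary_id % self.modulus


def reverse_merge(*shards):
    """Return the tables every shared node will hold.

    Shards may share nodes only if their hash modulus is the same.
    """
    if len({shard.modulus for shard in shards}) != 1:
        raise ValueError("merged shards must use the same hash modulus")
    tables = []
    for shard in shards:
        tables.append(shard.primary_table)
        tables.extend(shard.secondary_tables)
    return tables


# Two hypothetical logical shards placed on the same four nodes.
shard_a = LogicalShard("table_a", ("table_a_detail",), modulus=4)
shard_b = LogicalShard("table_b", ("table_b_detail", "table_b_log"), modulus=4)
tables_on_each_node = reverse_merge(shard_a, shard_b)
```

Each logical shard still routes by its own primary table's ID, so a row of table_a and a row of table_b that refer to each other are not guaranteed to land on the same node.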
3. Implementation phase
If the decision to shard is made at the start of development, simply follow the design produced in the analysis phase. If sharding is introduced midway as part of architecture evolution, then besides building the infrastructure that implements the sharding logic (covered in the next article), you also need to review the existing SQL and modify the statements affected by sharding.
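Reviewing existing SQL can be partly mechanized. The sketch below is a deliberately crude heuristic, not a SQL parser: it flags any statement that mentions tables from more than one shard, so that a developer can inspect it by hand. The table-to-shard mapping and the sample statements are assumptions made up for illustration.

```python
import re

# Hypothetical assignment of tables to shards.
TABLE_TO_SHARD = {
    "account": "user",
    "profile": "user",
    "orders": "order",
    "lineitem": "order",
}

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def shards_touched(sql):
    """Return the set of shards whose table names appear in the statement."""
    tokens = {token.lower() for token in _IDENTIFIER.findall(sql)}
    return {TABLE_TO_SHARD[t] for t in tokens if t in TABLE_TO_SHARD}


def needs_review(sql):
    """Flag statements that mention tables from more than one shard."""
    return len(shards_touched(sql)) > 1


cross = "SELECT * FROM orders o JOIN account a ON o.userid = a.userid"
local = "SELECT * FROM orders o JOIN lineitem l ON o.orderid = l.orderid"
```

Because it matches bare identifiers, the check can produce false positives (for example a column that happens to share a table's name), which is acceptable for a review checklist. It also does not catch statements that stay within one shard but omit the ID used for hashing; those need separate attention.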
Part II: Sample demo
This article uses one of the best-known sample applications, JPetStore, to demonstrate the analysis-phase work of sharding. For personal reasons, the demo uses the JPetStore from the official iBATIS demo, SVN address: http://mybatis.googlecode.com/svn/tags/java_release_2.3.4-726/jpetstore-5. JPetStore's business logic is not described here; it is a very simple prototype of an e-commerce system, with the following domain model:
Figure 2. JPetStore domain model
Because the system is simple, the model makes it easy to see that it consists of three main modules: users, products, and orders. That gives us the vertical segmentation scheme. Next comes horizontal segmentation. Thinking of a real pet store, the tables most likely to see explosive data growth are Account and Order, so these two tables need horizontal segmentation. For the product module in a real system, the number of Product and Item rows would not be very large, so vertical segmentation alone would be enough: the five tables (Product, Category, Item, Inventory, Supplier) would sit on one database node, with no horizontal segmentation and therefore no second node. For the purposes of this demo, however, we assume the product module also holds a large amount of data and needs horizontal segmentation. Analysis then splits the module into two shards: one is (Product (primary), Category), the other is (Item (primary), Inventory, Supplier). We also judge that these two shards should grow at similar rates and are close in business terms, so we can place both on the same database nodes, with Item and Product using the same modulus when hashing. Following the drawing method described earlier, we get the following sharding schematic:
Figure 3. JPetStore sharding schematic
A few more notes on this figure:
1. Lanes represent physical shards (a database node).
2. If a shard produced by vertical segmentation is further split horizontally but the resulting shards share one physical shard, a dashed box marks each of them as a logically independent shard.
3. Dark entities are primary tables.
4. An X marks an association between tables that must be broken.
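The layout described for JPetStore can also be written down as data, which makes the rules easy to check in code. The sketch below is an illustration only: it lists just the tables named in the text, assumes string IDs hashed with CRC32, and uses a hypothetical node count of 2 for every lane.

```python
import zlib

NODE_COUNT = 2  # hypothetical; shared by every logical shard in a lane

# Lane (physical shard) -> logical shards -> tables.
# The first table in each list is the primary table.
LAYOUT = {
    "user": {"account": ["Account"]},
    "product": {
        "product": ["Product", "Category"],
        "item": ["Item", "Inventory", "Supplier"],
    },
    "order": {"order": ["Order"]},
}


def locate(table):
    """Return (lane, logical_shard) for a table name."""
    for lane, shards in LAYOUT.items():
        for shard, tables in shards.items():
            if table in tables:
                return lane, shard
    raise KeyError(table)


def must_break(table_a, table_b):
    """An association is broken when the tables are in different logical shards."""
    return locate(table_a) != locate(table_b)


def node_for(table, primary_id):
    """Pick a node in the table's lane from the ID of its shard's primary row."""
    lane, _ = locate(table)
    index = zlib.crc32(primary_id.encode("utf-8")) % NODE_COUNT
    return f"{lane}-{index}"
```

Under this layout, an association between Item and Product would have to be broken even though both tables live in the same lane, because each logical shard is hashed by its own primary table's ID. An association between Item and Inventory stays intact, since Inventory rows follow their Item.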