Part I: Implementation strategy
Figure 1: Database sub-list (sharding) Implementation strategy diagram (click to view the larger image)
1. Preparation Phase
Before the database is partitioned into tables (sharding), it is necessary for developers to fully understand the system business logic and database schema. A good suggestion is to draw a database ER diagram or domain model diagram that is based on this type of diagram, which is intuitive and easy to use, to ensure that developers always stay awake. Choose the database ER diagram or the domain model diagram according to the project's own situation. If the project uses a data-driven development approach, the team takes the database ER diagram as the basis for business communication, then naturally chooses the database ER diagram, and if the project uses a domain-driven development approach and builds a good domain model through or-mapping, then the domain model diagram is undoubtedly the best choice. As far as I am concerned, I prefer to use the domain model diagram, because the segmentation is more business-based analysis and judgment, the domain model is undoubtedly more clear and intuitive.
2. Analysis phase
1. Vertical Slicing
Vertical segmentation is based on the principle that the tables that are closely related to each other, such as tables in the same module, are grouped together. In conjunction with the prepared database ER diagram or domain model diagram, modeled on the Swimlane concept in the activity diagram, a swimlane represents a shard and divides all the tables into different lanes. The following analysis example shows this practice. Of course, you can also use a pencil circle directly on a printed ER or model diagram, depending on your preferences.
2. Horizontal slicing
After vertical segmentation, you need to further analyze the amount of data and the growth rate of the tables within the Shard to determine if horizontal segmentation is required.
2.1If the tabular data is slow to grow, and can be hosted by a single database for a long enough period of time after the product is online, you do not need to slice horizontally, all tables reside in the same shard, all relationships between the tables are kept to a maximum, and the degree of freedom of writing SQL is guaranteed. Not easily restricted by clauses such as join, group BY, order by, and so on.
2.2If the amount of tabular data is large and rapid, it needs to be further divided horizontally. Further horizontal segmentation is done in this way:
2.2.1In combination with business logic and inter-table relationships, the current shard is divided into smaller shard, typically these smaller shard each contain only one primary table (the table that will be hashed with that table ID) and multiple secondary tables associated with or indirectly associated with it. The condition of a shard table is the inevitable result of horizontal slicing. This way, the number of Shard will increase rapidly. If each shard represents a separate database, then managing and maintaining the database will be cumbersome, and these small shard tend to have only two or three tables, creating a new library for this purpose, and the utilization is not high, so you can do a "reverse merge" once the horizontal slice is complete, that is: close the business , and two or more shard with similar data growth rates (the primary table data volume at the same order of magnitude) are placed on the same database, logically they remain independent Shard, have their own primary table, and hash according to their main table ID. The difference is that their hash modulus (that is, the number of nodes) must be consistent. In this way, the number of tables on each database node is relatively average.
2.2.2.After all tables have been partitioned into the appropriate shard, all inter-table associations across the Shard must be interrupted, and when writing SQL, the join, group by, and order by will be banned across the Shard and need to be coordinated at the application level to address these issues.
Special want to mention: after the horizontal segmentation, shard is often smaller than the size of the vertical cut, the original single vertical Shard will be subdivided into one or more shard with one main table as the center or indirectly associated with multiple sub-tables, at this time the Shard granularity and domain-driven design "aggregation" The concept coincides with, and even is, exactly the same, and the main table of each shard is the aggregation root in an aggregation!
3. Implementation phase
If the project at the beginning of the decision to carry out the sub-database, then strictly in accordance with the analysis of the design plan to advance. If implemented in the evolution of the medium-term architecture, in addition to building the infrastructure that implements Sharding logic (as discussed in the next article), it is necessary to filter the original SQL one by one and modify the SQL that is affected by sharding.
Part II: Sample Demo
This article chooses a well-known application: Jpetstore to demonstrate how the sub-database (sharding) works in the analysis phase. For some personal reasons, the demo used Jpetstore from a demo version of the original Ibatis official, SVN address: http://mybatis.googlecode.com/svn/tags/java_release_ 2.3.4-726/jpetstore-5. The business logic about Jpetstore is not introduced here, it is a very simple prototype of the e-commerce system, and its domain model is as follows:
Figure 2. Jpetstore Domain Model
As the system is simpler, we can easily see from the model that it consists of three modules: User, product and order. Then the vertical slicing scheme is out. Next look at horizontal slicing, if we start with a real pet store, the single table that could be a spike in data should be account and order, so the two tables need to be sliced horizontally. For the product module, if it is an actual system, the number of product and Item is not very large, so it is sufficient to do only vertical slicing, i.e. (product,category,item,iventory, Supplier) Five tables on a database node (no horizontal segmentation, there will be no more than two database nodes). But as a demonstration, we assume that the product module also has a large number of data need we do horizontal segmentation , then the analysis of this module to split two shard: one is (product (main), Category), and the other is (Item (master), Iventory,supplier), at the same time, we think: These two shard in data growth should be similar, and in business is also very close , then we can put these two shard on the same database node, The item and the product data are hashed with the same modulus . According to the drawing method described in the previous article, we get the following sharding:
Figure 3. Jpetstore sharding
A few more points for this picture:
1. Use Swimlane to represent physical shard (a database node)
2. If the vertical cut-off shard is further horizontally segmented, but the common one is a physical shard, it is framed by a dashed line, indicating that it is logically a separate shard.
3. The dark body represents the main table
4.X indicates inter-table associations that need to be interrupted
Database Repository sub-table (sharding)