Database sharding series (I) sharding implementation policies and Examples

Source: Internet
Author: User
Tags database sharding

The original Article connection: http://blog.csdn.net/bluishglc/article/details/7696085, reproduced please indicate the source! This article focuses on sharding policies. If you do not have a basic understanding of database sharding, please refer to my other article about sharding from the basic theory: basic ideas and sharding strategies of database sharding.


Part 1: Implementation Strategy

Figure 1. Implementation Policy diagram of database sharding (click to view the big picture)

1. preparation phase

Before sharding databases, developers must fully understand the system business logic and database schema. A good suggestion is to draw a database erdiagram or domain model diagram. Based on this type of diagram, Shard is divided, which is intuitive and easy to use, so that developers can always stay awake. The database erdiagram or domain model diagram should be selected based on the project's own situation. If the project uses data-driven development and the team uses the database erdiagram as the basis for business communication, the database erdiagram will naturally be selected. If the project uses a domain-driven development method, and a good domain model is built through or-mapping, so the domain model diagram is undoubtedly the best choice. Personally, I prefer to use domain model diagrams because Domain Models are clearer and more intuitive when splitting data based on business analysis and judgment.

2. Analysis Phase

1. Vertical Split
Vertical splitting is based on the principle of dividing tables with close business and close association among tables, such as tables in the same module. Based on the prepared database erdiagram or domain model diagram, a swimming track represents a shard and all the tables are divided into different swimming paths. The following analysis example shows this practice. Of course, you can also use a pencil to circle the printed ERTU or model graph, depending on your preferences.

2. horizontal segmentation
After Vertical splitting, You need to further analyze the data volume and growth rate of the table in the Shard to determine whether horizontal splitting is required.

2.1If the table data that is grouped together grows slowly and can be carried by a single database for a long enough period after the product is launched, horizontal splitting is not required, all Tables reside in the same Shard, and the association between all tables is retained to the maximum extent. At the same time, the SQL writing degree of freedom is ensured, which is not restricted by clauses such as join, group by, and order.

2.2If the data size of a table is large and the growth rate is fast, further horizontal segmentation is required. Further horizontal segmentation is implemented as follows:

2.2.1. Based on the business logic and the relationship between tables, the current Shard is divided into several smaller shard. Generally, each of these smaller shard tables only contains one primary table (a table with the table ID hashed) and multiple associated or indirectly associated secondary tables. This kind of situation where one shard and one primary table have multiple tables is the inevitable result of horizontal splitting. In this way, the number of shard instances increases rapidly. If each shard represents an independent database, it will be very troublesome to manage and maintain the database, and these small shard often only have two or three tables. Therefore, a new database is created, and the utilization is not high, therefore, after the horizontal split is complete, you can perform a "reverse merge" operation again, that is, the business is similar and the data growth rate is similar (the data volume of the master table is on the same order of magnitude) and put two or more shard in the same database. Logically, they are still independent Shard, have their own master tables, and hash them based on the IDs of independent tables, the difference is that their hash modulo (that is, the number of nodes) must be consistent. In this way, the number of tables on each database node is relatively average.

2.2.2.After all the tables are divided into appropriate Shard, all the tables that span the shard must be interrupted. When writing an SQL statement, cross-shard join, group by, and order by will be disabled, and these problems need to be resolved in coordination at the application layer.

In particular, after horizontal splitting, the shard granularity is usually smaller than that of vertical cutting, A single vertical Shard is subdivided into one or more shard tables with one master table as the center or indirectly associated with multiple tables, at this time, the shard granularity is consistent with the concept of "aggregation" in the domain-driven design. It can be said that the master table of each shard is exactly the aggregation root in the aggregation!

3. Implementation Phase

If the project decides to perform database/table sharding at the beginning of development, it should be promoted in strict accordance with the analysis and design scheme. If it is implemented in the mid-term Architecture Evolution, in addition to setting up the infrastructure for implementing the sharding logic (this topic will be described in the next article), you also need to filter and analyze the original SQL statements one by one, modify the SQL statements that are affected by sharding.

Part 2: Demo

This article selects a well-known application: jpetstore to demonstrate how to perform sharding in the analysis phase. For some personal reasons, the jpetstore used for the demonstration comes from an official demo version of the original ibatis. The SVN address is http://mybatis.googlecode.com/svn/tags/java_release_2.3.4-#/jpetstore-5. The business logic of jpetstore is not described here. This is a very simple e-commerce system prototype. Its domain model is as follows:

Figure 2. jpetstore Domain Model

Because the system is relatively simple, we can easily see from the model that it is mainly composed of three modules: users, products and orders. Then the Vertical Split solution will come out. Next, let's look at horizontal splitting. If we consider a pet store, the single table that may surge in data should be account and order. Therefore, the two tables need to be horizontally split. For the product module, if it is an actual system, the number of products and items is not very large, so only vertical splitting is enough, that is, (product, category, item, iventory, supplier) five tables on one database node (there is no horizontal split, there will be no more than two database nodes ).But as a demonstration, we assume that the product module also has a large amount of data that requires horizontal segmentation.According to the analysis, this module split two shard: one is (product (main), category), the other is (item (main), iventory, supplier ),At the same time, we believe that the two shard should be similar in terms of data growth and closely related business.So we canPlace the two shard instances on the same database node. The item and product data are in the same mode in the hash.. Based on the drawing method described above, we get the following sharding:

Figure 3. jpetstore sharding

The following points are described in this figure:

1. Use a swimming track to represent a physical shard (a database node)

2. IfVertical Shard is further horizontally split, but when a physical Shard is used, it is placed in a dotted box, indicating that it is logically an independent shard.

3. Dark entities represent the primary table

4. x indicates the association between tables to be interrupted.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.