MySQL horizontal segmentation

Last Update:2013-11-25 Source: Internet

Author: User

Tags field table md5 hash


In medium and large projects, a single MySQL database or table can hold only so much data, so the database or its tables are usually split horizontally to reduce the pressure on any one of them. Here I will introduce two table-splitting methods that are common in our projects. Both rely on a small routine in the program to route each request to a specific table.

First, we need to decide which field to split on. In our system (an SNS), the user's UID runs through the whole system and is a unique auto-increment value, so it is a good field to split tables on.

Method 1: use an MD5 hash. Hash the UID with md5 (a hash function, not encryption), take the first few hexadecimal characters of the result (here we take the first two), and use them to map different UIDs to different user tables (user_xx).

Php code

    function getTable($uid) {
        // The first two hex characters of the hash select one of 256 tables
        $ext = substr(md5((string) $uid), 0, 2);
        return "user_" . $ext;
    }

With this technique, UIDs are distributed across the tables user_00, user_01 ... user_ff. Because the UID is an incrementing number and md5 scatters its inputs, user data is divided roughly evenly among the user tables. However, there is a problem: as more and more users join the system, the data volume of every table inevitably grows, and the algorithm gives no way to add tables, because the number of tables is fixed by the length of the prefix. This brings us back to the problem at the beginning of the article.

Method 2: use a bit shift.

Php code

    function getTable($uid) {
        // Shifting right by 20 bits groups UIDs into blocks of 2^20 (1,048,576)
        return "user_" . sprintf("%04d", $uid >> 20);
    }

Here we shift the UID right by 20 bits. The first 1 million or so users (2^20 = 1,048,576 UIDs) go into the first table, user_0000, the next 1 million or so go into the second table, user_0001, and so on. If we get more and more users, we simply add another user table. Because the table suffix is four digits wide, we can have 10 thousand user tables, user_0000, user_0001 ... user_9999. With 10 thousand tables and about 1 million records in each, we can store about 10 billion user records. It does not matter if you have more user data than this: just widen the table suffix to allow more tables. For example, 100 billion records at about 1 million per table need about 100 thousand tables, which a five-digit suffix already covers (user_00000 to user_99999); a six-digit suffix leaves room for about 1 trillion records.

The algorithm above can also be written more flexibly:

Php code

    /**
     * Table sharding algorithm based on the UID
     * @param int $uid  User ID
     * @param int $bit  Number of digits kept in the table suffix
     * @param int $seed Number of bits to shift right
     */
    function getTable($uid, $bit, $seed) {
        return "user_" . sprintf("%0{$bit}d", $uid >> $seed);
    }

Conclusion: both methods require us to estimate the largest user data volume our system is likely to reach, and the maximum capacity of a single table in the database. For example, with the second solution, if we estimate that our system will have 10 billion users and the optimal size of a single table is 1 million records, we shift the UID right by 20 bits so that each table holds about 1 million records, and we reserve a four-digit suffix for the user table (user_xxxx) so that it can grow to 10 thousand tables.
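The capacity arithmetic for the shift method can be checked with a short sketch. The getTable() function follows the article's flexible version; shardCapacity() and its array keys are hypothetical names added here for illustration, and the sketch assumes 64-bit PHP so that the totals fit in an integer.

```php
<?php
// Sketch only. getTable() follows the article's flexible version;
// shardCapacity() is a hypothetical helper added for illustration.
// Assumes 64-bit PHP so that the totals fit in an int.

function getTable(int $uid, int $bit, int $seed): string
{
    // A right shift by $seed bits groups UIDs into blocks of 2^$seed.
    return "user_" . sprintf("%0{$bit}d", $uid >> $seed);
}

function shardCapacity(int $bit, int $seed): array
{
    $rowsPerTable = 1 << $seed;  // UIDs that map to one table
    $tableCount   = 10 ** $bit;  // suffixes available with $bit decimal digits
    return [
        'rowsPerTable' => $rowsPerTable,
        'tableCount'   => $tableCount,
        'totalRows'    => $rowsPerTable * $tableCount,
    ];
}

echo getTable(1048575, 4, 20), "\n"; // user_0000 (last UID of the first block)
echo getTable(1048576, 4, 20), "\n"; // user_0001 (first UID of the second block)

$cap = shardCapacity(4, 20);
echo $cap['rowsPerTable'], "\n";     // 1048576
echo $cap['tableCount'], "\n";       // 10000
echo $cap['totalRows'], "\n";        // 10485760000
```

A 20-bit shift gives 1,048,576 UIDs per table rather than exactly 1 million, so a four-digit suffix covers slightly more than 10 billion UIDs.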
Another example is the first solution. With 1 million records per table and the first two hexadecimal characters of the md5 hash as the suffix, there are only 256 tables, so the whole system holds 256 * 1 million = 256 million records. If the total data volume in your system is larger than that, you must take the first three, four, or more characters of the md5 hash. Both methods split data horizontally into different tables. Compared with the first method, the second method is more scalable...
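The relation between prefix length and table count in the md5 method can be sketched as follows. The names getHashTable() and hashTableCount(), and the sample of 10,000 UIDs, are assumptions made for this illustration and are not part of the original code.

```php
<?php
// Sketch only. getHashTable() and hashTableCount() are hypothetical names;
// the article's own function takes no prefix-length parameter.

function getHashTable(int $uid, int $prefixLen = 2): string
{
    // The first $prefixLen hex characters of md5(UID) become the table suffix.
    return "user_" . substr(md5((string) $uid), 0, $prefixLen);
}

function hashTableCount(int $prefixLen): int
{
    // Each hex character has 16 possible values.
    return 16 ** $prefixLen;
}

echo hashTableCount(2), "\n"; // 256
echo hashTableCount(3), "\n"; // 4096

// Count how a sample of sequential UIDs spreads over the tables.
$counts = [];
for ($uid = 1; $uid <= 10000; $uid++) {
    $table = getHashTable($uid);
    $counts[$table] = ($counts[$table] ?? 0) + 1;
}
printf(
    "tables used: %d, smallest: %d, largest: %d\n",
    count($counts),
    min($counts),
    max($counts)
);
```

Note that a three-character prefix always begins with the same two characters as the two-character prefix, so lengthening the prefix splits every existing table into 16 and the existing rows have to be moved. That is the scalability limit of this method compared with the shift method.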

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
