Two ideas for horizontal database sharding

Source: Internet
Author: User
Tags: field, table, md5, hash

In large and medium-sized projects, when designing the database, we split databases or tables horizontally to reduce the load on any single database or table, taking the expected maximum data volume into account. This article describes two table-splitting methods commonly used in projects. Both methods rely on some technique in the application code to route data to a specific table. First, we need to decide which field to split on. In our system (an SNS), the user's UID runs through the whole system and is a unique, auto-incrementing value, so it is the natural field to shard on.

Method 1: Use an MD5 hash

The method is to hash the UID with MD5 and take the first few characters of the digest (here we take the first two). Different UIDs are thereby hashed into different user tables (user_xx):

function getTable( $uid ) {
    // Take the first two hex characters of the MD5 digest (00 through ff).
    $ext = substr( md5( $uid ), 0, 2 );
    return "user_" . $ext;
}

With this technique we can distribute UIDs across 256 user tables: user_00, user_01, ... user_ff. Because the UID is an incrementing number and MD5 spreads its input values evenly, user data ends up divided roughly evenly across the different user tables.
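
To show how this routing might be used in application code, here is a minimal sketch; the PDO connection details and the user table schema (a uid column) are assumptions, not part of the original article.

// Minimal usage sketch (assumed DSN, credentials and column names).
$pdo = new PDO( 'mysql:host=localhost;dbname=sns', 'dbuser', 'dbpass' );

$uid   = 12345;
$table = getTable( $uid );   // e.g. "user_82", since md5("12345") starts with "82"

// The table name cannot be bound as a parameter, but interpolating it is safe
// here because getTable() only ever returns "user_" plus two hex characters.
$stmt = $pdo->prepare( "SELECT * FROM {$table} WHERE uid = ?" );
$stmt->execute( array( $uid ) );
$row = $stmt->fetch( PDO::FETCH_ASSOC );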

However, there is a problem: as more and more users join the system, the data volume of each table inevitably grows, and with this algorithm the number of tables cannot be extended, which brings us back to the problem stated at the beginning of the article.

Method 2: Bit shift

The specific method is:

function getTable( $uid ) {
    // Shift the UID right by 20 bits (i.e. divide by 2^20 = 1,048,576, roughly one million)
    // and format the result as a four-digit table suffix: user_0000 ... user_9999.
    return "user_" . sprintf( "%04d", $uid >> 20 );
}

Here we shift the UID 20 bits to the right, so roughly the first one million users (2^20 = 1,048,576 UIDs) land in the first table, user_0000, the next million in user_0001, and so on. As the number of users grows, we simply add more user tables. Because we reserve a four-digit suffix, we can add up to 10,000 tables, user_0000 through user_9999, and with one million rows per table that is room for about 10 billion user records. If you expect even more data than that, just reserve a longer suffix: for 100 billion records at one million rows per table, a six-digit suffix is enough.
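
A few concrete UIDs make the mapping easier to see (the sample values below are my own illustration):

echo getTable( 1 );         // user_0000  (1 >> 20 == 0)
echo getTable( 1048576 );   // user_0001  (2^20 == 1,048,576)
echo getTable( 2500000 );   // user_0002  (2,500,000 >> 20 == 2)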

The shift algorithm above can also be written more flexibly:

/**
 * Table sharding based on UID.
 *
 * @param int $uid  user ID
 * @param int $bit  number of digits reserved for the table suffix
 * @param int $seed number of bits to shift right
 */
function getTable( $uid, $bit, $seed ) {
    return "user_" . sprintf( "%0{$bit}d", $uid >> $seed );
}
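
For example (these particular calls are illustrative and not from the original article):

echo getTable( 2500000, 4, 20 );   // user_0002, equivalent to Method 2
echo getTable( 2500000, 6, 20 );   // user_000002, with a six-digit suffix reserved
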
Summary

Both methods require estimating, as accurately as possible, the maximum volume of user data the system will hold, as well as the optimal capacity of a single database table.

For example, with the second solution, if we estimate that the system will hold 10 billion user records and that the optimal size of a single table is one million rows, we shift the UID right by 20 bits so that each table holds about one million rows, and we reserve 10,000 user tables (user_0000 through user_9999) for expansion.

Take the first solution as another example: with one million rows per table and the first two characters of the MD5 digest, there are only 256 tables, so the system can hold 256 × 1 million rows in total. If your total data volume is larger than that, you must take the first three, four, or more characters of the MD5 digest.
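
As a rough capacity-planning sketch (my own illustration; the figures reuse the estimates above, and the helper itself is not from the original article), the suffix length and the MD5 prefix length can be derived from the estimated totals:

// Rough capacity planning (illustrative figures).
$totalRecords = 10000000000;   // estimated total records, e.g. 10 billion
$rowsPerTable = 1000000;       // target rows per table, e.g. 1 million

$tablesNeeded = (int) ceil( $totalRecords / $rowsPerTable );   // 10,000 tables

// Method 2: decimal digits needed to number tables 0 .. ($tablesNeeded - 1).
$suffixDigits = strlen( (string) ( $tablesNeeded - 1 ) );      // 4 digits

// Method 1: hex characters of the MD5 prefix needed (16^n tables).
$hexChars = (int) ceil( log( $tablesNeeded, 16 ) );            // 4 characters

echo "tables: {$tablesNeeded}, suffix digits: {$suffixDigits}, md5 prefix chars: {$hexChars}\n";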

Both methods split data horizontally into different tables. Compared with the first method, the second method is more scalable.
