Two thoughts on horizontal segmentation of database The problems that must be considered in the database to withstand the maximum amount of data

Source: Internet
Author: User
Tags md5 hash sprintf

In large and medium-sized projects, when the database is designed, the database or data table is usually split horizontally to reduce the pressure of a single library and a single table, taking into account the maximum amount of data that can be sustained. This section describes the data-table segmentation methods commonly used in two projects. Of course these methods are all in the program? Use a certain technique to route to a specific table. First of all, we need to confirm what to slice horizontally? In our system (SNS), the user's UID runs through the system, the only self-growth, according to this field sub-table, and better.

Method One: Use the MD5 hash

The procedure is to MD5 the UID and then take the first few (we take the first two bits) and then hash the different UID into a different user table (USER_XX).

1 functiongetTable( $uid){
2   $extsubstr( md5($uid) ,0 ,2 );
3   return"user_".$ext;
4 }

With this technique, we can scatter different UID into 256 user tables, user_00,user_01 ... user_ff. Because the UID is a number and increment, according to the MD5 algorithm, the user data can be almost evenly divided into different users table.

But there is a problem here, if we have more and more users of the system, it is bound to increase the amount of data on a single table, and according to this algorithm can not extend the table, this will return to the beginning of the article problems.

Method Two: Use Shift

The specific methods are:

1 publicfunction getTable( $uid) {
2  return"user_" . sprintf( "d", ($uid>> 20) );
3 }

Here, we will move the UID to the right 20 bits, so that we can put about the first 1 million of the user data in table user_0000, the second 1 million of the user data placed in the second table user_0001, so continue, if our users more and more, Just add the user table directly. Since we have reserved a table suffix of four bits, here we can add 10,000 user tables, namely user_0000,user_0001 ... user_9999. 10,000 tables, 1 million data per table, we can save 10 billion user records. Of course, if you have more user data than this, it does not matter, you just change the reserved table suffix to increase the table can be expanded, such as if there are 100 billion data, each table 1 million, then you need 100,000 tables, we just keep the table suffix 6 bits.

The above algorithm can also write a flexible point:

01 /**
02  * 根据UID分表算法
03  *
04  * @param int $uid  //用户ID
05  * @param int $bit    //表后缀保留几位
06  * @param int $seed //向右移动位数
07  */
08 functiongetTable( $uid , $bit , $seed){
09   return"user_" . sprintf( "%0{$bit}d" , ($uid>> $seed) );
10 }
Summarize

Both of these methods make the largest possible estimate of the amount of user data in our current system and estimate the maximum capacity of a single table in the database.

For example, if we estimate that the user of our system is 10 billion and the optimal data volume for a single table is 1 million, then we need to move the UID 20来 to ensure that each table is 1 million data, and the user table (USER_XXXX) Four bits are reserved to expand 10,000 tables.

Again, as the first scheme, each table 1 million, MD5 after the first two, there can only be 256 tables, the total system database is: 256*100 million; If you have a system with more total data than this, you will definitely want to MD5 the top three or four or even more bits.

Both methods are to slice the data horizontally into different tables, and the second method is more extensible than the first.

Go to (http://www.nowamagic.net/librarys/veda/detail/1528)

(go) Two ideas of horizontal segmentation of database The issues that must be considered in the database to withstand the maximum amount of data

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.