In medium and large projects, the database design usually splits databases or tables horizontally to reduce the load on any single database or table, based on an estimate of the maximum amount of data the system must sustain. Below I introduce two table-sharding methods we commonly use in our projects. Of course, with these methods the application needs a small routing function to map each record to its specific table. First, we must decide which field to shard on. In our system (an SNS), the user's UID runs through the whole system; it is unique and auto-incrementing, so sharding on this field works well.
Method One: Use the MD5 hash
The approach is to take the MD5 hash of the UID, keep the first few characters (we keep the first two), and thereby hash different UIDs into different user tables (user_xx):
function getTable($uid) {
    $ext = substr(md5($uid), 0, 2);
    return "user_" . $ext;
}
With this technique, we can scatter UIDs across 256 user tables: user_00, user_01 ... user_ff. Because the UID is an incrementing number and MD5 distributes its input uniformly, the user data ends up divided almost evenly across the different user tables.
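As a quick illustration outside PHP, the same routing logic can be sketched in Python (the function name `get_table` and the uniformity check are my own, not from the original article). PHP's `md5()` hashes the decimal string form of the number, which `str(uid)` reproduces here:

```python
import hashlib
from collections import Counter

def get_table(uid):
    # Route a UID to one of 256 tables via the first two hex
    # characters of the MD5 digest of its decimal string form,
    # giving table names user_00 ... user_ff.
    ext = hashlib.md5(str(uid).encode()).hexdigest()[:2]
    return "user_" + ext

# Rough uniformity check over sequential UIDs: each of the 256
# buckets should receive close to 1/256 of the keys.
counts = Counter(get_table(uid) for uid in range(100000))
print(len(counts))  # number of distinct tables actually hit
```

Running the check shows all 256 tables are used and the per-table counts stay close to the expected 100000 / 256 ≈ 390.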
But there is a problem here: as the system gains more and more users, the volume of data in each table keeps growing, and this algorithm offers no way to add tables. That brings us back to the problem posed at the start of the article.
Method Two: Use Shift
The specific method is:
function getTable($uid) {
    return "user_" . sprintf("%04d", $uid >> 20);
}
Here we shift the UID right by 20 bits, so roughly the first million users (2^20 = 1,048,576 UIDs) land in table user_0000, the second million in user_0001, and so on. If our user base keeps growing, we simply add more tables. Since we reserved a four-digit table suffix, we can have up to 10,000 tables: user_0000, user_0001 ... user_9999. With 10,000 tables of about 1 million rows each, we can store roughly 10 billion user records. And if you have even more data than that, it does not matter: just widen the reserved suffix to add more tables. For example, with 100 billion records at about 1 million per table you need 100,000 tables, so you reserve a six-digit suffix.
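The shift-based routing above can be sketched in Python as follows (the function name `get_table` is my own; the logic mirrors the PHP):

```python
def get_table(uid):
    # Shift the UID right by 20 bits: each table holds a contiguous
    # block of 2**20 (= 1,048,576, ~1 million) users, and the table
    # index is zero-padded to the reserved four-digit suffix.
    return "user_%04d" % (uid >> 20)

# The first ~1 million UIDs map to user_0000, the next block to
# user_0001, and so on.
print(get_table(1))          # an early user
print(get_table(2_000_000))  # falls in the second block
```

Note the contrast with the MD5 scheme: here consecutive UIDs fill one table completely before the next table receives any rows, which is what makes it trivial to add tables later.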
The algorithm above can also be written more flexibly:
/**
 * Sub-table routing by UID
 *
 * @param int $uid  user ID
 * @param int $bit  number of digits reserved for the table suffix
 * @param int $seed number of bits to shift the UID right
 */
function getTable($uid, $bit, $seed) {
    return "user_" . sprintf("%0{$bit}d", $uid >> $seed);
}
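A Python sketch of this generalized version, with the same parameters as the PHP (`bit` for the suffix width, `seed` for the shift; names taken from the article's doc comment):

```python
def get_table(uid, bit, seed):
    # bit  = number of digits reserved for the table suffix
    # seed = number of bits to shift the UID right
    # The '*' width specifier takes the pad width from the arguments,
    # mirroring PHP's sprintf("%0{$bit}d", ...).
    return "user_%0*d" % (bit, uid >> seed)

# The same UID routes to the same block; only the suffix width changes.
print(get_table(2_000_000, 4, 20))  # four-digit suffix
print(get_table(2_000_000, 6, 20))  # six-digit suffix for a larger plan
```

Keeping `bit` and `seed` as parameters lets you widen the suffix later without rewriting the routing function, as the summary below discusses.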
Summary
Both methods require estimating, up front, the maximum amount of user data our system will hold and the optimal capacity of a single table in the database.
For example, if we estimate that our system will reach about 10 billion users and that the optimal size of a single table is about 1 million rows, then we shift the UID right by 20 bits so that each table holds roughly 1 million records, and we reserve a four-digit suffix on the user tables (user_xxxx) so we can expand to 10,000 tables.
Likewise, in the first scheme, with 1 million rows per table and the first two MD5 characters as the suffix, there can only be 256 tables, so the system can hold at most 256 × 1 million = 256 million records in total. If your system has more data than that, you would need to take the first three, four, or even more characters of the MD5.
Both methods slice the data horizontally into different tables; the second method is more easily extended than the first.