In large and medium-sized projects, databases or tables are often split horizontally at design time to reduce the load on any single database or table, based on an estimate of the maximum data volume. This section describes two common table-splitting methods used in projects. Both methods rely on some technique in the application code to route data to a specific table. First, we need to decide which field to split on: in our system (an SNS), the user's UID runs through the whole system and is a unique, auto-incrementing integer, so splitting tables by this field works well.
Method 1: MD5 hashing
The method is to hash the UID with md5() and take the first few characters of the result (here we take the first two). Different UIDs are thus hashed into different user tables (user_xx).
function getTable( $uid ) {
    // Take the first two hex characters of the MD5 hash of the UID
    $ext = substr( md5( $uid ), 0, 2 );
    return "user_" . $ext;
}
With this technique, we can distribute UIDs across 256 user tables: user_00, user_01, ..., user_ff. Because the UID is an incrementing number and MD5 distributes values uniformly, user data is spread evenly across the different user tables.
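As a quick sketch of how this routing behaves (the UID values below are arbitrary examples, and the function mirrors the one above):

```php
<?php
// Minimal sketch of the MD5 routing described above.
function getTable($uid) {
    // First two hex characters of md5() give 256 possible suffixes (00..ff)
    return "user_" . substr(md5($uid), 0, 2);
}

echo getTable(1), "\n";     // user_c4  (md5("1") starts with "c4")
echo getTable(12345), "\n"; // user_82  (md5("12345") starts with "82")
```

Note that consecutive UIDs land in unrelated tables, which is exactly what spreads the load evenly.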
However, there is a problem: as the number of users in our system grows, the data volume of each table inevitably increases, and the number of tables cannot be extended under this algorithm, which brings us back to the problem stated at the beginning of the article.
Method 2: Bit shifting
The specific method is:
public function getTable( $uid ) {
    // Shift right by 20 bits: each block of 2^20 UIDs maps to one table
    return "user_" . sprintf( "%04d", $uid >> 20 );
}
Here, we shift the UID right by 20 bits, so roughly the first 1 million users (2^20 = 1,048,576) land in the first table, user_0000, the next million in the second table, user_0001, and so on; as the user base grows, we simply add more user tables. Because we reserve a four-digit suffix, we can have up to 10,000 tables: user_0000, user_0001, ..., user_9999. With 10,000 tables and 1 million records per table, we can store 10 billion user records. Of course, it doesn't matter if you have more data than this: just widen the reserved suffix to allow more tables. For example, with 1 million records per table, a six-digit suffix is more than enough for 100 billion records.
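The table boundaries can be checked with a small sketch (the UIDs below are chosen purely for illustration):

```php
<?php
// Sketch of the bit-shift routing: every 2^20 = 1,048,576 consecutive UIDs share a table.
function getTable($uid) {
    return "user_" . sprintf("%04d", $uid >> 20);
}

echo getTable(1), "\n";       // user_0000
echo getTable(1048575), "\n"; // user_0000 (last UID in the first table)
echo getTable(1048576), "\n"; // user_0001 (first UID in the second table)
echo getTable(5000000), "\n"; // user_0004
```

Unlike the MD5 approach, consecutive UIDs stay together here, so new tables only start receiving data once the previous block of UIDs is exhausted.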
The algorithm above can also be written more generically:
/**
 * Table-sharding algorithm based on UID.
 *
 * @param int $uid  user ID
 * @param int $bit  number of digits reserved for the table suffix
 * @param int $seed number of bits to shift right
 */
function getTable( $uid, $bit, $seed ) {
    return "user_" . sprintf( "%0{$bit}d", $uid >> $seed );
}
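A usage sketch of this generic version; the argument values below are illustrative, not prescriptive:

```php
<?php
// Generic sharding: $bit digits of table suffix, $seed bits shifted off the UID.
function getTable($uid, $bit, $seed) {
    return "user_" . sprintf("%0{$bit}d", $uid >> $seed);
}

// Four-digit suffix, ~1 million rows per table (as in method 2)
echo getTable(2500000, 4, 20), "\n"; // user_0002
// Six-digit suffix for a larger deployment
echo getTable(2500000, 6, 20), "\n"; // user_000002
```

Changing $seed changes how many rows each table holds (2^$seed), while $bit caps how many tables the naming scheme can grow to (10^$bit).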
Summary
Both methods require estimating, as accurately as possible, the total user data volume of the current system, along with the optimal capacity of a single table.
For example, in the second solution, if we estimate that the system will hold 10 billion user records and the optimal size of a single table is 1 million rows, we shift the UID right by 20 bits so that each table holds about 1 million rows, and reserve a four-digit suffix (user_xxxx) so the scheme can grow to 10,000 tables.
In the first solution, by contrast, taking the first two characters of the MD5 hash yields only 256 tables, so at 1 million rows per table the system tops out at 256 * 1 million records; if your total data volume exceeds that, you would have to take the first three, four, or more characters of the hash.
Both methods split data horizontally across different tables; compared with the first, the second is the more scalable.