If you need to perform mysql table sharding, we will prove that your database is relatively large, that is, to divide a table into N multiple small tables. After Table sharding, The concurrency of a single table is improved, disk I/O performance is also improved. Why is the concurrency improved? Because it takes a short time to query a table. If there is a high concurrency, the total table can allocate the concurrency pressure to different small tables based on different queries.
What is table sharding? On the surface, it means dividing a table into N small tables.
Mysql sub-tables are real sub-tables. After a table is divided into many tables, each small table is a positive table, which corresponds to three files. MYD data file ,. MYI index file ,. frm table structure file.
1. [root @ BlackGhost test] # ls | grep user
2. alluser. MRG
3. alluser. frm
4. user1.MYD
5. user1.MYI
6. user1.frm
7. user2.MYD
8. user2.MYI
9. user2.frm
1. Create a mysql cluster, such as using mysql cluster, mysql proxy, mysql replication, and drdb.
Some people will ask the mysql cluster, what is the relationship between the root table sharding? Although it is not a table sharding in the actual sense, it starts the role of table sharding. What is the significance of the cluster? To reduce the burden on a database, simply reduce the number of SQL statements in the SQL queue. For example, if there are 10 SQL requests in the queue of a database server, it takes a long time to allocate these 10 SQL requests to the queuing queues of five database servers. There are only two queues of one database server, is the waiting time greatly shortened? This is already obvious. So I listed it within the table sharding range. I have done some mysql clusters:
Installation, configuration, and read/write splitting of linux mysql proxy
Mysql replication is mutually active/standby installation and configuration, and Data Synchronization
Advantage: good scalability, no complex operations after multiple table shards (php code)
Disadvantage: The data volume of a single table remains unchanged, the time spent on one operation is still that large, and the hardware overhead is high.
2. Tables with large data volume and frequent access are estimated in advance and divided into several tables.
This estimation is very poor. The table posted in the Forum is certainly very large after a long time, which may be hundreds of thousands or even millions of users. The chat room information table contains dozens of people chatting for one night. After a long time, the data in this table must be large. There are many situations like this. Therefore, for this big data table that can be estimated, We will separate N tables in advance. The N value depends on the actual situation. Take the chat info table as an example:
First, we will create 100 such tables, message_00, message_01, message_02 .......... message_98, message_99. then, the user ID is used to determine which table the user's chat information is stored in. You can use the hash method to obtain the information, and the remainder method to obtain the information. There are many methods, let everyone think about it. The following uses the hash method to obtain the table name:
The Code is as follows: |
Copy code |
<? Php Function get_hash_table ($ table, $ userid ){ $ Str = crc32 ($ userid ); If ($ str <0 ){ $ Hash = "0". substr (abs ($ str), 0, 1 ); } Else { $ Hash = substr ($ str, 0, 2 ); } Return $ table. "_". $ hash; } Echo get_hash_table ('message', 'user18991 '); // The result is message_10. Echo get_hash_table ('message', 'user34523'); // The result is message_13. |
?> The above method tells us that all user messages of user18991 are recorded in message_10. All user messages of user34523 are recorded in message_13, you only need to read from the corresponding table.
Advantage: it avoids millions of data entries in a table and shortens the execution time of an SQL statement.
Disadvantage: when a rule is determined, it will be very troublesome to break this rule. In the above example, the hash algorithm I used is crc32. If I do not want to use this algorithm now, after md5 is used, messages of the same user will be stored in different tables, so that data is messy. Poor scalability.
Improve Performance
A) after table sharding, the single-Table concurrency capability is improved, and the disk I/O performance is also improved. Why is the concurrency improved? Because it takes a short time to query a table. If there is a high concurrency, the total table can allocate the concurrency pressure to different small tables based on different queries. How can I achieve high disk I/O performance? A very large. MYD file is now allocated to. MYD in each small table.