Mysql table sharding implementation

Last Update:2013-11-17 Source: Internet

Author: User

Tags crc32

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

If you need to perform mysql table sharding, we will prove that your database is relatively large, that is, to divide a table into N multiple small tables. After Table sharding, The concurrency of a single table is improved, disk I/O performance is also improved. Why is the concurrency improved? Because it takes a short time to query a table. If there is a high concurrency, the total table can allocate the concurrency pressure to different small tables based on different queries.

What is table sharding? On the surface, it means dividing a table into N small tables.

Mysql sub-tables are real sub-tables. After a table is divided into many tables, each small table is a positive table, which corresponds to three files. MYD data file ,. MYI index file ,. frm table structure file.

1. [root @ BlackGhost test] # ls | grep user
2. alluser. MRG
3. alluser. frm
4. user1.MYD
5. user1.MYI
6. user1.frm
7. user2.MYD
8. user2.MYI
9. user2.frm

1. Create a mysql cluster, such as using mysql cluster, mysql proxy, mysql replication, and drdb.

Some people will ask the mysql cluster, what is the relationship between the root table sharding? Although it is not a table sharding in the actual sense, it starts the role of table sharding. What is the significance of the cluster? To reduce the burden on a database, simply reduce the number of SQL statements in the SQL queue. For example, if there are 10 SQL requests in the queue of a database server, it takes a long time to allocate these 10 SQL requests to the queuing queues of five database servers. There are only two queues of one database server, is the waiting time greatly shortened? This is already obvious. So I listed it within the table sharding range. I have done some mysql clusters:

Installation, configuration, and read/write splitting of linux mysql proxy

Mysql replication is mutually active/standby installation and configuration, and Data Synchronization

Advantage: good scalability, no complex operations after multiple table shards (php code)

Disadvantage: The data volume of a single table remains unchanged, the time spent on one operation is still that large, and the hardware overhead is high.

2. Tables with large data volume and frequent access are estimated in advance and divided into several tables.

This estimation is very poor. The table posted in the Forum is certainly very large after a long time, which may be hundreds of thousands or even millions of users. The chat room information table contains dozens of people chatting for one night. After a long time, the data in this table must be large. There are many situations like this. Therefore, for this big data table that can be estimated, We will separate N tables in advance. The N value depends on the actual situation. Take the chat info table as an example:

First, we will create 100 such tables, message_00, message_01, message_02 .......... message_98, message_99. then, the user ID is used to determine which table the user's chat information is stored in. You can use the hash method to obtain the information, and the remainder method to obtain the information. There are many methods, let everyone think about it. The following uses the hash method to obtain the table name:

The Code is as follows:

Copy code

<? Php
Function get_hash_table ($ table, $ userid ){
$ Str = crc32 ($ userid );
If ($ str <0 ){
$ Hash = "0". substr (abs ($ str), 0, 1 );
} Else {
$ Hash = substr ($ str, 0, 2 );
}

Return $ table. "_". $ hash;
}

Echo get_hash_table ('message', 'user18991 '); // The result is message_10.
Echo get_hash_table ('message', 'user34523'); // The result is message_13.

?> The above method tells us that all user messages of user18991 are recorded in message_10. All user messages of user34523 are recorded in message_13, you only need to read from the corresponding table.

Advantage: it avoids millions of data entries in a table and shortens the execution time of an SQL statement.

Disadvantage: when a rule is determined, it will be very troublesome to break this rule. In the above example, the hash algorithm I used is crc32. If I do not want to use this algorithm now, after md5 is used, messages of the same user will be stored in different tables, so that data is messy. Poor scalability.

Improve Performance

A) after table sharding, the single-Table concurrency capability is improved, and the disk I/O performance is also improved. Why is the concurrency improved? Because it takes a short time to query a table. If there is a high concurrency, the total table can allocate the concurrency pressure to different small tables based on different queries. How can I achieve high disk I/O performance? A very large. MYD file is now allocated to. MYD in each small table.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More