MySQL Table Sharding Methods

I. Why table sharding is needed

When a table grows to several million rows, every query against it takes noticeably longer, and a join query may effectively hang. The purpose of table sharding is to reduce the load on the database and shorten query time.

 

From my experience, MySQL executes an SQL statement roughly as follows:

1. Receive the SQL statement;

2. Put the statement in the queue;

3. Execute the statement;

4. Return the result.

Where does most of the time go? First, waiting in the queue; second, executing the SQL. In fact these are the same thing: while one statement waits, the statements ahead of it are executing. Therefore, what we need to do is shorten SQL execution time.
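The queue argument can be put into numbers with a small model. This is a sketch under simplifying assumptions (one server, strictly first-in-first-out, a known execution time per statement), and the function name `queue_latencies` is hypothetical, not part of MySQL. Each statement's wait is the sum of the execution times of the statements ahead of it, so shortening execution time shortens waiting too.

```python
from itertools import accumulate


def queue_latencies(exec_times):
    """Model one server running statements strictly in arrival order (FIFO).

    Returns a list of (wait, finish) pairs: `wait` is the time a statement
    spends queued, `finish` is when its result is returned.
    """
    # A statement starts only when everything ahead of it has finished,
    # so its wait is the running total of the earlier execution times.
    waits = [0, *accumulate(exec_times)][:-1]
    return [(wait, wait + run) for wait, run in zip(waits, exec_times)]


if __name__ == "__main__":
    print(queue_latencies([2, 2, 2, 2]))  # [(0, 2), (2, 4), (4, 6), (6, 8)]
    print(queue_latencies([1, 1, 1, 1]))  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```

Halving every statement's execution time halves the last statement's total latency, from 8 units to 4, even though the queue length is unchanged.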

 

MySQL has table-level locking and row-level locking. Why? To ensure data integrity. For example, if two SQL statements need to modify the same record in the same table, what should happen? Can both statements modify that record at the same time?

 

Clearly not. MySQL handles this by locking the table (MyISAM storage engine) or the row (InnoDB storage engine). While I hold a table lock, no one else can operate on that table until I finish. Row locking works the same way: other SQL statements can operate on a row only after I have finished with it. With too much data, each statement runs longer and everyone else waits even longer, which is why we want to split tables.
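A deliberately simplified model makes the last point concrete. It is a hypothetical sketch, not how MyISAM or InnoDB actually schedule work (for instance, MyISAM lets readers share a table lock, and neither engine offers unlimited parallelism): every write takes `cost` time units, a table lock serializes all writes, and row locks serialize only writes to the same row.

```python
from collections import Counter


def finish_time_table_lock(row_ids, cost):
    """Table lock: writes run one after another, whatever rows they touch."""
    return len(row_ids) * cost


def finish_time_row_lock(row_ids, cost):
    """Row locks: only writes to the same row wait for each other.

    Assumes writes to different rows run fully in parallel, so the most
    contended row decides when the last write finishes.
    """
    return max(Counter(row_ids).values(), default=0) * cost


if __name__ == "__main__":
    writes = [1, 2, 3, 1]  # row ids touched by four UPDATE statements
    for cost in (2, 5):  # 5 stands for a slower statement on a larger table
        print(cost, finish_time_table_lock(writes, cost), finish_time_row_lock(writes, cost))
    # 2 8 4
    # 5 20 10
```

Row locks cut the waiting in half here, but under either lock type a slower statement multiplies everyone's waiting time, which is the argument for keeping tables small.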

 

II. Table sharding

1. Build a MySQL cluster, for example with MySQL Cluster, MySQL Proxy, MySQL replication or DRBD.

Someone may ask what a MySQL cluster has to do with table sharding. Although it does not actually split any tables, it achieves a similar effect. The point of building a cluster is to reduce the load on each database server; put simply, it reduces the number of SQL statements waiting in each server's queue.

 

For example, if 10 SQL requests are queued on one database server, the last one waits a long time. If those 10 requests are distributed across the queues of five database servers, each server has only two requests queued, so the waiting time is obviously much shorter.
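The arithmetic of that example can be sketched as follows. The round-robin assignment is an assumption (a proxy or load balancer may use another policy), as is the premise that every server can answer every request, which for replication usually means reads; the helper names are hypothetical.

```python
def distribute(requests, servers):
    """Assign requests to servers round-robin; returns one queue per server."""
    queues = [[] for _ in range(servers)]
    for i, request in enumerate(requests):
        queues[i % servers].append(request)
    return queues


def worst_finish_time(queues, exec_time):
    """Finish time of the last request on the busiest server (FIFO, fixed cost)."""
    return max((len(q) for q in queues), default=0) * exec_time


if __name__ == "__main__":
    requests = [f"sql_{n}" for n in range(10)]
    for servers in (1, 5):
        queues = distribute(requests, servers)
        print(servers, [len(q) for q in queues], worst_finish_time(queues, exec_time=1))
    # 1 [10] 10
    # 5 [2, 2, 2, 2, 2] 2
```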

 

Advantages: good scalability, and none of the extra application-side work (PHP code) that splitting into multiple tables brings.

Disadvantages: the amount of data in each table does not change, so a single operation still takes just as long; and the hardware cost is high.

 

2. Identify in advance the tables that will hold large volumes of data and be accessed frequently, and split each of them into several tables.

Such tables are usually easy to predict. A forum's posts table, for example, will certainly grow very large over time, with posts from hundreds of thousands or even millions of users. A chat room's message table collects the messages of dozens of people chatting all night, so after a while it too becomes large. There are many cases like these. For such predictably large tables, we split them into N tables in advance, where N depends on the actual situation. Take the chat message table as an example:

 

First, create 100 such tables: message_00, message_01, message_02, ..., message_98, message_99. Then use the user ID to decide which table stores that user's chat messages. You can derive the table from a hash, from a remainder (modulo), or from any other rule you prefer. The following example uses a hash to obtain the table name:

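<?php
// Map a user ID to one of the message_NN tables using the first two digits of its CRC-32.
function get_hash_table($table, $userid)
{
    $str = crc32($userid);
    if ($str < 0) {
        // Only possible on 32-bit PHP, where crc32() can return a negative integer.
        $hash = "0" . substr((string) abs($str), 0, 1);
    } else {
        $hash = substr((string) $str, 0, 2);
    }

    return $table . "_" . $hash;
}

echo get_hash_table('message', 'user18991') . PHP_EOL; // The result is message_10
echo get_hash_table('message', 'user34523') . PHP_EOL; // The result is message_13
?>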

 

With this method, all messages of user18991 are stored in message_10 and all messages of user34523 in message_13, so to read a user's messages you only need to query the corresponding table.
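For readers who want to experiment outside PHP, here is a Python sketch; the function names are hypothetical. It assumes `zlib.crc32` computes the same standard CRC-32 as PHP's `crc32()`; it always returns an unsigned value, like `crc32()` on 64-bit PHP, so the negative branch of the PHP function is not needed. It also shows the remainder method mentioned above, CRC-32 modulo 100, which the original does not spell out. One reason to consider it: a non-negative CRC-32 never starts with the digit 0, so the first-two-digits rule never selects message_00 through message_09, and because CRC-32 values go up to 4,294,967,295, every ten-digit value (about three quarters of the range) lands in message_10 through message_42.

```python
import zlib


def get_hash_table(table: str, user_id: str) -> str:
    """Port of the PHP example: first two decimal digits of the CRC-32."""
    checksum = zlib.crc32(user_id.encode("utf-8"))  # always >= 0 in Python 3
    return f"{table}_{str(checksum)[:2]}"


def get_mod_table(table: str, user_id: str, shards: int = 100) -> str:
    """Remainder method: CRC-32 modulo the number of tables, zero-padded (00..99)."""
    checksum = zlib.crc32(user_id.encode("utf-8"))
    return f"{table}_{checksum % shards:02d}"


if __name__ == "__main__":
    users = [f"user{i}" for i in range(100_000)]
    prefix_tables = {get_hash_table("message", u) for u in users}
    mod_tables = {get_mod_table("message", u) for u in users}
    # The prefix rule can never produce message_00 ... message_09.
    print(sorted(prefix_tables)[:3], len(prefix_tables))
    print(sorted(mod_tables)[:3], len(mod_tables))
```

The modulo variant spreads the same users over the full message_00 to message_99 range, at the cost of the table name no longer being readable from the checksum's leading digits.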

 

Advantages: no single table accumulates millions of rows, so each SQL statement executes faster.

Disadvantages: once the routing rule is fixed, changing it is very troublesome. The example above uses crc32 as the hash; if I later switch to md5, the same user's messages end up in different tables and the data becomes a mess. Scalability is poor.
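To see how much data such a rule change touches, the sketch below compares the table chosen under CRC-32 with the table chosen under MD5 for the same users. It assumes the remainder (modulo 100) variant for both rules, since the first-two-digits rule does not carry over directly to an MD5 hex digest; all names are hypothetical.

```python
import hashlib
import zlib

TABLES = 100  # message_00 ... message_99


def crc32_index(user_id: str) -> int:
    """Current rule (assumed remainder variant): CRC-32 modulo the table count."""
    return zlib.crc32(user_id.encode("utf-8")) % TABLES


def md5_index(user_id: str) -> int:
    """Hypothetical new rule: MD5 digest as an integer, modulo the table count."""
    return int(hashlib.md5(user_id.encode("utf-8")).hexdigest(), 16) % TABLES


def users_to_migrate(user_ids):
    """Users whose messages would have to move to a different table."""
    return [u for u in user_ids if crc32_index(u) != md5_index(u)]


if __name__ == "__main__":
    users = [f"user{i}" for i in range(10_000)]
    moved = users_to_migrate(users)
    print(f"{len(moved)} of {len(users)} users change tables")
```

With 100 tables, only about one user in a hundred keeps the same table by chance, so nearly every user's rows would have to be copied to their new table before the new rule could be switched on.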

 

3. Use the MERGE storage engine to implement table sharding

I think this method suits cases where queries have become slow because sharding was not planned in advance. Splitting an existing large table is painful, and the most painful part is changing the code: the SQL statements in the program are already written, and if one table becomes dozens or even hundreds of tables, must all of them be rewritten? Let me show an example.

Run SHOW ENGINES; in the mysql client and you will see MRG_MYISAM in the list; that is the MERGE engine. The session below was recorded on an older MySQL version that still accepted TYPE=MERGE; since MySQL 5.5 that clause must be written ENGINE=MERGE.

mysql> CREATE TABLE IF NOT EXISTS `user1` (
    -> `id` int(11) NOT NULL AUTO_INCREMENT,
    -> `name` varchar(50) DEFAULT NULL,
    -> `sex` int(1) NOT NULL DEFAULT '0',
    -> PRIMARY KEY (`id`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
Query OK, 0 rows affected (0.05 sec)

mysql> CREATE TABLE IF NOT EXISTS `user2` (
    -> `id` int(11) NOT NULL AUTO_INCREMENT,
    -> `name` varchar(50) DEFAULT NULL,
    -> `sex` int(1) NOT NULL DEFAULT '0',
    -> PRIMARY KEY (`id`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO `user1` (`name`, `sex`) VALUES ('zhang ying', 0);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `user2` (`name`, `sex`) VALUES ('tank', 1);
Query OK, 1 row affected (0.00 sec)

mysql> CREATE TABLE IF NOT EXISTS `alluser` (
    -> `id` int(11) NOT NULL AUTO_INCREMENT,
    -> `name` varchar(50) DEFAULT NULL,
    -> `sex` int(1) NOT NULL DEFAULT '0',
    -> INDEX (`id`)
    -> ) TYPE=MERGE UNION=(user1, user2) INSERT_METHOD=LAST AUTO_INCREMENT=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SELECT id, name, sex FROM alluser;
+----+------------+-----+
| id | name       | sex |
+----+------------+-----+
|  1 | zhang ying |   0 |
|  1 | tank       |   1 |
+----+------------+-----+
2 rows in set (0.00 sec)

mysql> INSERT INTO `alluser` (`name`, `sex`) VALUES ('tank2', 0);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT id, name, sex FROM user2;
+----+-------+-----+
| id | name  | sex |
+----+-------+-----+
|  1 | tank  |   1 |
|  2 | tank2 |   0 |
+----+-------+-----+
2 rows in set (0.00 sec)

 

Did you notice something in the operations above? Suppose I have a user table with 50 million rows and want to split it into two tables, user1 and user2, with 25 million rows each:

INSERT INTO user1 (id, name, sex)
SELECT id, name, sex FROM user WHERE id <= 25000000;

INSERT INTO user2 (id, name, sex)
SELECT id, name, sex FROM user WHERE id > 25000000;
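If the table is split into more than two parts, the statements can be generated rather than written by hand. The helper below is hypothetical; it assumes ids are positive integers no greater than `max_id` and that every target table has the same columns as the source, as in the example. Rows with larger ids would be left out, so check SELECT MAX(id) FROM user first.

```python
def range_split_sql(source, targets, max_id, columns=("id", "name", "sex")):
    """Build one INSERT ... SELECT per target table, each copying a contiguous id range."""
    cols = ", ".join(columns)
    size = -(-max_id // len(targets))  # ceiling division so no id is skipped
    statements = []
    for i, target in enumerate(targets):
        low = i * size
        high = min((i + 1) * size, max_id)
        statements.append(
            f"INSERT INTO {target} ({cols}) "
            f"SELECT {cols} FROM {source} "
            f"WHERE id > {low} AND id <= {high};"
        )
    return statements


if __name__ == "__main__":
    for sql in range_split_sql("user", ["user1", "user2"], 50_000_000):
        print(sql)
    # INSERT INTO user1 (id, name, sex) SELECT id, name, sex FROM user WHERE id > 0 AND id <= 25000000;
    # INSERT INTO user2 (id, name, sex) SELECT id, name, sex FROM user WHERE id > 25000000 AND id <= 50000000;
```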

 

This splits the user table into two tables. But now there is a problem: what about the SQL statements in the code? There used to be one table and now there are two, so the code would need extensive changes, which means a lot of work for the programmers. Is there a good way around this?

 

The solution: back up the original user table and then drop it. Then rename the alluser MERGE table created above to user, so the existing SQL keeps working unchanged. However, not every MySQL operation works as expected on a MERGE table:

 

A. If you use ALTER TABLE to change a MERGE table to another storage engine, the mapping to the underlying tables is lost. Instead, the rows of the underlying MyISAM tables are copied into the altered table, which then uses the new storage engine.

 

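B. I have seen claims online that REPLACE does not work. I tried it, and it worked. Note, however, that my test uses the REPLACE() string function inside an UPDATE. The limitation listed in the MySQL manual concerns the REPLACE statement, which relies on detecting duplicate keys, something a MERGE table cannot do across its underlying tables (see C).

mysql> UPDATE alluser SET sex = REPLACE(sex, 0, 1) WHERE id = 2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM alluser;
+----+------------+-----+
| id | name       | sex |
+----+------------+-----+
|  1 | zhang ying |   0 |
|  1 | tank       |   1 |
|  2 | tank2      |   1 |
+----+------------+-----+
3 rows in set (0.00 sec)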

 

C. A MERGE table cannot enforce a unique constraint across the whole table. When you execute an INSERT, the row goes into the first or the last MyISAM table (depending on the INSERT_METHOD option). MySQL ensures that unique key values are unique within that MyISAM table, but not across all the tables in the set.
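Point C can be reproduced with Python's built-in sqlite3 module. SQLite has no MERGE engine, so this sketch swaps in a different technique: a UNION ALL view stands in for UNION=(user1, user2), and an INSTEAD OF INSERT trigger stands in for INSERT_METHOD=LAST. Each underlying table enforces its own primary key, yet the combined view returns id 1 twice, just as in the transcript.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user1 (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, sex INTEGER NOT NULL DEFAULT 0);
CREATE TABLE user2 (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, sex INTEGER NOT NULL DEFAULT 0);
INSERT INTO user1 (name, sex) VALUES ('zhang ying', 0);
INSERT INTO user2 (name, sex) VALUES ('tank', 1);

-- Stand-in for the MERGE table: a view over both underlying tables.
CREATE VIEW alluser AS
    SELECT id, name, sex FROM user1
    UNION ALL
    SELECT id, name, sex FROM user2;

-- Stand-in for INSERT_METHOD = LAST: rows inserted via the view go to user2.
CREATE TRIGGER alluser_insert INSTEAD OF INSERT ON alluser
BEGIN
    INSERT INTO user2 (name, sex) VALUES (NEW.name, NEW.sex);
END;
"""


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_demo()
    conn.execute("INSERT INTO alluser (name, sex) VALUES ('tank2', 0)")
    print(sorted(conn.execute("SELECT id, name, sex FROM alluser")))
    # [(1, 'tank', 1), (1, 'zhang ying', 0), (2, 'tank2', 0)]
```

Inserting id 1 directly into user1 would still fail, because uniqueness is checked only inside each underlying table, never across the combined set.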

 

D. When you create a MERGE table, MySQL does not check whether the underlying tables exist or have identical structures. When the MERGE table is used, MySQL checks that the record length of each mapped table is the same, but this check is not very reliable. If you create a MERGE table from MyISAM tables that are similar but not identical, you may run into strange problems.
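Since MySQL will not catch mismatched underlying tables for you, it can be worth checking them yourself before creating the MERGE table. The sketch below is a hypothetical pre-flight check that compares each table's ordered list of (column name, column type) pairs. It assumes you have already fetched those pairs, for example from INFORMATION_SCHEMA.COLUMNS (COLUMN_NAME and COLUMN_TYPE ordered by ORDINAL_POSITION). It does not compare indexes, which the underlying tables must also share.

```python
def structure_mismatches(columns_by_table):
    """Return a message for every table whose columns differ from the first table's.

    `columns_by_table` maps a table name to its ordered list of
    (column_name, column_type) pairs.
    """
    if not columns_by_table:
        return []
    reference_name, reference = next(iter(columns_by_table.items()))
    problems = []
    for name, columns in columns_by_table.items():
        if columns != reference:
            problems.append(f"{name} does not match {reference_name}")
    return problems


if __name__ == "__main__":
    base = [("id", "int(11)"), ("name", "varchar(50)"), ("sex", "int(1)")]
    tables = {
        "user1": base,
        "user2": list(base),
        # Hypothetical table whose name column was widened by mistake.
        "user3": [("id", "int(11)"), ("name", "varchar(60)"), ("sex", "int(1)")],
    }
    print(structure_mismatches(tables))  # ['user3 does not match user1']
```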

 

I found points C and D online and have not tested them myself; please try them yourself.

 

Advantages: good scalability, and the program code barely needs to change.

Disadvantages: it is less effective than the second method.

 

III. Summary

Of the three methods above, I have actually used two: the first and the second. I have not used the third, which is why I described it in more detail. Haha. Everything has its limits, and overdoing anything makes it worse. You cannot just keep adding database servers to a cluster; hardware costs money. Likewise, do not split a table into 1,000 tables: MySQL still stores its data as files on disk, and each MyISAM table corresponds to three files, so 1,000 sub-tables mean 3,000 files, which also makes retrieval very slow. My suggestions are:

Combine methods 1 and 2 for table sharding.

Combine methods 1 and 3 for table sharding.

The two suggestions suit different situations. Based on my own experience, I expect many people will choose the combination of methods 1 and 3.

 

 

Author: Zhang Ying
Address: http://blog.51yip.com/mysql/949.html

