Mysql table sharding methods
I. Let me explain why table Sharding is required.
When a piece of data reaches several millions, it takes more time for you to query the data at a time. if there is a joint query, I think it may die there. The purpose of table Sharding is to reduce the burden on the database and shorten the query time.
Based on personal experience, mysql executes an SQL statement as follows:
1. receive the SQL statement; 2. put the SQL statement in the queue; 3. execute the SQL statement; 4. return the execution result. What is the most time spent in this execution process? The first is the waiting time in the queue, and the second is the SQL execution time. In fact, these two are the same thing. while waiting, there must be SQL execution. Therefore, we need to shorten the SQL execution time.
Mysql has a mechanism of table locking and row locking. Why does this mechanism occur to ensure data integrity? for example, if two SQL statements need to modify the same data record of the same table, what should we do? can both SQL statements modify this data record at the same time? Obviously, mysql handles this situation by locking tables (myisam storage engine) and row (innodb storage engine ). If the table is locked, neither of you can operate on the table. you must wait for me to complete the operation on the table. The same is true for row locking. other SQL statements can operate on this data only after I have finished the operation on this data. If there is too much data, the execution time is too long, and the waiting time is longer, which is why we want to split tables.
II. table sharding
1. create a mysql cluster, such as using mysql cluster, mysql proxy, mysql replication, and drdb.
Some people will ask the mysql Cluster, what is the relationship between the root table sharding? Although it is not a table sharding in the actual sense, it starts the role of table sharding. what is the meaning of cluster? To reduce the burden on a database, simply reduce the number of SQL statements in the SQL queue. for example, if there are 10 SQL requests in the queue of a database server, it takes a long time to allocate these 10 SQL requests to the queuing queues of five database servers. there are only two queues of one database server, is the waiting time greatly shortened? This is already obvious. So I listed it within the table sharding range. I have done some mysql clusters:
Installation, configuration, and read/write splitting of linux mysql proxy
Mysql replication is mutually active/standby installation and configuration, and data synchronization
Advantage: good scalability, no complex operations after multiple table shards (php code)
Disadvantage: the data volume of a single table remains unchanged, the time spent on one operation is still that large, and the hardware overhead is high.
2. tables with large data volume and frequent access are estimated in advance and divided into several tables.
This estimation is very poor. The Table posted in the forum is certainly very large after a long time, which may be hundreds of thousands or even millions of users. The chat room information table contains dozens of people chatting for one night. after a long time, the data in this table must be large. There are many situations like this. Therefore, for this big data table that can be estimated, we will separate N tables in advance. the N value depends on the actual situation. Take the chat info table as an example:
First, we will create 100 such tables, message_00, message_01, message_02 .......... message_98, message_99. then, the user ID is used to determine which table the user's chat information is stored in. you can use the hash method to obtain the information, and the remainder method to obtain the information. There are many methods, let everyone think about it. The following uses the hash method to obtain the table name:
Php code
The above method tells us that all user messages of user18991 are recorded in message_10. all user messages of user34523 are recorded in message_13, you only need to read from the corresponding table.
Advantage: it avoids millions of data entries in a table and shortens the execution time of an SQL statement.
Disadvantage: when a rule is determined, it will be very troublesome to break this rule. in the above example, the hash algorithm I used is crc32. if I do not want to use this algorithm now, after md5 is used, messages of the same user will be stored in different tables, so that data is messy. Poor scalability.
3. use the merge storage engine to implement table sharding
I think this method is suitable for scenarios where data query is slow without prior consideration. At this time, it is painful to separate the existing big data table. the most painful thing is to change the code, because the SQL statements in the program have been written, now, a table is divided into dozens or even hundreds of tables. do SQL statements need to be rewritten? For example, I like it very much.
Mysql> show engines; you will find that mrg_myisam is actually merge.
Query
Mysql> create table if not exists 'user1 '(
-> 'Id' int (11) not null AUTO_INCREMENT,
-> 'Name' varchar (50) default null,
-> 'Sex' int (1) not null default '0 ',
-> Primary key ('id ')
->) ENGINE = MyISAM default charset = utf8 AUTO_INCREMENT = 1;
Query OK, 0 rows affected (0.05 sec)
Mysql> create table if not exists 'user2 '(
-> 'Id' int (11) not null AUTO_INCREMENT,
-> 'Name' varchar (50) default null,
-> 'Sex' int (1) not null default '0 ',
-> Primary key ('id ')
->) ENGINE = MyISAM default charset = utf8 AUTO_INCREMENT = 1;
Query OK, 0 rows affected (0.01 sec)
Mysql> insert into 'user1' ('name', 'sex') VALUES ('Zhang Ying', 0 );
Query OK, 1 row affected (0.00 sec)
Mysql> insert into 'user2' ('name', 'sex') VALUES ('tank', 1 );
Query OK, 1 row affected (0.00 sec)
Mysql> create table if not exists 'alluser '(
-> 'Id' int (11) not null AUTO_INCREMENT,
-> 'Name' varchar (50) default null,
-> 'Sex' int (1) not null default '0 ',
-> INDEX (id)
->) TYPE = merge union = (user1, user2) INSERT_METHOD = LAST AUTO_INCREMENT = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Mysql> select id, name, sex from alluser;
+ ---- + -------- + ----- +
| Id | name | sex |
+ ---- + -------- + ----- +
| 1 | images | 0 |
| 1 | tank | 1 |
+ ---- + -------- + ----- +
2 rows in set (0.00 sec)
Mysql> insert into 'alluser' ('name', 'sex') VALUES ('tank2', 0 );
Query OK, 1 row affected (0.00 sec)
Mysql> select id, name, sex from user2
->;
+ ---- + ------- + ----- +
| Id | name | sex |
+ ---- + ------- + ----- +
| 1 | tank | 1 |
| 2 | tank2 | 0 |
+ ---- + ------- + ----- +
2 rows in set (0.00 sec)
Mysql> create table if not exists 'user1 '(
-> 'Id' int (11) not null AUTO_INCREMENT,
-> 'Name' varchar (50) default null,
-> 'Sex' int (1) not null default '0 ',
-> Primary key ('id ')
->) ENGINE = MyISAM default charset = utf8 AUTO_INCREMENT = 1;
Query OK, 0 rows affected (0.05 sec)
Mysql> create table if not exists 'user2 '(
-> 'Id' int (11) not null AUTO_INCREMENT,
-> 'Name' varchar (50) default null,
-> 'Sex' int (1) not null default '0 ',
-> Primary key ('id ')
->) ENGINE = MyISAM default charset = utf8 AUTO_INCREMENT = 1;
Query OK, 0 rows affected (0.01 sec)
Mysql> insert into 'user1' ('name', 'sex') VALUES ('Zhang Ying', 0 );
Query OK, 1 row affected (0.00 sec)
Mysql> insert into 'user2' ('name', 'sex') VALUES ('tank', 1 );
Query OK, 1 row affected (0.00 sec)
Mysql> create table if not exists 'alluser '(
-> 'Id' int (11) not null AUTO_INCREMENT,
-> 'Name' varchar (50) default null,
-> 'Sex' int (1) not null default '0 ',
-> INDEX (id)
->) TYPE = merge union = (user1, user2) INSERT_METHOD = LAST AUTO_INCREMENT = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Mysql> select id, name, sex from alluser;
+ ---- + -------- + ----- +
| Id | name | sex |
+ ---- + -------- + ----- +
| 1 | images | 0 |
| 1 | tank | 1 |
+ ---- + -------- + ----- +
2 rows in set (0.00 sec)
Mysql> insert into 'alluser' ('name', 'sex') VALUES ('tank2', 0 );
Query OK, 1 row affected (0.00 sec)
Mysql> select id, name, sex from user2
->;
+ ---- + ------- + ----- +
| Id | name | sex |
+ ---- + ------- + ----- +
| 1 | tank | 1 |
| 2 | tank2 | 0 |
+ ---- + ------- + ----- +
2 rows in set (0.00 sec)
From the above operations, I don't know if you have found anything? Assume that I have a user table with 50 million pieces of data. now I want to split it into two tables, user1 and user2, with each table having 25 million pieces of data,
Insert into user1 (user1.id, user1.name, user1.sex) SELECT (user. id, user. name, user. sex) FROM user where user. id <= 250000
Insert into user2 (user2.id, user2.name, user2.sex) SELECT (user. id, user. name, user. sex) FROM user where user. id> 250000
In this way, I successfully divided a user table into two tables. at this time, there was a problem: what should I do with the SQL statements in the code? I used to be a table and now I have changed to two tables, the code is greatly changed, which brings a lot of work to the programmer. Is there a good solution to this problem? The solution is to back up the previous user table and delete it. in the above operation, I created an alluser table and changed the table name of this alluser table to user. However, not all mysql operations can be used
A. If you use alter table to change the merge table to another table type, the ing to the underlying table will be lost. Instead, the rows from the underlying myisam table are copied to the replaced table, and the table is then specified with a new type.
B. I have seen some online saying that replace does not work. I tried it to make it work. Dizzy first
Mysql> UPDATE alluser SET sex = REPLACE (sex, 0, 1) where id = 2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Mysql> select * from alluser;
+ ---- + -------- + ----- +
| Id | name | sex |
+ ---- + -------- + ----- +
| 1 | images | 0 |
| 1 | tank | 1 |
| 2 | tank2 | 1 |
+ ---- + -------- + ----- +
3 rows in set (0.00 sec)
Php code
Mysql> UPDATE alluser SET sex = REPLACE (sex, 0, 1) where id = 2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
Mysql> select * from alluser;
+ ---- + -------- + ----- +
| Id | name | sex |
+ ---- + -------- + ----- +
| 1 | images | 0 |
| 1 | tank | 1 |
| 2 | tank2 | 1 |
+ ---- + -------- + ----- +
3 rows in set (0.00 sec)
C. a merge table cannot maintain the unique constraint on the entire table. When you execute an insert statement, the data enters the first or last myisam table (depending on the value of the insert_method option ). Mysql ensures that the unique key value remains unique in that myisam table, but not across all tables in the set.
D. when creating a merge table, you do not check whether the underlying table exists and has the same mechanism. When a merge table is used, mysql checks whether the record lengths of each mapped table are equal, but this is not very reliable. If you create a merge table from a similar myisam table, you may encounter a strange problem.
So sleepy, c and d are on the internet and there is no test. please try it.
Advantage: good scalability, and the program code is not greatly changed
Disadvantage: the effect of this method is worse than that of the second method.
III. Summary
The three methods mentioned above are actually two types, the first and the second. The third method has not been done, so it is more detailed. Haha. Everything can be done at one level. if you do more than one level, it will become very poor. you cannot simply create a database server cluster. hardware requires money to purchase, and you should not blindly split tables, after dividing 1000 tables, mysql storage still exists on the hard disk as a file. one table corresponds to three files, and the other 1000 table shards correspond to 3000 files, in this way, the retrieval will become very slow. My suggestion is:
Methods 1 and 2 are combined for table sharding.
Methods 1 and 3 are combined for table sharding.
My two suggestions are suitable for different situations. depending on my personal situation, I think there will be a lot of people who choose methods 1 and 3 in combination.