Original: http://blog.51yip.com/mysql/949.html
First, why split tables at all
Once a single table holds millions of rows, queries against it take noticeably longer, and a join on top of that may simply hang. The purpose of splitting the table is to reduce the load on the database and shorten query time.
In my experience, MySQL handles a SQL statement roughly as follows:
1, receive the SQL; 2, put it into the queue; 3, execute it; 4, return the result. Where does most of the time go? First, waiting in the queue; second, executing the SQL. These are really the same thing: while one statement is waiting, another must be executing. So what we need to shorten is the SQL execution time.
MySQL has a locking mechanism, table locks and row locks, which exists to protect the integrity of the data. For example, if two SQL statements try to modify the same row of the same table at the same moment, they clearly cannot both be allowed to change it at once. MySQL handles this with table locks (the MyISAM storage engine) or row locks (the InnoDB storage engine). A table lock means nobody else can operate on the table until I have finished with it. A row lock works the same way at row level: other statements must wait until I am done with that row before they can touch it. The more data there is, the longer each statement takes to execute and the longer everything else waits, which is why we split tables.
Second, ways to split tables
1, set up a MySQL cluster, for example with MySQL Cluster, MySQL Proxy, MySQL replication, DRBD, and so on
Some people will ask what a MySQL cluster has to do with splitting tables. It is not table splitting in the literal sense, but it serves the same purpose. What is a cluster for? To reduce the load on any single database, which comes down to reducing the number of SQL statements waiting in its queue. For example, if 10 SQL requests all sit in one database server's queue, they wait a long time; if the same 10 requests are spread over the queues of 5 database servers, each queue holds only 2, and the waiting time drops sharply. That is why I count clustering as a form of table splitting. I have written up some MySQL cluster work before:
Installing and configuring MySQL Proxy on Linux, with read/write splitting
MySQL replication: master/slave installation, configuration, and data synchronization
Pros: scales well, and avoids the complex application code (PHP) that comes with splitting into multiple tables
Cons: the amount of data in each table is unchanged, so a single operation still takes just as long, and the hardware cost is high.
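The queueing argument for clustering can be put into numbers. The following is only a toy model written in Python (not the PHP used elsewhere in this post), assuming every request takes the same time and requests are spread round-robin over the servers; the function names are made up for illustration and nothing here models MySQL itself.

```python
from typing import List


def queue_lengths(num_requests: int, num_servers: int) -> List[int]:
    """Return how many requests end up in each server's queue (round-robin)."""
    queues = [0] * num_servers
    for i in range(num_requests):
        queues[i % num_servers] += 1
    return queues


def worst_case_wait(num_requests: int, num_servers: int, exec_time: float) -> float:
    """Time the last request in the longest queue waits before it starts running."""
    longest = max(queue_lengths(num_requests, num_servers))
    # The last request waits for every request queued ahead of it.
    return max(longest - 1, 0) * exec_time


if __name__ == "__main__":
    print(queue_lengths(10, 1))         # [10]
    print(queue_lengths(10, 5))         # [2, 2, 2, 2, 2]
    print(worst_case_wait(10, 1, 0.5))  # 4.5
    print(worst_case_wait(10, 5, 0.5))  # 0.5
```

Note that the per-statement execution time (0.5 here) is the same in both cases, which is exactly the drawback listed under Cons.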
2, for tables you expect to grow large and be accessed frequently, split them into several tables in advance
This kind of growth is fairly easy to predict. The posts table of a forum is bound to get very large over time; hundreds of thousands or millions of rows are quite possible. The message table of a chat room, with dozens of people chatting all evening, also becomes very large as time goes by. There are many cases like this. For a large table that can be predicted in advance, we split it into N tables beforehand; how big N should be depends on the actual situation. Take the chat message table as an example:
I create 100 such tables in advance: message_00, message_01, message_02, ..., message_98, message_99. The user's ID then determines which table that user's chat messages go into. You can derive the table from a hash, or from a modulo (remainder); there are many ways, and everyone can come up with their own. Here is a hash-based way to get the table name:
<?php
// Map a user ID to one of the pre-created tables using its CRC32 checksum.
function get_hash_table($table, $userid) {
    $str = crc32($userid);
    if ($str < 0) {
        // crc32() can be negative on 32-bit PHP: use "0" plus the first digit
        $hash = "0" . substr(abs($str), 0, 1);
    } else {
        // otherwise use the first two digits of the checksum
        $hash = substr($str, 0, 2);
    }
    return $table . "_" . $hash;
}
echo get_hash_table('message', 'user18991'); // result is message_10
echo get_hash_table('message', 'user34523'); // result is message_13
?>
To explain: the function above tells us that all of user18991's messages are stored in the message_10 table and all of user34523's messages in the message_13 table, so reads only need to go to the corresponding table.
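The text also mentions picking the table by remainder. A minimal sketch of that variant follows, in Python rather than PHP and not the author's code; the function name and the num_tables parameter are assumptions for illustration. One reason to consider it: assuming a 64-bit PHP build, where crc32() is never negative, taking the first two digits of the checksum would almost never produce the suffixes 00 to 09, whereas a remainder can reach every one of the 100 tables.

```python
import zlib


def get_table_by_modulo(table: str, user_id: str, num_tables: int = 100) -> str:
    """Pick a table from the CRC32 of the user ID modulo the number of tables."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    checksum = zlib.crc32(user_id.encode("utf-8"))
    suffix = checksum % num_tables
    # Zero-pad so the suffix matches the message_00 .. message_99 naming.
    width = len(str(num_tables - 1))
    return f"{table}_{suffix:0{width}d}"


if __name__ == "__main__":
    for uid in ("user18991", "user34523"):
        print(uid, "->", get_table_by_modulo("message", uid))
```

The table names this prints will generally differ from the ones produced by the PHP function above, so the two schemes cannot be mixed on the same data.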
Pros: avoids millions of rows in a single table and shortens the execution time of each SQL statement
Cons: once the rule is fixed, changing it is very painful. In the example above the hash algorithm is CRC32; if I later switch to MD5, the same user's messages end up in a different table and the data is scrambled. Scalability is poor.
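How badly a change of hash scrambles the data can be estimated with a short experiment. This is a sketch in Python under assumed conventions (both hashes reduced modulo 100, made-up user IDs of the form userN), not a measurement on real data.

```python
import hashlib
import zlib

NUM_TABLES = 100  # matches message_00 .. message_99


def table_by_crc32(user_id: str) -> int:
    """Table number under the old rule: CRC32 modulo the table count."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_TABLES


def table_by_md5(user_id: str) -> int:
    """Table number under a hypothetical new rule: MD5 modulo the table count."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_TABLES


def count_moved(user_ids) -> int:
    """Count users whose table changes when the rule is switched."""
    return sum(1 for uid in user_ids if table_by_crc32(uid) != table_by_md5(uid))


if __name__ == "__main__":
    users = [f"user{i}" for i in range(1000)]
    moved = count_moved(users)
    print(f"{moved} of {len(users)} users would land in a different table")
```

With 100 tables, two unrelated hashes agree only about 1 time in 100, so nearly every user's rows would have to be migrated after such a switch.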
3, use the MERGE storage engine to split tables
I think this method suits cases where nobody planned ahead and queries have already become slow. Splitting an existing large table at that point is painful, and the most painful part is changing the code: the SQL statements in the program are already written, and if one table now becomes dozens or even hundreds of tables, don't all those statements have to be rewritten? Let me give an example.
Run mysql> SHOW ENGINES; and you will see that MRG_MYISAM is in fact MERGE.
mysql> CREATE TABLE IF NOT EXISTS `user1` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(50) DEFAULT NULL,  -- length missing in the source; 50 is assumed
    ->   `sex` int(1) NOT NULL DEFAULT '0',
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
Query OK, 0 rows affected (0.05 sec)

mysql> CREATE TABLE IF NOT EXISTS `user2` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(50) DEFAULT NULL,
    ->   `sex` int(1) NOT NULL DEFAULT '0',
    ->   PRIMARY KEY (`id`)
    -> ) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
Query OK, 0 rows affected (0.01 sec)

mysql> INSERT INTO `user1` (`name`, `sex`) VALUES ('Zhang Ying', 0);
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO `user2` (`name`, `sex`) VALUES ('tank', 1);
Query OK, 1 row affected (0.00 sec)

-- TYPE= is deprecated (hence the warning below); use ENGINE=MERGE on MySQL 5.5 and later
mysql> CREATE TABLE IF NOT EXISTS `alluser` (
    ->   `id` int(11) NOT NULL AUTO_INCREMENT,
    ->   `name` varchar(50) DEFAULT NULL,
    ->   `sex` int(1) NOT NULL DEFAULT '0',
    ->   INDEX (id)
    -> ) TYPE=MERGE UNION=(user1,user2) INSERT_METHOD=LAST AUTO_INCREMENT=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> SELECT id, name, sex FROM alluser;
+----+------------+-----+
| id | name       | sex |
+----+------------+-----+
|  1 | Zhang Ying |   0 |
|  1 | tank       |   1 |
+----+------------+-----+
2 rows in set (0.00 sec)

mysql> INSERT INTO `alluser` (`name`, `sex`) VALUES ('tank2', 0);
Query OK, 1 row affected (0.00 sec)

mysql> SELECT id, name, sex FROM user2;
+----+-------+-----+
| id | name  | sex |
+----+-------+-----+
|  1 | tank  |   1 |
|  2 | tank2 |   0 |
+----+-------+-----+
2 rows in set (0.00 sec)
Have you noticed anything from the operations above? Suppose I have a user table, user, with 500,000 rows, and I want to split it into two tables, user1 and user2, with 250,000 rows each:
INSERT INTO user1 (id, name, sex) SELECT id, name, sex FROM user WHERE id <= 250000;
INSERT INTO user2 (id, name, sex) SELECT id, name, sex FROM user WHERE id > 250000;
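The two INSERT ... SELECT statements above generalize to any number of target tables. The helper below is a hypothetical sketch in Python that only builds the SQL text; it assumes ids run contiguously from 1 to total_rows and that every table has the id, name and sex columns used in this example.

```python
from typing import List


def split_statements(source: str, targets: List[str], total_rows: int) -> List[str]:
    """Build one INSERT ... SELECT per target table over equal id ranges."""
    chunk = -(-total_rows // len(targets))  # ceiling division
    statements = []
    for i, target in enumerate(targets):
        low = i * chunk          # exclusive lower bound of this range
        high = (i + 1) * chunk   # inclusive upper bound of this range
        where = f"id > {low}"
        # Leave the last range open-ended so no trailing rows are dropped.
        if i < len(targets) - 1:
            where += f" AND id <= {high}"
        statements.append(
            f"INSERT INTO {target} (id, name, sex) "
            f"SELECT id, name, sex FROM {source} WHERE {where};"
        )
    return statements


if __name__ == "__main__":
    for stmt in split_statements("user", ["user1", "user2"], 500000):
        print(stmt)
```

For two targets and 500,000 rows this yields the same boundary, 250000, as the hand-written statements.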
That splits one user table into two. But now there is a problem: what about the SQL statements in the code? They used to target one table and now there are two, so the code changes are large and mean a lot of work for the programmers. Is there a good way around this? Yes: back up the old user table and drop it, then rename the alluser table created above to user. However, not every MySQL operation works on a MERGE table.
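a, if you use ALTER TABLE to change a MERGE table to another table type, the mapping to the underlying tables is lost. Instead, the rows from the underlying MyISAM tables are copied into the altered table, which is then assigned the new type.
b, I have seen claims online that REPLACE does not work. I tried it and it did work, which left me a bit confused. Note that the test below uses the REPLACE() string function inside an UPDATE, not the REPLACE statement, so it does not really exercise what those claims are about.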
mysql> UPDATE alluser SET sex=REPLACE(sex, 0, 1) WHERE id=2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM alluser;
+----+------------+-----+
| id | name       | sex |
+----+------------+-----+
|  1 | Zhang Ying |   0 |
|  1 | tank       |   1 |
|  2 | tank2      |   1 |
+----+------------+-----+
3 rows in set (0.00 sec)
c, a MERGE table cannot maintain unique constraints over the whole table. When you execute an INSERT, the data goes into the first or the last MyISAM table (depending on the value of the INSERT_METHOD option). MySQL ensures that unique key values stay unique within that MyISAM table, but not across all the tables in the collection.
d, when you create a MERGE table, there is no check that the underlying tables exist and have the same structure. When the MERGE table is used, MySQL checks that the record length of every mapped table is equal, but this is not very reliable. If you create a MERGE table from dissimilar MyISAM tables, you are very likely to run into strange problems.
I am getting sleepy. Points c and d are things I read online and have not tested, so try them yourselves.
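Point c is already visible in the session earlier in this section, where alluser returns two rows with id 1. The following toy model in Python mimics that behaviour; ToyTable and ToyMerge are invented stand-ins, not MySQL code, and assume each underlying table keeps its own AUTO_INCREMENT counter and that the merge table inserts into the last table (INSERT_METHOD=LAST).

```python
class ToyTable:
    """Stand-in for one MyISAM table with its own AUTO_INCREMENT counter."""

    def __init__(self):
        self.rows = []
        self.next_id = 1

    def insert(self, name, sex):
        row = (self.next_id, name, sex)
        self.rows.append(row)
        self.next_id += 1
        return row


class ToyMerge:
    """Stand-in for a MERGE table with INSERT_METHOD=LAST."""

    def __init__(self, *tables):
        self.tables = tables

    def insert(self, name, sex):
        # Inserts go to the last underlying table only.
        return self.tables[-1].insert(name, sex)

    def select_all(self):
        # Reads see the rows of every underlying table, in order.
        return [row for table in self.tables for row in table.rows]


if __name__ == "__main__":
    user1, user2 = ToyTable(), ToyTable()
    user1.insert("Zhang Ying", 0)
    user2.insert("tank", 1)
    alluser = ToyMerge(user1, user2)
    alluser.insert("tank2", 0)
    print([row[0] for row in alluser.select_all()])  # [1, 1, 2]
```

Each underlying table keeps its ids unique, yet the combined view contains id 1 twice, which is why code that relies on a unique id across the whole merged table needs extra care.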
Pros: scales well, and the changes to the program code are small
Cons: performs a little worse than the second method
Third, summary
Of the three methods above, I have actually used two, the first and the second. I have not used the third in practice, which is why I went into more detail on it. Ha ha. Everything has its limits, and going past them makes things worse. Don't blindly build database server clusters, since hardware costs money; and don't blindly split tables either. If you split into 1000 tables, remember that MySQL ultimately stores its data as files on disk, with three files per table, so 1000 tables means 3000 files, and retrieval becomes slow as well. My advice is to use one of:
Method 1 combined with method 2
Method 1 combined with method 3
The two suggestions suit different situations, so choose according to your own circumstances. I expect many people will choose method 1 combined with method 3.
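The file-count arithmetic from the summary, as a tiny Python sketch. It assumes MyISAM on a MySQL version before 8.0, where each table is stored as a .frm, a .MYD and a .MYI file; the function name is made up for illustration.

```python
# Files a MyISAM table occupies on disk before MySQL 8.0:
# table definition, data, and index.
MYISAM_FILES = (".frm", ".MYD", ".MYI")


def myisam_file_count(num_tables: int) -> int:
    """Number of files on disk for the given number of MyISAM tables."""
    return num_tables * len(MYISAM_FILES)


if __name__ == "__main__":
    print(myisam_file_count(100))   # 300
    print(myisam_file_count(1000))  # 3000
```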
Three ways to split tables in MySQL (repost)