In the face of today's big data storage, imagine a significant drop in performance when the total record of a table in MySQL exceeds 1000W? The answer is yes, the total record of a table is more than 1000W, and retrieval at the OS level is also a very inefficient solution: Currently, there are two ways to optimize for massive data: 1, large table to remove the small table (mainly with the table and partition Technology) (1) Sub-table Technology vertical segmentation Advantage: Reduce high concurrency, Lock on the table. Insufficient: For single table, the reading and writing pressure will increase as the database records increase. Horizontal split if the IO pressure of a single table is large, you can consider using horizontal segmentation, the principle is through the hash algorithm, a table is divided into n multiple pages, and through a new table (summary table), the location of each page is recorded. If a portal site, its database table has reached 10 million records, then if the query through select, it will be inefficient (do not index the premise). In order to reduce the read/write IO pressure of a single table, divide the table into 10 pages, generate a total table, record the information of each page, if I query a id=100 record, it no longer requires a full table scan, but through the general table to find the record on which the corresponding page, and then to the corresponding page to do the search , thus reducing the IO pressure. Horizontal table technology is to split a table into multiple tables, the more common way is to split the table records according to a certain hash algorithm, and this partitioning method must be used in front-end applications to modify SQL, and for an SQL statement, may modify two tables, Then you have to modify two SQL statements to complete your logic of the transaction, will make logic judgment more and more complex, this will increase the maintenance cost of the program, so we have to avoid this situation. 2, SQL statement Optimization (index) SQL statement optimization: can be adjusted by increasing the index, but at the same time the increase in the amount of data will lead to the maintenance cost of the index increased. Partitioning advantages: 1, reduce IO2, improve read and write 3, convenient data management partition and sub-table difference: partition is the logical level of horizontal segmentation, for the application, it is still a table. Partitioning is a table of data into more than n chunks, which can be on the same disk, or on a different disk
1. The way of realization(1) MySQL sub-table is a real sub-table, a table is divided into many tables, each small table is a complete table, the corresponding three files, one. MyD data file,. Myi index file,. frm table structure file. [[email protected] test]# ls |grep useralluser. MRGalluser.frmuser1.MYDuser1.MYIuser1.frmuser2.MYDuser2.MYIuser2.frm simple explanation, the above table is the use of the merge storage engine (one of the table), AllUser is the total table, below there are two sub-tables, User1,user2. All two of them are stand-alone tables, and when it comes to data, we can take them through the general table. Here the total table is not. MYD,. Myi the two files, that is to say, the general table he is not a table, no data, the data are placed in the sub-table. Let's take a look. What is MRG really [[email protected] test]# cat AllUser. MRG |moreuser1user2 #INSERT_METHOD =last from the above we can see that alluser. In MRG, there are a few table-related relationships and the way data is inserted. The gross position table can be interpreted as a shell, or as a connection pool. (2) The partition is different, after a large table is partitioned, he is still a table, will not become two tables, but he stored the data more chunks. [[email protected] test]# ls |grep aaaa#p#p1. Mydaa#p#p1. Myiaa#p#p2. Mydaa#p#p2. Myiaa#p#p3. Mydaa#p#p3. MYIaa.frmaa.par from the above we can see that AA this table, divided into 3 districts. We all know that a table corresponds to three files. MYD,. Myi,.frm. partitions partition data and index files according to certain rules, and one more. par file, open the. par file and you can see that he recorded that the partition information for this table is a bit like the. MRG in the Sub-table. After partitioning, it is still a single sheet, not more than one table.
2. Data processing(1) After the table, the data are stored in the table, the summary is only a shell, access to data occurs in a sub-table inside. Take a look at the following example: SELECT * from user1 user2 where id= ' 12 ' on the surface, is the table AllUser operation, in fact, is not. Is the operation of the AllUser inside the table. (2) partition, there is no concept of the sub-table, the partition is only the file to store the data into a number of small pieces, partitioned tables, or a table. Data processing is still done by yourself. SELECT * from AllUser where id= ' 12 '
3. Improve Performance(1) After the table, the concurrency of the single table increased, disk I/O performance has improved. Because the time spent in the query is shortened, if there is a high concurrency, the total table can be based on different queries, the concurrency of pressure into different small tables inside. Would have been a very big one. The MyD file is now also distributed to each of the smaller tables. MyD, so the disk IO pressure is also reduced. (2) MySQL proposed the concept of partitioning, I feel like to break through the disk I/O bottleneck, want to improve the read and write ability of the disk to increase MySQL performance. At this point, the focus of partitioning and sub-table is different, the key point is how to improve the MySQL concurrency ability when accessing data, and partitioning is how to break through the disk read and write ability, so as to improve the performance of MySQL.
4. Ease of implementation on(1) There are many ways to divide the table, and it is the simplest way to divide the table with the merge. In this way, the root partitioning is almost as easy as possible and transparent to the program code. It would be more troublesome to use other sub-tables than partitioning. (2) partition implementation is relatively simple, the establishment of a partitioned table, with the normal table is no different, and the opposite side of the code is transparent. Partition type hash, range, list, key
Range Partition: Assigns multiple rows to a partition based on a column value for a given contiguous interval.
List partitioning: Similar to by range partitioning, the difference is that a list partition is selected based on a value in a set of discrete values that match a column value.
Hash partition: A partition that is selected based on the return value of a user-defined expression that is evaluated using the column values of those rows that will be inserted into the table. This function can contain any expression that is valid in MySQL that produces a non-negative integer value.
Hash is used in cases where the data is relatively random. It is based on the contents of the table hash operation after the random average distribution, assuming this column is gender, it is not suitable for hash partition, because the content is either male, or female, no randomness.
Key partitioning: Similar to partitioning by hash, the difference is that the key partition only supports the calculation of one or more columns, and the MySQL server provides its own hash function. You must have one or more columns that contain integer values. ----seldom used
How do I see if my database supports partitioning technology? Create partition:mysql> CREATE TABLE t1 (id int) partition by hash (ID) partitions 3; Query OK, 0 rows affected (0.03 sec) "experiment" creates a partitioned table and a non-partitioned table, creating a partition table for performance testing mysql> CREATE TABLE Part_tab (c1 int default NU LL, C2 varchar (+) default NULL, C3 date default null) Engine=myisam
-Partition by Range (year (C3)) (
-partition P0 values less than (1995),
-partition P1 values less than (1996),
-partition P2 values less than (1997),
-Partition P3 values less than (1998),
-partition P4 values less than (1999),
-partition P5 values less than (2000),
-Partition P6 values less than (2001),
-partition P7 values less than (2002),
-Partition P8 values less than (2003),
-Partition P9 values less than (2004),
-Partition P10 values less than (2010),
-partition P11 values less than MAXVALUE);
Query OK, 0 rows affected (0.14 sec)
Create a non-partitioned table
Mysql> CREATE TABLE No_part_tab (c1 int default NULL, c2 varchar () default NULL, C3 date default null) Engine=myisam ;
Query OK, 0 rows affected (0.11 sec)
Mysql> \d//#由于下面要用到存储过程, here you need to modify the Terminator to "//". The so-called stored procedure is actually a collection of many SQL statements.
Mysql> CREATE PROCEDURE Load_part_tab ()
Begin
-Declare v int default 0;
While v < 8000000
--Do
INSERT INTO Part_tab
Values (V, ' testing partitions ', adddate (' 1995-01-01 ', (rand (v) *36520) mod 3652));
Set v = v+1;
and end while;
-End
//
Query OK, 0 rows affected (0.04 sec)
Mysql> \d; After executing this stored procedure, you need to modify the Terminator back
The above stored procedure is actually to create a large amount of data (8 million article)
Mysql> call Load_part_tab (); Call Load_part_tab This stored procedure
Query OK, 1 row affected (9 min 18.95 sec)
Quickly insert the data inside the Part_tab into the No_part_tab
mysql> Insert No_part_tab SELECT * from Part_tab;
Query OK, 8000000 rows Affected (8.97 sec)
records:8000000 duplicates:0 warnings:0
Test One:
Before the experiment, make sure that the data in the two tables is consistent! To ensure the comparability of experiments
Mysql> Select COUNT (*) from Part_tab where C3 > Date ' 1995-01-01 ' and C3 < date ' 1995-12-31 ';
+----------+
| COUNT (*) |
+----------+
| 795181 |
+----------+
1 row in Set (0.49 sec)
Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01 ' and C3 < date ' 1995-12-31 ';
+----------+
| COUNT (*) |
+----------+
| 795181 |
+----------+
1 row in Set (3.94 sec)
Mysql> DESC SELECT COUNT (*) from Part_tab where C3 > Date ' 1995-01-01 ' and C3 < date ' 1995-12-31 ' \g;
1. Row ***************************
Id:1
Select_type:simple
Table:part_tab
Type:all//Full table scan
Possible_keys:null
Key:null
Key_len:null
Ref:null
rows:798458
Extra:using where
1 row in Set (0.09 sec)
ERROR:
No query specified
Mysql> DESC SELECT COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01 ' and C3 < date ' 1995-12-31 ' \g;
1. Row ***************************
Id:1
Select_type:simple
Table:no_part_tab
Type:all
Possible_keys:null
Key:null
Key_len:null
Ref:null
rows:8000000
Extra:using where
1 row in Set (0.00 sec)
ERROR:
No query specified
Conclusion: It can be seen that the partition, only need to scan 790,000 statements, and do not partition, you need to do a full table scan, it can be seen that the partitioning technology, can improve read and write efficiency.
Test 2:
Create an index to view statement execution
Mysql> CREATE index IDX_C3 on No_part_tab (C3);
Query OK, 8000000 rows Affected (32.68 sec)
records:8000000 duplicates:0 warnings:0
Results Analysis:
Mysql> DESC SELECT COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01 ' and C3 < date ' 1995-12-31 ' \g;
1. Row ***************************
Id:1
Select_type:simple
Table:no_part_tab
Type:range
Possible_keys:idx_c3
Key:idx_c3
Key_len:4
Ref:null
rows:785678
Extra:using where; Using Index
1 row in Set (0.16 sec)
ERROR:
No query specified
conclusion: after an index is created for an unpartitioned table, the same statement is executed again, and you can see that the SQL statement was retrieved from the range index instead of the full table scan. Obviously, the efficiency has also improved.
Test 3:
Test indexing and non-indexed read and write efficiency.
Mysql> CREATE index IDX_C3 on Part_tab (C3);
Query OK, 8000000 rows Affected (31.85 sec)
records:8000000 duplicates:0 warnings:0
Mysql> DESC SELECT COUNT (*) from Part_tab where C3 > Date ' 1995-01-01 ' and C3 < date ' 1995-12-31 ' \g;
1. Row ***************************
Id:1
Select_type:simple
Table:part_tab
Type:index
Possible_keys:idx_c3
Key:idx_c3
Key_len:4
Ref:null
rows:798458
Extra:using where; Using Index
1 row in Set (0.14 sec)
ERROR:
No query specified
Test does not create an indexed field
Mysql> Select COUNT (*) from No_part_tab where C3 > Date ' 1995-01-01 ' and C3 < Date ' 1995-12-31 ' and c2= ' hello ';
+----------+
| COUNT (*) |
+----------+
| 0 |
+----------+
1 row in Set (4.90 sec)
Conclusion:You can see that the time it takes to retrieve without indexing is longer than retrieving it from the index.
Test 4: Delete
Mysql> Delete from Part_tab where C3 > Date ' 1995-01-01 ' and C3 < date ' 1995-12-31 ';
Query OK, 795181 rows Affected (14.02 sec)
Mysql> Delete from No_part_tab where C3 > Date ' 1995-01-01 ' and C3 < date ' 1995-12-31 ';
Query OK, 795181 rows Affected (15.21 sec)
Conclusion:You can see that, in terms of deletion, there is a little bit faster than partitioning. Thus, it embodies the characteristics of easy data management.
Convenient data management This, I use the following example to illustrate: for example, the database table T1 Records of the year (12 months) The company's turnover, in the case of non-partitioned, that is, the data files are stored in the same file, then if you want to delete the first quarter of records, Then a full table scan is required to produce the results. But if T1 this table in advance partition, then I only need to delete the three-in-one three files. So, to a certain extent, it is convenient to manage.
MySQL database partition (i) (EXT)