MySQL partitioning is one way to reduce the burden of large data volumes and to counter database performance degradation; other approaches include adding indexes and splitting large tables into smaller ones.
According to how rows are assigned, MySQL partitioning is divided into RANGE, LIST, HASH and KEY partitioning. This article introduces each of these partitioning types in detail with a simple example. It is concise and clear, and a good reference for anyone who wants to understand MySQL partitioning.
First, Background
When a MySQL table holds more than 10 million records, does performance drop significantly? Generally, yes. How fast it degrades, however, depends on many factors, such as the architecture, the application, the indexes and the server hardware. Once a database grows very large and single tables exceed tens of millions of rows, updates and similar operations on one table affect the efficiency of the whole system. Even a simple SQL statement, such as sorting on a column across the whole table, can overwhelm the entire database.
At present there are two ways to optimize for massive data: splitting large tables into smaller ones, and optimizing SQL statements.
SQL statement optimization: queries can be tuned by adding indexes, but as the data volume grows, so does the cost of maintaining those indexes. This is not covered in detail here; see books such as "High Performance MySQL".
There are two main ways to split a large table into smaller tables:
1, Vertical splitting:
Figure 1, Vertical partitioning
A vertical split divides a table Tab with N1+N2 columns into a sub-table Tab1 with N1 columns and a sub-table Tab2 with N2+1 columns. The extra column in Tab2 is Tab1's primary key; without it, the association between the two tables is lost. Vertical splitting also requires changes to the SQL in the application. If the application has been in service for a long time, upgrading the program is time-consuming and error-prone, so the cost of the upgrade can be very high.
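The vertical split can be sketched in Python by treating each row as a plain dict. This only illustrates the idea (it is not MySQL code); the column names emp_no, name, resume and photo and the helpers split_vertically and join_back are hypothetical.

```python
# A minimal sketch of vertical table splitting, assuming rows are plain dicts.
# Column names (emp_no, name, resume, photo) are hypothetical examples.

def split_vertically(row, pk, tab1_cols):
    """Split one row of Tab into a Tab1 row and a Tab2 row.

    Tab1 keeps the N1 columns listed in tab1_cols (including the primary key);
    Tab2 gets the remaining N2 columns plus a copy of the primary key, so the
    two parts can still be joined back together.
    """
    tab1_row = {col: row[col] for col in tab1_cols}
    tab2_row = {col: val for col, val in row.items() if col not in tab1_cols}
    tab2_row[pk] = row[pk]  # the "+1" column: Tab1's primary key
    return tab1_row, tab2_row


def join_back(tab1_row, tab2_row, pk):
    """Reassemble the original row; only possible because both parts share pk."""
    assert tab1_row[pk] == tab2_row[pk]
    return {**tab1_row, **tab2_row}


if __name__ == "__main__":
    row = {"emp_no": 10001, "name": "Georgi", "resume": "long text...", "photo": b"..."}
    t1, t2 = split_vertically(row, pk="emp_no", tab1_cols=["emp_no", "name"])
    print(t1)  # {'emp_no': 10001, 'name': 'Georgi'}
    print(t2)  # {'resume': 'long text...', 'photo': b'...', 'emp_no': 10001}
    assert join_back(t1, t2, "emp_no") == row
```

Every query that needs columns from both halves now has to join them on the primary key, which is why the application's SQL must change.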
2, Horizontal splitting:
Figure 2, Horizontal partitioning
Horizontal splitting divides the rows of one table across multiple tables. The most common approach assigns each record to a table with a hash, for example a simple modulo. Again, the SQL in the front-end application must be modified to use this scheme. A single logical operation may touch two tables, so you have to write two SQL statements to complete one logical transaction. The program logic grows more and more complex, maintenance costs rise, and the advantages of using a database are lost. Partitioning avoids these drawbacks, which makes it a powerful way to handle massive data storage.
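A minimal Python sketch of application-level splitting by modulo, assuming the application routes each row itself. The table names employees_0 to employees_3, NUM_TABLES = 4 and the helper names are hypothetical.

```python
# A minimal sketch of application-level horizontal splitting by modulo.
# The table-name pattern "employees_<n>" and NUM_TABLES = 4 are hypothetical.

NUM_TABLES = 4


def table_for(emp_no, num_tables=NUM_TABLES):
    """Pick the physical table for a row using a simple modulo hash."""
    return f"employees_{emp_no % num_tables}"


def group_by_table(emp_nos, num_tables=NUM_TABLES):
    """Group keys by target table. A logical update touching several keys may
    need one SQL statement per table, which is the extra complexity that
    application-level splitting pushes onto the program."""
    groups = {}
    for emp_no in emp_nos:
        groups.setdefault(table_for(emp_no, num_tables), []).append(emp_no)
    return groups


if __name__ == "__main__":
    print(table_for(10001))  # employees_1
    print(group_by_table([10001, 10002, 10005]))
    # {'employees_1': [10001, 10005], 'employees_2': [10002]}
```

With MySQL partitioning, this routing moves into the server and the application keeps issuing SQL against a single table.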
Second, Introduction to MySQL partitioning
MySQL's partitioning differs from the table-splitting techniques above. It resembles horizontal splitting, but the split is logical: MySQL stores the partitions separately, while to the application the data is still a single table. MySQL 5.1 supports four basic partition types, which can also be combined into composite partitioning (subpartitioning):
RANGE partitioning: assigns rows to a partition based on column values that fall within a given contiguous range;
LIST partitioning: similar to RANGE partitioning, except that the partition is selected by matching the column value against a set of discrete values;
HASH partitioning: the partition is selected from the return value of a user-defined expression evaluated on the column values of the row being inserted. The expression can be any valid MySQL expression that yields a non-negative integer value;
KEY partitioning: similar to HASH partitioning, except that KEY partitioning accepts only one or more columns (not an arbitrary expression), and the MySQL server supplies its own hash function.
1, RANGE partition
Here are two examples of RANGE partitioning.
Example 1: Suppose you create the following table, which holds the staff records of 20 video stores numbered 1 to 20. To divide it into 4 smaller partitions, you can create the table with RANGE partitioning as follows:
Figure 3, RANGE partition example 1
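Because Figure 3 is not reproduced here, the following Python sketch assumes four partitions of five stores each (p0 for stores 1 to 5, up to p3 for stores 16 to 20) and shows how VALUES LESS THAN bounds route a row. The bounds and the function name are assumptions for illustration.

```python
# A sketch of how RANGE partitioning routes a row, assuming Figure 3 splits
# store_id 1..20 into four partitions of five stores each:
#   p0 VALUES LESS THAN (6), p1 < 11, p2 < 16, p3 < 21   (assumed bounds)
from bisect import bisect_right

PARTITIONS = ["p0", "p1", "p2", "p3"]
UPPER_BOUNDS = [6, 11, 16, 21]  # exclusive upper bound of each partition


def range_partition(store_id):
    """Return the first partition whose VALUES LESS THAN bound exceeds store_id."""
    idx = bisect_right(UPPER_BOUNDS, store_id)
    if idx == len(UPPER_BOUNDS):
        # MySQL rejects such a row unless a MAXVALUE partition exists.
        raise ValueError(f"store_id {store_id} has no matching partition")
    return PARTITIONS[idx]


if __name__ == "__main__":
    print([range_partition(s) for s in (1, 5, 6, 20)])  # ['p0', 'p0', 'p1', 'p3']
```

Bounds are exclusive, so store 6 goes to p1 rather than p0.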
In this example the partitioning key is an integer column. Does that mean columns of other types cannot be used as the key? No, and Example 2 illustrates this.
Example 2: Using the same staff table for the 20 video stores, suppose you want to store employees who left in different periods in different partitions. You can use the DATE column separated (the date the employee left) as the basis of the key and create the table with the following SQL statement:
Figure 4, RANGE partition example 2
Here the MySQL date function YEAR() converts the DATE column to an integer, which can then serve as the RANGE partitioning key. After partitioning, each partition is stored in its own, relatively independent physical files:
Figure 5, Physical files after RANGE partitioning
Each partition has its own data file and index file. This is why a query only accesses the blocks it actually needs and skips physical blocks that cannot contain any matching rows, which can greatly improve system efficiency.
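The following Python sketch models Example 2 and the pruning just described. The bounds (1991, 1996, 2001, then MAXVALUE) are assumed, because Figure 4 is not reproduced here. partition_for and partitions_to_scan are hypothetical helpers; in MySQL, the optimizer does the real pruning.

```python
# A sketch of RANGE partitioning on YEAR(separated) plus partition pruning.
# The bounds are assumptions (Figure 4 is not reproduced here):
#   p0 < 1991, p1 < 1996, p2 < 2001, p3 < MAXVALUE
from bisect import bisect_right
from datetime import date

PARTITIONS = ["p0", "p1", "p2", "p3"]
UPPER_BOUNDS = [1991, 1996, 2001]  # p3 catches everything else (MAXVALUE)


def partition_for(separated):
    """Route a row by the integer YEAR(separated), as the partition expression does."""
    return PARTITIONS[bisect_right(UPPER_BOUNDS, separated.year)]


def partitions_to_scan(low, high):
    """Pruning: for WHERE separated BETWEEN low AND high, only the partitions
    from low's partition through high's partition can contain matching rows."""
    first = PARTITIONS.index(partition_for(low))
    last = PARTITIONS.index(partition_for(high))
    return PARTITIONS[first:last + 1]


if __name__ == "__main__":
    print(partition_for(date(2005, 9, 15)))                         # p3
    print(partitions_to_scan(date(1993, 1, 1), date(1998, 12, 31)))  # ['p1', 'p2']
```

Because each partition has its own files, skipping p0 and p3 in the second call means their files are never read.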
2, LIST partition
LIST partitioning is similar to RANGE partitioning. Here is an example similar to Example 1:
Example 3: Suppose you create the following table, which holds the staff records of 20 video stores numbered 1 to 20. The 20 stores are distributed across 4 regions, as shown in the following table:
Figure 6, LIST partition example 3
You can then create the table with the following LIST partitioning statement:
Figure 7, SQL statement for the LIST partition example
Again, each partition appears as separate physical files:
Figure 8, Physical files after LIST partitioning
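Figure 6 is not reproduced here, so this Python sketch uses a hypothetical assignment of the 20 stores to four regions, in the style of the MySQL manual's example. The region names and store lists are assumptions. The sketch shows that LIST routing is an exact-membership lookup rather than a range comparison.

```python
# A sketch of LIST partitioning: each partition owns an explicit set of values.
# Region names and store assignments are hypothetical (Figure 6 is not reproduced).

REGIONS = {
    "pNorth": {3, 5, 6, 9, 17},
    "pEast": {1, 2, 10, 11, 19, 20},
    "pWest": {4, 12, 13, 14, 18},
    "pCentral": {7, 8, 15, 16},
}

# Invert the value lists once so each lookup is a single dict access.
STORE_TO_PARTITION = {
    store: name for name, stores in REGIONS.items() for store in stores
}


def list_partition(store_id):
    """Return the partition whose value list contains store_id."""
    try:
        return STORE_TO_PARTITION[store_id]
    except KeyError:
        # As in MySQL, a value that appears in no list cannot be inserted.
        raise ValueError(f"no partition for store_id {store_id}") from None


if __name__ == "__main__":
    print(list_partition(3), list_partition(20))  # pNorth pEast
```

Unlike RANGE partitioning, neighbouring store numbers (for example 5 and 4) can land in different partitions.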
3, HASH partition
HASH partitioning is mainly used to distribute data evenly across a predetermined number of partitions. The partition is selected from the return value of a user-defined expression computed from the column values of each row being inserted.
Example 4: Using the same staff table for the 20 video stores, suppose you want to distribute employees by the period in which they were hired. You can use the DATE column hired (the hiring date) as the basis of the key and create the table with the following SQL statement:
Figure 9, HASH partition example 4
Then an employee E1 hired on 2005-09-15 is placed in the 2nd partition by the modulo function:
MOD(YEAR('2005-09-15'), 4) = MOD(2005, 4) = 1 // i.e. the 2nd partition (partitions are numbered from 0)
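The modulo calculation above can be checked with a short Python sketch. It assumes Figure 9 defines PARTITION BY HASH(YEAR(hired)) PARTITIONS 4, which is what MOD(..., 4) implies. The function name is hypothetical.

```python
# A sketch of HASH partitioning, assuming PARTITION BY HASH(YEAR(hired)) PARTITIONS 4.
# Partition numbers start at 0, so partition number 1 is the 2nd partition, p1.
from datetime import date

NUM_PARTITIONS = 4


def hash_partition(hired, num_partitions=NUM_PARTITIONS):
    """Partition number = MOD(YEAR(hired), num_partitions)."""
    return hired.year % num_partitions


if __name__ == "__main__":
    n = hash_partition(date(2005, 9, 15))
    print(n, f"-> p{n}")  # 1 -> p1, i.e. the 2nd partition
    # Consecutive hiring years cycle through the partitions evenly:
    print([hash_partition(date(y, 1, 1)) for y in range(2000, 2008)])
    # [0, 1, 2, 3, 0, 1, 2, 3]
```

The even cycling over consecutive years is what makes HASH partitioning spread data uniformly, as long as the expression's values are themselves spread out.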
4, KEY partition
KEY partitioning is similar to HASH partitioning, but the key does not have to be an integer; it can be, for example, a string column. MySQL Cluster uses the MD5() function to implement KEY partitioning. For tables using other storage engines, the server uses its own internal hash function, which is based on the same algorithm as PASSWORD().
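To illustrate the idea of hashing a non-integer key, this Python sketch uses hashlib.md5 as a stand-in for the server-supplied hash. It is an assumption-laden illustration only: it will not reproduce the partition numbers MySQL actually computes, and key_partition is a hypothetical name.

```python
# A sketch of KEY-style partitioning on a string column. hashlib.md5 stands in
# for the server-supplied hash; this only illustrates "hash the column value,
# then take it modulo the partition count" and does NOT match MySQL's numbers.
import hashlib

NUM_PARTITIONS = 4


def key_partition(value, num_partitions=NUM_PARTITIONS):
    """Map any string key to a partition number in [0, num_partitions)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    for name in ["Georgi", "Bezalel", "Parto"]:
        print(name, key_partition(name))
```

The hash function is the only part that differs from the HASH sketch: the user no longer writes an integer expression, because the server derives one from the column values.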
5, Comparison of the partitioning types
Each partitioning type was described above; the following table compares them:
Figure 10, Comparison of the partitioning types
Third, Case study
The case study uses simulated data on employees, departments, department managers, titles and salaries, about 4 million records in total; its ER diagram is shown below. Data download URL: https://launchpad.net/test-db
Figure 11, Case study
As shown above, the same data is stored both with and without partitioning, so that query performance can be analysed and compared below. The salaries table uses RANGE partitioning, defined as follows:
Figure 12, Case study
1, Single-table query
Counting the salary records for 1999 is simple; the query is:
SELECT COUNT(*) FROM salaries s WHERE s.from_date BETWEEN '1999-01-01' AND '1999-12-31';
Query performance differs greatly before and after partitioning:
Figure 13, Query performance before and after partitioning
After partitioning, only partition p16 has to be scanned, so far fewer records are accessed and performance improves considerably:
Figure 14, Performance without and with partitioning
2, A bad case for single-table queries
Now consider the following query:
SELECT COUNT(*) FROM salaries s WHERE YEAR(s.from_date) = 1999;
Can this query benefit from partitioning? No. The condition wraps the column in a function. MySQL prunes partitions only from conditions on the column s.from_date itself, and it is not smart enough to infer that YEAR(s.from_date) = 1999 can only match partition p16. The following query plan confirms this:
Figure 15, Single-table query before optimization
That is, the query accesses every partition and gains nothing from partitioning. Rewrite the SQL as follows:
SELECT COUNT(*) FROM salaries s WHERE s.from_date BETWEEN '1999-01-01' AND '1999-12-31';
The query plan is as follows:
Figure 16, Improved single-table query
Clearly, the way the SQL is written can make a large difference in performance.
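The difference between the two queries can be illustrated with a toy Python model. It assumes a hypothetical yearly layout for salaries (Figure 12 is not reproduced here), chosen so that 1999 lands in p16 as in the text. The pruner deliberately understands only range conditions on the raw column. This is a sketch of the limitation described above, not MySQL's real optimizer.

```python
# Toy model, assuming a hypothetical yearly layout for salaries:
#   p1 VALUES LESS THAN (1985), p2 < 1986, ..., p16 < 2000, p17 < MAXVALUE
# The pruner only understands ranges on the raw column s.from_date.
from bisect import bisect_right
from datetime import date

BOUNDS = list(range(1985, 2001))               # upper bounds for p1..p16
PARTITIONS = [f"p{k}" for k in range(1, 18)]   # p17 is the MAXVALUE partition


def partition_for(from_date):
    """Route a row by YEAR(from_date) against the VALUES LESS THAN bounds."""
    return PARTITIONS[bisect_right(BOUNDS, from_date.year)]


def prune(condition):
    """condition is ("between", low, high) on s.from_date, or ("expr", text)
    for a predicate wrapped in a function, such as YEAR(s.from_date) = 1999."""
    if condition[0] == "between":
        _, low, high = condition
        first = PARTITIONS.index(partition_for(low))
        last = PARTITIONS.index(partition_for(high))
        return PARTITIONS[first:last + 1]
    # The pruner cannot see through the function call: scan every partition.
    return list(PARTITIONS)


if __name__ == "__main__":
    print(prune(("between", date(1999, 1, 1), date(1999, 12, 31))))  # ['p16']
    print(len(prune(("expr", "YEAR(s.from_date) = 1999"))))          # 17
```

The rewritten BETWEEN query touches one partition, while the YEAR() form touches all of them, which matches the two query plans.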
3, Join query
Similarly, for join queries the partitioned table is about 3 times faster than the non-partitioned one, and the gap may be even larger for bigger data volumes. The SQL is as follows:
SELECT COUNT(*) FROM salaries s LEFT JOIN employees e ON s.emp_no = e.emp_no WHERE s.from_date BETWEEN '1999-01-01' AND '1999-12-31';
Figure 17, Performance without and with partitioning
4, Delete query
To delete the salary data for 1998 from a partitioned table, you can drop the corresponding partition instead of running a DELETE query, which cleans up obsolete data quickly.
Figure 18, Delete performance comparison
For the partitioned table, only one partition has to be dropped, which takes just 0.05 s compared with 2.82 s for the original DELETE, a very large improvement. The data file information after the partition is dropped is shown below:
Figure 19, Data file information after dropping the partition
So if you then insert data for 1998, is it lost, or can it not be written at all? Neither: the rows are written to partition p16. Interested readers can try it themselves.
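Why the new 1998 rows end up in p16 can be sketched in Python. This assumes the same hypothetical yearly layout (p15 VALUES LESS THAN (1999) holds 1998, p16 VALUES LESS THAN (2000) holds 1999), since Figure 12 is not reproduced here. The helper names are hypothetical. Dropping a RANGE partition removes it together with its rows, but the remaining bounds stay unchanged, so 1998 now falls into the next partition whose bound exceeds it.

```python
# Hypothetical yearly layout for salaries: (partition name, VALUES LESS THAN bound).
# p1 < 1985, ..., p15 < 1999 (holds 1998), p16 < 2000 (holds 1999);
# None marks the MAXVALUE partition p17.
LAYOUT = [(f"p{k}", 1984 + k) for k in range(1, 17)] + [("p17", None)]


def partition_for(year, layout):
    """Return the first partition, in definition order, whose bound exceeds year."""
    for name, bound in layout:
        if bound is None or year < bound:
            return name
    raise ValueError(f"no partition for year {year}")


def drop_partition(layout, name):
    """Model ALTER TABLE ... DROP PARTITION: the partition (and its rows)
    disappears, but the remaining bounds are unchanged."""
    return [(n, b) for n, b in layout if n != name]


if __name__ == "__main__":
    print(partition_for(1998, LAYOUT))   # p15
    after = drop_partition(LAYOUT, "p15")
    print(partition_for(1998, after))    # p16: the next bound above 1998 is 2000
```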
Fourth, Summary and limitations
Partitioning has many benefits:
1, it can store more data than a single disk or file system partition can hold;
2, data that is no longer needed can usually be removed easily by dropping the partition that holds it;
3, some queries can be greatly optimized, because the data satisfying a given WHERE clause is guaranteed to lie in only one or a few partitions;
4, queries involving aggregate functions such as SUM() and COUNT() can, in principle, be processed in parallel across partitions;
5, spreading data queries across multiple disks can achieve higher query throughput.
Several factors should be considered when designing partitioning, such as:
1, the partitioning column;
2, the function used by the partitioning expression, especially for non-integer columns;
3, server performance;
4, memory size.
Some tips for choosing a partitioning approach:
1, if the index is larger than RAM, consider partitioning instead of relying on the index;
2, try not to use the primary key as the partitioning key;
3, when CPU capacity is plentiful, consider the ARCHIVE storage engine;
4, for large amounts of historical data, consider combining ARCHIVE with partitioning.
In summary:
1, MySQL partitioning is a logical horizontal table-splitting technique;
2, a query accesses only the partitions it needs, which improves performance;
3, RANGE, HASH, KEY, LIST and composite partitioning are supported;
4, it works with most storage engines supported by the MySQL server (in MySQL 5.1, MERGE, CSV and FEDERATED tables cannot be partitioned);
5, except for KEY partitioning, the partitioning key must be an integer (or an expression that converts to an integer).
MySQL optimization: MySQL partitioning technology [reprint]