MySQL Partitioning technology

Source: Internet
Author: User

1. Background information
When the total number of records in a MySQL table exceeds 10 million, does performance drop significantly? The answer is yes, but how quickly it degrades depends on the system architecture, the application, and other factors such as indexing and server hardware. For example, tables in FCDB and sfdb reach hundreds of millions of rows, and even individual tables have grown past tens of millions of rows, so single-table updates and similar operations affect the efficiency of the whole system. Even a simple SQL statement, such as a sort on one column across the whole table, can overwhelm the entire database.

There are currently two approaches to optimizing massive data: splitting large tables, and optimizing SQL statements. SQL statements can be tuned by adding indexes, but as the data grows, so does the cost of maintaining those indexes. SQL tuning is not covered in detail here; see High Performance MySQL and similar books. There are two main ways to split a large table into smaller ones:
Vertical table splitting:

Figure 1: Vertical Partitioning
Vertical splitting divides a table Tab with N1+N2 columns into a child table Tab1 with N1 columns and a child table Tab2 with N2+1 columns. The extra column in Tab2 is Tab1's primary key; without it, the rows of the two tables could no longer be associated. Of course, vertical splitting requires changing the SQL in the application. If the application has been in service for a long time, upgrading it is time-consuming and error-prone, so the upgrade cost is very high.
Horizontal table splitting:

Figure 2: Horizontal Partitioning
Horizontal splitting divides the rows of one table among multiple tables. The most common method is a hash algorithm, and the simplest hash is taking a key modulo the number of tables. Again, the SQL in the front-end application must be modified before the split tables can be used. An operation that logically touches one table may now touch two, so you must write two SQL statements to complete one logical transaction. The application logic grows more and more complex, maintenance costs rise, and some of the advantages of using a database are lost. Partitioning avoids these drawbacks and is therefore an effective way to handle massive data storage.
2. Introduction to MySQL partitioning
MySQL partitioning differs from the table-splitting techniques above. It is somewhat similar to horizontal splitting, but the split happens at the logical level: to the application, a partitioned table is still a single table.
2.1 MySQL partition types
MySQL 5.1 supports four basic partition types, plus composite partitioning (subpartitioning), in which RANGE or LIST partitions are further divided by HASH or KEY:
RANGE partitioning: assigns rows to partitions based on whether the column value falls within a given contiguous range;
LIST partitioning: similar to RANGE partitioning, except that the partition is selected by matching the column value against one of a set of discrete values;
HASH partitioning: the partition is selected based on the value returned by a user-defined expression, computed from the column values of the rows being inserted. The expression can be any valid MySQL expression that yields a non-negative integer value;
KEY partitioning: similar to HASH partitioning, except that only one or more columns are supplied, and the MySQL server provides its own hash function.
2.2 Range Partition
Consider the following RANGE partitioning examples.
Example 1. Suppose you create a table that holds the staff records of 20 video stores, numbered 1 to 20. To divide it into 4 smaller partitions, you can use RANGE partitioning and create the table as follows:

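To make the RANGE routing rule concrete, here is a small Python model (not MySQL code) of how a row is placed. It assumes four partitions p0 to p3 with VALUES LESS THAN 6, 11, 16 and 21, a plausible layout for splitting stores 1 to 20 into four equal groups. The partition names and bounds are assumptions.

```python
from bisect import bisect_right

# Assumed layout for Example 1: four RANGE partitions on store_id, each
# defined by an exclusive upper bound (VALUES LESS THAN n).
PARTITIONS = [("p0", 6), ("p1", 11), ("p2", 16), ("p3", 21)]


def range_partition(store_id: int) -> str:
    """Return the first partition whose LESS THAN bound exceeds store_id."""
    bounds = [bound for _, bound in PARTITIONS]
    index = bisect_right(bounds, store_id)  # count of bounds <= store_id
    if index == len(PARTITIONS):
        # MySQL rejects such a row because no partition accepts the value.
        raise ValueError(f"no partition for value {store_id}")
    return PARTITIONS[index][0]


if __name__ == "__main__":
    for sid in (1, 5, 6, 20):
        print(sid, range_partition(sid))
```

Note that store 6 goes to p1, not p0, because the bound is exclusive. A store_id of 21 matches no partition, and the model raises ValueError, just as MySQL rejects the insert.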
In this example the partitioning key is an integer column. Does that mean columns of other types cannot be used as the key? No, as Example 2 shows.
Example 2. Using the same staff table for the 20 video stores, suppose you want to store employees who left during different periods in separate partitions. You can use the DATE column separated (the date the employee left) as the key and create the table with the following SQL statement:

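The same idea applies to Example 2 once YEAR() has turned the date into an integer. The sketch below is again a Python model rather than MySQL code. It assumes hypothetical bounds of 1991, 1996 and 2001 plus a final catch-all partition, with None standing in for MySQL's MAXVALUE.

```python
from datetime import date

# Hypothetical bounds for Example 2: three VALUES LESS THAN partitions plus a
# catch-all; None plays the role of MySQL's MAXVALUE.
PARTITIONS = [("p0", 1991), ("p1", 1996), ("p2", 2001), ("p3", None)]


def partition_for_separated(separated: date) -> str:
    """Model PARTITION BY RANGE (YEAR(separated))."""
    key = separated.year  # YEAR() turns the DATE into an integer key
    for name, upper in PARTITIONS:
        if upper is None or key < upper:  # MAXVALUE, or VALUES LESS THAN upper
            return name
    raise ValueError(f"no partition for value {key}")  # unreachable with a MAXVALUE partition


if __name__ == "__main__":
    print(partition_for_separated(date(1995, 6, 30)))  # p1
    print(partition_for_separated(date(2010, 1, 1)))   # p3
```

With a catch-all partition, no date is ever rejected. This differs from the bounded layout assumed for Example 1.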
Here the MySQL date function YEAR() converts the DATE column into an integer, which can then serve as the RANGE partitioning key. After partitioning, you can see that each partition is stored in its own physical files:

Each partition has its own data file and index file. This is why a query accesses only the partitions that can contain the rows it needs. It skips physical blocks that cannot contribute to the result, which can greatly improve system efficiency.
2.3 List Partition
LIST partitioning has much in common with RANGE partitioning. Here is an example similar to Example 1:
Example 3. Take the same staff table for the 20 video stores numbered 1 to 20. The stores are distributed across 4 regions, as shown in the following table:
Region | Store IDs
North | 3, 5, 6, 9, 17
East | 1, 2, 10, 11, 19, 20
West | 4, 12, 13, 14, 18
Central | 7, 8, 15, 16
You can then create the table with the following LIST partitioning statement:

Again, each partition appears as a separate physical file:

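The following Python model (not MySQL code) shows how LIST partitioning selects a partition by membership rather than by range. It uses the store-to-region mapping from Example 3's region table. The partition names pNorth, pEast, pWest and pCentral are hypothetical.

```python
# Store-to-region mapping from Example 3, expressed as LIST partitions.
# The partition names are hypothetical labels.
REGIONS = {
    "pNorth": {3, 5, 6, 9, 17},
    "pEast": {1, 2, 10, 11, 19, 20},
    "pWest": {4, 12, 13, 14, 18},
    "pCentral": {7, 8, 15, 16},
}


def list_partition(store_id: int) -> str:
    """Return the partition whose VALUES IN list contains store_id."""
    for name, values in REGIONS.items():
        if store_id in values:
            return name
    # MySQL rejects such a row with an error (INSERT IGNORE silently skips it).
    raise ValueError(f"table has no partition for value {store_id}")


if __name__ == "__main__":
    print(list_partition(17))  # pNorth
    print(list_partition(15))  # pCentral
```

Each of stores 1 to 20 appears in exactly one list, so every valid store_id has a home. A value such as 21 matches no list, so the insert fails.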
2.4 Hash Partition
HASH partitioning is mainly used to ensure that data is distributed evenly across a predetermined number of partitions. The partition is selected based on the value returned by a user-defined expression, computed from the column values of the rows being inserted.
Example 4. Take the same staff table for the 20 video stores numbered 1 to 20. To spread employees across partitions by the year they were hired, you can use the DATE column hired (the hire date) as the key and create the table with the following SQL statement:

When an employee E1 hired on '2005-09-15' is inserted, the modulo function places the row in the 2nd partition:

MOD(YEAR('2005-09-15'), 4) = MOD(2005, 4) = 1   // i.e. partition 1, the 2nd partition (numbering starts at 0)

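The calculation above is simply the hash expression taken modulo the number of partitions. Below is a minimal Python model of it. It assumes PARTITION BY HASH(YEAR(hired)) PARTITIONS 4, consistent with the MOD(..., 4) shown; the exact statement is an assumption.

```python
from datetime import date

NUM_PARTITIONS = 4  # assumed PARTITIONS 4, matching the MOD(..., 4) above


def hash_partition(hired: date, num_partitions: int = NUM_PARTITIONS) -> int:
    """Model PARTITION BY HASH(YEAR(hired)): the partition number is MOD(expr, num)."""
    # Partition numbers start at 0, so 1 means the 2nd partition.
    return hired.year % num_partitions


if __name__ == "__main__":
    print(hash_partition(date(2005, 9, 15)))  # 1
    print(hash_partition(date(2008, 3, 1)))   # 0
```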
2.5 Key Partition

KEY partitioning is similar to HASH partitioning, but its key does not have to be an integer; it can be, for example, a string column. MySQL Cluster uses the MD5() function to implement KEY partitioning. For tables using other storage engines, the server uses its own internal hash function, which is based on the same algorithm as PASSWORD().
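KEY partitioning follows the same pattern as HASH: turn the key into an integer, then reduce it modulo the number of partitions. The Python sketch below uses hashlib's MD5 purely as a stand-in. MySQL's own KEY hash function is different, so the partition numbers this produces will not match a real server.

```python
import hashlib


def key_partition(value: str, num_partitions: int) -> int:
    """Hash a non-integer key to an integer, then reduce it modulo num_partitions.

    MD5 is only a stand-in here; MySQL's internal KEY hash function differs.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    as_int = int.from_bytes(digest[:8], "big")  # first 8 bytes as an unsigned integer
    return as_int % num_partitions


if __name__ == "__main__":
    for name in ("alice", "bob", "carol"):
        print(name, key_partition(name, 4))
```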
2.6 Comparison of different partitioning techniques
The partitioning types described above are compared in the following table:
Partitioning type | Advantages | Disadvantages | Common trait
RANGE | Well suited to date columns; supports composite partitioning | Limited number of partitions | Usually defined on a single column
LIST | Well suited to columns with a fixed set of values; supports composite partitioning | Limited number of partitions; inserting a row whose column value is in no list fails, so that data is lost | Usually defined on a single column
HASH | LINEAR HASH makes adding, dropping, and merging partitions more efficient | LINEAR HASH distributes data less evenly than regular HASH | Usually defined on a single column
KEY | Can use non-integer columns, such as character columns | Can be less efficient than the other types because the hash functions are complex (e.g. MD5 or SHA) | Usually defined on a single column
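The LINEAR HASH trade-off in the table comes from how the partition is computed. The MySQL manual describes it with powers of two:

1. Take V as the smallest power of two not less than the number of partitions.
2. Compute N = value & (V - 1).
3. While N is still too large, halve V and mask again.

Here is a Python model of that procedure, a sketch rather than MySQL's source:

```python
def linear_hash_partition(value: int, num_partitions: int) -> int:
    """Model MySQL's LINEAR HASH partition selection for a non-negative value."""
    v = 1
    while v < num_partitions:  # V = smallest power of two >= num_partitions
        v *= 2
    n = value & (v - 1)
    while n >= num_partitions:  # too large: halve V and mask again
        v //= 2
        n &= v - 1
    return n


if __name__ == "__main__":
    print(linear_hash_partition(2003, 6))  # 3
    print(linear_hash_partition(1998, 6))  # 2 (plain HASH would give 1998 % 6 = 0)
```

With 6 partitions, consecutive values map unevenly: partitions 2 and 3 each receive twice as many values as the others. This is the uneven distribution listed as LINEAR HASH's disadvantage.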
3. Case Studies
The case study uses simulated data for employees, departments, department managers, titles, and salary records, with the ER diagram shown below; the data volume is about 4 million rows. Data download URL: https://launchpad.net/test-db


The same data is loaded both with and without partitioning, so that query performance can be analyzed and compared below. The salaries table uses RANGE partitioning, defined as follows:

3.1 Single-table query
Counting the salary records for 1999 is very simple; the query is as follows:
SELECT COUNT(*) FROM salaries s WHERE s.from_date BETWEEN '1999-01-01' AND '1999-12-31';
Query performance differs greatly before and after partitioning:

After partitioning, only partition p16 needs to be scanned and far fewer rows are accessed, so performance naturally improves considerably:

With partitioning vs. without partitioning
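Partition pruning like this can be modeled as an interval-overlap test between the query's year range and each partition's range. The Python sketch below assumes salaries is partitioned by RANGE (YEAR(from_date)). The partitions p1 to p18 are hypothetical, laid out so that 1999 falls in p16 as in the query plan; the real definition may differ.

```python
# Hypothetical layout: PARTITION BY RANGE (YEAR(from_date)) with partitions
# p1..p18, where pk holds years below 1984 + k (so p16 covers 1999).
PARTITIONS = [(f"p{k}", 1984 + k) for k in range(1, 19)]


def prune(first_year: int, last_year: int) -> list:
    """Return partitions that may hold rows with YEAR(from_date) in [first_year, last_year]."""
    selected = []
    lower = None  # the first partition has no lower bound
    for name, upper in PARTITIONS:
        # The partition covers [lower, upper); keep it if that overlaps the query range.
        if (lower is None or lower <= last_year) and first_year < upper:
            selected.append(name)
        lower = upper
    return selected


if __name__ == "__main__":
    # from_date BETWEEN '1999-01-01' AND '1999-12-31' covers only the year 1999.
    print(prune(1999, 1999))  # ['p16']
    print(prune(1998, 1999))  # ['p15', 'p16']
```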
3.2 Single-table query: a bad case
Now suppose you run the following query:
SELECT COUNT(*) FROM salaries s WHERE YEAR(s.from_date) = 1999;
Can it take advantage of partitioning? No. Partition pruning works on conditions against the partitioning column s.from_date itself. MySQL is not smart enough to infer from YEAR(s.from_date) = 1999 that only partition p16 can match. The following query plan confirms this:

That is, the query accesses every partition and gains nothing from partitioning. Rewrite the SQL as follows:
SELECT COUNT(*) FROM salaries s WHERE s.from_date BETWEEN '1999-01-01' AND '1999-12-31';
The query plan is then as follows:

Clearly, how the SQL is written can produce two very different levels of performance.

3.3 Join Query
Similarly, for join queries the partitioned table performs about 3 times better than the non-partitioned one, and with larger data volumes the gap may grow further. The SQL is as follows:
SELECT COUNT(*) FROM salaries s LEFT JOIN employees e ON s.emp_no = e.emp_no WHERE s.from_date BETWEEN '1999-01-01' AND '1999-12-31';

With partitioning vs. without partitioning
3.4 Deleting data
To delete the salary data for 1998, the partitioned table lets you clean out obsolete data quickly. You drop the corresponding partition (ALTER TABLE ... DROP PARTITION) instead of running a DELETE query.

With partitioning, only one partition has to be dropped. This takes just 0.05 s compared with 2.82 s for the original DELETE, a very large improvement. After the data is removed this way, the data files look like this:

So if you then insert data for 1998 again, is the data lost, or is it rejected? Neither: it is written to partition p16. Interested readers can try this themselves.
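The re-inserted 1998 rows land in p16 because of the RANGE routing rule. Once the partition that held 1998 is dropped, the first remaining partition whose LESS THAN bound exceeds 1998 is p16. Below is a Python model under a hypothetical layout: partitions p1 to p18 on YEAR(from_date), where pk holds years below 1984 + k, so 1998 lives in p15 and 1999 in p16.

```python
# Hypothetical layout: RANGE partitions p1..p18 on YEAR(from_date), where pk
# holds years below 1984 + k, so 1998 lives in p15 and 1999 in p16.
def make_layout() -> list:
    return [(f"p{k}", 1984 + k) for k in range(1, 19)]


def drop_partition(layout: list, name: str) -> list:
    """Model ALTER TABLE ... DROP PARTITION: the partition and its rows disappear."""
    return [(n, upper) for n, upper in layout if n != name]


def route(layout: list, year: int) -> str:
    """RANGE routing: the first partition whose LESS THAN bound exceeds the year."""
    for name, upper in layout:
        if year < upper:
            return name
    raise ValueError(f"no partition for value {year}")


if __name__ == "__main__":
    layout = make_layout()
    print(route(layout, 1998))  # p15
    layout = drop_partition(layout, "p15")
    print(route(layout, 1998))  # p16
```

Because a RANGE partition is defined only by its upper bound, p16 effectively widens to cover 1998 as well as 1999.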
4. Summary and deficiencies
Partitioning therefore brings a number of benefits:
1. More data can be stored than fits on a single disk or file system partition.
2. Data that is no longer useful can often be removed simply by dropping the partitions that contain it.
3. Some queries can be greatly optimized, for example when the WHERE clause guarantees that the matching data lies in only one or a few partitions.
4. Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized.
5. Query throughput can be increased by spreading data queries across multiple disks.
Several factors need to be considered when designing partitioning, such as:
– The columns to partition on
– The functions used in the partitioning expression, especially for non-integer columns
– Server performance
– Memory size
Depending on the partitioning technique, here are some tips:
– If the index is larger than RAM, consider partitioning instead of relying on the index
– Try not to use the primary key as the partitioning key
– Consider the ARCHIVE storage engine when CPU capacity is high
– For large volumes of historical data, consider combining ARCHIVE with partitioning
In short:
MySQL partitioning is a logical horizontal table-splitting technique;
it accesses only the partitions a query needs, which improves performance;
it supports RANGE, HASH, KEY, LIST, and composite partitioning;
it works with most storage engines supported by the MySQL server (all partitions of a table must use the same engine);
except for KEY partitioning, the partitioning expression must be an integer (or be convertible to one).
