MySQL Partitioned Table Technical Analysis

Source: bitsCN.com

MySQL partitioning overview:
Partitioning lets you distribute portions of a single table across the file system according to rules you define, and each portion can be of any size. In effect, different parts of a table are stored as separate tables in different locations. The user-selected rule for dividing the data is called the partitioning function. In MySQL it can be a modulus, a simple match against a range of contiguous values or a list of values, an internal HASH function, or a linear HASH function. Which function is used depends on the partition type the user specifies, and its argument is the value of a user-supplied expression. This expression can be an integer column value, or a function that acts on one or more column values and returns an integer. [Z1] The value of this expression is passed to the partitioning function, which returns a sequence number indicating the partition in which the record should be stored. The function must be neither constant nor random. It cannot contain any queries, but it can use practically any SQL expression valid in MySQL, as long as the expression returns a positive value smaller than MAXVALUE (the largest possible positive integer).
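To make the HASH and LINEAR HASH variants concrete, here is a minimal Python sketch of how each maps a non-negative integer expression value to a 0-based partition number. The LINEAR HASH steps follow the powers-of-two algorithm described in the MySQL manual; the function names are hypothetical, and this is an illustration rather than MySQL's source code.

```python
def hash_partition(value: int, num_partitions: int) -> int:
    """PARTITION BY HASH(expr): the partition number is MOD(expr, num_partitions)."""
    if num_partitions < 1 or value < 0:
        raise ValueError("need num_partitions >= 1 and a non-negative value")
    return value % num_partitions


def linear_hash_partition(value: int, num_partitions: int) -> int:
    """PARTITION BY LINEAR HASH(expr): mask with powers of two instead of MOD."""
    if num_partitions < 1 or value < 0:
        raise ValueError("need num_partitions >= 1 and a non-negative value")
    # V = smallest power of two that is >= num_partitions
    v = 1
    while v < num_partitions:
        v *= 2
    n = value & (v - 1)
    # While N points past the last partition, halve V and mask again.
    while n >= num_partitions:
        v = (v + 1) // 2  # CEILING(V / 2)
        n &= v - 1
    return n


if __name__ == "__main__":
    # Six partitions, expression value 2003 (for example YEAR(col) = 2003)
    print(hash_partition(2003, 6))         # 5
    print(linear_hash_partition(2003, 6))  # 3
    print(linear_hash_partition(1998, 6))  # 2
```

Plain MOD tends to spread values more evenly, while the powers-of-two masking is what makes changing the number of LINEAR HASH partitions cheaper.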
Because MySQL has no global indexes, only local (per-partition) indexes, a table that has two or more unique indexes cannot be partitioned. The columns used by the partitioning function must be part of the table's primary key; otherwise, the table cannot be partitioned. [Z2]
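The restriction above follows from a more precise rule in the MySQL manual: every unique key on a partitioned table, including the primary key, must contain every column used in the partitioning expression. Two unique indexes therefore block partitioning unless both contain the partitioning columns. Below is a minimal Python sketch of that check; the function name `can_partition` is hypothetical, and the extra unique key in the usage example is an assumption for illustration only.

```python
from typing import Iterable


def can_partition(partition_columns: Iterable[str],
                  unique_keys: Iterable[Iterable[str]]) -> bool:
    """Return True if every unique key (primary key included) contains
    all columns used by the partitioning expression."""
    # MySQL column names are case-insensitive, so compare in lower case.
    needed = {col.lower() for col in partition_columns}
    return all(needed <= {col.lower() for col in key} for key in unique_keys)


if __name__ == "__main__":
    primary_key = ["RECORD_DATE", "RECORD_HOUR", "RECORD_MINUTE", "MC_IP",
                   "PC_IP", "ALERT_TYPE", "SUB_TYPE", "ALERT_ID"]
    # Partitioning on TO_DAYS(RECORD_DATE) uses the RECORD_DATE column.
    print(can_partition(["RECORD_DATE"], [primary_key]))  # True
    # A hypothetical extra UNIQUE KEY on RECORD_DATETIME alone breaks the rule.
    print(can_partition(["RECORD_DATE"], [primary_key, ["RECORD_DATETIME"]]))  # False
```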
A partitioned table can use any storage engine supported by your MySQL server. In MySQL 5.1, however, all partitions of the same partitioned table must use the same storage engine; for example, you cannot use MyISAM for one partition and InnoDB for another. This does not prevent different partitioned tables on the same MySQL server, or even in the same database, from using different storage engines.

Creating MySQL partitions
MySQL supports four partition types:
· RANGE partitioning: rows are assigned to partitions based on column values falling within given contiguous ranges. For details, see section 18.2.1, "RANGE Partitioning".
· LIST partitioning: similar to RANGE partitioning, except that the partition is selected by matching the column value against one of a set of discrete values. For details, see section 18.2.2, "LIST Partitioning".
· HASH partitioning: the partition is selected based on the value returned by a user-defined expression, computed from the column values of the row being inserted. The expression can be any expression valid in MySQL that yields a non-negative integer value. For details, see section 18.2.3, "HASH Partitioning".
· KEY partitioning: similar to HASH partitioning, except that you supply only one or more columns to be evaluated, and the MySQL server provides its own hashing function. The columns can contain non-integer values, because the server's hashing function guarantees an integer result regardless of the column data type. For details, see section 18.2.4, "KEY Partitioning".
· Subpartitioning: each partition of a partitioned table is further divided into subpartitions. For the syntax, see section 18.2.5, "Subpartitions".
(1) Note: each partition must have the same number of subpartitions.
(2) If SUBPARTITION clauses are used to define any subpartition explicitly on any partition of a partitioned table, subpartitions must be defined for all partitions.
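A minimal Python sketch of how a row could be routed in a composite (subpartitioned) table. The layout is a hypothetical one modeled on the MySQL manual's example: PARTITION BY RANGE (YEAR(purchased)) SUBPARTITION BY HASH (TO_DAYS(purchased)) SUBPARTITIONS 2, with partitions below 1990, below 2000, and MAXVALUE. Because every partition has the same number of subpartitions, the same MOD step applies inside each partition. The helper names are illustrative.

```python
from bisect import bisect_right
from datetime import date
from typing import Tuple

# Hypothetical layout:
#   PARTITION BY RANGE (YEAR(purchased))
#   SUBPARTITION BY HASH (TO_DAYS(purchased)) SUBPARTITIONS 2
#   (p0 VALUES LESS THAN (1990), p1 VALUES LESS THAN (2000),
#    p2 VALUES LESS THAN MAXVALUE)
RANGE_BOUNDS = [1990, 2000]  # p2 (MAXVALUE) takes every larger year
SUBPARTITIONS = 2            # the same count for every partition


def to_days(d: date) -> int:
    """MySQL TO_DAYS() for modern Gregorian dates (days since year 0)."""
    return d.toordinal() + 365


def route(purchased: date) -> Tuple[int, int]:
    """Return (partition number, subpartition number), both 0-based."""
    # VALUES LESS THAN is exclusive, so a year equal to a bound goes higher.
    partition = bisect_right(RANGE_BOUNDS, purchased.year)
    subpartition = to_days(purchased) % SUBPARTITIONS
    return partition, subpartition


if __name__ == "__main__":
    print(route(date(1985, 6, 1)))
    print(route(date(2011, 8, 10)))  # (2, 0): TO_DAYS is 734724, an even number
```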
When creating partitions, you can specify the data directory and index directory of each partition, so that different data can be stored on different disks or file systems. Spreading data across multiple disks can improve read performance to some extent, because each disk then handles fewer I/O operations. Specifying partition storage locations can also increase the available storage space.

Whatever partition type is used, partitions are numbered automatically when they are created, starting from 0; this is important to remember. When a new row is inserted into a partitioned table, these numbers are used to identify the correct partition. For example, if a table uses four partitions, they are numbered 0, 1, 2, and 3. For the RANGE and LIST types, you must make sure that a partition is defined for each partition number. For HASH partitioning, the user function must return an integer greater than 0. For KEY partitioning, this issue is handled automatically by the hashing function used internally by the MySQL server. Note that partition names are not case-sensitive, and for RANGE and LIST partitioning, partition names cannot be repeated. The partition type can be chosen according to your requirements; RANGE partitioning is the most commonly used.
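The 0-based numbering for RANGE and LIST can be sketched in a few lines of Python. This is an illustrative model, not MySQL's implementation; the function names are hypothetical, and the error text imitates MySQL's "Table has no partition for value" message for a value that no partition accepts.

```python
from bisect import bisect_right
from typing import List


def range_partition(value: int, less_than: List[int], has_maxvalue: bool) -> int:
    """0-based partition number for PARTITION BY RANGE.

    less_than holds the VALUES LESS THAN bounds in strictly increasing
    order; if has_maxvalue is True, a final VALUES LESS THAN MAXVALUE
    partition (number len(less_than)) catches everything else.
    """
    number = bisect_right(less_than, value)  # index of first bound > value
    if number == len(less_than) and not has_maxvalue:
        raise ValueError(f"Table has no partition for value {value}")
    return number


def list_partition(value: int, value_lists: List[List[int]]) -> int:
    """0-based partition number for PARTITION BY LIST (VALUES IN lists)."""
    for number, values in enumerate(value_lists):
        if value in values:
            return number
    raise ValueError(f"Table has no partition for value {value}")


if __name__ == "__main__":
    bounds = [10, 20, 30]  # with MAXVALUE there are four partitions: 0..3
    print([range_partition(v, bounds, True) for v in (5, 10, 29, 30)])  # [0, 1, 2, 3]
    print(list_partition(4, [[1, 3, 5], [2, 4, 6]]))  # 1
```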

Common MySQL partition management:
RANGE and LIST partition management
Partitioning is transparent to applications, and deletion is the only operation that can be performed at the partition level: in MySQL 5.1, other operations such as queries, updates, and inserts cannot name a specific partition.

ALTER TABLE ... DROP PARTITION ...;  (delete a partition)
ALTER TABLE ... ADD PARTITION (PARTITION p3 VALUES LESS THAN (...)); [Z3]  (add a partition)
ALTER TABLE ... REORGANIZE PARTITION ..., ... INTO (
    PARTITION p0 VALUES LESS THAN (...)
); [Z4]  (merge or split partitions)
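The RANGE rules for ADD PARTITION and REORGANIZE PARTITION can be modeled as checks on the list of VALUES LESS THAN bounds. In this minimal Python sketch (hypothetical helper names; `None` stands for MAXVALUE), a new partition can only be appended above the current highest bound, and once a MAXVALUE partition exists, the way to add a higher range is to split that last partition with REORGANIZE PARTITION. The error messages are descriptive, not MySQL's exact wording.

```python
from typing import List, Optional

Bound = Optional[int]  # None represents VALUES LESS THAN MAXVALUE


def add_partition(bounds: List[Bound], new_bound: Bound) -> List[Bound]:
    """Model ALTER TABLE ... ADD PARTITION for a RANGE-partitioned table."""
    if bounds and bounds[-1] is None:
        raise ValueError("a MAXVALUE partition exists; split it with REORGANIZE PARTITION")
    if bounds and new_bound is not None and new_bound <= bounds[-1]:
        raise ValueError("new bound must be higher than every existing bound")
    return bounds + [new_bound]


def split_maxvalue(bounds: List[Bound], new_bound: int) -> List[Bound]:
    """Model REORGANIZE PARTITION pMax INTO (new partition, pMax)."""
    if not bounds or bounds[-1] is not None:
        raise ValueError("the last partition is not a MAXVALUE partition")
    if len(bounds) > 1 and new_bound <= bounds[-2]:
        raise ValueError("new bound must be higher than the previous bound")
    return bounds[:-1] + [new_bound, None]


if __name__ == "__main__":
    print(add_partition([10, 20], 30))  # [10, 20, 30]
    # Tail of a daily layout ending in pMax: add the next day
    # (734734 is TO_DAYS('2011-08-20')) by splitting pMax.
    print(split_maxvalue([734732, 734733, None], 734734))  # [734732, 734733, 734734, None]
```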

HASH and KEY partition management
Partitions can be added in the same way as for RANGE and LIST partitioning. However, partitions cannot be dropped from a table partitioned by HASH or KEY the way they are dropped from a table partitioned by RANGE or LIST. Instead, you can use the ALTER TABLE ... COALESCE PARTITION statement to merge HASH or KEY partitions.
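Why merging is cheaper with LINEAR HASH can be illustrated by counting how many rows change partition when a table shrinks from 4 to 3 partitions. This Python sketch uses the MOD and powers-of-two mappings described in the MySQL manual, assumes the keys 0 to 1199 as hypothetical expression values, and treats the number of moved rows as a rough proxy for the cost of COALESCE PARTITION.

```python
def hash_partition(value: int, num_partitions: int) -> int:
    # PARTITION BY HASH: MOD(expr, num_partitions)
    return value % num_partitions


def linear_hash_partition(value: int, num_partitions: int) -> int:
    # PARTITION BY LINEAR HASH: powers-of-two masking
    v = 1
    while v < num_partitions:
        v *= 2
    n = value & (v - 1)
    while n >= num_partitions:
        v = (v + 1) // 2
        n &= v - 1
    return n


def moved_rows(mapping, keys, before: int, after: int) -> int:
    """Count keys whose partition number changes when the count changes."""
    return sum(1 for k in keys if mapping(k, before) != mapping(k, after))


if __name__ == "__main__":
    keys = range(1200)
    print(moved_rows(hash_partition, keys, 4, 3))         # 900
    print(moved_rows(linear_hash_partition, keys, 4, 3))  # 300
```

With MOD, three quarters of these rows change partition; with LINEAR HASH, only the rows of the removed partition move, at the cost of a less even distribution.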

To view partition information, query INFORMATION_SCHEMA:
SELECT * FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_SCHEMA = SCHEMA() AND TABLE_NAME = 'XXX';

Partitioned table efficiency comparison


MySQL partitioned table experiment
In each pair of results below, the first result comes from the partitioned table and the second from the non-partitioned table.

Test Environment: CentOS virtual machine, 1 GB memory, 20 GB hard drive
Test databases: test (not partitioned; contains a single table, RPT_MALEVENTS) and test2 (the same table, partitioned).
Background Data:
mysql> SELECT COUNT(*) FROM RPT_MALEVENTS;
+----------+
| COUNT(*) |
+----------+
| 17082107 |
+----------+
1 row in set (10.84 sec)

mysql> SELECT COUNT(*) FROM RPT_MALEVENTS;
+----------+
| COUNT(*) |
+----------+
| 17082107 |
+----------+
1 row in set (14.63 sec)

Data Distribution: 2011/8/4 ~ 2011/8/17

Partition table structure:
CREATE TABLE `RPT_MALEVENTS` (
  `RECORD_DATE` date NOT NULL,
  `RECORD_HOUR` tinyint(2) NOT NULL,
  `RECORD_MINUTE` tinyint(2) NOT NULL,
  `RECORD_DATETIME` datetime NOT NULL,
  `MC_IP` int(10) unsigned NOT NULL,
  `PC_IP` int(10) unsigned NOT NULL,
  `NETOBJECT_GROUP_ID` smallint(5) DEFAULT NULL,
  `ALERT_TYPE` tinyint(3) NOT NULL,
  `SUB_TYPE` smallint(5) NOT NULL,
  `SHOW_TYPE` smallint(5) NOT NULL,
  `ALERT_ID` tinyint(3) NOT NULL,
  `EVENT_COUNT` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`RECORD_DATE`,`RECORD_HOUR`,`RECORD_MINUTE`,`MC_IP`,`PC_IP`,`ALERT_TYPE`,`SUB_TYPE`,`ALERT_ID`),
  KEY `RECORD_DATETIME` (`RECORD_DATETIME`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/*!50100 PARTITION BY RANGE (TO_DAYS(RECORD_DATE)) [z1]
(PARTITION p2011 VALUES LESS THAN (734503) ENGINE = InnoDB,
 PARTITION p20110809 VALUES LESS THAN (734724) ENGINE = InnoDB,
 PARTITION p20110810 VALUES LESS THAN (734725) ENGINE = InnoDB,
 PARTITION p20110811 VALUES LESS THAN (734726) ENGINE = InnoDB,
 PARTITION p20110812 VALUES LESS THAN (734727) ENGINE = InnoDB,
 PARTITION p20110813 VALUES LESS THAN (734728) ENGINE = InnoDB,
 PARTITION p20110814 VALUES LESS THAN (734729) ENGINE = InnoDB,
 PARTITION p20110815 VALUES LESS THAN (734730) ENGINE = InnoDB,
 PARTITION p20110816 VALUES LESS THAN (734731) ENGINE = InnoDB,
 PARTITION p20110817 VALUES LESS THAN (734732) ENGINE = InnoDB,
 PARTITION p20110818 VALUES LESS THAN (734733) ENGINE = InnoDB,
 PARTITION pMax VALUES LESS THAN MAXVALUE [z2] ENGINE = InnoDB) */
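The VALUES LESS THAN bounds in this definition are TO_DAYS() day numbers, which are hard to read directly. A minimal Python sketch decodes them, assuming that MySQL's TO_DAYS(d) equals Python's date.toordinal() plus 365 for modern dates (the MySQL manual's example TO_DAYS('2007-10-07') = 733321 satisfies this). The helper names mirror the MySQL functions but are this sketch's own.

```python
from datetime import date

# VALUES LESS THAN bounds from the RPT_MALEVENTS definition (pMax excluded)
BOUNDS = [734503, 734724, 734725, 734726, 734727, 734728,
          734729, 734730, 734731, 734732, 734733]


def to_days(d: date) -> int:
    """Equivalent of MySQL TO_DAYS() for modern Gregorian dates."""
    return d.toordinal() + 365


def from_days(n: int) -> date:
    """Equivalent of MySQL FROM_DAYS() for modern Gregorian dates."""
    return date.fromordinal(n - 365)


if __name__ == "__main__":
    for bound in BOUNDS:
        # Each partition holds rows dated strictly before this day.
        print(bound, "->", from_days(bound))
```

The bound 734503 decodes to 2011-01-01 and 734724 to 2011-08-10, so p2011 holds rows dated before 2011, p20110809 holds everything from 2011-01-01 up to 2011-08-09, and each later bound is one day higher, so each following partition holds exactly one day.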



The table uses the InnoDB storage engine with per-table tablespaces, so each partition is stored physically in its own .ibd file (RPT_MALEVENTS#P#<partition name>.ibd).
The analysis is as follows:
(Query covering all the data, with a condition on the partitioning column)
mysql> SELECT COUNT(*) FROM RPT_MALEVENTS WHERE RECORD_DATE > '2011-08-01' AND RECORD_DATE < '2011-08-19';
+----------+
| COUNT(*) |
+----------+
| 17082107 |
+----------+
1 row in set (21.62 sec)

mysql> SELECT COUNT(*) FROM RPT_MALEVENTS WHERE RECORD_DATE > '2011-08-01' AND RECORD_DATE < '2011-08-19';
+----------+
| COUNT(*) |
+----------+
| 17082107 |
+----------+
1 row in set (29.20 sec)

(Query part of the data, with a condition that does not use the partitioning column)
mysql> SELECT COUNT(*) FROM RPT_MALEVENTS WHERE RECORD_DATETIME > '2011-08-02' AND RECORD_DATETIME < '2011-08-11';
+----------+
| COUNT(*) |
+----------+
|  5083194 |
+----------+
1 row in set (2.83 sec)

mysql> SELECT COUNT(*) FROM RPT_MALEVENTS WHERE RECORD_DATETIME > '2011-08-02' AND RECORD_DATETIME < '2011-08-11';
+----------+
| COUNT(*) |
+----------+
|  5083194 |
+----------+
1 row in set (5.60 sec)

(Query part of the data with a condition on another column)
mysql> SELECT COUNT(*) FROM RPT_MALEVENTS WHERE ALERT_TYPE = 1;
+----------+
| COUNT(*) |
+----------+
|    88739 |
+----------+
1 row in set (8.49 sec)

mysql> SELECT COUNT(*) FROM RPT_MALEVENTS WHERE ALERT_TYPE = 1;
+----------+
| COUNT(*) |
+----------+
|    88739 |
+----------+
1 row in set (12.88 sec)


(Small-range query confined to a single partition)

mysql> SELECT COUNT(*) FROM RPT_MALEVENTS WHERE RECORD_DATE > '2011-08-13' AND RECORD_DATE < '2011-08-15';
+----------+
| COUNT(*) |
+----------+
|  2116249 |
+----------+
1 row in set (1.85 sec)

mysql> SELECT COUNT(*) FROM RPT_MALEVENTS WHERE RECORD_DATE > '2011-08-13' AND RECORD_DATE < '2011-08-15';
+----------+
| COUNT(*) |
+----------+
|  2116249 |
+----------+
1 row in set (3.10 sec)


Analyzing the SQL execution plans
In the EXPLAIN output, rows is the number of rows MySQL estimates it must read, based on table statistics and the chosen index.


mysql> EXPLAIN PARTITIONS SELECT * FROM RPT_MALEVENTS WHERE RECORD_DATETIME > '2011-08-12' AND RECORD_DATETIME < '2011-08-13' LIMIT 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: RPT_MALEVENTS
   partitions: p2011,p20110809,p20110810,p20110811,p20110812,p20110813,p20110814,p20110815,p20110816,p20110817,p20110818,pMax [z3]
         type: range
possible_keys: RECORD_DATETIME
          key: RECORD_DATETIME
      key_len: 8
          ref: NULL
         rows: 355911 [z4]
        Extra: Using where
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM RPT_MALEVENTS WHERE RECORD_DATETIME > '2011-08-12' AND RECORD_DATETIME < '2011-08-13' LIMIT 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: RPT_MALEVENTS
         type: range
possible_keys: RECORD_DATETIME
          key: RECORD_DATETIME
      key_len: 8
          ref: NULL
         rows: 1002288 [z5]
        Extra: Using where
1 row in set (0.00 sec)

Query condition unrelated to the partitioning column

mysql> EXPLAIN PARTITIONS SELECT COUNT(*) FROM RPT_MALEVENTS WHERE ALERT_TYPE = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: RPT_MALEVENTS
   partitions: p2011,p20110809,p20110810,p20110811,p20110812,p20110813,p20110814,p20110815,p20110816,p20110817,p20110818,pMax [z6]
         type: index
possible_keys: NULL
          key: RECORD_DATETIME
      key_len: 8
          ref: NULL
         rows: 17084274 [z7]
        Extra: Using where; Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT COUNT(*) FROM RPT_MALEVENTS WHERE ALERT_TYPE = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: RPT_MALEVENTS
         type: index
possible_keys: NULL
          key: RECORD_DATETIME
      key_len: 8
          ref: NULL
         rows: 17082459
        Extra: Using where; Using index
1 row in set (0.00 sec)

Query condition on the partitioning column
mysql> EXPLAIN PARTITIONS SELECT COUNT(*) FROM RPT_MALEVENTS WHERE RECORD_DATE > '2011-08-09' AND RECORD_DATE < '2011-08-15'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: RPT_MALEVENTS
   partitions: p20110810,p20110811,p20110812,p20110813,p20110814,p20110815 [z8]
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: NULL
         rows: 3767081 [z9]
        Extra: Using where; Using index
1 row in set (0.08 sec)

mysql> EXPLAIN PARTITIONS SELECT COUNT(*) FROM RPT_MALEVENTS WHERE RECORD_DATE > '2011-08-09' AND RECORD_DATE < '2011-08-15'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: RPT_MALEVENTS
   partitions: NULL
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: NULL
         rows: 8541229 [z10]
        Extra: Using where; Using index
1 row in set (0.00 sec)


Deleting data. Because the table is partitioned by day, a whole day of data can be deleted by dropping that day's partition:

mysql> ALTER TABLE RPT_MALEVENTS DROP PARTITION p20110809; [z11]
Query OK, 0 rows affected (0.65 sec)
Records: 0  Duplicates: 0  Warnings: 0


After the drop, the file RPT_MALEVENTS#P#p20110809.ibd, which contains the partition's index and data, is deleted.

Deleting the same data from the non-partitioned table in the traditional way:
mysql> DELETE FROM RPT_MALEVENTS WHERE RECORD_DATE < '2011-08-10';
Query OK, 3929328 rows affected (1 min 29.68 sec)

Dropping an entire partition is clearly far faster (0.65 sec versus 1 min 29.68 sec).

Deleting one day of data with a traditional DELETE, first on the partitioned table and then on the non-partitioned table:

mysql> DELETE FROM RPT_MALEVENTS WHERE RECORD_DATE < '2011-08-11';
Query OK, 1153866 rows affected (19.72 sec)

mysql> DELETE FROM RPT_MALEVENTS WHERE RECORD_DATE < '2011-08-11';
Query OK, 1153866 rows affected (18.75 sec)


Deleting one day of data with DELETE takes almost the same time on both tables.


Deleting only the rows leaves the partition p20110810 itself defined and unchanged. ALTER TABLE t1 OPTIMIZE PARTITION could be used to reclaim its space, but as of MySQL 5.1.22 this has not yet been implemented.


Deleting across partitions:
mysql> DELETE FROM RPT_MALEVENTS WHERE ALERT_TYPE = 1;
Query OK, 63969 rows affected (55.20 sec)

mysql> DELETE FROM RPT_MALEVENTS WHERE ALERT_TYPE = 1;
Query OK, 63969 rows affected (50.26 sec)

Deleting across partitions is slightly slower on the partitioned table than on the non-partitioned table.
[Z1] Partitioning function.
[Z2] Partition information, starting from
[Z3] Conditions on columns not used by the partitioning function scan all partitions.
[Z4] The actual number of matching rows is 681311, and the estimated number of rows to scan on the partitioned table is 355911. Although the query condition does not use the partitioning column (all partitions are still accessed), the optimizer's row estimate is lower on the partitioned table.
[Z5] The actual number of matching rows is 681311, and the estimated number of rows to scan on the non-partitioned table is 1002288.
[Z6] All partitions are searched.
[Z7] With a condition unrelated to the partitioning column, almost all rows are traversed.
[Z8] Only some partitions are scanned.
[Z9] The estimated number of rows to scan drops accordingly.
[Z10] Estimated number of rows to scan on the non-partitioned table: more than twice the partitioned estimate.
[Z11] This partition contains all data before 2011-08-10, 3929328 rows in total.


Summary:
Partitioning is a new feature in MySQL 5.1. As of MySQL 5.1.22-rc, the implementation is not very mature, and many partition maintenance and management functions are not yet available, such as reclaiming storage space within a partition, repairing partitions, and optimizing partitions. MySQL partitioning suits tables whose old data can be removed partition by partition, whose schema rarely changes, and that are frequently queried by the partitioning column (for example, the malicious-code statistics tables here are partitioned by day, are often queried and grouped by time, and can have data removed one day at a time by dropping partitions). In addition, because MySQL has no global indexes, only per-partition indexes, a table with two unique indexes [z5] cannot be partitioned, and the primary key must include the partitioning column; otherwise, MySQL reports an error.
In short, MySQL imposes many restrictions on partitioning, and in my view HASH and KEY partitioning are of limited use.

Partitioning introduces a new way to optimize queries (with corresponding drawbacks, of course). The optimizer can use the partitioning function to prune partitions, removing them from the query entirely, by determining whether the requested data can be found in a given partition. In the best case, pruning lets a query access much less data. It is therefore important to include the partitioning key in the WHERE clause, even if it looks redundant: the partitioning key lets the optimizer eliminate unnecessary partitions. Otherwise, the execution engine must access every partition of the table, as it does with a MERGE table, which is very slow on large tables. Partitioned data is also easier to maintain than non-partitioned data, because old data can be removed by dropping partitions. Finally, partitions can be placed in different physical locations, so that the server can make more effective use of multiple hard drives.
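A minimal Python sketch of range pruning for the daily-partitioned RPT_MALEVENTS table: given a condition lo < RECORD_DATE < hi, it returns only the partitions whose day range can contain matching rows. The partition names (p2011MMDD) are assumed from the TO_DAYS bounds in the table definition, and TO_DAYS is modeled as date.toordinal() + 365. This models ideal pruning; MySQL's own pruning can be more conservative and include a neighboring partition, which may explain why EXPLAIN PARTITIONS can list one more partition than strictly necessary.

```python
from bisect import bisect_right
from datetime import date
from typing import List

# Layout assumed from the RPT_MALEVENTS definition: partition i holds
# TO_DAYS values in [BOUNDS[i-1], BOUNDS[i]); pMax has no upper bound.
NAMES = ["p2011", "p20110809"] + [f"p201108{day}" for day in range(10, 19)] + ["pMax"]
BOUNDS = [734503, 734724] + list(range(734725, 734734))


def to_days(d: date) -> int:
    """MySQL TO_DAYS() for modern Gregorian dates."""
    return d.toordinal() + 365


def prune_open_range(lo: date, hi: date) -> List[str]:
    """Partitions that can hold rows with lo < RECORD_DATE < hi."""
    first = to_days(lo) + 1  # strict inequalities on whole days
    last = to_days(hi) - 1
    if first > last:
        return []
    start = bisect_right(BOUNDS, first)  # partition containing the first day
    end = bisect_right(BOUNDS, last)     # partition containing the last day
    return NAMES[start:end + 1]


if __name__ == "__main__":
    print(prune_open_range(date(2011, 8, 9), date(2011, 8, 15)))
    print(prune_open_range(date(2011, 8, 13), date(2011, 8, 15)))  # ['p20110814']
```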
[Z1] The partitioning function must return an integer. The value that defines a new partition must be greater than that of any existing partition.
[Z2] Error message for tables with a primary key: #1503 - A PRIMARY KEY must include all columns in the table's partitioning function. If the table has no primary key, this constraint does not apply (although the same requirement still applies to every unique key).
[Z3] Note: for tables partitioned by RANGE, ADD PARTITION can only add new partitions at the high end of the partition list; you cannot add a partition whose range lies below that of an existing partition.

[Z4] For tables partitioned by RANGE, only adjacent partitions can be reorganized; RANGE partitions cannot be skipped. REORGANIZE PARTITION cannot change the table's partitioning type (for example, from RANGE to HASH or vice versa), and it cannot change the partitioning expression or column.
[Z5] Note the difference between primary keys and unique indexes.


Author: "Shen Xiang Ming Dynasty selling apricot flowers"
