High-performance MySql evolution (12): summary of the use of partition tables in Mysql

Source: Internet
Author: User
When the data volume is very large (the table capacity reaches GB or TB), if you still use the index method to optimize the query, due to the consumption of the index and the generation of a large number of index fragments, the query process will lead to a large number of random IO generation. In this scenario, unless the index can be well overwritten, the data needs to be returned according to the index during the Query Process.

When the data volume is very large (the table capacity reaches GB or TB), if you still use the index method to optimize the query, due to the consumption of the index and the generation of a large number of index fragments, the query process will lead to a large number of random I/O generation. In this scenario, unless the index can be well overwritten, the data needs to be returned according to the index during the Query Process.

When the data volume is very large (the table capacity reaches GB or TB), if you still use the index method to optimize the query, due to the consumption of the index and the generation of a large number of index fragments, the query process will lead to a large number of random I/O generation. In this scenario, unless the index can be well overwritten, the data table must be queried Based on the index, the query performance is greatly affected. In this case, you can use the Partition Table policy to improve the query performance.

Different database management systems may have different implementations for partitions. This article mainly uses MYSQL as the basis

1 partition type 1.1RANGE Partition

A table partitioned by RANGE is partitioned by the following method. Each partition contains rows whose partition expression values are located in a given continuous interval. These intervals must be continuous and do not overlap with each other, and are defined using the values less than operator.

Suppose you want to split the table based on the YEAR when each employee leaves the company, that is, the value of YEAR (separated. An example of the create table statement that implements this partition mode is as follows.

For example, you may decide to divide the table into four intervals BY adding a partition by range clause.

CREATE TABLE employees (    id INT NOT NULL,    fname VARCHAR(30),    lname VARCHAR(30),    hired DATE NOT NULL DEFAULT '1970-01-01',    separated DATE NOT NULL DEFAULT '9999-12-31',    job_code INT,    store_id INT)PARTITION BY RANGE (YEAR(separated)) (    PARTITION p0 VALUES LESS THAN (1991),    PARTITION p1 VALUES LESS THAN (1996),    PARTITION p2 VALUES LESS THAN (2001),    PARTITION p3 VALUES LESS THAN MAXVALUE);


After inserting some test data, we found that the P1 data file increased significantly.
mysql> DELIMITER $$mysql> DROP PROCEDURE IF EXISTS SampleProc$$Query OK, 0 rows affected (0.00 sec)mysql> CREATE PROCEDURE SampleProc()    -> BEGIN    ->  DECLARE x INT;    ->  SET x = 1000;    ->  WHILE x<= 2000 DO    ->          insert into employees(id,fname,lname,hired,separated,job_code,store_id) values(x,concat('firstname',x),concat('ai',x),'1994-01-01','1995-01-01',10,20);    ->          SET  x = x + 1;    ->  END WHILE;    -> END$$Query OK, 0 rows affected (0.00 sec)mysql> call SampleProc() $$Query OK, 1 row affected (22.55 sec)mysql> delimiter ;


RANGE partitioning is particularly useful in the following scenarios:

· To delete "old" data. If you use the PARTITION scheme provided in the latest example above, you only need to simply use "ALTER TABLEemployees drop partition p0; to delete all the rows corresponding to the employees who stopped working before January 1, 1991. (For more information, see section 13.1.2, "alter table Syntax" and section 18.3, "partition management "). For tables with a large number of rows, this is more effective than running a DELETE query such as "delete from employees where year (separated) <= 1990.

· You want to use a column that contains a date or time value, or contains a value that increases from some other levels.

· Regular operations depend directly on the query of columns used to split tables. For example, when a query such as "select count (*) FROM employees where year (separated) = 2000 GROUP BYstore_id;" is executed, mySQL can quickly determine that only partitions p2 need to be scanned, because the remaining partitions cannot contain any records that match the WHERE clause.

·Partition expressionIt can be any function or other expression that is valid in MySQL, as long as they return an integer that is neither a very number nor a random number.

1.2LIST Partition

LIST partitions in MySQL are similar to RANGE partitions in many aspects. Like partitioning by RANGE, each partition must be clearly defined. The main difference between them is that the definition and selection of each partition in LIST partitions are based on the value of a column from a value in a Value LIST set, the RANGE partition is a set of continuous RANGE values. LIST partitions (Expr).Expr"Is a column value or an expression based on a column value and returns an integer (Value_list) "To define each partition, where"Value_list"Is a list of integers separated by commas.

Assume that there are 20 audio stores distributed in 4 regions with sales permissions, as shown in the following table:

Region

Store ID

North District

3, 5, 6, 9, 17

Eastern District

1, 2, 10, 11, 19, 20

West District

4, 12, 13, 14, 18

Central Area

7, 8, 15, 16

You can use the following CREATETABLE statement to split a table by storing rows in the same partition of a store in the same region.

CREATE TABLE employees (    id INT NOT NULL,    fname VARCHAR(30),    lname VARCHAR(30),    hired DATE NOT NULL DEFAULT '1970-01-01',    separated DATE NOT NULL DEFAULT '9999-12-31',    job_code INT,    store_id INT)PARTITION BY LIST(store_id)    PARTITION pNorth VALUES IN (3,5,6,9,17),    PARTITION pEast VALUES IN (1,2,10,11,19,20),    PARTITION pWest VALUES IN (4,12,13,14,18),    PARTITION pCentral VALUES IN (7,8,15,16));


The related operations are similar to those of range partitions, but you need to pay attention to the following issues:

· In MySQL 5.1, when LIST partitions are used, only integer lists can be matched.

· If you try to INSERT a column value (or the return value of the partition expression) not in a row in the partition Value List, the "INSERT" query will fail and report an error. For example, if the LIST partition adopts the above scheme, the following query will fail:

· LIST partitions do not have definitions similar to "valuesless than maxvalue" that contain other values. Any value to be matched must be found in the Value List.

1.3 HASH Partition

HASH partitions are mainly used to ensure that data is evenly distributed in pre-defined partitions. In the RANGE and LIST partitions, you must specify the partition in which a given column value or column value set should be stored. In the HASH partition, MySQL automatically completes these tasks, all you need to do is specify a column value or expression based on the column value to be hashed, and specify the number of partitions to be split into in the partitioned table.

For example, the following statement creates a table that uses hash based on the "store_id" column. The table is divided into four partitions:

CREATE TABLE employees (    id INT NOT NULL,    fname VARCHAR(30),    lname VARCHAR(30),    hired DATE NOT NULL DEFAULT '1970-01-01',    separated DATE NOT NULL DEFAULT '9999-12-31',    job_code INT,    store_id INT)PARTITION BY HASH(store_id)PARTITIONS 4;


If no partition clause is included, the number of PARTITIONS is 1 by default.

1.4 KEY Partition

Partitioning by KEY is similar to partitioning by HASH. In addition to the User-Defined expression used by HASH partition, the HASH function of KEY partition is provided by the MySQL server. MySQL Cluster uses the MD5 () function to implement KEY partitioning. For tables using other storage engines, the server uses its own internal hash function, which is based on () the same algorithm.

The syntax for "CREATETABLE... partition by key" is similar to the rule for creating a table with HASH partitions. The only difference between them is that the KEY keyword is used instead of HASH, and the KEY partition uses only one or more column names.

It is also possible to split a table using a linear KEY. The following is a simple example:

CREATE TABLE tk (    col1 INT NOT NULL,    col2 CHAR(5),    col3 DATE) PARTITION BY LINEAR KEY (col1)PARTITIONS 3;


Using the keyword LINEAR in the KEY partition has the same effect as using the HASH partition. The number of the partition is obtained through the power of 2 (powers-of-two) algorithm, rather than through the modulus algorithm.

1.5 more knowledge

The preceding list only describes the usage of common partition policies. After a partition is created, maintenance is required for the partition. For details, see:

Http://dev.mysql.com/doc/refman/5.1/zh/partitioning.html#partitioning-hash

No matter what type of partition is used, the partition is always automatically numbered at the time of creation, and records are recorded from 0. This is very important to remember. When a new row is inserted into a partition table, these partition numbers are used to identify the correct partitions. For example, if your table uses four partitions, these partitions are numbered 0, 1, 2, and 3. For the RANGE and LIST partition types, it is necessary to confirm that each partition number defines a partition. For HASH partitions, the user function must return an integer greater than 0. For KEY partitions, this problem is automatically handled through the hash function used inside the MySQL server.

2. Partition Table restrictions

· A data table can have a maximum of 1024 partitions.

· In MySQL, the partition expression result must be an integer. in MySQL, you can use columns.

· If the partition field contains a primary key or a unique index column, all primary keys or unique index columns must be included.

· Foreign keys cannot be used for partitioned tables.

· The same storage engine must be used for each partition table of the same table.

· Partition functions are restricted. They can only be any function or other expressions that are valid in MySQL and return an integer that is neither a very number nor a random number.

· Some storage engines do not support partitioning.

3. Use the "trap" of the Partition Table"

When the data size is very large, using a partitioned table can improve the performance, but the premise is that it must effectively avoid the traps listed below

· NULL Value

Partitions in MySQL are not processed with a NULL value, whether it is a column value or a user-defined expression value. In general, for NULL, or when the expression receives an invalid value (e.g. YEAR ('asdf-12-12 ') returns NULL. In this case, MySQL regards NULL as 0. If a large number of records exist, in the end, a large number of records will be concentrated in one partition, which violates the original intention of the partition.

If you want to avoid this practice, you should NOT allow NULL values during table design. The most possible method is to achieve this by declaring the column "not null.

· The partition column and index column do not match

If the defined index column does not match the partition column, the query cannot perform partition filtering. For example, define partitions on column a and define indexes on column B. Because each partition has an independent index, You need to scan each partition during indexing.

Avoid creating indexes that do not match the partition column unless the query contains conditions that can filter partitions.

· The cost of selecting a partition is high.

For Range and list partitions, all partition conditions need to be traversed each operation to determine which partition the related records belong to. If the number of partitions is large, it will waste a lot of resources in selecting partitions.

To avoid this situation, you can limit the number of partitions (<100) or select hash partitions.

· The cost of locking the underlying table is high.

When querying and accessing a partition table, MYSQL opens and locks the underlying table of the residence. This operation occurs before partitioning and filtering and is irrelevant to the partition type, which affects all partition queries.

Batch update can be used to reduce the number of operations and limit the number of partitions.

· High maintenance costs

It is quick to add/delete partitions. However, the process of restructuring or alter partitions is similar to that of alter table, which performs a large amount of data replication operations with low efficiency.

4. Optimize Query statements

For access to partition tables, the most important thing is to include partition columns in the where condition. Even if it looks redundant, only in this way can we filter unwanted partitions, otherwise, the partition table is accessed.

Let's look at a simple example.

The Employees table uses store_id as the condition for range partitions. If store_id is not used as the where condition, all partitions are queried.


When Store_id is used as the where condition, only the corresponding partition is queried.

Pay attention to the expressions in the where condition.

· Simply use partition Columns

In the Where condition, the partition column must not be processed by the function. If the where condition is written as where YEAR (store_id), the partition filtering will fail and all partitions will be checked during query. This is similar to indexing.

· Join query

If the partition table is the second table of the join operation and the join condition is that the partition is created, MYSQL will only match in the corresponding partition.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.