I. What is table partitioning?
In general, table partitioning refers to dividing a large table into several small tables based on conditions. Mysql5.1 began to support data table partitioning. For example, if the number of records in a user table exceeds 6 million, you can partition the table based on the date of receipt, or partition the table based on the location. Of course, you can also partition based on other conditions.
II. Why partition tables?
To improve scalability, manageability, and database efficiency of large tables and tables with various access modes.
Partition has the following advantages:
You can store more data than a single disk or file system partition.
For data that has lost its meaning, you can easily delete the data by deleting the partitions related to the data. On the contrary, in some cases, the process of adding new data can be easily implemented by adding a new partition for those new data. Other advantages related to partitions are listed below. These features in the MySQL partition are not yet implemented yet, but they have a high priority in our priority list. We hope to include these features in the 5.1 production version.
Some queries can be greatly optimized. This is mainly because data that satisfies a given WHERE statement can be saved in only one or more partitions, in this way, you do not need to find other remaining partitions. Because partitions can be modified after the partition table is created, data can be re-organized when the partition scheme is not configured for the first time to improve the efficiency of common queries.
The query involving aggregate functions such as SUM () and COUNT () can be easily processed in parallel. A simple example of this query is "SELECT salesperson_id, COUNT (orders) as order_total FROM sales group by salesperson_id ;". Through "parallel", this means that the query can be performed on each partition at the same time, and the final result is only obtained through the total of all partitions.
Data Query is distributed across multiple disks to increase the query throughput.
III. Partition type
RANGE partition: multiple rows are allocated to the partition based on the column values in a given continuous interval.
LIST partition: Similar to partitioning by RANGE, the difference is that LIST partition is selected based on the column value matching a value in a discrete value set.
HASH partition: Select a partition based on the return value of a user-defined expression. This expression uses the column values of the rows to be inserted into the table for calculation. This function can contain any expressions that are valid in MySQL and generate non-negative integer values.
KEY partitioning: Similar to HASH partitioning, the difference is that KEY partitioning only supports computing one or more columns, and the MySQL server provides its own HASH function. One or more columns must contain integer values.
RANGE partitioning
Multiple rows are allocated to the partition based on the column values that belong to a given continuous interval.
These intervals must be continuous and do not overlap with each other, and are defined using the values less than operator. The following are examples.
SQL code:
Create table employees (
Id int not null,
Fname VARCHAR (30 ),
Lname VARCHAR (30 ),
Hired date not null default '2017-01-01 ',
Separated date not null default '2017-12-31 ',
Job_code int not null,
Store_id INT NOT NULL
)
Partition by range (store_id )(
Partition p0 values less than (6 ),
Partition p1 values less than (11 ),
Partition p2 values less than (16 ),
Partition p3 values less than (21)
);
According to this partitioning scheme, all the rows corresponding to the employees who work in store 1 to 5 are saved in partition P0, the employees of store 6 to 10 are saved in P1, and so on. Note that each partition is defined in order, from the lowest to the highest. This is the requirement of the partition by range syntax; at this point, it is similar to the "switch... Case statement. For a new row containing data (72, 'Michael ', 'widenius', '2017-06-25 & prime;, NULL, 13, it can be easily determined that it will be inserted into the p2 partition, but what will happen if a store numbered 21st is added? In this solution, because there is no rule to include stores with store_id greater than 20, the server will not know where to store the row, which will cause an error. To avoid this error, you can use a "catchall" values less than clause in the create table Statement. This clause provides all VALUES greater THAN the specified maximum value:
SQL code:
Create table employees (
Id int not null,
Fname VARCHAR (30 ),
Lname VARCHAR (30 ),
Hired date not null default '2017-01-01 ',
Separated date not null default '2017-12-31 ',
Job_code int not null,
Store_id INT NOT NULL
)
Partition by range (store_id )(
PARTITION p0 values less than (6 ),
PARTITION p1 values less than (11 ),
PARTITION p2 values less than (16 ),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
MAXVALUE indicates the largest possible integer. Now, all rows whose store_id column value is greater than or equal to 16 (the maximum value defined) will be stored in partition p3. In the future, when the number of stores has increased to 25, 30, or more, you can use the alter table statement for the store 21-25, 26-30, and so on. In almost the same structure, you can also split the table based on the employee's work code, that is, based on the continuous interval of the job_code column value. For example, assume that two digits of work code are used to represent normal (in-store) workers, three digits are used to represent the office and support personnel, and four digits are used to represent the management layer, you can use the following statement to create the partition table:
SQL code:
Create table employees (
Id int not null,
Fname VARCHAR (30 ),
Lname VARCHAR (30 ),
Hired date not null default '2017-01-01 ',
Separated date not null default '2017-12-31 ',
Job_code int not null,
Store_id INT NOT NULL
)
Partition by range (job_code )(
PARTITION p0 values less than (100 ),
PARTITION p1 values less than (1000 ),
PARTITION p2 values less than (10000)
);
In this example, all rows related to workers in the store are saved in partition p0, and all rows related to the office and support staff are saved in partition p1, all management-related rows are stored in partition p2. It is also possible to use an expression in the values less than clause. The most noteworthy restriction here is that MySQL must be able to calculate the return value of an expression as part of the less than (<) comparison. Therefore, the expression value cannot be NULL. For this reason, the hired, separated, job_code, and store_id columns of the employee table have been defined as non-NULL (not null ). In addition to table data split by store numbers, you can also use an expression based on one of two dates to split table data. For example, suppose you want to split the table based on the YEAR when each employee leaves the company, that is, the value of YEAR (separated. An example of the create table statement that implements this partition mode is as follows:
SQL code:
Create table employees (
Id int not null,
Fname VARCHAR (30 ),
Lname VARCHAR (30 ),
Hired date not null default '2017-01-01 ',
Separated date not null default '2017-12-31 ',
Job_code INT,
Store_id INT
)
Partition by range (YEAR (separated ))(
PARTITION p0 values less than (1991 ),
PARTITION p1 values less than (1996 ),
PARTITION p2 values less than (2001 ),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
In this scheme, records of all employees hired before January 1, 1991 are stored in partition p0, and records of all employees hired between January 1, 1991 and January 1, 1995 are stored in partition p1, records of all employees employed between January 1, 1996 and January 1, 2000 are stored in partition p2, and information of all workers hired after January 1, 2000 is stored in p3.
RANGE partitions are particularly useful in the following scenarios: 1) when you need to delete the "old" data on a partition, you only need to delete the partition. If you use the PARTITION scheme provided in the latest example above, you just need to simply use "alter table employees drop partition p0; to delete all the rows corresponding to the employees who stopped working before January 1, 1991. For tables with a large number of rows, this is more effective than running a DELETE query such as "delete from employees where year (separated) <= 1990. 2) you want to use a column that contains a date or time value, or contains a value that increases from some other levels. 3) regular operations depend directly on the query of columns used to split tables. For example, when you perform a query such as "select count (*) FROM employees where year (separated) = 2000 group by store_id, mySQL can quickly determine that only partitions p2 need to be scanned, because the remaining partitions cannot contain any records that match the WHERE clause.
Note: This optimization has not been enabled in the MySQL 5.1 source program, but the related work is in progress.
LIST partition
Similar to partitioning by RANGE, LIST partitions are selected based on column values matching a value in a discrete value set.
LIST partitions are implemented BY using "partition by list (expr)", where "expr" is a column value or an expression based on a column value and returns an integer, then, each partition is defined using the "values in (value_list)" method. "value_list" is a list of integers separated by commas. Note: In MySQL 5.1, when LIST partitions are used, only integer lists can be matched.
SQL code:
Create table employees (
Id int not null,
Fname VARCHAR (30 ),
Lname VARCHAR (30 ),
Hired date not null default '2017-01-01 ',
Separated date not null default '2017-12-31 ',
Job_code INT,
Store_id INT
);
Assume that there are 20 audio stores distributed in 4 regions with sales permissions, as shown in the following table:
================================
Region store ID
North Area 3, 5, 6, 9, 17
Eastern Region 1, 2, 10, 11, 19, 20
West area 4, 12, 13, 14, 18
Central Area 7, 8, 15, 16
================================
To split a TABLE by storing rows in the same partition of a store in the same region, use the following "create table" statement:
SQL code:
Create table employees (
Id int not null,
Fname VARCHAR (30 ),
Lname VARCHAR (30 ),
Hired date not null default '2017-01-01 ',
Separated date not null default '2017-12-31 ',
Job_code INT,
Store_id INT
)
Partition by list (store_id)
PARTITION pNorth values in (3,5, 6,9, 17 ),
PARTITION pEast values in ),
PARTITION pWest values in (4, 12, 13, 14, 18 ),
PARTITION pCentral values in (7, 8, 15, 16)
);
This makes it easy to add or delete employee records in the specified region in the table. For example, assume that all audio stores in the West Region are sold to other companies. All records (rows) related to employees working in the West Region audio and video store can be deleted using the "alter table employees drop partition pWest;" query. It is similar to the DELETE (DELETE) statement) query "DELETE query delete from employees WHERE store_id IN (, 18);" is more effective than that. [Key points]: if you try to INSERT a column value (or the return value of the partition expression) that is not in the list of partition values, the "INSERT" query will fail and report an error. For example, if the LIST partition adopts the above scheme, the following query will fail:
SQL code:
Insert into employees VALUES (224, 'linus', 'torvalds ', '2017-05-01', '2017-10-12 ', 42, 21 );
This is because the value 21 of the "store_id" column cannot be found in the list of values used to define partitions pNorth, pEast, pWest, or pCentral. Note that LIST partitions do not have definitions that include other VALUES, such as "values less than maxvalue. Any value to be matched must be found in the value list.
In addition to combining with RANGE partitions, LIST partitions can generate a composite subpartition. It is also possible to combine with HASH and KEY partitions to generate composite subpartitions.
HASH partition
Partitions are selected based on the return values of user-defined expressions. The expressions are calculated using the column values of these rows to be inserted into the table. This function can contain any expressions that are valid in MySQL and generate non-negative integer values.
To use HASH partitions to split a TABLE, add a "partition by hash (expr)" clause to the create table statement. "expr" is an expression that returns an integer. It can only be the name of a column whose field type is MySQL integer. In addition, you may need to add a "PARTITIONS num" clause later. num is a non-negative integer, which indicates the number of PARTITIONS to be split in the table.
SQL code:
Create table employees (
Id int not null,
Fname VARCHAR (30 ),
Lname VARCHAR (30 ),
Hired date not null default '2017-01-01 ',
Separated date not null default '2017-12-31 ',
Job_code INT,
Store_id INT
)
Partition by hash (store_id)
PARTITIONS 4;
If a partition clause is not included, the number of PARTITIONS is 1 by default. Exception: for NDB Cluster tables, the default number of partitions is the same as the number of Cluster data nodes. This correction may be based on any MAX_ROWS settings, to ensure that all rows can be inserted into the partition properly.
LINER HASH
MySQL also supports linear hash, which is different from conventional hash in that the linear hash function uses a linear power (powers-of-two) algorithm, the general hash uses the modulus of the hash function value. The only difference in syntax between a LINEAR hash PARTITION and a conventional hash PARTITION is that the "LINEAR" keyword is added to the "partition by" clause.
SQL code:
Create table employees (
Id int not null,
Fname VARCHAR (30 ),
Lname VARCHAR (30 ),
Hired date not null default '2017-01-01 ',
Separated date not null default '2017-12-31 ',
Job_code INT,
Store_id INT
)
Partition by linear hash (YEAR (hired ))
PARTITIONS 4;
Assume that an expression expr is used. When linear hash is used, the partition to be stored in is N in the num partition, where N is obtained according to the following algorithm: 1. find the next one greater than num. the power of 2. We call this value V. It can be obtained through the following formula: 2. V = POWER (2, CEILING (LOG (2, num) (for example, assume that num is 13. LOG (3.7004397181411) is. If CEILING (3.7004397181411) is 4, V = POWER (), that is, 16 ). 3. set N = F (column_list) & (V-1 ). 4. when N> = num: set V = CEIL (V/2) and set N = N & (V-1), for example, assume that the table t1, create table t1 (col1 INT, col2 CHAR (5), col3 DATE) partition by linear hash (YEAR (col3) PARTITIONS 6; now we assume that we want to insert two rows of records to table t1, one record's col3 column value is '2017-04-14 & prime; and the other record's col3 column value is '2017-10-19 & prime ;. The partition to be saved in the first record is determined as follows: V = POWER (2, CEILING (LOG (2003) = 8 N = YEAR ('2017-04-14 & prime ;) & (8-1) = 2003 & 7 = 3 (3> = 6 is FALSE (FALSE): The record will be saved to partition #3) the second record is calculated as follows: V = 8 N = YEAR ('2017-10-19 & prime;) & (8-1) = 1998 & 7 = 6 (6> = 4 TRUE (TRUE): Additional steps required) N = 6 & CEILING (5/2) = 6 & 3 = 2 (2> = 4 is FALSE (FALSE): The record will be saved to #2 partition) the advantage of linear hash partitioning is that it is faster to add, delete, merge, and split partitions, facilitating processing tables containing an extremely large amount of (1000 GiB) data. Its disadvantage is that the distribution of data in each shard is unlikely to be balanced compared with the data distribution obtained by conventional HASH partitions.
KEY partition
Similar to HASH partitioning, the difference is that KEY partitioning only supports computing one or more columns, and the MySQL server provides its own HASH function. One or more columns must contain integer values.
SQL code:
Create table tk (
Col1 int not null,
Col2 CHAR (5 ),
Col3 DATE
)
Partition by linear key (col1)
PARTITIONS 3;
Using the keyword LINEAR in the KEY partition has the same effect as using the HASH partition. The number of the partition is obtained through the power of 2 (powers-of-two) algorithm, rather than through the modulus algorithm.