Database splitting, table splitting, and partitioning

Source: Internet
Author: User
Tags generator ranges unique id mysql index

As we know, when the amount of data in a MySQL database reaches a certain size, we start to consider splitting databases, splitting tables, and similar operations. When to split and how to split are described below.

I. Splitting databases

1 Reasons for splitting databases

First, on a single database server with sufficient performance, splitting data into several databases has no effect on performance. At the storage level, a database only acts as a namespace: the table files of a database are stored in a folder named after the database. For example, the following employees database:

mysql> show tables in employees;
+---------------------+
| Tables_in_employees |
+---------------------+
| departments         |
| dept_emp            |
| dept_manager        |
| employees           |
| salaries            |
| titles              |
+---------------------+

In the operating system it looks like this:

# haitian at haitian-coder.local in /usr/local/var/mysql/employees on git:master ● [21:19:47]
→ ls
db.opt           dept_emp.frm     dept_manager.ibd salaries.frm     titles.ibd
departments.frm  dept_emp.ibd     employees.frm    salaries.ibd
departments.ibd  dept_manager.frm employees.ibd    titles.frm

A database is not a file; it only acts as a namespace. MySQL therefore places no limit on the size of a database or on the number of tables in it.

So why split into several databases?

The answer is to solve the performance problem of a single server. When a single database server can no longer support the current amount of data, the tables are divided into several groups according to how closely they are related in business logic, and each group is placed on a different database server to reduce the load on any one server.

Database splitting generally starts with vertical splitting. Horizontal splitting is considered only if, after vertical splitting, the amount of data is still more than a single server can handle.

For example, suppose a forum system's database has to be split because the current server no longer meets demand. First split vertically: according to business logic, put the user-related tables (user information, points, private messages, and so on) into a user database, and the forum-related tables (boards, posts, replies, and so on) into a forum database. The two databases are placed on different servers.

After the split, the tables are rarely completely unrelated. For example, the author of a post or a reply lives in the user database. Before the split, one join query could return the replies to a post together with the poster and the repliers. After the split, because a query cannot join across databases, the final data can only be assembled from several queries.
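The "several queries" above can be sketched as follows. This is only an illustration: two in-memory SQLite databases stand in for the user and forum servers, and the table layouts and the `replies_with_authors` helper are assumptions that do not come from the article.

```python
import sqlite3

# Two independent connections stand in for the two database servers
# (assumption: SQLite replaces MySQL so the sketch is self-contained).
user_db = sqlite3.connect(":memory:")
forum_db = sqlite3.connect(":memory:")

user_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
user_db.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

forum_db.execute(
    "CREATE TABLE replies (id INTEGER PRIMARY KEY, post_id INTEGER, user_id INTEGER, body TEXT)"
)
forum_db.executemany(
    "INSERT INTO replies VALUES (?, ?, ?, ?)",
    [(10, 100, 2, "first"), (11, 100, 1, "second")],
)


def replies_with_authors(post_id):
    """Emulate a cross-database join with two queries and an in-memory merge."""
    # Query 1: fetch the replies from the forum database.
    replies = forum_db.execute(
        "SELECT id, user_id, body FROM replies WHERE post_id = ? ORDER BY id",
        (post_id,),
    ).fetchall()
    if not replies:
        return []
    # Query 2: fetch only the referenced users from the user database.
    user_ids = sorted({user_id for _, user_id, _ in replies})
    placeholders = ",".join("?" * len(user_ids))
    names = dict(
        user_db.execute(
            f"SELECT id, name FROM users WHERE id IN ({placeholders})", user_ids
        ).fetchall()
    )
    # Merge step: attach the author name to each reply in application code.
    return [(reply_id, names.get(user_id), body) for reply_id, user_id, body in replies]
```

Batching the user lookup into one `IN (...)` query keeps the cost at two round trips per page rather than one per reply.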

To sum up: the purpose of database splitting is to reduce the load on a single server; the splitting principle is to group tables by how closely they are related in business logic; the disadvantage is that joins across databases are not possible.

II. Splitting tables

1 Reasons for splitting tables

When the amount of data is very large, B-tree indexes become much less effective. Unless the query is covered by the index, the database server has to fetch every matching row based on the result of the index scan. With a large amount of data this produces a great deal of random I/O, and the response time becomes unacceptably long. The cost of maintaining the indexes (disk space, I/O operations) is also very high.

2 Vertical table splitting

Reasons:

1. From the way MySQL indexes are implemented and the related optimization strategies, we know that the leaf nodes of the InnoDB primary index store all the data of the row. Reducing the number of columns therefore lets memory hold more rows, which benefits queries.

2. The operating system limits the size of a single file.

Splitting principle: move columns that are rarely used, only loosely related to the core business logic, or that store large content into a new table, so that the original table can hold more rows.

3 Horizontal table splitting

Reasons:

1. As the amount of data grows, the number of rows in the table becomes huge and queries become less efficient.

2. The operating system's file size limit also applies: the amount of data cannot grow without bound, so once a certain size is reached the table has to be split horizontally to reduce the size of a single table (file).

Splitting principle: by incremental range, by hash, or by other business logic.

Which splitting method to use depends on the actual business logic.

For example, if access to the table mostly touches recently generated data and historical data is rarely accessed, consider splitting by time, one table per period (such as per year).

If access to the table is fairly uniform with no obvious hotspot, consider splitting by range (such as 5 million rows per table), by plain hash, or by consistent hash.
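A minimal sketch of the three routing rules just described. The table naming scheme, the shard count and the helper names are assumptions made for illustration.

```python
import zlib

ROWS_PER_TABLE = 5_000_000  # range size taken from the example above
HASH_TABLES = 4             # hypothetical number of hash-split tables


def table_by_year(base, year):
    """Time-based split: one table per year."""
    return f"{base}_{year}"


def table_by_range(base, row_id):
    """Range split: ids 0..4,999,999 go to table 0, the next 5 million to table 1, ..."""
    return f"{base}_{row_id // ROWS_PER_TABLE}"


def table_by_hash(base, key):
    """Plain hash split: a stable hash of the key modulo the number of tables."""
    # crc32 is used because Python's built-in hash() of a str varies between processes.
    return f"{base}_{zlib.crc32(str(key).encode()) % HASH_TABLES}"
```

With plain hashing, changing `HASH_TABLES` remaps most keys, which is the scalability problem that consistent hashing is meant to reduce.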

Global primary key issue:

Tables that used to rely on the database to generate primary keys (such as auto-increment) must generate primary keys themselves after the split. Splitting rules are generally based on the primary key, so the primary key has to be determined before a new row is inserted in order to find the table that stores it.

Mature schemes already exist in practice. For tables that use an auto-increment column as the primary key, Flickr's global primary key generation scheme solves the performance and single-point-of-failure problems well; see this post for the implementation details. There are also UUID-like global primary key schemes, such as the Instagram ID generator.
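As a rough sketch of a UUID-like scheme, an ID can pack a timestamp, a shard number and a per-shard sequence into one integer, so that the ID itself tells you which split table holds the row. The bit widths, the custom epoch and the function names below are assumptions chosen for illustration, not the layout of any particular system.

```python
EPOCH_MS = 1_400_000_000_000  # hypothetical custom epoch, in milliseconds
SHARD_BITS = 13               # assumed width of the shard number
SEQ_BITS = 10                 # assumed width of the per-shard sequence


def make_id(timestamp_ms, shard_id, sequence):
    """Pack (time, shard, sequence) into a single sortable integer."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if timestamp_ms < EPOCH_MS:
        raise ValueError("timestamp precedes the epoch")
    elapsed = timestamp_ms - EPOCH_MS
    seq = sequence % (1 << SEQ_BITS)  # the sequence wraps within its bit width
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def shard_of(generated_id):
    """Recover the shard number from an ID, which locates the split table."""
    return (generated_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
```

Because the timestamp occupies the high bits, IDs generated later compare greater than earlier ones, which keeps inserts roughly ordered in the primary index.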

Consistent hash:

Consistent hash splitting scales better than plain hash splitting and copes well with adding and removing split tables. For the principles of consistent hashing, see this post. If the split tables are stored on different server nodes, hash the node name or IP as in that post; if the split tables all live on one server, hash the split table names instead.
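A minimal consistent hash ring over split table names might look like the sketch below. The `HashRing` class, the use of MD5 and the number of virtual points per table are illustrative assumptions.

```python
import bisect
import hashlib


def _point(key):
    """Map a string to a position on the ring (MD5 is an arbitrary stable choice)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the spread
        self._ring = []           # sorted list of (position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_point(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, key):
        """Return the first node clockwise from the key's position."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_point(str(key)), ""))
        return self._ring[idx % len(self._ring)][1]
```

For example, `HashRing(["table_0", "table_1", "table_2"]).lookup(12345)` returns one of the three table names. After `add("table_3")`, only the keys that now fall just before one of the new table's points move, and they all move to `table_3`.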

III. MySQL partitioned tables

The traditional database and table splitting described above is implemented at the application layer. After the split, the original system needs major adjustments to adapt to the new databases or tables, such as implementing SQL middleware, turning what used to be one join query into two queries, and implementing a global primary key generator.

MySQL partitioned tables, described below, work at the database level: MySQL implements table splitting itself, which greatly reduces the difficulty of splitting tables.

1 Introduction

To the user, a partitioned table is a single logical table, but underneath it is implemented by multiple physical sub-tables.

In other words, partitioning an existing table requires no change at the application layer, and the original SQL statements stay the same. It is as if MySQL implemented the SQL middleware of traditional table splitting for us, although MySQL's actual implementation of partitioned tables is of course much more complex.

In addition, when creating a partition you can specify where its index files and data files are stored, so a table's data can be spread across different physical devices and several devices can be used efficiently.

Some limitations:

1. Before MySQL 5.6.7, a table can have at most 1024 partitions; starting with 5.6.7, a table can have at most 8192 partitions.

2. Foreign key constraints cannot be used in partitioned tables.

3. Every unique index (including the primary key) of the table must include the partitioning columns. The official MySQL documentation states:

All columns used in the partitioning expression for a partitioned table must be part of every unique key that the table may have.

This sentence is hard to grasp without an example; the official MySQL documentation provides examples and explanations specifically for this restriction.

2 Partition table types

Range partition

Range partitioning assigns rows to partitions by range. The ranges should be contiguous and must not overlap. It uses the PARTITION BY RANGE and VALUES LESS THAN keywords. When the COLUMNS keyword is not used, the parentheses after RANGE must contain an integer column name or a function that returns a definite integer.

By numeric range:

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT NOT NULL,
    store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
    PARTITION p0 VALUES LESS THAN (6),
    PARTITION p1 VALUES LESS THAN (11),
    PARTITION p2 VALUES LESS THAN (16),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);

By TIMESTAMP range:

CREATE TABLE quarterly_report_status (
    report_id INT NOT NULL,
    report_status VARCHAR(20) NOT NULL,
    report_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(report_updated) ) (
    PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
    PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ),
    PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ),
    PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ),
    PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ),
    PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ),
    PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ),
    PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ),
    PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ),
    PARTITION p9 VALUES LESS THAN (MAXVALUE)
);

Adding the COLUMNS keyword allows non-integer and multi-column ranges. Note that the parentheses after COLUMNS accept only column names, not functions, and that the bounds of a multi-column range must be strictly increasing from one partition to the next:

By DATE or DATETIME range:

CREATE TABLE members (
    firstname VARCHAR(25) NOT NULL,
    lastname VARCHAR(25) NOT NULL,
    username VARCHAR(16) NOT NULL,
    email VARCHAR(35),
    joined DATE NOT NULL
)
PARTITION BY RANGE COLUMNS(joined) (
    PARTITION p0 VALUES LESS THAN ('1960-01-01'),
    PARTITION p1 VALUES LESS THAN ('1970-01-01'),
    PARTITION p2 VALUES LESS THAN ('1980-01-01'),
    PARTITION p3 VALUES LESS THAN ('1990-01-01'),
    PARTITION p4 VALUES LESS THAN MAXVALUE
);

By multi-column range:

CREATE TABLE rc3 (
    a INT,
    b INT
)
PARTITION BY RANGE COLUMNS(a,b) (
    PARTITION p0 VALUES LESS THAN (0,10),
    PARTITION p1 VALUES LESS THAN (10,20),
    PARTITION p2 VALUES LESS THAN (10,30),
    PARTITION p3 VALUES LESS THAN (10,35),
    PARTITION p4 VALUES LESS THAN (20,40),
    PARTITION p5 VALUES LESS THAN (MAXVALUE,MAXVALUE)
);
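
With a multi-column range, bounds are compared as whole tuples, column by column, rather than each column separately. Python tuples compare the same way, so the rc3 definition can be mimicked with the sketch below. The `rc3_partition` helper is hypothetical, `math.inf` stands in for MAXVALUE, and NULL handling is ignored.

```python
import math

INF = math.inf  # stand-in for MAXVALUE

# Upper bounds copied from the rc3 table definition, in partition order.
RC3_BOUNDS = [
    ("p0", (0, 10)),
    ("p1", (10, 20)),
    ("p2", (10, 30)),
    ("p3", (10, 35)),
    ("p4", (20, 40)),
    ("p5", (INF, INF)),
]


def rc3_partition(a, b):
    """Return the first partition whose bound tuple is strictly greater than (a, b)."""
    for name, bound in RC3_BOUNDS:
        if (a, b) < bound:  # lexicographic comparison: a first, then b
            return name
    raise ValueError("no partition for this row")
```

For instance, the row (10, 25) is not below (0,10) or (10,20) but is below (10,30), so it lands in p2, while (0, 10) is not strictly below (0,10) and therefore lands in p1.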
List partition

List partitioning assigns rows to partitions by specific values, and the value sets of the partitions must not overlap. It uses the PARTITION BY LIST and VALUES IN keywords. As with Range partitioning, when the COLUMNS keyword is not used, the parentheses after LIST must contain an integer column name or a function that returns a definite integer.

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY LIST(store_id) (
    PARTITION pNorth VALUES IN (3,5,6,9,17),
    PARTITION pEast VALUES IN (1,2,10,11,19,20),
    PARTITION pWest VALUES IN (4,12,13,14,18),
    PARTITION pCentral VALUES IN (7,8,15,16)
);

The partitions must cover every value that will be stored; inserting a value that belongs to no partition raises an error.

mysql> CREATE TABLE h2 (
    ->   c1 INT,
    ->   c2 INT
    -> )
    -> PARTITION BY LIST(c1) (
    ->   PARTITION p0 VALUES IN (1, 4, 7),
    ->   PARTITION p1 VALUES IN (2, 5, 8)
    -> );
Query OK, 0 rows affected (0.11 sec)

mysql> INSERT INTO h2 VALUES (3, 5);
ERROR 1525 (HY000): Table has no partition for value 3

When a multi-row insert hits this error: if the table's engine supports transactions (InnoDB), no rows are inserted; if it does not, the rows before the failing row are inserted and the rows after it are not.

You can use the IGNORE keyword to skip the rows that fail, so that all other valid rows are inserted normally.

mysql> TRUNCATE h2;
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM h2;
Empty set (0.00 sec)

mysql> INSERT IGNORE INTO h2 VALUES (2, 5), (6, 10), (7, 5), (3, 1), (1, 9);
Query OK, 3 rows affected (0.00 sec)
Records: 5  Duplicates: 2  Warnings: 0

mysql> SELECT * FROM h2;
+------+------+
| c1   | c2   |
+------+------+
|    7 |    5 |
|    1 |    9 |
|    2 |    5 |
+------+------+
3 rows in set (0.00 sec)

As with Range partitioning, adding the COLUMNS keyword enables non-integer and multi-column values.

Hash partition

Hash partitioning is mainly used to distribute data evenly among a predetermined number of partitions. The parentheses after HASH may contain only an integer column or a function that returns an integer; in effect, the partition is the returned integer modulo the number of partitions.

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY HASH( YEAR(hired) )
PARTITIONS 4;

Hash partitioning has the same problem as traditional hash-based table splitting: poor scalability. MySQL also provides a partitioning method similar in spirit to consistent hashing, linear Hash partitioning; you only need to add the LINEAR keyword when defining the partitions. If you are interested in how it works, see the official documentation.

CREATE TABLE employees (
    id INT NOT NULL,
    fname VARCHAR(30),
    lname VARCHAR(30),
    hired DATE NOT NULL DEFAULT '1970-01-01',
    separated DATE NOT NULL DEFAULT '9999-12-31',
    job_code INT,
    store_id INT
)
PARTITION BY LINEAR HASH( YEAR(hired) )
PARTITIONS 4;
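
The difference between the two hash variants can be sketched in a few lines. This follows the modulus rule and the powers-of-two algorithm that the MySQL manual describes for HASH and LINEAR HASH; the function names are made up here, and the inputs are assumed to be non-negative integers.

```python
def hash_partition(value, num_partitions):
    """Regular HASH: the expression value modulo the number of partitions."""
    return value % num_partitions


def linear_hash_partition(value, num_partitions):
    """LINEAR HASH: mask with a power of two, shrinking the mask until it fits."""
    v = 1
    while v < num_partitions:  # smallest power of two >= num_partitions
        v *= 2
    n = value & (v - 1)
    while n >= num_partitions:
        v //= 2
        n &= v - 1
    return n
```

With 6 partitions, a row whose partitioning expression evaluates to 2003 goes to partition 3, while 1998 first maps to 6, which is out of range, and is then folded down to partition 2.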
Key partition

Partitioning by key is similar to partitioning by hash, except that hash partitioning uses a user-defined expression, while the hashing function for key partitioning is supplied by the MySQL server. MySQL Cluster uses MD5() for this purpose; for tables using other storage engines, the server uses its own internal hash function, which is based on the same algorithm as PASSWORD().

Key partitioning is otherwise the same as Hash partitioning: only the hash function differs, and the HASH keyword in the definition is replaced by KEY. Like Hash partitioning, Key partitioning also has a linear variant.

CREATE TABLE tk (
    col1 INT NOT NULL,
    col2 CHAR(5),
    col3 DATE
)
PARTITION BY LINEAR KEY (col1)
PARTITIONS 3;

In addition, when the table has a primary key or a unique index, the column names in the parentheses after KEY can be omitted. MySQL then chooses the primary key first and a unique index second, and reports an error if no unique index can be found.

Sub-partition

Sub-partitioning further divides each partition of a partitioned table. Sub-partitions can be created like this:

CREATE TABLE ts (id INT, purchased DATE)
    PARTITION BY RANGE( YEAR(purchased) )
    SUBPARTITION BY HASH( TO_DAYS(purchased) )
    SUBPARTITIONS 2 (
        PARTITION p0 VALUES LESS THAN (1990),
        PARTITION p1 VALUES LESS THAN (2000),
        PARTITION p2 VALUES LESS THAN MAXVALUE
    );

Or, defining every sub-partition explicitly:

CREATE TABLE ts (id INT, purchased DATE)
    PARTITION BY RANGE( YEAR(purchased) )
    SUBPARTITION BY HASH( TO_DAYS(purchased) ) (
        PARTITION p0 VALUES LESS THAN (1990) (
            SUBPARTITION s0
                DATA DIRECTORY = '/disk0/data'
                INDEX DIRECTORY = '/disk0/idx',
            SUBPARTITION s1
                DATA DIRECTORY = '/disk1/data'
                INDEX DIRECTORY = '/disk1/idx'
        ),
        PARTITION p1 VALUES LESS THAN (2000) (
            SUBPARTITION s2
                DATA DIRECTORY = '/disk2/data'
                INDEX DIRECTORY = '/disk2/idx',
            SUBPARTITION s3
                DATA DIRECTORY = '/disk3/data'
                INDEX DIRECTORY = '/disk3/idx'
        ),
        PARTITION p2 VALUES LESS THAN MAXVALUE (
            SUBPARTITION s4
                DATA DIRECTORY = '/disk4/data'
                INDEX DIRECTORY = '/disk4/idx',
            SUBPARTITION s5
                DATA DIRECTORY = '/disk5/data'
                INDEX DIRECTORY = '/disk5/idx'
        )
    );

Note that every partition must have the same number of sub-partitions. If you explicitly define any sub-partition with SUBPARTITION on any partition of the table, you must define them all, and each sub-partition name must be unique across the whole table.

Using partitioned tables and query optimization

Choose the partitioning method according to the actual situation

The principles for partitioning an existing table are the same as for traditional table splitting.

Traditional splitting by incremental range corresponds to Range partitioning. For example, if access to the table mostly touches recently generated data and historical data is rarely accessed, partition the table by time period (such as year or month) or by a fixed number of rows (such as 1 million), depending on the table's index structure. After partitioning, the last partition holds the most recent data. When it grows large again over time, it can be repartitioned (REORGANIZE PARTITION) to split off the data for a period (a year or a month) or for a fixed number of rows (such as 1 million).

Traditional hash-based splitting corresponds to Hash/Key partitioning, which is described in detail above.

Query optimization

The purpose of partitioning is to improve query efficiency. If a query has to scan all partitions, partitioning does not help. Use the EXPLAIN PARTITIONS command to see which partitions a SQL statement uses.

In general, add the partitioning column to the WHERE condition.

For example, the salaries table has this structure:

mysql> show create table salaries\G;
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (year(from_date))
(PARTITION p1 VALUES LESS THAN (1985) ENGINE = InnoDB,
 PARTITION p2 VALUES LESS THAN (1986) ENGINE = InnoDB,
 PARTITION p3 VALUES LESS THAN (1987) ENGINE = InnoDB,
 PARTITION p4 VALUES LESS THAN (1988) ENGINE = InnoDB,
 PARTITION p5 VALUES LESS THAN (1989) ENGINE = InnoDB,
 PARTITION p6 VALUES LESS THAN (1990) ENGINE = InnoDB,
 PARTITION p7 VALUES LESS THAN (1991) ENGINE = InnoDB,
 PARTITION p8 VALUES LESS THAN (1992) ENGINE = InnoDB,
 PARTITION p9 VALUES LESS THAN (1993) ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN (1994) ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN (1995) ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN (1996) ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN (1997) ENGINE = InnoDB,
 PARTITION p14 VALUES LESS THAN (1998) ENGINE = InnoDB,
 PARTITION p15 VALUES LESS THAN (1999) ENGINE = InnoDB,
 PARTITION p16 VALUES LESS THAN (2000) ENGINE = InnoDB,
 PARTITION p17 VALUES LESS THAN (2001) ENGINE = InnoDB,
 PARTITION p18 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */

The following query cannot take advantage of partitioning, because the partitions field lists every partition:

mysql> explain partitions select * from salaries where salary > 100000\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2835486
        Extra: Using where

Unneeded partitions are filtered out only when the partitioning column appears in the WHERE condition:

mysql> explain partitions select * from salaries where salary > 100000 and from_date > '1998-01-01'\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: p15,p16,p17,p18
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1152556
        Extra: Using where

As with ordinary index lookups, applying a function to the column on the left side of the operator defeats partition filtering, even if the function is the same as the partitioning function:

mysql> explain partitions select * from salaries where salary > 100000 and year(from_date) > 1998\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2835486
        Extra: Using where

IV. Comparison of partitioning and table splitting

    • After traditional table splitting, statistical operations such as COUNT and SUM can only be run on each split table separately, and the final result has to be computed again at the application layer. Partitioned tables are not affected and can be aggregated directly.

Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized. A simple example of such a query might be SELECT salesperson_id, COUNT(orders) as order_total FROM sales GROUP BY salesperson_id;. By "parallelized," we mean that the query can be run simultaneously on each partition, and the final result obtained merely by summing the results obtained for all partitions.

    • Partitioning requires minimal changes to the original system: it involves only the database level, and the application layer needs no changes.

    • One limitation of partitioning is that every unique index (including the primary key) of the table must include the partitioning columns; table splitting has no such restriction.

    • Table splitting covers both vertical and horizontal splitting, whereas partitioning can only do horizontal splitting.
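
To make the first point of this comparison concrete, here is a sketch of the extra application-layer step that split tables need for a per-salesperson order count. The split table names and sample rows are invented, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
SPLIT_TABLES = ["sales_0", "sales_1"]  # hypothetical horizontally split tables

for table in SPLIT_TABLES:
    conn.execute(f"CREATE TABLE {table} (salesperson_id INTEGER, order_id INTEGER)")
conn.executemany("INSERT INTO sales_0 VALUES (?, ?)", [(1, 100), (2, 101), (1, 102)])
conn.executemany("INSERT INTO sales_1 VALUES (?, ?)", [(1, 103), (3, 104)])


def order_totals():
    """Run the GROUP BY on every split table, then sum the partial counts."""
    totals = Counter()
    for table in SPLIT_TABLES:
        query = (
            f"SELECT salesperson_id, COUNT(order_id) FROM {table} "
            "GROUP BY salesperson_id"
        )
        for salesperson_id, partial_count in conn.execute(query):
            totals[salesperson_id] += partial_count
    return dict(totals)
```

With a partitioned table the same result comes from a single `GROUP BY` statement, because the server combines the per-partition results itself.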

V. Common database and table splitting solutions

1 TDDL introduction

TDDL is mainly divided into three layers: the Matrix layer, the Group layer, and the Atom layer.

Matrix layer: SQL parsing, rule engine calculation, data execution, result merging

Group layer: read/write splitting, weights, write HA switchover, read HA switchover, slave nodes

Atom layer

1 Abstraction of a single database

2 JBoss data source; IP, port, username, and password can be modified dynamically

3 Thread-count mode, which protects business processing threads: protection starts when the count exceeds a specified value

4 Dynamic blocking of a given SQL statement

5 Execution count statistics and limits

TDDL Unique Key Generation method

With TDDL-based database and table splitting, an auto-increment ID generated on one database is not globally unique. A technique is therefore needed to generate globally unique IDs after the split.

Unique key generation must be: 1) globally unique, 2) highly available, 3) high-performance.

TDDL implements this mainly with a database plus in-memory allocation. Advantage: simple and efficient. Disadvantage: the IDs are not guaranteed to be strictly increasing. The sequence table looks like this, with a step of 1000:

Group Value
Group_0 0
Group_1 1000
Group_2 2000
Group_3 3000

When a unique key is needed, pick one of the 4 groups above at random and take the IDs from value to value + step; for example, take the IDs 1000 to 1000+1000 from group_1. Fetching in bulk improves performance. After the fetch, the records in the database become:

Group Value
Group_0 0
Group_1 5000
Group_2 2000
Group_3 3000

After each fetch, the value of the corresponding group is set to value + number of groups * step.
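
The allocation rule above can be sketched as follows. This is not TDDL's actual code: the `SegmentSequence` class is hypothetical, and a dictionary stands in for the sequence table. In a real deployment the read and update of a group's value would have to be one atomic database operation.

```python
import random


class SegmentSequence:
    def __init__(self, groups=4, step=1000):
        self.groups = groups
        self.step = step
        # Stand-in for the sequence table: group name -> current value.
        self.table = {f"group_{i}": i * step for i in range(groups)}
        self._buffer = iter(())  # IDs already fetched and held in memory

    def _fetch_segment(self, group=None):
        """Take [value, value + step) from one group and advance that group."""
        group = group or random.choice(list(self.table))
        start = self.table[group]
        # Jump past the segments owned by the other groups.
        self.table[group] = start + self.groups * self.step
        return range(start, start + self.step)

    def next_id(self):
        """Hand out buffered IDs; touch the table only when the buffer is empty."""
        for value in self._buffer:
            return value
        self._buffer = iter(self._fetch_segment())
        return next(self._buffer)
```

Fetching from group_1 in the initial state yields the IDs 1000 to 1999 and moves group_1 to 1000 + 4 * 1000 = 5000, which matches the second table above. Because each group advances by the same stride from a different offset, segments taken from different groups never overlap.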
