MySQL partition table uses some of the limitations and places to note

Source: Internet
Author: User
Tags bulk insert one table

See: http://blog.yemou.net/article/query/info/tytfjhfascvhzxcyt347

The MySQL partitioning strategy is based on two very important assumptions: queries can filter (prunning) out a lot of extra partitions, and the partition itself does not bring many additional costs. And it turns out that these two assumptions are problematic in some scenarios. Here are some issues that you might encounter.

Null bit invalidates partition filtering

About partitioned Tables One thing that's easy to misunderstand is that the value of the partition's expression can be null: The first partition is a special partition. Assuming that the partition by RANGE year (order_date) partition, all order_date are null or an illegal value, the records will be stored to the first partition. Now suppose you have the following query: WHERE order_date between ' 2012-01-01 ' and ' 2012-01-31 '. In fact, MySQL checks two partitions instead of one of the previous guesses: it checks the 2012 partition, and it checks the first partition of the table. Checking the first partition is because the year () function may return a null value when it receives an illegal value, so the value of this range may return null and be deposited to the first partition. This is also true for many other functions, such as to_days ().

If the first partition is very large, especially when using the "Full volume scan data, do not index" strategy, the cost can be very large. and scanning two partitions to find the columns is not the original intention of using the partitioned table. To avoid this, you can create a "useless" first partition, for example in the example above, you can use partition P_nulls VALUES less THAN (0) to create the first partition. If the data in the inserted table is valid, then the first partition is empty, so that even if the first partition needs to be detected, the cost is very small.

This optimization technique is not required in MySQL5.5 because it can be partitioned directly using the column itself rather than the column-based function: PARTITION by RANGE COLUMNS (order_date). So the best solution for this case is to use the MySQL5.5 syntax directly.

Partition column and index column do not match

If the defined index column and the partition column do not match, the query cannot be partitioned for filtering. Assume that an index is defined on column A and partitioned on column B. Because each partition has its own independent index, scanning the index on column B requires scanning the corresponding index within each partition. If the non-leaf nodes of the corresponding index within each partition are in memory, then the speed of the scan is acceptable, but it would be better if some partition indexes could be skipped. To avoid this problem, you should avoid creating and partitioning columns that do not match the index, unless the query also contains conditions that can filter the partitions.

It sounds easy to avoid the problem, but sometimes there are some unexpected problems. For example, in an associative query, the partitioned table is the second table in the association order, and the index and partitioning criteria used by the association do not match. Each row that matches the criteria for the first table is then required to access and search for all partitions of the second table.

The cost of choosing a partition can be high

As mentioned earlier, there are many types of partitions, and different types of partitions are implemented differently, so their performance varies. In particular, range partitioning, the cost of answering questions such as "which partition this line belongs to", "which partitions are eligible for the query", is likely to be costly because the server needs to scan the list of all partition definitions to find the correct answer. A linear search like this is inefficient, so the cost increases as the number of partitions increases.

One of the worst dispatch we've actually encountered is when writing large amounts of data by line. Each time a row of data is written to a range-partitioned table, the partition definition list needs to be scanned to find the appropriate target partition. This problem can be mitigated by limiting the number of partitions, and according to practical experience, 100 or so partitions are not a problem for most systems. Other partition types, such as key partitioning and hash partitioning, do not have such a problem.

The cost of opening and locking all the underlying tables can be high

When the query accesses the partitioned table, MySQL needs to open the well to lock down all the underlying tables, which is another cost to the partitioned table. This operation occurs before partition filtering, so it is not possible to reduce this overhead through partition filtering, and the overhead is independent of the partition type and affects all queries. This provides significant additional overhead for queries that operate very quickly, such as finding a single row from a primary key. You can reduce this overhead for a single operation in bulk operations, such as using BULK INSERT or LOAD data INFILE, deleting multiple rows of data at once, and so on. It is also necessary to limit the number of partitions.

The cost of maintaining a partition can be high

Some partition maintenance operations can be very fast, such as adding or deleting partitions (it may be slow to delete a large partition, but this is another matter). Some operations, such as reorganizing partitions or actions like ALTER statements, require copying data. The principle of reorganizing partitions is similar to alter, creating a temporary partition, then copying the data to it, and finally deleting the original partition.

As mentioned above, the partition table is not a "silver bullet". Here are some other limitations of the current partitioning implementation: As mentioned above, the partition table is not a "silver bullet". The following are some of the other limitations of the current partitioning implementation:

    • All partitions must use the same storage engine.

    • There are also limitations to the functions and expressions that can be used in a partition function. Some storage engines do not support partitioning.

    • For MyISAM partitioned tables, you can no longer use the load Indexi into cache operation.

    • For MyISAM tables, you need to open more file descriptors when using partitioned tables. Although it looks like a table, there are many separate partitions behind it, and each partition is a separate table for the storage engine. This way, even if the partitioned table consumes only one table cache entry, the file descriptor needs to be multiple. Therefore, even if the appropriate table cache has been configured to ensure that no more than a single process of the operating system can be opened with the number of file descriptors, there is still a problem that exceeds the file descriptor limit for partitioned tables.

Finally, it should be noted that the older version of MySQL will be more problematic. All of the software is bug-aware. The partitioned tables were MySQL5.1 in the 5.1.40, and the B-tonnes of many partitioned tables were repaired after the latter and 5.1.50. MySQL5.5, the partitioning table has also made a lot of improvements, which makes it possible for partitioned tables to be phased in in the production environment in the upcoming MySQL5.6 release, the partitioning table has been more enhanced, such as the newly ignited ALTER TABLE EXCHANGE PARTITION.

MySQL partition table uses some of the limitations and places to note

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.