Daily Study Notes (14) _ MySQL

1) I have been reading MySQL Technology Insider: InnoDB Storage Engine. Chapter 4, in its discussion of MySQL partitioning, raises a question: "Does partitioning a table really improve database query performance?"

For OLAP applications, partitioning can indeed improve query performance, because many OLAP queries need to scan a large table. Suppose a table has 100 million rows and a timestamp column, and your query needs one year of data from it. If the table is partitioned by that timestamp, the query only needs to scan the corresponding partition.
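The effect of scanning only the matching partition can be sketched with a toy in-memory model. This is only an illustration, not how MySQL implements pruning: the row layout, the per-year grouping, and the function names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def build_partitions(rows):
    """Group rows by year, a stand-in for partitioning on the timestamp column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["created"].year].append(row)
    return partitions


def scan_full(rows, year):
    """Unpartitioned table: every row is examined to find one year of data."""
    matched = [r for r in rows if r["created"].year == year]
    return matched, len(rows)


def scan_pruned(partitions, year):
    """Partitioned table: only the partition for the requested year is examined."""
    part = partitions.get(year, [])
    return list(part), len(part)


# Hypothetical data: 100,000 rows spread evenly over the years 2015..2024.
rows = [{"id": i, "created": date(2015 + i % 10, 1, 1)} for i in range(100_000)]
partitions = build_partitions(rows)

full_matched, full_examined = scan_full(rows, 2020)
pruned_matched, pruned_examined = scan_pruned(partitions, 2020)
print(full_examined, pruned_examined)  # 100000 10000
```

Both scans return the same rows; the difference is only in how many rows had to be examined to find them.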

For OLTP applications, partitioning calls for much more care. In this kind of application, a query rarely fetches as much as 10% of a large table; most queries return a handful of records through an index. By the nature of the B+ tree index, a lookup in a large table generally takes only 2 to 3 disk I/Os. The B+ tree therefore does its job well without help from partitions, and a poorly designed partitioning scheme can cause serious performance problems.
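Why a B+ tree stays so shallow can be seen from a rough height estimate. The figures below (about 100 rows per leaf page and about 1,000 child pointers per internal page) are illustrative assumptions, not measurements, and this sketch counts the leaf level as a level, so its absolute numbers may differ from a count that excludes it.

```python
def btree_height(row_count, rows_per_leaf, fanout):
    """Estimate the number of levels in a B+ tree, counting the leaf level.

    rows_per_leaf and fanout are assumed averages, not engine constants.
    """
    pages = -(-row_count // rows_per_leaf)  # ceiling division: leaf pages
    height = 1
    while pages > 1:
        pages = -(-pages // fanout)  # pages needed one level up
        height += 1
    return height


ROWS_PER_LEAF = 100  # assumption
FANOUT = 1000        # assumption

for row_count in (1_000_000, 10_000_000, 100_000_000):
    print(row_count, btree_height(row_count, ROWS_PER_LEAF, FANOUT))
```

Under these assumptions the height is the same for 1 million, 10 million, and 100 million rows, which is why a point lookup costs about the same number of I/Os in all three cases.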

Suppose a table has 10 million rows. If you create 10 HASH partitions on the primary key, each partition holds only 1 million rows, so a primary key query such as SELECT * FROM table WHERE PK = @pk looks like it should be faster. But consider this: the B+ tree for 1 million rows and the B+ tree for 10 million rows may well have the same number of levels, perhaps 2 in both cases, in which case partitioning on the primary key does not improve performance at all. Granted, if the B+ tree for 10 million rows has height 3 and the B+ tree for 1 million rows has height 2, partitioning saves one I/O per lookup and does make the query faster. That is fine as far as it goes, but does this table really have only a primary key index, with no other column to query on? Given a statement such as SELECT * FROM table WHERE key = @key, the query on key has to probe all 10 partitions; even if each partition costs only 2 I/Os, that is 20 I/Os in total. In the original single-table design, the same query on key still costs only 2 to 3 I/Os.
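The I/O arithmetic in this paragraph can be written out as a small sketch. The function names are hypothetical, and the model (one root-to-leaf descent per partition probed) is a simplification of what the text describes.

```python
def partitions_to_probe(predicate_column, partition_column, partition_count):
    """A lookup is pruned to one partition only if it filters on the partitioning column."""
    return 1 if predicate_column == partition_column else partition_count


def lookup_io(tree_height, probes):
    """Estimated I/Os: one root-to-leaf descent for each partition probed."""
    return tree_height * probes


PARTITION_COUNT = 10

# Table HASH-partitioned on the primary key, 2 I/Os per partition lookup.
pk_io = lookup_io(2, partitions_to_probe("pk", "pk", PARTITION_COUNT))
key_io = lookup_io(2, partitions_to_probe("key", "pk", PARTITION_COUNT))

# Unpartitioned table, taking the upper end of the 2 to 3 I/O range.
single_table_key_io = lookup_io(3, 1)

print(pk_io, key_io, single_table_key_io)  # 2 20 3
```

The primary key lookup saves at most one I/O, while the lookup on the other column costs 20 I/Os instead of 3.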

Therefore, when an InnoDB table serves an OLTP application, use partitioning with great care. Confirm the data access pattern during design; otherwise partitioning may not speed up queries at all and may even slow your application down.

2) In some other databases, such as SQL Server, there is a table type that is not an index-organized table, called a heap table. By its nature, every index on a heap table is non-clustered, and the heap table has no clustered primary key to use as a bookmark. The bookmark is therefore a row identifier, which locates the actual row with an address that begins "file number:page number:". It might seem that a non-clustered index on a heap table should be faster, since it needs no second lookup through the primary key. Perhaps so: in some read-only workloads, a non-clustered index whose bookmark is a row identifier may be faster than one whose bookmark is the primary key. In OLTP applications, however, tables are subject to inserts, updates, and deletes. Under such operations, a non-clustered index whose bookmark is a row identifier may have to keep updating the data page location that the row identifier points to, and that overhead may exceed the overhead of a non-clustered index that uses the primary key as its bookmark.
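The maintenance cost argued above can be pictured with a toy model. It is a deliberately simplified sketch of the argument, not a description of how SQL Server or InnoDB actually maintains indexes, and real engines may handle moved rows differently. The class names, the tuple used as a row identifier, and the sample row are all hypothetical.

```python
class HeapTable:
    """Toy heap table: each secondary index maps a key to a physical row identifier."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexes = {col: {} for col in indexed_columns}
        self.index_writes = 0

    def insert(self, rid, row):
        self.rows[rid] = row
        for col, index in self.indexes.items():
            index[row[col]] = rid
            self.index_writes += 1

    def move(self, old_rid, new_rid):
        """The row changes page, so every index entry pointing at it is rewritten."""
        row = self.rows.pop(old_rid)
        self.rows[new_rid] = row
        for col, index in self.indexes.items():
            index[row[col]] = new_rid
            self.index_writes += 1


class IndexOrganizedTable:
    """Toy index-organized table: each secondary index maps a key to the primary key."""

    def __init__(self, pk_column, indexed_columns):
        self.pk_column = pk_column
        self.location_of = {}  # stands in for the clustered index
        self.indexes = {col: {} for col in indexed_columns}
        self.index_writes = 0

    def insert(self, location, row):
        pk = row[self.pk_column]
        self.location_of[pk] = location
        for col, index in self.indexes.items():
            index[row[col]] = pk
            self.index_writes += 1

    def move(self, pk, new_location):
        """Only the clustered structure changes; secondary index entries stay valid."""
        self.location_of[pk] = new_location


row = {"id": 7, "email": "a@example.com", "phone": "555-0100"}

heap = HeapTable(["email", "phone"])
heap.insert((1, 40, 3), row)       # hypothetical row identifier tuple
heap.move((1, 40, 3), (1, 92, 0))

iot = IndexOrganizedTable("id", ["email", "phone"])
iot.insert((1, 40), row)
iot.move(7, (1, 92))

print(heap.index_writes, iot.index_writes)  # 4 2
```

In this model a row move costs one extra write per secondary index on the heap table and none on the index-organized table, while a read through the secondary index costs one extra lookup (through location_of) on the index-organized table.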

Some readers may ask: why use index-organized tables on SQL Server at all? The bookmark of a heap table arguably makes non-clustered lookups faster than a primary key bookmark does, and a table may have several non-clustered indexes, each of which has to be searched. In addition, for discrete reads through a non-clustered index, a non-clustered index on an index-organized table is slower than a non-clustered index on a heap table.

It is true that in some cases a heap table is faster than an index-organized table, but most of those cases are OLAP applications. Second, as mentioned above, it depends on whether the data in the table is updated and whether those updates change the physical address of the rows. In addition, for sorting and range searches, an index-organized table can use the intermediate nodes of the B+ tree to find all the pages that need to be read and then read them, which the nature of a heap table makes impossible. Finally, discrete reads through non-clustered indexes do behave as described above, but read-ahead can generally be used to avoid many of the discrete read operations.
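One way to picture the range search point is to count how many pages a key range touches under each layout. This sketch assumes 100 rows per page, assumes that an index-organized table stores rows in key order, and assumes that a heap table stores them in an order unrelated to the key (modeled here by a seeded shuffle). All three are simplifications chosen for the example.

```python
import random

ROWS_PER_PAGE = 100  # assumption


def pages_touched(physical_order, low, high):
    """Count the distinct pages that hold keys in [low, high] for a given row layout."""
    return len({
        position // ROWS_PER_PAGE
        for position, key in enumerate(physical_order)
        if low <= key <= high
    })


keys = list(range(10_000))

# Index-organized table: rows are stored in key order.
clustered_order = sorted(keys)

# Heap table: rows are stored in an order unrelated to the key.
heap_order = keys[:]
random.Random(0).shuffle(heap_order)

clustered_pages = pages_touched(clustered_order, 1000, 1199)
heap_pages = pages_touched(heap_order, 1000, 1199)
print(clustered_pages, heap_pages)
```

With rows in key order, the 200 keys in the range sit on 2 adjacent pages; in the shuffled layout the same keys are scattered across many more pages.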

Therefore, whether to create a heap table or an index-organized table depends on your application.

Author: Dongting Sangren

Source: http://phinecos.cnblogs.com/

This blog follows the Creative Commons Attribution 3.0 License. If the use is non-commercial, you may reprint it freely, but please keep the original author information and the article URL.
