Indexes in the database


Many people use indexes when working with databases, and fetching data through an index really can make a SQL statement run faster. Yet many do not know why an index makes a query faster, let alone what its drawbacks are. This post introduces indexes.


1. Index structure:

First of all, there are many kinds of indexes: B-tree, hash, bitmap, full-text, and so on. Here I cover only the B-tree index, the kind used most in practice.

As the name suggests, a B-tree index is a balanced tree (the real structure is of course more involved; strictly speaking, InnoDB uses a B+ tree). Taking the MySQL InnoDB storage engine as an example: non-leaf nodes store (key, address) pairs, that is, an index value and the address of a block on the next level down. In a secondary (non-primary-key) index, leaf nodes store (key, primary key) pairs, that is, the index value and the primary key of the matching row. In the primary key index, leaf nodes store (primary key, content) pairs, that is, the primary key value and the full row.
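The two-step lookup this implies (secondary index to primary key, then primary key index to row) can be sketched in a few lines. This is a toy model under stated assumptions, not how InnoDB is implemented: each index is represented only by its leaf level as a sorted Python list, binary search stands in for descending the non-leaf (key, address) nodes, and the `LeafLevel` class, `lookup_by_email` function and sample rows are all hypothetical.

```python
from bisect import bisect_left

class LeafLevel:
    """Hypothetical stand-in for the leaf level of a B+ tree: sorted (key, value) pairs."""

    def __init__(self, pairs):
        self.pairs = sorted(pairs)
        self.keys = [k for k, _ in self.pairs]

    def get(self, key):
        # Binary search plays the role of walking down the non-leaf nodes.
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.pairs[i][1]
        return None

# Hypothetical sample table keyed by primary key "id".
rows = {
    1: {"id": 1, "email": "a@x.com", "name": "Ann"},
    2: {"id": 2, "email": "b@x.com", "name": "Bob"},
    3: {"id": 3, "email": "c@x.com", "name": "Cy"},
}

# Primary key index: leaves hold (primary key, row content).
clustered = LeafLevel((pk, row) for pk, row in rows.items())
# Secondary index on email: leaves hold (index value, primary key).
email_idx = LeafLevel((row["email"], pk) for pk, row in rows.items())

def lookup_by_email(email):
    pk = email_idx.get(email)   # step 1: secondary index -> primary key
    if pk is None:
        return None
    return clustered.get(pk)    # step 2: primary key index -> full row

print(lookup_by_email("b@x.com")["name"])  # Bob
```

Note that a secondary-index hit never touches the row directly; it always costs a second search in the primary key index, which is where the block counts in the comparison below come from.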

Below is a diagram of a two-level index (red marks a data block, the database's minimum unit of I/O; blue marks the key-value pairs described above):

[Figure: structure of a two-level index (Diagram1.png)]


Assume a 16 KB data block and 50 + 4 = 54 bytes per key-value pair (for every pair type other than (primary key, content)). One block then holds about 16384 / 54 ≈ 300 such pairs. With a second level, the tree can address 300 × 300 = 90,000 entries, i.e. the primary keys for 90,000 indexed values; with a third level, 300 × 300 × 300 = 27,000,000; and so on. Capacity grows geometrically with each level.
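The arithmetic above can be checked with a short script. It assumes, as the text does, a 16 KB block and 54 bytes per pair, and it ignores page headers and free space, so the fanout it reports is an upper bound rather than what a real InnoDB page achieves.

```python
BLOCK_BYTES = 16 * 1024   # 16 KB data block
PAIR_BYTES = 50 + 4       # assumed size of one key-value pair, per the text

def fanout(block_bytes=BLOCK_BYTES, pair_bytes=PAIR_BYTES):
    # Whole pairs that fit in one block (no page overhead modeled).
    return block_bytes // pair_bytes

def capacity(levels, per_block=300):
    # Entries reachable by a tree of the given height when every
    # block holds `per_block` entries.
    return per_block ** levels

print(fanout())  # 303, rounded to "about 300" in the text
for levels in (1, 2, 3):
    print(levels, capacity(levels))  # 300, 90000, 27000000
```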

Of course, as rows are inserted, updated and deleted, the RDBMS maintains the index automatically. The tree structure is why indexes speed up queries, and that maintenance work is why they slow down inserts, updates and deletes.


2. Simple comparison:

Using the index above as an example, suppose a table has 90,000 rows and a secondary index on it occupies about 300 data blocks. Assume further that the primary key index occupies 600 data blocks, including all the leaf-node row data (in a real production system it would only be larger).

In Oracle, a buffer cache hit ratio below 95% in an OLTP system is usually taken as a sign that more memory is needed, so here we assume that all index data and table data are already in memory.

Scenario one: fetching a single row by an indexed column

A full table scan has to read on average about (1 + 600) / 2 ≈ 300 data blocks. Going through the index above takes 2 block reads (in the secondary index, to get the primary key) plus 2 (assuming the primary key index also has only two levels), i.e. 4 data blocks. Ideally, the index lookup takes only 4/300 = 1/75 of the time of the full scan.
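A small cost model makes the comparison explicit. The sketch counts block reads only, under the text's assumptions: a 600-block primary key index, two-level trees for both indexes, and a unique key, so the full scan can stop as soon as it finds the row and the row is equally likely to be in any block.

```python
TABLE_BLOCKS = 600      # assumed size of the primary key index, in blocks
SECONDARY_LEVELS = 2    # assumed height of the secondary index
PRIMARY_LEVELS = 2      # assumed height of the primary key index

def full_scan_expected_blocks(table_blocks=TABLE_BLOCKS):
    # Row equally likely in blocks 1..N and the scan stops when found:
    # the average number of blocks read is (1 + N) / 2.
    return (1 + table_blocks) / 2

def index_lookup_blocks():
    # One root-to-leaf descent in each of the two trees.
    return SECONDARY_LEVELS + PRIMARY_LEVELS

scan = full_scan_expected_blocks()
idx = index_lookup_blocks()
print(scan, idx, round(scan / idx))  # 300.5 4 75
```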

Scenario two: reading every row in the table

A full table scan reads all 600 data blocks. Fetching every row through the index instead takes 4 × 90,000 = 360,000 block reads, far more than the full scan!

From this scenario, 600 / 4 = 150: if an index is used to fetch more than about 150 rows, it is less efficient than a full table scan. In Oracle's RBO (rule-based optimizer) era, the index might indeed have been used regardless of cost, but Oracle has since switched to the CBO (cost-based optimizer). I am not sure how MySQL handles the case where an index would have to scan too many rows.
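Extending the same block-count model to many rows gives the break-even point directly. This sketch assumes, as the text does, that every row fetched through the index costs a full 4-block descent with no sharing between rows, and that a full scan always reads all 600 blocks; under those assumptions the crossover is 600 / 4 = 150 rows. Real optimizers also weigh caching, sequential versus random I/O, and row ordering, which this ignores.

```python
TABLE_BLOCKS = 600            # full scan always reads every block
BLOCKS_PER_INDEXED_ROW = 4    # 2 secondary-index + 2 primary-index blocks per row

def index_scan_blocks(rows):
    # No caching between rows in this simple model.
    return rows * BLOCKS_PER_INDEXED_ROW

def break_even_rows(table_blocks=TABLE_BLOCKS, per_row=BLOCKS_PER_INDEXED_ROW):
    # Row count at which index access costs as much as a full scan.
    return table_blocks / per_row

print(index_scan_blocks(90_000))  # 360000
print(break_even_rows())          # 150.0
for rows in (100, 150, 200):
    cheaper = "index" if index_scan_blocks(rows) < TABLE_BLOCKS else "full scan"
    print(rows, cheaper)          # 100 index / 150 full scan / 200 full scan
```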


3. Index pros and cons

1) Index access means random reads

If I remember computer architecture correctly, when the CPU reads from memory, the cache also pulls in the adjacent data. The speed gap between cache and memory is not as dramatic as the one between memory and disk, but random memory reads are certainly slower than reads whose data has already been brought into the CPU cache.
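The effect of locality can be illustrated with a toy LRU cache. This sketch models data blocks in a small cache rather than CPU cache lines, but the principle is the same: reading rows in storage order reuses each fetched block, while reading them in scattered order (as rows reached through a secondary index often are) fetches blocks again and again. The block size, cache size and `count_misses` helper are hypothetical choices for illustration.

```python
import random
from collections import OrderedDict

ROWS_PER_BLOCK = 100   # assumed rows per data block
CACHE_BLOCKS = 4       # assumed (deliberately tiny) cache, in blocks

def count_misses(row_ids, rows_per_block=ROWS_PER_BLOCK, cache_blocks=CACHE_BLOCKS):
    """Count block fetches for a sequence of row reads through an LRU cache."""
    cache = OrderedDict()
    misses = 0
    for row_id in row_ids:
        block = row_id // rows_per_block
        if block in cache:
            cache.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1                     # miss: the block must be fetched
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)   # evict the least recently used block

    return misses

n_rows = 10_000
sequential = list(range(n_rows))
scattered = sequential[:]
random.Random(42).shuffle(scattered)       # fixed seed for repeatability

print(count_misses(sequential))  # 100: each block is fetched exactly once
print(count_misses(scattered))   # far more: most reads miss the cache
```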


2) Indexes are not suited to queries that return many rows

As the example above shows, once an index has to fetch more rows than the break-even point, a full table scan is faster. Real workloads will differ, but please pay attention when you write range scans, and avoid scanning too many rows at a time.


3) Indexes consume space

In effect, an index is a two-column table, and in InnoDB one of those columns is always the primary key (even if none is declared, InnoDB picks or generates one itself). This is why the primary key should not be too long: for a table with n secondary B-tree indexes, the primary key is stored n + 1 times, and each indexed column is also stored more than once.
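The n + 1 rule turns into a quick size estimate. This sketch counts only primary key bytes in leaf entries (one copy in the primary key index plus one per secondary index) and ignores non-leaf levels, page overhead and the indexed columns themselves; the row count, index count and key widths are hypothetical examples.

```python
def primary_key_bytes(rows, pk_bytes, secondary_indexes):
    # One copy in the primary key index, plus one copy in every
    # secondary index leaf: n + 1 copies per row in total.
    return rows * pk_bytes * (secondary_indexes + 1)

# Hypothetical table: 1,000,000 rows with 3 secondary indexes.
print(primary_key_bytes(1_000_000, 4, 3))    # 16000000  (4-byte integer key)
print(primary_key_bytes(1_000_000, 36, 3))   # 144000000 (36-byte string key, e.g. a UUID as text)
```

With these numbers, moving from a 4-byte to a 36-byte key multiplies the space spent on primary key copies by nine, which is the practical reason to keep primary keys short.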


4. Supplement

Indexes involve many more considerations, such as the clustering factor and locking behavior. I hope readers will find time to study these topics further.

2016.12.24

Ken Grass in Shenzhen


This article is from the "Ken Grass in Shenzhen" blog; reprinting is not permitted.
