MySQL: How Indexing Works

Source: Internet
Author: User

Why is an index needed?
When data is stored on a disk-based storage device, it is stored as blocks of data. Each block is accessed in its entirety, which makes it the atomic unit of disk access. Disk blocks are structured much like linked lists: each contains a data section and a pointer to the location of the next node (or block), and the blocks do not need to be stored contiguously.

Because a set of records can be physically sorted on only one field, searching on a field that is not sorted requires a linear search, which takes N/2 block accesses on average, where N is the number of blocks the table occupies. If that field is a non-key field (that is, its values are not unique), the entire table space must be searched, at N block accesses.

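On a sorted field, a binary search can be used instead, which takes about log2(N) block accesses. In addition, because the data is sorted, a search on a non-key field can stop as soon as a higher value is found rather than scanning the rest of the table for duplicates. This is where the substantial performance gain comes from.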
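
The difference between the two access patterns can be sketched with a small simulation. This is only an illustration: the function names and the toy block layout (1,024 blocks of 5 keys each) are assumptions made for the example, and the model counts block reads only, ignoring caching and seek time.

```python
def linear_probe_count(blocks, target):
    """Scan blocks front to back; return how many blocks were read."""
    reads = 0
    for block in blocks:
        reads += 1
        if target in block:
            break
    return reads


def binary_probe_count(blocks, target):
    """Binary search over blocks sorted by key; return how many blocks were read."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1
        block = blocks[mid]
        if target < block[0]:
            hi = mid - 1
        elif target > block[-1]:
            lo = mid + 1
        else:
            break  # target falls inside this block's key range
    return reads


# Toy data: 1,024 blocks holding 5 sorted keys each (keys 0..5119).
blocks = [list(range(i * 5, i * 5 + 5)) for i in range(1024)]
target = 5119  # last key, the worst case for the linear scan

print(linear_probe_count(blocks, target))  # 1024
print(binary_probe_count(blocks, target))  # 11, roughly log2(1024) = 10
```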

What is indexing?
Indexing is a way of sorting the records of a table on multiple fields. Creating an index on a field creates a separate data structure that holds the field value and a pointer to the record it belongs to. That index structure is kept sorted, which allows binary searches to be performed on it.
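
As a rough sketch of that idea, an index can be modeled as a sorted list of (value, pointer) pairs that is searched with binary search. The table contents, the `first_name_index` name, and the `lookup` helper are all hypothetical, and a list position stands in for the record pointer. MySQL's real index structures are more elaborate than this.

```python
import bisect

# Hypothetical table: rows are stored in id order only.
rows = [
    (1, "Maria"),
    (2, "Chen"),
    (3, "Aisha"),
    (4, "Chen"),
    (5, "Bob"),
]

# Index on first name: sorted (value, row position) pairs.
first_name_index = sorted((name, pos) for pos, (_id, name) in enumerate(rows))


def lookup(index, value):
    """Return row positions whose indexed field equals value, via binary search."""
    start = bisect.bisect_left(index, (value,))
    matches = []
    for key, pos in index[start:]:
        if key != value:
            break  # the index is sorted, so no later entry can match
        matches.append(pos)
    return matches


print([rows[pos] for pos in lookup(first_name_index, "Chen")])
# [(2, 'Chen'), (4, 'Chen')]
```

The table itself is never re-sorted; only the much smaller index is ordered by first name.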

The downside is that indexes require additional disk space. With the MyISAM engine, all indexes of a table are stored together in a single file, which can quickly reach the size limit of the underlying file system if many fields of the same table are indexed.

How does it work?
First, consider a sample database table with the following schema:

Field name         Data type      Size on disk
ID (Primary key)   Unsigned INT   4 bytes
FirstName          Char(50)       50 bytes
LastName           Char(50)       50 bytes
EmailAddress       Char(100)      100 bytes
Note: Char is used instead of Varchar so that the size on disk is exact. The sample table contains 5 million rows and has no indexes. We will analyze the performance of two queries: one that searches on the primary key ID (a sorted key field) and one that searches on FirstName (an unsorted non-key field).

Example 1
Our sample database has r = 5,000,000 records, each with a fixed length of R = 204 bytes, stored in a MyISAM table that uses the default block size of B = 1,024 bytes. The blocking factor of the table is bfr = floor(B/R) = floor(1024/204) = 5 records per disk block. The total number of blocks required to hold the table is N = r/bfr = 5,000,000/5 = 1,000,000 blocks.

A linear search on the ID field would take N/2 = 500,000 block accesses on average to find a value, given that ID is a key field. But because the ID field is also sorted, a binary search can be used, which takes about log2(1,000,000) = 19.93, or 20, block accesses. The improvement is immediately obvious.

The FirstName field, by contrast, is neither sorted (so a binary search is impossible) nor unique, so a search must read the table to the very end: a full table scan of exactly N = 1,000,000 block accesses. This is the situation that indexing aims to correct.
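
The arithmetic of Example 1 can be reproduced in a few lines. The variable names follow the symbols used in the example (r, R, B, bfr, N); the remaining names are chosen here for illustration.

```python
import math

r = 5_000_000  # number of records
R = 204        # record length in bytes
B = 1024       # block size in bytes

bfr = B // R            # blocking factor: whole records per block
N = math.ceil(r / bfr)  # blocks occupied by the table

linear_avg = N // 2               # average linear search on a key field
binary = math.ceil(math.log2(N))  # binary search on a sorted field
full_scan = N                     # unsorted non-key field: read everything

print(bfr, N)      # 5 1000000
print(linear_avg)  # 500000
print(binary)      # 20
print(full_scan)   # 1000000
```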

Because an index record contains only the indexed field and a pointer to the original record, it is clearly smaller than the multi-field record it points to. The index therefore occupies fewer disk blocks than the original table, and so needs fewer block accesses to search. The schema for an index on the FirstName field is as follows:
Field name         Data type      Size on disk
FirstName          Char(50)       50 bytes
(Record pointer)   Special        4 bytes
Note: Pointers in MySQL are 2, 3, 4, or 5 bytes in length, depending on the size of the table.

Example 2
Our sample database has r = 5,000,000 records, and each index record has a length of R = 54 bytes, again with the default block size of B = 1,024 bytes. The blocking factor of the index is bfr = floor(B/R) = floor(1024/54) = 18 records per disk block. The total number of blocks required to hold the index is N = r/bfr = 5,000,000/18 = 277,778 blocks (rounded up).

A search on the FirstName field can now use the index to improve performance. The index allows a binary search that takes about log2(277,778) = 18.08, rounded up to 19, block accesses. Finding the address of the actual record then requires one more block access to read it, for a total of 19 + 1 = 20 block accesses. That is a far cry from the 1,000,000 block accesses needed to find a FirstName match in the unindexed table.
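
The same calculation for the index in Example 2, as a short sketch. The variable names are illustrative; the 50-byte field and 4-byte pointer are the sizes that add up to the 54-byte index record.

```python
import math

records = 5_000_000
index_record_len = 50 + 4  # FirstName field plus record pointer, in bytes
block_size = 1024          # bytes

index_bfr = block_size // index_record_len       # index records per block
index_blocks = math.ceil(records / index_bfr)    # blocks occupied by the index
index_search = math.ceil(math.log2(index_blocks))  # binary search in the index
total = index_search + 1                         # plus one read for the record itself

print(index_bfr)     # 18
print(index_blocks)  # 277778
print(index_search)  # 19
print(total)         # 20
```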

When should it be used?
Because creating an index requires additional disk space (277,778 extra blocks in the example above, an increase of about 28%), and because too many indexes can run into file system size limits, careful thought is needed when choosing which fields to index.

Since indexes only speed up searching for matching fields within records, indexing a field that is used only for output wastes disk space, and wastes processing time on every INSERT or DELETE. It should be avoided.
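
The write overhead can be illustrated with a toy model in which every insert must also update each index. `IndexedTable` and its `index_writes` counter are hypothetical names for this sketch, which does not reflect how MySQL maintains indexes internally and does not handle deletes.

```python
import bisect


class IndexedTable:
    """Toy table that keeps one sorted (value, row position) list per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0  # index entries written so far

    def insert(self, row):
        pos = len(self.rows)
        self.rows.append(row)
        for col, index in self.indexes.items():
            bisect.insort(index, (row[col], pos))  # keep the index sorted
            self.index_writes += 1


table = IndexedTable(["first_name", "last_name"])
table.insert({"id": 1, "first_name": "Maria", "last_name": "Silva"})
table.insert({"id": 2, "first_name": "Chen", "last_name": "Wei"})

print(table.index_writes)  # 4: two rows times two indexed columns
```

Each extra indexed column adds one more sorted structure to update per insert, whether or not any query ever searches on that column.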

Given the nature of a binary search, the cardinality (uniqueness) of the data matters.
Indexing a field with a cardinality of 2 splits the data in half, whereas a cardinality of 1,000 narrows each lookup to roughly one thousandth of the records (about 5,000 of the 5,000,000 rows in the sample table). When cardinality is very low, the effectiveness of the index degrades to that of a linear scan, and the query optimizer may skip the index and read the table directly if the cardinality is below a certain fraction of the record count, such as 30%, in which case the index is just wasted space.
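
A back-of-the-envelope comparison shows why low cardinality defeats the index. The cost model below is a deliberate simplification and an assumption of this sketch, not MySQL's optimizer: values are taken to be evenly distributed, and every matching row is assumed to cost one block read after the binary search in the index. The block counts are the ones from Example 1 and Example 2.

```python
import math

TABLE_BLOCKS = 1_000_000  # N from Example 1
INDEX_BLOCKS = 277_778    # N from Example 2
TOTAL_ROWS = 5_000_000


def index_cost(cardinality):
    """Estimated block reads: index search plus one read per matching row."""
    matches = math.ceil(TOTAL_ROWS / cardinality)
    return math.ceil(math.log2(INDEX_BLOCKS)) + matches


for cardinality in (2, 1_000, 5_000_000):
    cost = index_cost(cardinality)
    choice = "index" if cost < TABLE_BLOCKS else "full scan"
    print(cardinality, cost, choice)
# 2 2500019 full scan
# 1000 5019 index
# 5000000 20 index
```

Under these assumptions, an index on a two-valued field costs more block reads than scanning the whole table, so skipping it is the better plan.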

Transferred from: http://blog.csdn.net/iefreer/article/details/15815455
