Parse MySQL Index

Source: Internet
Author: User
Tags mysql index

In MySQL, an index is a data structure that the storage engine uses to quickly find a target record. Common index types include B-tree indexes, hash indexes, spatial indexes (R-tree), full-text indexes, and so on.

Indexes are implemented at the storage engine level, and indexes do not work the same way in different storage engines.

The following focuses on the B-Tree index and on how the InnoDB and MyISAM storage engines implement it.

Why choose a B-tree

The most expensive part of reading from and writing to disk is seeking, whereas accessing a range of data sequentially is very fast, for two reasons:

    1. Sequential I/O does not require multiple seeks, so it is much faster than random I/O (especially on mechanical hard drives).
    2. If the server can read the data in the order it needs, no additional sorting is required, and GROUP BY queries do not need to sort and group rows in order to compute aggregates.

An index can itself be large and may not fit entirely in memory, so it is often stored on disk as an index file. Index lookups therefore incur disk I/O. Compared with memory access, disk I/O is several orders of magnitude slower, so an index structure should be organized to minimize the number of disk I/O operations during a lookup.

A B-tree is a balanced search tree similar to a red-black tree, but it is better at reducing disk I/O: a B-tree is shallower, so finding an element requires loading only a few nodes from disk into memory before the target data can be accessed.
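To make the depth argument concrete, here is a rough sketch that counts how many levels a tree needs before its bottom level can hold a given number of keys. The fan-out of 100 and the key count of one million are illustrative assumptions, and the binary case is an idealized perfectly balanced tree (a real red-black tree can be up to about twice as deep).

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Return the smallest number of levels h with fanout**h >= num_keys.

    This is a simplified model: every node is assumed to have exactly
    `fanout` children, and only the bottom level is counted as storage.
    """
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


num_keys = 1_000_000                           # hypothetical index size
binary_levels = levels_needed(num_keys, 2)     # idealized binary search tree
wide_levels = levels_needed(num_keys, 100)     # assumed fan-out of 100

print(binary_levels, wide_levels)  # prints: 20 3
```

If each level visited costs one disk read, the wide tree needs far fewer reads per lookup than the binary one, which is the property the paragraph above describes.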

Introduction to B-Tree

B-Tree

A B-tree T is a rooted tree (with root root[T]) that has the following properties:

    1. Each node x has the following fields:
      • x.n, the number of keys currently stored in node x.
      • The x.n keys themselves, x.key_1, x.key_2, ..., x.key_n, stored in nondecreasing order, so that x.key_1 <= x.key_2 <= ... <= x.key_n.
      • x.leaf, a Boolean value that is True if x is a leaf node and False if it is an internal node.
    2. Each internal node x also contains x.n+1 pointers x.c_1, x.c_2, ..., x.c_(n+1) to its children. Leaf nodes have no children, so their child pointer fields are undefined.
    3. The keys x.key_i separate the ranges of keys stored in each subtree: if k_i is any key stored in the subtree rooted at child x.c_i, then k_1 <= x.key_1 <= k_2 <= x.key_2 <= ... <= x.key_n <= k_(n+1).
    4. All leaf nodes have the same depth, which is the height h of the tree.
    5. The number of keys x.n that a node may contain has a lower bound and an upper bound, expressed in terms of a fixed integer t >= 2:
      • Every non-root node contains at least t-1 keys, so every non-root internal node has at least t children. If the tree is non-empty, the root node contains at least one key.
      • Every node contains at most 2t-1 keys, so an internal node has at most 2t children. A node is said to be full if it contains exactly 2t-1 keys.

Figure: a B-tree of height 3 that contains the smallest possible number of keys. The value shown inside each node x is x.n.
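The properties above translate fairly directly into a search routine. The following is a minimal in-memory sketch, not how any storage engine implements it: `BTreeNode` and `btree_search` are hypothetical names, a node is a leaf when it has no children, and the sample tree (t = 2) is made up for illustration.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BTreeNode:
    # x.n is len(keys); keys are kept in nondecreasing order.
    keys: List[int]
    # An internal node has len(keys) + 1 children; a leaf has none.
    children: List["BTreeNode"] = field(default_factory=list)

    @property
    def leaf(self) -> bool:
        return not self.children


def btree_search(x: BTreeNode, k: int) -> Optional[Tuple[BTreeNode, int]]:
    """Return (node, position) of key k, or None if k is not in the tree."""
    # Find the first position i whose key is >= k.
    i = bisect_left(x.keys, k)
    if i < len(x.keys) and x.keys[i] == k:
        return x, i
    if x.leaf:
        return None
    # By property 3, k can only be in the subtree between key i-1 and key i.
    return btree_search(x.children[i], k)


# A small hand-built tree with t = 2 (each node holds 1 to 3 keys).
root = BTreeNode(
    keys=[20, 40],
    children=[BTreeNode([5, 10]), BTreeNode([25, 30]), BTreeNode([50, 60])],
)

hit = btree_search(root, 30)
miss = btree_search(root, 35)
print(hit is not None, miss)  # prints: True None
```

Each recursive call corresponds to loading one more node, so the number of nodes read is bounded by the height of the tree.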

B+ Tree

The B+ tree is a variant of the B-tree that is better suited to index structures on external storage, and MySQL storage engines generally use B+ trees to implement their indexes. Internal nodes contain only key values and pointers to child nodes; the data is stored in the leaf nodes. All records are placed in leaf nodes on the same level, in key order, and the leaf nodes are linked to one another by pointers (a doubly linked list).

Figure: a B+ tree with a height of 2

All records are in the leaf nodes and are stored in order. If we traverse sequentially starting from the leftmost leaf node, we obtain all key values in sorted order: 5, 10, 15, 20, 25, 30, 50, 55, 60, 65, 75, 80, 85, 90
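The sequential traversal can be sketched with a doubly linked chain of leaves. This is a simplified illustration: the `Leaf` class and the helper names are hypothetical, the sample keys are grouped into leaves arbitrarily, and the internal nodes that a real lookup would descend through are left out.

```python
from typing import Iterator, List, Optional, Sequence


class Leaf:
    """A B+ tree leaf: sorted keys plus links to the neighbouring leaves."""

    def __init__(self, keys: List[int]) -> None:
        self.keys = keys
        self.prev: Optional["Leaf"] = None
        self.next: Optional["Leaf"] = None


def link_leaves(groups: Sequence[Sequence[int]]) -> Leaf:
    """Build the leaf level from key groups and return the leftmost leaf."""
    leaves = [Leaf(list(group)) for group in groups]
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves[0]


def scan(leftmost: Leaf, low: Optional[int] = None,
         high: Optional[int] = None) -> Iterator[int]:
    """Yield keys in order, optionally restricted to low <= key <= high.

    A real range scan would descend the tree to the first matching leaf;
    this sketch simply walks from the leftmost leaf.
    """
    leaf: Optional[Leaf] = leftmost
    while leaf is not None:
        for key in leaf.keys:
            if high is not None and key > high:
                return
            if low is None or key >= low:
                yield key
        leaf = leaf.next


first = link_leaves([[5, 10], [15, 20], [25, 30], [50, 55],
                     [60, 65], [75, 80], [85, 90]])

print(list(scan(first, 20, 55)))  # prints: [20, 25, 30, 50, 55]
```

Because the leaves are already in key order, neither the full scan nor the range scan needs a sort step.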

The internal nodes and the leaf nodes of a B+ tree can have different sizes.

Index implementation

Storage engines use B+ trees in different ways, but the indexed columns are always stored in order. A defining feature of B+ tree indexes in databases is their high fan-out, so the height of the tree is generally only two or three levels. Finding the row record for a given key value therefore takes at most 2 to 3 I/O operations.
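A back-of-the-envelope calculation shows why two or three levels are usually enough. The fan-out of 1000 and the 100 rows per leaf page below are assumed round numbers chosen for illustration, not measured values for any engine.

```python
def max_rows(height: int, fanout: int, rows_per_leaf: int) -> int:
    """Upper bound on the rows reachable in a B+ tree with `height` levels.

    The top height - 1 levels are internal and each multiplies the number
    of reachable pages by `fanout`; the last level is made of leaf pages
    that hold `rows_per_leaf` rows each.
    """
    return fanout ** (height - 1) * rows_per_leaf


FANOUT = 1000          # assumed pointers per internal page
ROWS_PER_LEAF = 100    # assumed rows per leaf page

for height in (1, 2, 3):
    print(height, max_rows(height, FANOUT, ROWS_PER_LEAF))
# prints:
# 1 100
# 2 100000
# 3 100000000
```

Under these assumptions a three-level tree already covers a hundred million rows, and a lookup reads one page per level.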

Index based on B-tree structure

B+ Tree Index

A B+ tree index in a database can be either a clustered index or a secondary index (also called a nonclustered index). In both cases the index is internally a B+ tree, which is height-balanced and keeps all of its data in the leaf nodes.

The difference between a clustered index and a nonclustered index is whether the leaf nodes store the entire row.

Let's look at how InnoDB and MyISAM store the following table:

    CREATE TABLE layout_test (
        col1 INT NOT NULL,
        col2 INT NOT NULL,
        PRIMARY KEY (col1),
        KEY (col2)  -- secondary index
    );

The MyISAM engine uses a B+ tree as its index structure. The index file and the data file are separate, and the index file holds only the addresses of data records: the data field of each entry in a leaf node holds the address of a record. The records themselves are stored on disk in the order in which they were written.

Figure: data layout of the MyISAM table layout_test

In MyISAM, the primary key index is structurally no different from the other indexes. It is simply a unique, non-nullable index named PRIMARY.

Figure: primary key layout of the MyISAM table layout_test

Figure: col2 index layout of the MyISAM table layout_test

In InnoDB, the primary key is the clustered index. The table data file is itself an index structure organized as a B+ tree: the leaf nodes contain the primary key together with all of the row's data, while the internal node pages contain only the indexed column (the primary key). The order of the key values in the clustered index determines the order in which the corresponding rows are stored in the table. The leaf nodes of an InnoDB secondary index hold not a "row pointer" but the primary key value, which serves as the pointer to the row. The clustered index is the B+ tree built on the table's primary key.

Figure: primary key layout of the InnoDB table layout_test

Each entry in a clustered index leaf node contains the primary key value, the transaction ID and rollback pointer (which InnoDB uses for transactions and MVCC), and all of the remaining columns.

Figure: primary key layout of the InnoDB table layout_test

InnoDB's secondary indexes are very different from its clustered index. The leaf nodes of an InnoDB secondary index store not "row pointers" but primary key values, which serve as the pointers to rows. This strategy reduces the work needed to maintain secondary indexes when rows move or a data page splits. Using primary key values as pointers makes secondary indexes larger, but in exchange InnoDB can move a row without updating the "pointers" to it in the secondary indexes.
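The two lookup paths can be contrasted with a toy model. Everything here is a simplification for illustration: plain dictionaries stand in for the B+ trees, the rows are made-up (col1, col2) pairs with unique col2 values, and the function names are hypothetical.

```python
# Made-up (col1, col2) rows, listed in the order they were inserted.
rows = [(99, 8), (12, 56), (3000, 62)]

# MyISAM-style layout: the data file keeps rows in insertion order, and
# every index (primary or not) maps a key to a row address in that file.
myisam_data = list(rows)
myisam_pk_index = {col1: addr for addr, (col1, _) in enumerate(myisam_data)}
myisam_col2_index = {col2: addr for addr, (_, col2) in enumerate(myisam_data)}

# InnoDB-style layout: the clustered index holds the full rows keyed by
# primary key, and the secondary index maps col2 to the primary key value.
innodb_clustered = {col1: (col1, col2) for col1, col2 in rows}
innodb_col2_index = {col2: col1 for col1, col2 in rows}


def myisam_lookup_by_col2(col2: int) -> tuple:
    addr = myisam_col2_index[col2]   # index lookup yields a row address
    return myisam_data[addr]         # read the row at that address


def innodb_lookup_by_col2(col2: int) -> tuple:
    pk = innodb_col2_index[col2]     # first lookup: secondary index -> primary key
    return innodb_clustered[pk]      # second lookup: clustered index -> row


print(myisam_lookup_by_col2(56), innodb_lookup_by_col2(56))
# prints: (12, 56) (12, 56)
```

In the InnoDB-style model, relocating a row inside the clustered structure leaves `innodb_col2_index` untouched because it never stored a physical address, whereas in the MyISAM-style model every index entry for that row would have to be rewritten.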

It is easy to see the difference between how InnoDB and MyISAM store data and indexes.

Figure: comparison of clustered and nonclustered table layouts

Advantages of Clustered Indexes

The storage of a clustered index is not physically contiguous but logically contiguous: the pages are kept in logical order and are connected to one another through a doubly linked list.

Another benefit of a clustered index is that sorted lookups and range lookups on the primary key are very fast.

In the logical storage structure of the InnoDB storage engine, all data is logically stored in a space called a tablespace. A tablespace is in turn composed of segments, extents, and pages; pages are called blocks in some documents.

Figure: logical storage structure of the InnoDB storage engine

    • Related data can be stored together; in particular, data that is accessed together can be read from a single page, which reduces the number of I/O operations;
    • Data access is faster. A clustered index keeps the index and the data in the same B-tree, so retrieving data through a clustered index is usually faster than looking it up through a nonclustered index.
    • Queries that use a covering index scan can directly use the primary key values stored in the leaf nodes.

Disadvantages of Clustered Indexes

    • Clustering data gives the greatest benefit to I/O-bound applications. If the data fits entirely in memory, the order of access matters much less, so a clustered index offers little advantage;
    • Insertion speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. If the data was not loaded in primary key order, it is best to reorganize the table with the OPTIMIZE TABLE command after loading.
    • Updating a clustered index column is expensive because it forces InnoDB to move each updated row to a new location.
    • A table based on a clustered index may face "page splits" when new rows are inserted, or when a primary key update requires a row to be moved. When a row's primary key value requires the row to be inserted into a page that is already full, the storage engine splits the page into two pages to make room for the row; this is a page split. Page splits cause the table to consume more disk space.
    • A clustered index can make full table scans slower, especially if rows are sparse or if the data is stored discontinuously because of page splits.
    • Secondary indexes (nonclustered indexes) may be larger than expected because their leaf nodes contain the primary key columns of the rows they reference.
    • Access through a secondary index requires two index lookups instead of one.
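The effect of insertion order on page splits can be illustrated with a small simulation. This is a deliberately simplified model and not InnoDB's actual algorithm: pages are plain sorted lists with an assumed capacity of 4 keys, keys are unique, a full page is split in half, and a key that sorts after everything in the last full page starts a new page instead of splitting.

```python
import bisect
import random
from typing import List


def insert_key(pages: List[List[int]], key: int, capacity: int) -> None:
    """Insert `key` into an ordered list of sorted pages, splitting if full."""
    if not pages:
        pages.append([key])
        return
    # Target page: the last page whose first key is <= key (else page 0).
    first_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    if len(page) < capacity:
        bisect.insort(page, key)
        return
    if idx == len(pages) - 1 and key > page[-1]:
        # Appending past the end of the table: open a fresh page.
        pages.append([key])
        return
    # Page split: move the upper half of the full page into a new page.
    mid = capacity // 2
    left, right = page[:mid], page[mid:]
    pages[idx:idx + 1] = [left, right]
    bisect.insort(left if key < right[0] else right, key)


def load(keys: List[int], capacity: int = 4) -> List[List[int]]:
    pages: List[List[int]] = []
    for key in keys:
        insert_key(pages, key, capacity)
    return pages


keys = list(range(1, 101))
sequential = load(keys)                 # primary key order

shuffled = keys[:]
random.Random(0).shuffle(shuffled)      # arbitrary insertion order
scattered = load(shuffled)

print(len(sequential), len(scattered))
```

In this model the sequential load fills every page completely (25 pages for 100 keys), while the shuffled load leaves partially filled pages behind after each split and so never uses fewer pages than the sequential one.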

