Learn more about MySQL's index

Last Update:2018-09-06 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

(i) about the storage engine

Creating an appropriate index is one of the most important techniques in SQL performance tuning. Before learning to create an index, learn about MySQL's architectural details, including how it is organized on the hard disk, how indexes and memory are used and how it works, and how differences in storage engines affect the selection of indexes.

MySQL has a number of derivative versions that support more different kinds of storage engines. This article mainly discusses three kinds of MySQL engines.

MyISAM is a non-transactional storage engine that is the default storage engine prior to MySQL 5.5.

InnoDB 's most popular transactional storage engine, starting with version 5.5, became the default engine for MySQL.

Memory-based, non-transactional, and non-persistent storage engines.

Attention:

Starting with version 5.5, the default storage engine for MySQL tables from MyISAM to InnoDB will have a big impact on users installing packages that rely on default settings or are specifically written for MyISAM.

(ii) MySQL index type

MySQL supports the creation of multiple types of indexes, such as primary key, unique key, not unique non-primary index, in all relational database tables. In addition MySQL supports both plain text and spatial index types.

MySQL's built-in storage engine has different implementations for various indexing techniques, including: B-tree, B + Tree, R-Tree, and hash type.

Index data structure theory:

1.b-Tree

There are two types of nodes in a B-tree: Index nodes and leaf nodes. Leaf nodes are used to store data, and the index node is used to tell the user the order of data stored in the leaf node and to help the user find the appropriate data.

B-Tree search, starting from the root node, the node within the keyword ordered binary search, if the hit is finished, otherwise enter the query keyword belongs to the son node of the range, repeat. Until the corresponding son pointer is empty, or is already a leaf node.

B-Tree is a multi-path search tree:

(1). Define any non-leaf node with a maximum of M sons, and m>2;

(2). The number of sons of the root node is [2,m];

(3). The number of sons of non-leaf nodes other than the root node is [m/2,m];

(4). Each node holds at least m/2-1 (rounding) and up to M-1 keywords;

(5). Number of key words for non-leaf nodes = number of pointers pointing to son nodes-1;

(6). Non-leaf node keyword:k[i]<k[i+1];

(7). Pointer to a non-leaf node: p[1],p[2],,p[m]; where P[1] is a sub-tree with a keyword smaller than k[1], and p[m] a subtree that points to a keyword greater than k[m-1];

(8). All leaf nodes are located on the same layer;

2.b+ Tree

B + Tree data structure is an enhanced version of B-Tree implementation. Although the B + tree supports all the features of the B-tree index, the most significant difference between them is that the underlying data in the B + tree is sorted according to the index column mentioned. The B + Tree also optimizes scan performance through additional references between leaf nodes.

B + Search is different, the difference is that the second + tree is only hit by the leaf node (b-tree can be hit on a non-leaf node), and its performance is equivalent to the keyword complete one-time binary search.

Features of B + trees:

(1) All keywords appear in the linked list of leaf nodes, and leaf nodes correspond to the data layer where data is stored.

(2) It is not possible to hit on a non-leaf node.

(3) The non-leaf node is equivalent to the index of the leaf node, and the leaf node corresponds to the data layer.

3. Hashing

A hash table data structure is a very simple concept that applies an algorithm to a given value to return a unique pointer or location in the underlying data storage system. The advantage of a hash table is that it always finds the location of the rows that need to be read in linear time complexity, rather than a B-tree that needs to be positioned across a multilayer node.

4. Communication R-Tree

The R-Tree data structure supports the management of geometry data based on data types. Currently only MyISAM uses R-tree implementations to support spatial indexes, and there are many limitations to using spatial indexes, such as supporting only unique NOT NULL columns.

5. Full text

The full text structure is also a basic data structure used by MySQL. This data structure is currently supported only by the MyISAM storage engine in the current version of MySQL. Version 5.6 will include full-text functionality in the InnoDB storage engine. Full-text indexing has little practical value in large systems because there are many specialized file retrieval products in large-scale systems. So there is no introduction.

MySQL implementation

After some understanding of the basic concepts of data structures such as B-tree, + + trees, and hashing, we can begin to discuss how MySQL implements different algorithms through the storage engine that supports them. At the same time, each implementation has a different impact on disk and memory usage, which is an important consideration in large database systems.

1.MyISAM B-Tree

The MyISAM storage engine uses B-tree data structures to implement a master code index, a unique index, and a non-primary code index. In the MyISAM implementation data Catalog and database schema subdirectories, users can find the. myd and. myi files that correspond to each MySQL table. The index information defined on the database table is stored in the Myi file, which has a block size of 1024 bytes. This size can be assigned by myisam-block-size system variables.

$  ls-1h/var/lib/mysql/book/source_words. my*  -rw-rw----1 mysql mysql  9.2M 2015-05-07 19:08  source_words. MYD  -rw-rw----1 mysql mysql  7.8M 2015-05-07 19:08

The internal format of these file structures can be found in the MySQL free source code, or you can view the internal MySQL manual.

In MyISAM, the B-tree structure of the non-main code index stores the index value and a pointer to the main code data, which is a significant difference between MyISAM and InnoDB. This leads to different ways of working with the indexes of two storage engines.

MyISAM indexes are managed in a common cache of memory, and the size of the cache can be defined by key_buffer_size or other named key caches. This is the main consideration when setting the cache size based on the size of the statistics and planned table indexes.

2. InnoDB's B + Tree Cluster Master code

The InnoDB storage engine uses a B + tree in its main code index (also known as the Clustered Master code), which organizes all the data with the corresponding main code and adds additional forward and backward pointers on the leaf node layer, which makes it easier to scan the range.

At the file system level, all INNODB data and index information is managed by default in the Public InnoDB table space, otherwise the administrator specifies the file path by innodb_data_file_path this variable. This is a file called Ibdatal.

Because the InnoDB uses the clustered master code to store the data, the size of the disk space occupied by the underlying information depends largely on the fill factor of the page. For sequential master codes, InnoDB uses 15/16 of the 16K page as the fill factor. For master codes that are not ordered, InnoDB assigns 50% as the fill factor for each page by default when inserting initial data.

In the implementation of the index of the B + tree on the leaf node is the data itself, the key key, if the general index, the data will point to the corresponding primary index. A B + tree with sequential access pointers is formed by adding a pointer to the adjacent leaf node on each leaf node of the B + tree. The goal is to improve the performance of the interval access.

3.InnoDB B-Tree non-master code

The non-master code index in InnoDB uses B-tree data structure, but the implementation of B-tree structure in InnoDB is not the same as in MyISAM. In InnoDB, the actual value of the main code is stored in the non-primary code index. In MyISAM, a data pointer that contains a master code value is stored in a non-main code index. This is important. First, when you define a large master code, the InnoDB index may be larger, and the size difference between indexes can become large as the number of non-primary code indexes increases. Another difference is that a non-primary index can currently contain a value for the primary key, and may not be part of the index.

4. Memory Hash Index

In the default MySQL engine index, only the memory engine supports the hash data structure, the strength of the hash structure can be expressed as the simplicity of the direct key lookup, and the similarity pattern matching query of the hash index is slower than the direct query. You can also specify a B-tree index implementation for the memory engine.

5. Memory B-Tree index

For large memory tables, index-range searches using hash indexes are inefficient, and B-tree indexes are indeed faster than using the default hash index when executing a direct key query. Based on the different depths of the B-tree, the B-tree index may indeed be faster than the hashing algorithm in individual operations.

6.InnoDB Internal Hash Index

The InnoDB storage Engine stores the main code in the clustered B + Tree index: But inside the InnoDB, the hash table in memory is used to make the main code query more efficient. This mechanism is managed by the InnoDB storage engine, and the user can only choose whether to enable this unique configuration option through the Innodb_adaptive_hash_index configuration item.

Learn more about MySQL's index

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More