Data Structure and algorithm principles behind MySQL indexes (I)

Last Update:2015-04-27 Source: Internet

Author: User

Tags mysql index

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Data Structure and algorithm principles behind MySQL indexes (I)

This article takes MySQL database as the research object and discusses some topics related to database indexes. In particular, MySQL supports many storage engines, and various storage engines also support different indexes. Therefore, MySQL database supports multiple index types, such as BTree indexes and hash indexes, full-text index. To avoid confusion, this article will only focus on the B-tree index, because this is an index that is commonly used in MySQL. This article will not discuss hash indexes and full-text indexes for the moment.

The main content of this article is divided into three parts.

The first part mainly discusses the mathematical basis of MySQL database indexes in terms of data structure and algorithm theory.

The second part discusses topics such as clustered index, non-clustered index, and covered index based on the index architecture of MyISAM and InnoDB data storage engines in the MySQL database.

The third part discusses the high-performance index Usage Policy in MySQL based on the above theoretical basis.

The essence of basic data structure and algorithm Indexes

MySQL officially defines indexes:Index is a data structure that helps MySQL efficiently obtain data.. Extract the sentence trunk to get the essence of the index: Index is the data structure.

We know that database query is one of the most important functions of the database. We all want to query data as quickly as possible, so the designers of the database system willQuery Algorithm Optimization. Of course, the most basic query algorithm is linear search. This algorithm with the complexity of O (n) is obviously bad when the data volume is large, fortunately, the development of computer science provides many better search algorithms, such as binary search and binary tree search. If you analyze it a little, you will find that,Each search algorithm can only be applied to a specific data structure.For example, binary search requires that the retrieved data be ordered, while binary search can only be applied to binary search trees,The organization structure of the data itself cannot fully satisfy various data structures(For example, theoretically it is impossible to organize both columns in order at the same time). Therefore,In addition to data, the database system also maintains data structures that meet specific search algorithms.,These data structures reference (point to) data in some wayIn this way, you canImplement advanced search algorithms on these data structures.This data structure is an index..

Let's look at an example:

Figure 1 shows a possible indexing method. On the left is a data table with a total of seven records in two columns, and on the left is the physical address of the data records (note that logically adjacent records are not physically adjacent on the disk ). To accelerate Col2 search, you can maintain a binary search tree shown on the right. Each node containsIndex key valueAnd a physical address pointing to the corresponding data recordPointerIn this way, you can use binary search to obtain the corresponding data in the complexity of O (log2n.

Although this is a real index, the actual database system does not use a binary tree or its evolved red-black tree. The reason is described below.

B-Tree and B + Tree

Currently, most database systems and file systems use B-Tree or its variant B + Tree as the index structure, in the next section of this article, we will discuss why B-Tree and B + Tree are widely used in Indexing Based on the memory principle and computer access principle, this section describes them from the perspective of data structure.

B-Tree

To describe B-Tree, first define a data record as a binary group [key, data]. The key is the key value of the record. Keys are different for different data records; data records data except the key. B-Tree is a data structure that meets the following conditions:

D is a positive integer greater than 1, called the degree of B-Tree.

H is a positive integer called the height of B-Tree.

Each non-leaf node consists of N-1 keys and n pointers, where d <= n <= 2d.

Each leaf node contains at least one key and two pointers, and at most two D-1 keys and two 2d pointers. The pointers of leaf nodes are null.

All leaf nodes have the same depth, which is equal to the height of the tree H.

Keys and pointers are separated from each other. The two ends of a node are pointers.

Keys in a node are arranged from left to right in non-descending order.

All nodes form a tree structure.

Each pointer is either null or pointing to another node.

If a pointer is left at the far left of the node and is not null, all keys pointing to the node are smaller than v (key1), where v (key1) is the value of the first key of the node.

If a pointer is at the far right of a node and is not null, all keys pointing to the node are greater than v (keym), where v (keym) is the value of the last key of the node.

If the adjacent keys of a pointer between the left and right nodes are keyi and keyi + 1 and are not null, all keys pointing to the node are smaller than v (keyi + 1) and greater than v (keyi ).

Figure 2 is a B-Tree with d = 2:

Due to the characteristics of B-Tree, the algorithm for retrieving data by key in B-Tree is very intuitive: first, perform a binary search from the root node. If yes, the data of the corresponding node is returned, otherwise, recursive search is performed on the node pointed to by the pointer in the corresponding interval until the node is found or the null pointer is found. The former is found successfully, and the latter fails to be searched. The pseudo code for searching algorithms on B-Tree is as follows:

BTree_Search(node, key){    if(node == null) return null;    foreach(node.key)    {        if(node.key[i] == key) return node.data[i];        if(node.key[i] > key) return BTree_Search(point[i]->node);    }    return BTree_Search(point[i+1]->node);}data = BTree_Search(root, my_key);

B-Tree has a series of interesting properties. For example, if a B-Tree with a degree of d has N keys indexed, the upper limit of its H is logd (N + 1) /2), retrieves a key, and the progressive complexity of finding the number of nodes is O (logdN ). From this point, we can see that B-Tree is a very efficient index data structure.

In addition, insertion and deletion of new data records will damage the nature of B-Tree. Therefore, when inserting and deleting data records, we need to split, merge, transfer, and other operations on the Tree to maintain the B-Tree nature. This article does not intend to fully discuss the B-Tree content, because many documents have already elaborated on the mathematical nature of B-Tree and the insertion and deletion algorithm, interested friends can find relevant materials in the reference column at the end of this Article for reading.

B+Tree

B-Tree has many variants, the most common of which is B + Tree. For example, MySQL generally uses B + Tree to implement its index structure.

Compared with B-Tree, B + Tree has the following differences:

The maximum pointer value for each node is 2d instead of 2d + 1.

Internal nodes do not store data, but only store keys. Leaf nodes do not store pointers.

Figure 3 shows a simple B + Tree diagram.

Because not all nodes have the same region, the size of the inner node and the inner node in the B + Tree is generally different. This is different from B-Tree. Although the number of keys and pointers stored by different nodes in B-Tree may be different, the domain and upper limit of each node are the same, therefore, in implementation, B-Tree usually applies for the same size of space for each node.

Generally, B + Tree is more suitable for implementing the external storage index structure than B-Tree. The specific reason is related to the principle of external storage and the principle of computer access, which will be discussed below.

B + Tree with sequential access pointers

Generally, the B + Tree structure used in the database system or file system is optimized on the basis of the classic B + Tree, and sequential access pointers are added.

4. Add a pointer to the adjacent leaf node in each leaf node of B + Tree to formB + Tree with sequential access pointers. The purpose of this optimization isImprove the Performance of range accessIn Example 4, if you want to query all data records whose key is from 18 to 49,After 18 is found, you only need to traverse the nodes and pointers to access all the data nodes at a time. This greatly mentions the interval query efficiency..

This section briefly introduces B-Tree and B + Tree, the next section describes why B + Tree is the preferred data structure for database system index implementation based on the principle of memory access.

Why use B-Tree (B + Tree)

As mentioned above, data structures such as red and black trees can also be used for indexing. However, B-/+ Tree is widely used as the index structure in file systems and database systems, this section discusses B-/+ Tree as the theoretical basis of Indexing Based on the knowledge of computer composition principles.

In general, the index itself is also very large, it is impossible to store all in the memory, so the index is often stored on the disk in the form of index files. In this way, disk I/O consumption is required during index search. Compared with memory access, I/O access consumes several orders of magnitude.The most important indicator for evaluating a data structure as an index is the progressive complexity of the number of disk I/O operations during the search process.. In other words,The structure of the index should be organized to minimize the number of disk I/O accesses during the search process. Next we will first introduce the memory and disk access principles, and then combine these principles to analyze the efficiency of B-/+ Tree as an index.

Principle of primary storage access

Currently, the main memory used by computers is generally random read/write memory (RAM). The structure and access principle of modern RAM are complicated. Here, the specific differences are given, abstract A very simple access model to illustrate how RAM works.

From the abstract point of view, the primary storage is a matrix composed of a series of storage units, each of which stores a fixed size of data. Each storage unit has a unique address, and the addressing rules of modern primary storage are complicated. Here, it is simplified into a two-dimensional address: a row address and a column address can be used to uniquely locate a storage unit. Figure 5 shows a 4x4 primary storage model.

The access process of the primary storage is as follows:

When the system needs to read the primary storage, the address signal is put into the address bus and uploaded to the primary storage. After reading the address signal, the primary storage parses the signal and locates the specified storage unit, the stored unit data is then placed on the Data Bus for reading by other components.

The process of writing the primary storage is similar. The system will write the Unit Address and data respectively on the address bus and data bus. The primary storage will read the content of the two bus and write the data accordingly.

It can be seen that the time of primary storage access is only linearly related to the number of accesses, because there is no mechanical operation, the "distance" of the data accessed twice will not affect the time, for example, it takes the same time To get A0, then A1, and A0, and then D3.

Disk access principle

As mentioned above, indexes are generally stored on disks as files, and index retrieval requires disk I/O operations. Unlike primary storage, disk I/O consumes a lot of time due to mechanical movement.

Figure 6 shows the overall structure of the disk.

A disk is composed of circular disks of the same size and coaxial. the disk can be rotated (each disk must be rotated synchronously ). There is a head bracket on one side of the disk, and the head bracket is fixed with a set of heads, each of which is responsible for accessing the content of a disk. The head cannot be rotated, but it can be moved along the radius of the disk (in fact, it is a diagonal movement). Each head must be coaxial at the same time, that is, from top to bottom, all heads are overlapped at any time (however, multiple heads are already available independently, but not limited to this ).

Figure 7 shows the disk structure.

The disc is divided into a series of concentric rings with the center of the disc. Each concentric ring is calledTrackAll the tracks with the same radius constitute oneCylindrical. The track is divided into small segments along the radius line. Each segment is calledSlice,Each sector is the minimum storage unit of the disk.. For simplicity, we assume that the disk has only one disk and one head.

When data needs to be read from the disk, the system will pass the data Logical Address to the disk. the disk control circuit translates the logical address into a physical address according to the addressing logic, determine the track and sector of the data to be read. To read data from this sector, you need to place the head above this sector. To achieve this, the head needs to be moved and aligned to the corresponding track. This process is called track finding and the time it takes is calledSeek timeAnd then the disk rotates the target sector to the bottom of the head. the time consumed in this process is calledRotation time.

Local principle and disk pre-read

Because of the characteristics of the storage medium, the access to the disk itself is much slower than the primary storage, coupled with the cost of mechanical movement, the access speed of the disk is often one of the primary storage, so in order to improve efficiency, minimize disk I/O. To achieve this goal, the disk is usually not read strictly on demand, but preread every time. Even if only one byte is required, the disk starts from this location, read data of a certain length in sequence into the memory. This theory is based on the well-known local principle in Computer Science:

When a data is used, the data nearby it is usually used immediately.

The data required during the program running is usually concentrated.

Because sequential disk reading is highly efficient (with little rotation time required without seeking time), preread can improve I/O efficiency for local programs.

The preread length is generally an integer multiple of the page. Pages are logical blocks for computer memory management. Hardware and operating systems often divide primary and disk storage areas into contiguous blocks of the same size, each block is called a page (in many operating systems, the page size is usually 4 k). The primary storage and disk exchange data in pages. When the data to be read by the program is not in the primary storage, a page missing exception is triggered, and the system sends a disk reading signal to the disk, the disk finds the starting position of the data and reads one or more pages consecutively into the memory. If an exception is returned, the program continues to run.

B-/+ Tree index Performance Analysis

At last, we can analyze the performance of B-/+ Tree indexes.

As mentioned above, the index structure is evaluated by the number of disk I/O operations. From B-Tree analysis, according to the definition of B-Tree, we can see that a maximum of h nodes can be accessed at a time. The database system designer cleverly utilizes the disk pre-read principle,Set the size of a node to equal to one page.,In this way, each node can be fully loaded only once I/O.. To achieve this goal, you also need to use the following techniques to implement B-Tree:

Each time you create a node, you can directly apply for a page space to ensure that a node is physically stored on a page. In addition, the computer storage allocation is page-aligned, A node only needs one I/O operation.

A B-Tree retrieval requires a maximum of H-1 I/O (root node resident memory), and the progressive complexity is O (h) = O (logdN ). Generally, in practice, the output degree d is a very large number, usually greater than 100, so h is very small (usually no more than 3 ).

In conclusion, the efficiency of using B-Tree as the index structure is very high.

The structure of the red and black trees is much deeper than h. Because logically close nodes (Parent and Child) may be far physically unable to use locality, the I/O complexity of the red and black trees is O (h ), the efficiency is much lower than that of B-Tree.

As mentioned above, B + Tree is more suitable for external storage indexes because it is related to the degree d of inner nodes. From the above analysis, we can see that the larger the value of d, the better the index performance, and the upper limit of the degree of exit depends on the size of the key and data in the node:

dmax = floor(pagesize / (keysize + datasize + pointsize)) (pagesize – dmax >= pointsize)

Or

dmax = floor(pagesize / (keysize + datasize + pointsize)) – 1 (pagesize – dmax < pointsize)

Floor indicates downgrading.Because the nodes in B + Tree remove the data field, they can have a higher degree of output and better performance..

This chapter discusses the index-related data structures and algorithms theoretically. The next chapter will discuss how B + Tree is implemented as an index in MySQL, at the same time, we will introduce two different index implementation modes, non-clustered index and clustered index, in combination with MyISAM and InnDB storage engine.

MySQL index implementation

In MySQL, indexes belong to the concept of storage engine level. Different storage engines implement indexes differently. This article mainly discusses the index implementation methods of MyISAM and InnoDB Storage engines.

MyISAM index implementation

The MyISAM engine uses B + Tree as the index structure. The data domain of the leaf node isData Record address. Is the principle of MyISAM index:

There are three columns in the table. If we use Col1 as the Primary key, Figure 8 shows the Primary index (Primary key) of the MyISAM table. We can see thatThe index file of MyISAM only stores the data record address. In MyISAM, the primary index and Secondary index (Secondary key) have no difference in structure, but the primary index requires that the key is unique, and the Secondary index key can be repeated. If we create a secondary index on Col2, the index structure is shown in:

It is also a B + Tree that stores data records in the data field. Therefore,The index search algorithm in MyISAM first searches for indexes based on the B + Tree search algorithm. If the specified Key exists, the value of its data domain is taken out, and the address is the value of the data domain, read data records.

The index method of MyISAM is also called "non-clustered". The reason for this is to distinguish it from the clustered index of InnoDB.

InnoDB Index implementation

Although InnoDB uses B + Tree as the index structure, the implementation method is different from that of MyISAM.

The first major difference is thatInnoDB data files are index files.. As mentioned above, MyISAM index files and data files are separated,The index file only stores the address of the data record. WhileIn InnoDB, the table data file itself is an index structure organized by B + Tree. The leaf node data field of this Tree stores the complete data records.. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.

Figure 10 shows the InnoDB primary index (also a data file). The leaf node contains the complete data record. This index is calledClustered Index. Because the data files in InnoDB need to be clustered by the primary key, InnoDB requires that the table have a primary key (MyISAM may not). If it is not explicitly specified, mySQL automatically selects a column that uniquely identifies a data record as the primary key. If this column does not exist, MySQL automatically generates an implicit field for the InnoDB table as the primary key,This field is 6 bytes in length and its type is long integer..

The second difference from MyISAM index is that InnoDB's secondary index data domain stores the value of the primary key of the corresponding record instead of the address. In other words, all secondary indexes of InnoDB reference the primary key as the data domain. For example, Figure 11 shows a secondary index defined on Col3:

Here we use the ASCII code of English characters as a comparison criterion. Clustered index makes the search by primary key very efficient,Secondary index search requires two indexes: First retrieve the secondary index to obtain the primary key, and then use the primary key to retrieve the record in the primary index.

Understanding the index implementation methods of different storage engines is very helpful for correct use and optimization of indexes. For example, after knowing the index Implementation of InnoDB, it is easy to understand whyToo long fields are not recommended as primary keys.Because all secondary indexes reference the primary index,If the primary index is too long, the secondary index will become too large.. For example, it is not a good idea to use non-monotonous fields as the primary key in InnoDB, because the InnoDB data file itself is a B + Tree, non-monotonous primary keys will cause frequent split and adjustment of data files to maintain the features of B + Tree during the insertion of new records, which is very inefficient, using an auto-increment field as the primary key is a good choice.

The next article will discuss these index-Related Optimization Strategies in detail. Http://www.bkjia.com/database/201504/393710.html)

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More