Data structure and algorithm principles behind MySQL index (on)

Source: Internet
Author: User
Tags mysql index

This article takes the MySQL database as the research object, discusses some topics related to the database index. In particular, MySQL there are many storage engines supported, and the various storage engines support the indexes differently, so the MySQL database supports multiple index types such as BTree indexes, hash indexes, full-text indexes, and so on. In order to avoid confusion, this article will focus only on the BTree index, as this is the index that is used primarily to deal with the usual use MySQL , and the hash index and the full-text index are not discussed in this article.

The main content of the article is divided into three parts.

The first part mainly discusses MySQL the mathematical basis of database index from the data structure and algorithm theory level.

The second part discusses topics such as clustered indexes, nonclustered indexes, and overlay indexes, in conjunction with the architecture of the MyISAM and INNODB data storage engines indexed in the MySQL database.

The third part discusses the strategy of high performance using indexes in MySQL based on the theoretical basis above.

The nature of data structure and algorithm base index

MySQLThe official definition of an index is: an index ( Index ) is a data structure that helps MySQL to obtain data efficiently . By extracting the skeleton of a sentence, you can get the essence of the index: The index is the data structure.

We know that database query is one of the most important functions of database. We all want to query the data as fast as possible, so the designers of the database system are optimized from the point of view of the query algorithm . The most basic query algorithm, of course, is sequential lookup ( linear search ), the complexity O(n) of the algorithm in the large amount of data is obviously bad, fortunately, the development of computer science provides a lot of better search algorithms, such as binary search ( binary search ), Binary tree lookup ( binary tree search ) and so on. If you analyze it a little bit, you will find that each finding algorithm can only be applied to a particular data structure, such as a binary lookup requires an orderly retrieval of data, while a binary tree lookup can only be applied to a binary search tree, but the structure of the data itself cannot fully satisfy various data structures (for example, it is theoretically impossible to organize both columns sequentially), so, outside of the data, the database system also maintains a data structure that satisfies a particular lookup algorithm that references (points to) data in some way , so that it can Implement advanced Find algorithms on these data structures . This data structure is the index .

See an example:

Figure 1 shows a possible way to index. On the left is the data table, a total of two columns seven records, the leftmost is the physical address of the data record (note that logically adjacent records on disk is not necessarily physically adjacent). For faster Col2 lookups, you can maintain a two-fork lookup tree on the right, with each node containing index key values and a pointer to the physical address of the corresponding data record, so that you can use the binary lookup O(log2n) Data is obtained in the complexity of the

While this is a real-world index, the actual database system is almost never implemented using a two-fork search tree or its evolutionary red-black tree (red-black tree) , as explained below.

B-tree and B+tree

At present, most of the database systems and file systems are used B-Tree or their variants B+Tree as the index structure, in the next section of this article will be combined with the principle of memory and the principle of computer access to discuss why B-Tree and B+Tree is so widely used in the index, this section is simply from the data structure angle to describe them.

B-tree

To describe B-Tree , first define a data record as a two-tuple [key, data] , key for the record of the key value, for different data records, key is not the same; records the data data except for the data key . B-Treea data structure that satisfies the following conditions:

  1. dis a positive integer greater than 1, called B-Tree the degree.

  2. his a positive integer, called B-Tree the height.

  3. Each non-leaf node consists of n-1 a key and n a pointer, wherein d<=n<=2d .

  4. Each leaf node contains at least one key and two pointers, with a maximum 2d-1 key of one and a pointer, and a 2d pointer to a leaf node null .

  5. All leaf nodes have the same depth and are equal to the tree height h .

  6. keyand pointers are spaced, and the nodes are pointers at each end.

  7. The key left-to-right non-descending arrangement of a node.

  8. All nodes make up the tree structure.

  9. Each pointer is either null , or points to, a different node.

  10. If a pointer is at the far left of the node node and is not null , it points to all of the nodes that are key less than v(key1) , which v(key1) is node the first key value.

  11. If a pointer is at the far right of the node node and is not null , it points to all of the nodes that are key greater than the v(keym) v(keym) node last key value.

  12. If a pointer node is adjacent to the left or right of the node key keyi and is keyi+1 and NOT NULL, then all of its points to the node are less than and key v(keyi+1) greater than v(keyi) .

Figure 2 is one d=2 of the B-Tree following:

Because of the properties of B-tree , the algorithm for retrieving data in b-tree by key is straightforward: First binary lookup from the root node, and if found, returns the corresponding node's Data , otherwise the node pointing to the corresponding interval is recursively searched until a node is found or a null pointer is found, which finds a success, and the latter finds a failure. The pseudo-code for the lookup algorithm on b-tree is as follows:

btree_search (node, key) {if  (node = = null ) return  null ; foreach (Node.key) {if  (node.key[i] = = key) return
      Node.data[i]; if  (Node.key[i] > key) return     Btree_search (Point[i]->  node); } return  btree_search (point[i+1 ]->  node);} data = Btree_search (root, My_key);  

About a B-Tree series of interesting properties, such as a degree d B-Tree , set its index, the N key upper limit of its tree height h is logd((N+1)/2) , retrieve one key , its search node number of the progressive complexity of O(logdN) . As can be seen from this point, B-Tree it is a very efficient index data structure.

In addition, since inserting deletes new data records will break B-Tree the nature, so when inserting delete, need to do a split, merge, transfer and so on to maintain the nature of the tree, B-Tree This article does not intend to fully discuss B-Tree these content, because there is already a lot of information in detail B-Tree The mathematical nature and the insertion and deletion algorithm, interested friends can find the corresponding information in the reference column at the end of this article to read.

B+tree

B-TreeThere are many variants, the most common of which are B+Tree , for example, MySQL B+Tree implementing their index structure for general use.

B-Treecompared with the B+Tree following, there are different points:

    1. The maximum pointer for each node is 2d instead of 2d+1 .

    2. The inner node is not stored data , only stored key ; the leaf node does not store pointers.

Figure 3 is a simple B+Tree schematic.

Because not all nodes have the same domain, the middle B+Tree and inner nodes are generally of different sizes. This differs from the B-Tree fact that, although the B-Tree number of different nodes stored in key and pointers may be inconsistent, the domain and upper bound of each node are consistent, so in the implementation of B-Tree each node is often requested the same size of space.

In general B+Tree , B-Tree more appropriate than the implementation of the external storage index structure, the specific reason and the principle of external memory and computer access principle, will be discussed below.

B+tree with sequential access pointers

Generally, the structure used in the database system or file system B+Tree is optimized on the classical B+Tree basis, and the sequential access pointers are added.

As shown in 4, B+Tree a pointer to an adjacent leaf node is added to each leaf node, and a sequential access B+Tree pointer is formed. The purpose of this optimization is to improve the performance of the interval access , in Example 4, if you want to query key for 18 49 all data records from to, when 18 is found, simply follow the node and the pointer in order to traverse the whole data node can be accessed once, The efficiency of interval query is greatly mentioned .

This section describes B-Tree and B+Tree makes a brief introduction, the next section, combined with the memory access principle, explains why the B+Tree database system is currently the preferred data structure for indexing.

Why use B-tree (B+tree)

As mentioned above, data structures such as red and black trees can also be used for indexing, but file systems and database systems are generally used B-/+Tree as index structures, and this section will be B-/+Tree based on the theoretical basis of knowledge discussion of computer composition principles.

In General, the index itself is large and cannot be stored in memory, so the index is often stored as an index file on the disk . In this way, the index lookup process will generate disk I/O consumption, relative to memory access, I/O the consumption of access to a few orders of magnitude, so the evaluation of a data structure as an index is the most important indicator of the number of disk operations in the process I/O of incremental complexity . In other words, the structural organization of the index minimizes the I/O number of disk accesses during the lookup process . The following first describes the memory and disk access principles, and then combined with these principles to analyze the B-/+Tree efficiency as an index.

Main Memory Access principle

At present, the main memory of the computer is basically random read-write memory ( RAM ), modern RAM structure and access principle is more complex, here the paper gives up the specific differences, abstract a very simple access model to illustrate RAM the working principle.

From an abstract point of view, main memory is a series of storage units composed of a matrix, each storage unit stores a fixed size of data. Each storage unit has a unique address, modern main memory of the site rules are more complex, here it simplifies to a two-dimensional address: through a row address and a column address can be uniquely positioned to a storage unit. Figure 5 shows a main memory model of 4 x 4.

The main memory access process is as follows:

When the system needs to read main memory, the address signal is put to address bus upload to main memory, main memory read the address signal, parse the signal and locate the designated storage unit, and then put this storage unit data on the bus for other parts to read.

The process of writing main memory is similar, the system will write the unit address and data on the address bus and bus, respectively, main memory reads two bus contents, does the corresponding write operation.

As can be seen here, the main memory access time is only linear with the number of accesses, because there is no mechanical operation, two access to the data "distance" will not have any impact on time, for example, the first fetch A0 A1 and take the time to fetch A0 D3 the same amount.

Disk Access principle

As mentioned earlier, indexes are typically stored as files on disk, and index retrieval requires disk I/O operations. Unlike main memory, the disk I/O has a mechanical motion, so I/O the disk's time consumption is enormous.

Figure 6 is the overall structure of the disk.

A disk consists of a circular disc of the same size and the same axis, and the disk can be rotated (each disk must be rotated synchronously). There is a head bracket on one side of the disk, and the head bracket secures a set of heads that each head is responsible for accessing the contents of a disk. The head cannot be rotated, but it can be moved in the direction of the disk radius (actually oblique-tangential motion), and each head must also be coaxial at the same moment, that is, looking down from the top, all the heads are overlapping at any time (although there is already a multi-head independent technique, which is not limited by this).

Figure 7 is the disk structure.

The platters are divided into a series of concentric rings centered on the center of the disc, each concentric ring is called a track , and all tracks with the same radius form a cylindrical surface . Tracks are divided into small segments along the radius lines, each of which is called a sector , and each sector is the smallest storage unit of the disk . For the sake of simplicity, let's assume that the disk has only one platter and one head.

When data needs to be read from disk, the system will pass the logical address of the data to the disk, and the control circuit of the disk will translate the logical address into the physical address according to the addressing logic, that is, determine which track and which sector the data is to be read. In order to read the data in this sector, the head needs to be placed above this sector, in order to achieve this, the head needs to move to the corresponding track, this process is called seek, the time spent is called seek time , and then the disk rotation of the target sector is rotated to the head, the process of the time spent is called Rotation time .

Principle of locality and disk pre-reading

Due to the characteristics of the storage media, the disk itself is much slower than main memory, coupled with mechanical movement, disk access speed is often one of the hundreds of main memory, so in order to improve efficiency, to minimize the disk I/O . To do this, the disk is often not read strictly on-demand, but is read-ahead every time, even if only one byte is required, and the disk starts from this location, sequentially reading a certain length of data into memory. The rationale for this is the well-known local principle in computer science:

When a data is used, the data around it is usually used immediately.

The data that is required during the program run is usually relatively centralized.

Due to the high efficiency of disk sequential reads (no seek time required, and minimal rotational time), pre-reading can improve efficiency for programs with local features I/O .

The length of the read-ahead is generally the page integer multiple of the page (). Page is the logical block of Computer Management memory, hardware and operating system tend to divide main memory and disk storage area into contiguous size equal blocks, each storage block is called a page (in many operating systems, the size of the page is usually 4k ), main memory and disk in the page to exchange data. When the program to read the data is not in main memory, will trigger a page fault, the system will send a read signal to the disk, the disk will find the starting position of the data and sequentially read one or several pages back into memory, and then return unexpectedly, the program continues to run.

Performance analysis of B-/+tree indexes

Finally, the performance of the index can be analyzed B-/+Tree .

As mentioned above, it I/O is common to use disk count to evaluate index structure. First B-Tree , from the analysis, according to B-Tree the definition, the retrieval of the most need to access h nodes at a time. The designer of the database system skillfully exploits the principle of disk pre-reading, setting the size of a node equal to one page , so that each node I/O can be fully loaded only once . In order to achieve this, the following techniques are required in the actual implementation B-Tree :

Each time you create a new node, request a page space directly, so that a node is physically stored in a page, in addition to the computer storage allocation is page-aligned, the implementation of a node only once I/O .

B-TreeThe maximum number of h-1 times I/O A retrieval is required (root node resident memory), and the progressive complexity is O(h)=O(logdN) . In general practical applications, the out-of-size d numbers are very large, usually over 100 , and therefore h very small (usually not more than 3).

In summary, the B-Tree efficiency is very high as the index structure.

And the red-black tree structure, h obviously much deeper. Because the logically close node (parent-child) may be far away physically, it is not possible to take advantage of the locality, so the progressive complexity of the red-black tree is I/O also O(h) , the efficiency is obviously B-Tree much worse.

As mentioned above, it is B+Tree more suitable for external memory index, because it is related to the internal node d . As you can see from the above analysis, the larger the index, the better the d performance, and the upper limit depends on key data the size within the node:

dmax = floor(pagesize / (keysize + datasize + pointsize))   (pagesize – dmax >= pointsize)

Or

dmax = floor(pagesize / (keysize + datasize + pointsize)) – 1   (pagesize – dmax < pointsize)

floorRepresents a downward rounding. because the B+Tree inner node is stripped of the data domain, it can have a greater degree of granularity and better performance .

This chapter discusses the problem of data structure and algorithm related to indexes from the theoretical point of view, the next chapter will discuss how to implement the index B+Tree MySQL in detail, and introduce the combination MyISAM and InnDB storage engine into two different index implementations of nonclustered index and clustered index.

MySQL Index implementation

In MySQL , the index belongs to the concept of storage engine level, different storage engine implementation of the index is different, this article mainly discusses MyISAM and InnoDB two storage engine index implementation way.

MyISAM Index Implementation

MyISAMThe engine B+Tree is used as an index structure, and the domain of the leaf node data holds the address of the data record . is MyISAM the schematic diagram of the index:

There are three columns in the table, assuming that we Col1 are the primary key, then Figure 8 is the MyISAM Main index () of a table Primary key . You can see MyISAM that the index file only holds the address of the data record . In MyISAM , the primary and secondary indexes ( Secondary key ) do not differ structurally, except that the primary index requirements key are unique and secondary indexes key can be duplicated. If we Col2 create a secondary index on, the structure of this index is as follows:

The same is the B+Tree address where the data domain holds the data record. Therefore, the MyISAM algorithm in index retrieval is to search the index according to the B+Tree search algorithm first, if the specified Key existence, then take out the value of its data domain, and then data read the corresponding data record with the value of the domain as the address .

MyISAMThe index is also called "nonclustered", and the reason for this is to differentiate it from InnoDB the clustered index.

InnoDB Index Implementation

Although it InnoDB is also used B+Tree as an index structure, the implementation approach is MyISAM completely different.

The first major difference is that InnoDB the data file itself is the index file . As you know above, the MyISAM index file and the data file are detached, and the index file only holds the address of the data record . in InnoDB this case, the table data file itself is B+Tree an indexed structure of the Organization, and the tree's leaf node data field holds the complete data record . This index key is the primary key of the data table, so the InnoDB table data file itself is the primary index.

Figure 10 is InnoDB the primary index (also the data file), and you can see that the leaf node contains the complete data record. This index is called a clustered index . Because InnoDB the data file itself is clustered by the primary key, the InnoDB table must have a primary key ( MyISAM can not), and if it is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key, and if no such column exists, the MySQL Automatically InnoDB generates an implicit field for the table as the primary key with a length of 6 bytes and a long shape for the type .

MyISAM The second secondary index field, which differs from the index, InnoDB stores the value of the data corresponding record primary key instead of the address . In other words, InnoDB all secondary indexes refer to the primary key as the data domain. For example, Figure 11 is Col3 a secondary index defined on:

Here the code of the English characters ASCII as the comparison criterion. Clustered index This implementation makes search by primary key efficient, but secondary index search needs to retrieve two times index : first retrieves the secondary index to obtain the primary key, and then retrieves the record with the primary key to the primary index.

Understanding how index implementations of different storage engines can be useful for proper use and optimization of indexes, such as knowing InnoDB the index implementation, makes it easy to understand why it is not recommended to use a long field as the primary key , because all secondary indexes reference the primary index. A long primary index can cause the secondary index to become too large . For example, it is not a good idea to use a non-monotonic field as the primary key, InnoDB because the InnoDB data file itself is a single B+Tree , non-monotonic primary key that causes the data file to be frequently split in order to maintain characteristics when inserting new records B+Tree , which is very inefficient, Using the self-increment field as the primary key is a good choice.

The next article will discuss these index-related optimization strategies in detail.

Data structure and algorithm principles behind MySQL index (on)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.