The principle of algorithm used by Java Advanced Programming--mysql

Source: Internet
Author: User
Tags data structures

The main content of the article is divided into two parts.

The first part mainly discusses the mathematical basis of MySQL database index from the data structure and algorithm theory level.

The second part discusses the topics such as clustered indexes, nonclustered indexes, and overlay indexes, in conjunction with the schema of indexes in the MyISAM and InnoDB data storage engines in MySQL database.

The principle of algorithm used by Java Advanced Programming--mysql
Data structure and Algorithm basics
The nature of the index

The official MySQL definition of an index is: index is the data structure that helps MySQL to get data efficiently. By extracting the skeleton of a sentence, you can get the essence of the index: The index is the data structure.

We know that database query is one of the most important functions of database. We all want to query the data as fast as possible, so the designers of the database system are optimized from the point of view of the query algorithm. The most basic query algorithm, of course, is sequential lookup (linear search), the complexity of the O (n) algorithm is obviously bad when the volume of data is large, fortunately, the development of computer science provides a lot of better search algorithms, such as binary search, Binary tree search (binary trees search), and so on. If you look at it a little bit, you will find that each lookup algorithm can only be applied to a particular data structure, such as a binary lookup requires an orderly retrieval of data, while a binary tree lookup can only be applied to a binary lookup tree, but the data itself cannot be fully organized to meet a variety of data structures (for example, It is theoretically impossible to organize both columns sequentially, so in addition to the data, the database system maintains a data structure that satisfies a particular lookup algorithm that references (points to) data in some way, so that an advanced find algorithm can be implemented on those data structures. This data structure is the index.

See an example:

The principle of algorithm used by Java Advanced Programming--mysql
Figure 1

Figure 1 shows a possible way to index. On the left is the data table, a total of two columns seven records, the leftmost is the physical address of the data record (note that logically adjacent records on disk is not necessarily physically adjacent). To speed up the search for Col2, you can maintain a two-fork lookup tree on the right, each containing the index key value and a pointer to the physical address of the corresponding data record, so that the binary lookup can be used to obtain the corresponding data in the complexity of O (log2n).

Although this is a true index, the actual database system is almost not implemented using a two-fork search tree or its evolutionary breed red-black tree (Red-black tree), for reasons that are described below.

The principle of algorithm used by Java Advanced Programming--mysql
B-tree and B+tree
At present, most of the database system and file system use B-tree or its variant B+tree as the index structure, in the next section of this article will combine the memory principle and the computer access principle to discuss why B-tree and B+tree are so widely used in index, this section first describes them from the data structure angle.


To describe the B-tree, first define a data record as a two-tuple [key, Data],key is the key value of the record, for different data records, the key is not the same; data records for data other than key. Then B-tree is a data structure that meets the following criteria:

    1. D is a positive integer greater than 1, called the degree of B-tree.

    2. h is a positive integer, called the height of the b-tree.

    3. Each non-leaf node consists of a n-1 key and n pointers, where d<=n<=2d.

    4. Each leaf node contains at least one key and two pointers, with a maximum of 2d-1 keys and 2d pointers, with null pointers for leaf nodes.

    5. All leaf nodes have the same depth, which equals the tree height H.

    6. The key and the pointer are spaced each other, and the nodes are pointers.

    7. Keys in one node are not descending from left to right.

    8. All nodes make up the tree structure.

    9. Each pointer is either null or points to a different node.

    10. If a pointer is on the leftmost node and is not NULL, then all keys that point to the node are less than V (key1), where V (key1) is the value of node's first key.

    11. If a pointer is on the rightmost node and is not NULL, all keys that point to the node are greater than V (Keym), where V (keym) is the value of the last key of node.

    12. If a pointer is keyi and keyi+1 and NOT NULL on the left and right sides of node nodes, then all keys to the node are less than V (keyi+1) and greater than V (keyi).

Figure 2 is a d=2 b-tree.

The principle of algorithm used by Java Advanced Programming--mysql
Figure 2

Because of the characteristics of B-tree, the algorithm of retrieving data by key in B-tree is very intuitive: first, binary lookup from the root node, if found to return the corresponding node of data, otherwise the corresponding interval pointer to the node to find recursively, until the node is found or a null pointer, the former lookup success , the latter lookup failed. The pseudo-code for the lookup algorithm on B-tree is as follows:

The principle of algorithm used by Java Advanced Programming--mysql
About B-tree has a series of interesting properties, such as a degree of D B-tree, set its index N key, the upper limit of its tree height h is logd ((n+1)/2), retrieve a key, its search node number of the progressive complexity of O (LOGDN). As can be seen from this point, B-tree is a very efficient index data structure.

In addition, since inserting deletes a new data record destroys the nature of the b-tree, it is necessary to do a split, merge, transfer, etc. to preserve the b-tree nature of the tree when inserting and deleting it, and this article does not intend to discuss the contents of B-tree completely. Because there is already a lot of information detailing the mathematical nature of b-tree and the insertion and deletion algorithm, interested friends can find the corresponding information in the reference column at the end of this article to read.


B-tree has many variants, the most common of which is b+tree, such as MySQL, which generally uses b+tree to implement its index structure.

Compared with B-tree, B+tree has the following differences:

    1. The pointer to each node is capped at 2d instead of 2d+1.

    2. The inner node does not store data, only the key, and the leaf node does not store the pointer.

Figure 3 is a simple b+tree schematic.

The principle of algorithm used by Java Advanced Programming--mysql
Figure 3

Since not all nodes have the same domain, the b+tree and inner nodes are generally of different sizes. This is different from B-tree, although the number of keys and pointers stored in B-tree may be inconsistent, but the domain and the upper bound of each node are consistent, so in the implementation b-tree often apply for the same size of space for each node.

In general, B+tree is more suitable than b-tree to achieve the external storage index structure, the specific reason and the principle of external memory and the principle of computer access, will be discussed below.

B+tree with sequential access pointers

Generally, the b+tree structure used in database system or file system is optimized on the basis of classical b+tree, and the sequential access pointers are added.

The principle of algorithm used by Java Advanced Programming--mysql
As shown in 4, a pointer to an adjacent leaf node is added to each leaf node of B+tree, and a b+tree with sequential access pointers is formed. The purpose of this optimization is to improve the performance of the interval access, example 4, if you want to query key from 18 to 49 of all data records, when the 18 is found, simply follow the node and the pointer in order to traverse to all data nodes, and greatly mentions the efficiency of the interval query.

This section provides a brief introduction to B-tree and B+tree, and the next section, combined with the memory access principle, explains why the current b+tree is the preferred data structure for the database system implementation index.

Why use B-tree (B+tree)

As mentioned above, data structures such as red and black trees can also be used to implement indexes, but file systems and database systems generally use B-/+tree as the index structure, and this section will discuss B-/+tree as the theoretical basis of index based on the knowledge of computer composition principles.

In general, the index itself is large and cannot be stored in memory, so the index is often stored as an index file on the disk. In this way, the index lookup process will generate disk I/O consumption, relative to memory access, I/O access to the consumption of a few orders of magnitude, so the evaluation of a data structure as an index is the most important indicator of the number of disk I/O operations in the process of incremental complexity. In other words, the structural organization of the index minimizes the number of disk I/O accesses during the lookup process. The following first describes the memory and disk access principles, and then combined with these principles to analyze the efficiency of B-/+tree as an index.

Main Memory Access principle

At present, the main memory used in the computer is random read-write memory (RAM), modern RAM structure and access principle is more complex, here the paper gives up the specific differences, abstract a very simple access model to illustrate how RAM works.

The principle of algorithm used by Java Advanced Programming--mysql
Figure 5

From an abstract point of view, main memory is a series of storage units composed of a matrix, each storage unit stores a fixed size of data. Each storage unit has a unique address, modern main memory of the site rules are more complex, here it simplifies to a two-dimensional address: through a row address and a column address can be uniquely positioned to a storage unit. Figure 5 shows a main memory model of 4 x 4.

The main memory access process is as follows:

When the system needs to read main memory, the address signal is put to address bus upload to main memory, main memory read the address signal, parse the signal and locate the designated storage unit, and then put this storage unit data on the bus for other parts to read.

The process of writing main memory is similar, the system will write the unit address and data on the address bus and bus, respectively, main memory reads two bus contents, does the corresponding write operation.

As can be seen here, the main memory access time is only linear with the number of accesses, because there is no mechanical operation, two times access to the data "distance" will not have any impact on time, for example, first take A0 and then take A1 and take A0 and then take D3 time consumption is the same.

Disk Access principle

As mentioned earlier, indexes are typically stored as files on disk, and index retrieval requires disk I/O operations. Unlike main memory, disk I/O is mechanically active, so the time spent on disk I/O is huge.

Figure 6 is the overall structure of the disk.

The principle of algorithm used by Java Advanced Programming--mysql
Figure 6

A disk consists of a circular disc of the same size and the same axis, and the disk can be rotated (each disk must be rotated synchronously). There is a head bracket on one side of the disk, and the head bracket secures a set of heads that each head is responsible for accessing the contents of a disk. The head cannot be rotated, but it can be moved in the direction of the disk radius (actually oblique-tangential motion), and each head must also be coaxial at the same moment, that is, looking down from the top, all the heads are overlapping at any time (although there is already a multi-head independent technique, which is not limited by this).

Figure 7 is the disk structure.

The principle of algorithm used by Java Advanced Programming--mysql
Figure 7

The platters are divided into a series of concentric rings centered on the center of the disc, each concentric ring is called a track, and all tracks with the same radius form a cylindrical surface. Tracks are divided into small segments along the radius lines, each of which is called a sector, and each sector is the smallest storage unit of the disk. For the sake of simplicity, let's assume that the disk has only one platter and one head.

When data needs to be read from disk, the system will pass the logical address of the data to the disk, and the control circuit of the disk will translate the logical address into the physical address according to the addressing logic, that is, determine which track and which sector the data is to be read. In order to read the data in this sector, the head needs to be placed above this sector, in order to achieve this, the head needs to move to the corresponding track, this process is called seek, the time spent is called seek time, and then the disk rotation of the target sector is rotated to the head, the process of time is called rotation time.

Principle of locality and disk pre-reading

Due to the characteristics of the storage media, the disk itself is much slower than main memory, coupled with mechanical movement, disk access speed is often one of the hundreds of of main memory, so in order to improve efficiency, to minimize disk I/O. To do this, the disk is often not read strictly on-demand, but is read-ahead every time, even if only one byte is required, and the disk starts from this location, sequentially reading a certain length of data into memory. The rationale for this is the well-known local principle in computer science:

When a data is used, the data around it is usually used immediately.

The data that is required during the program run is usually relatively centralized.

Due to the high efficiency of disk sequential reads (no seek time required and minimal rotational time), pre-reading can improve I/O efficiency for programs with locality.

The length of the read-ahead is generally the integer multiple of the page. Page is the logical block of Computer Management memory, hardware and operating system tend to divide main memory and disk storage area into contiguous size equal blocks, each storage block is called a page (in many operating systems, the page size is usually 4k), main memory and disk in the page to exchange data. When the program to read the data is not in main memory, will trigger a page fault, the system will send a read signal to the disk, the disk will find the starting position of the data and sequentially read one or several pages back into memory, and then return unexpectedly, the program continues to run.

Performance analysis of B-/+tree indexes

Here you can finally analyze the performance of the B-/+tree index.

As mentioned above, the index structure is generally evaluated using disk I/O times. First, from the B-tree analysis, according to the definition of b-tree, it is necessary to retrieve up to H nodes at a time. The designer of the database system skillfully exploits the principle of disk pre-reading, setting the size of a node equal to one page, so that each node can be fully loaded with only one I/O. To achieve this, the following techniques are required to implement B-tree in practice:

Each time you create a new node, request a page space directly, so that a node is physically stored in a page, and the computer storage allocation is page-aligned, the implementation of a node only one time I/O.

B-tree requires a maximum of h-1 I/O (root node resident memory) in a single retrieval, and a progressive complexity of O (h) =o (LOGDN). In general practice, the out-of-size D is a very large number, usually more than 100, so H is very small (usually not more than 3).

In summary, using B-tree as index structure efficiency is very high.

And the red-black tree structure, H is obviously much deeper. Because the logically close node (parent-child) may be far away physically, it is not possible to take advantage of locality, so the I/O asymptotic complexity of the red-black tree is also O (h), and the efficiency is significantly worse than B-tree.

As mentioned above, B+tree is more suitable for external memory index, because it is related to the internal node out d. From the above analysis, it can be seen that the better the performance of the larger index, and the upper limit depends on the size of the key and data within the node:

DMax = Floor (pagesize/(keysize + datasize + pointsize)) (Pagesize–dmax >= pointsize)


DMax = Floor (pagesize/(keysize + datasize + pointsize)) –1 (Pagesize–dmax < pointsize)

Floor indicates rounding down. Because the B+tree node removes the data domain, it can have a greater degree of granularity and better performance.

This chapter discusses data structures and algorithms related to indexes from a theoretical point of view, and the next chapter will discuss how B+tree is specifically implemented as an index in MySQL, while combining the MyISAM and INNDB storage engines to introduce two different types of index implementations for nonclustered and clustered indexes.

The principle of algorithm used by Java Advanced Programming--mysql

The principle of algorithm used by Java Advanced Programming--mysql

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.