Principle and optimization of database indexing

Source: Internet
Author: User
Tags: data structures, hash, MySQL database, MySQL index

This article is mainly drawn from mainstream articles on the Internet, lightly integrated according to my own understanding; reference links follow at the end.

I. Summary

This article takes the MySQL database as its subject and discusses several topics related to database indexing. MySQL supports many storage engines, and each storage engine supports indexes differently, so MySQL offers multiple index types such as B-tree indexes, hash indexes, and full-text indexes. To avoid confusion, this article focuses only on the B-tree index, since it is the primary index type used with MySQL; hash indexes and full-text indexes are not discussed here.

II. Common query algorithms and data structures

Why talk about query algorithms and data structures here? Because the point of building an index is to build a data structure on which an efficient query algorithm can run, ultimately improving data query speed.

2.1 The nature of the index

The official MySQL definition of an index is: an index is a data structure that helps MySQL retrieve data efficiently. Extracting the skeleton of that sentence gives the essence of the index: an index is a data structure.

2.2 Common query algorithms

We know that querying is one of the most important functions of a database. We all want queries to be as fast as possible, so database system designers optimize from the perspective of the query algorithm. So what query algorithms can make queries faster?

2.2.1 Sequential lookup (linear search)

The most basic query algorithm is, of course, sequential lookup (linear search), which compares each element in turn; the algorithm is extremely inefficient when the data set is large.
Data structure: ordered or unordered array
Complexity: O(n)
Example code:

// Sequential (linear) search: returns the index of value in a[0..n-1], or -1
int sequence_search(int a[], int value, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == value)
            return i;
    return -1;
}
2.2.2 Binary search

A faster algorithm than sequential lookup is binary search. The principle: the search starts from the middle element of the (sorted) array; if the middle element is exactly the element sought, the search ends. If the target is greater or less than the middle element, the search continues in the half of the array that is greater or less, again starting from that half's middle element. If at any step the remaining subarray is empty, the target is not present.
Data structure: ordered array
Complexity: O(log n)
Example code:

// Binary search, recursive version: returns the index of value in the
// sorted range a[low..high], or -1 if it is not present
int binary_search(int a[], int value, int low, int high)
{
    if (low > high)       // empty range: not found
        return -1;
    int mid = low + (high - low) / 2;
    if (a[mid] == value)
        return mid;
    if (a[mid] > value)
        return binary_search(a, value, low, mid - 1);
    return binary_search(a, value, mid + 1, high);
}
2.2.3 Binary search tree lookup

A binary search tree is characterized by: if its left subtree is not empty, the values of all nodes in the left subtree are less than the value of the root node; if its right subtree is not empty, the values of all nodes in the right subtree are greater than the value of the root node; and its left and right subtrees are themselves binary search trees.

Search principle: if B is an empty tree, the search fails. Otherwise, if x equals the value in the data field of B's root node, the search succeeds; otherwise, if x is less than the value in the root of B, search the left subtree; otherwise, search the right subtree.
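The search rule above maps directly to code. Below is a minimal sketch in C (the node layout and names are illustrative, not taken from any particular database):

```c
#include <stddef.h>

/* A minimal binary-search-tree node: go left when the sought key is
   smaller than the node's key, right when it is larger. */
struct bst_node {
    int key;
    struct bst_node *left, *right;
};

/* Returns the node holding key, or NULL if the tree does not contain it. */
struct bst_node *bst_search(struct bst_node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```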

Data structure: binary search tree
Time complexity: O(log₂n) on average

2.2.4 Hashing (hash table)

The principle: first build a hash table from the key values using a hash function; at lookup time, the key value is passed through the hash function to locate the position of the data element.
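As an illustration only, here is a toy fixed-size hash table in C using linear probing for collisions (real hash indexes use more sophisticated schemes; the table size, names, and the assumption of non-negative keys are all ours):

```c
/* Toy hash table with open addressing (linear probing). Keys are assumed
   non-negative, since -1 marks an empty slot. */
#define TABLE_SIZE 16
#define EMPTY (-1)

static int table_keys[TABLE_SIZE];
static int table_vals[TABLE_SIZE];

static unsigned hash(int key) { return (unsigned)key % TABLE_SIZE; }

void ht_init(void)
{
    for (int i = 0; i < TABLE_SIZE; i++) table_keys[i] = EMPTY;
}

void ht_put(int key, int val)
{
    unsigned i = hash(key);
    while (table_keys[i] != EMPTY && table_keys[i] != key)
        i = (i + 1) % TABLE_SIZE;   /* probe the next slot on collision */
    table_keys[i] = key;
    table_vals[i] = val;
}

/* Returns the value for key, or -1 if absent: O(1) expected time. */
int ht_get(int key)
{
    unsigned i = hash(key);
    while (table_keys[i] != EMPTY) {
        if (table_keys[i] == key) return table_vals[i];
        i = (i + 1) % TABLE_SIZE;
    }
    return -1;
}
```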

Data structure: hash table
Time complexity: nearly O(1), depending on how many collisions occur.

2.2.5 Block lookup

Block lookup, also known as indexed sequential lookup, is an improvement on sequential lookup. The idea is to divide n data elements into m blocks (m ≤ n) that are "ordered between blocks". The elements within each block need not be ordered, but the blocks themselves must be ordered: every key in the 1st block must be less than every key in the 2nd block, every key in the 2nd block less than every key in the 3rd block, and so on.

Algorithm flow: first take the largest key in each block to form an index table. The search then has two parts: binary search (or sequential search) of the index table determines which block the target record may lie in, and then the determined block is searched sequentially.
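The two-phase flow can be sketched in C as follows; the index table `idx` of per-block maxima is assumed to have been built beforehand, and all names are illustrative:

```c
/* Indexed sequential (block) lookup sketch. idx[b] holds the largest key
   in block b; blocks are ordered between themselves but unordered inside.
   Returns the position of value in a[0..n-1], or -1 if absent. */
int block_search(const int a[], int n, int block_size,
                 const int idx[], int nblocks, int value)
{
    int b = 0;
    while (b < nblocks && idx[b] < value)   /* phase 1: search the index */
        b++;
    if (b == nblocks)
        return -1;                          /* larger than every block max */
    int start = b * block_size;
    int end = start + block_size < n ? start + block_size : n;
    for (int i = start; i < end; i++)       /* phase 2: scan inside block */
        if (a[i] == value)
            return i;
    return -1;
}
```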

Each comparison in the algorithms above (binary search, binary search tree) halves the search range, so query speed improves greatly and the complexity is O(log n). Looking a little closer, you will find that each lookup algorithm can only be applied to a particular data structure: binary search requires the data to be ordered, and binary-search-tree lookup can only be applied to a binary search tree. But the data itself cannot be organized to satisfy every data structure at once (for example, it is impossible to order the same table by two columns simultaneously). So, besides the data, the database system maintains a separate data structure that satisfies a particular lookup algorithm and references (points to) the data in some way, so that an efficient lookup algorithm can be implemented on top of it. This data structure is the index.

2.3 The balanced multi-way search tree (B-tree)

As mentioned above, the lookup time complexity of a binary tree is O(log₂n), so its search efficiency is tied to the depth of the tree; to improve query speed, the depth of the tree must be reduced. To reduce depth, it is natural to use a multi-way tree. Combining this with the idea of the balanced binary tree, we can build a balanced multi-way tree structure and, on top of it, a balanced multi-way search algorithm that is efficient for large data volumes.

2.3.1 B-Tree

The B-Tree (balance tree) is also written "B-tree" (the hyphen is simply part of translating "B-tree", so B-tree and B tree are the same concept). It is a balanced multi-way search tree. The following figure is a typical B-tree:

From the figure we can see some of the characteristics of the B-tree. To describe it better, we define a record as a two-tuple [key, data]: key is the record's key value, and data is the rest of the record (the figure above shows only keys, with data omitted). Here is the detailed definition of a B-tree:

1. There is one root node; the root node either holds a single record with two children, or the root node is empty;
2. Keys and pointers inside each node alternate with one another, and each pointer points to a child node;
3. d is the degree (width) of the tree; except for the root, each node holds between d/2 and d-1 records, with the keys inside a node ordered left to right by size, and each non-leaf node has between d/2+1 and d children;
4. Within a node, all keys in the n-th subtree are smaller than the n-th key of that node and greater than the (n-1)-th key. For example, in the figure above, all keys in the 2nd child E of node B are less than B's 2nd key 9 and greater than its 1st key 3;
5. All leaf nodes are at the same level, i.e. they have the same depth.

Given these properties, the algorithm for retrieving data by key in a B-tree is very intuitive: do a binary search within the root node; if the key is found, return the data of the corresponding record; otherwise, follow the pointer for the corresponding interval and search that node recursively, until the record is found or a null pointer is reached (success and failure, respectively). Pseudocode for the lookup:

BTree_Search(node, key) {
    if (node == NULL) return NULL;
    foreach (i in node.keys) {
        if (node.key[i] == key) return node.data[i];
        if (node.key[i] > key) return BTree_Search(node.point[i], key);
    }
    return BTree_Search(node.point[i+1], key);
}
data = BTree_Search(root, my_key);

B-trees have a series of interesting properties. For example, for a B-tree of degree d indexing N keys, the tree height h is bounded above by log_d((N+1)/2), so retrieving a key visits O(log_d N) nodes asymptotically. From this alone it is clear that the B-tree is a very efficient index data structure.

In addition, since inserting or deleting records destroys the B-tree properties, insertions and deletions must perform split, merge, and transfer operations to preserve them. This article does not intend to cover B-tree maintenance in full: there is already plenty of material detailing the mathematical properties of B-trees and the insert/delete algorithms, which interested readers can consult.

2.3.2 B+Tree

In fact, the B-tree has many variants, the most common of which is the B+tree; MySQL, for example, generally uses the B+tree to implement its index structures. Compared to a B-tree, a B+tree differs as follows: each node has at most 2d pointers instead of 2d+1; inner nodes store only keys, not data; and leaf nodes store no child pointers.

The following is a simple b+tree schematic.

Since not all nodes have the same fields, the leaf and inner nodes of a B+tree generally have different sizes. This differs from the B-tree: although the number of keys and pointers actually stored in a B-tree node may vary, every node has the same fields and the same upper bounds, so B-tree implementations typically request the same size of space for every node. In general, the B+tree is better suited than the B-tree for an external-storage index structure; the specific reason relates to how external memory (disk) is accessed, and will be discussed below.

2.3.3 B+Tree with sequential access pointers

Generally, the B+tree used in database systems and file systems is optimized over the classic B+tree by adding sequential access pointers.

As shown in the figure, adding to each leaf node a pointer to the adjacent leaf node forms a B+tree with sequential access pointers. The purpose of this optimization is to improve the performance of range access. For example, in Figure 4, to query all records with keys from 18 to 49, once 18 has been found we simply traverse the nodes and pointers in order to access all the matching data in one pass, which greatly improves range-query efficiency.
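A sketch of how the leaf chain serves a range query, with one key per leaf for brevity (real leaves hold many [key, data] records; all names here are illustrative):

```c
#include <stddef.h>

/* A leaf of the B+tree's bottom level, simplified to a single key. */
struct leaf {
    int key;
    struct leaf *next;   /* the sequential access pointer */
};

/* Counts keys in [lo, hi] by walking right along the leaf chain.
   In a real tree, positioning on the first leaf >= lo is done by
   descending from the root; here we just scan from the leftmost leaf. */
int range_count(struct leaf *first, int lo, int hi)
{
    int count = 0;
    struct leaf *p = first;
    while (p != NULL && p->key < lo)
        p = p->next;
    while (p != NULL && p->key <= hi) {
        count++;
        p = p->next;
    }
    return count;
}
```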

This section provided a brief introduction to the B-tree and B+tree; the next section combines memory access principles to explain why the B+tree is currently the preferred data structure for implementing database indexes.

III. Computer principles behind the choice of index data structure

As mentioned above, data structures such as binary trees and red-black trees could also be used to implement an index, but file systems and database systems generally adopt the B-/B+tree as the index structure. This section combines relevant knowledge of computer organization to discuss the theoretical basis for using the B-/B+tree as an index.

3.1 Two types of storage

A computer system generally has two types of storage: main memory (RAM) and external storage (such as hard disks, CDs, and SSDs). When designing index algorithms and storage structures, both types must be taken into account. Main memory is fast; relative to it, the data transfer rate of an external disk is several orders of magnitude slower, and the specific differences are described in detail later. All of the query algorithms above assume the data is stored in main memory, but main memory is generally small, and real database data is stored in external storage.

In general, the index itself is also large and cannot reside entirely in memory, so an index is often stored on disk as an index file. Index lookups then incur disk I/O; relative to memory access, an I/O costs several orders of magnitude more, so the most important metric for evaluating a data structure as an index is the asymptotic number of disk I/O operations per lookup. In other words, the index should be structured to minimize disk I/O accesses during a lookup. The following introduces memory and disk access principles in detail, then uses those principles to analyze the efficiency of the B-/B+tree as an index.

3.2 Main memory access principles

The main memory in today's computers is random-access memory (RAM). The structure and access mechanics of modern RAM are complex; this article sets the specifics aside and abstracts a very simple access model to illustrate how RAM works.

Abstractly, main memory is a matrix of storage cells, each storing a fixed-size piece of data. Each cell has a unique address; modern memory addressing is complex, so here it is simplified to a two-dimensional address: a row address and a column address together uniquely locate one storage cell. The figure above shows a 4 x 4 main memory model.

The main memory access process is as follows:

When the system needs to read main memory, it puts the address signal on the address bus; main memory reads the signal, decodes it to locate the specified cell, and then puts that cell's data on the data bus for other components to read. Writing is similar: the system puts the cell address and the data on the address bus and data bus respectively, and main memory reads both buses and performs the corresponding write.

The key point: main memory access time is linear only in the number of accesses. Because there is no mechanical operation, the "distance" between two accessed addresses has no effect on time; for example, fetching A0 then A1 costs the same as fetching A0 then D3.

3.3 Disk access principles

As mentioned earlier, indexes are typically stored as files on disk, so index retrieval requires disk I/O. Unlike main memory, disk I/O involves mechanical movement, so its time cost is huge.

A disk reads data by mechanical movement. When data must be read, the system passes the data's logical address to the disk; the disk's control circuitry translates the logical address into a physical address, i.e. determines which track and which sector the data is on. To read that sector, the head must be positioned above it: first the head moves to the corresponding track (this is the seek, and its cost is the seek time); then the disk rotates until the target sector passes under the head (this cost is the rotational delay); finally the data is transferred. So each read breaks down into three parts: seek time, rotational delay, and transfer time. Seek time, the time for the arm to move to the specified track, is typically under 5 ms on mainstream disks. Rotational delay corresponds to the disk speed we often hear quoted: a 7200 rpm disk rotates 7,200 times per minute, i.e. 120 times per second, so the average rotational delay is 1/120/2 = 4.17 ms. Transfer time, reading from or writing to the platter, is typically a fraction of a millisecond and negligible next to the first two.
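The rotational-delay figure can be reproduced with a one-line calculation (a sketch of the arithmetic, not a disk model):

```c
/* Average rotational delay is half a revolution. Returns milliseconds
   for a given spindle speed in rpm. */
double avg_rotational_delay_ms(double rpm)
{
    double revs_per_second = rpm / 60.0;          /* 7200 rpm -> 120 rev/s */
    double ms_per_rev = 1000.0 / revs_per_second; /* -> 8.33 ms per rev    */
    return ms_per_rev / 2.0;                      /* average: half a turn  */
}
```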

The time for one disk access, i.e. one disk I/O, is then approximately 5 + 4.17 ≈ 9 ms. That sounds acceptable, until you recall that a 500-MIPS machine executes 500 million instructions per second, because instructions depend on electrical rather than mechanical speed; in the time of one I/O, roughly 4.5 million instructions could execute. For a database of tens of millions of rows, paying 9 ms per access would obviously be a disaster.

3.4 Locality of reference and disk read-ahead

Due to the characteristics of the storage medium, and compounded by the cost of mechanical movement, disk access is much slower than main memory, often a few hundredths of its speed. Therefore, to improve efficiency, disk I/O must be minimized. To this end, the disk is usually not read strictly on demand: every read is a read-ahead. Even if only one byte is required, the disk starts from that position and sequentially reads a certain length of data into memory. The rationale is the well-known principle of locality in computer science: when a piece of data is used, nearby data is usually used soon after, and the data a program needs during a run is usually relatively concentrated.

Since sequential disk reads are efficient (no seek time and very little rotational time), read-ahead improves I/O efficiency for programs with locality. The read-ahead length is generally an integer multiple of the page. A page is the logical block of computer memory management: hardware and operating systems divide main memory and disk storage into contiguous equal-sized blocks, each called a page (often 4 KB in many operating systems), and main memory and disk exchange data in page units. When a program needs data that is not in main memory, a page fault is triggered: the system sends a read signal to the disk, the disk finds the start of the data and sequentially reads one or more pages into memory, the fault returns, and the program continues running.

IV. The B-/B+tree index data structure and its performance

Now we can finally analyze why database indexes use B-/B+tree storage structures. As mentioned above, indexes are stored on disk, and we generally evaluate an index structure by its disk I/O count. Starting with the B-tree: by the definition of a B-tree, a lookup visits at most h nodes, of which at most h-1 require disk reads (the root node is memory-resident). Database system designers cleverly exploit the disk read-ahead principle by setting the size of one node equal to one page, so each node can be loaded fully with a single I/O. To achieve this, actual B-tree implementations also use the following techniques: each new node directly requests one page of space, so that a node is physically stored within one page; and since storage allocation is page-aligned, loading a node takes only one I/O.

A single B-tree retrieval therefore requires at most h-1 I/Os (root node resident in memory), with asymptotic complexity O(h) = O(log_d N). In practice, the out-degree d is a very large number, usually more than 100, so h is very small (usually no more than 3).

In summary, with a B-tree storage structure the number of I/Os per search is generally no more than 3, so using a B-tree as the index structure is very efficient.

4.1 B+Tree performance analysis

We know from the above that the search complexity of a B-tree is O(h) = O(log_d N): the larger the out-degree d, the smaller the depth h and the fewer the I/Os. The B+tree can increase the out-degree d further; because each node is one page in size, the upper bound on d depends on the sizes of the key and data within a node:

dmax = floor(pagesize / (keysize + datasize + pointsize))    // floor means rounding down
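A small helper makes the formula concrete. The sizes used in the usage note below are assumptions for illustration (a 16 KB page with 8-byte keys and 6-byte pointers), not authoritative engine internals:

```c
/* The dmax formula from the text: how many [key, data, pointer] entries
   fit in one page. Integer division in C already rounds toward zero,
   which for positive values is the floor. */
int dmax(int pagesize, int keysize, int datasize, int pointsize)
{
    return pagesize / (keysize + datasize + pointsize);
}
```

For example, `dmax(16384, 8, 0, 6)` gives 1170 for an inner node with no data field, hinting at why a tree of height 3 can cover millions of keys.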

Because inner nodes in the B+tree remove the data field, they can have a larger out-degree and therefore better performance.

4.2 B+Tree search process


The search processes of the B-tree and B+tree are basically the same. As shown in the figure above, to find data item 29: load disk block 1 into memory (first I/O); a binary search in memory determines that 29 lies between 17 and 35, locking onto disk block 1's P2 pointer (the in-memory time is negligible compared to the disk I/O); follow P2's disk address and load disk block 3 into memory (second I/O); 29 lies between 26 and 30, locking onto disk block 3's P2 pointer; load disk block 8 through that pointer (third I/O); a binary search in memory finds 29 and the query ends, for a total of three I/Os. The real situation is that a 3-level B+tree can represent millions of rows. If finding one row among millions takes only three I/Os, the performance gain is huge; without an index, each data item could cost one I/O, i.e. millions of I/Os in total, which is obviously very expensive.

This chapter discussed index-related data structures and algorithms from a theoretical point of view. The next chapter discusses how the B+tree is concretely implemented as an index in MySQL, using the MyISAM and InnoDB storage engines to introduce the two index types: nonclustered and clustered.

V. MySQL index implementation

In MySQL, the index is a storage-engine-level concept: different storage engines implement indexes differently. This article mainly discusses the index implementations of the MyISAM and InnoDB storage engines.

5.1 MyISAM index implementation

The MyISAM engine uses the B+tree as its index structure, with the data field of each leaf node holding the address of the record. The following figure is a schematic diagram of a MyISAM index:

Suppose the table has three columns and Col1 is the primary key; the figure above then shows the table's primary index (primary key). You can see that a MyISAM index file stores only the addresses of data records. In MyISAM, the primary index and secondary indexes (secondary keys) are structurally identical, except that the primary index requires unique keys while secondary index keys may repeat. If we create a secondary index on Col2, its structure is as follows:

It is likewise a B+tree, with the data field holding the addresses of data records. Index retrieval in MyISAM therefore first searches the index using the B+tree search algorithm; if the specified key exists, it takes the value of its data field, then reads the data record at that address.
MyISAM's index style is also called "nonclustered", a name chosen to distinguish it from InnoDB's clustered index.

5.2 InnoDB index implementation

Although InnoDB also uses B+tree as the index structure, the implementation is very different from MyISAM.

The first major difference is that the InnoDB data file is itself the index file. As seen above, MyISAM's index file and data file are separate, with the index file holding only record addresses. In InnoDB, the table data file itself is an index structure organized as a B+tree, with the tree's leaf node data field holding the complete data record. The key of this index is the table's primary key, so the InnoDB table data file itself is the primary index.

The figure above is a schematic diagram of the InnoDB primary index (which is also the data file); you can see that the leaf nodes contain complete data records. This kind of index is called a clustered index. Because the InnoDB data file itself must be clustered by primary key, InnoDB requires that every table have a primary key (MyISAM does not). If none is explicitly specified, MySQL automatically selects a column that can uniquely identify the rows as the primary key; if no such column exists, MySQL automatically generates an implicit field for the InnoDB table as the primary key, 6 bytes long and of integer type.

The second difference from MyISAM is that the data field of an InnoDB secondary index stores the value of the corresponding record's primary key rather than its address. In other words, all of InnoDB's secondary indexes use the primary key as the data field. For example, the following figure shows a secondary index defined on Col3:

Here the ASCII codes of the English characters serve as the comparison criterion. The clustered-index implementation makes search by primary key very efficient, but a secondary-index search must retrieve the index twice: first the secondary index to obtain the primary key, then the primary index to fetch the record by that primary key.

Understanding how different storage engines implement indexes is very useful for using and optimizing them correctly. For example, knowing InnoDB's index implementation, it is easy to understand why overly long fields are not recommended as primary keys: all secondary indexes reference the primary index, so a long primary key makes every secondary index large. Likewise, non-monotonic fields make poor primary keys in InnoDB: since the data file itself is a B+tree, a non-monotonic primary key causes the data file to be frequently split to maintain the B+tree properties as new records are inserted, which is inefficient, whereas an auto-increment field makes a very good primary key.

The next chapter discusses these index-related optimization strategies in detail.

VI. Index usage strategy and optimization

MySQL optimization divides mainly into structural optimization (schema optimization) and query optimization. The high-performance indexing strategies discussed in this chapter belong to structural optimization. The content of this chapter rests entirely on the theory above: once the mechanism behind indexes is understood, choosing a high-performance strategy becomes pure reasoning, and the logic behind each strategy becomes clear.

6.1 Composite indexes and the leftmost prefix principle

The composite index

Let's start by introducing the composite index. It is simple enough: relative to an ordinary index on a single field, a composite index is created over multiple fields. Its structure is also simple: when we create a composite index on fields (a, b, c), the index records are sorted first by field a, then by field b, then by field c. The composite index is therefore characterized by: the first field is ordered;

when values of the first field are equal, the second field is ordered. For example, in the table below, whenever a = 2 all the b values are ordered, and so on: whenever the b values are also equal, all the c values are ordered.

| A | B | C |
| 1 | 2 | 3 |
| 1 | 4 | 2 |
| 1 | 1 | 4 |
| 2 | 3 | 5 |
| 2 | 4 | 4 |
| 2 | 4 | 6 |
| 2 | 5 | 5 |
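The ordering shown in the table can be reproduced with a three-level comparator, which mirrors how a composite index orders its entries (the struct and field names are illustrative):

```c
#include <stdlib.h>

/* A row with the three indexed fields. */
struct row { int a, b, c; };

/* Compare by a first, then b, then c: the (a, b, c) composite-index order. */
int cmp_abc(const void *p, const void *q)
{
    const struct row *x = p, *y = q;
    if (x->a != y->a) return x->a < y->a ? -1 : 1;
    if (x->b != y->b) return x->b < y->b ? -1 : 1;
    if (x->c != y->c) return x->c < y->c ? -1 : 1;
    return 0;
}
```

Sorting rows with `qsort(rows, n, sizeof rows[0], cmp_abc)` yields the index order: a ordered globally, b ordered within equal a, c ordered within equal (a, b).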

In fact, looking up a composite index works like a dictionary: look up by the first letter, then within that by the second letter, or by the first letter alone, but you cannot skip the first letter and start searching from the second. This is the so-called leftmost prefix principle.

The leftmost prefix principle

Let's look at composite-index queries in more detail. As in the example above, we have a composite index on fields (a, b, c), so the index is sorted first by a, then by b, then by c. Therefore:

The following query forms can use the index:

select * from table where a = 1;
select * from table where a = 1 and b = 2;
select * from table where a = 1 and b = 2 and c = 3;
