Data structure and algorithm basis
The essence of the index
MySQL's official documentation defines an index as a data structure that helps MySQL obtain data efficiently. Strip that definition down to its main clause and you get the essence of an index: an index is a data structure.
Querying is one of the most important functions of a database, and we want queries to run as fast as possible, so database designers optimize from the perspective of query algorithms. The most basic query algorithm is linear search, and this O(n) algorithm is obviously poor when the amount of data is large. Fortunately, computer science provides many better search algorithms, such as binary search and binary tree search. A little analysis shows that each search algorithm can only be applied to a specific data structure: binary search requires the searched data to be ordered, and binary tree search can only be applied to a binary search tree. However, the data itself cannot be organized so as to satisfy every such structure at once (for example, it is impossible to keep two columns physically ordered at the same time). Therefore, in addition to the data, a database system also maintains data structures that satisfy specific search algorithms. These data structures reference (point to) the data in some way, so that advanced search algorithms can be implemented on top of them. Such a data structure is an index.
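To make the dependence of search algorithms on data organization concrete, here is a minimal Python sketch (the data values are invented for illustration) contrasting linear search, which works on any ordering, with binary search, which only works because the data is kept sorted:
<pre>
from bisect import bisect_left

def linear_search(records, key):
    """O(n): examines records one by one; works regardless of ordering."""
    for i, k in enumerate(records):
        if k == key:
            return i
    return None

def binary_search(sorted_records, key):
    """O(log n): only valid because the records are kept in sorted order."""
    i = bisect_left(sorted_records, key)
    if i < len(sorted_records) and sorted_records[i] == key:
        return i
    return None

data = [7, 19, 23, 34, 44, 56, 77]   # already sorted on the searched column
print(linear_search(data, 34), binary_search(data, 34))   # -> 3 3
</pre>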
Look at an example:
The figure shows a possible indexing method. On the left is a data table with two columns and seven records; the leftmost column is the physical address of each data record (note that logically adjacent records are not necessarily physically adjacent on disk). To speed up searches on Col2, a binary search tree like the one on the right can be maintained: each node holds an index key and a pointer to the physical address of the corresponding data record, so that a binary tree search can locate the record quickly.
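For illustration, the following Python sketch builds a toy binary-search-tree index of the kind described above; the keys and "physical addresses" are invented for the example and do not come from any real system:
<pre>
# Toy BST index: each node stores an index key and a pointer
# (here, a fake physical address string) to the full data record.
class Node:
    def __init__(self, key, addr):
        self.key, self.addr = key, addr
        self.left = self.right = None

def insert(root, key, addr):
    if root is None:
        return Node(key, addr)
    if key < root.key:
        root.left = insert(root.left, key, addr)
    else:
        root.right = insert(root.right, key, addr)
    return root

def lookup(root, key):
    while root is not None:
        if key == root.key:
            return root.addr          # follow this address to fetch the row
        root = root.left if key < root.key else root.right
    return None

root = None
for key, addr in [(34, '0x07'), (77, '0x0E'), (23, '0x4C'), (91, '0x21')]:
    root = insert(root, key, addr)
print(lookup(root, 23))   # -> '0x4C'
</pre>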
Although this is a genuine index, actual database systems are hardly ever implemented with binary search trees or their evolved variant, red-black trees. The reasons are introduced below.
B-Tree and B+Tree
At present, most database systems and file systems use B-Tree or its variant B+Tree as the index structure. The next section of this article discusses, in combination with the principles of memory and disk access, why B-Tree and B+Tree are so widely used for indexes; this section describes them purely from the perspective of data structures.
B-Tree
In order to describe B-Tree, first define a data record as a two-tuple [key, data], where key is the record's key value; keys are distinct across data records. data is everything in the record apart from the key. A B-Tree is then a data structure that satisfies the following conditions (a code sketch of a node follows the list):
d is a positive integer greater than 1, called the degree of B-Tree.
h is a positive integer and is called the height of the B-Tree.
Each non-leaf node is composed of n-1 keys and n pointers, where d<=n<=2d.
Each leaf node contains at least one key and two pointers, and at most 2d-1 keys and 2d pointers. The pointers of the leaf nodes are all null.
All leaf nodes have the same depth, which is equal to the tree height h.
Keys and pointers alternate within a node, and both ends of a node are pointers.
The keys in a node are arranged non-decreasingly from left to right.
All nodes form a tree structure.
Each pointer is either null or points to another node.
If a pointer is at the leftmost end of a node and is not null, then all keys in the node it points to are less than the first key of the current node; if it is at the rightmost end, all keys in the pointed-to node are greater than the last key; if it sits between two adjacent keys, all keys in the pointed-to node fall between those two keys.
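Here is a minimal sketch in Python of a node matching the definition above; the field names are illustrative and are not taken from any particular database implementation:
<pre>
# Minimal sketch of a B-Tree node matching the definition above.
# Field names (keys, data, children) are illustrative, not from MySQL.
class BTreeNode:
    def __init__(self, keys, data, children=None):
        self.keys = keys      # n-1 keys, arranged non-decreasingly
        self.data = data      # the data paired with each key
        # n child pointers; in a leaf node they are all None (null)
        self.children = children if children is not None else [None] * (len(keys) + 1)

    def is_leaf(self):
        return all(child is None for child in self.children)
</pre>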
Because of these properties, the algorithm for retrieving data by key in a B-Tree is very intuitive: starting from the root node, search the keys within the node (a binary search can be used; the pseudocode below uses a simple sequential scan); if the key is found, return the data of the corresponding entry; otherwise, recurse into the node pointed to by the pointer for the corresponding interval, until either the key is found or a null pointer is reached. The former means the search succeeded, the latter that it failed. Pseudocode for the search algorithm on a B-Tree follows:
<pre>
BTree_Search(node, key) {
    if (node == null) return null;
    foreach (node.key[i])                      /* scan this node's keys in order */
    {
        if (node.key[i] == key) return node.data[i];
        if (node.key[i] > key)                 /* key can only be in the i-th subtree */
            return BTree_Search(node.point[i], key);
    }
    /* key is greater than all keys in this node: follow the rightmost pointer */
    return BTree_Search(node.point[i + 1], key);
}

data = BTree_Search(root, my_key);
</pre>
There are a number of interesting properties of B-Tree. For example, for a B-Tree of degree d indexing N keys, the tree height h is bounded by h <= log_d((N+1)/2), so retrieving a key visits at most O(log_d N) nodes. From this it can be seen that B-Tree is a very efficient index data structure.
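As a quick sanity check of this bound, the snippet below evaluates it for illustrative (not measured) values of N and d:
<pre>
import math

# Height bound h <= log_d((N+1)/2) for a B-Tree of degree d indexing N keys.
# N and d below are illustrative values, not measurements.
N = 1_000_000
d = 100
print(math.log((N + 1) / 2, d))   # ~2.85, so the tree stays only a few levels deep
</pre>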
In addition, since inserting and deleting data records can destroy the B-Tree properties, operations such as splitting, merging, and transferring nodes are needed during insertion and deletion to maintain those properties. This article does not intend to cover these topics completely, because many materials already describe the mathematical properties of B-Tree and its insertion and deletion algorithms in detail; interested readers can find the corresponding materials in the references at the end of this article.
B+Tree
There are many variants of B-Tree, the most common of which is B+Tree. For example, MySQL generally uses B+Tree to implement its index structure.
Compared with B-Tree, B+Tree has the following differences:
The upper limit on the number of pointers in each node is 2d instead of 2d+1.
Internal nodes do not store data, only keys; leaf nodes do not store pointers.
Because internal and leaf nodes no longer contain the same fields, leaf nodes and internal nodes in a B+Tree generally have different sizes. This differs from B-Tree: although the number of keys and pointers stored in different B-Tree nodes may vary, every node has the same fields and the same upper limits, so implementations usually allocate the same amount of space for every B-Tree node.
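The difference in node layout can be sketched as follows (Python, with illustrative field names only):
<pre>
# Sketch of the two node shapes in a B+Tree, as described above.
# Field names are illustrative only.
class InternalNode:
    def __init__(self, keys, children):
        self.keys = keys              # routing keys only; no data is stored here
        self.children = children      # child pointers (up to 2d of them)

class LeafNode:
    def __init__(self, keys, data):
        self.keys = keys              # the indexed key values
        self.data = data              # the data (or record addresses) for each key
        self.next_leaf = None         # sequential access pointer, see the next subsection
</pre>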
Generally speaking, B+Tree is more suitable than B-Tree for implementing external-storage index structures. The specific reasons are related to the principles of external storage and computer access and will be discussed below.
B+Tree with sequential access pointer
Generally, the B+Tree structure used in database systems or file systems is optimized on the basis of the classic B+Tree by adding sequential access pointers.
Adding to each leaf node a pointer to the adjacent leaf node forms a B+Tree with sequential access pointers. The purpose of this optimization is to improve the performance of range queries: for example, to query all data records with keys from 18 to 49, once 18 has been found, the remaining data nodes can be visited by traversing the leaf nodes and their pointers in order, which greatly improves the efficiency of range queries.
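A minimal sketch of such a range scan over the leaf chain (Python; the leaf contents are invented for the 18-to-49 example above, and the descent from the root to the starting leaf is omitted):
<pre>
# Range scan over the leaf chain of a B+Tree with sequential access pointers:
# once the leaf containing the lower bound is located, the remaining records
# are read by following next-leaf pointers instead of re-descending the tree.
class Leaf:
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf

def range_scan(start_leaf, low, high):
    results, leaf = [], start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return results
            if k >= low:
                results.append(k)
        leaf = leaf.next              # follow the sequential access pointer
    return results

l3 = Leaf([44, 49, 56])
l2 = Leaf([23, 34, 41], l3)
l1 = Leaf([15, 18, 19], l2)
print(range_scan(l1, 18, 49))   # -> [18, 19, 23, 34, 41, 44, 49]
</pre>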
This section has given a brief introduction to B-Tree and B+Tree; the next section explains, based on the principles of memory and disk access, why B+Tree is currently the preferred data structure for database indexes.
Why use B-Tree (B+Tree)
As mentioned above, data structures such as red-black trees can also be used to implement indexes, but file systems and database systems generally use B-/+Tree as the index structure. This section uses knowledge of computer organization to discuss the theoretical basis for using B-/+Tree as an index.
Generally speaking, the index itself is also very large and cannot be stored entirely in memory, so an index is often stored on disk in the form of an index file. In this case, disk I/O is incurred during index lookups. Compared with memory access, I/O access is several orders of magnitude more expensive, so the most important measure of a data structure's quality as an index is the asymptotic complexity of the number of disk I/O operations during a lookup. In other words, the index should be organized so as to minimize the number of disk I/O accesses during a search. The following first introduces the principles of memory and disk access, and then analyzes the efficiency of B-/+Tree as an index in light of these principles.
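As a rough, illustrative comparison (not a measurement), the snippet below contrasts the worst-case number of node visits, and hence roughly the number of disk reads if each node occupies one disk page, for a balanced binary tree versus a B-Tree with a large fan-out:
<pre>
import math

# Illustrative comparison of worst-case node visits for N indexed keys.
# If each node is one disk page, this approximates the number of disk reads.
N = 10_000_000        # hypothetical number of keys
d = 100               # hypothetical B-Tree fan-out
print(f"binary search tree: ~{math.log2(N):.1f} node visits")
print(f"B-Tree of degree {d}: ~{math.log(N, d):.1f} node visits")
</pre>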
Main memory access principle
At present, the main memory used by computers is basically random access memory (RAM). The structure and access mechanism of modern RAM are quite complex; this article ignores the specific differences and abstracts a very simple access model to illustrate how RAM works.
From an abstract point of view, main memory is a matrix of storage units, each of which stores a fixed amount of data and has a unique address. The addressing rules of modern main memory are fairly complicated; here they are simplified into a two-dimensional address: a storage unit is uniquely located by a row address and a column address.
The main memory access process is as follows:
When the system needs to read main memory, it places the address signal on the address bus and sends it to main memory. Main memory reads and decodes the address signal, locates the designated storage unit, and then places that unit's data on the data bus for other components to read.
The process of writing main memory is similar: the system places the unit address and the data to be written on the address bus and data bus respectively, and main memory reads the contents of both buses and performs the corresponding write operation.
It can be seen that main memory access time is related only linearly to the number of accesses. Because there is no mechanical operation, the "distance" between two accessed addresses has no effect on the time: for example, fetching A0 and then A1 takes the same time as fetching A0 and then D3.
Principles of Disk Access
As mentioned above, indexes are generally stored on disk in the form of files, and index retrieval therefore requires disk I/O operations. Unlike main memory, disk I/O involves mechanical movement, so its time cost is enormous.
A disk is composed of coaxial circular platters of the same size, which rotate together (each platter must rotate synchronously). On one side of the platters is a head arm that holds a group of heads, with each head responsible for accessing the contents of one platter. The heads do not rotate, but they can move along the radius of the platter (actually along an oblique tangential direction). All heads must also move together, that is, viewed from directly above, all heads overlap at all times (although multi-head independent technologies now exist that are not subject to this limitation).
A platter is divided into a series of concentric rings centered on the center of the platter. Each concentric ring is called a track, and all tracks with the same radius form a cylinder. A track is further divided into small segments by lines along the radius direction; each segment is called a sector, and a sector is the smallest storage unit of the disk. For simplicity, we assume below that the disk has only one platter and one head.
When data needs to be read from the disk, the system passes the logical address of the data to the disk, and the disk's control circuitry translates the logical address into a physical address according to its addressing logic, that is, it determines which track and sector the data to be read is on. To read the data in that sector, the head must be positioned above it; to achieve this, the head first moves to align with the corresponding track. This process is called a seek, and the time it takes is the seek time. The disk then rotates until the target sector passes under the head; the time this takes is called the rotation time (rotational latency).
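For a sense of scale, the arithmetic below estimates the average rotational latency of a hypothetical 7200 rpm disk; seek time adds several more milliseconds on top, and the figures are illustrative rather than vendor specifications:
<pre>
# Average rotational latency of a hypothetical 7200 rpm disk.
rpm = 7200
ms_per_revolution = 60_000 / rpm                 # ~8.33 ms for one full revolution
avg_rotational_latency = ms_per_revolution / 2   # on average half a revolution, ~4.17 ms
print(ms_per_revolution, avg_rotational_latency)
</pre>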