On the function and principle of database indexes
A database index is an auxiliary structure built on one or more table columns to speed up queries. Many people understand the concept only mechanically and believe that adding an index has only advantages and no disadvantages. That is far from true. This article explains indexes in as much detail as possible.
First, consider why an index increases speed. When a database executes an SQL statement, the default method is to scan the entire table, checking every row against the search conditions and adding the rows that match to the result set. If a column is indexed, the database first looks up the requested value in the index to find where the matching rows are stored, which greatly reduces the number of rows that have to be examined and significantly increases query speed.
Should an index always be added, then? Here are several counterexamples:
1. If every query needs to retrieve all records in the table, the whole table must be scanned in any case, so an index does not help.
2. An index on a column with few distinct values and many repeats, such as "gender", does little to narrow the search and is rarely worthwhile.
3. For tables with few records, an index brings no noticeable speedup but still costs storage space. In addition, every UPDATE, INSERT, or DELETE must also update the indexes on the affected columns.
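The difference between a full scan and an index lookup can be illustrated with a small Python sketch. It is only a model, not how any particular database stores data: the table is a plain list, the "index" is a sorted list of (key, position) pairs searched with binary search, and the names rows, full_scan and index_lookup are made up for this example.

```python
from bisect import bisect_left

# Hypothetical table: 100,000 rows of (row_id, user_name) in no useful order.
rows = [(i, f"user{(i * 7919) % 100_000:05d}") for i in range(100_000)]


def full_scan(name):
    """Check every row against the condition; return (matches, rows_examined)."""
    matches, examined = [], 0
    for row in rows:
        examined += 1
        if row[1] == name:
            matches.append(row)
    return matches, examined


# The "index": (key, row position) pairs kept sorted by key.
index = sorted((name, pos) for pos, (_, name) in enumerate(rows))
index_keys = [key for key, _ in index]


def index_lookup(name):
    """Binary-search the index, then fetch only the matching rows.

    rows_examined counts table rows fetched; the binary search itself
    needs about log2(len(rows)) key comparisons on top of that.
    """
    matches, examined = [], 0
    i = bisect_left(index_keys, name)
    while i < len(index) and index[i][0] == name:
        matches.append(rows[index[i][1]])
        examined += 1
        i += 1
    return matches, examined


print("full scan examined:", full_scan("user00042")[1])
print("index lookup examined:", index_lookup("user00042")[1])
```

The sorted index has to be rebuilt or adjusted whenever rows change, which is the write cost mentioned in counterexample 3.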
So when should an index be added? Consider an example from the MySQL manual. Here is an SQL statement:
SELECT c.companyID, c.companyName FROM Companies c, User u WHERE c.companyID = u.fk_companyID AND c.numEmployees >= 0 AND c.companyName LIKE '%i%' AND u.groupID IN (SELECT g.groupID FROM Groups g WHERE g.groupLabel = 'Executive')
This statement involves three tables (a join between Companies and User plus a subquery on Groups) and contains several kinds of search conditions, such as a range comparison and LIKE matching. Without any index, MySQL has to examine 77721876 rows. After indexes are added on the companyID and groupLabel fields, only 134 rows need to be examined. In MySQL, you can use EXPLAIN SELECT to see the estimated number of rows examined. For joins and complex search conditions like these, the performance gained from indexes far outweighs the disk space they occupy.
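To see a query plan change when an index is added, without a MySQL server at hand, the following sketch uses SQLite through Python's standard sqlite3 module as a stand-in. This is an assumption made for convenience: SQLite's EXPLAIN QUERY PLAN is a different command from MySQL's EXPLAIN SELECT and prints a textual plan rather than a rows estimate. The table name user_groups, the index name and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_groups (groupID INTEGER PRIMARY KEY, groupLabel TEXT)"
)
conn.executemany(
    "INSERT INTO user_groups (groupLabel) VALUES (?)",
    [(f"label{i}",) for i in range(1000)] + [("Executive",)],
)

query = "SELECT groupID FROM user_groups WHERE groupLabel = 'Executive'"


def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row, joined into one string."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_group_label ON user_groups (groupLabel)")
plan_after = plan(query)   # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text differs between SQLite versions, so the sketch prints it instead of assuming a particular string.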
How is an index implemented? Most database vendors implement indexes using the B-tree data structure, because a B-tree is well suited to organizing dynamic search tables on direct-access storage devices such as disks. A B-tree is defined as follows: a B-tree of order m (m >= 3) is an m-way tree that meets the following conditions:
1. Each node has the fields (j, p0, k1, p1, k2, p2, ..., kj, pj), where j is the number of keys in the node, k1 to kj are the keys, and p0 to pj are the child pointers.
2. All leaf nodes are on the same level, and that level equals the height of the tree.
3. The number of keys j in each non-root node must satisfy ⌈m/2⌉ - 1 <= j <= m - 1.
4. If the tree is not empty, the root has at least one key. If the root is not a leaf, it has at least two subtrees and at most m subtrees.
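As a quick check of condition 3, the bounds can be tabulated for a few orders. The sketch below reads the lower bound as the ceiling of m/2 minus 1 and the upper bound as m - 1, which is the usual textbook definition; key_bounds is a hypothetical helper name.

```python
import math


def key_bounds(m):
    """Return (min_keys, max_keys) for a non-root node of a B-tree of order m."""
    if m < 3:
        raise ValueError("the definition above requires m >= 3")
    return math.ceil(m / 2) - 1, m - 1


for m in (3, 4, 5, 100):
    low, high = key_bounds(m)
    print(f"order {m}: between {low} and {high} keys per non-root node")
```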
Let's look at a B-tree example. The B-tree with 26 English letters can be constructed as follows:
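One way to build such a tree is sketched below in Python. It is a minimal illustration, not a database implementation: the order m = 4 is an arbitrary choice, the Node and BTree classes are hypothetical, only insertion and search are supported, and everything lives in memory rather than on disk.

```python
from bisect import bisect_left
import string


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # sorted keys k1..kj
        self.children = children or []  # child pointers p0..pj; empty for a leaf


class BTree:
    def __init__(self, order=4):
        self.order = order              # m: maximum number of subtrees per node
        self.root = Node()

    def height(self):
        """Number of levels, found by following the leftmost path."""
        levels, node = 1, self.root
        while node.children:
            node = node.children[0]
            levels += 1
        return levels

    def search(self, key):
        """Return how many nodes were visited to find key, or None if absent."""
        node, visited = self.root, 0
        while True:
            visited += 1
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return visited
            if not node.children:
                return None
            node = node.children[i]

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:           # the root overflowed: grow one level
            median, right = split
            self.root = Node([median], [self.root, right])

    def _insert(self, node, key):
        """Insert below node; return (median, new_right_node) if node was split."""
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return None                 # duplicates are ignored
        if not node.children:
            node.keys.insert(i, key)
        else:
            split = self._insert(node.children[i], key)
            if split is not None:
                median, right = split
                node.keys.insert(i, median)
                node.children.insert(i + 1, right)
        if len(node.keys) <= self.order - 1:
            return None
        # Overflow (m keys): move the median up and split the rest in two.
        mid = len(node.keys) // 2
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right


tree = BTree(order=4)
for letter in string.ascii_lowercase:
    tree.insert(letter)

print("height:", tree.height())
print("nodes visited to find 'q':", tree.search("q"))
```

Because the tree only grows by splitting the root, all leaves stay on the same level, and a search never visits more nodes than the tree has levels.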
In such a B-tree, a search visits only one node per level, so its cost grows with the height of the tree, roughly O(log_m n) node visits for n keys, rather than with n itself. When the data volume is large, this structure greatly increases query speed. However, there is another data structure that can be searched even faster than a B-tree: the hash table. A hash table is defined as follows: let U be the set of all possible keys and K the set of keys actually stored, where |K| is much smaller than |U|. A hash function h maps U to the indices of a table T[0..M-1], so that for a key k in U, h(k) is the storage address of the corresponding node. A lookup can therefore be completed in O(1) time on average.
However, hashing has a drawback, the hash collision: two different keys may be mapped by the hash function to the same result. Let m be the length of the hash table (M above) and n the number of filled slots; n/m is the load factor of the hash table. The larger the load factor, the higher the chance of collisions.
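The ideas of a hash function, a collision and the load factor n/m can be made concrete with a small Python sketch. It resolves collisions by separate chaining, which is one common technique and an assumption here, since the text does not say how collisions are handled. The class name and the deliberately naive hash function are made up for the example.

```python
class ChainedHashTable:
    """Hash table with separate chaining; a teaching sketch, not a database index."""

    def __init__(self, size=8):
        self.size = size                          # m: length of the table
        self.count = 0                            # n: number of stored keys
        self.buckets = [[] for _ in range(size)]  # T[0..m-1], each slot holds a chain

    def _h(self, key):
        # Deliberately naive hash function so that collisions are easy to provoke.
        return sum(key.encode("utf-8")) % self.size

    def put(self, key, value):
        chain = self.buckets[self._h(key)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)           # overwrite an existing key
                return
        chain.append((key, value))
        self.count += 1

    def get(self, key):
        # Only the one chain selected by h(key) is searched.
        for existing, value in self.buckets[self._h(key)]:
            if existing == key:
                return value
        raise KeyError(key)

    def load_factor(self):
        return self.count / self.size             # n / m


table = ChainedHashTable(size=8)
table.put("ab", 1)
table.put("ba", 2)  # same bytes in a different order: same sum, so the same slot
table.put("d", 3)

print("slot of 'ab':", table._h("ab"), "slot of 'ba':", table._h("ba"))
print("load factor:", table.load_factor())
```

With chaining, a collision does not lose data, but every extra entry in a chain adds a comparison, so lookups drift away from O(1) as the load factor grows.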
Partly because of this drawback, databases do not use hashing as the default index implementation. MySQL states that it will try to build an in-memory hash index on top of the disk-based B-tree index, based on the observed query patterns, to further speed up lookups (in InnoDB this feature is called the adaptive hash index).
Summary
This article has introduced the functions and principles of database indexes. We hope it helps.