B+ tree, B-tree (MySQL, Oracle, MongoDB)
B-trees and B+ trees are mainly used for indexes in relational databases such as Oracle and MySQL InnoDB. MongoDB indexes are also implemented with B-trees, and so are the data block indexes of HFiles in HBase.
Dynamic search trees include the binary search tree, the balanced binary search tree, the red-black tree, and the B-tree family (B-tree, B+ tree, B* tree). The first three are binary search tree structures. Their search time is determined by the depth of the tree, which is O(log2 N) when the tree is balanced, so reducing the depth naturally improves search efficiency.
However, there is a practical problem. In large-scale data storage and index queries, the number of elements a tree node can hold is limited (if a node holds too many elements, searching inside it degrades to a linear scan). With a binary search tree, the tree therefore becomes very deep, disk I/O becomes too frequent, and queries become slow. Since the amount of data to be indexed cannot be reduced, the basic way to reduce the depth is to use a multi-way tree structure (because the number of elements per node is limited, the number of subtrees per node is limited as well).
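To make the depth argument concrete, here is a rough sketch of the arithmetic. It assumes a perfectly balanced tree and that every level visited costs one disk read. The function name `levels_needed` and the fanout of 100 are illustrative choices, not figures from any particular database.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= num_keys.

    Uses integer arithmetic only, so there is no floating-point rounding.
    """
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= fanout
        levels += 1
    return levels


# One million keys: a binary tree versus a multi-way tree with fanout 100.
print(levels_needed(1_000_000, 2))    # 20
print(levels_needed(1_000_000, 100))  # 3
```

Under these assumptions, the multi-way tree needs roughly 3 reads where the binary tree needs about 20, which is the reduction in disk I/O the paragraph above describes.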
In other words, disk operations are slow and expensive, so frequent disk lookups are inefficient. How can we avoid them? The number of disk accesses per query is usually determined by the height of the tree, so a tree structure that keeps the height as small as possible effectively reduces the number of disk accesses. Which tree structure achieves this?
This leads to a new kind of search tree: the multi-way search tree. Inspired by the balanced binary tree, it is natural to think of a balanced multi-way search tree, that is, the B-tree. As we will see later, the operations on a B-tree keep its height low, which avoids overly frequent disk accesses and improves search efficiency.
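The lookup path in a multi-way search tree can be sketched as follows. This is a minimal illustration, not a full B-tree: the tree is built by hand, insertion and node splitting are omitted, and the names `Node` and `search` are hypothetical. Each node visited stands for one disk read.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted keys stored in this node
    children: list = field(default_factory=list)  # empty for a leaf, else len(keys) + 1 subtrees


def search(root: Node, key) -> tuple:
    """Return (found, nodes_visited) for a lookup starting at root."""
    node, visited = root, 0
    while True:
        visited += 1
        # Binary search inside the node for the first key >= the target.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True, visited
        if not node.children:       # reached a leaf without finding the key
            return False, visited
        node = node.children[i]     # descend into the i-th subtree


# A hand-built two-level tree with three subtrees under the root.
root = Node(keys=[20, 40], children=[
    Node(keys=[5, 10]),
    Node(keys=[25, 30]),
    Node(keys=[45, 50]),
])

print(search(root, 30))  # (True, 2)
print(search(root, 42))  # (False, 2)
```

Because each node holds several keys and has several subtrees, even a miss touches only as many nodes as the tree has levels.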
Hash table + buckets (Redis)
The adaptive hash index in MySQL and the data storage in Redis both use hashing to look up data efficiently.
A hash table (also called a hash map) is a data structure that is accessed directly by key. It maps each key to a position in a table and accesses the record there, which speeds up lookups. The mapping function is called a hash function, and the array that stores the records is called a hash table.
The hash table method is quite simple: a fixed algorithm, the hash function, converts the key into an integer; that integer is taken modulo the array length; the remainder is used as the array index; and the value is stored in the array slot at that index.
To look up a key, the hash function is applied again to convert the key into the same array index, and the value is read from that slot. In this way, lookups take full advantage of the fast positional access of arrays.
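The store and lookup steps above can be sketched in a few lines. The hash function here is a simple illustrative polynomial hash, not the one used by MySQL or Redis, and the names `simple_hash` and `slot_for` are hypothetical. Collisions are deliberately ignored in this sketch.

```python
def simple_hash(key: str) -> int:
    """Illustrative hash function: turn a string key into a 32-bit integer."""
    h = 0
    for byte in key.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h


def slot_for(key: str, table_size: int) -> int:
    """Array index for a key: hash value modulo the array length."""
    return simple_hash(key) % table_size


table_size = 8
table = [None] * table_size

# Store: hash the key, take the remainder, write the value into that slot.
table[slot_for("user:1", table_size)] = "alice"

# Look up: hash the same key again and read the same slot.
value = table[slot_for("user:1", table_size)]
print(value)  # alice
```

Two different keys can land in the same slot, in which case this naive version would overwrite the earlier value; handling that case is what the bucket structure is for.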
Arrays are easy to address but hard to insert into and delete from. Linked lists are hard to address but easy to insert into and delete from. Combining the two gives a data structure that is both easy to address and easy to insert into and delete from, for example a hash table implemented with separate chaining (the "zipper" method).
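A minimal sketch of a hash table with separate chaining might look like this. The class name `ChainedHashTable` is hypothetical, the bucket count is fixed (no resizing), and Python lists stand in for the linked lists so the example stays short.

```python
class ChainedHashTable:
    """Array of buckets; each bucket is a chain of (key, value) pairs."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Array step: hash the key and index directly into the bucket array.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # chain step: add to this bucket

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def delete(self, key) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable(num_buckets=8)
table.put(1, "a")
table.put(9, "b")        # 1 and 9 both map to bucket 1, so they share a chain
print(table.get(1), table.get(9))  # a b
```

Colliding keys simply sit next to each other in the same chain, so a collision costs a short scan of one bucket instead of losing data.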
Bloom Filter (HBase)
HBase can build a Bloom filter over row keys to quickly determine whether a row key may be present in an HFile. Bloom filters are widely used in distributed databases.
A Bloom filter is a bitmap-based structure. In its simplest form, one hash function maps an element to one position in a bit array of length m. If that bit is 1, the element is considered to be in the set; otherwise it is not. The drawback is that different elements can collide on the same bit, so an element that was never added may be reported as present. The remedy is to use k hash functions that map each element to k bits. If all k bits are 1, the element is probably in the set (false positives are still possible, only less likely). If any bit is 0, the element is definitely not in the set.
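The k-hash-function scheme can be sketched as below. This is an illustrative implementation, not the one HBase uses: the class name `BloomFilter`, the sizes, and the way k hash functions are derived (SHA-256 with a different seed prefix per function) are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits          # m: length of the bit array
        self.num_hashes = num_hashes      # k: number of hash functions
        self.bits = bytearray(num_bits)   # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive k positions by hashing the item with k different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Any 0 bit means "definitely absent"; all 1 bits means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("row-001")
print(bf.might_contain("row-001"))  # True: an added item is never reported absent
print(bf.might_contain("row-999"))  # usually False; True would be a false positive
```

A filter like this never produces false negatives, which is why it is safe to skip a file when the filter answers "absent".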