Database index structure: Principles of the T-tree

Source: Internet
Author: User

Indexes are used to improve query efficiency. You can define an index on a field of a table to speed up queries on that field. A database must process very large amounts of data, memory is expensive and limited in capacity, and queries must meet certain real-time requirements, so it is necessary to study storage and indexing methods and find an effective way to organize the data. The typical index technology of disk-based database systems is the B-tree index. The main purpose of the B-tree structure is to reduce the number of disk I/O operations required to complete an index search of the data files. A B-tree achieves this by controlling how many index entries each node holds: it packs as many index entries as possible into a node, so that each disk I/O brings in as many index entries as possible.
The T-tree is an index technology optimized for main-memory access. A T-tree is a balanced binary tree whose nodes each contain multiple index entries. T-tree index entries are much simpler than B-tree entries in terms of both size and algorithm. The T-tree search algorithm makes no distinction between a value that lies in the current node and one that lies elsewhere in memory. Each time a new index node is visited, the remaining index range is halved.
T-tree indexes support range queries on keys. A T-tree is a special form of balanced binary tree (AVL tree) in which each node stores a sorted set of key values. The T-tree not only has high node space utilization, but also has an advantage in the complexity and execution time of the search algorithms that traverse the tree. The T-tree has become a widely used index method in main-memory databases.

1. Concepts related to the T-tree
The T-tree has the following features:
① the heights of the left and right subtrees of any node differ by at most 1;
② a node can store multiple key values; its leftmost and rightmost values are the minimum and maximum keys of the node, respectively. Its left subtree contains only records whose key values are less than or equal to the node's minimum key, and its right subtree contains only records whose key values are greater than or equal to the node's maximum key;
③ a node with both a left and a right subtree is called an internal node, a node with only one subtree is called a half-leaf, and a node with no subtree is called a leaf;
④ to maintain space utilization, each internal node must contain at least a minimum number of key values.
In short, the T-tree is a balanced binary tree with multiple keys in each node: the keys within a node are kept in sorted order, the keys in the left subtree are no larger than the node's keys, and the keys in the right subtree are no smaller than them.
A T-tree node contains the following information:
(1) balance (the balance factor), whose absolute value is not greater than 1; balance = right subtree height - left subtree height;
(2) left_child_ptr and right_child_ptr, the pointers to the left and right subtrees of the current node, respectively;
(3) max_item, the maximum number of key values that the node can hold;
(4) key[0] to key[max_item-1], the key values stored in the node;
(5) nitem, the number of key values actually stored in the current node.
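The node layout listed above can be sketched as a small Python class. This is only an illustration under stated assumptions: the class name TTreeNode, the capacity MAX_ITEM = 4, the use of integer keys, and the helper methods is_full and kind are all hypothetical and not taken from any particular database implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_ITEM = 4  # assumed node capacity, chosen only for illustration


@dataclass
class TTreeNode:
    # key[0] .. key[nitem-1], always kept in sorted order
    keys: List[int] = field(default_factory=list)
    left_child_ptr: Optional["TTreeNode"] = None
    right_child_ptr: Optional["TTreeNode"] = None
    balance: int = 0          # right subtree height - left subtree height
    max_item: int = MAX_ITEM  # maximum number of keys the node can hold

    @property
    def nitem(self) -> int:
        """Number of keys actually stored in the node."""
        return len(self.keys)

    def min_key(self) -> int:
        return self.keys[0]   # leftmost key

    def max_key(self) -> int:
        return self.keys[-1]  # rightmost key

    def is_full(self) -> bool:
        return self.nitem >= self.max_item

    def kind(self) -> str:
        """Classify the node as 'internal', 'half-leaf', or 'leaf'."""
        children = sum(c is not None
                       for c in (self.left_child_ptr, self.right_child_ptr))
        return ("leaf", "half-leaf", "internal")[children]
```

Here nitem is derived from the length of the key list rather than stored separately, which keeps the sketch short; an implementation with a fixed-size key array would track it as its own field.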
Compared with the AVL tree, the T-tree has the following properties:
(1) as in an AVL tree, the heights of the left and right subtrees of any node in a T-tree differ by at most 1;
(2) unlike an AVL tree, a T-tree node can store multiple key values, and these key values are kept in sorted order;
(3) no key value in the left subtree of a T-tree node is greater than the leftmost key value in the node, and no key value in the right subtree is smaller than the rightmost key value in the node;
(4) to keep space utilization high, the number of key values in each internal node must not fall below a specified value, usually max_item - 2 (where max_item is the maximum number of key values a node can hold).
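The four properties above can be expressed as a small validity check. The sketch below is illustrative only: the Node class, the function name is_valid_ttree, and the default capacity of 4 are assumptions made for this example, and heights are recomputed recursively for clarity rather than efficiency.

```python
class Node:
    def __init__(self, keys, left=None, right=None):
        self.keys = list(keys)
        self.left = left
        self.right = right


def height(node):
    """Height of a subtree; an empty subtree has height 0."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def all_keys(node):
    """All keys of a subtree, in left-node-right order."""
    if node is None:
        return []
    return all_keys(node.left) + node.keys + all_keys(node.right)


def is_valid_ttree(node, max_item=4):
    if node is None:
        return True
    # (1) subtree heights differ by at most 1
    if abs(height(node.right) - height(node.left)) > 1:
        return False
    # (2) between 1 and max_item keys, kept in sorted order
    if not node.keys or len(node.keys) > max_item:
        return False
    if node.keys != sorted(node.keys):
        return False
    # (3) left subtree <= leftmost key, right subtree >= rightmost key
    if any(k > node.keys[0] for k in all_keys(node.left)):
        return False
    if any(k < node.keys[-1] for k in all_keys(node.right)):
        return False
    # (4) internal nodes hold at least max_item - 2 keys
    is_internal = node.left is not None and node.right is not None
    if is_internal and len(node.keys) < max_item - 2:
        return False
    return is_valid_ttree(node.left, max_item) and is_valid_ttree(node.right, max_item)
```

A checker like this is mainly useful in tests, for example to confirm that a tree is still well formed after a sequence of inserts and deletes.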

2. T-tree index operations
Using a T-tree as an index involves three main operations: search, insert, and delete. Insert and delete are both based on search. The three operations are described below.
(1) Searching a T-tree is similar to searching a binary tree. The difference is that the comparison at each node is not made against every key value in the node. Instead, first check whether the target key value falls within the range bounded by the leftmost and rightmost key values of the current node. If it does, use binary search on the key list of the current node. If the target key value is smaller than the leftmost key value of the current node, continue the search in the left child of the current node; if it is greater than the rightmost key value, continue in the right child.
(2) Insertion into a T-tree is based on search. Use the search operation to locate the insertion position of the target key value, recording the last node encountered during the search. If the search succeeds, that is, a node whose range bounds the target key value is found, check whether the node has enough free space. If it does, insert the target key value into the node and stop. If it does not, insert the target key value into the node anyway, remove the node's leftmost key value and insert it into the node's left subtree (a recursive insert operation), and then stop. If the search fails, allocate a new node and place the target key value in it; then, depending on how the target key value compares with the minimum and maximum key values of the last node visited, link the new node as that node's left or right child. Finally, check whether the balance factors of the tree still satisfy the balance condition, and perform a rotation if they do not.

(3) Deletion from a T-tree is also based on search. Use the search operation to locate the target key value. If the search fails, stop. Otherwise, let N be the node containing the target key value, and delete the target key value from N. If N is empty after the deletion, delete node N and check the balance factors of the tree to decide whether a rotation is needed. If the number of key values in N falls below the minimum, then, depending on the balance factor of N, either move the maximum key value from N's left subtree or the minimum key value from N's right subtree into N to fill the gap.
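The three operations can be sketched together in Python. This is a deliberately simplified illustration, not a complete T-tree: rebalancing (rotation) is omitted entirely, so the resulting tree may become unbalanced. The names Node, search, insert, delete, and inorder, and the capacity MAX_ITEM = 4, are assumptions made for this example. Two further simplifications are assumed: when no bounding node is found, the key is placed in the last visited node if it still has room (a refinement not stated in the text above), and an underfull internal node always borrows from its left subtree instead of choosing a side by balance factor.

```python
from bisect import bisect_left, insort

MAX_ITEM = 4             # assumed node capacity (illustrative)
MIN_ITEM = MAX_ITEM - 2  # minimum key count for an internal node


class Node:
    def __init__(self, keys):
        self.keys = list(keys)  # sorted key values
        self.left = None
        self.right = None


def search(node, key):
    """Return the node holding key, or None if the search fails."""
    while node is not None:
        if key < node.keys[0]:
            node = node.left
        elif key > node.keys[-1]:
            node = node.right
        else:  # key is bounded by this node: binary search inside it
            i = bisect_left(node.keys, key)
            return node if node.keys[i] == key else None
    return None


def insert(root, key):
    """Insert key and return the subtree root (no rebalancing)."""
    if root is None:
        return Node([key])
    node, last = root, None
    while node is not None:
        last = node
        if key < node.keys[0]:
            node = node.left
        elif key > node.keys[-1]:
            node = node.right
        else:
            break  # found a bounding node
    if node is not None:
        insort(node.keys, key)
        if len(node.keys) > MAX_ITEM:
            # overflow: push the leftmost key down into the left subtree
            node.left = insert(node.left, node.keys.pop(0))
    elif len(last.keys) < MAX_ITEM:
        insort(last.keys, key)   # assumed refinement: reuse free space
    elif key < last.keys[0]:
        last.left = Node([key])
    else:
        last.right = Node([key])
    return root


def delete(node, key):
    """Delete one occurrence of key and return the subtree root (no rebalancing)."""
    if node is None:
        return None
    if key < node.keys[0]:
        node.left = delete(node.left, key)
    elif key > node.keys[-1]:
        node.right = delete(node.right, key)
    elif key in node.keys:
        node.keys.remove(key)
        internal = node.left is not None and node.right is not None
        if internal and len(node.keys) < MIN_ITEM:
            donor = node.left            # borrow the largest key on the left
            while donor.right is not None:
                donor = donor.right
            borrowed = donor.keys[-1]
            node.left = delete(node.left, borrowed)
            node.keys.insert(0, borrowed)
        elif not node.keys:
            # empty leaf or half-leaf: splice it out of the tree
            return node.left if node.left is not None else node.right
    return node


def inorder(node):
    """All keys in ascending order."""
    if node is None:
        return []
    return inorder(node.left) + node.keys + inorder(node.right)
```

A short usage example: starting from root = None, calling root = insert(root, k) for each key builds the tree, search(root, k) returns the node holding k (or None), and root = delete(root, k) removes a key. A full implementation would follow each insert or delete with the balance check and rotation described in the next section.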

3. Key Technologies for T-tree Indexing
To implement a T-tree index, you need to implement T-tree search, insertion, and deletion. Insertion and deletion are built on search, and the key to maintaining the T-tree is rotation: if inserting or deleting a key value unbalances the tree, a T-tree rotation is required to restore balance.
After an insertion, check each node on the path from the newly created node to the root until one of the following two conditions occurs: the two subtrees of a checked node have equal height, in which case no rotation is required; or the heights of the two subtrees of a checked node differ by more than 1, in which case a single rotation at that node is sufficient.
After a deletion, similarly check each node on the path from the parent of the deleted node to the root. Whenever the height difference between the left and right subtrees of a node is found to exceed the limit, perform a rotation. Unlike the insert case, the check cannot stop after a rotation; it must continue until the root has been checked.
It follows that an insertion needs at most one rotation to restore the T-tree to balance, whereas a deletion may trigger a chain reaction upward, so that nodes at higher levels may also need to be rotated and several rotations may be required.
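The upward check after an insertion can be sketched as follows. The function name node_to_rotate_after_insert and the Node class are hypothetical, and the path is assumed to be supplied by the caller as a list running from the parent of the newly created node up to the root. Heights are recomputed on each step for clarity; a practical implementation would more likely maintain the stored balance factor instead.

```python
class Node:
    def __init__(self, keys, left=None, right=None):
        self.keys = list(keys)
        self.left = left
        self.right = right


def height(node):
    """Height of a subtree; an empty subtree has height 0."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def node_to_rotate_after_insert(path):
    """Walk from the new node's parent toward the root.

    Returns the node at which a single rotation is needed,
    or None if the tree is still balanced.
    """
    for node in path:
        diff = height(node.right) - height(node.left)
        if diff == 0:
            return None   # equal heights: nothing above can have changed
        if abs(diff) > 1:
            return node   # one rotation here restores balance
    return None           # reached the root without violating the limit
```

For deletion the same walk would apply, except that it would not return after the first rotation and would instead continue up to the root.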
Rotation is required to rebalance the T-tree, and it is the most important and most difficult operation on the T-tree. The following describes T-tree rotation. Rotations fall into four cases: the rotation needed when an insertion (or deletion) leaves the left subtree of the left child too tall is called an LL rotation, and LR, RR, and RL rotations are defined analogously. Rotations triggered by insertion and by deletion are handled in the same way.
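As an illustration, the two single rotations (LL and RR) can be sketched as pointer rearrangements on whole nodes, in the same way as in an AVL tree. The function names rotate_ll and rotate_rr and the Node class are assumptions made for this example. The double rotations (LR and RL) and the updating of stored balance factors are not shown.

```python
class Node:
    def __init__(self, keys, left=None, right=None):
        self.keys = list(keys)
        self.left = left
        self.right = right


def rotate_ll(a):
    """LL case: the left child's left subtree is too tall.

    The left child b becomes the new subtree root and a
    becomes its right child. Returns the new subtree root.
    """
    b = a.left
    a.left = b.right
    b.right = a
    return b


def rotate_rr(a):
    """RR case: mirror image of rotate_ll."""
    b = a.right
    a.right = b.left
    b.left = a
    return b


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + node.keys + inorder(node.right)
```

Because only child pointers change and the keys inside each node stay where they are, the in-order sequence of keys is the same before and after the rotation; only the height of the subtree changes. The caller is expected to attach the returned node in place of the old subtree root.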
