[Repost] An introduction to the B tree (binary search tree), B-tree, B+ tree, B* tree, red-black tree, binary sort tree, and trie (double-array dictionary lookup tree)

Source: Internet
Author: User

The "B tree" in the title is the binary search tree:

1. Every non-leaf node has at most two children (left and right);

2. Every node stores one key;

3. The left pointer of a non-leaf node points to a subtree whose keys are smaller than the node's key, and the right pointer points to a subtree whose keys are larger;

[Figure: an example binary search tree]

Searching a binary search tree starts at the root: if the query key equals the node's key, it is a hit; if it is smaller, move to the left child; if it is larger, move to the right child. If the required left or right child pointer is empty, report that the key is not found;
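The search rule above can be sketched in a few lines of Python. This is only an illustration: the names Node, insert and search are hypothetical helpers, integer keys are assumed, and duplicate keys are simply ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type for illustration: one key, two child pointers.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def search(root: Optional[Node], key: int) -> bool:
    """Equal is a hit, smaller goes left, larger goes right."""
    node = root
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False  # reached an empty child pointer: key not present
```

For example, after inserting 5, 3, 8, 1, 4, searching for 4 follows 5 → 3 → 4 and hits, while searching for 7 goes 5 → 8 and stops at the empty left child of 8.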

If every non-leaf node keeps roughly the same number of nodes in its left and right subtrees (that is, the tree is balanced), the search performance of a binary search tree approaches that of binary search. Its advantage over binary search on a contiguous array is that changing the tree (inserting or deleting nodes) does not require moving large blocks of memory; it usually costs only a constant number of pointer changes;

[Figure: a balanced binary search tree]

However, after a series of insertions and deletions, a binary search tree can end up with a different structure:

[Figure: left, a balanced binary search tree; right, a degenerate tree holding the same keys]

The tree on the right is still a binary search tree, but its search performance has already degraded to linear. The same set of keys can produce different tree structures, so when using a binary search tree we want to keep the structure close to the left figure and avoid the right one; this is the so-called "balance" problem;
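A minimal Python sketch makes the degradation concrete. It inserts the same seven keys into a plain, unbalanced binary search tree in two different orders and measures the height (nodes on the longest root-to-leaf path). The helpers insert, height and build are hypothetical names chosen for this illustration.

```python
from typing import Iterable, Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    # Plain binary search tree insertion, with no rebalancing at all.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def height(root: Optional[Node]) -> int:
    # Number of nodes on the longest root-to-leaf path (empty tree = 0).
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


def build(keys: Iterable[int]) -> Optional[Node]:
    # Hypothetical helper: insert the keys one by one, in the given order.
    root = None
    for k in keys:
        root = insert(root, k)
    return root
```

Inserting 4, 2, 6, 1, 3, 5, 7 gives a tree of height 3, like the left figure; inserting 1, 2, ..., 7 in sorted order gives a chain of height 7, like the right figure, so a search may have to visit every node.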

Binary search trees used in practice add a balancing algorithm to the basic binary search tree, giving a "balanced binary tree". The key to a balanced binary tree is the balancing algorithm that keeps the nodes evenly distributed; it is a strategy applied when inserting and deleting nodes.
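The article does not name a specific balancing algorithm. As one common example, here is a sketch of AVL-tree insertion in Python: after each insertion the subtree heights are recomputed, and a single or double rotation restores balance whenever the left and right heights differ by more than one. The names h, update, rotate_left, rotate_right and insert are hypothetical helpers for this illustration.

```python
from typing import Optional


class Node:
    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None
        self.height = 1  # height of the subtree rooted here, in nodes


def h(node: Optional[Node]) -> int:
    return node.height if node else 0


def update(node: Node) -> None:
    node.height = 1 + max(h(node.left), h(node.right))


def rotate_right(y: Node) -> Node:
    # The left child x becomes the new subtree root.
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x: Node) -> Node:
    # The right child y becomes the new subtree root.
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node: Optional[Node], key: int) -> Node:
    """AVL insertion: ordinary BST insert, then rebalance on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        return node  # ignore duplicates
    update(node)
    balance = h(node.left) - h(node.right)
    if balance > 1:  # left-heavy
        if key > node.left.key:  # left-right case: rotate the child first
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:  # right-heavy
        if key < node.right.key:  # right-left case: rotate the child first
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node
```

Inserting 1 through 7 in sorted order, which produced a chain of height 7 without balancing, now yields a perfectly balanced tree of height 3 rooted at 4. The rotations only change a few pointers, which keeps the advantage over a contiguous array described above.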


B-Tree

The B-tree is a multi-way search tree (not binary):

1. Any non-leaf node has at most M children, with M > 2;

2. The root has between 2 and M children (unless the root is itself a leaf);

3. Every non-leaf node other than the root has between ⌈M/2⌉ and M children;

4. Every node stores at least ⌈M/2⌉-1 and at most M-1 keys (the root may hold as few as one);

5. Number of keys in a non-leaf node = number of child pointers - 1;

6. The keys of a non-leaf node are K[1], K[2], ..., K[M-1], with K[i] < K[i+1];

7. The child pointers of a non-leaf node are P[1], P[2], ..., P[M], where P[1] points to the subtree with keys less than K[1], P[M] points to the subtree with keys greater than K[M-1], and every other P[i] points to the subtree with keys in (K[i-1], K[i]);

8. All leaf nodes are on the same level;
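The rules above can be turned into a small checker. The following Python sketch is only an illustration: BNode and check_btree are hypothetical names, keys are assumed to be integers, and the tree is assumed non-empty. Rules 2 and 3 are not checked separately because they follow from rules 4 and 5 (children = keys + 1).

```python
from dataclasses import dataclass, field
from math import ceil
from typing import List, Optional, Set


@dataclass
class BNode:
    # Hypothetical node type: sorted keys plus child pointers (empty => leaf).
    keys: List[int]
    children: List["BNode"] = field(default_factory=list)


def check_btree(root: BNode, m: int) -> bool:
    """Check rules 4-8 above for a non-empty B-tree of order m."""
    leaf_depths: Set[int] = set()

    def walk(node: BNode, depth: int, lo: Optional[int], hi: Optional[int],
             is_root: bool) -> bool:
        keys = node.keys
        # Rule 4: between ceil(m/2)-1 and m-1 keys; the root may hold just one.
        min_keys = 1 if is_root else ceil(m / 2) - 1
        if not (min_keys <= len(keys) <= m - 1):
            return False
        # Rule 6: keys strictly increasing inside the node.
        if any(a >= b for a, b in zip(keys, keys[1:])):
            return False
        # Rule 7, seen from the child: keys lie strictly inside the parent's range.
        if (lo is not None and keys[0] <= lo) or (hi is not None and keys[-1] >= hi):
            return False
        if not node.children:
            leaf_depths.add(depth)
            return True
        # Rule 5: one more child pointer than keys.
        if len(node.children) != len(keys) + 1:
            return False
        bounds = [lo] + keys + [hi]
        return all(
            walk(child, depth + 1, bounds[i], bounds[i + 1], False)
            for i, child in enumerate(node.children)
        )

    # Rule 8: every leaf sits at the same depth.
    return walk(root, 0, None, None, True) and len(leaf_depths) == 1
```

With M=3, a root [10] over leaves [5] and [15, 20] passes; moving a leaf below another leaf (so leaves sit on different levels) or putting three keys in one node fails.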

[Figure: example B-tree with M=3]

Searching a B-tree starts at the root: binary-search the node's ordered key sequence; if the key is found, the search ends; otherwise move to the child whose key range contains the query key. Repeat until the required child pointer is empty or the node is already a leaf;
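A minimal Python sketch of this search, assuming a hypothetical BNode type that holds sorted integer keys and a list of children (empty for a leaf). The standard library's bisect_left performs the binary search inside each node; if the key is not in the node, the returned index is exactly the child whose range contains the key.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BNode:
    keys: List[int]                                        # sorted keys
    children: List["BNode"] = field(default_factory=list)  # empty => leaf


def btree_search(node: Optional[BNode], key: int) -> Optional[Tuple[BNode, int]]:
    """Return (node, index) where key was found, or None (hypothetical helper)."""
    while node is not None:
        i = bisect_left(node.keys, key)  # binary search inside the node
        if i < len(node.keys) and node.keys[i] == key:
            return node, i               # hit, possibly at a non-leaf node
        if not node.children:
            return None                  # reached a leaf without a hit
        node = node.children[i]          # keys[i-1] < key < keys[i]
    return None
```

Note that a search for a key stored in the root stops there immediately, which is feature 3 below: a B-tree search may end at a non-leaf node.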

B-tree features:

1. The key set is distributed throughout the tree;

2. Every key appears in exactly one node;

3. A search may end at a non-leaf node;

4. Its search performance is equivalent to a single binary search over the complete key set;

5. The tree controls its height automatically;

Because every non-leaf node other than the root must have at least ⌈M/2⌉ children, a minimum node utilization is guaranteed, and in the worst case the height h of the tree (counting edges from the root to a leaf) satisfies:

$$h \le \log_{\lceil M/2 \rceil} \frac{N+1}{2}$$

where M is the maximum number of subtrees of a non-leaf node and N is the total number of keys;
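The bound comes from counting the fewest keys a tree of height h can hold. Let t = ⌈M/2⌉: the root holds at least 1 key and has at least 2 children, level i (for i ≥ 1) has at least 2t^(i-1) nodes, and each of those holds at least t-1 keys, so N ≥ 1 + 2(t-1)(1 + t + ... + t^(h-1)) = 2t^h - 1, which rearranges to the inequality above. A small Python sketch (max_btree_height is a hypothetical helper) evaluates it:

```python
import math


def max_btree_height(n_keys: int, m: int) -> int:
    """Worst-case height (edges from root to a leaf) of an order-m B-tree with n_keys >= 1 keys."""
    t = math.ceil(m / 2)  # minimum number of children of a non-root internal node
    return math.floor(math.log((n_keys + 1) / 2, t))
```

For one million keys, M=3 allows a height of up to 18, while M=1000 keeps the height at 2 or less, so at most three nodes are visited per search. This is why large node sizes suit disk-based indexes.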

So the search performance of a B-tree is always equivalent to binary search (regardless of M), and B-trees have no separate balancing problem;

Because of the M/2 lower bound, inserting into a full node requires splitting it into two nodes of about M/2 each, and when deleting, two sibling nodes that fall below M/2 must be merged;
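Node splitting on insertion can be sketched in Python as follows. This is an illustration under stated assumptions, not the article's code: it uses a minimum degree t, so every node holds at most 2t-1 keys (that is, M = 2t), and it splits any full node it meets on the way down (top-down splitting), so the parent always has room for the promoted median key. BNode and BTree are hypothetical names, and deletion with merging is not shown.

```python
from bisect import bisect_left, insort
from dataclasses import dataclass, field
from typing import List


@dataclass
class BNode:
    keys: List[int] = field(default_factory=list)
    children: List["BNode"] = field(default_factory=list)  # empty => leaf


class BTree:
    """Hypothetical B-tree with minimum degree t (at most 2t-1 keys per node)."""

    def __init__(self, t: int = 2) -> None:
        self.t = t
        self.root = BNode()

    def _split_child(self, parent: BNode, i: int) -> None:
        # Split the full child parent.children[i]; its median key moves up.
        t = self.t
        full = parent.children[i]
        right = BNode(full.keys[t:], full.children[t:])
        parent.keys.insert(i, full.keys[t - 1])
        parent.children.insert(i + 1, right)
        full.keys = full.keys[: t - 1]
        full.children = full.children[:t]

    def insert(self, key: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:  # full root: the tree grows a level
            self.root = BNode([], [self.root])
            self._split_child(self.root, 0)
        node = self.root
        while node.children:  # descend, splitting full children on the way
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)  # leaves always have room here

    def inorder(self) -> List[int]:
        out: List[int] = []

        def walk(n: BNode) -> None:
            if not n.children:
                out.extend(n.keys)
                return
            for i, child in enumerate(n.children):
                walk(child)
                if i < len(n.keys):
                    out.append(n.keys[i])

        walk(self.root)
        return out
```

Inserting 1 to 10 with t=2 ends with root [4] over children [2] and [6, 8]; the tree only gets taller when the root itself splits, which is why all leaves stay on the same level.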


B+ Tree

The B+ tree is a variant of the B-tree and is also a multi-way search tree:

1. Its definition is basically the same as the B-tree, except for the following:

2. A non-leaf node has the same number of child pointers as keys;

3. The child pointer P[i] of a non-leaf node points to the subtree whose keys fall in [K[i], K[i+1]) (in a B-tree the interval is open);

4. A chain pointer is added to every leaf node;

5. All keys appear in the leaf nodes;

[Figure: example B+ tree with M=3]

B+ tree search is basically the same as B-tree search, except that a B+ tree only hits at leaf nodes (a B-tree can hit at non-leaf nodes); its performance is also equivalent to one binary search over the complete key set.

B+ tree features:

1. All keys appear in the linked list of leaf nodes (a dense index), and the keys in that list are in order;

2. A search cannot hit at a non-leaf node;

3. Non-leaf nodes act as an index of the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the (key) data;

4. Better suited to file indexing systems;
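The following Python sketch shows why the leaf chain matters. It follows the definition above (one child pointer per key, with child i covering [K[i], K[i+1])): a lookup always descends to a leaf, and a range query needs only one descent followed by a walk along the leaf chain. Leaf, Inner, find_leaf, contains and range_scan are hypothetical names, and the tree is built by hand rather than by an insertion routine.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    next_leaf: Optional["Leaf"] = None  # chain pointer to the next leaf


@dataclass
class Inner:
    # One child per key; children[i] covers keys in [keys[i], keys[i+1]).
    keys: List[int]
    children: List[Union["Inner", Leaf]]


def find_leaf(node: Union[Inner, Leaf], key: int) -> Leaf:
    # Non-leaf nodes are only an index: the descent always ends at a leaf.
    while isinstance(node, Inner):
        i = max(bisect_right(node.keys, key) - 1, 0)
        node = node.children[i]
    return node


def contains(root: Union[Inner, Leaf], key: int) -> bool:
    return key in find_leaf(root, key).keys


def range_scan(root: Union[Inner, Leaf], lo: int, hi: int) -> List[int]:
    """All keys in [lo, hi]: one descent, then follow the leaf chain in order."""
    out: List[int] = []
    leaf: Optional[Leaf] = find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next_leaf
    return out
```

With leaves [1, 3] → [5, 7] → [9, 11] under an index node [1, 5, 9], range_scan(root, 3, 9) returns [3, 5, 7, 9] without going back up through the index, which is one reason B+ trees suit file and database indexes.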

B* Tree

The B* tree is a variant of the B+ tree that adds sibling pointers to the non-root, non-leaf nodes of the B+ tree;

A B* tree requires a non-leaf node to hold at least (2/3)*M keys, so the minimum block utilization is 2/3 (instead of 1/2 in a B+ tree);

B+ tree split: when a node is full, allocate a new node, copy 1/2 of the original node's data into it, and add a pointer to the new node in the parent. A B+ tree split affects only the original node and its parent, not the siblings, so no sibling pointers are needed;

B* tree split: when a node is full and its next sibling is not full, move part of the data into the sibling, insert the key into the original node, and finally update the sibling's key in the parent (because the sibling's key range has changed). If the sibling is also full, add a new node between the original node and the sibling, copy 1/3 of the data from each of them into the new node, and finally add the new node's pointer to the parent;

Therefore, a B* tree allocates new nodes less often than a B+ tree, and its space utilization is higher.
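The two overflow strategies can be compared on plain sorted key lists. This Python sketch is a simplification, not a full tree implementation: nodes are modelled as lists of keys with capacity cap, the overflowing node is assumed full, the new key is assumed to belong in that node's range, and parent updates are only described in comments. bplus_split and bstar_insert are hypothetical names.

```python
from typing import List


def bplus_split(node: List[int], key: int) -> List[List[int]]:
    """B+ style: a full node plus one new key becomes two half-full nodes."""
    keys = sorted(node + [key])
    mid = len(keys) // 2
    return [keys[:mid], keys[mid:]]


def bstar_insert(node: List[int], sibling: List[int], key: int, cap: int) -> List[List[int]]:
    """B* style overflow handling for a full node and its next (right) sibling.

    Assumes key belongs in node's range, i.e. it is smaller than every key in sibling.
    """
    keys = sorted(node + [key])
    if len(sibling) < cap:
        # Sibling has room: shift the largest key across instead of allocating
        # a new node (the parent's separator key for the sibling is then updated).
        return [keys[:-1], [keys[-1]] + sibling]
    # Both full: spread the 2*cap + 1 keys over three nodes, about 2/3 full each.
    merged = sorted(keys + sibling)
    third = len(merged) // 3
    return [merged[:third], merged[third:2 * third], merged[2 * third:]]
```

With cap=4, the B+ split of [1, 2, 3, 4] plus 5 leaves a node only half full ([1, 2]); the B* version either fills the sibling without allocating anything, or produces three nodes of 3 keys each, keeping every node at least 2/3 full.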

Summary

Binary search tree ("B tree"): a binary tree; each node stores one key; equal means a hit, smaller goes to the left child, larger goes to the right child;

B-tree: a multi-way search tree; each node stores between ⌈M/2⌉-1 and M-1 keys; non-leaf nodes store pointers to the children covering each key range;

All keys appear in the tree exactly once, and a search can hit at a non-leaf node;

B+ tree: builds on the B-tree by adding linked-list pointers to the leaf nodes; all keys appear in the leaves, and non-leaf nodes serve as an index of the leaves; a B+ tree search always hits at a leaf node;

B* tree: builds on the B+ tree by adding linked-list pointers to the non-leaf nodes as well, and raises the minimum node utilization from 1/2 to 2/3;

Red-Black Tree (rbtree), Binary Sort Tree

map (the C++ STL ordered container) is typically implemented with a red-black tree. A red-black tree (rbtree) is a balanced binary tree: every path from the root down to a leaf passes through the same number of black nodes, so the longest path is at most twice the shortest and searches take O(log n). Search, insertion, and deletion all run in O(log n). When the data is completely static, a red-black tree has little advantage, and a hash table may be more appropriate.
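The balance guarantee comes from the standard red-black rules: the root is black, a red node never has a red child, and every path from a node down to an empty (nil) child passes through the same number of black nodes. A minimal Python checker for those rules (RBNode, black_height and is_red_black are hypothetical names; key ordering is not checked here) makes them concrete:

```python
from dataclasses import dataclass
from typing import Optional

RED, BLACK = "red", "black"


@dataclass
class RBNode:
    key: int
    color: str
    left: Optional["RBNode"] = None
    right: Optional["RBNode"] = None


def black_height(node: Optional[RBNode]) -> int:
    """Return the subtree's black-height, or -1 if a red-black rule is broken."""
    if node is None:
        return 1  # nil leaves count as black
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                return -1  # red node with a red child
    lh, rh = black_height(node.left), black_height(node.right)
    if lh == -1 or rh == -1 or lh != rh:
        return -1  # unequal black counts on the two sides
    return lh + (1 if node.color == BLACK else 0)


def is_red_black(root: Optional[RBNode]) -> bool:
    return (root is None or root.color == BLACK) and black_height(root) != -1
```

Because red nodes cannot be adjacent, a path can contain at most one red node per black node, so no root-to-leaf path is more than twice as long as another.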

hash_map is a hash table: it takes more memory and lookups are faster, but computing the hash takes time.

Overall, hash_map lookups are faster than map lookups: a hash_map lookup takes roughly constant time regardless of the data size, while a map lookup is O(log n). A constant is not necessarily smaller than log(n), though, because computing the hash function also takes time. If efficiency matters, especially once the number of elements reaches a certain order of magnitude, consider hash_map. But if you are particularly strict about memory usage and want the program to use as little memory as possible, be careful: hash_map may embarrass you, especially when you have a great many hash_map objects, which makes memory even harder to control, and constructing a hash_map is slow.

Do you know how to choose now? Weigh three factors: lookup speed, data volume, and memory usage.

Trie Tree (Double-Array Dictionary Lookup Tree)

A trie can be used for ordinary dictionary lookups as well as for index lookups.
Each node corresponds to a state of a DFA, and reaching a terminating state ends the lookup. A search is thus a sequence of state transitions.
For a given string a1, a2, a3, ..., an, a trie lookup finishes after n steps. That still seems less efficient than B-tree search, whose complexity is about $\log_t \frac{N+1}{2}$, where N is the number of keys (not the string length) and t = ⌈M/2⌉ is the minimum number of children per node; as t grows, search becomes more efficient. No wonder DB2 sets its access memory to the size of one virtual-memory page: frame switching becomes less frequent and frequent page switching is not needed.
Suppose we have the keywords and, as, at, cn, com. How do we build the trie?

[Figure: trie built from the keywords and, as, at, cn, com]

From the figure, we can observe a few interesting properties.

First: the root node contains no character, and every node other than the root contains exactly one character.

Second: concatenating the characters along the path from the root to a node gives the string corresponding to that node.

Third: a common prefix of several words is stored only once, as shared character nodes.
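A minimal Python sketch of building the trie for and, as, at, cn, com. Each node maps one character to a child, so the root holds no character and shared prefixes reuse existing nodes. Trie, TrieNode and node_count are hypothetical names for this illustration.

```python
from typing import Dict


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}  # one character per edge
        self.is_word = False                        # a word ends at this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()  # the root itself holds no character

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Reuse the existing child if the prefix is already present.
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word: str) -> bool:
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_word

    def node_count(self) -> int:
        # Count every node, including the root, with an explicit stack.
        stack, count = [self.root], 0
        while stack:
            node = stack.pop()
            count += 1
            stack.extend(node.children.values())
        return count
```

The five words contain 12 characters in total, but the trie needs only 9 character nodes plus the root, because "a" is shared by and, as, at and "c" by cn, com. Note that "an" is a path in the trie but not a stored word, which is what the is_word flag records.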

Applications:

Now that we have learned about tries, we should know what they are for.

First: word frequency statistics.

Someone may say that word frequency counting is simple: a hash table or a heap will do. But what if memory is limited? Can you still do it that way? Here we can use a trie to compress the space, because a common prefix is stored in a single node.
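A sketch of word counting with a trie in Python: each node keeps a counter for the word that ends there, and a depth-first walk lists the words in lexicographic order. Node, count_words and frequencies are hypothetical names. In Python the per-node dictionaries are themselves fairly heavy, so this shows the structure rather than an actual memory saving; the saving in the article's sense depends on a compact node layout and on how much the words share prefixes.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.count = 0  # how many times the word ending here was seen


def count_words(words: List[str]) -> Node:
    root = Node()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, Node())
        node.count += 1
    return root


def frequencies(root: Node) -> List[Tuple[str, int]]:
    """List (word, count) pairs in lexicographic order via depth-first traversal."""
    out: List[Tuple[str, int]] = []

    def dfs(node: Node, prefix: str) -> None:
        if node.count:
            out.append((prefix, node.count))
        for ch in sorted(node.children):
            dfs(node.children[ch], prefix + ch)

    dfs(root, "")
    return out
```

For the input at, and, at, as, and, at, the result is [("and", 2), ("as", 1), ("at", 3)], and the shared "a" prefix is stored only once.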

Second: prefix matching.

Take the figure above: if I want all strings starting with "a", the answer is clearly and, as, at. Without a trie, how would you do it? The brute-force approach compares the prefix against every string in the collection. A trie is different: it reaches the node for the prefix in O(h) steps, where h is the length of the word being searched for, and every word in that node's subtree matches.

You could call it an instant win.
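The two steps of prefix matching, walking down the prefix and then collecting the subtree, look like this in a Python sketch. Node, build and starts_with are hypothetical names, and the result is sorted only to make the output deterministic.

```python
from typing import Dict, List


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.is_word = False


def build(words: List[str]) -> Node:
    root = Node()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def starts_with(root: Node, prefix: str) -> List[str]:
    # Step 1: follow the prefix, O(h) steps for a prefix of length h.
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    # Step 2: every word in this subtree shares the prefix; collect them.
    out: List[str] = []
    stack = [(node, prefix)]
    while stack:
        cur, s = stack.pop()
        if cur.is_word:
            out.append(s)
        for ch, child in cur.children.items():
            stack.append((child, s + ch))
    return sorted(out)
```

With the words and, as, at, cn, com, starts_with(root, "a") returns ["and", "as", "at"]; step 2 costs time proportional to the size of the matching subtree, not to the whole collection.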

For example, suppose the string "and" has the count 1 and we insert it into the trie. In the spirit of dynamic programming, the count 1 is added to every node along its path.

Then we can easily find how many strings have the prefix "a", "an", or "and".
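A minimal Python sketch of this counting idea: insert adds the string's count to every node on its path, so the counter at the node for a prefix is the total count of strings with that prefix. Node, insert and prefix_count are hypothetical names; the root is never incremented, so an empty prefix returns 0.

```python
from typing import Dict


class Node:
    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.passes = 0  # total count of inserted strings whose path goes through here


def insert(root: Node, word: str, count: int = 1) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, Node())
        node.passes += count  # add the count to every node on the path


def prefix_count(root: Node, prefix: str) -> int:
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.passes
```

After inserting "and" with count 1, "as" with count 2 and "at" with count 1, the prefix "a" has count 4, while "an" and "and" each have count 1; each query takes O(h) steps for a prefix of length h.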
