Search Algorithm Summary (1): Sequential Search, Binary Search, Binary Search Tree, Red-Black Tree

Source: Internet
Author: User

1. Sequential search

In sequential search, we iterate through all the keys in the table in order and use the equals() method to find a matching key.

Advantages: It places no specific requirements on the structure of the table, which can be implemented with an array or a linked list, and the algorithm is simple.

Disadvantage: It is inefficient when the number of entries N is large.

Time complexity: On a search hit, the worst case is O(N) and the best case is O(1); the average is about N/2 comparisons, which is still O(N). A search miss always requires N comparisons.

Inserting N distinct keys into an empty table requires about N^2/2 comparisons, because each insert must first search the table for the key.
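As a minimal sketch of the idea above, the following Java class stores keys in an unordered list and scans them with equals(). The class name SequentialSearchTable and its methods are hypothetical names chosen for illustration; they are not from the original post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class: an unordered symbol table searched sequentially.
class SequentialSearchTable<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    // Scan the keys in order; equals() decides a match.
    public V get(K key) {
        for (int i = 0; i < keys.size(); i++) {
            if (keys.get(i).equals(key)) {
                return values.get(i);   // search hit
            }
        }
        return null;                    // search miss: every key was compared
    }

    // An insert first searches for the key, which is why N distinct
    // inserts into an empty table cost about N^2/2 comparisons in total.
    public void put(K key, V value) {
        for (int i = 0; i < keys.size(); i++) {
            if (keys.get(i).equals(key)) {
                values.set(i, value);   // key already present: overwrite
                return;
            }
        }
        keys.add(key);
        values.add(value);
    }

    public int size() {
        return keys.size();
    }
}
```

The same scan works over a linked list; only the storage changes, not the cost.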

2. Binary search based on ordered array

When searching, we first compare the search key with the middle key of the subarray. If the search key is less than the middle key, we continue searching in the left subarray; if it is greater, we continue in the right subarray; otherwise the middle key is the key we are looking for.
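The procedure just described can be sketched in Java in both an iterative and a recursive form. This is an illustrative sketch over a sorted int array; the class name BinarySearchDemo and the method names are assumptions, not the original author's code.

```java
// Hypothetical illustration class: binary search over a sorted int array.
class BinarySearchDemo {

    // Iterative (non-recursive) form: returns the index of key, or -1 if absent.
    static int indexOf(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;        // avoids overflow of lo + hi
            if (key < sorted[mid]) {
                hi = mid - 1;                    // continue in the left subarray
            } else if (key > sorted[mid]) {
                lo = mid + 1;                    // continue in the right subarray
            } else {
                return mid;                      // the middle key is the match
            }
        }
        return -1;
    }

    // Recursive form of the same search on the subarray sorted[lo..hi].
    static int indexOfRecursive(int[] sorted, int key, int lo, int hi) {
        if (lo > hi) {
            return -1;
        }
        int mid = lo + (hi - lo) / 2;
        if (key < sorted[mid]) {
            return indexOfRecursive(sorted, key, lo, mid - 1);
        } else if (key > sorted[mid]) {
            return indexOfRecursive(sorted, key, mid + 1, hi);
        }
        return mid;
    }
}
```

Each comparison halves the remaining subarray, so a search uses at most about lg N + 1 comparisons.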


In general, binary search is much faster than sequential search. For a static table (one that does not allow insertions), it is worthwhile to sort the keys at initialization time.

Binary search on an ordered array is not suitable for large dynamic tables (those with many insert operations): inserting a key into an ordered array is still O(N), because the larger keys must be shifted to make room.

The core question is whether we can find an algorithm and data structure that guarantee logarithmic performance for both search and insert. The answer is an exciting "yes"!

How can we achieve this goal? To support efficient insertion, we seem to need a linked structure. But binary search cannot be used on a singly linked list, because its efficiency comes from being able to reach the middle element of any subarray quickly through an index, whereas the only way to reach the middle element of a linked list is to traverse the list.

To combine the efficiency of binary search with the flexibility of a linked list, we need a more complex data structure. The structure that offers both is the binary search tree.

For more detail on the above, see the blog post "A Brief Introduction to Algorithms and Data Structures (6): Symbol Tables and Their Basic Implementation".

3. Binary search tree

Binary search tree: a binary tree in which each node contains a key and an associated value, and each node's key is greater than every key in its left subtree and less than every key in its right subtree.

The book Algorithms (4th Edition) gives good implementations of the binary search tree operations: insert, search, minimum and maximum key, floor and ceiling, range search, and delete. The ideas behind these algorithms are worth studying.
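To make the definition concrete, here is a minimal sketch of search, insert, and minimum key on an unbalanced binary search tree. It follows the usual textbook formulation rather than reproducing any particular implementation; the class name SimpleBST is hypothetical.

```java
// Hypothetical illustration class: an unbalanced binary search tree.
class SimpleBST<K extends Comparable<K>, V> {

    private class Node {
        K key;
        V value;
        Node left;    // subtree with smaller keys
        Node right;   // subtree with larger keys

        Node(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private Node root;

    // Search: go left or right depending on the comparison with each node's key.
    public V get(K key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else return x.value;          // search hit
        }
        return null;                      // reached a null link: search miss
    }

    // Insert: search for the key; a miss ends at a null link, where the new node goes.
    public void put(K key, V value) {
        root = put(root, key, value);
    }

    private Node put(Node x, K key, V value) {
        if (x == null) return new Node(key, value);
        int cmp = key.compareTo(x.key);
        if (cmp < 0) x.left = put(x.left, key, value);
        else if (cmp > 0) x.right = put(x.right, key, value);
        else x.value = value;             // key already present: overwrite
        return x;
    }

    // Minimum key: follow left links until there are none (null for an empty tree).
    public K min() {
        if (root == null) return null;
        Node x = root;
        while (x.left != null) x = x.left;
        return x.key;
    }
}
```

The maximum key is symmetric (follow right links), and floor, ceiling, and range search reuse the same left/right decision.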

See the blog post "A Brief Introduction to Algorithms and Data Structures (7): Binary Search Trees".


Analysis: The running time of algorithms on a binary search tree depends on the shape of the tree, which in turn depends on the order in which keys are inserted. In the best case, a tree with N nodes is perfectly balanced, and the distance from each null link to the root is about lg N.

In the worst case, there may be N nodes on the search path. In terms of analysis, the binary search tree and quicksort are almost "twins": the root of the tree corresponds to the first partitioning element in quicksort.

The good performance of a binary search tree relies on the keys being distributed randomly enough to eliminate long paths. For quicksort, we can shuffle the array first, but for the symbol table API we cannot, because the client of the symbol table controls the order of the operations. The worst case is when all keys are inserted into the symbol table in ascending or descending order.
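The effect of insertion order on tree shape can be checked with a small experiment. The sketch below builds an unbalanced binary search tree from the same keys twice, once in ascending order and once shuffled, and reports the height (number of nodes on the longest path). The class name BstHeightDemo, the key count, and the random seed are arbitrary choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration class: compares BST height for sorted vs. shuffled inserts.
class BstHeightDemo {

    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Iterative insert into an unbalanced BST; returns the (possibly new) root.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        Node x = root;
        while (true) {
            if (key < x.key) {
                if (x.left == null) { x.left = new Node(key); break; }
                x = x.left;
            } else if (key > x.key) {
                if (x.right == null) { x.right = new Node(key); break; }
                x = x.right;
            } else {
                break;                      // duplicate key: nothing to do
            }
        }
        return root;
    }

    // Height counted in nodes: 0 for an empty tree.
    static int height(Node x) {
        return x == null ? 0 : 1 + Math.max(height(x.left), height(x.right));
    }

    static int heightAfterInserting(List<Integer> keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add(i);

        // Ascending order: every node has only a right child, so height == N.
        System.out.println("sorted:   " + heightAfterInserting(keys));

        // Shuffled order: height is typically a small multiple of lg N.
        Collections.shuffle(keys, new Random(42));
        System.out.println("shuffled: " + heightAfterInserting(keys));
    }
}
```

With ascending keys the tree degenerates into a linked list, which is exactly the case a balanced tree is designed to rule out.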

This problem can still be solved, because there are also balanced binary search trees, which guarantee that the height of the tree is logarithmic in the total number of keys regardless of the order in which the keys are inserted.

4. Balanced search trees

We want to keep the binary search tree balanced so that every search finishes within about lg N comparisons.

4.1 2-3 search tree

To ensure the balance of the tree, we need some flexibility, so here we allow a node in the tree to hold multiple keys.

We first introduce the 2-3 search tree, which contains both 2-nodes (one key and two links) and 3-nodes (two keys and three links). It serves here only as a stepping stone to red-black trees and B-trees, which are introduced afterwards.

A standard binary search tree grows from the top down, whereas a 2-3 tree grows from the bottom up (by splitting nodes). This is precisely why a 2-3 tree is guaranteed to stay balanced.

See the blog post "A Brief Introduction to Algorithms and Data Structures (8): Balanced Search Trees, 2-3 Trees".

Implementing a 2-3 tree directly is inconvenient: we need to maintain two different types of nodes, copy links and other information from one node to another, convert nodes from one type to the other, and so on.

This requires not only a lot of code; the extra overhead may also make the algorithm slower than a standard binary search tree.

In fact, a small price is enough to solve this problem: the red-black tree introduced below.

4.2 Red-black binary search tree

The basic idea of the red-black binary search tree is to represent a 2-3 tree using a standard binary search tree (made up entirely of 2-nodes) plus some additional information (which stands in for the 3-nodes).

The tree has two types of links: red links, which lean left and join two 2-nodes to form a 3-node, and black links, which are the ordinary links of the 2-3 tree.

For any 2-3 tree, converting each 3-node in this way immediately yields a corresponding binary search tree.

Equivalent definition: a red-black tree is a binary search tree that satisfies the following conditions.

① Red links lean left.

② No node has two red links connected to it.

③ The tree has perfect black balance: every path from the root to a null link has the same number of black links.

A red-black tree is kept balanced by two kinds of balancing operations: rotations and color flips. It does not seek "complete balance"; it only requires a partial form of balance, which reduces the number of rotations needed and thus improves performance.
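The rotations and color flips can be sketched as follows, using the left-leaning red-black formulation that matches the definition above (red links lean left). This is a simplified illustration that supports insert and search only; the class name RedBlackSketch and the height() helper are hypothetical additions for demonstration.

```java
// Hypothetical illustration class: insert-only left-leaning red-black tree.
class RedBlackSketch<K extends Comparable<K>, V> {
    private static final boolean RED = true;
    private static final boolean BLACK = false;

    private class Node {
        K key;
        V value;
        Node left, right;
        boolean color;            // color of the link from the parent to this node

        Node(K key, V value, boolean color) {
            this.key = key;
            this.value = value;
            this.color = color;
        }
    }

    private Node root;

    private boolean isRed(Node x) {
        return x != null && x.color == RED;   // null links are black
    }

    // Turn a right-leaning red link into a left-leaning one.
    private Node rotateLeft(Node h) {
        Node x = h.right;
        h.right = x.left;
        x.left = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Turn a left-leaning red link into a right-leaning one (used temporarily).
    private Node rotateRight(Node h) {
        Node x = h.left;
        h.left = x.right;
        x.right = h;
        x.color = h.color;
        h.color = RED;
        return x;
    }

    // Split a temporary 4-node: pass the red link up to the parent.
    private void flipColors(Node h) {
        h.color = RED;
        h.left.color = BLACK;
        h.right.color = BLACK;
    }

    public void put(K key, V value) {
        root = put(root, key, value);
        root.color = BLACK;       // the root link is always black
    }

    private Node put(Node h, K key, V value) {
        if (h == null) return new Node(key, value, RED);   // new nodes attach with a red link
        int cmp = key.compareTo(h.key);
        if (cmp < 0) h.left = put(h.left, key, value);
        else if (cmp > 0) h.right = put(h.right, key, value);
        else h.value = value;

        // Restore the three conditions on the way back up the tree.
        if (isRed(h.right) && !isRed(h.left)) h = rotateLeft(h);
        if (isRed(h.left) && isRed(h.left.left)) h = rotateRight(h);
        if (isRed(h.left) && isRed(h.right)) flipColors(h);
        return h;
    }

    // Search is identical to search in an ordinary binary search tree.
    public V get(K key) {
        Node x = root;
        while (x != null) {
            int cmp = key.compareTo(x.key);
            if (cmp < 0) x = x.left;
            else if (cmp > 0) x = x.right;
            else return x.value;
        }
        return null;
    }

    // Height counted in nodes, to check that the tree stays shallow.
    public int height() {
        return height(root);
    }

    private int height(Node x) {
        return x == null ? 0 : 1 + Math.max(height(x.left), height(x.right));
    }
}
```

Inserting keys in ascending order, the worst case for the plain binary search tree, leaves this tree with logarithmic height because the fix-up steps run at every node on the insertion path.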

For details, see the blog post "A Brief Discussion of Algorithms and Data Structures (9): Balanced Search Trees, Red-Black Trees".

Red-black tree applications:

TreeMap and TreeSet are two important members of the Java Collections Framework: TreeMap is a commonly used implementation of the Map interface, and TreeSet is a commonly used implementation of the Set interface.

Although TreeMap and TreeSet implement different interface specifications, TreeSet is built on top of TreeMap, so the two are implemented in the same way. TreeMap is implemented as a red-black tree.

Because TreeMap stores its entries in a red-black tree, adding and retrieving elements is slower than with HashMap. When adding an element, TreeMap must loop to find the insertion position for the new Entry; when retrieving an element, it must likewise loop to find the matching Entry. Both searches cost key comparisons.

However, the advantage of TreeMap and TreeSet over HashMap and HashSet is that the entries in a TreeMap are always kept sorted by key according to the specified ordering, and the elements of a TreeSet are likewise always kept sorted according to the specified ordering.
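The ordering guarantee is easy to observe with the standard java.util classes. The following is a small illustrative sketch; the class name TreeMapOrderDemo, the method names, and the sample keys are made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class: shows the sorted iteration order of TreeMap and TreeSet.
class TreeMapOrderDemo {

    // Keys come back in natural (ascending) order, whatever the insertion order was.
    static List<String> sortedKeys() {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
        return new ArrayList<>(map.keySet());    // [apple, fig, pear]
    }

    // A comparator passed to the constructor defines the "specified ordering".
    static List<Integer> descendingElements() {
        TreeSet<Integer> set = new TreeSet<>(Comparator.reverseOrder());
        set.add(2);
        set.add(9);
        set.add(5);
        return new ArrayList<>(set);             // [9, 5, 2]
    }

    // Ordered operations such as firstKey() are available because the keys are sorted.
    static String smallestKey() {
        TreeMap<String, Integer> map = new TreeMap<>();
        map.put("pear", 3);
        map.put("apple", 1);
        return map.firstKey();                   // "apple"
    }
}
```

A HashMap holding the same entries makes no promise about iteration order, which is the trade-off described above: faster access in exchange for no ordering.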
