Comparison of red-black trees and hash tables

Source: Internet
Author: User

What is a hash?

Hashing uses a hash algorithm to transform an input of arbitrary length into a fixed-length output, called the hash value. This is a compression mapping: the space of hash values is usually much smaller than the input space, so different inputs may hash to the same output (a many-to-one relationship).
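The sketch below illustrates the fixed-length, many-to-one idea. It uses 64-bit FNV-1a, a well-known non-cryptographic hash chosen here only as an example (the article does not name any particular function). `fnv1a64` and `compress` are hypothetical helper names.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a: maps a byte string of any length to one fixed 64-bit value.
// Used only as an illustration; the article does not prescribe a hash function.
std::uint64_t fnv1a64(const std::string& input) {
    std::uint64_t hash = 14695981039346656037ULL;  // FNV 64-bit offset basis
    for (unsigned char byte : input) {
        hash ^= byte;                // mix in one byte
        hash *= 1099511628211ULL;    // multiply by the 64-bit FNV prime
    }
    return hash;
}

// Reducing the hash to a small range (0..buckets-1) makes the many-to-one
// relationship visible: more distinct inputs than buckets forces sharing.
std::uint64_t compress(const std::string& input, std::uint64_t buckets) {
    return fnv1a64(input) % buckets;
}
```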

How a hash table is constructed

Among all linear data structures, the array offers the fastest positioning: an element is reached directly through its subscript, with no need to search element by element. A hash table is a data structure that exploits this fast array positioning to speed up lookup by key.

An array can jump straight to a slot by subscript, and the hash table is built on exactly that. A fixed function, the hash function, converts the key into an integer; that integer taken modulo the array length becomes the subscript, and the value is stored in the array slot at that subscript. To query, the same hash function converts the key into the same subscript again, and the value is read from that slot. In this way the array's direct positioning is fully exploited for locating data.
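A minimal sketch of the two steps just described, assuming `std::hash<std::string>` as the hash function and a hypothetical helper named `slot_index`:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Step 1: the hash function converts the key into an integer.
// Step 2: the remainder modulo the array length becomes the subscript.
// std::hash is an assumed stand-in for "a fixed algorithm function".
std::size_t slot_index(const std::string& key, std::size_t array_length) {
    std::size_t hash_value = std::hash<std::string>{}(key);
    return hash_value % array_length;
}
```

Because storing and querying call the same function with the same array length, a query lands on the slot where the value was stored. Two different keys can still land on the same slot; that is the collision problem discussed below.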

For example, suppose the hash function is:

index = value % 5;

As shown in the figure, the left side is an array of 5 pointers with subscripts 0 to 4, and each element is the head pointer of a linked list. The function value % 5 thus creates a one-to-many relationship between slots and values, narrowing a search to a single list.
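The structure in the figure can be sketched as follows, assuming integer values and the article's hash function `value % 5`. The class name `ChainedHashSet` is hypothetical, and the standard `std::list` stands in for hand-built linked lists.

```cpp
#include <array>
#include <cstddef>
#include <list>

// Separate chaining: an array of 5 slots, each holding a linked list.
// index = value % 5 selects which list a value belongs to.
class ChainedHashSet {
public:
    void insert(int value) {
        std::list<int>& chain = buckets_[index_of(value)];
        for (int existing : chain) {
            if (existing == value) return;  // already stored, nothing to do
        }
        chain.push_front(value);  // new value becomes the head of its list
    }

    bool contains(int value) const {
        const std::list<int>& chain = buckets_[index_of(value)];
        for (int existing : chain) {
            if (existing == value) return true;  // only this one list is searched
        }
        return false;
    }

    std::size_t chain_length(std::size_t index) const { return buckets_[index].size(); }

private:
    // value % 5, adjusted so negative values also map into 0..4.
    static std::size_t index_of(int value) {
        return static_cast<std::size_t>(((value % 5) + 5) % 5);
    }
    std::array<std::list<int>, 5> buckets_;
};
```

A lookup walks only the list selected by `value % 5`, which is the narrowing described above.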

Although we would rather avoid collisions (more than one key mapping to the same subscript), they are a real possibility. When the range of possible key values is much larger than the length of the hash table and the actual keys are not known in advance, collisions are inevitable. Moreover, when the number of actual keys exceeds the length of the hash table and the table is already full, inserting a new record causes not only a collision but also an overflow. Handling collisions and overflows is therefore one of the two central issues in hashing. The usual collision-resolution methods are open addressing and separate chaining (the "chain address" method).
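Open addressing, the first method named above, can be sketched as follows. This is an assumed minimal version using linear probing (on a collision, try the next slot) in a fixed 5-slot table, so it also exhibits the overflow case. The class name is hypothetical, and deletion is omitted because it needs extra bookkeeping (such as tombstone markers) to keep probe sequences intact.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Open addressing with linear probing in a fixed-capacity table.
// When every slot is occupied, a new key cannot be placed: overflow.
class LinearProbingSet {
public:
    static constexpr std::size_t kCapacity = 5;

    // Returns false on overflow (table full and key not already present).
    bool insert(int key) {
        std::size_t start = home_slot(key);
        for (std::size_t step = 0; step < kCapacity; ++step) {
            std::optional<int>& slot = slots_[(start + step) % kCapacity];
            if (!slot.has_value()) {
                slot = key;  // first empty slot on the probe path
                return true;
            }
            if (*slot == key) return true;  // already present
        }
        return false;  // probed every slot: overflow
    }

    bool contains(int key) const {
        std::size_t start = home_slot(key);
        for (std::size_t step = 0; step < kCapacity; ++step) {
            const std::optional<int>& slot = slots_[(start + step) % kCapacity];
            if (!slot.has_value()) return false;  // an empty slot ends the probe path
            if (*slot == key) return true;
        }
        return false;
    }

private:
    // key % capacity, adjusted so negative keys also map into range.
    static std::size_t home_slot(int key) {
        int r = key % static_cast<int>(kCapacity);
        return static_cast<std::size_t>(r < 0 ? r + static_cast<int>(kCapacity) : r);
    }
    std::array<std::optional<int>, kCapacity> slots_{};
};
```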

Scope of application

A basic data structure for fast lookup and deletion. It usually requires that the entire data set fit in memory.

What is map?

std::map is an associative container provided by the C++ Standard Library (STL). It stores key-value pairs and supports lookup by key.

std::map is typically implemented as a red-black tree (as is std::set), so its lookup time is O(log n).
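A short usage sketch of `std::map`; the sample names, values, and helper functions are made up for illustration. It shows lookup with `find` and the sorted iteration order that the tree provides.

```cpp
#include <map>
#include <string>
#include <vector>

// Build a small map; insertion order does not matter, keys are kept sorted.
std::map<std::string, int> make_ages() {
    std::map<std::string, int> ages;
    ages["carol"] = 41;
    ages["alice"] = 30;
    ages.insert({"bob", 25});
    return ages;
}

// In-order traversal of the tree visits keys in ascending order.
std::vector<std::string> keys_in_order(const std::map<std::string, int>& m) {
    std::vector<std::string> keys;
    for (const auto& entry : m) keys.push_back(entry.first);
    return keys;
}

// find() descends the tree in O(log n); end() signals "not found".
int age_or_minus_one(const std::map<std::string, int>& m, const std::string& name) {
    auto it = m.find(name);
    return it == m.end() ? -1 : it->second;
}
```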

Its advantage is that it uses comparatively little memory.

The difference between hash and map

Weigh these factors: lookup speed, data volume, memory usage, scalability, and ordering.
In general, hash lookup is faster than red-black-tree lookup and is essentially independent of the amount of data (constant time), whereas red-black-tree lookup is O(log n). A constant is not necessarily smaller than log(n), however, because computing the hash function itself takes time. Once the number of elements reaches a large enough order of magnitude, consider a hash table. But if memory is tight and you want the program to consume as little memory as possible, a hash table may cause trouble, especially when there are very many hashed objects: memory use becomes harder to control, and building a hash table is also relatively slow.
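Because the crossover point depends on the machine, the key type, and the hash function, it is better measured than assumed. The rough timing sketch below (the helper name `fill_and_lookup` is hypothetical) times only the lookup phase for `std::map` and `std::unordered_map` with integer keys.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>

// Fill a container with n integer keys, then time looking each one up.
// Container is expected to be std::map<int, int> or std::unordered_map<int, int>.
template <typename Container>
std::size_t fill_and_lookup(std::size_t n, double& seconds_out) {
    Container c;
    for (std::size_t i = 0; i < n; ++i) c[static_cast<int>(i)] = static_cast<int>(i);

    auto start = std::chrono::steady_clock::now();
    std::size_t found = 0;
    for (std::size_t i = 0; i < n; ++i) found += c.count(static_cast<int>(i));
    auto stop = std::chrono::steady_clock::now();

    seconds_out = std::chrono::duration<double>(stop - start).count();
    return found;  // returned so the compiler cannot discard the lookup loop
}
```

For example, call `fill_and_lookup<std::map<int, int>>(1000000, t)` and the same with `std::unordered_map<int, int>`, then compare the two `t` values. The sketch leaves out construction time, which the paragraph above notes is also a cost for hash tables.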

Red-black trees do not suit every application that calls for a tree. If the data is essentially static, keeping it in a structure where elements can be inserted in place without rebalancing gives better performance. If the data is completely static, building a hash table over it may perform better still.

In real systems, for example a firewall that must handle dynamic rules, using a red-black tree instead of a hash table has proven to scale better. The Linux kernel also uses red-black trees to organize the memory regions (vm_area_struct) of a process.

Summary:

Red-black trees are ordered and hash tables are unordered; choose according to your needs.
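One practical consequence of ordering is the range query. In this sketch (the function name `keys_in_range` is hypothetical), `std::map::lower_bound` jumps to the first key not less than `lo` in O(log n), and the walk then continues in key order. A hash container keeps no key order, so the same query would have to examine every element.

```cpp
#include <map>
#include <string>
#include <vector>

// Collect all keys k with lo <= k <= hi, in ascending order.
std::vector<int> keys_in_range(const std::map<int, std::string>& m, int lo, int hi) {
    std::vector<int> out;
    for (auto it = m.lower_bound(lo); it != m.end() && it->first <= hi; ++it) {
        out.push_back(it->first);
    }
    return out;
}
```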

A red-black tree uses less memory (memory is allocated only for nodes that actually exist), whereas a hash table must allocate enough memory for its bucket array in advance (even though some slots may go unused).
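The up-front allocation can be observed with `std::unordered_map`: `reserve(n)` requests enough buckets to hold n elements under the current `max_load_factor()` before any element is inserted. The helper name below is hypothetical, and the exact bucket count is implementation-defined, so only a lower bound is guaranteed. `std::map`, by contrast, allocates one node per inserted element and nothing in advance.

```cpp
#include <cstddef>
#include <unordered_map>

// Reserve space for the expected number of elements and report how many
// buckets the still-empty table has already allocated.
std::size_t buckets_after_reserve(std::size_t expected_elements) {
    std::unordered_map<int, int> table;
    table.reserve(expected_elements);  // allocates the bucket array now
    return table.bucket_count();       // at least n / max_load_factor()
}
```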

For a red-black tree, lookup and deletion take O(log n) time; for a hash table, lookup and deletion take O(1) time on average (degrading to O(n) in the worst case, when many keys collide).
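To put the O(log n) figure in perspective, a standard bound from textbook treatments of red-black trees limits the height h of a tree holding n keys. Applied to one million keys:

$$
h \le 2\log_2(n+1), \qquad n = 10^6 \;\Rightarrow\; h \le 2\log_2(1\,000\,001) \approx 2 \times 19.93 \approx 39.9
$$

So a lookup follows a path of at most about 40 nodes, one key comparison each, while a hash-table lookup with a good hash function and a bounded load factor inspects a constant number of slots on average, regardless of n.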

Addendum:

If you only need to determine whether a value exists in the map, a hash-based implementation is, of course, more efficient.
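For a pure existence check, an unordered set is enough. In this sketch the blocked-user scenario and the function name `is_blocked` are hypothetical.

```cpp
#include <string>
#include <unordered_set>

// Membership test: a hash set answers "is this value present?" in average
// O(1) time and does not maintain any ordering.
bool is_blocked(const std::unordered_set<std::string>& blocked, const std::string& user) {
    return blocked.count(user) != 0;  // count() is 0 or 1 for a set
}
```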

If you need many comparison operations, for example computing the intersection or difference of two maps, a red-black-tree map is more efficient, because both maps can be traversed in sorted order and merged in a single pass.
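The following sketch shows why ordering helps here: because both `std::set` objects already iterate in sorted order, `std::set_intersection` and `std::set_difference` compute the result in one linear merge of O(n + m) comparisons. The function names are hypothetical; the same idea applies to the keys of two `std::map` objects.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Both inputs are sorted, so a single linear merge finds common elements.
std::vector<int> intersect(const std::set<int>& a, const std::set<int>& b) {
    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}

// Elements of a that are not in b, also computed in one merge pass.
std::vector<int> difference(const std::set<int>& a, const std::set<int>& b) {
    std::vector<int> out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```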
