Effective STL: Familiarize yourself with nonstandard hashed containers

Source: Internet
Author: User

 

Item 25: Familiarize yourself with nonstandard hashed containers

STL programmers rarely take long to start wondering: "vector, list, and map are fine, but where are the hash tables?" Alas, the Standard C++ library has none. Everyone agrees that this is unfortunate, but the Standardization Committee felt that the work needed to add them would have unduly delayed completion of the standard. It is all but certain that the next version of the standard will include hash tables, but for now the STL does not.

But if you like hash tables, cheer up. You need not do without them or write your own. STL-compatible hashed associative containers are available from multiple sources, and they even have de facto standard names: hash_set, hash_multiset, hash_map, and hash_multimap. (Note: the standardized names for the hashed containers are unordered_set, unordered_multiset, unordered_map, and unordered_multimap, chosen precisely to avoid the existing hash_* names.)
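For comparison, the standardized unordered_* names mentioned in the note above can be used as follows. This is only a minimal sketch, assuming a C++11 or later compiler; the helper names count_distinct and count_distinct_example are hypothetical and do not come from any vendor library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts the distinct words in a sequence using the
// standardized hashed container std::unordered_set (C++11 and later).
std::size_t count_distinct(const std::vector<std::string>& words) {
    std::unordered_set<std::string> seen;  // uses std::hash<std::string> by default
    for (const std::string& w : words) {
        seen.insert(w);                    // inserting a duplicate has no effect
    }
    return seen.size();
}

// Usage: three words, two of them distinct.
void count_distinct_example() {
    const std::vector<std::string> words = {"hash", "map", "hash"};
    assert(count_distinct(words) == 2);
}
```

The vendor-specific hash_set containers discussed in this Item are used in much the same way, but their headers, namespaces, and template parameters vary by implementation.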

Behind these common names lie different implementations. They differ in their interfaces, their capabilities, their underlying data structures, and the relative efficiency of the operations they support. It is still possible to write reasonably portable code that uses hash tables, but it is not as easy as it would be had the hashed containers been standardized. (Now you know why standards are important.)

Of the several available implementations of hashed containers, the two most common are from SGI (see Item 50) and Dinkumware (see Appendix B), so I restrict myself to the designs of the hashed containers from these vendors. STLport (again, see Item 50) also offers hashed containers, but they are based on SGI's. For the purposes of this Item, assume that whatever I write about the SGI hashed containers also applies to the STLport hashed containers.

Hashed containers are associative containers, so it should not surprise you that, like all associative containers, they need to know the type of the objects stored in the container, the comparison function for those objects, and the allocator for those objects. In addition, hashed containers require a specification of a hash function. A hashed container declaration therefore looks like this:

template<typename T,
         typename HashFunction,
         typename CompareFunction,
         typename Allocator = allocator<T> >
class hash_container;

This is quite close to the SGI declaration of its hashed containers. The main difference is that SGI provides default types for HashFunction and CompareFunction. The SGI declaration of hash_set looks essentially like this (I have tweaked it a little for presentation purposes):

template<typename T,
         typename HashFunction = hash<T>,
         typename CompareFunction = equal_to<T>,
         typename Allocator = allocator<T> >
class hash_set;

A noteworthy aspect of the SGI design is the use of equal_to as the default comparison function. This departs from the convention of the standard associative containers, where the default comparison function is less. The consequence is more than a simple change of default: SGI's hashed containers determine whether two objects in a hashed container have the same value by testing for equality, not equivalence (see Item 19). This is not an unreasonable decision for hashed containers, because hashed associative containers, unlike their standard (tree-based) counterparts, are not kept in sorted order.
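The difference between equality and equivalence can be sketched with standard containers. The example below is an illustration under stated assumptions, not SGI code: std::unordered_set (whose default comparison is std::equal_to) stands in for SGI's hash_set, and the comparator CaseInsensitiveLess and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <unordered_set>

// Hypothetical ordering: compares two strings while ignoring case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Tree-based container: "STL" and "stl" are equivalent under the ordering
// above (neither precedes the other), so only one of them is stored.
std::size_t size_by_equivalence() {
    std::set<std::string, CaseInsensitiveLess> s;
    s.insert("STL");
    s.insert("stl");
    return s.size();
}

// Hashed container with the default std::equal_to: the two strings are
// not equal, so both are stored.
std::size_t size_by_equality() {
    std::unordered_set<std::string> s;
    s.insert("STL");
    s.insert("stl");
    return s.size();
}

// Usage: the same two insertions yield different container sizes.
void compare_semantics_example() {
    assert(size_by_equivalence() == 1);
    assert(size_by_equality() == 2);
}
```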

The Dinkumware design takes a different approach. It still lets you specify the object type, hash function type, comparison function type, and allocator type, but it moves the default hash and comparison functions into a separate traits-like class called hash_compare, and it makes hash_compare the default argument for the HashingInfo parameter of the container templates. (If you are unfamiliar with the idea of a "traits" class, open a good STL reference such as Josuttis' The C++ Standard Library [3] and study the motivation for and implementation of the char_traits and iterator_traits templates.)

For example, here is the Dinkumware hash_set declaration (again tweaked for presentation):

template<typename T, typename CompareFunction>
class hash_compare;

template<typename T,
         typename HashingInfo = hash_compare<T, less<T> >,
         typename Allocator = allocator<T> >
class hash_set;

The interesting part of this interface design is the use of HashingInfo. The container's hash and comparison functions are stored there, but the HashingInfo type also holds enums that control the minimum number of buckets in the table and the maximum allowable ratio of container elements to buckets. When that ratio is exceeded, the number of buckets in the table is increased and some elements in the table are rehashed. (SGI provides member functions that afford similar control over the number of table buckets and, hence, over the ratio of table elements to buckets.) (If buckets and hashing are unfamiliar to you, consult the chapter on hash tables in a data structures text.)
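The idea of a maximum element-to-bucket ratio can be observed with the standardized std::unordered_set, which exposes it as max_load_factor. The sketch below is an analogy only: it assumes a C++11 compiler, it does not use the Dinkumware enums or the SGI member functions, and the function name load_factor_respected is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Inserts n integers and reports whether the table kept its load factor
// (elements per bucket) at or below the configured maximum.
bool load_factor_respected(std::size_t n, float max_ratio) {
    assert(max_ratio > 0.0f);            // the ratio must be positive
    std::unordered_set<int> table;
    table.max_load_factor(max_ratio);    // upper bound on size() / bucket_count()
    for (std::size_t i = 0; i < n; ++i) {
        // An insertion that would exceed the bound makes the container
        // add buckets and rehash its elements.
        table.insert(static_cast<int>(i));
    }
    return table.load_factor() <= table.max_load_factor();
}
```

Calling load_factor_respected(1000, 4.0f) mirrors the bucket_size = 4 setting shown for hash_compare: the container grows its bucket array as needed instead of letting the average bucket hold more than four elements.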

After some tweaks for presentation, hash_compare (the default value for HashingInfo) looks more or less like this:

template<typename T, typename CompareFunction = less<T> >
class hash_compare {
public:
    enum {
        bucket_size = 4,                  // max ratio of elements to buckets
        min_buckets = 8                   // minimum number of buckets
    };
    size_t operator()(const T&) const;    // hash function
    bool operator()(const T&,             // comparison function
                    const T&) const;
    ...                                   // a few things omitted, including
                                          // the use of CompareFunction
};

Overloading operator() (here, to implement both the hash and the comparison functions) is a strategy that crops up more often than you might imagine. For another application of the same idea, see Item 23.

The Dinkumware design lets you write your own hash_compare-like class (possibly by deriving from hash_compare itself). As long as your class defines bucket_size, min_buckets, two operator() functions (one taking one argument, one taking two), plus the few things I have left out, you can use it to control the configuration and behavior of a Dinkumware hash_set or hash_multiset. Configuration control for hash_map and hash_multimap is similar.
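A hash_compare-like class along these lines might look like the sketch below. It is self-contained and does not include any Dinkumware header; the class name CiStringHashCompare and its hashing scheme are hypothetical, and because the text notes that some required members were omitted, a real Dinkumware container may need more than is shown here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical hash_compare-like class for case-insensitive strings.
// It supplies the four pieces named in the text: bucket_size, min_buckets,
// a one-argument operator() (hash) and a two-argument operator() (ordering).
class CiStringHashCompare {
public:
    enum {
        bucket_size = 4,   // max ratio of elements to buckets
        min_buckets = 8    // minimum number of buckets
    };

    // Hash function: ignores case, so keys that the comparison below
    // treats as equivalent always produce the same hash value.
    std::size_t operator()(const std::string& s) const {
        std::size_t h = 0;
        for (unsigned char c : s) {
            h = h * 31 + static_cast<std::size_t>(std::tolower(c));
        }
        return h;
    }

    // Comparison function: a strict weak ordering that ignores case.
    bool operator()(const std::string& a, const std::string& b) const {
        const std::size_t n = a.size() < b.size() ? a.size() : b.size();
        for (std::size_t i = 0; i < n; ++i) {
            const int x = std::tolower(static_cast<unsigned char>(a[i]));
            const int y = std::tolower(static_cast<unsigned char>(b[i]));
            if (x != y) return x < y;
        }
        return a.size() < b.size();
    }
};

// Usage: "STL" and "stl" hash alike and neither orders before the other.
void ci_hash_compare_example() {
    CiStringHashCompare hc;
    assert(hc(std::string("STL")) == hc(std::string("stl")));
    assert(!hc(std::string("STL"), std::string("stl")));
    assert(!hc(std::string("stl"), std::string("STL")));
}
```

The important design constraint is that the hash and the comparison agree: two keys the comparison considers equivalent must hash to the same value, otherwise lookups can search the wrong bucket.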

Notice that with both the SGI and the Dinkumware designs, you can leave all the decision-making to the implementation and simply write something like this:

hash_set<int> intTable;    // create a hashed set of ints

For this to compile, the hash table must hold an integral type (such as int), because the default hash functions are generally limited to integral types. (SGI's default hash function is slightly more flexible. Item 50 tells you where to find all the details.)
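When the stored type has no default hash function, you supply one yourself. The sketch below shows the general pattern with the standardized std::unordered_set standing in for a vendor hash_set; the types Point, PointHash, PointEqual, and PointSet and the function contains are hypothetical, and the way the two coordinate hashes are combined is just one simple choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical user-defined key type with no default hash function.
struct Point {
    int x;
    int y;
};

// Equality test used to decide whether two keys are the same.
struct PointEqual {
    bool operator()(const Point& a, const Point& b) const {
        return a.x == b.x && a.y == b.y;
    }
};

// Hash function: combines the hashes of the two coordinates.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        const std::size_t hx = std::hash<int>()(p.x);
        const std::size_t hy = std::hash<int>()(p.y);
        return hx ^ (hy * 31 + 0x9e3779b9u);
    }
};

typedef std::unordered_set<Point, PointHash, PointEqual> PointSet;

// Returns true if p is stored in the set.
bool contains(const PointSet& s, const Point& p) {
    return s.find(p) != s.end();
}

// Usage: lookups depend on both coordinates.
void point_set_example() {
    PointSet s;
    const Point a = {1, 2};
    const Point b = {2, 1};
    s.insert(a);
    assert(contains(s, a));
    assert(!contains(s, b));
}
```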

Under the hood, the SGI and Dinkumware implementations go their very separate ways. SGI employs a conventional open hashing scheme: an array (the buckets) of pointers to singly linked lists of elements. Dinkumware also uses open hashing, but its design is based on a novel data structure: an array of iterators (essentially the buckets) into a doubly linked list of elements, where adjacent pairs of iterators identify the extent of the elements in each bucket. (For details, consult Plauger's aptly titled column, "Hash Tables" [16].)
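The conventional open hashing layout (an array of buckets, each holding a singly linked list) can be sketched in a few lines. This is a toy illustration assuming C++11, not SGI's implementation: the class ChainedIntSet is hypothetical, it stores only ints, it uses std::forward_list for each bucket, and it never grows its bucket array.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <functional>
#include <vector>

// Hypothetical minimal set of ints built on open hashing: an array of
// buckets, each bucket a singly linked list of elements.
class ChainedIntSet {
public:
    explicit ChainedIntSet(std::size_t bucket_count = 8)
        : buckets_(bucket_count), size_(0) {
        assert(bucket_count > 0);   // at least one bucket is required
    }

    // Returns true if value was inserted, false if it was already present.
    bool insert(int value) {
        std::forward_list<int>& bucket = buckets_[index_of(value)];
        for (int v : bucket) {
            if (v == value) return false;
        }
        bucket.push_front(value);
        ++size_;
        return true;
    }

    // Searches only the one bucket that value hashes to.
    bool contains(int value) const {
        const std::forward_list<int>& bucket = buckets_[index_of(value)];
        for (int v : bucket) {
            if (v == value) return true;
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    // Maps a value to the index of its bucket.
    std::size_t index_of(int value) const {
        return std::hash<int>()(value) % buckets_.size();
    }

    std::vector<std::forward_list<int> > buckets_;
    std::size_t size_;
};
```

Because each bucket is singly linked, walking the elements is possible in one direction only, which is the root of the iterator limitation discussed next.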

As a user of these implementations, you will likely care that SGI stores the table elements in singly linked lists, while Dinkumware uses a doubly linked list. The difference is worth noting because it affects the iterator categories of the two implementations. SGI's hashed containers offer forward iterators, so you forgo the ability to iterate in reverse: there are no rbegin or rend member functions in SGI's hashed containers. The iterators for Dinkumware's hashed containers are bidirectional, so they support both forward and reverse traversal. In terms of memory usage, the SGI design is a bit more parsimonious than Dinkumware's.
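The link between the underlying list type and the iterator category can be checked at compile time. In the sketch below, std::forward_list and std::list serve as stand-ins for the singly and doubly linked structures described above; neither vendor's hashed container is used, and the function name supports_reverse_traversal is hypothetical. A C++11 compiler is assumed.

```cpp
#include <forward_list>
#include <iterator>
#include <list>
#include <type_traits>

// True if Container's iterators are at least bidirectional, which is what
// reverse traversal (rbegin/rend) requires.
template <typename Container>
constexpr bool supports_reverse_traversal() {
    return std::is_base_of<
        std::bidirectional_iterator_tag,
        typename std::iterator_traits<
            typename Container::iterator>::iterator_category>::value;
}

// A singly linked list offers forward iterators only.
static_assert(!supports_reverse_traversal<std::forward_list<int> >(),
              "singly linked: forward traversal only");

// A doubly linked list offers bidirectional iterators.
static_assert(supports_reverse_traversal<std::list<int> >(),
              "doubly linked: forward and reverse traversal");
```

The same trait can be applied to whichever hashed container you are given, which helps when code must stay portable across implementations with different iterator categories.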

Which design is best for you and your applications? I cannot possibly know. Only you can decide, and this Item has not tried to give you enough information to reach a reasonable conclusion. Instead, its goal is to make you aware that, although the STL itself lacks hashed containers, STL-compatible hashed containers (with varying interfaces, capabilities, and behavioral trade-offs) are not difficult to come by. In the case of the SGI and STLport implementations, you can even get them for free, because they are available for free download.

 

 

In practice, if you use Visual Studio 2005 or later, you will find that Microsoft ships its own hashed containers; see MSDN for details. If you need a hashed container, you can use it with confidence.

 

 
