Introduction to Algorithms - Part 3, 2: Hash Tables

Last Update:2018-12-03 Source: Internet

Author: User


A hash table is a generalization of an ordinary array. An array supports direct addressing, so any element can be accessed in O(1) time. To illustrate the advantages and the necessity of hashing, we first look at direct-address tables.

When the universe U of keys is relatively small, direct addressing is simple and effective. It is really just an array: the key is used directly as the array index. Search, insertion, and deletion each take only O(1) time. However, direct addressing has two problems. If the universe U is large, the machine may not be able to allocate the memory (for example, a very large local array, like the one defined in the earlier sorting post, overflows the stack). And if the set K of keys actually stored is much smaller than U, most of the space is wasted.
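The idea above can be sketched in a few lines. This is a minimal illustration, not code from the book: the name DirectAddressTable, its members, and the choice of int values are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Direct-address table for keys drawn from the universe {0, ..., U-1}.
// (Hypothetical type for illustration; the key itself is the array index.)
template <std::size_t U>
struct DirectAddressTable {
    int  value[U];    // value[k] is the data stored under key k
    bool present[U];  // present[k] tells whether key k is in the table

    DirectAddressTable() {
        for (std::size_t k = 0; k < U; ++k) {
            value[k] = 0;
            present[k] = false;
        }
    }

    // All three operations touch exactly one slot, so each is O(1).
    void insert(std::size_t key, int v) {
        assert(key < U);
        value[key] = v;
        present[key] = true;
    }

    void erase(std::size_t key) {
        assert(key < U);
        present[key] = false;
    }

    // Returns true and writes the stored value to out if key is present.
    bool search(std::size_t key, int& out) const {
        assert(key < U);
        if (!present[key]) return false;
        out = value[key];
        return true;
    }
};
```

Note that the table always occupies space proportional to U, no matter how few keys are stored, which is exactly the waste described above.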

Speaking of this, I suddenly thought of counting sort and bucket sort. Counting sort is essentially direct addressing: if the gap between values is large, a lot of space is wasted, and the auxiliary array C ends up with many zero entries. Bucket sort gathers the elements into buckets, so far less space is wasted. The principle is the same as here.

The core of a hash table is its hash function. Put bluntly, a hash table maps elements to array slots through a hash function and stores them there. The mapping can produce collisions, that is, different elements mapped to the same slot, so we need a way to resolve collisions.

Let's first resolve collisions by chaining. The basic idea of chaining is to put all elements that hash to the same slot into one linked list (this is really just like the buckets in bucket sort, haha). Since collisions are inevitable, what are the running times of the operations? Insertion is O(1) in the worst case and very fast, because it assumes that the element x being inserted is not already in the table: locate the slot and insert at the head of the list (of course, you can also search before inserting to check whether the key is already present). The worst-case time of search is proportional to the length of the list. Deletion can also be done in O(1), but only with a doubly linked list and a pointer to the element; with a singly linked list we first have to find the predecessor, which costs as much as a search. How does hashing with chaining perform overall? In the worst case all n keys land in the same list, and search takes Θ(n). On average, under simple uniform hashing and with the number of slots proportional to the number of elements, all dictionary operations take O(1) time.
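As a rough sketch of chaining, assuming integer keys, the division method as the hash function, and std::list (a doubly linked list) for each chain. The class name ChainedHashTable and its member names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hash table with chaining (hypothetical names, illustration only).
class ChainedHashTable {
public:
    typedef std::list<int>::iterator Node;

    explicit ChainedHashTable(std::size_t m) : buckets_(m) { assert(m > 0); }

    // O(1): insert at the head of the chain.
    // Assumes the key is not already in the table.
    Node insert(int key) {
        std::list<int>& chain = buckets_[slot(key)];
        chain.push_front(key);
        return chain.begin();
    }

    // Time proportional to the length of the chain for this key's slot.
    bool contains(int key) const {
        const std::list<int>& chain = buckets_[slot(key)];
        for (std::list<int>::const_iterator it = chain.begin();
             it != chain.end(); ++it) {
            if (*it == key) return true;
        }
        return false;
    }

    // O(1) given a handle to the element, because the list is doubly linked.
    void erase(Node node) {
        std::size_t s = slot(*node);
        buckets_[s].erase(node);
    }

    std::size_t chainLength(int key) const {
        return buckets_[slot(key)].size();
    }

private:
    // Division method; the second modulo keeps negative keys in range.
    std::size_t slot(int key) const {
        int m = static_cast<int>(buckets_.size());
        return static_cast<std::size_t>(((key % m) + m) % m);
    }

    std::vector<std::list<int> > buckets_;
};
```

With 5 slots, the keys 7 and 12 both hash to slot 2 and end up in the same chain; erasing 7 through the handle returned by insert leaves 12 in place.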

Hash functions. A good hash function should send each key with equal probability to any of the m slots. Speaking of hashing, tankywoo seems to like it a lot: every function in his code that prints an array is called hash_print(). The simplest hash functions are the division method, the multiplication method, and universal hashing. I personally prefer the division method because it is simple. tankywoo's write-up includes a comparison of these methods.
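The three methods named above can be sketched as follows. The function names are made up for this example. The multiplier 2654435769 is the commonly cited value floor(2^32 * (sqrt(5) - 1) / 2) for 32-bit words, and the universal hash uses the textbook form ((a*k + b) mod p) mod m with a prime p larger than every key; the parameters a and b are passed in explicitly here rather than drawn at random.

```cpp
#include <cassert>
#include <cstdint>

// Division method: h(k) = k mod m.
inline std::uint32_t divisionHash(std::uint32_t k, std::uint32_t m) {
    assert(m > 0);
    return k % m;
}

// Multiplication method for a table of size m = 2^p.
// The 32-bit product k * s wraps around, which keeps the fractional part
// of k * (s / 2^32); the top p bits of that word are the hash value.
inline std::uint32_t multiplicationHash(std::uint32_t k, unsigned p) {
    assert(p >= 1 && p <= 32);
    const std::uint32_t s = 2654435769u;
    return static_cast<std::uint32_t>(k * s) >> (32 - p);
}

// One member of a universal family: h(k) = ((a*k + b) mod p) mod m.
// Assumes p is prime, k < p, 1 <= a < p, b < p, and p < 2^32 so that
// a*k + b cannot overflow 64 bits.
inline std::uint64_t universalHash(std::uint64_t k, std::uint64_t a,
                                   std::uint64_t b, std::uint64_t p,
                                   std::uint64_t m) {
    assert(m > 0 && p > k && a >= 1 && a < p && b < p);
    assert(p < (static_cast<std::uint64_t>(1) << 32));
    return ((a * k + b) % p) % m;
}
```

For example, divisionHash(100, 12) is 4, and universalHash(8, 3, 4, 17, 6) is ((3*8 + 4) mod 17) mod 6 = 5.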

Now let's implement the simplest open addressing scheme. The advantage of open addressing is that it needs no pointers, which saves space; the memory saved can go toward more slots, which potentially reduces collisions and speeds up search. Below is a simple implementation of the book's HASH-INSERT and HASH-SEARCH pseudocode. http://www.cnblogs.com/xiangshancuizhu/articles/1894916.html also has a good implementation that you can refer to.

My implementation is as follows:

// ============================================================================
// Name        : hash.cpp
// Author      : Xia
// Description : an example of a hash table using open addressing
// ============================================================================
#include <cstdlib>   // rand, srand
#include <ctime>     // time
#include <iostream>
using namespace std;

const int nil = -1;  // for convenience, nil is set to -1
const int M = 20;    // hash table size

void inithash(int T[M]) {
    for (int i = 0; i < M; i++)
        T[i] = nil;
}

void printhash(int T[M]) {
    for (int i = 0; i < M; i++)
        cout << T[i] << " ";
    cout << endl;
}

int hashfunction(int value, int i) {
    // The probe sequence h(k, 0), h(k, 1), ..., h(k, M-1) must be a
    // permutation of <0, 1, ..., M-1>. For simplicity we use linear probing.
    return (value + i) % M;
}

int hashinsert(int T[M], int k) {
    for (int i = 0; i < M; i++) {
        int j = hashfunction(k, i);
        if (T[j] == nil) {
            T[j] = k;
            return j;
        }
    }
    return nil;  // hash table overflow
}

int hashsearch(int T[M], int k) {
    // search for key k in hash table T
    for (int i = 0; i < M; i++) {
        int j = hashfunction(k, i);
        if (T[j] == nil)
            break;       // reached an empty slot: k is not in the table
        if (T[j] == k)
            return j;
    }
    return nil;
}

void hashdelete(int T[M], int k) {
    // delete key k from hash table T
    int j = hashsearch(T, k);  // find it first
    if (j != nil)
        T[j] = nil;  // caution: this can hide keys that probed past slot j
    else
        cout << "error" << endl;
}

int main(int argc, char* argv[]) {
    int hash[M];
    inithash(hash);
    printhash(hash);
    srand(time(NULL));
    int test[M];
    for (int i = 0; i < M; i++) {
        test[i] = rand() % 10;
        // cout << test[i] << " ";
        hashinsert(hash, test[i]);
    }
    cout << endl;
    printhash(hash);
    // The original listing breaks off at this point.
    return 0;
}
The test[M] array is filled with random numbers. To make sure that searches usually succeed, the values are taken modulo 10, and the value 48, which can never be in the table, is used to test an unsuccessful search. The running result is as follows:
 
 
 
 
 
 
The results look basically correct.
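Regarding deletion under open addressing, the book's warning puzzled me at first. The book says that when a key is deleted from slot i, you cannot simply store nil there to mark the slot as empty, because slot i was occupied while some other key k was being inserted and k was probed past it. (Insertion itself only checks whether a slot is nil, doesn't it? True, but the problem shows up in search: the search stops at the first nil slot, so once slot i is reset to nil, any key that probed past slot i during its insertion can no longer be found.)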
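The book's remedy is to mark a deleted slot with a special DELETED value instead of nil. Below is a minimal sketch of that idea with linear probing. The table size, the value -2 for DELETED, and the function names are assumptions made for this example, and keys are assumed to be non-negative.

```cpp
#include <cassert>

const int NIL = -1;      // slot has never been used
const int DELETED = -2;  // slot held a key that was later deleted
const int M = 8;         // table size (illustrative)

int probe(int k, int i) { return (k + i) % M; }  // linear probing

void initTable(int T[M]) {
    for (int i = 0; i < M; i++) T[i] = NIL;
}

// Insertion may reuse a DELETED slot.
// Assumes k is not already in the table.
int insertKey(int T[M], int k) {
    for (int i = 0; i < M; i++) {
        int j = probe(k, i);
        if (T[j] == NIL || T[j] == DELETED) {
            T[j] = k;
            return j;
        }
    }
    return NIL;  // table is full
}

// Search stops only at NIL; a DELETED slot is skipped, because keys
// inserted while that slot was occupied may lie further along.
int searchKey(const int T[M], int k) {
    for (int i = 0; i < M; i++) {
        int j = probe(k, i);
        if (T[j] == NIL) return NIL;
        if (T[j] == k) return j;
    }
    return NIL;
}

bool deleteKey(int T[M], int k) {
    int j = searchKey(T, k);
    if (j == NIL) return false;
    T[j] = DELETED;  // leave a marker instead of NIL
    return true;
}
```

With M = 8, the keys 3 and 11 both start probing at slot 3, so 11 lands in slot 4. After 3 is deleted, slot 3 holds DELETED and the search for 11 still reaches slot 4. Had slot 3 been reset to NIL, that search would have stopped at slot 3 and reported 11 as missing. The price is that search time no longer depends only on the load factor, since DELETED slots still lengthen probe sequences.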
 
 
There are three common ways to compute the probe sequence in open addressing: linear probing, quadratic probing, and double hashing. Here we use the simplest one, linear probing. Their comparison is as follows:
 
 
(Figure from tankywoo)
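To make the three probing schemes concrete, here is a small sketch of each probe function h(k, i), where i is the probe number. The function names and the specific constants are assumptions chosen for the example: the quadratic variant uses the triangular-number offsets i(i+1)/2, which visit every slot when m is a power of two, and the double hashing variant uses h2(k) = 1 + (k mod (m-1)), which works when m is prime. Keys are assumed to be non-negative.

```cpp
#include <cassert>

// Linear probing: h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m.
int linearProbe(int k, int i, int m) {
    assert(k >= 0 && i >= 0 && m > 0);
    return (k % m + i) % m;
}

// Quadratic probing: h(k, i) = (h'(k) + i(i+1)/2) mod m.
// Assumes m is a power of two so the sequence covers all slots.
int quadraticProbe(int k, int i, int m) {
    assert(k >= 0 && i >= 0 && m > 0);
    return (k % m + i * (i + 1) / 2) % m;
}

// Double hashing: h(k, i) = (h1(k) + i * h2(k)) mod m,
// with h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1)).
// Assumes m is prime, so h2(k) is always relatively prime to m.
int doubleHashProbe(int k, int i, int m) {
    assert(k >= 0 && i >= 0 && m > 1);
    return (k % m + i * (1 + k % (m - 1))) % m;
}
```

For k = 3 and m = 8, linear probing visits 3, 4, 5, 6, ... while the quadratic variant visits 3, 4, 6, 1, 5, 2, 0, 7. For k = 14 and m = 13, double hashing starts at slot 1 and steps by h2(14) = 3, visiting 1, 4, 7, and so on. Two keys with the same starting slot follow the same path under the first two schemes, but usually different paths under double hashing.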
 
 
A weakness of the test program is that the test array is the same size as the hash table, which does not reflect the essence of hashing: sparsity plus mapping. When testing, you can shrink the test array to half the table size or vary it in other ways. The value of hashing lies in its excellent expected performance. O(1) is very powerful. It is almost godlike...
 
 
 
 
That's it for now. This rookie keeps going ~~~

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
