Data Structure Hash Summary One: Theory

Source: Internet
Author: User
Data Structure Hash Summary One: Theory
Data Structure Hash Summary Two: Programs
Data Structure Hash Summary Three: Basic Practice
Data Structure Hash Summary Four: Advanced Programs
Data Structure Hash Summary Five: The Hash in Nginx (version 0.1)
When reprinting, please cite the source: http://blog.csdn.net/yankai0219/article/details/8185796

0. Learning method
Study the theory first, then study programs, and then come back to the theory and to practice.

I. Basic concepts

1. Hash definition
Hash: an input of any length is transformed by a hashing algorithm into an output of fixed length. That output is the hash value.
Hash function: a function that compresses a message of any length into a message digest of fixed length.
Hash table: a data structure that makes lookup convenient without occupying too much memory.

2. Hash uses
1) In information security, hashing is mainly used in cryptographic algorithms, where it turns messages of different lengths into a scrambled 128-bit code.
2) Hashing is widely used in massive data processing.

3. Common hash functions (hash algorithms)
1) Division hash method
2) Squaring hash method
3) Fibonacci hash method

4. Hash table introduction
Essence of a hash table: classification.
Hash table: a data structure that combines the easy addressing of arrays with the easy insertion and deletion of linked lists.
An image of a hash table: suppose there is a pile of books. Stored like a linked list or a linear table, in no particular order, they are troublesome to search. If the books are numbered, binary search finds one quickly. If they are also classified into engineering, science, and humanities, a book can be found faster still.
Hash table implementation: there are several implementations. The most commonly used is the zipper method (separate chaining).

5. An example hash function
In the following string hash function, the hash value is computed by iterating over the string and applying the expression hash = 31 * hash + *p to each character. Many hash functions work this way; only their expressions (hash algorithms) differ.
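unsigned int yk_simple_hash(char *str, int str_len)
{
    register unsigned int hash;
    register unsigned char *p;
    int i;

    /* Stop at the terminating NUL or after str_len bytes, whichever comes first. */
    for (hash = 0, i = 0, p = (unsigned char *) str; *p && i < str_len; p++, i++)
        hash = 31 * hash + *p;

    /* Clear the top bit so the result fits in a non-negative int. */
    return (hash & 0x7fffffff);
}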
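
The three common hash methods listed in item 3 above (division, squaring, Fibonacci) can be sketched as follows. This is only an illustration, not the author's code: the function names, the 16-slot table size, and the 32-bit key width are all assumptions. Each function maps an integer key to a slot index in the range 0 to 15.

```c
#include <stdint.h>

#define TABLE_BITS 4                    /* assumption: 2^4 = 16 slots */
#define TABLE_SIZE (1u << TABLE_BITS)

/* Division method: the remainder of the key divided by the table size. */
uint32_t hash_division(uint32_t key)
{
    return key % TABLE_SIZE;
}

/* Squaring method: square the key (modulo 2^32) and keep the top TABLE_BITS bits. */
uint32_t hash_square(uint32_t key)
{
    return (key * key) >> (32 - TABLE_BITS);
}

/*
 * Fibonacci method: multiply by 2654435769, which is 2^32 divided by the
 * golden ratio, and keep the top TABLE_BITS bits of the 32-bit product.
 */
uint32_t hash_fibonacci(uint32_t key)
{
    return (key * 2654435769u) >> (32 - TABLE_BITS);
}
```

With these assumptions, hash_division(17) is 1 and hash_fibonacci(1) is 9. The division method is the simplest, while the multiplicative methods replace the division with a multiply and a shift.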

II. Conflicts

1. Definition of a conflict
Suppose the address set of the hash table is 0 to n-1. A conflict (collision) occurs when the hash address j (0 <= j <= n-1) computed from a key is already occupied by another record.

2. Handling conflicts
Handling a conflict means finding another "empty" hash address for the record of this key. If the next hash address H1 still conflicts, try H2; if H2 still conflicts, try H3; and so on until some Hk does not conflict. Hk is then the address of the record in the table.

Ways to handle conflicts:

1) Open addressing method
Hi = (H(key) + di) MOD m, i = 1, 2, ..., k (k <= m-1)
Here H(key) is the hash function, m is the hash table length, and di is the increment sequence.
There are three increment sequences:
(1) Linear probing: di = 1, 2, 3, ..., m-1
(2) Quadratic probing: di = 1^2, -(1^2), 2^2, -(2^2), ..., k^2, -(k^2) (k <= m/2)
(3) Pseudo-random probing: di = a sequence of pseudo-random numbers
Disadvantages: suppose records already occupy positions i, i+1, and i+2 in the table. The next record whose hash address is i, i+1, i+2, or i+3 will then be placed at position i+3. This phenomenon, in which records with different first hash addresses compete for the same subsequent hash address while conflicts are being handled, is called clustering (the source text says "secondary clustering"; for linear probing, English-language texts usually say "primary clustering"). It adds conflicts between non-synonyms to the process of handling synonym conflicts. On the other hand, linear probing guarantees that as long as the hash table is not full, a conflict-free address Hk can always be found. Quadratic probing guarantees this only when the table length m is a prime of the form 4j+3 (j an integer). In short, open addressing causes clustering, which is unfavorable for lookup.

2) Rehashing method
Hi = RHi(key), i = 1, 2, ..., k
The RHi are different hash functions. When synonyms produce an address conflict, another hash function computes a new address, until no conflict occurs. This method is less prone to clustering.
Disadvantage: increased computation time.

3) Chain address method (zipper method)
All records whose keys are synonyms are stored in the same linear linked list.

4) Public overflow area
Suppose the range of the hash function is [0, m-1]. The vector hashtable[0..m-1] is the base table, each component of which holds one record, and a separate vector overtable[0..v] is the overflow table. Any record whose key is a synonym of a key in the base table, whatever hash address the hash function gives it, is placed in the overflow table once a conflict occurs.

Advantages of the zipper method:
① The zipper method handles conflicts simply and has no clustering, that is, non-synonyms never conflict, so the average search length is shorter;
② Because the node space of each linked list is allocated dynamically, the zipper method suits cases where the table length cannot be determined before the table is built;
③ To reduce conflicts, the open addressing method requires a small load factor α, so it wastes a lot of space when nodes are large. The zipper method allows α ≥ 1, and when nodes are large the added pointer field is negligible, so it saves space;
④ In a hash table built with the zipper method, deleting a node is easy: simply delete the corresponding node from the linked list. In a hash table built with the open addressing method, the slot of a deleted node cannot simply be set to empty, because that would cut off the lookup path of synonym nodes inserted after it. The reason is that in every open addressing method, an empty address unit (an open address) is the condition for a failed lookup. Deletion from a hash table that handles conflicts by open addressing can therefore only mark the node as deleted; it cannot truly remove it.

Disadvantages of the zipper method: the pointers need extra space, so when nodes are small the open addressing method is more space-efficient. If the space saved on pointers is used to enlarge the hash table, the load factor decreases, which reduces conflicts in the open addressing method and thus improves the average lookup speed.

III. Lookup
The lookup process of a hash table shows the following:
1) Although a hash table establishes a direct mapping between a key and the storage location of its record, conflicts mean that lookup is still a process of comparing the given value with keys. The average search length is therefore still needed as the measure of a hash table's lookup efficiency.
2) The number of keys compared with the given value during lookup depends on three factors: the hash function, the conflict-handling method, and the load factor of the hash table.
In general, for hash tables that handle conflicts in the same way, the average search length depends on the load factor.
Load factor = (number of records filled in the table) / (hash table length)
The smaller the load factor, the less likely a conflict is. Conversely, the larger the load factor, the more records are already in the table, the more likely a conflict is when a record is inserted, and the more keys must be compared with the given value during lookup.

IV. Using and learning hashes
First, you must understand the zipper method for hash tables, because many applications use it. A zipper-method hash table can be understood as an array of linked lists:
1) each element of the array is a linked list;
2) all nodes in one list have the same hash value, and that hash value is the subscript of the array element.
Second, you must learn the three major operations: initializing the hash table, inserting elements, and looking up elements.
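
The array-of-linked-lists picture and the three operations named above (initialize, insert, look up) can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from the later articles in this series: the names ht_init, ht_insert, ht_find, ht_free, and bucket_of, the 16-bucket table length, and the choice of string keys with int values are all hypothetical. The bucket index reuses the hash = 31 * hash + c expression from section I.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16                     /* assumption: fixed table length */

struct node {
    char *key;
    int value;
    struct node *next;                  /* next node with the same bucket index */
};

struct hash_table {
    struct node *buckets[NBUCKETS];     /* each array element heads one linked list */
};

/* Bucket index: the 31 * hash + c string hash, reduced modulo the table length. */
unsigned int bucket_of(const char *key)
{
    unsigned int hash = 0;
    const unsigned char *p;

    for (p = (const unsigned char *) key; *p; p++)
        hash = 31 * hash + *p;
    return hash % NBUCKETS;
}

/* Operation 1: initialize. Every bucket starts as an empty list. */
void ht_init(struct hash_table *t)
{
    size_t i;

    for (i = 0; i < NBUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Operation 3: look up. Walks one list only; returns the node or NULL. */
struct node *ht_find(const struct hash_table *t, const char *key)
{
    struct node *n;

    for (n = t->buckets[bucket_of(key)]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n;
    return NULL;
}

/* Operation 2: insert or update. Returns 0 on success, -1 if allocation fails. */
int ht_insert(struct hash_table *t, const char *key, int value)
{
    struct node *n = ht_find(t, key);
    unsigned int b;
    size_t len;

    if (n != NULL) {                    /* key already present: update in place */
        n->value = value;
        return 0;
    }
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    len = strlen(key) + 1;
    n->key = malloc(len);
    if (n->key == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->key, key, len);
    n->value = value;
    b = bucket_of(key);
    n->next = t->buckets[b];            /* push onto the front of the bucket's list */
    t->buckets[b] = n;
    return 0;
}

/* Release every node; the table can be reused afterwards. */
void ht_free(struct hash_table *t)
{
    size_t i;

    for (i = 0; i < NBUCKETS; i++) {
        struct node *n = t->buckets[i];
        while (n != NULL) {
            struct node *next = n->next;
            free(n->key);
            free(n);
            n = next;
        }
        t->buckets[i] = NULL;
    }
}
```

A caller would declare a struct hash_table, call ht_init once, and then use ht_insert and ht_find. Inserting at the head of the list keeps insertion cheap; the cost of a lookup grows with the length of one list, which is why the load factor matters.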

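
For comparison, the open addressing method from section II, with linear probing (di = 1, 2, 3, ...) and deletion marks, can be sketched as follows. Again this is only an illustration under assumptions: the names oa_init, oa_find, oa_insert, and oa_remove, the table length of 11, the division hash H(key) = key MOD m, and non-negative int keys are all hypothetical. The sketch shows why a deleted slot is marked rather than emptied: a lookup stops at the first empty slot, so emptying a slot would hide synonyms stored after it.

```c
#define OA_SIZE 11                      /* assumption: table length m */

enum slot_state { SLOT_EMPTY, SLOT_USED, SLOT_DELETED };

struct slot {
    enum slot_state state;
    int key;
};

struct oa_table {
    struct slot slots[OA_SIZE];
};

void oa_init(struct oa_table *t)
{
    int i;

    for (i = 0; i < OA_SIZE; i++)
        t->slots[i].state = SLOT_EMPTY;
}

/* H(key): division method. Assumes key >= 0. */
unsigned int oa_hash(int key)
{
    return (unsigned int) key % OA_SIZE;
}

/* Returns the slot index of key, or -1. An EMPTY slot ends the search;
 * a DELETED slot does not, so later synonyms stay reachable. */
int oa_find(const struct oa_table *t, int key)
{
    unsigned int h = oa_hash(key);
    int i;

    for (i = 0; i < OA_SIZE; i++) {
        unsigned int pos = (h + (unsigned int) i) % OA_SIZE;
        if (t->slots[pos].state == SLOT_EMPTY)
            return -1;
        if (t->slots[pos].state == SLOT_USED && t->slots[pos].key == key)
            return (int) pos;
    }
    return -1;
}

/* Returns the slot index used, or -1 if the table is full.
 * The first EMPTY or DELETED slot on the probe path is taken. */
int oa_insert(struct oa_table *t, int key)
{
    unsigned int h = oa_hash(key);
    int i, found = oa_find(t, key);

    if (found >= 0)
        return found;                   /* already present */
    for (i = 0; i < OA_SIZE; i++) {
        unsigned int pos = (h + (unsigned int) i) % OA_SIZE;
        if (t->slots[pos].state != SLOT_USED) {
            t->slots[pos].state = SLOT_USED;
            t->slots[pos].key = key;
            return (int) pos;
        }
    }
    return -1;
}

/* Marks the slot as deleted instead of emptying it. Returns 0, or -1 if absent. */
int oa_remove(struct oa_table *t, int key)
{
    int pos = oa_find(t, key);

    if (pos < 0)
        return -1;
    t->slots[pos].state = SLOT_DELETED;
    return 0;
}
```

With a table length of 11, the keys 3, 14, and 25 are synonyms (all hash to 3) and land in slots 3, 4, and 5. After oa_remove marks slot 4 as deleted, oa_find still reaches 25 in slot 5, and a later insert of another synonym reuses slot 4.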