Paper Discussion: Hash Table


Author: Vamei. Source: http://www.cnblogs.com/vamei. Reprinting is welcome; please keep this statement. Thank you!

 

Hash

A hash table is a mapping from one set A to another set B. A mapping is a correspondence in which each element of set A corresponds to exactly one element of set B. However, one element of set B may correspond to several elements of set A. If each element of B corresponds to only one element of A, the mapping is called a one-to-one mapping. Such mappings are common in real life, for example:

A -> B

Person -> ID card number

Date -> zodiac sign (constellation)

 

Of the two mappings above, Person -> ID card number is one-to-one. In a hash table, this process of mapping is called hashing. If element a of set A corresponds to element b of set B, then a is called the key, and b is called the hash value of a.

Figure: Wei Xiaobao's hash value

 

A mapping is equivalent to a mathematical function f(x): A -> B, for example f(x) = 3x + 2. The core of a hash table is the hash function, which specifies how elements of set A correspond to elements of set B. For example:

A: three-digit integers  --hash(x) = x % 10-->  B: integers

104 -> 4

876 -> 6

192 -> 2

In this correspondence, the hash function is hash(x) = x % 10. That is, given a three-digit number, we take its last digit as its hash value.

 

Hashing is widely used in computer science. For example:

Ethernet and Wi-Fi protocols: see "The Small Horn Starts Broadcasting (Ethernet and WiFi Protocols)"

Checksum in the IP protocol: see "I Try My Best (IP Protocol Details)"

Hash values in Git: see "The Three Kingdoms of Version Control"

In these applications, a hash value stands in for the key. In Git, for example, the file content is the key and the SHA algorithm serves as the hash function, mapping the file content to a fixed-length string (the hash value). If the file content changes, the corresponding string changes too. Git uses this relatively short hash value to check whether the file content has changed.

 

Another example is the login password, which is usually a string of characters. For security, the computer does not store the string directly; instead it stores the hash value of the string (using MD5, SHA, or another algorithm as the hash function). The next time you log in, you enter the password string. If the hash value of what you enter matches the stored hash value, the password is correct. This way, even if an attacker breaks into the database of password records, they only see the hash values of the passwords. The hash functions used here have a useful one-way property: it is very hard to deduce the key from its hash value. Therefore, attackers cannot easily recover the users' passwords.

(There have been reports of user passwords leaking from several websites because those sites stored plaintext passwords instead of hash values. See, for example, the coverage of the CSDN leak, in which several websites' storage of plaintext passwords became the focus of controversy.)

 

Note that hashing only requires a mapping from A to B; it does not require the mapping to be one-to-one. Therefore, two different keys may correspond to the same hash value. This is called a hash collision. For example, a checksum in a network protocol can collide: the content being verified differs from the original, yet it produces the same checksum (hash value) as the original. Likewise, MD5 is often used to compute the hash value of a password, and experiments have shown that MD5 can collide, meaning that different plaintext passwords may produce the same hash value, which opens a serious security hole in the system. (See: hash collision.)

 

Hash and search

Hash tables are widely used for searching. Let set A be the objects we search for and set B be storage locations, and use the hash function to map each search object to a storage location. We can then find an object's location with a single hash computation. A common choice is to make set B the array subscripts. Because an array supports random access by subscript (constant time, O(1)), the cost of a search then depends mainly on the cost of computing the hash function.

 

For example, we use a name (a string) as the key and an array subscript as the hash value. Each array element stores a pointer to a record (a person's name and phone number).

 

Below is a simple hash function:

```c
#define HASHSIZE 1007

/* By Vamei
 * Hash function: add up the character codes of the string,
 * then reduce the sum modulo HASHSIZE.
 */
int hash(const char *p)
{
    int value = 0;
    while (*p != '\0') {
        value = value + (unsigned char)*p;  /* convert char to int, and sum */
        p++;
    }
    return value % HASHSIZE;  /* always less than HASHSIZE */
}
```

Hash value of "Vamei": 498

Hash value of "Obama": 480

 

We can create an array records of size HASHSIZE to store the records. HASHSIZE is usually chosen to be a prime number so that hash values are spread evenly over the array (strictly speaking, 1007 = 19 × 53 is not prime; the nearby prime 1009 would serve that purpose). To find the record for "Vamei", we compute its hash value, 498, and then read records[498] directly.

(666666 as Obama's phone number and 111111 as Vamei's are pure fabrications; do not take them seriously.)

Figure: Hash search

If we do not use hashing and instead search the array directly, we must visit the records one by one until we find the target, so the complexity is O(N). Why the difference? Although an array supports random access, an element's subscript is arbitrary and has no relationship to the element's value, so we have no choice but to examine the elements one by one. A hash function, by contrast, defines which elements may be stored at each subscript. The key and the hash function therefore give us considerable prior knowledge for choosing the right subscript to look at. Without a hash collision, a single selection is enough to land on the element we want.

Resolving Collisions

A hash table must handle hash collisions. For example, with the hash function above, "Obama" and "Oaamb" have the same hash value (480), causing a collision. How can we resolve this?

 

One solution is to store the colliding records in a linked list and have the array slot for that hash value point to the list. This is called open hashing (also known as separate chaining):

Figure: Open hashing

When searching, we first find the linked list using the hash value, then traverse the list comparing keys until we find the record. The linked list can also be replaced by other data structures.

 

Open hashing requires pointers. Sometimes we want to avoid pointers and keep the advantage of random access, so we use closed hashing (also known as open addressing) to resolve collisions.

Figure: Closed hashing

In this case, all records go into the array itself. When a collision occurs, the colliding record is placed in a slot of the array that is still free. After "Obama" has been inserted, the later "Oaamb" also hashes to position 480. Because 480 is already occupied, "Oaamb" probes the next position (the hash value plus 1), finds it free, and is stored there.

The key question in closed hashing is how to choose the next position to probe. Adding 1 to the hash value is one way, but there are others. In general, the i-th probe examines position(i) = (h(x) + f(i)) % HASHSIZE, for i = 0, 1, 2, ... Adding 1 to the hash value each time corresponds to f(i) = i. When searching, we probe the positions position(i) in the same order until the record is found.

(Different choices of f(i) give different probing behavior; I will not go into that here.)

If the array is nearly full, closed hashing needs many probes to find a free slot, which greatly reduces the efficiency of insertion and search. In that case, we increase HASHSIZE and insert all the existing records into the new, larger array. This operation is called rehashing.

 

Summary

Hash table, search

Hash collision, open hashing, closed hashing

 

You are welcome to read the rest of the "Paper Discussion: Algorithms and Data Structures" series.

 
