"Chaos" algorithm-HASH table (HASH) algorithm explanation

Source: Internet
Author: User

Notice: all rights reserved. You are welcome to reprint this article. Contact: yiluohuanghun@gmail.com

When searching data, several effective methods come to mind, roughly as follows. 1. For data stored contiguously in memory, binary search works well. 2. For data that is not stored contiguously, a binary search tree is a good choice. 3. For data that is both non-contiguous and very large, those two methods are not enough, and we can turn to a hash table instead. A hash table combines the advantage of arrays, which are easy to index and search, with that of linked lists, which make insertion and deletion easy.

What is Hash:

A hash takes an input of any length (also called the pre-image) and, through a hash algorithm, converts it into an output of fixed length; that output is the hash value. This conversion is a compressing mapping: the space of hash values is usually much smaller than the input space, so different inputs may hash to the same output, and it is impossible to uniquely determine the input from its hash value. Simply put, a hash is a function that compresses a message of any length into a fixed-length message digest.

In the information security field, hashing is mainly used in cryptographic algorithms. It converts information of different lengths into a scrambled fixed-length code (128 bits in the case of MD5, for example); these codes are called hash values. For a hash table, hashing can also be described as finding a mapping between the data content and the data storage address.
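The fixed-length property and the possibility of collisions can be checked with a minimal sketch. It uses MD5 from Python's standard `hashlib` module as one example of a 128-bit hash; `md5_bits` and `toy_hash` are hypothetical helper names introduced only for this illustration.

```python
import hashlib


def md5_bits(message: bytes) -> int:
    """Return the length, in bits, of the MD5 digest of `message`."""
    return len(hashlib.md5(message).digest()) * 8


def toy_hash(message: bytes) -> int:
    """Hypothetical toy hash (not secure): sum of the bytes modulo 16."""
    return sum(message) % 16


# Inputs of very different lengths produce digests of the same length.
print(md5_bits(b"a"), md5_bits(b"a" * 10_000))

# Different inputs can share a hash value: b"ab" and b"ba" have the same byte sum.
print(toy_hash(b"ab") == toy_hash(b"ba"))
```

Because the output space of `toy_hash` has only 16 values, collisions are unavoidable; a larger output space makes them rarer but cannot rule them out.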

Arrays are easy to address but make insertion and deletion difficult. Linked lists are the opposite: addressing is difficult, but insertion and deletion are easy. So can we combine the two to get a data structure that is both easy to address and easy to insert into and delete from? The answer is yes: this is the hash table. There are many ways to implement a hash table. What I will explain next is the most commonly used one, the zipper method (also known as separate chaining), which can be understood as an "array of linked lists".

Figure: http://www.bkjia.com/uploads/allimg/131228/16242R4B-0.png

On the left is an array. Each member of the array holds a pointer to the head of a linked list, which may be empty or may contain many elements. We distribute elements to different linked lists based on some feature of each element, and we use the same feature to find the right linked list and then search that list for the element.
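The structure described above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: keys are non-negative integers, the "feature" is simply `key % 16`, and a Python list stands in for each linked list. The class name `ChainedHashTable` and its methods are hypothetical.

```python
class ChainedHashTable:
    """Toy hash table using the zipper method: an array of chains."""

    def __init__(self, slots: int = 16) -> None:
        # One chain per array slot; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(slots)]

    def _index(self, key: int) -> int:
        # Placeholder hash: map the key to an array slot.
        return key % len(self.buckets)

    def insert(self, key: int) -> None:
        chain = self.buckets[self._index(key)]
        if key not in chain:
            chain.append(key)

    def contains(self, key: int) -> bool:
        # Find the right chain first, then search only that chain.
        return key in self.buckets[self._index(key)]

    def remove(self, key: int) -> bool:
        chain = self.buckets[self._index(key)]
        if key in chain:
            chain.remove(key)
            return True
        return False


table = ChainedHashTable()
for key in (1, 17, 33, 5):
    table.insert(key)

print(table.buckets[1])    # keys 1, 17 and 33 share slot 1
print(table.contains(17))
```

Lookup cost depends on chain length, which is why the choice of hash method matters: a method that sends most keys to one slot degrades the table into a single long list.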

The method for converting an element's feature into an array index is the hash method. There is more than one hash method, of course; three are listed below.

Commonly used:

1. Division hash

The most intuitive method is the division hash. The formula is as follows:

index = value % 16

Anyone who has learned assembly knows that the modulus is actually obtained through a division operation, which is why this is called the "division hash method".
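A short sketch of the division hash, assuming non-negative integer values and a 16-slot table as in the formula; `division_hash` and the sample data are hypothetical.

```python
def division_hash(value: int, slots: int = 16) -> int:
    """Division hash: the remainder of value divided by the table size."""
    return value % slots


print([division_hash(v) for v in (1, 18, 35)])

# Hypothetical worst case: every value is a multiple of 16, so all land in slot 0.
print([division_hash(v) for v in (0, 16, 32, 48, 64)])
```

When the table size is a power of two, as here, `value % 16` keeps only the low four bits of the value (it equals `value & 15` for non-negative integers), so values that differ only in their high bits always collide.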

2. Square hash

Computing the index is a very frequent operation, and multiplication is cheaper than division (although on current CPUs the difference is hardly noticeable), so we want to replace the division with a multiplication and a shift. Formula: index = (value * value) >> 28. (Here >> 28 shifts right by 28 bits, which divides by 2^28. Note: shifting left makes a value larger and acts as multiplication; shifting right makes it smaller and acts as division.) With 32-bit arithmetic, shifting right by 28 bits keeps the top 4 bits of the product, which gives an index from 0 to 15. If the values are distributed fairly uniformly, this method produces good results, but for the element values in the figure above every index comes out as 0, a complete failure. You may also wonder: if value is large, won't value * value overflow? The answer is yes, but we do not care about overflow in this multiplication, because we are not after the product itself, only the index.
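The square hash formula can be tried out with the sketch below. Python integers do not overflow, so the code masks the product to 32 bits to emulate the unsigned 32-bit arithmetic the formula assumes; `square_hash` and `MASK32` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF  # keep the low 32 bits, emulating unsigned 32-bit overflow


def square_hash(value: int) -> int:
    """Square hash: top 4 bits of the 32-bit product value * value."""
    product = (value * value) & MASK32
    return product >> 28


# Small values all land in slot 0, because value * value stays below 2**28.
print({square_hash(v) for v in range(1, 101)})

# Larger values do reach other slots, and overflow is simply discarded.
print(square_hash(20000), square_hash(65535), square_hash(65536))
```

Any value below 16384 squares to less than 2^28, so its index is 0. This is the failure mode the text describes for small element values.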

3. Fibonacci hash

The disadvantages of the square hash method are obvious, so can we find an ideal multiplier instead of using the value itself as the multiplier? The answer is yes.

1. For a 16-bit integer, the multiplier is 40503.

2. For a 32-bit integer, the multiplier is 2654435769.

3. For a 64-bit integer, the multiplier is 11400714819323198485.

How are these "ideal multipliers" obtained? They are related to the golden ratio, and the most classic expression of the golden ratio is undoubtedly the famous Fibonacci sequence, that is, the sequence of this form:

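0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, ...

The ratio of consecutive terms approaches the golden ratio (about 1.618), and each multiplier above is 2^16, 2^32 or 2^64 divided by the golden ratio, rounded down to an integer. In addition, the Fibonacci sequence is said to match the ratios of the orbital radii of the eight planets in the solar system surprisingly well.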
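The three multipliers listed above can be reproduced from the golden ratio. The sketch below computes the integer part of 2^bits divided by the golden ratio using exact integer arithmetic, since floating point is not precise enough for the 64-bit case; `fibonacci_multiplier` is a hypothetical helper name, and `math.isqrt` requires Python 3.8 or later.

```python
from math import isqrt


def fibonacci_multiplier(bits: int) -> int:
    """Integer part of 2**bits / phi, where phi = (1 + sqrt(5)) / 2."""
    scale = 1 << bits
    # 2**bits / phi equals 2**bits * (sqrt(5) - 1) / 2.
    # isqrt(5 * scale**2) is the integer part of scale * sqrt(5).
    return (isqrt(5 * scale * scale) - scale) // 2


for bits in (16, 32, 64):
    print(bits, fibonacci_multiplier(bits))
```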

For our common 32-bit integers, the formula is as follows:

index = (value * 2654435769) >> 28
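A sketch of the 32-bit formula, again masking the product to 32 bits to emulate unsigned overflow in Python. `fibonacci_hash`, `MASK32` and `FIB_MULTIPLIER_32` are hypothetical names, and the sample values are illustrative rather than taken from the article's figure.

```python
MASK32 = 0xFFFFFFFF             # emulate unsigned 32-bit overflow
FIB_MULTIPLIER_32 = 2654435769  # the 32-bit multiplier given in the text


def fibonacci_hash(value: int) -> int:
    """Fibonacci hash: top 4 bits of the 32-bit product value * multiplier."""
    product = (value * FIB_MULTIPLIER_32) & MASK32
    return product >> 28


print([fibonacci_hash(v) for v in (1, 2, 3, 4)])      # [9, 3, 13, 7]
print([fibonacci_hash(v) for v in (16, 32, 48, 64)])  # [14, 12, 10, 8]
```

Consecutive small values are spread far apart, and multiples of 16, which would all collide in slot 0 under `value % 16`, land in four different slots.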

If the Fibonacci hash is used, the figure above looks like this:

Figure: http://www.bkjia.com/uploads/allimg/131228/16242QX5-1.png

Clearly, the distribution produced by the Fibonacci hash is much better than the one produced by the original hash method.

Applicability:

A basic data structure for fast lookup and deletion. It usually requires that the total amount of data fit in memory.

Basic principles and key points:

Choice of hash function: strings, integers and permutations each call for a specific, suitable hash method. Collision handling: one approach is open hashing, also known as the zipper method; the other is closed hashing, also known as open addressing.
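Open hashing stores colliding keys in a chain outside the array; closed hashing keeps every key inside the array and looks for another free slot on collision. A minimal sketch of closed hashing follows. The probing strategy used here, linear probing (try the next slot, wrapping around), is an assumption chosen for illustration; the text does not name one. `LinearProbingTable` is a hypothetical class, and deletion is left out because it needs extra bookkeeping.

```python
class LinearProbingTable:
    """Toy closed-hashing table: every key lives in the array itself."""

    def __init__(self, slots: int = 16) -> None:
        self.slots = [None] * slots

    def _probe(self, key: int):
        """Yield slot indices starting at the home slot, wrapping around."""
        start = key % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key: int) -> int:
        """Store key in the first free slot on its probe path; return the slot."""
        for index in self._probe(key):
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise RuntimeError("table is full")

    def contains(self, key: int) -> bool:
        for index in self._probe(key):
            if self.slots[index] is None:
                return False  # an empty slot ends the probe path
            if self.slots[index] == key:
                return True
        return False


table = LinearProbingTable()
print([table.insert(k) for k in (1, 17, 33)])  # all three have home slot 1
print(table.contains(17), table.contains(49))
```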

Extension:

The d in d-left hashing means "multiple". To simplify the problem, let's first look at 2-left hashing. 2-left hashing divides a hash table into two halves of equal length, T1 and T2, and gives each half its own hash function, h1 for T1 and h2 for T2. When a new key is stored, both hash functions are evaluated to obtain two addresses, h1[key] and h2[key]. We then check position h1[key] in T1 and position h2[key] in T2 to see which one already holds more (collided) keys, and store the new key in the position with the lighter load. If the two loads are equal, for example when both positions are empty or both hold one key, the new key is stored in the left sub-table T1; this is where the name 2-left comes from. When looking up a key, both hashes must be computed and both positions checked.
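The insertion and lookup rules of 2-left hashing can be sketched as follows. The two hash functions are hypothetical placeholders (`key % size` and `(key // size) % size`) chosen only to keep the example easy to trace, and each position holds a small list of keys; the class and method names are also assumptions.

```python
class TwoLeftHashTable:
    """Toy 2-left hash table: two equal halves, ties go to the left half."""

    def __init__(self, slots_per_half: int = 8) -> None:
        self.size = slots_per_half
        self.t1 = [[] for _ in range(slots_per_half)]  # left half
        self.t2 = [[] for _ in range(slots_per_half)]  # right half

    def h1(self, key: int) -> int:
        return key % self.size  # hypothetical hash for T1

    def h2(self, key: int) -> int:
        return (key // self.size) % self.size  # hypothetical hash for T2

    def insert(self, key: int) -> str:
        """Store key in the less loaded candidate; return which half was used."""
        left = self.t1[self.h1(key)]
        right = self.t2[self.h2(key)]
        if key in left:
            return "T1"
        if key in right:
            return "T2"
        if len(right) < len(left):
            right.append(key)
            return "T2"
        left.append(key)  # lighter load on the left, or a tie
        return "T1"

    def contains(self, key: int) -> bool:
        # A lookup must check both candidate positions.
        return key in self.t1[self.h1(key)] or key in self.t2[self.h2(key)]


table = TwoLeftHashTable()
print([table.insert(k) for k in (0, 8, 72)])
```

In this run, key 0 finds both positions empty and goes left; key 8 finds its T1 position occupied and its T2 position empty, so it goes right; key 72 finds one key in each candidate position, and the tie sends it left.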


This article is from the "Yi fall Dusk" blog. Please be sure to keep this source: http://yiluohuanghun.blog.51cto.com/3407300/1258577
