Hash functions in data structures

Last Update: 2016-05-14 Source: Internet

Author: User



1. A hash table (also called a hash list) is a data structure that accesses records directly by their key value. It supports fast insertion and lookup and is implemented on top of an array. The basic idea is to map each key as evenly as possible to an index in the range 0 to tableSize-1.
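To make the key-to-index mapping concrete, here is a minimal Java sketch. It assumes Java's built-in `String.hashCode()` as the hash and `Math.floorMod` to compress the result into 0 to tableSize-1; neither choice comes from the original article, and collisions are ignored here (they are the subject of section 4).

```java
public class HashIndex {
    // Map a key to an array index in 0..tableSize-1.
    // Math.floorMod (unlike %) never returns a negative value, even when hashCode() is negative.
    static int indexFor(String key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 16;
        String[] table = new String[tableSize];
        String key = "alice";
        // Store and retrieve through the same index (collisions ignored in this sketch)
        table[indexFor(key, tableSize)] = "record for alice";
        System.out.println(table[indexFor(key, tableSize)]); // prints "record for alice"
    }
}
```

Because the index is computed from the key alone, the same key always lands in the same slot, which is what makes direct access possible.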

2. Ways to construct a hash function:

1> Direct addressing: take a linear function of the key as the hash address, that is,

$$f(\text{key}) = a \times \text{key} + b \quad (a, b \text{ constants})$$

Pros: simple, uniform, and collision-free. Cons: the key distribution must be known in advance, so it suits only small tables whose keys are contiguous.

Because of this limitation, the method, though simple, is rarely used in practice.
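A minimal Java sketch of direct addressing, assuming hypothetical keys that are birth years from 1980 to 2000, so that a = 1 and b = -1980 map them onto slots 0..20 with no collisions. The constants and the `DirectAddressing` class are illustrative, not from the original article.

```java
public class DirectAddressing {
    // f(key) = a * key + b; a = 1, b = -1980 is a hypothetical choice for years 1980..2000
    static final int A = 1;
    static final int B = -1980;

    private final String[] table;

    DirectAddressing(int size) {
        table = new String[size];
    }

    static int address(int key) {
        return A * key + B;
    }

    void put(int year, String value) {
        table[address(year)] = value; // each key has its own slot, so no collision handling
    }

    String get(int year) {
        return table[address(year)];
    }

    public static void main(String[] args) {
        DirectAddressing t = new DirectAddressing(21); // one slot per year 1980..2000
        t.put(1985, "record for 1985");
        System.out.println(address(1985)); // 5
        System.out.println(t.get(1985));   // record for 1985
    }
}
```

The sketch also shows the limitation mentioned above: a key outside 1980..2000 would produce an index outside the array.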

2> Digit analysis: this applies when keys have many digits. Take the 11-digit mobile phone number "130****1234": the first three digits are the carrier access code, the middle four are the HLR identifier, which indicates where the subscriber is registered, and the last four are the actual subscriber number.

Suppose you want to store a company's staff registration records keyed by phone number. The first seven digits are likely to be the same for most entries, so taking the last four digits as the hash address is a good choice. If collisions are still frequent, the extracted digits can be reversed, rotated right, and so on. The overall goal is a hash function that distributes keys reasonably across the slots of the table.

Digit analysis is typically used for keys with many digits; consider it when the key distribution is known in advance and some of the digit positions are fairly evenly distributed.
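A minimal Java sketch of the digit-analysis idea above: use the last four digits of a phone number as the address, with the reversal and right-rotation variants the text mentions. The phone-number format (a plain digit string) and the method names are assumptions for illustration.

```java
public class DigitAnalysis {
    // Use the last four digits of the phone number as the hash address (0..9999).
    static int lastFour(String phone) {
        return Integer.parseInt(phone.substring(phone.length() - 4));
    }

    // Variant: reverse the extracted digits.
    static int lastFourReversed(String phone) {
        String digits = phone.substring(phone.length() - 4);
        return Integer.parseInt(new StringBuilder(digits).reverse().toString());
    }

    // Variant: rotate the extracted digits right by one position ("5678" -> "8567").
    static int lastFourRotatedRight(String phone) {
        String d = phone.substring(phone.length() - 4);
        return Integer.parseInt(d.charAt(3) + d.substring(0, 3));
    }

    public static void main(String[] args) {
        String phone = "13012345678"; // hypothetical number
        System.out.println(lastFour(phone));             // 5678
        System.out.println(lastFourReversed(phone));     // 8765
        System.out.println(lastFourRotatedRight(phone)); // 8567
    }
}
```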

3> Mid-square method: the computation is simple. Suppose the key is 1234; its square is 1522756, and extracting the middle three digits gives 227, which is used as the hash address.

The mid-square method suits cases where the key distribution is unknown and the keys have only a few digits.
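The 1234 example can be reproduced with a short Java sketch. When the square has an even number of extra digits, which "middle" to take is a convention; this sketch assumes the window starts left of center, and it assumes keys small enough that the square fits in a `long`.

```java
public class MidSquare {
    // Square the key and take `digits` digits from the middle of its decimal form.
    // Assumes |key| is small enough that key * key does not overflow a long.
    static int midSquare(long key, int digits) {
        String sq = Long.toString(key * key);
        int start = Math.max(0, (sq.length() - digits) / 2); // left-of-center if not exact
        int end = Math.min(sq.length(), start + digits);
        return Integer.parseInt(sq.substring(start, end));
    }

    public static void main(String[] args) {
        System.out.println(midSquare(1234, 3)); // 1234^2 = 1522756 -> 227
    }
}
```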

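3. Hashing keys: (1) use the key directly as the index; (2) convert a string to an index: turn each character into a numeric code (for example its ASCII value), then either add the codes or multiply them by successive powers of a base, and finally compress the result into the valid index range.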
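Both string-to-index approaches can be sketched in Java. The letter coding 'a' = 1 ... 'z' = 26 and base 27 match the `hashCode` method in the chaining code later in this article; applying the modulo at every step (Horner's rule) is a substitution for that method's `BigInteger` arithmetic and gives the same final index.

```java
public class StringIndex {
    // Letter code used later in this article: 'a' = 1, ..., 'z' = 26 (assumes lowercase keys).
    static int letter(char c) {
        return c - 96;
    }

    // (a) Add the character codes, then compress into 0..tableSize-1.
    static int additive(String key, int tableSize) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += letter(key.charAt(i));
        }
        return Math.floorMod(sum, tableSize);
    }

    // (b) Multiply by powers of 27 using Horner's rule, reducing modulo tableSize at each
    //     step so the intermediate value stays small (assumes tableSize < Integer.MAX_VALUE / 27).
    static int polynomial(String key, int tableSize) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) {
            h = Math.floorMod(h * 27 + letter(key.charAt(i)), tableSize);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(additive("cats", 100));   // 3 + 1 + 20 + 19 = 43
        System.out.println(polynomial("cats", 100)); // 60337 mod 100 = 37
    }
}
```

The additive version sends anagrams such as "cats" and "acts" to the same index; weighting positions by powers of the base avoids that.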

4. Methods for handling hash collisions:

4.1 Open addressing: when a collision occurs, look for the next empty hash address. As long as the table is large enough, an empty address will always be found and the record stored there.

Its formula, for a table of length m, is:

$$f_i(\text{key}) = (f(\text{key}) + d_i) \bmod m, \quad d_i = 1, 2, 3, \dots, m-1$$

For example, let the key set be {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34}, the table length m = 12, and the hash function $f(\text{key}) = \text{key} \bmod 12$.

The first five keys, {12, 67, 56, 16, 25}, hash without collision and are stored directly: 12 in slot 0, 67 in slot 7, 56 in slot 8, 16 in slot 4, and 25 in slot 1.

For key = 37, f(37) = 1, which collides with 25. Applying the formula above, $f_1(37) = (f(37) + 1) \bmod 12 = 2$, so 37 is stored in slot 2.

The next keys, 22, 29, 15, and 47, hash without collision and go into slots 10, 5, 3, and 11 respectively.

For 48, f(48) = 0, which collides with 12 in slot 0. That is fine: $f_1(48) = (f(48) + 1) \bmod 12 = 1$ collides with 25, $f_2(48) = (f(48) + 2) \bmod 12 = 2$ collides with 37, and so on, until $f_6(48) = (f(48) + 6) \bmod 12 = 6$, which is empty, so 48 is stored in slot 6.

Resolving collisions with open addressing using $d_i = 1, 2, 3, \dots$ is called linear probing.
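The walkthrough above can be replayed with a short Java sketch that inserts the first eleven example keys using $f(\text{key}) = \text{key} \bmod m$ and linear probing. Integer keys and the `LinearProbingDemo` name are illustrative; the article's own string-keyed version follows later.

```java
import java.util.Arrays;

public class LinearProbingDemo {
    // Insert keys with f(key) = key mod m and probe f_i = (f(key) + d_i) mod m, d_i = 1, 2, ...
    // A key is silently skipped if the table is already full.
    static Integer[] build(int[] keys, int m) {
        Integer[] table = new Integer[m];
        for (int key : keys) {
            int home = Math.floorMod(key, m);
            for (int d = 0; d < m; d++) {
                int slot = (home + d) % m;
                if (table[slot] == null) {
                    table[slot] = key;
                    break;
                }
            }
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48};
        System.out.println(Arrays.toString(build(keys, 12)));
        // [12, 25, 37, 15, 16, 29, 48, 67, 56, null, 22, 47]
    }
}
```

37 lands in slot 2 after one extra probe and 48 in slot 6 after six, matching the steps described above.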

Going one step further, suppose the last key is 34. Then f(34) = 10, which collides with 22. Every slot after 22's position is occupied (slot 11, then slots 0 through 8 after wrapping around), while an empty slot, slot 9, sits just before it. Linear probing would still reach it by wrapping around via the modulo, but only at $d_i = 11$, which is inefficient. An improvement is to use $d_i = 1^2, -1^2, 2^2, -2^2, \dots, q^2, -q^2$ with $q \le m/2$, so that potential empty slots are searched in both directions. For 34, taking $d_i = -1$ finds the empty slot immediately. Squaring the offsets also keeps keys from clustering in one area. This method is called quadratic probing.
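A minimal Java sketch of quadratic probing, applied to the table state from the walkthrough after 48 has been inserted. The table literal is hard-coded from the steps above; `findSlot` is a hypothetical helper name.

```java
public class QuadraticProbing {
    // Return the first free slot for key using offsets 0, 1^2, -1^2, 2^2, -2^2, ..., q^2, -q^2
    // (q <= m/2), or -1 if none is free. Unlike linear probing, this sequence can miss a
    // free slot in some tables even when one exists.
    static int findSlot(int key, Integer[] table) {
        int m = table.length;
        int home = Math.floorMod(key, m); // f(key) = key mod m
        if (table[home] == null) {
            return home;
        }
        for (int q = 1; q <= m / 2; q++) {
            int[] offsets = {q * q, -q * q};
            for (int d : offsets) {
                int slot = Math.floorMod(home + d, m); // floorMod keeps negative offsets in range
                if (table[slot] == null) {
                    return slot;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Table after inserting the first 11 example keys with linear probing
        Integer[] table = {12, 25, 37, 15, 16, 29, 48, 67, 56, null, 22, 47};
        System.out.println(findSlot(34, table)); // 9, found with d = -1
    }
}
```

`Math.floorMod` matters here: Java's `%` returns a negative result for a negative offset, which would be an invalid array index.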

Another option is to compute the displacement $d_i$ with a random function when a collision occurs; this is called random probing.

Since the offsets are random, won't a search also generate random $d_i$ values? How can it reach the same address again? The randomness here is actually pseudo-randomness: if the random seed is set to the same value, repeated calls to the random function produce the same sequence every time. A search that uses the same seed therefore regenerates the same $d_i$ sequence and naturally reaches the same hash address.
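A minimal Java sketch of random probing, assuming `java.util.Random` as the pseudo-random generator and a hypothetical fixed seed of 2016. `Random` is documented to produce identical sequences for identical seeds, which is exactly the property the paragraph above relies on. In this sketch every colliding key reuses the same offset sequence; other designs derive the seed from the key.

```java
import java.util.Random;

public class RandomProbing {
    static final long SEED = 2016L; // hypothetical seed shared by insert() and find()

    final Integer[] table;

    RandomProbing(int m) {
        table = new Integer[m];
    }

    // Re-seeding with the same seed reproduces the same d_i sequence.
    private Random probeSequence() {
        return new Random(SEED);
    }

    // Returns the slot used, or -1 if no free slot was found within 4m random probes.
    int insert(int key) {
        int m = table.length;
        int home = Math.floorMod(key, m);
        if (table[home] == null) { table[home] = key; return home; }
        Random rnd = probeSequence();
        for (int attempt = 0; attempt < 4 * m; attempt++) {
            int slot = (home + 1 + rnd.nextInt(m - 1)) % m; // d_i in [1, m-1]
            if (table[slot] == null) { table[slot] = key; return slot; }
        }
        return -1;
    }

    // Follows the same pseudo-random sequence as insert(), so it revisits the same slots.
    int find(int key) {
        int m = table.length;
        int home = Math.floorMod(key, m);
        if (table[home] == null) return -1;
        if (table[home] == key) return home;
        Random rnd = probeSequence();
        for (int attempt = 0; attempt < 4 * m; attempt++) {
            int slot = (home + 1 + rnd.nextInt(m - 1)) % m;
            if (table[slot] == null) return -1; // the key would have been placed here
            if (table[slot] == key) return slot;
        }
        return -1;
    }
}
```

For example, after `t.insert(25)` and `t.insert(37)` on a table of 12 slots, `t.find(37)` returns the slot chosen by `insert`, even though that slot came from a "random" offset.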

In short, with open addressing a non-colliding address can be found as long as the table is not full (guaranteed for linear probing), which makes it a common way to resolve collisions.

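// Methods of a linear-probing hash table whose slots are the array Info[] arr.
// Assumes an Info class with getKey()/getName()/setName() and a hashCode(String key)
// that returns an index in 0..arr.length-1. A deleted entry stays in its slot with its
// name set to null (lazy deletion), so probe sequences that pass through it stay intact.

public void insert(Info info) {
    // Get the key
    String key = info.getKey();
    // Compute the key's hash with the custom hash function
    int hashVal = hashCode(key);
    for (int probes = 0; probes < arr.length; probes++) {
        // Use an empty slot or one whose entry has been deleted
        if (arr[hashVal] == null || arr[hashVal].getName() == null) {
            arr[hashVal] = info;
            return;
        }
        // Move to the next slot (linear probing), wrapping around at the end
        hashVal = (hashVal + 1) % arr.length;
    }
    throw new IllegalStateException("hash table is full");
}

public Info find(String key) {
    int hashVal = hashCode(key);
    for (int probes = 0; probes < arr.length && arr[hashVal] != null; probes++) {
        // Skip deleted entries; they keep their key but have no name
        if (arr[hashVal].getName() != null && arr[hashVal].getKey().equals(key)) {
            return arr[hashVal];
        }
        hashVal = (hashVal + 1) % arr.length;
    }
    return null;
}

public Info delete(String key) {
    int hashVal = hashCode(key);
    for (int probes = 0; probes < arr.length && arr[hashVal] != null; probes++) {
        if (arr[hashVal].getName() != null && arr[hashVal].getKey().equals(key)) {
            Info tmp = arr[hashVal];
            // Mark as deleted instead of clearing the slot
            tmp.setName(null);
            return tmp;
        }
        hashVal = (hashVal + 1) % arr.length;
    }
    return null;
}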

4.2 Rehashing:

For a hash table, several hash functions can be prepared in advance:

$$f_i(\text{key}) = RH_i(\text{key}), \quad i = 1, 2, \dots, k$$

Here the $RH_i$ are different hash functions; any of the methods mentioned earlier, such as division (remainder), folding, or mid-square, can be used. Each time an address collides, the next hash function is used to compute a new address.

This method keeps keys from clustering, at the cost of extra computation time.
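A minimal Java sketch of rehashing with a fixed list of hash functions $RH_1, RH_2, RH_3$. The three functions (division remainder, quotient, and a loose square-based variant) are hypothetical choices for illustration; the article does not specify which functions to prepare. Lookup tries the functions in the same order as insertion.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

public class RehashTable {
    private final Integer[] table;
    private final List<IntUnaryOperator> hashFunctions; // RH_1 .. RH_k (hypothetical choices)

    RehashTable(int m) {
        table = new Integer[m];
        hashFunctions = List.of(
            key -> Math.floorMod(key, m),                   // RH_1: division remainder
            key -> Math.floorMod(key / m, m),               // RH_2: quotient
            key -> (int) ((((long) key * key) / 10) % m)    // RH_3: loose square-based variant
        );
    }

    // Try RH_1, RH_2, ... in turn and use the first free address; -1 if all collide.
    int insert(int key) {
        for (IntUnaryOperator rh : hashFunctions) {
            int slot = rh.applyAsInt(key);
            if (table[slot] == null) {
                table[slot] = key;
                return slot;
            }
        }
        return -1; // a real table needs a fallback, e.g. chaining or a larger table
    }

    // Retrace the same sequence of functions; an empty address means the key is absent.
    int find(int key) {
        for (IntUnaryOperator rh : hashFunctions) {
            int slot = rh.applyAsInt(key);
            if (table[slot] == null) return -1;
            if (table[slot] == key) return slot;
        }
        return -1;
    }
}
```

With m = 12, inserting 25 and then 37 puts 37 at $RH_2(37) = 3$ because $RH_1(37) = 1$ is taken. The final `return -1` shows the cost of a fixed list of functions: once all k addresses collide, some other fallback is needed.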

4.3 Separate chaining (chain address method):

All synonyms (keys with the same hash address) are stored in one singly linked list, called a synonym sublist, and the hash table itself stores only the head pointer of each sublist. For the key set {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34}, dividing by 12 as before (the division-remainder method) gives these chains: slot 0 holds {12, 48}, slot 1 {25, 37}, slot 3 {15}, slot 4 {16}, slot 5 {29}, slot 7 {67}, slot 8 {56}, slot 10 {22, 34}, and slot 11 {47}; slots 2, 6, and 9 are empty.

With chaining, collisions no longer make it hard to find an address: however many collisions occur, we simply add a node to the linked list at the current position.

Chaining guarantees that an address can always be found, even for a hash function that causes many collisions. The price is that a lookup must traverse a singly linked list, which costs performance.
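The chains listed for the example key set can be checked with a short Java sketch. It uses `java.util.LinkedList` buckets instead of the article's hand-written `LinkList`, and appends new keys at the tail, whereas the article's code inserts at the head; either works, only the order within a chain differs.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ChainingDemo {
    // Each slot holds a linked list of synonyms, i.e. keys with the same key mod m.
    static List<LinkedList<Integer>> build(int[] keys, int m) {
        List<LinkedList<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            buckets.add(new LinkedList<>());
        }
        for (int key : keys) {
            // A collision just adds one more node to that slot's list
            buckets.get(Math.floorMod(key, m)).addLast(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] keys = {12, 67, 56, 16, 25, 37, 22, 29, 15, 47, 48, 34};
        List<LinkedList<Integer>> buckets = build(keys, 12);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println(i + ": " + buckets.get(i)); // e.g. "0: [12, 48]"
        }
    }
}
```

A lookup for 34 hashes to slot 10 and walks that list past 22; this traversal is the performance cost mentioned above.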

Summary: open addressing stores all nodes in the hash table T[0..m-1] itself, whereas chaining links the nodes of each group of synonyms into a singly linked list and stores the list's head pointer in T[0..m-1]. Compared with open addressing, chaining has the following advantages:

(1) Collision handling is simple and there is no clustering (non-synonyms never collide), so the average search length is short.

(2) The list nodes are allocated dynamically, so chaining suits cases where the table size cannot be determined in advance.

(3) To reduce collisions, open addressing requires a small load factor, which wastes a lot of space when nodes are large. Chaining can work with a load factor greater than 1, and when nodes are large the extra pointer field is negligible, so chaining saves space.

(4) Deleting a node from a chained table is easy: simply remove the corresponding node from its linked list.

import java.math.BigInteger;

// Hash table using separate chaining. Assumes keys made of lowercase letters a-z.
public class HashTable {

    private LinkList[] arr;

    public HashTable() {
        arr = new LinkList[100];
    }

    public HashTable(int maxSize) {
        arr = new LinkList[maxSize];
    }

    public void insert(Info info) {
        // Get the key
        String key = info.getKey();
        // Compute the key's hash with the custom hash function
        int hashVal = hashCode(key);
        // Create the bucket's list on first use
        if (arr[hashVal] == null) {
            arr[hashVal] = new LinkList();
        }
        arr[hashVal].insertFirst(info);
    }

    public Info find(String key) {
        int hashVal = hashCode(key);
        if (arr[hashVal] == null) {
            return null; // empty bucket
        }
        Node node = arr[hashVal].find(key);
        return node == null ? null : node.info;
    }

    public Info delete(String key) {
        int hashVal = hashCode(key);
        if (arr[hashVal] == null) {
            return null; // empty bucket
        }
        Node node = arr[hashVal].delete(key);
        return node == null ? null : node.info;
    }

    public int hashCode(String key) {
        // Simpler alternative (many collisions): sum the letter values a=1 ... z=26.
        // int hashVal = 0;
        // for (int i = key.length() - 1; i >= 0; i--) {
        //     hashVal += key.charAt(i) - 96;
        // }
        // return hashVal % arr.length;

        // Treat the key as a base-27 number (a=1 ... z=26), then reduce it modulo the table size
        BigInteger hashVal = BigInteger.ZERO;
        BigInteger pow27 = BigInteger.ONE;
        for (int i = key.length() - 1; i >= 0; i--) {
            int letter = key.charAt(i) - 96;
            BigInteger letterB = BigInteger.valueOf(letter);
            hashVal = hashVal.add(letterB.multiply(pow27));
            pow27 = pow27.multiply(BigInteger.valueOf(27));
        }
        return hashVal.mod(BigInteger.valueOf(arr.length)).intValue();
    }

    // Singly linked list holding the synonyms of one bucket
    public static class LinkList {
        // Head node
        private Node first;

        public LinkList() {
            first = null;
        }

        public void insertFirst(Info info) {
            Node node = new Node(info);
            node.next = first;
            first = node;
        }

        public Node deleteFirst() {
            Node tmp = first;
            if (tmp != null) {
                first = tmp.next;
            }
            return tmp;
        }

        public Node find(String key) {
            Node current = first;
            while (current != null && !key.equals(current.info.getKey())) {
                current = current.next;
            }
            return current; // null if the key is not in this list
        }

        public Node delete(String key) {
            Node current = first;
            Node previous = null;
            while (current != null && !key.equals(current.info.getKey())) {
                previous = current;
                current = current.next;
            }
            if (current == null) {
                return null; // key not found
            }
            if (previous == null) {
                first = first.next; // removing the head node
            } else {
                previous.next = current.next;
            }
            return current;
        }
    }

    public static class Node {
        // Data field
        public Info info;
        // Pointer field
        public Node next;

        public Node(Info info) {
            this.info = info;
        }
    }
}

// Minimal record type assumed by the code above (hypothetical: the article does not show Info)
class Info {
    private String key;
    private String name;

    public Info(String key, String name) { this.key = key; this.name = name; }
    public String getKey() { return key; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

Reference Link: http://blog.chinaunix.net/uid-26548237-id-3480645.html


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
