7. Hash and hash tables

Source: Internet
Author: User


Hashing is a widely used technique; in Java, for example, it underlies the HashMap data structure. A hashing scheme consists of a hash function and a hash table. An array lets us use a subscript to locate an element in O(1) time. Similarly, a hash table lets us quickly locate a value through its key: the hash function computes a hash value from the key, hash(key) = address, and the hash value then locates the element, table[address] = value. The principle is similar to that of an array.

Ideally, a hash function would compute a unique hash value for every key. In practice, different keys may produce the same hash value, which is called a conflict (also known as a collision). Two criteria determine whether a hash function is well designed:

1. Few collisions.

2. Fast computation.

Below are several common hash functions, all of which have certain mathematical principles behind them and have been put into practice. Their mathematical principles are not explored here.

BKDR hash function (h = 31 * h + c)

This hash function is used to calculate the string hash value in Java.

 

    // String#hashCode
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i]; // BKDR hash function; the constant can also be 131, 1313, 13131, ...
            }
            hash = h;
        }
        return h;
    }

 

DJB2 hash function (h = (h << 5) + h + c = 33 * h + c)

Elasticsearch has used the DJB2 hash function to hash the routing key of the document to be indexed (versions from 2.0 onward use Murmur3 instead).

SDBM hash function (h = (h << 6) + (h << 16) - h + c = 65599 * h + c)

It is used in SDBM (a simple database engine).
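The shift form and the multiplication form of each formula describe the same computation. The following is a minimal Python sketch that checks this, assuming 32-bit wrap-around similar to Java's int arithmetic. The helper names, the MASK constant, and the sample string are illustrative and not part of the original article.

```python
MASK = 0xFFFFFFFF  # assumption: emulate 32-bit unsigned wrap-around


def djb2_shift(h, c):
    # h = (h << 5) + h + c
    return ((h << 5) + h + c) & MASK


def djb2_mul(h, c):
    # h = 33 * h + c
    return (33 * h + c) & MASK


def sdbm_shift(h, c):
    # h = (h << 6) + (h << 16) - h + c
    return ((h << 6) + (h << 16) - h + c) & MASK


def sdbm_mul(h, c):
    # h = 65599 * h + c
    return (65599 * h + c) & MASK


def fold(step, text):
    """Apply one per-character step function over a whole string."""
    h = 0
    for ch in text:
        h = step(h, ord(ch))
    return h


same_djb2 = fold(djb2_shift, "hash table") == fold(djb2_mul, "hash table")
same_sdbm = fold(sdbm_shift, "hash table") == fold(sdbm_mul, "hash table")
print(same_djb2, same_sdbm)  # prints: True True
```

The shifts exist because (h << 5) + h is 32 * h + h = 33 * h, and (h << 6) + (h << 16) - h is 64 * h + 65536 * h - h = 65599 * h.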

With these three hash functions listed, let's test how many collisions each one produces.

Java

    package com.algorithm.hash;

    import java.util.HashMap;
    import java.util.UUID;

    /**
     * Compares the number of collisions produced by three hash functions.
     * Created by yulinfeng on 6/27/17.
     */
    public class HashFunc {

        public static void main(String[] args) {
            int length = 1000000; // 1 million strings
            // Use a HashMap to count collisions: HashMap keys cannot repeat,
            // so (length - size()) is the number of collisions.
            HashMap<String, String> bkdrMap = new HashMap<String, String>();
            HashMap<String, String> djb2Map = new HashMap<String, String>();
            HashMap<String, String> sdbmMap = new HashMap<String, String>();
            getStr(length, bkdrMap, djb2Map, sdbmMap);
            System.out.println("BKDR hash function, collisions among 1 million strings: " + (length - bkdrMap.size()));
            System.out.println("DJB2 hash function, collisions among 1 million strings: " + (length - djb2Map.size()));
            System.out.println("SDBM hash function, collisions among 1 million strings: " + (length - sdbmMap.size()));
        }

        /**
         * Generates random strings and records the hash value of each one.
         * @param length  number of strings to generate
         * @param bkdrMap hash values produced by BKDR
         * @param djb2Map hash values produced by DJB2
         * @param sdbmMap hash values produced by SDBM
         */
        private static void getStr(int length, HashMap<String, String> bkdrMap, HashMap<String, String> djb2Map, HashMap<String, String> sdbmMap) {
            for (int i = 0; i < length; i++) {
                String str = UUID.randomUUID().toString();
                bkdrMap.put(String.valueOf(str.hashCode()), str); // Java's String.hashCode is the BKDR hash function, h = 31 * h + c
                djb2Map.put(djb2(str), str); // DJB2 hash function
                sdbmMap.put(sdbm(str), str); // SDBM hash function
            }
        }

        /**
         * djb2 hash function
         * @param str the string to hash
         * @return the hash value as a string
         */
        private static String djb2(String str) {
            int hash = 0; // note: the classic DJB2 starts from 5381; this test starts from 0
            for (int i = 0; i != str.length(); i++) {
                hash = hash * 33 + str.charAt(i); // h = (h << 5) + h + c = 33 * h + c
            }
            return String.valueOf(hash);
        }

        /**
         * sdbm hash function
         * @param str the string to hash
         * @return the hash value as a string
         */
        private static String sdbm(String str) {
            int hash = 0;
            for (int i = 0; i != str.length(); i++) {
                hash = 65599 * hash + str.charAt(i); // h = (h << 6) + (h << 16) - h + c = 65599 * h + c
            }
            return String.valueOf(hash);
        }
    }

[Figure: number of collisions for 100,000, 1 million, and 2 million strings]

The three hash functions produce a similar number of collisions.

Python3

    import uuid

    MASK = 0xFFFFFFFF  # keep 32 bits; Python integers do not overflow the way Java's int does


    def hash_test(length, bkdrDic, djb2Dic, sdbmDic):
        for i in range(length):
            string = str(uuid.uuid1())  # based on the timestamp
            bkdrDic[bkdr(string)] = string
            djb2Dic[djb2(string)] = string
            sdbmDic[sdbm(string)] = string


    # BKDR hash function
    def bkdr(string):
        hash = 0
        for ch in string:
            hash = (31 * hash + ord(ch)) & MASK  # h = 31 * h + c
        return hash


    # DJB2 hash function
    def djb2(string):
        hash = 0
        for ch in string:
            hash = (33 * hash + ord(ch)) & MASK  # h = (h << 5) + h + c
        return hash


    # SDBM hash function
    def sdbm(string):
        hash = 0
        for ch in string:
            hash = (65599 * hash + ord(ch)) & MASK  # h = (h << 6) + (h << 16) - h + c
        return hash


    length = 100  # increase to 1000000 to match the Java test
    bkdrDic = dict()  # same as bkdrDic = {}
    djb2Dic = dict()
    sdbmDic = dict()
    hash_test(length, bkdrDic, djb2Dic, sdbmDic)
    print("BKDR hash function, collisions among %d strings: %d" % (length, length - len(bkdrDic)))
    print("DJB2 hash function, collisions among %d strings: %d" % (length, length - len(djb2Dic)))
    print("SDBM hash function, collisions among %d strings: %d" % (length, length - len(sdbmDic)))

A hash table is a data structure that is used together with a hash function to build an index for fast lookup. It is generally a storage area of a given length; for example, Java's HashMap by default starts with an Entry array of length 16. Once the length of the storage area is set, the remaining problem is where to place each value. Generally, if the hash value is m and the length is n, the value is placed at position m mod n.
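The m mod n placement rule can be sketched in a few lines of Python. This is only an illustration: the function names, the sample keys, and the 32-bit mask are assumptions, and the table length of 16 simply mirrors the HashMap default mentioned above.

```python
def bkdr(text):
    """BKDR hash, h = 31 * h + c, kept to 32 bits (an assumption of this sketch)."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def bucket_index(hash_value, table_length):
    """Map a hash value m to a slot in a table of length n using m mod n."""
    return hash_value % table_length


table_length = 16
for key in ["apple", "banana", "cherry"]:  # hypothetical keys
    print(key, bucket_index(bkdr(key), table_length))
```

Whatever the hash value is, the resulting index always falls in the range 0 to n - 1, so it can be used directly as an array subscript.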

 

[Figure: the hash function, the hash table, and collision resolution with the zipper method]

There are many ways to resolve a collision once it occurs. One is to hash again until there is no collision. Another is the zipper method (also called chaining), which uses a linked list to link together the elements that fall at the same position.
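As a rough illustration of the zipper method, here is a toy hash table in Python. It is a sketch under stated assumptions, not how any particular library implements it: the class name ChainedHashTable is made up, Python's built-in hash() stands in for the hash function, and a Python list plays the role of the linked list at each position.

```python
class ChainedHashTable:
    """Toy hash table that resolves collisions with the zipper method."""

    def __init__(self, length=16):
        # One chain per slot; each chain holds (key, value) pairs.
        self.buckets = [[] for _ in range(length)]

    def _index(self, key):
        # Python's built-in hash() stands in for the hash function.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key: extend the chain

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(length=2)  # tiny table, so chains must form
for word in ["a", "b", "c", "d"]:
    table.put(word, word.upper())
print(table.get("c"), table.get("missing"))  # prints: C None
```

With four keys and only two positions, at least two keys must share a chain, yet every key can still be found by walking the chain at its position.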

Imagine that the hash table in the preceding example has a length of 10 and produces one collision. With a length of 20 there are no collisions and lookups are faster, but more space is wasted. With a length of 2 there are three collisions and lookups are slower, but a lot of space is saved. Choosing the length of a hash table is therefore crucial, and also a real challenge.
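The effect of the table length can be made concrete with a small Python sketch. The four hash values below are hypothetical, chosen only so that lengths 2, 10, and 20 give three, one, and zero collisions; the function name count_collisions is likewise an assumption.

```python
def count_collisions(hash_values, table_length):
    """Count how many values land in a slot that is already occupied."""
    occupied = set()
    collisions = 0
    for m in hash_values:
        slot = m % table_length
        if slot in occupied:
            collisions += 1
        else:
            occupied.add(slot)
    return collisions


hash_values = [3, 13, 7, 19]  # hypothetical hash values, chosen for illustration
for n in (2, 10, 20):
    print(n, count_collisions(hash_values, n))
# prints:
# 2 3
# 10 1
# 20 0
```

A longer table spreads the same values over more positions, which removes collisions at the cost of unused slots.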

Supplement:

Hashing is used in many areas. Normally, different values are expected to have different hash values. However, you can also design a hash algorithm so that similar or identical values have similar or identical hash values. That is, if two objects are completely different, their hash values are completely different; if two objects are identical, their hash values are identical; and the more similar two objects are, the more similar their hash values are. This is really a question of similarity, so the idea can be applied to similarity calculation (for example, Jaccard distance) and ultimately to targeted advertising and product recommendation.
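One well-known technique in this family is MinHash, which is swapped in here purely as an illustration; the article does not name a specific algorithm. In the sketch below, the more two sets overlap, the more positions of their signatures agree, and the fraction of agreeing positions estimates the Jaccard similarity. All names, the sample sets, and the parameters (64 hash functions, seed 0, the modulus) are assumptions of this sketch.

```python
import random

PRIME = (1 << 61) - 1  # a Mersenne prime used as the modulus


def jaccard(a, b):
    """Exact Jaccard similarity: size of intersection over size of union."""
    return len(a & b) / len(a | b)


def bkdr(text):
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def minhash_signature(items, num_hashes=64, seed=0):
    """For each of num_hashes derived hash functions, keep the minimum value."""
    rng = random.Random(seed)  # fixed seed: every set uses the same functions
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
              for _ in range(num_hashes)]
    return [min((a * bkdr(x) + b) % PRIME for x in items) for a, b in params]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


doc_a = {"hash", "table", "key", "value"}   # hypothetical sample sets
doc_b = {"hash", "table", "key", "bucket"}
print(jaccard(doc_a, doc_b))
print(estimate_similarity(minhash_signature(doc_a), minhash_signature(doc_b)))
```

The exact Jaccard similarity of the two sample sets is 3/5; the MinHash figure is only an estimate and becomes more accurate as the number of hash functions grows.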

In addition, consistent hashing can be applied to server load balancing. A good hash algorithm helps ensure that each server takes an even share of the load.
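A minimal sketch of a consistent hashing ring in Python follows. It is illustrative only: the class name, the server names, the use of SHA-256 to place points on the ring, and the choice of 100 virtual nodes per server are all assumptions, not details from the article.

```python
import bisect
import hashlib


def ring_hash(text):
    """Stable 32-bit position on the ring (SHA-256 is used only for spreading)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ConsistentHashRing:
    def __init__(self, servers, replicas=100):
        # Each server gets several virtual nodes to even out the load.
        self.ring = sorted(
            (ring_hash("%s#%d" % (server, i)), server)
            for server in servers
            for i in range(replicas)
        )
        self.positions = [pos for pos, _ in self.ring]

    def server_for(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        i = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[i][1]


servers = ["server-a", "server-b", "server-c"]  # hypothetical server names
keys = ["user-%d" % i for i in range(1000)]

ring = ConsistentHashRing(servers)
before = {k: ring.server_for(k) for k in keys}

smaller = ConsistentHashRing(["server-a", "server-b"])  # server-c removed
after = {k: smaller.server_for(k) for k in keys}

# Keys that were not on server-c should stay where they were.
moved = sum(1 for k in keys
            if before[k] != "server-c" and before[k] != after[k])
print(moved)  # prints: 0
```

Only the keys that lived on the removed server are reassigned, which is the property that makes consistent hashing attractive when servers join or leave.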
