Why hash?
Hash tables are built on arrays. The hash algorithm is how a key is converted into an array index (subscript). Hashing makes insert, delete, and query operations very efficient: in a sorted array, even binary search only achieves O(log n), whereas a hash table converts the key being queried directly into an array index, so these operations take O(1) time on average.
Hash algorithm: the hash value of a key is called its hashcode. The range of possible hashcode values can be very large, so the hash algorithm maps this large range onto a fixed, smaller range. A good hash algorithm distributes the hashcodes evenly across that range. Even so, different keys will inevitably end up with the same value, which is called a hash conflict (collision).
For example, suppose you want to store the information of a company's 100 employees in a hash table, using the name as the key. Say the hashcode of "Frank" is 3135416. Using 3135416 directly as the array index would require a very large array, which wastes resources. Taking the remainder (mod) is the simplest hash algorithm and works well here: 3135416 % 100 = 16. In this way, you can use array[16] to store Frank's information.
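The modulo step in the Frank example can be sketched in a few lines of Java. This is only an illustration: the class name ModuloIndexDemo and the helper indexFor are made up for this sketch, and 3135416 is the illustrative hashcode from the text, not the result of calling String.hashCode().

```java
class ModuloIndexDemo {
    // Maps an arbitrary hashcode onto a slot in [0, tableSize).
    // Math.floorMod keeps the result non-negative even for negative hashcodes.
    static int indexFor(int hashCode, int tableSize) {
        return Math.floorMod(hashCode, tableSize);
    }

    public static void main(String[] args) {
        String[] table = new String[100];   // one slot per employee
        int frankHash = 3135416;            // illustrative value taken from the text
        int slot = indexFor(frankHash, table.length);
        table[slot] = "Frank";
        System.out.println(slot);           // prints 16
    }
}
```

Math.floorMod is used instead of the % operator because Java hashcodes can be negative, and a negative remainder would not be a valid array index.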
How to solve hash conflicts?
- Open addressing
Linear probing, quadratic probing, and double hashing (rehashing).
For example, say the hashcode of "Leo" is 733616. After the modulo-100 hash algorithm is applied, its value is also 16. Because array[16] is already occupied, a conflict occurs. With linear probing, check the next position, 17, to see whether array[17] is free. If it is, store Leo there; otherwise keep trying the next position, 18, and so on until a vacancy is found. When querying, if the hash value of a key is 16, you cannot simply return array[16], because a hash value of 16 may belong to either Frank or Leo, so the stored key must be compared as well.
- Separate chaining (linked lists)
In Java, HashMap uses linked lists to resolve hash conflicts.
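Both strategies can be sketched on the Frank/Leo example. The class CollisionDemo and its methods are hypothetical names chosen for this illustration, the hashcodes are passed in explicitly so the numbers match the text, and the chaining part is a simplified model, not the actual HashMap source. Deletion is left out on purpose, since open addressing needs extra bookkeeping for removed slots.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class CollisionDemo {
    // Open addressing with linear probing: returns the slot finally used, or -1 if the table is full.
    static int probeInsert(String[] table, String key, int hashCode) {
        int start = Math.floorMod(hashCode, table.length);
        for (int i = 0; i < table.length; i++) {
            int slot = (start + i) % table.length;   // 16, 17, 18, ... wrapping around
            if (table[slot] == null || table[slot].equals(key)) {
                table[slot] = key;
                return slot;
            }
        }
        return -1;
    }

    // Lookup must compare keys, because several keys can share the same home slot.
    static int probeFind(String[] table, String key, int hashCode) {
        int start = Math.floorMod(hashCode, table.length);
        for (int i = 0; i < table.length; i++) {
            int slot = (start + i) % table.length;
            if (table[slot] == null) return -1;      // reached a vacancy: the key is absent
            if (table[slot].equals(key)) return slot;
        }
        return -1;
    }

    // Separate chaining: each slot holds a list of all keys that hash to it.
    static int chainInsert(List<List<String>> buckets, String key, int hashCode) {
        int slot = Math.floorMod(hashCode, buckets.size());
        List<String> chain = buckets.get(slot);
        if (!chain.contains(key)) chain.add(key);
        return slot;
    }

    public static void main(String[] args) {
        String[] open = new String[100];
        System.out.println(probeInsert(open, "Frank", 3135416)); // prints 16
        System.out.println(probeInsert(open, "Leo", 733616));    // prints 17 (16 is taken)
        System.out.println(probeFind(open, "Leo", 733616));      // prints 17

        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 100; i++) buckets.add(new LinkedList<>());
        chainInsert(buckets, "Frank", 3135416);
        chainInsert(buckets, "Leo", 733616);
        System.out.println(buckets.get(16));                     // prints [Frank, Leo]
    }
}
```

With open addressing the colliding key moves to another slot, while with chaining both keys stay in slot 16 and the lookup walks the list.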
Reference: http://en.wikipedia.org/wiki/Hash_function
Note the following when using Java's HashMap:
1. HashMap is NOT thread-safe
Do not use HashMap in a concurrent environment, as it may have unexpected consequences. According to Sun, a resize (expansion) under concurrent modification can leave a cycle in a bucket's linked list. A later get() on that bucket then loops forever, driving the CPU to 100%.
If concurrent access is a must, use Map m = Collections.synchronizedMap(new HashMap(...))
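A minimal sketch of the synchronized wrapper in use, assuming several threads insert disjoint keys. The class SynchronizedMapDemo and the method fillConcurrently are hypothetical names for this illustration; only Collections.synchronizedMap comes from the text.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Starts `threads` workers that each insert `perThread` distinct keys,
    // then returns the final size of the shared map.
    static int fillConcurrently(int threads, int perThread) {
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;          // each worker gets its own key range
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);            // every put is guarded by the wrapper's lock
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();       // wait for all workers to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000)); // prints 4000
    }
}
```

The wrapper only makes individual calls atomic. Iterating over the map still needs an explicit synchronized (map) block, and java.util.concurrent.ConcurrentHashMap is another standard option that the original text does not cover.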
2. If the data size is known in advance, it is best to set a reasonable initial capacity for the HashMap.
The default resize threshold of a HashMap is capacity (16) * loadFactor (0.75) = 12 entries. If your HashMap is used to hold 10 thousand entries, it will be expanded (resized) again and again, and each expansion rehashes the existing data, which is very time-consuming. Therefore, initialize it with new HashMap(15000) and leave the load factor at its default value of 0.75.
More about loadFactor: what is it for? It is used to reduce collisions. The closer the table is to being full, the more likely a conflict becomes. Think of a bus with 16 seats. When 5 people get on, there is almost no contention, because the bus is nearly empty. If 10 more people get on, there are 15 people for 16 seats. Some of them will want the same front seats, so they clash and start arguing. Now suppose the rule is that a 16-seat bus may carry only 12 passengers, that is, a load factor of 0.75, and anyone who arrives later boards another empty bus. This greatly reduces conflicts.
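The arithmetic behind these numbers can be checked with a short sketch. CapacityDemo, threshold, and initialCapacityFor are hypothetical helpers written for this illustration, not part of the HashMap API. Note that HashMap rounds a requested capacity up to the next power of two, so both 13334 and 15000 end up as 16384 slots.

```java
import java.util.HashMap;
import java.util.Map;

class CapacityDemo {
    // Number of entries a table of the given capacity holds before it is resized.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Smallest requested capacity that holds `expected` entries without a resize.
    static int initialCapacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));              // prints 12
        int capacity = initialCapacityFor(10_000, 0.75f);
        System.out.println(capacity);                          // prints 13334

        // Sized once up front, so loading 10,000 entries should not trigger a resize.
        Map<Integer, String> employees = new HashMap<>(capacity);
        for (int i = 0; i < 10_000; i++) {
            employees.put(i, "employee-" + i);
        }
        System.out.println(employees.size());                  // prints 10000
    }
}
```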
3. Requirements for a HashMap key
HashMap uses an array, Entry[], to store its Entry<K, V> objects. During put(k, v), it first uses the hashCode of the key to find the array slot and checks whether an entry already exists there. If not, the new entry is added to the array. Otherwise, it calls equals() to compare the key against the keys in that slot's linked list. If an equal key is found, its value is replaced; if not, the new entry is placed at the head of the linked list.
In get(), hashCode is checked first to find the slot, and the value is returned only if equals() also matches.
This is why a class used as a HashMap key must override both hashCode and equals.
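As a sketch of what this means in practice, here is a key class that overrides both methods. EmployeeKey and KeyDemo are hypothetical names invented for this example. Without the two overrides, the second EmployeeKey("Frank", 1) would be a different key and the lookup would return null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyDemo {
    // Hypothetical key type: two keys are equal when name and id both match.
    static final class EmployeeKey {
        private final String name;
        private final int id;

        EmployeeKey(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof EmployeeKey)) return false;
            EmployeeKey that = (EmployeeKey) other;
            return id == that.id && name.equals(that.name);
        }

        @Override
        public int hashCode() {
            // Must be consistent with equals: equal keys produce equal hashcodes.
            return Objects.hash(name, id);
        }
    }

    // Stores a value under one key object and reads it back with an equal but distinct one.
    static String lookup() {
        Map<EmployeeKey, String> departments = new HashMap<>();
        departments.put(new EmployeeKey("Frank", 1), "Engineering");
        return departments.get(new EmployeeKey("Frank", 1));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // prints Engineering
    }
}
```

Keeping the key's fields final also matters: if a key's hashcode changes after insertion, get() looks in the wrong slot.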
Reference: http://www.iteye.com/topic/754887
4. Comparison of HashMap, Hashtable, and HashSet
1. Hashtable is a subclass of Dictionary, while HashMap is an implementation of the Map interface;
2. Methods in Hashtable are synchronized, while those in HashMap are not synchronized by default.
3. HashMap allows null keys and null values, while Hashtable allows neither.
4. Hashtable uses the hashCode of the object directly, while HashMap applies its own hash function on top of it.
5. The difference between HashSet and ArrayList is that HashSet does not allow duplicate elements. You can use HashMap.keySet() to obtain the set of keys in a HashMap.
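Points 3 and 5 are easy to confirm with a small experiment. The class ComparisonDemo and its method names are made up for this sketch; the collection classes and calls are the standard java.util ones.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

class ComparisonDemo {
    // HashMap accepts a null key and a null value.
    static boolean hashMapAcceptsNull() {
        Map<String, String> map = new HashMap<>();
        map.put(null, null);
        return map.containsKey(null) && map.get(null) == null;
    }

    // Hashtable throws NullPointerException for a null key.
    static boolean hashtableRejectsNullKey() {
        try {
            new Hashtable<String, String>().put(null, "x");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // A HashSet silently drops duplicates that an ArrayList would keep.
    static int distinctCount(List<String> items) {
        return new HashSet<>(items).size();
    }

    public static void main(String[] args) {
        System.out.println(hashMapAcceptsNull());                        // prints true
        System.out.println(hashtableRejectsNullKey());                   // prints true
        System.out.println(distinctCount(Arrays.asList("a", "b", "a"))); // prints 2
    }
}
```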
I also found a blog post with an in-depth look at hashing and a hash-based solution to the top-K problem (given 10 million query-string records, find the 10 most frequent query strings):
Http://blog.csdn.net/v_july_v/article/details/6256463