Two days ago a friend asked whether I had ever read the implementation of map in Go. I really had not, so I took a little time to read the hash map implementation in the runtime source. The underlying implementation of map is a hash table, roughly the same as the hash table you would usually have in mind, but it does have a lot of details ("the devil is in the details").
The hash map is implemented as an array of buckets. Every element is hashed into one of the buckets in the array, and once a bucket is full, another bucket is attached to it through an overflow pointer, forming a linked list; this is how collisions are resolved. This is the basic hash table structure, nothing novel. The following summarizes some of its details.
- Note that a bucket does not store just one key/value pair; it can store 8. Each bucket consists of a header and a data part. The size of the data part is (sizeof(key) + sizeof(value)) * 8, enough for 8 key/value pairs. The pairs are laid out in memory as key0 key1 ... key7 value0 value1 ... value7: the 8 keys are stored one after another, followed by the 8 corresponding values. Why not store them as key0 value0 ... key7 value7? The comments in the runtime source explain that grouping the keys and the values avoids the alignment padding that interleaving would need, for example in a map[int64]int8.
- If the key type or the value type is larger than 128 bytes, it is not stored in the bucket directly; a pointer to it is stored instead.
- The header of each bucket has an array, uint8 tophash[8], that stores the high 8 bits of the hash of each of the 8 keys. For example, tophash[0] holds hash(key0) >> (64 - 8). Because the high 8 bits of each key's hash are saved, a lookup, delete, or insert can first check whether the high 8 bits of the two hashes are equal; if they are not, the keys themselves do not need to be compared. The saved high 8 bits therefore act as a coarse first filter that often skips the key comparison altogether, since comparing two keys costs more than comparing two uint8 values. Storing the whole hash value instead of just the high 8 bits would filter even better, of course, but it would take much more memory.
- If all 8 key/value slots of a bucket are filled, a new bucket is allocated and linked in through the overflow pointer. Note that this list pointer is named overflow, meaning the bucket has overflowed. The name is well chosen: when implementing a hash table we should try to avoid bucket overflow.
- The hash map grows by itself: as more and more key/value pairs are inserted, the initial bucket array eventually has to grow and all elements are rehashed, so that performance stays good. The bucket array grows when the number of inserted elements exceeds bucket array size * 6.5. Why 6.5? The code comments explain this; it is mainly an empirical value obtained from testing.
- Each time the hash map grows, a new bucket array is allocated that is twice the size of the previous one.
- After the hash map grows, the elements have to be copied from the old bucket array to the new one. This is not done all at once but incrementally: allocating the new bucket array does not copy any elements right away. Instead, each later insert copies a few, and as more inserts happen all the elements are gradually moved into the new bucket array.
- When you make a map without specifying a size, the bucket array starts with a size of 1 and, as the number of inserted elements increases, grows to 2, 4, 8, 16, and so on. A map created without an initial size is therefore likely to go through many rounds of growth and element copying, so we should give the map a suitable size when creating it.
That is all I have summed up for now ...
The air conditioning in the Zhejiang Provincial Library is really cold. I am freezing, so I am going outside to sit in the sun.