Go HashMap memory layout and implementation

Source: Internet
Author: User
This article was written some time ago; the information in it may have evolved or changed.

I got the chance to study the memory layout of Go's built-in types while debugging how frequently creating different kinds of small objects affects GC performance. I found that GC performance with map was poor, while a struct containing a slice, used as a comparison, performed well. This article summarizes how map is implemented in the Go runtime in order to explain the problem.

Hash table internal structure

A Go map is a hash map; the source code is in src/runtime/hashmap.go. Unlike the C++ map, which is implemented with a red-black tree, a Go map is an unordered map, which means its keys cannot be traversed in sorted order. Like traditional hash map implementations, it is built on an array of buckets: every element is hashed into one bucket of the array, and buckets points to this contiguously allocated array. The B field is the base-2 logarithm of the table size, i.e. the table has 2^B buckets. Each growth doubles the size, from 2^B to 2^(B+1) buckets. When a bucket is full, a new bucket is allocated with mallocgc and linked through the overflow pointer, forming a linked list; this is how the hash table resolves collisions.
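To make the role of B concrete, here is a small standalone sketch (not runtime code) of how a hash can be mapped to one of 2^B buckets by masking its low B bits. The function name bucketIndex and the sample hash value are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// bucketIndex selects one of the 2^B buckets from the low B bits of the hash.
// Because the bucket count is a power of two, a mask replaces a modulo.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (uint64(1)<<B - 1)
}

func main() {
	const hash uint64 = 0x9F3E // an arbitrary example hash value
	for B := uint8(0); B <= 4; B++ {
		fmt.Printf("B=%d  buckets=%2d  index=%d\n", B, uint64(1)<<B, bucketIndex(hash, B))
	}
}
```

Each time B grows by one, the index uses one more bit of the hash, so an entry either keeps its old index or moves to old index + 2^B (here index 6 at B=3 becomes 14 at B=4).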

```go
// A header for a Go map.
type hmap struct {
	count      int            // size returned by len(): the number of key/value pairs
	flags      uint8
	B          uint8          // the hash table has 2^B buckets in total
	hash0      uint32         // hash seed
	buckets    unsafe.Pointer // contiguous array of 2^B buckets, indexed by the low bits of the hash
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only while growing
	nevacuate  uintptr        // evacuation progress: old buckets below this index have been moved
	overflow   *[2]*[]*bmap   // overflow chains, used once the initial buckets are full
}

// A bucket for a Go map.
type bmap struct {
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt values.
	// NOTE: packing all the keys together and then all the values together makes the
	// code a bit more complicated than alternating key/value/key/value/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}
```

bmap is the bucket data structure. tophash is an array of size 8 (bucketCnt) that stores the high 8 bits of the hash of each of the 8 keys in the bucket. When a key is looked up, inserted or deleted, the high 8 bits of its hash are compared with tophash first, and only on a match are the actual keys compared. According to the official comment, the tophash array is followed by 8 keys and then 8 values, each corresponding to one entry in tophash, and finally by a pointer to the next bucket in the overflow chain. The memory layout of a bucket is therefore: tophash[8], key0 ... key7, value0 ... value7, overflow pointer.
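The tophash fast path can be illustrated with a simplified, hypothetical bucket type. This is a sketch, not the runtime's code: toyBucket, topHash and lookup are invented names, keys are strings, and tophash 0 is assumed to mean "empty slot" (the real runtime similarly reserves a few small tophash values as markers).

```go
package main

import "fmt"

const bucketCnt = 8

// toyBucket is a hypothetical, simplified model of the bucket layout:
// 8 tophash bytes, then all keys, then all values, then an overflow pointer.
type toyBucket struct {
	tophash  [bucketCnt]uint8 // 0 marks an empty slot in this sketch
	keys     [bucketCnt]string
	values   [bucketCnt]int
	overflow *toyBucket
}

// topHash keeps the high 8 bits of a 64-bit hash. Because 0 is reserved for
// "empty" here, a real top value of 0 is bumped to 1.
func topHash(hash uint64) uint8 {
	top := uint8(hash >> 56)
	if top == 0 {
		top = 1
	}
	return top
}

// lookup walks the bucket and its overflow chain. The one-byte tophash is
// compared first; the full key is compared only when the tophash matches.
func lookup(b *toyBucket, hash uint64, key string) (int, bool) {
	top := topHash(hash)
	for ; b != nil; b = b.overflow {
		for i := 0; i < bucketCnt; i++ {
			if b.tophash[i] != top {
				continue // cheap rejection, no key comparison needed
			}
			if b.keys[i] == key {
				return b.values[i], true
			}
		}
	}
	return 0, false
}

func main() {
	b := &toyBucket{}
	b.tophash[0], b.keys[0], b.values[0] = topHash(0xAB00000000000001), "apple", 1
	// "pear" deliberately shares the tophash of "apple", forcing a full key comparison.
	b.tophash[1], b.keys[1], b.values[1] = topHash(0xAB00000000000002), "pear", 2
	fmt.Println(lookup(b, 0xAB00000000000002, "pear")) // 2 true
	fmt.Println(lookup(b, 0xCD00000000000003, "plum")) // 0 false
}
```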

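Keys are packed together (k1k2...) rather than interleaved with values (k1v1...) because the key and value types may have very different sizes and alignments. For map[int64]int8, interleaving would need 7 bytes of padding after every int8 value to keep the next int64 key 8-byte aligned, wasting a lot of space; storing all keys first and then all values removes that padding.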
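The padding argument can be checked with unsafe.Sizeof. The two bucket types below are hypothetical stand-ins for a map[int64]int8 bucket (8 tophash bytes, 8 keys, 8 values, an overflow pointer), not the runtime's real types, and the sizes quoted assume a 64-bit platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair models an interleaved key/value slot.
type pair struct {
	key   int64
	value int8 // followed by 7 bytes of padding so the next key stays aligned
}

// interleavedBucket stores key/value/key/value/...
type interleavedBucket struct {
	tophash  [8]uint8
	pairs    [8]pair
	overflow uintptr
}

// packedBucket stores all keys first, then all values, as the runtime does.
type packedBucket struct {
	tophash  [8]uint8
	keys     [8]int64
	values   [8]int8
	overflow uintptr
}

func interleavedSize() uintptr { return unsafe.Sizeof(interleavedBucket{}) }
func packedSize() uintptr      { return unsafe.Sizeof(packedBucket{}) }

func main() {
	fmt.Println("interleaved bucket:", interleavedSize(), "bytes")
	fmt.Println("packed bucket:     ", packedSize(), "bytes")
}
```

On a 64-bit platform the interleaved layout takes 144 bytes per bucket versus 88 bytes for the packed one; the difference is exactly the 8 × 7 bytes of padding.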

Map-related operations

Map initialization

B starts at 0. If make is given a size hint greater than 8, the buckets array is allocated with newarray inside make; otherwise it is allocated lazily on the first insert. As key/value pairs are added, buckets must be reallocated, and every growth rehashes and copies the elements, so it is best to pass a suitable size hint to make when the final size of the map is known.
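One way to observe the cost of growth is to count heap allocations with testing.AllocsPerRun. The helpers fill and allocs are hypothetical; exact counts depend on the Go version and runtime, so only the comparison is meaningful.

```go
package main

import (
	"fmt"
	"testing"
)

var sink map[int]int // package-level sink so the map is heap-allocated

// fill inserts n keys into a map created with the given size hint.
func fill(n, hint int) {
	m := make(map[int]int, hint)
	for i := 0; i < n; i++ {
		m[i] = i
	}
	sink = m
}

// allocs reports the average number of heap allocations for one fill call.
func allocs(n, hint int) float64 {
	return testing.AllocsPerRun(20, func() { fill(n, hint) })
}

func main() {
	const n = 1000
	fmt.Printf("no hint:   %.0f allocs\n", allocs(n, 0))
	fmt.Printf("hint=%d: %.0f allocs\n", n, allocs(n, n))
}
```

Without a hint the map starts small and reallocates its buckets on every doubling; with the hint the bucket array is allocated up front.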

makemap also takes the parameters h and bucket, which are supplied by the compiler. If the compiler determines that the hmap header and the first bucket can be allocated on the stack, these two arguments may be non-nil.

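```go
// makemap implements Go map creation for make(map[k]v, hint).
func makemap(t *maptype, hint int64, h *hmap, bucket unsafe.Pointer) *hmap {
	// Determine the initial B. hint is the expected number of key/value pairs and
	// each bucket holds 8 of them, so the loop finds the smallest B for which
	// hint <= loadFactor*2^B (B stays 0 when hint <= bucketCnt).
	B := uint8(0)
	for ; hint > bucketCnt && float32(hint) > loadFactor*float32(uintptr(1)<<B); B++ {
	}

	// If B == 0, the buckets array is allocated lazily on the first insert.
	buckets := bucket
	if B != 0 {
		buckets = newarray(t.bucket, uintptr(1)<<B)
	}

	// Allocate the header unless the compiler already provided one on the stack.
	if h == nil {
		h = (*hmap)(newobject(t.hmap))
	}
	h.B = B
	h.buckets = buckets
	// (initialization of the remaining fields, including the hash0 seed, omitted)
	return h
}
```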
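The sizing loop in makemap can be run on its own. This sketch copies the loop into a hypothetical helper initialB, assuming bucketCnt = 8 and loadFactor = 6.5 as described in this article.

```go
package main

import "fmt"

const (
	bucketCnt  = 8   // key/value slots per bucket
	loadFactor = 6.5 // maximum average entries per bucket before growing
)

// initialB mirrors the loop in makemap: it returns the smallest B such that
// hint <= loadFactor*2^B, leaving B at 0 when hint fits in a single bucket.
func initialB(hint int64) uint8 {
	B := uint8(0)
	for ; hint > bucketCnt && float32(hint) > loadFactor*float32(uintptr(1)<<B); B++ {
	}
	return B
}

func main() {
	for _, hint := range []int64{0, 8, 9, 13, 14, 100, 1000} {
		B := initialB(hint)
		fmt.Printf("hint=%4d -> B=%d (%d buckets)\n", hint, B, 1<<B)
	}
}
```

For example, a hint of 100 gives B = 4: 6.5 × 8 = 52 is too small, while 6.5 × 16 = 104 is enough.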

Storing a value in a map

Storing follows the same steps as the lookup described in the first part. First the key's hash is computed to find its bucket. Then, within the bucket, each tophash entry is compared with the high 8 bits of the hash, and for each match the stored key is compared with the key being inserted; if the key is found, its value is updated. If the key does not exist, it is placed in the first empty tophash slot encountered during the search. If the bucket and its overflow chain are full, a new bucket is allocated for the key and hung on the overflow linked list.

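```go
func mapassign1(t *maptype, h *hmap, key unsafe.Pointer, val unsafe.Pointer) {
	alg := t.key.alg
	hash := alg.hash(key, uintptr(h.hash0))
	if h.buckets == nil {
		h.buckets = newarray(t.bucket, 1)
	}

again:
	// Find the bucket from the low B bits of the hash.
	bucket := hash & (uintptr(1)<<h.B - 1)
	if h.oldbuckets != nil {
		growWork(t, h, bucket) // a grow is in progress: migrate some old buckets first
	}
	b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
	// The high 8 bits of the hash are stored in tophash.
	top := uint8(hash >> (sys.PtrSize*8 - 8))
	if top < minTopHash {
		top += minTopHash // small values are reserved as cell-state markers
	}

	var inserti *uint8
	var insertk, insertv unsafe.Pointer
	for {
		// Walk the bucket, comparing every tophash with top, and remember
		// the first empty tophash slot as the insert position.
		for i := uintptr(0); i < bucketCnt; i++ {
			if b.tophash[i] != top {
				if b.tophash[i] == empty && inserti == nil {
					inserti = &b.tophash[i]
					insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
					insertv = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
				}
				continue
			}
			// tophash matched: locate the key by its offset in the bucket and compare.
			k2 := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
			if !alg.equal(key, k2) {
				continue
			}
			// Key already present: overwrite its value.
			v2 := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
			typedmemmove(t.elem, v2, val)
			goto done
		}
		// Move to the overflow bucket; stop at the end of the chain.
		ovf := b.overflow(t)
		if ovf == nil {
			break
		}
		b = ovf
	}

	// Did not find mapping for key. Allocate new cell & add entry.
	if float32(h.count) >= loadFactor*float32((uintptr(1)<<h.B)) && h.count >= bucketCnt {
		hashGrow(t, h)
		goto again // growing the table invalidates everything, so try again
	}

	if inserti == nil {
		// All current buckets are full; allocate a new one and chain it as overflow.
		newb := (*bmap)(newobject(t.bucket))
		h.setoverflow(t, b, newb)
		inserti = &newb.tophash[0]
		insertk = add(unsafe.Pointer(newb), dataOffset)
		insertv = add(insertk, bucketCnt*uintptr(t.keysize))
	}

	// Store the new key/value at the insert position
	// (handling of indirectly stored keys/values omitted).
	typedmemmove(t.key, insertk, key)
	typedmemmove(t.elem, insertv, val)
	*inserti = top
	h.count++

done:
	return
}
```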

Hash grow expansion and migration

When a key being stored is not found in the map, the runtime checks whether the table must grow into newly allocated heap space. Growth is triggered when count >= loadFactor × 2^B (and the map holds at least bucketCnt entries), where loadFactor is 6.5. The intuition: without overflow buckets, a table with 2^B buckets can hold 8 × 2^B key/value pairs. Once it holds 6.5 × 2^B pairs, the buckets are close to full, new values are increasingly likely to land in overflow chains, and hitprobe and missprobe rise. To keep lookups fast, the map allocates a new bucket array, rehashes the elements and copies them into the new buckets; the source code calls this evacuation (evacuate).
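The growth condition can be evaluated for small tables to see where it fires. The helper needGrow is hypothetical; it reproduces the check from mapassign1 with loadFactor = 6.5 and bucketCnt = 8.

```go
package main

import "fmt"

const (
	bucketCnt  = 8
	loadFactor = 6.5
)

// needGrow reproduces the check made before adding a new key: grow when
// count >= loadFactor*2^B, but never for maps with fewer than bucketCnt entries.
func needGrow(count int, B uint8) bool {
	return float32(count) >= loadFactor*float32(uintptr(1)<<B) && count >= bucketCnt
}

func main() {
	// For each B, find the first count at which an insert triggers growth.
	for B := uint8(0); B <= 4; B++ {
		count := 0
		for !needGrow(count, B) {
			count++
		}
		fmt.Printf("B=%d: %2d buckets, %3d slots, grows at count=%d\n",
			B, 1<<B, bucketCnt<<B, count)
	}
}
```

With B = 3, for instance, the 8 buckets offer 64 slots but growth starts at 52 entries, i.e. at about 81% occupancy.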

In the table below, %overflow is the percentage of buckets that have an overflow bucket; bytes/entry is the overhead in bytes per key/value pair; hitprobe is the number of entries checked when looking up a key that is present; missprobe is the number of entries checked when looking up a key that is absent. 6.5 was chosen as a balance among these numbers.

loadFactor  %overflow  bytes/entry  hitprobe  missprobe
      4.00       2.13        20.77      3.00       4.00
      4.50       4.05        17.30      3.25       4.50
      5.00       6.85        14.77      3.50       5.00
      5.50      10.55        12.94      3.75       5.50
      6.00      15.27        11.67      4.00       6.00
      6.50      20.90        10.79      4.25       6.50
      7.00      27.14        10.15      4.50       7.00
      7.50      34.03         9.73      4.75       7.50
      8.00      41.10         9.40      5.00       8.00

This migration is not done all at once when the table grows; it is done gradually. Each insert or delete migrates one or two old buckets (the one the current key maps to, plus one more to keep making progress); this is incremental growth. The main reason is to bound the latency of individual map operations: migrating a large map in a single step could stall the program for a noticeable time. Incremental growth essentially spreads the total cost of growing across many hash operations. Because the work is gradual, while a grow is in progress some entries are in the old table and some in the new one. Evacuated old buckets are not freed individually; they are only marked as evacuated. The old bucket array is released only after every bucket has been migrated.
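Incremental evacuation can be sketched with a toy map. Everything here is hypothetical: buckets are plain slices of hash values (no 8-slot buckets or overflow chains), and a bool slice stands in for the runtime's "evacuated" markers. The sketch shows entries splitting from old bucket i into new bucket i or i + 2^B, with at most two old buckets moved per insert.

```go
package main

import "fmt"

// toyMap is a hypothetical model of incremental growth.
type toyMap struct {
	B          uint8
	buckets    [][]uint64
	oldbuckets [][]uint64 // non-nil only while a grow is in progress
	evacuated  []bool     // evacuated[i] reports whether old bucket i was moved
	nevacuate  int        // scan position for migrating old buckets in order
	moved      int        // number of old buckets migrated so far
}

func newToyMap(B uint8) *toyMap {
	return &toyMap{B: B, buckets: make([][]uint64, 1<<B)}
}

// startGrow doubles the bucket array but does not move any entries yet.
func (m *toyMap) startGrow() {
	m.oldbuckets = m.buckets
	m.evacuated = make([]bool, len(m.oldbuckets))
	m.B++
	m.buckets = make([][]uint64, 1<<m.B)
	m.nevacuate, m.moved = 0, 0
}

// evacuate rehashes old bucket i. With one more hash bit in the mask, each
// entry lands at index i or at i+len(oldbuckets) in the new array.
func (m *toyMap) evacuate(i int) {
	if m.evacuated[i] {
		return
	}
	mask := uint64(1)<<m.B - 1
	for _, h := range m.oldbuckets[i] {
		m.buckets[h&mask] = append(m.buckets[h&mask], h)
	}
	m.oldbuckets[i], m.evacuated[i] = nil, true
	m.moved++
	if m.moved == len(m.oldbuckets) {
		m.oldbuckets, m.evacuated = nil, nil // all moved: release the old array
	}
}

// growWork migrates the old bucket the new key maps to, plus the next
// unmigrated one, so at most two old buckets move per insert.
func (m *toyMap) growWork(h uint64) {
	m.evacuate(int(h & uint64(len(m.oldbuckets)-1)))
	for m.oldbuckets != nil && m.evacuated[m.nevacuate] {
		m.nevacuate++
	}
	if m.oldbuckets != nil {
		m.evacuate(m.nevacuate)
	}
}

// insert adds hash h, doing a little migration work first if a grow is in progress.
func (m *toyMap) insert(h uint64) {
	if m.oldbuckets != nil {
		m.growWork(h)
	}
	j := h & (uint64(1)<<m.B - 1)
	m.buckets[j] = append(m.buckets[j], h)
}

func main() {
	m := newToyMap(2) // 4 buckets
	for h := uint64(0); h < 16; h++ {
		m.insert(h)
	}
	m.startGrow() // 8 buckets; nothing moved yet
	for h := uint64(16); m.oldbuckets != nil; h++ {
		m.insert(h)
		fmt.Printf("after inserting %d: %d of 4 old buckets migrated\n", h, m.moved)
	}
}
```

A real map must also consult the old bucket during lookups while a grow is in progress; the toy omits lookups to keep the migration logic visible.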
