Go HashMap memory layout and implementation

Last Update:2017-12-18 Source: Internet

Author: User



I came to study the memory layout of Go's built-in types while debugging "the impact on GC performance of frequently creating small objects of different types": GC performance with map was poor, while the struct containing a slice that I used for comparison performed well. This article summarizes the implementation of map in the Go runtime to explain the problem.

Hash table internal structure

Go's map is a hash map; its source code is in src/runtime/hashmap.go. Unlike the C++ map, which is implemented with a red-black tree, a Go map is unordered, which means its keys cannot be traversed in sorted order. Like a traditional hash map, it is built on a buckets array: every element is hashed into one bucket of the array, and buckets points to the contiguously allocated array. The B field gives the size of the hash table as a power of two, i.e. 2^B buckets. Each expansion doubles the previous size, to 2^(B+1). When a bucket fills up, a new bucket is allocated with mallocgc and linked to it through the overflow pointer, forming a linked list; this is how the hash table resolves collisions.

// A header for a Go map.
type hmap struct {
    count int    // size of the map as returned by len(), i.e. the number of kv pairs
    flags uint8
    B     uint8  // the hash table has 2^B buckets in total
    hash0 uint32 // hash seed

    buckets    unsafe.Pointer // contiguously allocated array of 2^B buckets, indexed by the low bits of the hash
    oldbuckets unsafe.Pointer
    nevacuate  uintptr
    overflow   *[2]*[]*bmap // overflow buckets, used once the initial buckets are full
}

// A bucket for a Go map.
type bmap struct {
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt values.
    // NOTE: packing all the keys together and then all the values together makes the
    // code a bit more complicated than alternating key/value/key/value/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

bmap is the data structure of a bucket. tophash is an array of size 8 (bucketCnt) that stores the high 8 bits of the hash of each of the 8 keys. When a key/value pair is looked up, inserted, updated, or deleted, the high 8 bits of the key's hash are compared first, and only then the keys themselves. According to the comment in the source, the tophash array is followed by 8 key/value pairs, each pair corresponding to one entry in tophash. Finally, the bucket holds a pointer to the next bucket in the chain. The memory layout is therefore: tophash[0..7], key0..key7, value0..value7, overflow pointer.

The keys are stored together (k1 k2 ...) rather than interleaved with the values (k1 v1 k2 v2 ...) because the key and value types can differ greatly in size, as in map[int64]int8. With byte alignment taken into account, storing each key next to its value would waste a lot of space on padding.
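The padding argument can be checked with unsafe.Sizeof. The sketch below is only an illustration: the struct names interleaved and packed are made up for this example and are not runtime types, and the exact sizes depend on the platform's alignment rules.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair is one key/value cell of a hypothetical interleaved layout
// for map[int64]int8. The int8 is followed by padding so that the
// next int64 key stays aligned.
type pair struct {
	k int64
	v int8
}

// interleaved stores key/value/key/value/...
type interleaved struct {
	tophash [8]uint8
	pairs   [8]pair
}

// packed stores all keys, then all values, as bmap does.
type packed struct {
	tophash [8]uint8
	keys    [8]int64
	values  [8]int8
}

// layoutSizes returns the size in bytes of both layouts.
func layoutSizes() (interleavedSize, packedSize uintptr) {
	return unsafe.Sizeof(interleaved{}), unsafe.Sizeof(packed{})
}

func main() {
	i, p := layoutSizes()
	fmt.Printf("interleaved: %d bytes, packed: %d bytes\n", i, p)
}
```

On a 64-bit platform such as amd64 this typically reports 136 bytes for the interleaved layout against 80 bytes for the packed one, because each int8 value in the interleaved layout drags 7 bytes of padding along with it.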

Map-related operations

Map initialization

B starts at 0. If make is given a size hint and hint is greater than 8, the buckets are allocated with newarray at make time; otherwise they are allocated lazily on the first insertion. As the number of key/value pairs in the map grows, the buckets have to be reallocated, and each reallocation rehashes and copies the elements. It is therefore best to give the map a suitable size when it is created.

makemap takes two parameters, h and bucket, that are supplied by the compiler. If the compiler decides that the hmap struct and the first bucket can be created on the stack, these two arguments may be non-nil.

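// makemap implements Go map creation for make(map[k]v, hint).
func makemap(t *maptype, hint int64, h *hmap, bucket unsafe.Pointer) *hmap {
    // Determine the initial value of B. hint is the number of kv pairs,
    // and each bucket can hold 8 kv pairs, so the loop finds the smallest B
    // that satisfies hint <= loadFactor*(2^B).
    B := uint8(0)
    for ; hint > bucketCnt && float32(hint) > loadFactor*float32(uintptr(1)<<B); B++ {
    }

    // If B == 0, the buckets are allocated lazily on the first insert.
    buckets := bucket
    if B != 0 {
        buckets = newarray(t.bucket, uintptr(1)<<B)
    }

    if h == nil {
        h = (*hmap)(newobject(t.hmap))
    }
    h.B = B
    h.buckets = buckets
    return h
}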
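To see which B a given hint leads to, the sizing loop can be lifted out of the runtime and run on its own. This is a minimal sketch: initialB is a hypothetical helper name, and bucketCnt = 8 and loadFactor = 6.5 are taken from the values quoted in this article.

```go
package main

import "fmt"

const (
	bucketCnt  = 8   // kv pairs per bucket
	loadFactor = 6.5 // average pairs per bucket that triggers growth
)

// initialB repeats the sizing loop of makemap: it returns the smallest B
// such that hint <= loadFactor * 2^B, or 0 when hint fits in one bucket.
func initialB(hint int64) uint8 {
	B := uint8(0)
	for ; hint > bucketCnt && float32(hint) > loadFactor*float32(uintptr(1)<<B); B++ {
	}
	return B
}

func main() {
	for _, hint := range []int64{0, 8, 9, 14, 100, 1000} {
		B := initialB(hint)
		fmt.Printf("hint=%d -> B=%d (%d buckets)\n", hint, B, 1<<B)
	}
}
```

For example, a hint of 100 gives B = 4, i.e. 16 buckets, since 6.5 * 8 = 52 is too small and 6.5 * 16 = 104 is enough.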

Storing a value in a map

The store follows the structure described in the first part. First, the hash of the key selects a bucket. Then, within that bucket, each tophash entry is compared with the high 8 bits of the hash, and where they match, the stored key is compared with the key being inserted; if the key is found, its value is updated. If the key does not exist, it goes into the first empty tophash slot encountered during the lookup. If the current bucket and its overflow chain are full, a new bucket is allocated for the key and linked onto the overflow list.

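// Abridged from the runtime source.
func mapassign1(t *maptype, h *hmap, key unsafe.Pointer, val unsafe.Pointer) {
    alg := t.key.alg
    hash := alg.hash(key, uintptr(h.hash0))
    if h.buckets == nil {
        h.buckets = newarray(t.bucket, 1)
    }

again:
    // Find the bucket from the low B bits of the hash.
    bucket := hash & (uintptr(1)<<h.B - 1)
    b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))
    // top is the high 8 bits of the hash.
    top := uint8(hash >> (sys.PtrSize*8 - 8))

    var inserti *uint8
    var insertk, insertv unsafe.Pointer
    for {
        // Walk the bucket, comparing every tophash with top.
        // The first empty tophash slot found is remembered as the insert position.
        for i := uintptr(0); i < bucketCnt; i++ {
            if b.tophash[i] != top {
                if b.tophash[i] == empty && inserti == nil {
                    inserti = &b.tophash[i]
                    insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
                    insertv = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
                }
                continue
            }
            // The key that belongs to this tophash is located by its offset within the bucket.
            k2 := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))
            if !alg.equal(key, k2) {
                continue
            }
            // Found the key: overwrite its value.
            v2 := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.valuesize))
            typedmemmove(t.elem, v2, val)
            goto done
        }
        // Move on to the overflow bucket; stop if there is none.
        ovf := b.overflow(t)
        if ovf == nil {
            break
        }
        b = ovf
    }

    // Did not find mapping for key. Allocate new cell & add entry.
    if float32(h.count) >= loadFactor*float32((uintptr(1)<<h.B)) && h.count >= bucketCnt {
        hashGrow(t, h)
        goto again // Growing the table invalidates everything, so try again
    }

    // All current buckets are full, allocate a new one.
    if inserti == nil {
        newb := (*bmap)(newobject(t.bucket))
        h.setoverflow(t, b, newb)
        inserti = &newb.tophash[0]
        insertk = add(unsafe.Pointer(newb), dataOffset)
        insertv = add(insertk, bucketCnt*uintptr(t.keysize))
    }

    // Store new key/value at insert position.
    // Keys and values that are stored indirectly get their own allocation first.
    if t.indirectkey {
        kmem := newobject(t.key)
        *(*unsafe.Pointer)(insertk) = kmem
        insertk = kmem
    }
    if t.indirectvalue {
        vmem := newobject(t.elem)
        *(*unsafe.Pointer)(insertv) = vmem
        insertv = vmem
    }
    typedmemmove(t.key, insertk, key)
    typedmemmove(t.elem, insertv, val)
    *inserti = top
    h.count++

done:
    // ...
}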

Hash grow expansion and migration

When a new key is stored in the map, the runtime checks whether the table has to grow into newly allocated heap space. It grows when count >= loadFactor * 2^B (and count >= bucketCnt); loadFactor is 6.5. The physical meaning of this threshold is as follows: without overflow buckets, the hash map can hold 8 * 2^B kv pairs in total, so once it holds 6.5 * 2^B pairs it is close to overflowing, and a store is likely to land in an overflow list, which increases hitprobe and missprobe. To keep reads and lookups fast, the hash map rehashes the elements into newly allocated buckets and copies them over; the source code calls this evacuate.
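The growth condition is the one that appears in mapassign1. As a rough illustration, it can be evaluated for a few table sizes; needsGrow is a hypothetical helper name, and the constants are the values quoted in this article.

```go
package main

import "fmt"

const (
	bucketCnt  = 8
	loadFactor = 6.5
)

// needsGrow reports whether a table with 2^B buckets that already holds
// count pairs has reached the growth threshold.
func needsGrow(count int, B uint8) bool {
	return float32(count) >= loadFactor*float32(uintptr(1)<<B) && count >= bucketCnt
}

func main() {
	for B := uint8(0); B <= 4; B++ {
		slots := bucketCnt << B // capacity without any overflow bucket
		threshold := 0
		for !needsGrow(threshold, B) {
			threshold++
		}
		fmt.Printf("B=%d: %d slots, grows once count reaches %d\n", B, slots, threshold)
	}
}
```

For B = 0 the extra count >= bucketCnt condition matters: a single bucket is allowed to fill all 8 slots before the table grows, even though 6.5 * 1 is already exceeded at 7 entries.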

%overflow is the percentage of buckets that have an overflow bucket. bytes/entry is the number of overhead bytes used per kv pair. hitprobe is the average number of entries checked when looking up an existing key. missprobe is the average number of entries checked when looking up a nonexistent key. The value 6.5 was chosen to balance these figures.

Loadfactor	%overflow	Bytes/entry	Hitprobe	Missprobe
4.00	2.13	20.77	3.00	4.00
4.50	4.05	17.30	3.25	4.50
5.00	6.85	14.77	3.50	5.00
5.50	10.55	12.94	3.75	5.50
6.00	15.27	11.67	4.00	6.00
6.50	20.90	10.79	4.25	6.50
7.00	27.14	10.15	4.50	7.00
7.50	34.03	9.73	4.75	7.50
8.00	41.10	9.40	5.00	8.00

However, the migration is not completed in one step when the table grows. It is done gradually: each insert or remove migrates one or two buckets. This is incremental growth. Its main purpose is to shorten the response time of the map: if the hash map is large, growing it in one step could easily stall the program. Incremental growth essentially spreads the total cost of growing over the individual hash operations. Because the work is done gradually, part of the data is in the old table and part of it is in the new table. Old buckets that have been migrated are not deleted, only marked as evacuated. The old table is released only when all of its buckets have been migrated.
