Let me start with a question. In Go, when we want to implement a set, we can use a map. A map supports the "comma ok" idiom for checking whether a key exists, for example _, ok := m["key"]: if the key has no corresponding value, ok is false. That is enough to implement a set. Sometimes we choose map[string]bool to define the set, but thanks to the "comma ok" syntax it can also be defined as map[string]struct{}, so that the values no longer take up memory.
The latter expresses exactly two states, present or absent, whereas the former actually has three: true, false, or absent.
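To make the two-state versus three-state difference concrete, here is a minimal sketch. The helper names lookupBool and lookupStruct are made up for illustration and are not part of any library.

```go
package main

import "fmt"

// lookupBool (hypothetical helper) reports the three possible outcomes
// of a lookup in a map[string]bool used as a set.
func lookupBool(set map[string]bool, key string) string {
	v, ok := set[key]
	switch {
	case !ok:
		return "absent"
	case v:
		return "present, true"
	default:
		return "present, false"
	}
}

// lookupStruct (hypothetical helper) reports the two possible outcomes
// of a lookup in a map[string]struct{} used as a set.
func lookupStruct(set map[string]struct{}, key string) string {
	if _, ok := set[key]; ok {
		return "present"
	}
	return "absent"
}

func main() {
	boolSet := map[string]bool{"a": true, "b": false}
	structSet := map[string]struct{}{"a": {}}

	for _, k := range []string{"a", "b", "c"} {
		fmt.Printf("%s: bool=%q struct=%q\n", k, lookupBool(boolSet, k), lookupStruct(structSet, k))
	}
}
```

With map[string]bool, the key "b" is stored but maps to false, a state that map[string]struct{} cannot represent.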
We often choose map[string]struct{} to implement a set, but is it really worth it?
To answer that, let's look at how a map works. A map is implemented as a hash table. The table is described by two structures, hmap and bmap: the former represents the map itself, and the latter represents a bucket in the map's hash table.
Everything starts with the implementation of map.
A map is made up of an array of buckets, each of which is represented as follows.
    // A bucket for a Go map.
    type bmap struct {
        tophash [bucketCnt]uint8
        // bucketCnt is a fixed value of 8: each bucket holds 8 key-value pairs.
        // The 8 keys come first, followed by the 8 values.
        // Last comes an overflow pointer to the chained bucket.
    }
The hmap, in turn, is really just header information.
    // A header for a Go map.
    type hmap struct {
        flags      uint8          // some flags
        B          uint8          // log_2 of the number of buckets
        hash0      uint32         // hash seed
        buckets    unsafe.Pointer // pointer to the buckets array
        oldbuckets unsafe.Pointer // pointer to the array being replaced during growth
        nevacuate  uintptr        // number of buckets evacuated so far (during growth, buckets move from oldbuckets into buckets)
        overflow   *[2]*[]*bmap   // pointers to the chained (overflow) buckets
    }
The bmap struct is defined in a C-like way: it actually has more members than are declared, but because its memory is allocated dynamically (by the runtime's own malloc), they do not appear in the definition.
A bmap has an 8-byte tophash array that is used to locate the matching entry within the bucket. Each tophash element corresponds to one key-value pair and holds the high byte of the hash of that key.
The bucket itself is located using the low-order bits of the hash. Each type in Go has its own hash function.
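The split between the low and high parts of the hash can be sketched as follows. This is a simplified illustration, assuming a 64-bit hash; the function names and the sample hash value are invented for the example, and the real runtime applies further adjustments that are omitted here.

```go
package main

import "fmt"

// bucketIndex (hypothetical helper) selects a bucket from the
// low-order B bits of the hash, where 2^B is the number of buckets.
func bucketIndex(hash uint64, B uint8) uint64 {
	return hash & (uint64(1)<<B - 1)
}

// topHash (hypothetical helper) returns the high byte of a 64-bit hash,
// which is what a tophash slot would store for the key.
func topHash(hash uint64) uint8 {
	return uint8(hash >> 56)
}

func main() {
	const hash = 0xAB00000000000013 // made-up hash value for illustration

	fmt.Println(bucketIndex(hash, 4)) // low 4 bits of 0x13
	fmt.Println(topHash(hash))        // high byte 0xAB
}
```

Comparing the one-byte tophash first lets a lookup skip most non-matching slots without comparing full keys.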
To avoid alignment padding, the 8 keys are laid out first, followed by the 8 values. For example, in a map[int8]int64, storing each key next to its value would leave 7 bytes of padding after every key, which is very wasteful.
With the 8 int8 keys stored together first, there is no alignment problem. The last member of the structure is the overflow pointer, which points to the next bucket in the chain.
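The padding cost can be checked with unsafe.Sizeof. The sketch below assumes a 64-bit platform, where int64 is 8-byte aligned; the type and function names are made up for the comparison and do not come from the runtime.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair interleaves one key with one value, as a naive layout would.
type pair struct {
	k int8
	v int64
}

// interleaved stores 8 key-value pairs side by side.
type interleaved struct {
	pairs [8]pair
}

// grouped stores the 8 keys first and then the 8 values,
// mirroring the ordering described above.
type grouped struct {
	keys   [8]int8
	values [8]int64
}

// interleavedSize and groupedSize are hypothetical helpers that
// return the size in bytes of each layout.
func interleavedSize() uintptr { return unsafe.Sizeof(interleaved{}) }
func groupedSize() uintptr     { return unsafe.Sizeof(grouped{}) }

func main() {
	fmt.Println("interleaved:", interleavedSize())
	fmt.Println("grouped:    ", groupedSize())
}
```

On a 64-bit platform each pair occupies 16 bytes because of the 7 bytes of padding, so the interleaved layout needs 128 bytes against 72 for the grouped one.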
Overall, an hmap amounts to an array of bmap plus some management information.
[Figure: memory layout of the map]
The growth of an hmap depends on the load factor. In Go the load factor (loadFactor) is 6.5, a value chosen empirically through testing.
It means that each bucket holds 6.5 entries on average, while, as mentioned earlier, each bucket has room for 8. In other words, buckets are generally not completely full.
If the load factor is high, that is, buckets are kept as full as possible, the hash collision rate becomes high (many keys hash into a small space), which produces too many overflow buckets.
If the load factor is low, the collision rate is low but the table takes up a lot of space, so real utilization (data stored / total bucket space) is relatively poor.
So 6.5 is a reasonable compromise overall, and it is the value Go currently uses.
This can be seen in the key code that determines the initial size:
    for ; hint > bucketCnt && float32(hint) > loadFactor*float32(uintptr(1)<<B); B++ {
    }
2^B is the number of buckets, hint is the requested size of the map, and bucketCnt is 8. Because one bucket is pre-allocated, B does not increase if hint fits within a single bucket.
The point is to keep increasing B until load factor * number of buckets is at least hint, in other words until an average of 6.5 key-value pairs per bucket is enough to hold hint key-value pairs.
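The loop can be wrapped in a small function to see which B it produces for a given hint. This is only a sketch: the function name initialB is invented, and the constants simply restate the values given in the text.

```go
package main

import "fmt"

const (
	bucketCnt  = 8   // entries per bucket, as stated above
	loadFactor = 6.5 // average entries per bucket, as stated above
)

// initialB (hypothetical name) runs the sizing loop and returns the
// smallest B for which loadFactor * 2^B is at least hint. For hints
// that fit in one bucket it returns 0.
func initialB(hint int) uint8 {
	B := uint8(0)
	for ; hint > bucketCnt && float32(hint) > loadFactor*float32(uintptr(1)<<B); B++ {
	}
	return B
}

func main() {
	for _, hint := range []int{8, 9, 14, 100, 1000} {
		B := initialB(hint)
		fmt.Printf("hint=%d B=%d buckets=%d\n", hint, B, 1<<B)
	}
}
```

For example, a hint of 100 needs 16 buckets (B = 4), because 6.5 * 8 = 52 is too small while 6.5 * 16 = 104 is enough.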
The above covers the initial, static allocation. During dynamic growth, oldbuckets is half the size of buckets, that is, the table doubles.
When an hmap grows, it turns buckets into oldbuckets and then allocates a new buckets array. Apart from this evacuation, key-value pairs are not moved to other buckets.
This keeps iteration consistent. When a map is traversed with range, one bucket in the bucket array is visited, followed by the overflow buckets chained to it.
Then the next element of the bucket array is traversed, and so on in turn.
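The traversal order can be illustrated with a toy model. The bucket type and walk function below are simplified stand-ins invented for this example, not the runtime's code; the real runtime also randomizes where iteration starts, which this sketch ignores.

```go
package main

import "fmt"

// bucket is a toy stand-in for bmap: a few keys plus an overflow link.
type bucket struct {
	keys     []string
	overflow *bucket
}

// walk (hypothetical helper) visits each slot of the bucket array in
// turn, following the overflow chain of a slot before moving on to the
// next slot.
func walk(buckets []*bucket) []string {
	var order []string
	for _, b := range buckets {
		for ; b != nil; b = b.overflow {
			order = append(order, b.keys...)
		}
	}
	return order
}

func main() {
	buckets := []*bucket{
		{keys: []string{"a", "b"}, overflow: &bucket{keys: []string{"c"}}},
		{keys: []string{"d"}},
	}
	fmt.Println(walk(buckets)) // [a b c d]
}
```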
Deletion is very simple: the corresponding key and value are just cleared.
Now we can do some arithmetic based on the map's implementation. Suppose our map is defined as map[string]struct{}.
On a 64-bit system, the size of one bucket is 8 + 8*16 + 8 = 144 bytes (a string value has a fixed size: one pointer and one length, 16 bytes in total).
For map[string]bool, the size of one bucket is 8 + 8*16 + 8*1 + 8 = 152 bytes.
The saving is therefore only about 5.2% (8 of 152 bytes). Given that the load factor of 6.5 means buckets are only 81.25% full on average, saving 8 bytes per bucket is hardly worth the trouble.
Rather than giving up clear semantics with a trick that saves only these few bytes, it is more straightforward to define a map whose meaning is obvious.
So my conclusion is that map[string]struct{} is not advisable.