Folly::AtomicHashMap Source Analysis (II)

Tags: rehash

This article is original; when reprinting, please credit: http://www.cnblogs.com/gistao/background

The previous article analyzed the source code in detail, but it did not cover the design ideas behind the source. Design ideas are often the most important part: without them it is basically impossible to optimize the structure as a whole, or even to use it correctly. Reasoning backwards from the result to the reasons is hard, and it is easy to get things only partly right, so here I stumble through my own understanding and also point out the 'problems' I see in the source.

Simplicity

Debugging a multithreaded program is a headache, and writing a correct multithreaded data structure with atomics is even harder. When something goes wrong it is usually not a problem you can reproduce on demand; you end up waiting for it to happen again and then reading logs. So simplicity should come first in the design.

AtomicHashMap only supports int keys. Why doesn't it support a custom key type, the way TBB's concurrent_hash_map does? It would be entirely possible to treat the existing int key as a pure state machine and add a field to store the custom key. I think the answer is simplicity: the user can solve this by hashing the custom key down to an int themselves. That also saves the space a pointer would occupy, and it keeps the structure simple.
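To make that concrete, here is a minimal sketch of the intended usage, assuming Folly's public AtomicHashMap interface; the key string, the choice of std::hash, and the size estimate are my own illustration, not from the article.

    #include <folly/AtomicHashMap.h>

    #include <cstdint>
    #include <functional>
    #include <string>

    int main() {
      // AtomicHashMap only takes an integral key; the caller supplies the hashing.
      folly::AtomicHashMap<int64_t, int64_t> map(100000 /* size estimate */);

      std::string userKey = "some-custom-key";
      // Any decent 64-bit hash works; std::hash is used here purely for illustration.
      // Real code must also avoid the key values the map reserves as sentinels.
      auto k = static_cast<int64_t>(std::hash<std::string>{}(userKey));

      map.insert(std::make_pair(k, int64_t{42}));
      auto it = map.find(k);
      return (it != map.end() && it->second == 42) ? 0 : 1;
    }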

The 80/20 Principle

The 80/20 principle here means that 80% of the CPU time executes 20% of the code. Rehashing is an essential feature of a hash map, but it clearly does not fall within that 20% of code. So AtomicHashMap does not support traditional rehashing: partly because atomics limit what can be done, and partly because rehashing is complex enough and slow enough that it is not worth it. The Facebook engineers chose to make the 80% of CPU time fast enough and to accept that the remaining 20% runs a little slower.

AtomicHashMap's 'rehash' is similar to a deque's growth strategy: when a map fills up, another fixed-size submap is appended rather than rehashing the existing one. Two conclusions follow (see the sketch after this list):

    • When capacity is not full, which is the ~80% probability event, operations are still O(1)
    • When capacity is full, which is the ~20% probability event, operations become O(2), O(3), O(4), ... as more submaps are appended
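A rough single-threaded model of what this growth strategy means for lookup; this is my own simplification, not Folly's code (an unordered_map stands in for the real lock-free probing array), but the shape of the cost is the point: a full map gains another fixed-capacity submap, and find() probes the submaps in order.

    #include <cstddef>
    #include <cstdint>
    #include <memory>
    #include <optional>
    #include <unordered_map>
    #include <vector>

    // Each SubMap is a fixed-capacity table that never rehashes.
    struct SubMap {
      explicit SubMap(std::size_t cap) : cap_(cap) {}
      bool insert(std::int64_t k, std::int64_t v) {
        if (data_.size() >= cap_) return false;   // full: the caller must grow
        data_.emplace(k, v);
        return true;
      }
      std::optional<std::int64_t> find(std::int64_t k) const {
        auto it = data_.find(k);
        if (it == data_.end()) return std::nullopt;
        return it->second;
      }
     private:
      std::size_t cap_;
      std::unordered_map<std::int64_t, std::int64_t> data_;
    };

    struct GrowOnlyMap {
      explicit GrowOnlyMap(std::size_t cap) : cap_(cap) {
        subMaps_.push_back(std::make_unique<SubMap>(cap_));
      }
      void insert(std::int64_t k, std::int64_t v) {
        if (!subMaps_.back()->insert(k, v)) {
          subMaps_.push_back(std::make_unique<SubMap>(cap_));  // "rehash" = append a submap
          subMaps_.back()->insert(k, v);
        }
      }
      std::optional<std::int64_t> find(std::int64_t k) const {
        // One lookup per submap: O(1) with one submap, O(k) with k submaps.
        for (auto const& m : subMaps_) {
          if (auto v = m->find(k)) return v;
        }
        return std::nullopt;
      }
     private:
      std::size_t cap_;
      std::vector<std::unique_ptr<SubMap>> subMaps_;
    };

Folly does all of this with atomics and keeps the first submap large; only the lookup-cost shape is modelled here.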

Simplicity + the 80/20 Principle

AtomicHashMap resolves collisions with linear probing, which means one key's collision affects the insertion of other keys, a problem that chaining does not have. Why did Facebook's engineers choose it anyway? I think, first, a collision is itself roughly a 20% probability event, so the cost is acceptable. Second, a chain is effectively a multi-producer, multi-consumer queue; look at Boost's lock-free implementation of such a queue and it is far more complex than linear probing. And linear probing has the advantage of good locality.
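A minimal single-threaded sketch of linear probing follows; this is my own illustration (Folly's version does the equivalent with atomic compare-and-swap on the key slot). A collision pushes the key into the following slot, which is exactly how one key's collision disturbs its neighbours, but every probe stays within one contiguous, cache-friendly stretch of the array.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Single-threaded linear-probing table, for illustration only.
    class ProbingTable {
     public:
      explicit ProbingTable(std::size_t capacity) : slots_(capacity) {}

      bool insert(std::int64_t key, std::int64_t value) {
        std::size_t start = static_cast<std::size_t>(key) % slots_.size();
        for (std::size_t n = 0; n < slots_.size(); ++n) {
          Slot& s = slots_[(start + n) % slots_.size()];
          if (s.key == kEmpty) { s.key = key; s.value = value; return true; }  // claim an empty slot
          if (s.key == key)    { s.value = value; return true; }               // key already present
          // Slot held by a different key: this key is pushed one slot onward.
        }
        return false;  // every slot occupied
      }

     private:
      // kEmpty is a sentinel key value that real keys are assumed never to use.
      static constexpr std::int64_t kEmpty = -1;
      struct Slot { std::int64_t key = kEmpty; std::int64_t value = 0; };
      std::vector<Slot> slots_;
    };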

O(1) Turning into O(N)

Consider the following scenario.

Step 1: 100 concurrent threads each perform random insertions into an AtomicHashMap whose size is set to 10w (here 1w = 10,000, so 10w = 100,000); 14w entries are inserted in total.

Step 2: 2 concurrent threads query keys that do not exist in the map.

Step 3: CPU idle drops to 0.
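The scenario could be driven roughly like this; this is my own reconstruction, assuming Folly's AtomicHashMap API, and the per-thread insert count is chosen only so the totals match the numbers above.

    #include <folly/AtomicHashMap.h>

    #include <atomic>
    #include <chrono>
    #include <cstdint>
    #include <random>
    #include <thread>
    #include <vector>

    int main() {
      folly::AtomicHashMap<int64_t, int64_t> map(100000);   // size set to 10w

      // Step 1: 100 writer threads doing random inserts, 14w entries in total.
      std::vector<std::thread> writers;
      for (int t = 0; t < 100; ++t) {
        writers.emplace_back([&map, t] {
          std::mt19937_64 rng(t);
          for (int i = 0; i < 1400; ++i) {                  // 100 * 1400 = 14w
            auto k = static_cast<int64_t>(rng() >> 1);      // keep keys non-negative
            map.insert(std::make_pair(k, k));
          }
        });
      }
      for (auto& w : writers) w.join();

      // Step 2: two reader threads querying a key that does not exist
      // (negative, so never inserted above and not a reserved sentinel).
      std::atomic<bool> stop{false};
      std::vector<std::thread> readers;
      for (int t = 0; t < 2; ++t) {
        readers.emplace_back([&map, &stop] {
          while (!stop.load(std::memory_order_relaxed)) {
            map.find(-12345);
          }
        });
      }

      // Step 3: observe CPU usage during this window, then shut down.
      std::this_thread::sleep_for(std::chrono::seconds(10));
      stop.store(true);
      for (auto& r : readers) r.join();
      return 0;
    }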

How to Solve It

From the source analysis in the previous article, we can identify three suspects (a sketch of the probe loop follows this list):

    • The queried key collides with existing keys. In the best case the probe visits one or a few slots (depending on the hash algorithm and the size) before it either finds the element or hits an empty slot and the lookup ends. In the worst case, where every slot in the map is occupied, the lookup traverses the whole table; that is unlikely, of course, and depends on the fill factor and on how concurrently insertions happen.
    • The queried key was never inserted. In the best case the lookup ends at the first empty slot it meets; in the worst case it traverses the whole table.
    • The queried key has been erased. Same as the previous case.
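Here is a sketch of the probe loop behind these three cases; it is my simplification rather than Folly's findInternal, and kEmpty / kErased stand in for the map's reserved sentinel key values.

    #include <cstddef>
    #include <cstdint>
    #include <optional>
    #include <vector>

    constexpr std::int64_t kEmpty  = -1;
    constexpr std::int64_t kErased = -3;

    struct Slot { std::int64_t key = kEmpty; std::int64_t value = 0; };

    std::optional<std::int64_t> probeFind(const std::vector<Slot>& slots, std::int64_t key) {
      std::size_t start = static_cast<std::size_t>(key) % slots.size();
      for (std::size_t n = 0; n < slots.size(); ++n) {
        const Slot& s = slots[(start + n) % slots.size()];
        if (s.key == key)    return s.value;       // found, with or without collisions on the way
        if (s.key == kEmpty) return std::nullopt;  // never inserted: the first empty slot ends it
        // Erased slot or a different key: keep probing. If no empty slot remains
        // anywhere, a miss degenerates into a scan of the whole table.
      }
      return std::nullopt;  // traversed everything without meeting the key or an empty slot
    }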

The conclusion is that the distribution of elements across the slot array matters a great deal. There are three possible distributions:

    • All slots are used; there are no empty slots
    • The used slots are clustered together, and correspondingly the empty slots are clustered together
    • Used slots and empty slots are interleaved / evenly distributed

The latter two distributions depend mainly on the hash algorithm; a good hash algorithm guarantees that a change in any single input bit is reflected in the output.

The hash algorithm used in the test scenario is the common MurmurHash, whose mixing quality is quite good, and in actual use these two cases do not occur.
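For reference, this is the 64-bit finalizer used by MurmurHash3 (reproduced from memory, so treat the exact constants as illustrative): a few xor-shift and multiply rounds so that flipping any single input bit perturbs the whole output word.

    #include <cstdint>

    // MurmurHash3-style 64-bit finalizer ("fmix64").
    std::uint64_t fmix64(std::uint64_t k) {
      k ^= k >> 33;
      k *= 0xff51afd7ed558ccdULL;
      k ^= k >> 33;
      k *= 0xc4ceb9fe1a85ec53ULL;
      k ^= k >> 33;
      return k;
    }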

So the only remaining possibility is the first distribution: no empty slots.

    insertInternal(KeyT key_in, T&& value) {
      ...
      if (isFull_.load(std::memory_order_acquire))
        return false;                  // full: no further inserts into this map
      ++numEntries_;                   // count the insertion
      if (numEntries_.readFast() >= maxEntries_) {
        isFull_.store(true, std::memory_order_relaxed);   // mark the map full
      }
      ...
    }

This is AtomicHashMap's insertion logic: once the map is marked full, further inserts are not allowed. With that guard in place, shouldn't it be impossible to run out of empty slots? Not necessarily.

maxEntries_ is 10w and the fill factor is 0.8, so the slot capacity is 12.5w (10w / 0.8) and the number of spare, empty slots is 2.5w (12.5w - 10w).

So why can the number of empty slots still drop to 0?

numEntries_ is a ThreadCachedInt. The general idea of this class is that you configure a cacheSize: every thread that touches this numEntries_ object accumulates its increments in a thread-local counter, and those increments are not synchronized into the shared value (the one readFast can see) until the local counter reaches cacheSize. For example, if cacheSize is configured as 1000 and there are 100 threads, the value returned by readFast can in theory lag behind the true count by as much as 100 * 1000 = 10w.

That 10w of possible lag is much larger than the 2.5w of spare slots, which means the map can in fact be completely full (zero empty slots) while isFull_ is still false.
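A stripped-down model of such a counter (my own sketch, not Folly's ThreadCachedInt) makes the lag easy to see: each thread batches up to cacheSize increments locally before flushing them into the shared total, so a readFast-style read can trail the true count by up to numThreads * cacheSize.

    #include <atomic>
    #include <cstdint>

    // Simplified thread-cached counter; assumes a single instance to keep it short.
    class CachedCounter {
     public:
      explicit CachedCounter(std::uint32_t cacheSize) : cacheSize_(cacheSize) {}

      // Each thread accumulates locally and flushes only every cacheSize_-th
      // increment, so the shared total can lag by up to cacheSize_ per thread.
      void increment() {
        thread_local std::uint32_t local = 0;
        if (++local >= cacheSize_) {
          global_.fetch_add(local, std::memory_order_relaxed);
          local = 0;
        }
      }

      // Cheap read of the shared total; with T threads the true count can exceed
      // this by up to T * cacheSize_ (e.g. 100 * 1000 = 10w in the text above).
      std::int64_t readFast() const { return global_.load(std::memory_order_relaxed); }

     private:
      std::uint32_t cacheSize_;
      std::atomic<std::int64_t> global_{0};
    };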

Faced with this problem, the solution is to increase the capacity.
