Re: http://blog.yufeng.info/archives/tag/align (retrieved via the Google web cache)
To avoid locks in a multithreaded program, we usually adopt the following data structure: allocate an array sized by the number of threads and give each thread its own item, so that threads never conflict with one another. Logically this design is impeccable, yet in practice it often fails to deliver the expected speedup. The problem lies in the CPU's cache lines. Data is read from main memory into L1 and L2 in units of cache lines (usually 64 bytes), not as single items. Each core has its own L1 and L2, so when a thread reads its own item it also pulls in the neighboring items that belong to other threads. When any of those items is then updated, the cores must synchronize their caches to keep the data consistent, even though no item is logically shared.
This causes serious performance problems. It is the so-called false sharing problem; if you are interested, Wikipedia has more detail.
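The layout problem described above can be made concrete with a small C sketch. It assumes a 64-byte cache line and a hypothetical thread count of four; the names `counters`, `cache_line_of` and `lines_spanned` are invented for illustration and do not come from the original article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumption: typical on x86-64, not guaranteed */
#define NUM_THREADS 4       /* hypothetical thread count */

/* One slot per thread, packed back to back: the lock-free layout from the text. */
static uint64_t counters[NUM_THREADS];

/* Index of the cache line that an address falls in. */
static uintptr_t cache_line_of(const void *p)
{
    return (uintptr_t)p / CACHE_LINE_SIZE;
}

/* Number of distinct cache lines covered by `bytes` bytes starting at `base`. */
static size_t lines_spanned(const void *base, size_t bytes)
{
    assert(bytes > 0);
    const char *last = (const char *)base + bytes - 1;
    return (size_t)(cache_line_of(last) - cache_line_of(base)) + 1;
}
```

Calling `lines_spanned(counters, sizeof(counters))` shows the issue: the four 8-byte slots occupy only 32 bytes, so they cover at most two cache lines. At least two threads therefore write to the same line, and each write invalidates that line in the other cores' caches.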
Reference article:
http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads/
The solution is simple:
Pad each item out to the length of a cache line, so that items are isolated from one another.
    typedef union {
        erts_smp_rwmtx_t rwctx;
        byte cache_line_align_[ERTS_ALC_CACHE_LINE_ALIGN_SIZE(sizeof(erts_smp_rwmtx_t))];
    } erts_meta_main_tab_lock_t;

or

    __declspec(align(64)) int thread1_global_variable;
    __declspec(align(64)) int thread2_global_variable;
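The same padding idea can be sketched in portable C11 for a per-thread counter array. This is only an illustration under stated assumptions: a 64-byte cache line, a hypothetical thread count of four, and invented names (`CACHE_LINE_ALIGN_SIZE`, `padded_counter_t`, `counter_add`, `counter_total`). The rounding macro is a stand-in written for this example, not the Erlang definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumption: typical on x86-64, not guaranteed */
#define NUM_THREADS 4       /* hypothetical thread count */

/* Round sz up to the next multiple of the cache-line size (illustrative macro). */
#define CACHE_LINE_ALIGN_SIZE(sz) \
    ((((sz) - 1) / CACHE_LINE_SIZE + 1) * CACHE_LINE_SIZE)

/* The union is as large as its biggest member, so the byte array pads each slot. */
typedef union {
    uint64_t value;
    unsigned char cache_line_align_[CACHE_LINE_ALIGN_SIZE(sizeof(uint64_t))];
} padded_counter_t;

static_assert(sizeof(padded_counter_t) % CACHE_LINE_SIZE == 0,
              "each slot must fill whole cache lines");

/* Align the array itself so that slot 0 starts on a cache-line boundary. */
static _Alignas(CACHE_LINE_SIZE) padded_counter_t counters[NUM_THREADS];

/* Each thread updates only its own slot; tid must be < NUM_THREADS. */
static void counter_add(unsigned tid, uint64_t n)
{
    counters[tid].value += n;
}

/* Sum all slots; intended to run after the worker threads have finished. */
static uint64_t counter_total(void)
{
    uint64_t sum = 0;
    for (unsigned i = 0; i < NUM_THREADS; i++)
        sum += counters[i].value;
    return sum;
}
```

Padding alone fixes the size of each slot but not where the array starts, which is why the sketch also aligns the array. The cost is memory: each 8-byte counter now occupies a full 64 bytes.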
This is why cache_line_align shows up everywhere in high-performance server code: its purpose is to avoid cache thrashing.
Tools such as Valgrind and Intel VTune can help with performance tuning at this level.