Multi-core program probing (1): False sharing and verification using VTune

Source: Internet
Author: User
Tags: Intel Core 2 Duo

A common problem in multi-core development is false sharing. It forces us to look at multi-core programming from a different perspective: that of the hardware.

 

On the Intel Core 2 Duo platform, the L2 cache is shared by the two cores, while each core has its own private L1 data cache. The cache line size is 64 bytes. When different threads read and write different variables at the same time, and those variables happen to be stored on the same cache line, the cores contend for that line, which can cause a significant performance loss. Consider this code:

unsigned char vectora[10];
unsigned char vectorb[10];

 

UINT mythreadproca(LPVOID pparam)
{
    unsigned long mycounter = 100000000;
    while (--mycounter)
    {
        for (int i = 0; i < 10; ++i)
        {
            ++vectora[i];
        }
    }
    return 0; // thread completed successfully
}

 

UINT mythreadprocb(LPVOID pparam)
{
    unsigned long mycounter = 100000000;
    while (--mycounter)
    {
        for (int i = 0; i < 10; ++i)
        {
            ++vectorb[i];
        }
    }
    return 0; // thread completed successfully
}

 

Although mythreadproca and mythreadprocb run as two different threads and access two different variables, false sharing still occurs. When mythreadproca updates vectora[i], the corresponding cache line in core A's L1 cache enters the Modified state. That same cache line also holds vectorb, so the copy of the line in core B's cache is invalidated, and the CPU has to use the cache coherence protocol to bring core B's copy up to date before mythreadprocb can access vectorb again. As a result, even though mythreadproca never modifies vectorb, it increases the cache misses that mythreadprocb incurs when accessing vectorb. Cache access is roughly an order of magnitude faster than main memory access, so the extra misses cause a significant performance degradation.
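The cache-line arithmetic behind this can be sketched in a few lines of C++. This is only an illustration, assuming the 64-byte line size stated above; the names kCacheLineSize, cache_line_of and shares_cache_line are hypothetical helpers, not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size (64 bytes on Core 2 Duo, as stated in the text).
constexpr std::size_t kCacheLineSize = 64;

// Index of the cache line that contains address p.
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLineSize;
}

// True if any byte of [a, a + a_len) lies on the same cache line
// as any byte of [b, b + b_len).
inline bool shares_cache_line(const void* a, std::size_t a_len,
                              const void* b, std::size_t b_len) {
    assert(a_len > 0 && b_len > 0);
    const auto* pa = static_cast<const unsigned char*>(a);
    const auto* pb = static_cast<const unsigned char*>(b);
    const std::uintptr_t a_first = cache_line_of(pa);
    const std::uintptr_t a_last  = cache_line_of(pa + a_len - 1);
    const std::uintptr_t b_first = cache_line_of(pb);
    const std::uintptr_t b_last  = cache_line_of(pb + b_len - 1);
    // Two closed ranges of line indices overlap iff each starts
    // no later than the other ends.
    return a_first <= b_last && b_first <= a_last;
}
```

Calling shares_cache_line(vectora, 10, vectorb, 10) on the two 10-byte arrays would report whether they can falsely share a line in a given build. The answer depends on where the compiler and linker place the arrays, so it is worth checking rather than assuming.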

 

On the Core 2 platform, you can use the ext_snoop.all_agents.hitm event to evaluate the impact of false sharing. It monitors transactions on the memory bus. A HITM response indicates that the snooped cache line is held in the Modified state in the responding agent's cache, which is exactly the situation that false sharing produces.

The VTune manual describes ext_snoop.all_agents.hitm as follows:

This event counts the snoop responses to bus transactions. Responses can be counted separately by type and by bus agent. With the 'THIS_AGENT' mask the event counts snoop responses from this processor to bus transactions sent by this processor. With the 'ALL_AGENTS' mask the event counts all snoop responses seen on the bus.

 

Let's take a look at the measurement results for the code above.


 

With sampling, ext_snoop.all_agents.hitm was recorded 1175 times, cpu_clk was 6373, and inst_retired was 3796.

 

The fix for false sharing is simple: make sure the data used by different threads sits on different cache lines. For example, the code can be changed to:

 

unsigned char vectora[100];
unsigned char vectorb[100];

 

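In this way, vectora[0..9] and vectorb[0..9] end up on different cache lines: vectora[10..99] is never used and serves only as padding that fills out the rest of the 64-byte cache line. This assumes the compiler places the two arrays one after the other in memory.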
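An alternative to oversizing the arrays is to ask the compiler for cache-line alignment directly. The sketch below uses the standard C++11 alignas specifier, again assuming a 64-byte line; PaddedVector, padded_a, padded_b and on_separate_lines are hypothetical names introduced for illustration, not the original article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Each PaddedVector starts on a 64-byte boundary, and its size is
// rounded up to a multiple of 64, so two objects never share a line.
struct alignas(64) PaddedVector {
    unsigned char data[10];
};

static_assert(alignof(PaddedVector) == 64, "must be cache-line aligned");
static_assert(sizeof(PaddedVector) == 64, "must occupy one full cache line");

PaddedVector padded_a;  // hypothetical stand-in for vectora
PaddedVector padded_b;  // hypothetical stand-in for vectorb

// Returns true when the two objects start on different 64-byte lines.
bool on_separate_lines(const PaddedVector& a, const PaddedVector& b) {
    const auto pa = reinterpret_cast<std::uintptr_t>(&a);
    const auto pb = reinterpret_cast<std::uintptr_t>(&b);
    assert(pa % 64 == 0 && pb % 64 == 0);  // guaranteed by alignas(64)
    return pa / 64 != pb / 64;
}
```

The threads would then increment padded_a.data[i] and padded_b.data[i]. Unlike the 100-element arrays, this does not depend on how the two variables happen to be laid out relative to each other.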

 

The measurement data after resolving the false sharing problem is:

 

 

ext_snoop.all_agents.hitm dropped sharply to 179, and cpu_clk dropped to 1847. Because the number of instructions executed is essentially unchanged, inst_retired stays at a similar level, 3370. Similar results can also be obtained by embedding timing calls in the program itself.
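The embedded-timing approach might look like the following sketch. It swaps the Windows thread API of the original code for portable std::thread and std::chrono, and the function names hammer and time_pair_ms are hypothetical. The volatile pointer is there only to keep the compiler from collapsing the loop.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Increments the 10 bytes starting at v, `iterations` times.
// volatile forces every load and store to actually happen.
void hammer(volatile unsigned char* v, unsigned long iterations) {
    while (iterations--) {
        for (int i = 0; i < 10; ++i) {
            v[i] = static_cast<unsigned char>(v[i] + 1);
        }
    }
}

// Runs two threads, one per buffer, and returns the elapsed
// wall-clock time in milliseconds.
double time_pair_ms(unsigned char* a, unsigned char* b,
                    unsigned long iterations) {
    assert(a != nullptr && b != nullptr);
    const auto start = std::chrono::steady_clock::now();
    std::thread ta(hammer, a, iterations);
    std::thread tb(hammer, b, iterations);
    ta.join();
    tb.join();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Calling time_pair_ms once with two buffers that sit 10 bytes apart and once with two buffers that sit at least 64 bytes apart gives a rough comparison of the two layouts. The absolute numbers depend on the machine, the compiler options, and how the operating system schedules the threads, so several runs are advisable.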

 

To sum up, there are two ways to solve the false sharing problem:

1. Increase the spacing between array elements so that the elements accessed by different threads are located on different cache lines.
2. In each thread, work on a thread-local copy of the global array elements, and write the results back to the global array only when the work is done.

False sharing is a common problem in multi-core program development and deserves programmers' attention.
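The second approach can be sketched as follows. This is a minimal illustration rather than the original author's code; the function name count_with_local_copy and the constant kVecLen are hypothetical.

```cpp
#include <cassert>

constexpr int kVecLen = 10;  // matches the 10 elements used in the example

// Thread body: does all of its work on a stack-local copy and touches
// the (possibly falsely shared) global array only twice.
void count_with_local_copy(unsigned char* global_vec,
                           unsigned long iterations) {
    assert(global_vec != nullptr);
    unsigned char local[kVecLen];

    // Copy in: one read of the shared cache line.
    for (int i = 0; i < kVecLen; ++i) {
        local[i] = global_vec[i];
    }

    // Hot loop: operates on the thread's own stack only.
    while (iterations--) {
        for (int i = 0; i < kVecLen; ++i) {
            ++local[i];
        }
    }

    // Write back: one burst of writes to the shared cache line.
    for (int i = 0; i < kVecLen; ++i) {
        global_vec[i] = local[i];
    }
}
```

Each thread would call this with its own array, for example count_with_local_copy(vectora, 100000000) in one thread and count_with_local_copy(vectorb, 100000000) in the other. The arrays may still share a cache line, but the line is contended only during the brief copy-in and write-back phases instead of on every increment.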
