"Loads are not reorderd with other loads" is a fact !!

Source: Internet
Author: User

Multi-threaded programming is hard enough that no amount of preparation ever feels sufficient. Some time ago I was organizing material on multi-threaded programming (notes from my experience writing a multi-threaded algorithm library). The day before yesterday, Joe of Microsoft's PFX team published a blog post titled "Loads cannot pass other loads" is a myth, which came as quite a shock.

 

The discussion focuses on the following examples:

 

P0              P1
================================
x = 1;          y = 1;
r0 = x;         r2 = y;
r1 = y;         r3 = x;

 

Q: If x and y are volatile, is it possible for execution to finish with r1 = r3 = 0?

 

Let's set the answer aside for a moment and analyze the situation first. In .NET, every store has release semantics and every volatile load has acquire semantics. Because of this, r0 = x cannot be moved ahead of x = 1, and likewise r2 = y cannot be moved ahead of y = 1. Section 2.1 of the Intel white paper states that "Loads are not reordered with other loads and stores are not reordered with other stores". Therefore r1 = y cannot be moved ahead of r0 = x, and likewise r3 = x cannot be moved ahead of r2 = y.

 

To sum up, the result r1 = r3 = 0 looks impossible. Yet it can indeed happen. Section 2.4 of the Intel white paper, "Intra-processor forwarding is allowed", gives this very example and points out that r1 = r3 = 0 is entirely possible. Joe notes that in this case the program merely appears to have executed like this:

P0              P1
================================
r1 = y;         r3 = x;
x = 1;          y = 1;
r0 = x;         r2 = y;

 

 

What is going on? Is this self-contradictory? Actually, no. Section 2.1 of the white paper describes memory reordering, and section 2.4 does not contradict it. The memory access reordering rules apply only to the current processor; they do not guarantee that every result is immediately visible to the other processors. This is caused by write latency: a store can sit in the issuing processor's store buffer, where that processor's own later loads can read it, before it reaches memory. So in the example above no reordering actually took place; to the other processor, however, it looks as if reordering had happened.
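The effect of write latency can be made concrete with a toy model. The following C++ sketch is not taken from the white paper or from Joe's post. It assumes a simplified machine in which each processor has a private FIFO store buffer that its own loads may read from (forwarding), and it replays one particular schedule in which the buffers drain only after all loads have run. The names Machine, Result and run_late_drain are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <deque>
#include <utility>

// Toy model (an assumption, not real hardware): two processors, each with a
// private FIFO store buffer in front of shared memory.
enum Var { X = 0, Y = 1 };

struct Machine {
    std::array<int, 2> memory{{0, 0}};                       // shared x and y, initially 0
    std::array<std::deque<std::pair<Var, int>>, 2> buffer;   // one store buffer per processor

    // A store first lands in the issuing processor's own buffer.
    void store(int cpu, Var v, int value) {
        assert(cpu == 0 || cpu == 1);
        buffer[cpu].push_back({v, value});
    }

    // A load checks the issuing processor's own buffer, newest entry first
    // (intra-processor forwarding); otherwise it reads shared memory.
    int load(int cpu, Var v) const {
        assert(cpu == 0 || cpu == 1);
        for (auto it = buffer[cpu].rbegin(); it != buffer[cpu].rend(); ++it)
            if (it->first == v) return it->second;
        return memory[v];
    }

    // Drain the buffer in FIFO order, making the stores visible to everyone.
    void drain(int cpu) {
        while (!buffer[cpu].empty()) {
            memory[buffer[cpu].front().first] = buffer[cpu].front().second;
            buffer[cpu].pop_front();
        }
    }
};

struct Result { int r0, r1, r2, r3; };

// Each processor runs its three operations in program order; nothing is
// reordered. The buffers simply drain late.
Result run_late_drain() {
    Machine m;
    Result r{};
    m.store(0, X, 1);      // P0: x = 1  (buffered, not yet in memory)
    m.store(1, Y, 1);      // P1: y = 1  (buffered, not yet in memory)
    r.r0 = m.load(0, X);   // P0: r0 = x (forwarded from P0's buffer)
    r.r2 = m.load(1, Y);   // P1: r2 = y (forwarded from P1's buffer)
    r.r1 = m.load(0, Y);   // P0: r1 = y (memory still holds 0)
    r.r3 = m.load(1, X);   // P1: r3 = x (memory still holds 0)
    m.drain(0);
    m.drain(1);
    return r;
}
```

In this model run_late_drain() yields r0 = r2 = 1 and r1 = r3 = 0 even though every load and store was issued in program order, which matches the outcome the white paper allows.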

 

Load-acquire and store-release are really only half fences, and they cannot guarantee visibility to other CPUs. How can this problem be solved? According to the Intel 64 and IA-32 programming manual, visibility is ensured with a memory fence. The fix is therefore to add a full fence, or an operation that implies a full fence such as Interlocked.xxx. That is:

P0              P1
================================
x = 1;          y = 1;
MemoryFence;    MemoryFence;
r0 = x;         r2 = y;
r1 = y;         r3 = x;

 

Note: To insert a full fence in the .NET Framework, you can use the System.Threading.Thread.MemoryBarrier() method.
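The same fix can be sketched outside .NET. The following C++ example is an illustration under assumptions, not code from the original post: it uses std::atomic with release stores and acquire loads to stand in for the .NET semantics described earlier, and std::atomic_thread_fence(std::memory_order_seq_cst) as the full fence. The function name count_both_zero_with_fence is made up.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs the two-processor example `iterations` times with a full fence between
// each store and the loads that follow it. Returns how many runs ended with
// r1 == 0 and r3 == 0.
int count_both_zero_with_fence(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r0 = 0, r1 = 0, r2 = 0, r3 = 0;

        std::thread p0([&] {
            x.store(1, std::memory_order_release);                // x = 1
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
            r0 = x.load(std::memory_order_acquire);               // r0 = x
            r1 = y.load(std::memory_order_acquire);               // r1 = y
        });
        std::thread p1([&] {
            y.store(1, std::memory_order_release);                // y = 1
            std::atomic_thread_fence(std::memory_order_seq_cst);  // full fence
            r2 = y.load(std::memory_order_acquire);               // r2 = y
            r3 = x.load(std::memory_order_acquire);               // r3 = x
        });
        p0.join();
        p1.join();

        // Each thread always observes its own earlier store.
        assert(r0 == 1 && r2 == 1);
        if (r1 == 0 && r3 == 0) ++both_zero;
    }
    return both_zero;
}
```

With both fences in place, the C++ memory model rules out r1 = r3 = 0, so a call such as count_both_zero_with_fence(1000) is expected to return 0. If the two fence lines are removed, that outcome becomes permitted, although whether it is actually observed depends on the hardware and on timing.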

 
