I have read many articles on instruction reordering, but because I lacked background in hardware and computer architecture, the deeper mechanisms and implementation principles were hard to grasp. Fortunately, there are also many posts that are easy to follow and use very vivid examples; their authors manage to explain quite advanced material with simple explanations and accessible metaphors. Here I excerpt and summarize them to share with you, hoping to give you an intuitive picture of instruction reordering so that you avoid some simple mistakes in concurrent programming. If I have misunderstood anything, corrections are welcome.
From source code to instructions that the machine (or virtual machine) can execute, a program passes through at least compile time and run time. Reordering accordingly falls into two categories: compile-time reordering and run-time reordering (out-of-order execution), which correspond to the compile-time and run-time environments respectively. Because of reordering, the actual execution order of instructions is not the order the source code suggests.
1. Compiler reordering
The compiler can reorder the execution of statements as long as it does not change the semantics of the single-threaded program. The following example comes from an article on the Concurrent Programming Network:
A typical compile-time reordering minimizes the number of register reads and stores, reusing values already held in registers by adjusting instruction order without changing the program's semantics. Suppose the first instruction computes a value, assigns it to variable a, and keeps it in a register; the second instruction is unrelated to a but needs a register (assume it takes the one holding a); and the third instruction uses the value of a and is unrelated to the second instruction. Under a sequentially consistent model, a is placed in the register after the first instruction executes, is evicted when the second instruction executes, and must be re-read from memory into the register for the third instruction, even though a's value never changed in the process. The compiler therefore typically swaps the second and third instructions, so that a is still in the register when the first instruction finishes and its value can be read directly from the register, reducing the overhead of repeated loads.
Another compiler optimization: when a variable is read repeatedly inside a loop, the compiler may first load it into a register to improve access speed; subsequent reads then take the value directly from the register, no longer from memory. This avoids unnecessary memory accesses, but while improving efficiency it also introduces a new problem: if another thread modifies the variable's value in memory, the loop may never end, because the copy in the register never changes. The compiler's code optimization can improve the efficiency of the program, but it can also lead to incorrect results, so programmers need to prevent the compiler from making such unsafe optimizations.
2. Direct dependencies between instructions
The compiler and the processor may reorder operations, but both must respect data dependencies: they do not change the execution order of two operations that have a data dependency between them. Two operations have a data dependency if they access the same variable and at least one of the two operations is a write. There are three kinds of data dependency:
Name                 Code example       Description
Read after write     a = 1; b = a;      Read a variable after writing it.
Write after write    a = 1; a = 2;      Write the variable again after writing it.
Write after read     a = b; b = 1;      Write a variable after reading it.
In all three cases, reordering the two operations would change the program's result, so operations with a direct dependency like these are never reordered. Note: the data dependencies discussed here apply only to the instruction sequence of a single processor and the operations within a single thread.
// write thread
public void writer() {
    a = 1;
    flag = true;
}

// read thread
public void reader() {
    if (flag) {
        int i = a * a;
    }
}
Judging from the logical relationship between the read thread and the write thread, the writer should not be reordered, because reordering it makes the reader produce a wrong result. In fact, however, the writer can be reordered: from the write thread's own point of view, the writes to a and flag have no dependency at all and may be swapped freely. The compiler is limited in exactly this way: it can only perform dependency analysis within a single thread and cannot analyze dependencies across threads. The discussion of dependencies above is mainly excerpted from this article.
3. Implicit dependencies between instructions
Both the compiler and the CPU must preserve the causal relationships visible in the program's context, so in the vast majority of cases we do not need to think about out-of-order execution when writing programs. Some program logic, however, has causal relationships that cannot be seen from the context alone. For example:
*addr = 5;
val = *data;
On the surface, addr and data are unrelated, so the two statements could safely be executed out of order. But if this code is inside a device driver, the two variables may correspond to the device's address port and data port. Suppose the device specifies that to read or write one of its registers, you first write the register number to the address port, and then operate on that register by reading or writing the data port. In that case, executing the two preceding instructions out of order causes errors. We call logic like this an implicit causal relationship, while a direct input-output dependency between instructions is called an explicit causal relationship. The CPU's out-of-order execution and the compiler's reordering both preserve explicit causal relationships, but they cannot recognize implicit ones. Another example:
obj->data = xxx;
obj->ready = 1;
After data is set, a flag is raised; another thread may then execute:
if (obj->ready)
    do_something(obj->data);
This code looks a little awkward, but it seems correct. Under out-of-order execution, however, if the flag is set before data, the result can easily be a disaster: literally, there is no explicit causal relationship between the two preceding instructions, so reordering can happen. In general, if the program contains an explicit causal relationship, reordering will respect it; otherwise, reordering may break the program's intended logic. At that point a barrier is needed to suppress the reordering and preserve the logic the program expects. This section is excerpted from this article.
4. CPU reordering (out-of-order instruction execution)
Today's CPUs generally use pipelining to execute instructions. The execution of an instruction is divided into a number of stages: fetch, decode, memory access, execute, write-back, and so on. Multiple instructions can then exist in the pipeline and execute simultaneously. The pipeline is not serial: a time-consuming instruction staying in the "execute" stage for a long time does not force subsequent instructions to be stuck in earlier stages. Rather, the pipeline is parallel, and multiple instructions can be in the same stage at the same time, as long as the corresponding functional units inside the CPU are not fully occupied. For example, if the CPU has one adder and one divider, then an addition instruction and a division instruction can be in the "execute" stage at the same time, while two addition instructions in the "execute" stage could only work serially.
This, however, is where disorder can arise. For example, an addition instruction that originally appears after a division instruction may finish executing before it, because division takes much longer. Or, of two memory-access instructions, the second may complete before the first because it hits the cache. In general, out-of-order execution is not the CPU deliberately rearranging instructions before executing them. The CPU always fetches instructions from memory in order and feeds them into the pipeline in order; but the varying conditions of execution and the interactions between instructions can cause instructions that entered the pipeline in order to complete out of order. This is called "in-order issue, out-of-order completion".
Besides stalls caused by a lack of resources in the pipeline (as with the single adder facing two addition instructions above), dependencies between instructions are the main cause of pipeline stalls. The CPU's out-of-order execution is not random: its premise is that the causal relationships of the program's context are preserved, and with that premise the correctness of CPU execution is guaranteed. For example:
a++;
b = f(a);
c--;
Since b = f(a) depends on the result of the preceding instruction a++, b = f(a) is blocked before the "execute" stage until the result of a++ is available, while c--, which does not depend on the preceding instructions, may execute before b = f(a). (Note that f(a) here does not stand for a function call with a as argument, but for an instruction with a as operand. A C-language function call would require several instructions to implement, and the situation would be more complicated.)
If two dependent instructions like these sit this close together, the later one is blocked in the pipeline for a long time waiting for the result of the earlier one, wasting pipeline resources. Compiler reordering, as a compile-time optimization, tries to rearrange instructions to increase the distance between such pairs, so that by the time the later instruction enters the CPU, the earlier result is already available and no blocking wait is needed. For example, the instructions might be rearranged as:
a++;
c--;
b = f(a);
Compared with the CPU's out-of-order execution, the compiler's reordering really does adjust the order of the instructions themselves. But compiler reordering, too, must guarantee that the causal relationships of the program's context are unchanged.
Because of reordering and out-of-order execution, concurrent programs that do not synchronize access to shared data can easily run into all sorts of seemingly bizarre problems.
References:
Concurrent Programming Network, article by Cheng: http://ifeve.com/java-memory-model-1/
Concurrent Programming Network, article by Wang Chenchun: http://ifeve.com/jvm-reordering/
CSDN, "the sky is full of aircraft" blog: http://blog.csdn.net/jiang_bing/article/details/8629425
Understanding important concepts in concurrent programming: instruction reordering and out-of-order execution