Memory Consistency Models

Note: Computers have entered the multi-core era, and multi-core machines require programmers to write parallel programs in order to exploit all the processors. To write parallel/concurrent programs, you must understand the memory model, so I have translated an article summarizing memory consistency models. Since this is my first translation, errors are inevitable. Original article: http://www.cs.nmsu.edu/~pfeiffer/classes/573/notes/consistency.html
Note: There is a good tutorial on the memory consistency model at ftp://keeper.dec.com/pub/dec/wrl/research-reports/wrl-tr-95.7.pdf; this article draws on that paper extensively.
Memory Consistency Models
This article describes several important memory consistency models that have emerged in recent years. The basic idea is this: implementing the most intuitive notion of "memory consistency" that we carry around in our heads turns out to be difficult and, above all, expensive, and it isn't even necessary for a properly written parallel program to run correctly. We therefore look for memory consistency models that are weaker than the intuitive one and easier to implement, while still allowing us to write parallel programs that run correctly, as expected.
Notation
When describing these memory models, we are interested only in shared memory (that is, variables shared between threads or processes), not in anything else about the program: we ignore control flow, data manipulation, and local (non-shared) variables. We use a standard notation for describing memory models, which will be used throughout. In this notation, a horizontal line represents the operations of one processor in the system, proceeding in time from left to right, and each shared-memory operation is written on its processor's line. The two basic operations, writes and reads, are written as follows:

W(var)value : write value to the shared variable var
R(var)value : read the shared variable var, obtaining value

For example, W(x)1 says that 1 is written to x, and R(y)3 says that y is read and the value 3 is obtained. Notation for further operations (especially synchronization operations) will be defined as needed. For simplicity, assume all variables are initialized to 0.

Note that a single statement in a high-level language (such as x = x + 1;) usually involves several memory operations. If x is 0 beforehand, that statement becomes (leaving the other processors out of the picture):

P1: R(x)0 W(x)1
----------------------------

On a RISC-style CPU the C statement would probably compile to three instructions: a load (bring x into a register), an add (add 1 to the register), and a store (put the register's value back into memory), so it performs two memory operations. On a CISC processor the statement might well become a single instruction, a memory add; even so, the processor still executes that instruction as "read memory, add 1, write memory", so it still involves two memory operations.
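Since the rest of the article reasons in terms of these individual memory operations, here is a minimal C sketch (the variable and function names are ours, purely for illustration) spelling out the read-modify-write decomposition of x = x + 1:

    /* One C statement on a shared variable is two memory operations:
       in the notation above, R(x)0 followed by W(x)1 when x starts at 0. */
    int x = 0;          /* shared variable, initialized to 0 */

    void increment(void)
    {
        int tmp = x;    /* R(x)0 -- load x into a register */
        tmp = tmp + 1;  /* register-only arithmetic, not a memory operation */
        x = tmp;        /* W(x)1 -- store the result back to memory */
    }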
Note that the actual memory operations performed could equally well have been performed by some completely different high-level language code; perhaps an if-then-else statement that checked and then set a flag. If I ask for memory operations and there is anything in your answer that looks like a transformation of the data, then something is wrong!
Strict Consistency
The most intuitive notion of memory consistency is strict consistency. In a strictly consistent memory model, any read of a memory location X returns the value stored by the most recent write to X. If we have several processors, none of which has a cache, all accessing memory over a common bus, we get strict consistency: the crucial point is that the bus serializes all memory accesses precisely. Let's use an example to show what strict consistency is and is not; it also serves as an example of how to represent memory operations in our notation. As before, all variables start at 0. The following scenario obeys strict consistency:

P1: W(x)1
--------------------------------------------
P2:        R(x)1 R(x)1

It reads: processor P1 writes 1 to the variable x; some time later, processor P2 reads x and obtains the value 1, then reads it again and obtains the same value. Here is another scenario that obeys strict consistency:

P1:        W(x)1
--------------------------------------------
P2: R(x)0        R(x)1

This time processor P2 runs first: it reads 0 from x, and on its second read it obtains the value 1 that P1 wrote. Note that these two scenarios could result from running the same program twice on the same processors. Finally, here is a scenario that does not obey strict consistency:

P1: W(x)1
--------------------------------------------
P2:    R(x)0 R(x)1

When P2 first reads x, it does not get the value 1 that P1 has already written to x, though it does see that value eventually. Strict consistency is also known as atomic consistency.
Sequential Consistency (see also http://www.sigma.me/2011/05/06/sequential-consistency.html)

The sequential consistency memory model is slightly weaker than strict consistency. Lamport defined it as: the result of any execution (of a concurrent program on a multiprocessor) is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program. Essentially, any execution sequence a program can produce under strict consistency is also legal under sequential consistency, provided we never rely on the relative speeds of the processors. The idea here is that by expanding from "the set of reads and writes that actually happened" to "a set of reads and writes that could possibly have happened", we can reason more effectively about the program (because we can ask the more useful question, "could this program ever break?"). We can reason about the program itself, without interference from the details of the hardware that runs it. It is probably fair to say that if we have a computer system that really uses strict consistency, we will want to reason about it using sequential consistency.
The third scenario above is legal under sequential consistency. Here is another legal sequentially consistent execution:

P1: W(x)1
-----------------------
P2:       R(x)1   R(x)2
-----------------------
P3:       R(x)1   R(x)2
-----------------------
P4:    W(x)2
The reason this scenario is legal under sequential consistency is that the following rearrangement of the same operations would be legal under strict consistency:

P1: W(x)1
---------------------------------------
P2:        R(x)1          R(x)2
---------------------------------------
P3:           R(x)1          R(x)2
---------------------------------------
P4:                 W(x)2

Here, by contrast, is a scenario that does not conform to sequential consistency:

P1: W(x)1
---------------------------------------
P2:       R(x)1   R(x)2
---------------------------------------
P3:       R(x)2   R(x)1
---------------------------------------
P4:    W(x)2

P2 and P3 see the two writes in opposite orders, so no single interleaving of all the operations can explain both. Oddly enough, the precise definition, as given by Lamport, doesn't even require that ordinary notions of causality be maintained; it is possible to see the result of a write before the write itself takes place, as in:

P1:           W(x)1
-----------------------------
P2: R(x)1

This is legal because there is an execution order, legal under strict consistency, in which processor P2 is simply suspended until the value is 1. This is not a weakness of the model: if your program can indeed violate causality like this, you are missing some synchronization operations in your program. We haven't discussed synchronization operations yet, but we will shortly.
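To make the definition concrete, here is a small C11 sketch of the classic store-buffering litmus test (our own illustrative code, not from the original article). Sequential consistency demands a single interleaving of the four memory operations, and in every such interleaving at least one write precedes the other thread's read, so r1 and r2 can never both end up 0; C11 atomics default to memory_order_seq_cst and so provide exactly this guarantee.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    atomic_int x = 0, y = 0;   /* shared variables */
    int r1, r2;                /* per-thread results */

    void *thread1(void *arg)
    {
        atomic_store(&x, 1);   /* W(x)1 (seq_cst by default) */
        r1 = atomic_load(&y);  /* R(y)? */
        return NULL;
    }

    void *thread2(void *arg)
    {
        atomic_store(&y, 1);   /* W(y)1 */
        r2 = atomic_load(&x);  /* R(x)? */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Under sequential consistency, r1 == 0 && r2 == 0 is impossible. */
        printf("r1=%d r2=%d\n", r1, r2);
        return 0;
    }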
Cache Coherence
Perhaps surprisingly, many researchers treat cache coherence as practically a synonym for sequential consistency, but the two are not the same. Sequential consistency requires a globally consistent view of memory operations (that is, across all of memory); cache coherence only requires a locally consistent view (that is, per memory address). Here is a scenario that conforms to cache coherence but not to sequential consistency:

P1: W(x)1 W(y)2
---------------------------------------
P2: R(x)0 R(x)2 R(x)1 R(y)1
---------------------------------------
P3: R(y)0 R(y)1 R(x)0 R(x)1
---------------------------------------
P4: W(x)2 W(y)1

Both P2 and P3 see P1's write to x as occurring after P4's write to x (in fact, P3 never sees P4's write to x at all), and both see P4's write to y as occurring after P1's write to y (this time, P3 never sees P1's write to y). However, P2 sees P4's write to y occur after P1's write to x, while P3 sees P1's write to x occur after P4's write to y; no single global ordering can explain both views. This could not happen in a snoopy-cache system on a bus, but it can happen in a system with a directory-based cache coherence protocol.
Do We Really Need Such a Strong Model?
Consider the following situation on a shared-memory multiprocessor, with one process running on each of two processors, each modifying the shared variable x:

P1:           P2:
x = x + 1;    x = x + 2;

What happens? With no additional information, there are four possible execution orders, leading to three different results:

1. P1 executes entirely first: x ends up 3.
2. P2 executes entirely first: x ends up 3.
3. P1 and P2 both read x, then P1 writes first (and P2 overwrites it): x ends up 2.
4. P1 and P2 both read x, then P2 writes first (and P1 overwrites it): x ends up 1.

The simple, accurate description of such a program is: it has a bug. More precisely, it has a data race: a variable is modified by more than one process (thread), and the result depends on which one runs first. To make this program deterministic, we must use locks to ensure that one process (thread) completes the whole operation before the other starts, as in the sketch below.
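Here is a minimal pthreads sketch of the locked version (illustrative names, assuming POSIX threads). With the mutex held across the whole read-modify-write, one increment completes before the other begins, so the only possible final value of x is 3.

    #include <pthread.h>

    int x = 0;                                     /* shared variable */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void *add_one(void *arg)    /* run on P1 */
    {
        pthread_mutex_lock(&m);     /* acquire: serialize the read-modify-write */
        x = x + 1;
        pthread_mutex_unlock(&m);   /* release */
        return NULL;
    }

    void *add_two(void *arg)    /* run on P2 */
    {
        pthread_mutex_lock(&m);
        x = x + 2;
        pthread_mutex_unlock(&m);
        return NULL;
    }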
So, if our program has a data race and its behavior is unpredictable anyway, does it really matter whether all processors see all modifications in the same order? Attempting to achieve strict or sequential consistency might be regarded as trying to support the semantics of buggy programs: since the result of the program is random anyway, why should we care whether it produces the right random value? But it gets worse, as we consider in the next sections.
Optimization and Consistency
Even if the program we write has no bugs, the compiler generally does not preserve sequential consistency (the compiler generally doesn't know that other processors exist, let alone anything about memory consistency models; one could argue that this points to the need for parallel semantics in languages, but as long as programmers want to write parallel programs in C and Java, we must support them). Most language semantics maintain the program's execution order separately for each memory address, but not across memory addresses; this gives the compiler the freedom to reorder code. For example, if a program writes two variables x and y that do not depend on each other, the compiler is free to perform those two writes in either order without affecting the program's correctness. In a parallel environment, however, a program running on some other processor may well depend on the order of the writes to x and y. Mutual exclusion between two processes is a good example; for process i (the other process being 1-i), the code to enter the critical section is:

flag[i] = true;
turn = 1 - i;
while (flag[1-i] && (turn == (1-i)));

If the compiler decides (for whatever reason) to swap the writes to flag[i] and turn, the code is still perfectly correct in a single-process environment, but it fails in a multi-process environment (and this is a realistic concern). Worse, because processors execute instructions out of order, there is no guarantee that even the program's machine code will access memory in the order specified! Worse still, processors perform ever more aggressive instruction reordering due to the tight coupling between the processor and the cache, and there is almost no way to stop or control that kind of optimization (it is entirely possible for the processor to have finished updating turn while it is still setting flag[i], since accessing flag[i] involves an array access). The situation is not hopeless, though: we can require the compiler to access shared memory in the order the program specifies (this is what the keyword volatile is for), and on an Intel processor, for example, we can also use instructions with the lock prefix to force an ordering of memory accesses. Note, however, that we should use these keywords and prefixes only where we actually care about the precise order of memory accesses; the memory models that follow expand on this point.
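As a concrete hedge against both kinds of reordering, here is a sketch of the entry/exit protocol above (Peterson's algorithm for two processes) rewritten with C11 seq_cst atomics; this is our adaptation, not code from the original article. The seq_cst operations forbid both the compiler and the processor from swapping the write to flag[i] with the write to turn.

    #include <stdatomic.h>
    #include <stdbool.h>

    atomic_bool flag[2];   /* flag[i]: process i wants to enter */
    atomic_int  turn;      /* whose turn it is to wait */

    void lock(int i)       /* i is 0 or 1 */
    {
        atomic_store(&flag[i], true);   /* announce interest */
        atomic_store(&turn, 1 - i);     /* give priority to the other process */
        while (atomic_load(&flag[1 - i]) &&
               atomic_load(&turn) == 1 - i)
            ;                           /* spin until it is safe to enter */
    }

    void unlock(int i)
    {
        atomic_store(&flag[i], false);  /* leave the critical section */
    }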
Processor Consistency
This model is also called PRAM (here short for "pipelined random access memory", not the Parallel Random Access Machine model of computing theory). It is defined as follows: the writes performed by a single processor are seen by all other processors in the order in which they were actually performed, but writes performed by different processors may be seen in different orders by different processors. The underlying idea is that processor consistency better reflects real networks, in which the latency between different nodes can differ. The last scenario from the sequential consistency section (shown below) is not legal under sequential consistency, but it is legal under processor consistency:

P1: W(x)1
----------------------------------------------
P2:       R(x)1   R(x)2
----------------------------------------------
P3:       R(x)2   R(x)1
----------------------------------------------
P4:    W(x)2

Here is how it could arise in a multiprocessor connected by something more elaborate than a bus:

1. The processors are connected in a linear array: P1 --- P2 --- P3 --- P4.
2. In the first cycle, P1 and P4 perform their writes and forward them to their neighbors.
3. In the second cycle, the value 1 written by P1 reaches P2, and the value 2 written by P4 reaches P3. P2 and P3 then read, so P2 sees x = 1 while P3 sees x = 2.
4. In the third cycle, the value 1 written by P1 reaches P3, and the value 2 written by P4 reaches P2. P2 and P3 read again, so P2 sees x = 2 while P3 sees x = 1.

Notice that the key part of the definition of processor consistency is obeyed (every other processor sees each individual processor's writes in order): P1 and P4 each perform only one write, and P2, being closer to P1, sees P1's write first and then P4's, while P3, being closer to P4, sees P4's write first and then P1's. The point of the example, however, is that processor consistency violates our intuition: P2 and P3 see the writes of P1 and P4 in different orders, so the scenario violates sequential consistency.
Here is a scenario that violates processor consistency:

P1: W(x)1   W(x)2
-----------------------------------------
P2:       R(x)2   R(x)1

Processor P2 sees P1's two writes to x in an order different from the order in which P1 actually performed them! Note also the conclusion we have arrived at: mutual-exclusion (critical section) code between two processes (threads) can be broken by processor consistency!
Finally, a note on the relationship between processor consistency and PRAM consistency: some researchers additionally require that cache coherence hold alongside PRAM consistency, obtaining a processor consistency that is slightly stronger than PRAM consistency alone.
Synchronization Access vs. Ordinary Access
One correct way to write parallel programs with shared variables is to protect the shared variables with mutual exclusion. In the first, buggy example above, we can obtain deterministic behavior through locking; using S to stand for the relevant synchronization operations:

P1: x = x + 1; S
----------------------------
P2:               S; x = x + 2

In general, in a correct parallel program we first gain exclusive access to a set of shared variables, then manipulate them however the problem requires, and then exit the exclusive access, distributing the modified values to the rest of the system. The other processors do not need to see the intermediate results; they only need to see the final results (see the sketch below).
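Here is a sketch of this pattern (a hypothetical account-transfer example of ours, not from the article): inside the critical section the shared state passes through an intermediate value that no other processor ever needs to observe; only the final state, at the release, has to be propagated.

    #include <pthread.h>

    int balance_a = 100, balance_b = 0;            /* shared variables */
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void transfer(int amount)
    {
        pthread_mutex_lock(&m);     /* acquire exclusive access */
        balance_a -= amount;        /* intermediate state: the money has
                                       "vanished"; no one else may see this */
        balance_b += amount;        /* final state restored */
        pthread_mutex_unlock(&m);   /* release: only now must the new
                                       values reach other processors */
    }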
With this idea in hand, we can examine the kinds of shared-memory accesses more carefully. Here is a classification of memory accesses, following [Gharachorloo]:

                 shared access
                /             \
         competing          non-competing
         /        \
 synchronizing   non-synchronizing
     /     \
 acquire   release
The access classes are defined as follows:

Shared access: besides shared accesses there are, of course, also private accesses, but private access has no bearing on the issues we are discussing, so we consider only shared accesses.

Competing vs. Non-Competing: if two different processors access the same variable and at least one of the accesses is a write, the accesses are competing, because the final result depends on which processor gets there first (if both processors only read the variable, it doesn't matter who accesses it first).

Synchronizing vs. Non-Synchronizing: ordinary competing accesses, such as accesses to data variables, are non-synchronizing; accesses used to synchronize processes are, naturally, synchronizing.

Acquire vs. Release: finally, we divide synchronizing accesses into two kinds, acquiring a lock and releasing a lock.
Remember that synchronizing accesses should be rare compared with ordinary accesses (if you spend all your time on synchronization, something is wrong with your program!). We can therefore weaken our memory model further by treating synchronizing accesses differently from all other accesses.
Weak Consistency
If we distinguish only between synchronizing and non-synchronizing accesses (rather than using the full classification), and the following conditions hold, we get weak consistency:

1. Accesses to synchronization variables are sequentially consistent.
2. No access to a synchronization variable may be performed until all previous writes have completed everywhere.
3. No ordinary access (read or write) may be performed until all previous accesses to synchronization variables have completed.
Here is a scenario that conforms to weak consistency and shows what weak consistency is really for:

P1: W(x)1 W(x)2        S
--------------------------------------------------
P2:       R(x)0 R(x)2  S  R(x)2
--------------------------------------------------
P3:             R(x)1  S  R(x)2

Before the synchronization, P2 and P3 may see the writes to x in any state of partial completion; after the synchronization, every processor must see the final value 2.
In other words, a processor is not required to broadcast its changes to a variable at all until a synchronizing access takes place. In a distributed system built on a network rather than a bus, this can dramatically reduce message traffic. (Note that in reality nobody would deliberately write a program that behaves like this; you never want to read a variable that somebody else is still updating, so the reads really belong after the S. I have mentioned some synchronization algorithms, such as relaxation algorithms, that are said not to require memory consistency; such algorithms will not work on a weakly consistent system, because a weakly consistent system delays the exchange of data until the synchronization points.)
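A rough C11 analogue of this behavior (our illustrative sketch, with the atomic variable s playing the role of the synchronization operation S): the ordinary writes to x need not be visible to anyone until the operation on the synchronization variable completes.

    #include <stdatomic.h>

    int x;          /* ordinary shared data */
    atomic_int s;   /* synchronization variable, the "S" in the scenario */

    void writer(void)
    {
        x = 1;                 /* W(x)1 -- may remain local for now */
        x = 2;                 /* W(x)2 -- likewise */
        atomic_store(&s, 1);   /* S: all earlier writes must complete
                                  before this synchronization is visible */
    }

    void reader(void)
    {
        while (atomic_load(&s) == 0)   /* S: wait for the synchronization */
            ;
        int r = x;             /* R(x)2 -- guaranteed after the sync */
        (void)r;
    }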
Release Consistency

Having a single type of synchronization access requires a full memory update whenever any synchronization occurs: we must notify the other processors of our local modifications to shared variables, and we must also pull in their modifications. Release consistency instead concerns itself only with the shared variables protected by the lock in question; it only needs to make the changes to those variables known to other processors. It is defined as follows:

1. Before an ordinary access to a shared variable is performed, all previous acquires performed by the process must have completed successfully.
2. Before a release is performed, all previous reads and writes performed by the process must have completed.
3. Acquire and release accesses must be sequentially consistent.
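Release consistency maps fairly naturally onto the acquire/release orderings of C11 atomics; the following spinlock sketch is our illustration of the three rules, not code from the article. The acquire ordering on the lock operation corresponds to rule 1, and the release ordering on the unlock corresponds to rule 2: only at the release do the protected writes have to become visible to other processors.

    #include <stdatomic.h>
    #include <stdbool.h>

    int data;              /* shared variable protected by the lock */
    atomic_bool locked;    /* the lock itself */

    void producer(void)
    {
        /* Acquire: must complete before the ordinary accesses below. */
        bool expected = false;
        while (!atomic_compare_exchange_weak_explicit(
                   &locked, &expected, true,
                   memory_order_acquire, memory_order_relaxed))
            expected = false;   /* CAS failed; reset and retry */

        data = 42;              /* ordinary access inside the critical section */

        /* Release: all earlier reads and writes must have completed,
           and only now do the changes need to reach other processors. */
        atomic_store_explicit(&locked, false, memory_order_release);
    }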
One Last Point
Obviously, a synchronizing access is a heavyweight operation, since it requires a full memory synchronization. So why all these memory models? Their strength comes from a basic fact: the cost of these synchronization operations is no worse than the cost that every single memory access (shared or local, read or write) would have to pay in a sequentially consistent system.
References
Gharachorloo, K., D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," Proceedings of the 17th International Symposium on Computer Architecture (1990), 15-26.
