12.2 The efficiency and consistency of the hardware
Because there is a gap of several orders of magnitude between the speed of a computer's storage devices and that of its processor, modern computer systems insert a cache, whose read and write speeds are as close as possible to the processor's, as a buffer between memory and the processor: the data needed for an operation is copied into the cache so that the operation can run quickly, and when the operation finishes the result is written back from the cache to memory, so the processor does not have to wait on slow memory reads and writes.
Cache-based storage interaction nicely resolves the speed mismatch between processor and memory, but it also adds complexity to the computer system, because it introduces a new problem: cache coherence. In a multiprocessor system, each processor has its own cache while they all share the same main memory, as shown in Figure 12-1. When the operations of multiple processors involve the same region of main memory, their cached data may become inconsistent; if that happens, whose cached data should be synchronized back to main memory? To solve the coherence problem, each processor must follow certain protocols when accessing the cache, operating according to the protocol on every read and write; such protocols include MSI, MESI (the Illinois protocol), MOSI, Synapse, Firefly, and the Dragon protocol. The term "memory model", which will come up many times in this chapter, can be understood as an abstraction of the read and write accesses to a particular memory or cache under a specific operating protocol. Physical machines of different architectures can have different memory models, and the Java virtual machine has its own memory model, whose memory access operations are highly comparable to the hardware's cache access operations.
Besides adding caches, in order to keep the functional units inside the processor as fully utilized as possible, the processor may apply out-of-order execution optimization to the input code, reorganizing the results of out-of-order execution after the computation to ensure the final result is consistent with sequential execution. However, there is no guarantee that individual statements are computed in the same order as they appear in the input code, so if one computation depends on the intermediate result of another, the ordering cannot be guaranteed by code order alone. Similar to the processor's out-of-order execution optimization, the Java virtual machine's just-in-time compiler performs an analogous optimization called instruction reordering.
12.3 Java Memory model
The Java Virtual Machine Specification attempts to define a Java Memory Model (JMM) that masks the memory access differences between various hardware and operating systems, so that Java programs achieve consistent memory access behavior across platforms.
12.3.1 main memory and working memory
The main goal of the Java memory model is to define the access rules for variables in a program, that is, the low-level details of how the virtual machine stores variables into memory and reads them back out. The term "variable" here differs from its use in Java programming: it includes instance fields, static fields, and the elements that make up array objects, but not local variables and method parameters, because the latter are thread-private, never shared, and therefore naturally free of contention. To achieve better performance, the Java memory model does not restrict the execution engine from using the processor's registers or caches to interact with main memory, nor does it limit the just-in-time compiler's optimizations that adjust the order of code execution.
The Java memory model specifies that all variables are stored in main memory. This "main memory" shares its name with the main memory discussed when the physical hardware was introduced, and the two can be likened to each other, but here it refers to only a part of the virtual machine's memory. Each thread also has its own working memory (which can be compared with the processor cache mentioned earlier). A thread's working memory holds copies of the main-memory values of the variables the thread uses, and all of a thread's operations on variables (reads, assignments, and so on) must be performed in working memory; a thread may not read or write variables in main memory directly. Different threads cannot access the variables in each other's working memory, and the transfer of variable values between threads must go through main memory. The interaction between threads, main memory, and working memory is shown in Figure 12-2.
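The traffic described above can be sketched in Java: one thread's working-memory state is published to another thread only through main memory, with a volatile flag acting as the synchronization point. This is an illustrative sketch of my own, not from the book; the class, method, and field names are invented.

```java
// Sketch: two threads exchange a value only through main memory.
// The volatile flag 'ready' forces the writer's working-memory copies
// back to main memory and forces the reader to refresh its own copies,
// so the reader is guaranteed to observe 42.
public class MainMemoryHandoff {
    private static volatile boolean ready = false;
    private static int payload = 0; // safely published via the volatile flag

    static int handoff() throws InterruptedException {
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the writer's flag reaches main memory
            }
            seen[0] = payload; // guaranteed to see the write of 42
        });
        Thread writer = new Thread(() -> {
            payload = 42;  // ordinary write, published by the next line
            ready = true;  // volatile write: flushed to main memory
        });
        reader.start();
        writer.start();
        writer.join();
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handoff()); // prints 42
    }
}
```

The volatile write to `ready` happens-before the read that observes `true`, so the earlier plain write to `payload` is also visible; without the volatile modifier the reader's loop could spin on a stale working-memory copy indefinitely.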
The main memory and working memory here are not the same level of memory division as the Java heap, stack, method area, and other parts of the Java memory area described earlier; the two are essentially unrelated. If one must force a correspondence, then, judging from the definitions of variable, main memory, and working memory, main memory corresponds mainly to the object instance data in the Java heap, while working memory corresponds to parts of the virtual machine stack. At a lower level, main memory corresponds directly to physical hardware memory, and to obtain higher speed the virtual machine (and even optimizations in the hardware system itself) may keep working memory preferentially in registers and caches, because what a running program mainly reads and writes is working memory.
A note on copies:
Suppose a thread accesses a 10 MB object. Will all 10 MB of memory be copied?
In fact, no. The reference to the object, and the fields of the object that the thread actually accesses, may have copies, but no virtual machine implementation copies the entire object.
12.3.2 Inter-memory interaction operations
The Java memory model defines the following eight operations as the specific protocol for the interaction between main memory and working memory, i.e. the implementation details of how a variable is copied from main memory into working memory and how it is synchronized from working memory back to main memory. The virtual machine implementation must ensure that each of the operations mentioned below is atomic and indivisible (for variables of type double and long, the load, store, read, and write operations are allowed exceptions on some platforms).
- lock: acts on a variable in main memory; it marks a variable as exclusively owned by one thread.
- unlock: acts on a variable in main memory; it releases a variable that is in the locked state, so that the released variable can be locked by other threads.
- read: acts on a variable in main memory; it transfers the value of a variable from main memory into the thread's working memory, for use by a subsequent load operation.
- load: acts on a variable in working memory; it puts the value obtained by the read operation from main memory into the working-memory copy of the variable.
- use: acts on a variable in working memory; it passes the value of a variable in working memory to the execution engine. This operation is performed whenever the virtual machine encounters a bytecode instruction that needs to use the variable's value.
- assign: acts on a variable in working memory; it assigns a value received from the execution engine to a variable in working memory. This operation is performed whenever the virtual machine encounters a bytecode instruction that assigns a value to the variable.
- store: acts on a variable in working memory; it transfers the value of a variable in working memory to main memory, for use by a subsequent write operation.
- write: acts on a variable in main memory; it puts the value obtained by the store operation from working memory into the main-memory variable.
To copy a variable from main memory into working memory, the read and load operations must be performed in order; to synchronize a variable from working memory back to main memory, the store and write operations must be performed in order. Note that the Java memory model only requires that each pair be executed in order, not consecutively: other instructions may be inserted between read and load, or between store and write. For example, when accessing variables a and b in main memory, one possible order is read a, read b, load b, load a. In addition, the Java memory model stipulates that the following rules must be observed when performing the eight basic operations:
- One of read and load, or of store and write, is not allowed to appear alone; that is, a variable may not be read from main memory without being accepted by working memory, nor may a write-back be initiated from working memory without being accepted by main memory.
- A thread is not allowed to discard its most recent assign operation; that is, after a variable has been changed in working memory, the change must be synchronized back to main memory.
- A thread is not allowed to synchronize data from its working memory back to main memory for no reason (without any assign operation having occurred).
- A new variable can only be "born" in main memory; working memory is not allowed to use a variable that has not been initialized (by load or assign) directly. In other words, assign or load must have been performed on a variable before use or store is performed on it.
- A variable may be locked by only one thread at a time, but the same thread may repeat the lock operation multiple times; after locking multiple times, the variable is unlocked only after the same number of unlock operations have been performed.
- Performing a lock operation on a variable clears the value of that variable in working memory, and before the execution engine uses the variable, a load or assign operation must be re-executed to initialize its value.
- A variable that has not previously been locked by a lock operation may not have unlock performed on it, nor may a thread unlock a variable that is locked by another thread.
- Before performing an unlock operation on a variable, the variable must first be synchronized back to main memory (by performing the store and write operations).
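The reentrancy rule above (the same thread may lock a variable repeatedly, and must then unlock it the same number of times) is mirrored by Java's synchronized blocks, whose monitorenter/monitorexit instructions roughly correspond to the lock/unlock operations. A minimal sketch of my own, with invented names:

```java
// Sketch: Java's built-in monitors are reentrant, mirroring the rule
// that one thread may perform lock repeatedly on the same variable.
// If the inner synchronized block were not reentrant, this program
// would deadlock against itself instead of printing a result.
public class ReentrancyDemo {
    private static final Object MONITOR = new Object();

    static int outer() {
        synchronized (MONITOR) {   // first lock acquisition (count = 1)
            return inner() + 1;
        }                          // count returns to 0 only after both exits
    }

    static int inner() {
        synchronized (MONITOR) {   // reentrant acquisition (count = 2)
            return 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(outer()); // prints 2
    }
}
```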
These eight memory access operations, together with the rules defined above and some special provisions on volatile described later, fully determine which memory access operations in a Java program are safe under concurrency. Because this definition is rigorous but very cumbersome and troublesome to apply in practice, an equivalent principle will be introduced later, the happens-before principle, for determining whether an access in a concurrent environment is safe.
12.3.3 special rules for volatile variables
The keyword volatile may be called the lightest-weight synchronization mechanism the Java virtual machine provides.
When a variable is defined as volatile, it gains two properties. The first is visibility of the variable to all threads: when one thread modifies the variable's value, the new value is immediately known to other threads. Ordinary variables cannot do this; their values are passed between threads through main memory. For example, thread A modifies the value of an ordinary variable and then writes it back to main memory; only after A's write-back completes and another thread B reads from main memory does the new value become visible to thread B.
The visibility of volatile variables is often misunderstood by developers along these lines: "volatile variables are immediately visible to all threads, and all writes to a volatile variable are immediately reflected in other threads; in other words, volatile variables are consistent across all threads, so operations based on volatile variables are safe under concurrency." The premises of this argument are correct, but they do not lead to the conclusion that operations based on volatile variables are safe under concurrency. A volatile variable has no consistency problem across the working memories of threads (strictly, it can still be inconsistent in individual working memories, but because it is refreshed before each use, the execution engine never sees an inconsistency, so it can be regarded as consistent). However, operations in Java are not atomic, which makes operations on volatile variables just as unsafe under concurrency as on ordinary variables. We can illustrate why with a simple demonstration; see the example in Listing 12-1.
Code Listing 12-1 volatile variable self-increment test

```java
/**
 * volatile variable self-increment operation test
 *
 * @author mk
 */
public class VolatileTest {

    public static volatile int race = 0;

    public static void increase() {
        race++;
    }

    private static final int THREADS_COUNT = 20;

    public static void main(String[] args) {
        Thread[] threads = new Thread[THREADS_COUNT];
        for (int i = 0; i < THREADS_COUNT; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < 10000; i++) {
                        increase();
                    }
                }
            });
            threads[i].start();
        }

        // wait for all incrementing threads to finish
        while (Thread.activeCount() > 1)
            Thread.yield();

        System.out.println(race);
    }
}
```
This code starts 20 threads, each of which performs 10,000 increments on the race variable. If the code behaved correctly under concurrency, the final output should be 200000. But after running it, the reader will not get the expected result: each run of the program outputs a different number, always less than 200000. Why is that?
The problem lies in the increment operation "race++". Decompiling this code with javap produces Listing 12-2, which reveals that the one-line increase() method consists of four bytecode instructions in the Class file (the return instruction is not generated by race++ and can be ignored here). From the bytecode level it is easy to see why the concurrency fails: when the getstatic instruction pushes the value of race onto the top of the operand stack, the volatile keyword guarantees that the value of race is correct at that moment; but by the time the iconst_1 and iadd instructions execute, other threads may already have increased race, making the value on top of the operand stack stale, so the putstatic instruction may then synchronize a smaller race value back into main memory.
Code Listing 12-2 VolatileTest bytecode
```
public static void increase();
  flags: ACC_PUBLIC, ACC_STATIC
  Code:
    stack=2, locals=0, args_size=0
       0: getstatic     #13  // Field race:I
       3: iconst_1
       4: iadd
       5: putstatic     #13  // Field race:I
       8: return
```
Objectively speaking, using bytecode to analyze concurrency problems is still not rigorous, because even when something compiles to a single bytecode instruction, that does not mean executing the instruction is an atomic operation. When a bytecode instruction is interpreted, the interpreter runs many lines of code to implement its semantics; under compiled execution, one bytecode instruction may likewise be translated into several native machine-code instructions. Using the -XX:+PrintAssembly parameter to output the disassembly would make the analysis more rigorous, but considering the reader's convenience, and since the bytecode already exposes the problem, the bytecode is used for the analysis here.
Since volatile variables can only guarantee visibility, in operation scenarios that do not satisfy the following two rules we still have to guarantee atomicity with locking (synchronized or the atomic classes in java.util.concurrent):
- The result of the operation does not depend on the current value of the variable, or it can be ensured that only a single thread modifies the variable's value.
- The variable does not need to participate in invariant constraints together with other state variables.
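When neither rule holds, as with the race++ counter whose result depends on the current value, an atomic class restores correctness. The following is my own variant of Listing 12-1 (not from the book), replacing the volatile int with java.util.concurrent.atomic.AtomicInteger:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: the counter from Listing 12-1 rewritten with AtomicInteger.
// incrementAndGet() is a single atomic read-modify-write, so no updates
// are lost and the program always prints 200000.
public class AtomicTest {

    public static AtomicInteger race = new AtomicInteger(0);
    private static final int THREADS_COUNT = 20;

    public static void increase() {
        race.incrementAndGet();
    }

    static int run() throws InterruptedException {
        race.set(0);
        Thread[] threads = new Thread[THREADS_COUNT];
        for (int i = 0; i < THREADS_COUNT; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++) {
                    increase();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // wait for all incrementing threads to finish
        }
        return race.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // always prints 200000
    }
}
```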
Scenarios like the one in Listing 12-3 below are a good fit for controlling concurrency with volatile variables: when the shutdown() method is called, the doWork() method executing in every thread is guaranteed to stop immediately.
Code Listing 12-3 volatile usage scenarios
```java
volatile boolean shutdownRequested;

public void shutdown() {
    shutdownRequested = true;
}

public void doWork() {
    while (!shutdownRequested) {
        // do stuff
    }
}
```
The second semantic of volatile variables is prohibiting instruction reordering optimization. Ordinary variables only guarantee correct results at the points during a method's execution that depend on the result of an assignment; they do not guarantee that the order of variable assignments matches the order of execution in the program code. Because this is imperceptible during the execution of a single thread's methods, the Java memory model describes it as "within-thread as-if-serial semantics".
The description above is still not easy to grasp, so let us use an example to see why instruction reordering interferes with the concurrent execution of a program, as shown in the demonstration program in Listing 12-4.
Code Listing 12-4 Instruction reordering
```java
Map configOptions;
char[] configText;
// this variable must be defined as volatile
volatile boolean initialized = false;

// assume the following code executes in thread A
// simulate reading configuration information; when reading is complete,
// set initialized to true to notify other threads the configuration is ready
configOptions = new HashMap();
configText = readConfigFile(fileName);
processConfigOptions(configText, configOptions);
initialized = true;

// assume the following code executes in thread B
// wait until initialized is true, meaning thread A has finished
// initializing the configuration information
while (!initialized) {
    sleep();
}
// use the configuration information initialized by thread A
doSomethingWithConfig();
```
The program in Listing 12-4 is pseudocode describing a common scenario; when handling a configuration file there is generally no concurrency on the configuration data itself. If the initialized variable were defined without the volatile modifier, then because of instruction reordering optimization, the last statement of thread A, "initialized = true", might be executed ahead of time (although Java is used here as pseudocode, the reordering in question is a machine-level optimization, and early execution means the corresponding assembly code runs early), so the code in thread B that uses the configuration information could fail. The volatile keyword avoids this situation. (Note: the reordering-suppression semantics of volatile were only completely fixed in JDK 1.5; in earlier JDKs, declaring a variable volatile still could not completely avoid the reordering problem, mainly because code before and after a volatile variable could still be reordered. This is why DCL (double-checked locking) could not be used to implement the singleton pattern in Java before JDK 1.5.)
Instruction reordering is one of the most confusing aspects of concurrent programming for developers. In addition to the pseudocode example above, let me give an example of how the volatile keyword suppresses instruction reordering optimization. Listing 12-5 is a standard piece of DCL singleton code, from which we can observe the difference in the generated assembly code with and without the volatile keyword.
```java
public class Singleton {

    private volatile static Singleton instance;

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        Singleton.getInstance();
    }
}
```
After compilation, this code performs the assignment to the instance variable as shown in Listing 12-6.
```
0x01a3de0f: mov    $0x3375cdb0,%esi      ; {oop('Singleton')}
0x01a3de14: mov    %eax,0x150(%esi)
0x01a3de1a: shr    $0x9,%esi
0x01a3de1d: movb   $0x0,0x1104800(%esi)
0x01a3de24: lock addl $0x0,(%esp)        ; *putstatic instance
                                         ; - Singleton::getInstance@24
```
Comparing the two, we find that the key change with the volatile modifier is that, after the assignment (the preceding mov %eax,0x150(%esi) is the assignment operation), one extra operation is performed:
lock addl $0x0,(%esp)
This operation acts as a memory barrier (also called a memory fence): instructions after the barrier cannot be reordered to positions before it. When only one CPU accesses the memory, no memory barrier is needed; but when two or more CPUs access the same memory and one of them observes another, a memory barrier is required to ensure consistency. The instruction "addl $0x0,(%esp)" (adding 0 to the value at the stack pointer) is obviously a no-op; it is used instead of the actual no-op instruction nop because the IA-32 manual forbids combining the lock prefix with nop. The key is the lock prefix: consulting the IA-32 manual, its effect is to write this CPU's cache back to memory, and that write-back also invalidates the corresponding cache lines of other CPUs or cores. The operation is thus equivalent to performing, on the variables in the cache, the "store and write" operations described earlier in the Java memory model. So through this empty operation, the preceding modification of the volatile variable becomes immediately visible to other CPUs.
Why does it prohibit instruction reordering? From the hardware architecture's perspective, instruction reordering means the CPU allows multiple instructions to be dispatched to the corresponding circuit units out of the order specified by the program. But this does not mean instructions are rearranged arbitrarily: the CPU must handle instruction dependencies correctly to guarantee that the program produces the right results. Suppose instruction 1 adds 10 to the value at address A, instruction 2 multiplies the value at address A by 2, and instruction 3 subtracts 3 from the value at address B. Then instructions 1 and 2 are dependent and their order cannot be exchanged, since (A + 10) * 2 and A * 2 + 10 are obviously not equal, but instruction 3 can be moved before instructions 1 and 2 or in between them, as long as any operation that depends on the values of A and B obtains the correct values when it executes. So from within a single CPU, reordering still looks orderly. Consequently, the "lock addl $0x0,(%esp)" instruction, by synchronizing the modifications to memory, implies that all preceding operations have completed, creating the effect that "instruction reordering cannot cross the memory barrier".
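Java code normally gets these barriers implicitly from volatile, but since JDK 9 they can also be requested explicitly through java.lang.invoke.VarHandle. This aside is not covered by the original text; a minimal single-threaded sketch of the API:

```java
import java.lang.invoke.VarHandle;

// Sketch (JDK 9+): explicit fences corresponding to the barriers that
// volatile inserts implicitly. releaseFence() prevents the writes above
// it from being reordered below it; fullFence() is a two-way barrier,
// the Java-level analogue of the "lock addl $0x0,(%esp)" instruction.
public class FenceDemo {

    static int data = 0;
    static boolean ready = false;

    static int publishAndRead() {
        data = 42;
        VarHandle.releaseFence(); // the write to data cannot sink below here
        ready = true;
        VarHandle.fullFence();    // make both writes visible before reading
        return ready ? data : -1;
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints 42
    }
}
```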
Does volatile make our code faster than using other sync tools?
In some cases the performance of volatile's synchronization mechanism is indeed better than that of locks (the synchronized keyword or the locks in the java.util.concurrent package), but because the virtual machine applies many eliminations and optimizations to its lock implementation, it is hard to quantify how much faster volatile is than synchronized. Comparing volatile with itself, one principle can be established: the read performance of a volatile variable is almost the same as that of an ordinary variable, but writes may be slower, because many memory barrier instructions must be inserted into the native code to ensure the processor does not execute out of order. Even so, the total cost of volatile in most scenarios is still lower than that of locks, so the only basis for choosing between volatile and locking should be whether the semantics of volatile satisfy the needs of the usage scenario.
At the end of this section, let us look back at the special rules the Java memory model defines for volatile variables. Assuming T denotes a thread and V and W denote two volatile variables, the read, load, use, assign, store, and write operations must satisfy the following rules:
- Thread T may perform the use action on variable V only if T's previous action on V was load, and thread T may perform the load action on V only if T's next action on V is use. The use action of T on V can be considered as associated with T's load and read actions on V; the three must appear together. (This rule requires that in working memory, each use of V must first refresh the latest value from main memory, so as to be able to see the modifications other threads have made to V.)
- Thread T may perform the store action on variable V only if T's previous action on V was assign, and thread T may perform the assign action on V only if T's next action on V is store. The assign action of T on V can be considered as associated with T's store and write actions on V; the three must appear together. (This rule requires that in working memory, each modification of V must be immediately synchronized back to main memory, so that other threads can see the thread's own modifications to V.)
- Suppose action A is a use or assign action performed by thread T on variable V, action F is the load or store action associated with A, and action P is the read or write action on V corresponding to F; similarly, suppose action B is a use or assign action performed by T on variable W, action G is the load or store action associated with B, and action Q is the read or write action on W corresponding to G. If A precedes B, then P precedes Q. (This rule requires that volatile-modified variables not be optimized by instruction reordering, ensuring the code executes in the same order as the program.)
Jiankunking Source: http://blog.csdn.net/jiankunking
Deep understanding of the JVM reading note Five: Java memory model and volatile keyword