Reposted from Java concurrency guru Brian Goetz: http://www.ibm.com/developerworks/cn/java/j-jtp02244/ (Chinese version)
http://www.ibm.com/developerworks/java/library/j-jtp02244/index.html (English version)
What is the Java Memory Model, and how was it broken in the first place?
Summary: JSR 133, which has been active for nearly three years, has recently issued its public recommendation on how to fix the Java Memory Model (JMM). The original JMM had several serious flaws, which gave surprisingly difficult semantics to concepts that were supposed to be simple, such as volatile, final, and synchronized. In this installment of Java theory and practice, Brian Goetz shows how the semantics of volatile and final will be strengthened to fix the JMM. Some of these changes are already integrated in JDK 1.4, while others will be included in JDK 1.5.
The Java platform integrates threading and multiprocessing into the language to a much greater degree than most earlier programming languages. The language's support for platform-independent concurrency and multithreading was ambitious and pioneering, and, perhaps not surprisingly, the problem turned out to be somewhat harder than the Java architects originally thought. Much of the confusion surrounding synchronization and thread safety stems from the non-intuitive subtleties of the Java Memory Model (JMM), which was originally specified in Chapter 17 of the Java Language Specification and has been re-specified by JSR 133.
For example, not all multiprocessor systems exhibit cache coherency. If one processor has an updated value of a variable in its cache that has not yet been flushed to main memory, other processors may not see the update. In the absence of cache coherency, two different processors can see two different values for the same location in memory. This may sound alarming, but it is by design: it is a way to obtain higher performance and scalability. The cost is a greater burden on developers and compilers, who must produce code that accommodates these issues.
What is a memory model, and why do I need one?
A memory model describes the relationship between the variables in a program (instance fields, static fields, and array elements) and the low-level details of storing them to and retrieving them from memory in a real computer system. Objects are ultimately stored in memory, but the compiler, runtime, processor, or cache may take some liberties with the timing of moving values to or from a variable's assigned memory location. For example, a compiler may optimize a loop index variable by keeping it in a register, or the cache may delay flushing a new value of a variable to main memory until a more opportune time. All of these optimizations aim at higher performance and are normally transparent to the user, but on multiprocessor systems some of this complexity can occasionally show through.
The JMM allows the compiler and cache to take significant liberties with the order in which data moves between a processor-specific cache (or register) and main memory, unless the programmer has explicitly asked for certain visibility guarantees by using synchronized or volatile. This means that, in the absence of synchronization, memory operations can appear to happen in different orders from the perspective of different threads.
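A minimal sketch of asking for a visibility guarantee with volatile: a worker thread spins until another thread sets a flag. The class and method names (StopFlagDemo, runDemo) are hypothetical and chosen only for illustration; the sketch assumes a Java 8 or later runtime for the lambda syntax.

```java
// Illustrative sketch only; StopFlagDemo and runDemo are hypothetical names.
class StopFlagDemo {
    // volatile requests a visibility guarantee: a write by one thread
    // becomes visible to subsequent reads of the flag by other threads.
    private static volatile boolean stopRequested = false;

    /** Starts a spinning worker, requests a stop, and reports whether the worker exited. */
    static boolean runDemo() {
        stopRequested = false;
        Thread worker = new Thread(() -> {
            // Without volatile, the runtime would be permitted to read the
            // flag once and never notice the update made by the other thread.
            while (!stopRequested) {
                // spin
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if the demo fails
        worker.start();
        try {
            Thread.sleep(50);
            stopRequested = true;   // volatile write
            worker.join(5000);      // wait up to five seconds for the worker
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + runDemo());
    }
}
```

Removing the volatile modifier does not necessarily make the program hang on a given JVM, which is exactly why missing visibility guarantees are hard to detect by testing.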
By contrast, languages such as C and C++ do not have explicit memory models. C programs instead inherit the memory model of the processor that executes them (although the compiler for a given architecture probably knows something about the memory model of the underlying processor, and part of the responsibility for compliance falls to the compiler). This means that a concurrent C program may run correctly on one processor architecture but not on another. While the JMM may be confusing at first, it brings a significant benefit: a program that is correctly synchronized according to the JMM should run correctly on any platform that supports Java.
Shortcomings of the original JMM
Although the JMM specified in Chapter 17 of the Java Language Specification was an ambitious attempt to define a consistent, cross-platform memory model, it had some subtle but significant failings. The semantics of synchronized and volatile were so confusing that many knowledgeable developers sometimes chose to ignore the rules, because writing correctly synchronized code under the old memory model was so difficult.
The old JMM allowed some surprising and confusing things to happen, such as final fields appearing not to have the value that was set in the constructor (which made supposedly immutable objects not immutable) and unexpected results from memory operation reordering. It also prevented some otherwise effective compiler optimizations. If you have read any of the articles on the double-checked locking problem, you will recall how confusing memory operation reordering can be, and how subtle but serious problems can creep into your code when you do not synchronize correctly (or actively try to avoid synchronizing). Worse, many incorrectly synchronized programs appear to work well in some situations, such as under light load, on uniprocessor systems, or on processors with a stronger memory model than the JMM requires.
The term "reordering" is used to describe several classes of actual and apparent reordering of memory operations:
- The compiler is free to reorder certain instructions as an optimization when doing so does not change the semantics of the program.
- The processor is allowed to execute operations out of order in some circumstances.
- The cache is generally allowed to write variables back to main memory in a different order than the program wrote them.
From the perspective of another thread, any of these conditions can cause operations to appear to occur in a different order than the program specifies. Regardless of the source of the reordering, the memory model considers all of these conditions equivalent.
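The effect of reordering can be probed with a small litmus test, sketched below under stated assumptions. Two threads each write one unsynchronized variable and then read the other. If every operation happened in program order, at least one thread would read 1, so the outcome r1 == 0 and r2 == 0 can only arise from some form of reordering. The class name ReorderingLitmus is hypothetical, and whether the reordered outcome ever shows up depends on the JVM and hardware, so the sketch makes no claim about the counts it prints.

```java
// Illustrative sketch only; ReorderingLitmus is a hypothetical name.
class ReorderingLitmus {
    static int x, y;    // shared variables, deliberately not volatile
    static int r1, r2;  // value each thread observed

    /** Runs the test `trials` times; returns outcome counts indexed by r1 * 2 + r2. */
    static int[] run(int trials) {
        int[] counts = new int[4];
        for (int i = 0; i < trials; i++) {
            x = 0; y = 0; r1 = 0; r2 = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; });
            Thread b = new Thread(() -> { y = 1; r2 = x; });
            a.start();
            b.start();
            try {
                a.join(); // join() makes each thread's writes visible here
                b.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return counts;
            }
            counts[r1 * 2 + r2]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] c = run(10_000);
        // Index 0 is the outcome (r1 == 0, r2 == 0), which requires a reordering.
        System.out.println("(0,0)=" + c[0] + " (0,1)=" + c[1]
                + " (1,0)=" + c[2] + " (1,1)=" + c[3]);
    }
}
```

Declaring x and y volatile rules out the (0,0) outcome, because volatile reads and writes are totally ordered across threads.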
Goals of JSR 133
JSR 133, which was chartered to fix the JMM, has several goals:
- Preserve existing safety guarantees, including type safety.
- Provide out-of-thin-air safety. This means that variable values are not created "out of thin air": for a thread to observe that a variable has the value X, some thread must actually have written the value X to that variable in the past.
- The semantics of "correctly synchronized" programs should be as simple and intuitive as feasible. To that end, "correctly synchronized" should be defined both formally and intuitively (and the two definitions should be consistent with each other).
- Programmers should be able to create multithreaded programs with confidence. Of course, there is no magic that makes writing concurrent programs easy, but the goal is to relieve programmers of the burden of understanding all the subtleties of the memory model.
- High-performance JVM implementations should be possible across a wide range of popular hardware architectures. Modern processors differ greatly in their memory models, and the JMM should accommodate as many architectures as practical without sacrificing performance.
- Provide a synchronization idiom that allows an object to be published and made visible without synchronization. This is a new safety guarantee called initialization safety.
- There should be minimal impact on existing code.
It is worth noting that broken techniques such as double-checked locking are still broken under the new memory model, and that "fixing" double-checked locking was not one of the goals of the new memory model effort. (However, the new semantics of volatile allow one of the commonly proposed alternatives to double-checked locking to work correctly, although the technique is still discouraged.)
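For illustration, here is a sketch of lazy initialization with a volatile field, a form that is widely described as correct under the revised volatile semantics (JDK 5 and later). The article does not show which alternative it has in mind, so treat this as an assumption; the class LazyResource and its contents are hypothetical.

```java
// Illustrative sketch only; LazyResource is a hypothetical class.
class LazyResource {
    // The volatile modifier is essential: under the revised JMM, the volatile
    // write below cannot be reordered with the writes made by the constructor.
    private static volatile LazyResource instance;

    private final String name;

    private LazyResource(String name) {
        this.name = name;
    }

    static LazyResource getInstance() {
        LazyResource local = instance;          // first check, without locking
        if (local == null) {
            synchronized (LazyResource.class) {
                local = instance;               // second check, under the lock
                if (local == null) {
                    local = new LazyResource("config");
                    instance = local;           // volatile write publishes the object
                }
            }
        }
        return local;
    }

    String name() {
        return name;
    }
}
```

Without volatile on the instance field, this code is the broken idiom: another thread could see a non-null reference to an object whose fields have not yet been written.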
In the three years since the JSR 133 process became active, these issues have proved to be far more subtle than anyone had given them credit for. Such is the price of being a pioneer! The final formal semantics are more complicated than originally expected, and in fact took quite a different form than initially envisioned, but the informal semantics are clear and intuitive and will be outlined in Part 2 of this article.
Synchronization and visibility
Most programmers know that the synchronized keyword enforces a mutex (mutual exclusion) that prevents more than one thread at a time from entering a synchronized block protected by a given monitor. But synchronization has another aspect: it enforces certain memory visibility rules, as specified by the JMM. It ensures that the local processor cache is flushed to main memory when a synchronized block is exited (so that other threads can read the newest value from main memory) and that the local processor cache is invalidated when a synchronized block is entered (so that values must be read again from main memory). As a result, a value written by one thread during a synchronized block protected by a given monitor is visible to every other thread that executes a synchronized block protected by the same monitor. Synchronization also ensures that the compiler does not move instructions from inside a synchronized block to outside it (although in some cases it can move instructions from outside a synchronized block to inside). The JMM does not make this guarantee in the absence of synchronization, which is why synchronized (or its sibling, volatile) must be used whenever multiple threads access the same variables.
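The two roles of synchronized, mutual exclusion and visibility, can be seen in a small counter sketch. The class SyncCounter and its runDemo helper are hypothetical names used only for this illustration. Both the write and the read go through blocks guarded by the same monitor, which is what the visibility rule requires.

```java
// Illustrative sketch only; SyncCounter is a hypothetical class.
class SyncCounter {
    private final Object lock = new Object();
    private int count; // guarded by lock

    void increment() {
        synchronized (lock) {   // mutual exclusion: one thread at a time
            count++;
        }                       // exit: this thread's writes become visible
    }

    int get() {
        synchronized (lock) {   // entry on the same monitor: sees the latest writes
            return count;
        }
    }

    /** Increments one counter from several threads and returns the final value. */
    static int runDemo(int threads, int perThread) {
        SyncCounter counter = new SyncCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }
}
```

Calling SyncCounter.runDemo(4, 10000) returns 40000. If get() read the field without synchronizing on the same lock, the JMM would make no promise about which value it observes.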
Issue 1: Immutable objects were not immutable
One of the most striking failings of the JMM is that immutable objects, whose immutability was intended to be guaranteed through use of the final keyword, could appear to change their value. (Reminder: making every field final does not by itself make an object immutable. Every field must also be a primitive type, such as int or boolean, or a reference to an immutable object.) Immutable objects, such as String, are supposed to require no synchronization. However, because there can be a delay in propagating memory writes from one thread to another, a race condition is possible in which a thread first sees one value for an immutable object and then, some time later, sees a different value.
How can this happen? Consider the implementation of String in the Sun 1.4 JDK, which has essentially three important final fields: a reference to a character array, a length, and an offset into the character array at which the string begins. String was implemented this way, rather than with just a character array, so that a character array could be shared among multiple String and StringBuffer objects without copying the text into a new array every time a String is created. For example, String.substring() creates a new String that shares the same character array as the original and differs from it only in the length and offset fields.
Suppose that you execute the following code:

    String s1 = "/usr/tmp";
    String s2 = s1.substring(4);   // contains "/tmp"

The string s2 will have a length of 4 and an offset of 4, but it will share with s1 the same character array, the one containing "/usr/tmp". Before the String constructor runs, the Object constructor initializes all fields with their default values, including the final length and offset fields. When the String constructor runs, the length and offset are set to their desired values. Under the old memory model, however, in the absence of synchronization, another thread could temporarily see the offset field with its default value of 0 and only later see the correct value of 4. The effect is that the value of s2 appears to change from "/usr" to "/tmp". This is not what was intended, and it may not be possible on every JVM or platform, but the old memory model specification allowed it.
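The layout described above can be modeled with a small class. This is a sketch under stated assumptions, not the JDK's actual String source: the class SharedText and its field names are hypothetical and simply mirror the three final fields the article describes (character array, offset, length).

```java
// Illustrative model only; SharedText is hypothetical and is not java.lang.String.
final class SharedText {
    private final char[] value;  // backing array, shared between instances
    private final int offset;    // index in value where this text starts
    private final int count;     // number of characters in this text

    SharedText(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedText(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Returns a new SharedText that reuses the same array with a shifted offset. */
    SharedText substring(int begin) {
        if (begin < 0 || begin > count) {
            throw new IndexOutOfBoundsException("begin: " + begin);
        }
        return new SharedText(value, offset + begin, count - begin);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

For new SharedText("/usr/tmp").substring(4), the fields are offset 4 and count 4, and toString() yields "/tmp". A reader that saw count 4 but the default offset 0 would compute "/usr" from the same array, which is the transient value the old memory model permitted. Initialization safety for final fields under the revised JMM is what rules this out.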
Issue 2: Reordering volatile and nonvolatile stores
The other major area in which the existing JMM caused some very confusing results is memory operation reordering on volatile fields. The existing JMM says that volatile reads and writes go directly to main memory, which prohibits caching values in registers and bypasses processor-specific caches. This allows multiple threads to always see the most recent value of a given variable. It turned out, however, that this definition of volatile was not as useful as originally intended, and it led to significant confusion about what volatile actually means.
To provide good performance in the absence of synchronization, the compiler, runtime, and cache are generally allowed to reorder ordinary memory operations, as long as the currently executing thread cannot tell the difference. (This is referred to as within-thread as-if-serial semantics.) Volatile reads and writes, however, are totally ordered across threads: the compiler or cache cannot reorder volatile reads and writes with respect to each other. Unfortunately, the JMM did allow volatile reads and writes to be reordered with respect to reads and writes of ordinary variables, which means that a volatile flag cannot be used as an indication that other operations have completed. Consider the following code, in which the volatile field initialized is intended to indicate that initialization is complete.
Listing 1. Using a volatile field as a "guard" variable

    Map configOptions;
    char[] configText;
    volatile boolean initialized = false;

    . . .

    // In thread A
    configOptions = new HashMap();
    configText = readConfigFile(fileName);
    processConfigOptions(configText, configOptions);
    initialized = true;

    . . .

    // In thread B
    while (!initialized)
        sleep();
    // use configOptions

The idea here is to use the volatile variable initialized as a guard to indicate that a set of other operations has completed. It is a good idea, but it did not work under the old JMM, because the old JMM allowed nonvolatile writes (such as the write to the configOptions field and the writes to the fields of the Map referenced by configOptions) to be reordered with volatile writes. Another thread might therefore see initialized as true but not yet have a consistent or current view of the configOptions field or of the objects it references. The old semantics of volatile made promises only about the visibility of the variable being read or written, and no promises about other variables. Although this approach was easier to implement efficiently, it turned out to be less useful than originally thought.
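A self-contained variant of the guard pattern is sketched below. It assumes the revised volatile semantics of JDK 5 and later, under which the volatile write to the guard cannot be reordered with the ordinary writes that precede it. The class GuardedConfig, its runDemo method, and the single hard-coded "mode" entry are hypothetical stand-ins for the file-reading helpers in the listing.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; GuardedConfig and its contents are hypothetical.
class GuardedConfig {
    private static Map<String, String> configOptions;     // ordinary field
    private static volatile boolean initialized = false;  // guard variable

    /** Publishes a config map through the volatile guard; returns what the reader saw. */
    static String runDemo() {
        initialized = false;
        configOptions = null;
        String[] seen = new String[1];

        // Plays the role of thread B: wait for the guard, then use the config.
        Thread reader = new Thread(() -> {
            while (!initialized) {
                Thread.yield();
            }
            seen[0] = configOptions.get("mode");
        });
        reader.setDaemon(true); // do not keep the JVM alive if the demo fails
        reader.start();

        // Plays the role of thread A: ordinary writes first, volatile write last.
        Map<String, String> options = new HashMap<>();
        options.put("mode", "production");
        configOptions = options;
        initialized = true;

        try {
            reader.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```

Under the revised model, a reader that sees initialized as true is also guaranteed to see the populated map, so runDemo() returns "production". Under the old JMM, the same code carried no such guarantee.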
Conclusion
As specified in Chapter 17 of the Java Language Specification, the JMM has some serious flaws that allow unintuitive and undesirable things to happen to reasonable-looking programs. If it is too difficult to write concurrent classes correctly, then it is all but guaranteed that many concurrent classes will not work as expected, and that is a flaw in the platform. Fortunately, it was possible to create a memory model that is more consistent with most developers' intuition without breaking any code that was correctly synchronized under the old memory model, and the JSR 133 process has done just that. Next month, we will cover the details of the new memory model, much of which is already integrated into the 1.4 JDK.
Fixing the Java Memory Model, Part 1 (Brian Goetz)