Java Concurrency Programming-volatile

Source: Internet
Author: User
Tags: volatile

The previous article covered synchronized, which is easy to understand and was also my first exposure to thread synchronization back in university; for a long time I assumed it was the only way to synchronize a multithreaded program. In fact, Java provides another, more lightweight mechanism: volatile. Where synchronized ensures that only one thread can access the data at a time, volatile allows multiple threads to access the data concurrently, but guarantees that as soon as the data changes, the other threads "perceive" the change promptly.

1. CPU, main memory, and the cache

A computer's hardware can be abstracted as a bus, I/O devices, main memory, and one or more processors (CPUs). Data lives in main memory, and the CPU is responsible for executing instructions. The CPU is very fast: most simple instructions complete in a single clock cycle, while a read from main memory takes tens to hundreds of cycles, so reading and writing main memory directly would impose a large delay on the CPU. This latency gap is what gave rise to the concept of a cache.

In other words, while a program runs, the data it needs is copied from main memory into the CPU cache. The CPU then reads and writes the cached copy directly, and when the computation finishes, the result is written back from the cache to main memory. This greatly reduces the CPU's latency for accessing data. Roughly as follows:

Figure one can be taken as a single-core model. In this model, consider i++ as an example: when the program executes, the value of i is first fetched from main memory and copied into the cache, the CPU loads it from the cache and performs the +1, writes the result back to the cache, and finally flushes it back to main memory. On a single core this operation is perfectly fine. But ever since computers were created, two goals have been pursued: doing more at once, and computing faster. The result was a shift from single core to multicore, with tiered caches. Roughly as follows:

In figure two, i++ becomes a problem, because a multicore CPU runs threads in parallel: core 0 and core 1 can each copy i into their own cache at the same time and then compute independently. If i starts at 1, we expect 2 after both increments, but because each CPU computes on its own copy and writes back in some order, the final value of i in main memory may be 2, or it may be something else.
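The lost-update scenario above can be demonstrated in a few lines of Java. This is a minimal sketch (class and field names are chosen for illustration only): two threads each increment a shared, unsynchronized counter 10,000 times. Because i++ is a read-modify-write sequence, increments from one thread can overwrite increments from the other, so the final value is frequently less than 20,000.

```java
public class LostUpdateDemo {
    static int counter = 0; // shared, unsynchronized field

    static int run() throws InterruptedException {
        counter = 0;
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter++; // read, add 1, write back: three steps, not atomic
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        // Nondeterministic: usually prints something below 20000
        System.out.println("final counter = " + run() + " (expected 20000)");
    }
}
```

Note that marking the field volatile would not fix this particular example: volatile guarantees visibility, but i++ is still not atomic.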

This is a problem of the hardware memory architecture: the cache coherence problem. Core 1 changes the value of the variable i, but core 0 does not know and keeps its stale copy, ultimately computing with dirty data.

To solve this hardware problem, CPU manufacturers defined rules of their own, mainly the following two:

1) Bus lock. This is easy to understand from figure two: a variable is copied from main memory into a cache, written back after the computation completes, and all traffic between the caches and main memory goes over the bus. Since the dirty data arises from multiple CPUs operating on the variable at the same time, locking the bus so that only one CPU can operate on the variable at a time ensures that subsequent reads and writes by other CPUs see no dirty data. The drawback of the bus lock is equally obvious: it effectively turns multicore operation back into single-core operation, so it is inefficient.

2) Cache lock, i.e. a cache coherence protocol, chiefly MSI, MESI, MOSI, and so on. The core idea of these protocols: when a CPU writes a variable and finds that it is shared, that is, other CPUs also hold a copy, it signals those CPUs to invalidate their cache line for that variable. When another CPU later needs the variable and finds its cache line invalid, it re-reads the value from main memory.
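The invalidate-on-write idea can be sketched as a toy model. This is not a faithful protocol implementation (real MESI has four states and bus snooping); it is a simplified simulation, with hypothetical names, of the one behavior described above: when one "core" writes, every other core's copy of the line is marked invalid, forcing a re-read from "main memory" on the next access.

```java
import java.util.ArrayList;
import java.util.List;

public class CacheLineDemo {
    enum State { SHARED, MODIFIED, INVALID }

    static int mainMemory = 1;                      // the shared variable i
    static final List<CacheLineDemo> cores = new ArrayList<>();

    State state = State.INVALID;                    // this core's cache line state
    int cached;                                     // this core's cached copy

    CacheLineDemo() { cores.add(this); }

    int read() {
        if (state == State.INVALID) {               // cache miss: refetch from memory
            cached = mainMemory;
            state = State.SHARED;
        }
        return cached;
    }

    void write(int value) {
        for (CacheLineDemo other : cores) {         // signal other cores: invalidate
            if (other != this) other.state = State.INVALID;
        }
        cached = value;
        state = State.MODIFIED;
        mainMemory = value;                         // write-through, for simplicity
    }

    public static void main(String[] args) {
        CacheLineDemo core0 = new CacheLineDemo();
        CacheLineDemo core1 = new CacheLineDemo();
        core0.read();                               // core 0 caches i = 1
        core1.write(2);                             // core 1 writes i = 2, invalidating core 0
        System.out.println(core0.read());           // core 0 re-reads 2, not the stale 1
    }
}
```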

2. The Java memory model

The Java Virtual Machine specification attempts to define a Java memory model (JMM) that masks the memory-access differences between hardware platforms and operating systems, so that Java programs achieve consistent memory behavior on every platform. Before that, mainstream languages such as C and C++ used the memory model of the physical hardware and operating system directly (think of it as programming straight against the hardware's rules), so the same program could more or less produce different results on different platforms.

The main goal of the Java memory model is to define the access rules for variables in a program, namely the low-level details of storing variables into memory and reading them back out. It specifies that all variables are stored in main memory, that each thread has its own working memory, that a thread reads and writes variables through its working memory first and writes results back to main memory after performing its computation, and that no thread can access another thread's working memory. Roughly as follows:

Figure three can be understood as saying the same thing as figure two: working memory is an abstraction of the CPU caches and registers, and main memory is an abstraction of the physical main memory. Since the model in figure two has a cache coherence problem, the model in figure three has one too.

In addition, for better performance, the Java memory model does not restrict the execution engine from using processor registers or caches to speed up instruction execution, nor does it restrict the compiler from reordering instructions. In other words, the Java memory model also has an instruction reordering problem.

How does the Java language solve these two problems? Through the keyword volatile, which addresses both cache coherence and instruction reordering: it guarantees visibility and forbids reordering.
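The visibility guarantee can be seen with a classic flag experiment (a sketch; names are illustrative): a reader thread spins on a boolean that a writer later sets. With volatile, the JMM guarantees the reader observes the write promptly and the loop terminates; without volatile, the reader may legally spin forever on a stale cached value.

```java
public class VisibilityDemo {
    static volatile boolean stop = false;

    // Returns true if the reader thread terminated after the volatile write.
    static boolean runDemo() throws InterruptedException {
        stop = false;
        Thread reader = new Thread(() -> {
            while (!stop) {
                // spin until the writer's change becomes visible
            }
        });
        reader.start();
        Thread.sleep(50);      // let the reader start spinning
        stop = true;           // volatile write: guaranteed visible to the reader
        reader.join(2000);
        return !reader.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader terminated: " + runDemo());
    }
}
```

If the volatile modifier is removed, the JIT is free to hoist the read of stop out of the loop, and on many JVMs the reader thread never exits.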

3. The implementation behind volatile

So how does volatile ensure visibility and forbid reordering? Let's start by writing a singleton-pattern class and digging in.

public class Singleton {
    private static volatile Singleton instance;

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        Singleton.getInstance();
    }
}

First, let's look at the bytecode level to see what the JVM has done.

Figure Four

As figure four shows, there is nothing special about it. Since the bytecode level offers no clue, let's look at the assembly instructions the code is compiled into. They can be printed with -XX:+PrintAssembly; for how to set this up on Windows, refer to https://dropzone.nfshost.com/hsdis.xht. Unfortunately, although I successfully compiled hsdis-i386.dll (figure five) and placed it in several bin directories under JDK8, the JVM kept reporting that it could not find the DLL, so I decided to take a different approach.

Figure Five

The idea is to read the OpenJDK source code. As javap shows, a volatile field carries the flag ACC_VOLATILE at the bytecode level. Using this keyword we can locate accessFlags.hpp, which contains the following code:

bool is_volatile() const { return (_flags & JVM_ACC_VOLATILE) != 0; }

Searching for the keyword is_volatile leads to the following code in bytecodeInterpreter.cpp:

//
// Now store the result
//
int field_offset = cache->f2_as_index();
if (cache->is_volatile()) {
  if (tos_type == itos) {
    obj->release_int_field_put(field_offset, STACK_INT(-1));
  } else if (tos_type == atos) {
    VERIFY_OOP(STACK_OBJECT(-1));
    obj->release_obj_field_put(field_offset, STACK_OBJECT(-1));
    OrderAccess::release_store(&BYTE_MAP_BASE[(uintptr_t)obj >> CardTableModRefBS::card_shift], 0);
  } else if (tos_type == btos) {
    obj->release_byte_field_put(field_offset, STACK_INT(-1));
  } else if (tos_type == ltos) {
    obj->release_long_field_put(field_offset, STACK_LONG(-1));
  } else if (tos_type == ctos) {
    obj->release_char_field_put(field_offset, STACK_INT(-1));
  } else if (tos_type == stos) {
    obj->release_short_field_put(field_offset, STACK_INT(-1));
  } else if (tos_type == ftos) {
    obj->release_float_field_put(field_offset, STACK_FLOAT(-1));
  } else {
    obj->release_double_field_put(field_offset, STACK_DOUBLE(-1));
  }
  OrderAccess::storeload();
}

This code first checks tos_type and then dispatches to a type-specific put: int calls release_int_field_put, byte calls release_byte_field_put, and so on. Taking int as the example and searching for release_int_field_put, oop.hpp declares:

void release_int_field_put (int offset, jint contents);

The implementation is inlined in oop.inline.hpp, and looks like this:

inline void oopDesc::release_int_field_put(int offset, jint contents) { OrderAccess::release_store(int_field_addr(offset), contents); }

If you read the previous article on the Java object model, oop.hpp and oop.inline.hpp will look very familiar. Continuing with OrderAccess::release_store, the corresponding declaration is in orderAccess.hpp:

static void release_store(volatile jint* p, jint v);

This method, like many of the inlined ones, has a different implementation for each CPU/OS combination; the variants can be seen under the src/os_cpu directory. Taking orderAccess_linux_x86.inline.hpp as an example, it is implemented like this:

inline void OrderAccess::release_store(volatile jint* p, jint v) { *p = v; }

So the first step of Java's volatile store, at the JVM implementation level, is handed to a C++ volatile primitive. Then, as the bytecodeInterpreter.cpp excerpt showed, the store is followed by an OrderAccess::storeload() operation, which executes the following code (again in orderAccess_linux_x86.inline.hpp):

inline void OrderAccess::storeload() { fence(); }

The fence method is as follows:

inline void OrderAccess::fence() {
  if (os::is_MP()) {
    // always use locked addl since mfence is sometimes expensive
#ifdef AMD64
    __asm__ volatile ("lock; addl $0,0(%%rsp)" : : : "cc", "memory");
#else
    __asm__ volatile ("lock; addl $0,0(%%esp)" : : : "cc", "memory");
#endif
  }
}

This matches what -XX:+PrintAssembly would have shown behind the scenes: lock; addl. This is in fact a memory barrier; a detailed description can be found in the comments of orderAccess.hpp. The memory barrier provides three guarantees: it ensures that reordering moves no instruction that follows the barrier before it, and no instruction that precedes it after it; it forces modifications in the cache to be written to main memory immediately; and, for a write operation, it invalidates the corresponding cache line in other CPUs. How are these three guarantees applied? Look at the barrier-insertion strategy for volatile:

Insert a StoreStore barrier before each volatile write;

Insert a StoreLoad barrier after each volatile write;

Insert a LoadLoad barrier after each volatile read;

Insert a LoadStore barrier after each volatile read.

Here LoadLoad and LoadStore correspond to the method acquire, StoreStore corresponds to the method release, and StoreLoad corresponds to the method fence.
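A sketch of what those barriers buy you in practice is "safe publication" (names below are illustrative): the writer fills in a plain field and then performs a volatile store; the StoreStore barrier before the volatile write keeps the data store from being reordered after the flag store, so once a reader's volatile read (with its LoadLoad/LoadStore barriers) observes ready == true, it is guaranteed to observe data == 42 as well.

```java
public class SafePublication {
    static int data = 0;                   // plain, non-volatile field
    static volatile boolean ready = false;

    static int readWhenReady() throws InterruptedException {
        Thread writer = new Thread(() -> {
            data = 42;                     // 1) plain store
            ready = true;                  // 2) volatile store; cannot move above 1)
        });
        writer.start();
        while (!ready) {                   // volatile read: spin until published
        }
        writer.join();
        return data;                       // guaranteed to be 42 under the JMM
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(readWhenReady());
    }
}
```

If ready were not volatile, there would be no happens-before edge between the two threads, and the reader could legally see ready == true but data == 0.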

4. Application scenarios of volatile

4.1 Double-checked locking singleton

public class Singleton {
    private static volatile Singleton instance;

    private Singleton() { }

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}

Why it is written this way is covered extensively online, so I won't repeat it here.
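In brief (a sketch of the usual explanation, with a renamed class for self-containment): `instance = new Singleton()` is not atomic. It roughly decomposes into 1) allocate memory, 2) run the constructor, and 3) point the reference at the memory; without volatile, the JIT/CPU may reorder 2) and 3), letting another thread observe a non-null but not-yet-initialized instance. The volatile write forbids that reordering.

```java
public class DclSingleton {
    private static volatile DclSingleton instance;

    private DclSingleton() { }

    public static DclSingleton getInstance() {
        if (instance == null) {                    // first check, without the lock
            synchronized (DclSingleton.class) {
                if (instance == null) {            // second check, under the lock
                    // Without volatile, this store could become visible
                    // before the constructor has finished running.
                    instance = new DclSingleton();
                }
            }
        }
        return instance;
    }

    public static void main(String[] args) {
        System.out.println(DclSingleton.getInstance() == DclSingleton.getInstance());
    }
}
```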

4.2 java.util.concurrent

volatile is applied in a large number of the basic and utility classes under j.u.c, which together form the foundation of Java's concurrency package. Subsequent concurrent-programming study can follow this roadmap.
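For one concrete example from j.u.c: AtomicInteger combines a volatile int with compare-and-swap, so it fixes the lost-update problem from section 1 without a lock. This is a minimal sketch; incrementAndGet() retries the CAS until it succeeds.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterDemo {
    static final AtomicInteger counter = new AtomicInteger();

    static int run() throws InterruptedException {
        counter.set(0);
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) {
                counter.incrementAndGet();  // atomic read-modify-write via CAS
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        return counter.get();               // always exactly 20000
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```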

Resources:

https://github.com/lingjiango/ConcurrentProgramPractice

https://stackoverflow.com/questions/4885570/what-does-volatile-mean-in-java

https://stackoverflow.com/questions/106591/do-you-ever-use-the-volatile-keyword-in-java

https://www.cnblogs.com/zhangj95/p/5647051.html

http://download.oracle.com/otn-pub/jcp/memory_model-1.0-pfd-spec-oth-JSpec

https://www.cs.umd.edu/~pugh/java/memoryModel/

