Java Concurrency Basics

CPU Multilevel Cache: Cache Consistency

The simplest cache configuration looks like this: data reads and writes go through the cache; the CPU core and the cache are connected by a dedicated fast channel, while main memory and the cache are both attached to the system bus, which is also used for communication with other components:

Soon after caches appeared, systems grew more complex, and the speed gap between the cache and main memory widened again, until another level of cache was added. This new level is larger than the first-level cache but slower (making it as fast would be uneconomical), so it became the level-two cache, and some systems now even have a level-three cache. The design thus evolved into a multilevel cache:

Why a CPU cache is required:

The CPU clock frequency is far too fast for main memory to keep up, so within a processor clock cycle the CPU often has to wait for main memory, wasting resources. The cache exists to smooth over this speed mismatch between the CPU and memory (structure: CPU, cache, main memory).

Cache capacity is much smaller than main memory, so cache misses are inevitable: the cache cannot hold all the data the CPU needs. Does the cache, then, really serve a purpose?

The CPU cache certainly has a purpose; to see what that purpose is, consider the principle of locality:

1. Temporal locality: if a piece of data is accessed, it is likely to be accessed again in the near future
2. Spatial locality: if a piece of data is accessed, the data adjacent to it is likely to be accessed soon as well (see the sketch below)
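
To make spatial locality concrete, here is a minimal, hedged Java sketch (the class name and matrix size are illustrative, and actual timings vary by hardware): traversing a two-dimensional array row by row touches adjacent memory and therefore hits the cache far more often than traversing it column by column.

public class LocalityDemo {
    public static void main(String[] args) {
        int n = 2048;
        int[][] matrix = new int[n][n];
        long sum = 0;

        // Cache-friendly: each int[] row is contiguous in memory, so
        // consecutive iterations read adjacent elements (spatial locality)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                sum += matrix[i][j];

        // Cache-hostile: each iteration jumps to a different row, so
        // most accesses land on a different cache line
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                sum += matrix[i][j];

        System.out.println(sum);
    }
}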

Multilevel cache consistency is guaranteed by MESI, a protocol that keeps shared data consistent across multiple CPU caches. It defines four states for a cache line, and the four cache operations a CPU can perform may produce inconsistent states. Whenever the cache controller observes a local or remote operation, it therefore makes the appropriate change to the state of the affected cache line, ensuring that data stays consistent as it flows between caches. The four cache line states are as follows:

  • M (modified): the cache line is cached only in this CPU's cache and has been modified, so it is inconsistent with main memory. The data in the line must be written back to main memory at some future point (before any other CPU is allowed to read the corresponding contents of main memory). Once the data has been written back, the line's state becomes E (exclusive).
  • E (exclusive): the cache line is cached only in this CPU's cache, is unmodified, and is consistent with main memory. It changes to S (shared) whenever another CPU reads that memory, and to M (modified) when this CPU modifies the line's contents.
  • S (shared): this CPU and other CPUs hold the data in common, and it is consistent with main memory; that is, the cache line may be cached by multiple CPUs, with every copy matching main memory. When one CPU modifies the line, the copies in the other CPUs are invalidated and become I (invalid).
  • I (invalid): the cache line is invalid; another CPU may have modified it, so the data should be fetched from main memory. Other CPUs may or may not hold the data, and this CPU's copy is treated as inconsistent with main memory. For invalidation, the MESI protocol uses write-invalidate.


A cache line has four data states (MESI), and four CPU cache operations drive the state transitions:

    • Local read: this CPU reads the data in its own cache
    • Local write: this CPU writes data into its own cache
    • Remote read: another CPU reads the data from main memory into its cache
    • Remote write: another CPU writes the data back to main memory

Therefore, to fully understand the MESI protocol, we need to understand all 4 × 4 = 16 state transitions. The relationships between the states can be represented as follows:
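
The original transition diagram is not reproduced here; in its place, the following is a minimal Java sketch, not a real cache controller, of how a line's state might move under the four operations. It simplifies one case: a local read that misses from the Invalid state becomes Shared here, whereas a real controller would choose Exclusive when no other cache holds the line. All names are illustrative.

// Hedged sketch of MESI state transitions; not a real cache controller.
enum MesiState { MODIFIED, EXCLUSIVE, SHARED, INVALID }

enum CacheOp { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE }

class CacheLine {
    MesiState state = MesiState.INVALID;

    void on(CacheOp op) {
        switch (state) {
            case MODIFIED:
                // A remote read forces a write-back; the line is then shared
                if (op == CacheOp.REMOTE_READ) state = MesiState.SHARED;
                // A remote write invalidates our dirty copy
                if (op == CacheOp.REMOTE_WRITE) state = MesiState.INVALID;
                break;
            case EXCLUSIVE:
                if (op == CacheOp.LOCAL_WRITE) state = MesiState.MODIFIED;
                if (op == CacheOp.REMOTE_READ) state = MesiState.SHARED;
                if (op == CacheOp.REMOTE_WRITE) state = MesiState.INVALID;
                break;
            case SHARED:
                // A local write first broadcasts an invalidate to other caches
                if (op == CacheOp.LOCAL_WRITE) state = MesiState.MODIFIED;
                if (op == CacheOp.REMOTE_WRITE) state = MesiState.INVALID;
                break;
            case INVALID:
                // A local read must fetch the line from main memory (or a peer)
                if (op == CacheOp.LOCAL_READ) state = MesiState.SHARED;
                if (op == CacheOp.LOCAL_WRITE) state = MesiState.MODIFIED;
                break;
        }
    }
}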

In a typical multicore system, each core has its own cache, and the caches share the main memory bus. Each CPU issues read and write requests, and the cache exists to reduce the number of times the CPU reads from or writes to shared main memory. A cache can satisfy a CPU's read request in every state except Invalid.

A write request can be executed only if the cache line is in the M or E state. If the line is in the S state, the copies of the line in other caches must first be invalidated, which is usually done by broadcast. At that point, different CPUs are not allowed to modify the same cache line concurrently, not even data at different offsets within the line; the problem being solved here is precisely cache consistency. A cache line in the M state must always listen for any attempt to read the corresponding main-memory location; such a read must be deferred until this cache has written the line back to main memory and changed its state to S.

A cache line in the S state must likewise listen for requests from other caches to invalidate the line or to take exclusive access to it, and must mark the line invalid (Invalid) when it hears one.

A cache line in the E state must also listen for other caches reading the corresponding line from main memory; as soon as that happens, the line must change to the S state.

Therefore, the M and E states are always accurate: they agree with the line's actual ownership. The S state, by contrast, may be imprecise: if another cache discards its copy of a line that is in the S state, this cache may in fact have become the line's sole owner, yet it will not be promoted to the E state, because caches do not broadcast a notification when they discard a line. And because a cache does not track how many copies of a line exist elsewhere, it would have no way to confirm exclusive access even if such notifications were sent.

From the above, the E state is a speculative optimization: modifying a cache line in the S state requires a bus transaction to invalidate all other copies of the line, whereas modifying a line in the E state needs no bus transaction at all.

CPU Multilevel Cache: Out-of-Order Execution Optimization

What is out-of-order execution optimization?

    • An optimization in which the processor, in order to improve execution speed, executes code in an order that violates the original program order.

For example, suppose we have two variables a and b, where a is 10 and b is 200, and we want to compute a multiplied by b. In code we would write:

a = 10;
b = 200;
result = a * b;

However, after the CPU's out-of-order optimization, the execution order may become:

b = 200;
a = 10;
result = a * b;

As you can see, the CPU executes the optimized code without affecting the result of the computation, but that holds only in this particular case. In the single-core era, the processor guaranteed that its optimizations would not push execution results away from the intended ones. In a multi-core environment this no longer holds: multiple cores execute instructions at the same time, and each core's instructions may be reordered. On top of that, the processor introduces multi-level caches (L1, L2, and so on), and each core has its own cache, so data that has logically been written may not actually have been written to main memory yet. If we take no precautions, the processor's final result can differ substantially from the logical result of our code.

For example, suppose one core writes some data and then writes a flag to indicate that the data is ready, while another core checks the flag to decide whether the data can be used. There is a risk that the flag becomes visible first while the data is not yet complete, either because the computation has not finished or because the cache has not yet been flushed to main memory. The other core would then end up using wrong data. This is why we must explicitly ensure thread safety in multithreaded situations.
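
The flag-and-data hazard just described can be sketched directly in Java (a minimal illustration; the class and field names are invented, and any given run may or may not exhibit the reordering):

public class ReorderingHazard {
    static int data = 0;
    static boolean ready = false;      // deliberately NOT volatile

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            data = 42;                 // (1) prepare the data
            ready = true;              // (2) publish the flag
        });
        Thread reader = new Thread(() -> {
            while (!ready) { }         // may spin forever if the write is never seen
            System.out.println(data);  // may print 0 if (2) became visible before (1)
        });
        reader.start();
        writer.start();
    }
}

Declaring ready as volatile would forbid reordering writes (1) and (2) past each other and guarantee that the reader eventually sees the flag; that is precisely the kind of precaution discussed next.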

Java memory model

To ensure thread safety in a multi-core concurrent environment, where the CPU performs out-of-order execution optimizations, we need to take some extra measures to prevent thread-safety problems.

But before introducing the practical means of solving this problem, let's look at how the Java virtual machine addresses it: to hide the memory-access differences between the various kinds of hardware and operating systems, so that Java programs achieve consistent concurrent behavior on every platform, the Java Virtual Machine specification defines the Java Memory Model (JMM).

The Java memory model is a specification that defines how the Java virtual machine works with the computer's memory. It specifies how and when a thread can see the values of shared variables that other threads have modified, and how access to shared variables is synchronized when necessary.

Having defined what the Java memory model does, let's look at two concepts of memory allocation:

    • Heap: the heap in Java is a runtime data area, and it is managed by the garbage collection mechanism. The advantage of the heap is that memory can be allocated dynamically: the size and lifetime of data need not be told to the compiler in advance, because memory is allocated at run time and Java's garbage collector automatically reclaims data that is no longer used. The drawback is that, precisely because memory is allocated dynamically at run time, access is comparatively slow.
    • Stack: the advantage of the stack is that access is faster than the heap, second only to the registers in the computer, and stack data can be shared. The disadvantage is that the size and lifetime of data on the stack must be fixed in advance, which limits flexibility. The stack therefore mainly stores variables of the primitive types, such as int, short, long, byte, double, float, boolean, and char, as well as object handles (references).

The Java memory model requires that the call stack and local variables be stored on the thread stack, while objects are stored on the heap. A local variable may also be a reference to an object; in that case the reference itself is stored on the thread stack, but the object it refers to is stored on the heap.

An object may contain methods, and those methods may contain local variables; such local variables are still stored on the thread stack, even though the objects the methods belong to are stored on the heap. An object's member variables are stored on the heap together with the object, whether the member variable is of a primitive type or a reference type. Static member variables are stored on the heap along with the class definition.

An object stored on the heap can be accessed by any thread that holds a reference to it. When a thread can access an object, it can also access the object's member variables. If two threads invoke the same method on the same object at the same time, they both access the object's member variables, but each thread gets its own private copy of the method's local variables.
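
A short, hedged sketch of this point (all names invented): the member variable lives on the heap and is shared, while each thread gets its own copy of the local variable, which is why the unsynchronized member update below can lose increments.

public class StackVsHeap {
    private int hits = 0;                 // heap: one copy, shared by all threads

    void touch() {
        int local = 0;                    // thread stack: private per invocation
        local++;                          // always ends up 1; never contended
        hits++;                           // unsynchronized shared update: a data race
    }

    public static void main(String[] args) throws InterruptedException {
        StackVsHeap shared = new StackVsHeap();
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) shared.touch(); };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(shared.hits);  // often less than 200000
    }
}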

Hardware Memory Architecture

The modern hardware memory architecture differs somewhat from the Java memory model, and it is important to understand how the two work together. This section describes the common hardware memory architecture; the following sections describe how the Java memory model cooperates with it.

A simple illustration of the modern computer hardware memory architecture:

    • CPU: a modern computer typically contains two or more CPUs, some of which have multiple cores. On such a machine it is therefore possible to run multiple threads at the same time, and each CPU can run one thread at any given moment. This means that if your Java program is multithreaded, one thread per CPU may be executing simultaneously (concurrently).

    • CPU registers: each CPU contains a set of registers, which are essentially in-CPU memory. The CPU performs operations on registers much faster than on main memory, because accessing a register is much faster than accessing main memory.

    • CPU cache: because a computer's storage devices are several orders of magnitude slower than its processing units, modern computers insert a layer of high-speed cache that reads and writes at a speed close to the processor's and serves as a buffer between memory and the processor. This is the CPU's cache layer; in fact, the vast majority of modern CPUs have a cache of some size. Since the CPU accesses the cache faster than it accesses main memory, the data used in a computation can be copied into the cache so that the computation runs quickly, and when it finishes, the result is synchronized from the cache back to memory, so that the CPU need not wait for slow memory reads and writes. Accessing the cache is still usually slower than accessing the internal registers, however. Some CPUs have multiple cache levels, but that detail is not essential for understanding how the Java memory model interacts with memory; it is enough to know that the CPU has a cache layer.

    • Main memory: a computer also contains main memory (RAM). All CPUs can access main memory, which is usually much larger than the caches inside the CPUs.

How it works: typically, when a CPU needs to read main memory, it reads a portion of main memory into its cache. It may even read part of the cache contents into its internal registers and then perform operations there. When the CPU needs to write a result back to main memory, it flushes the value from the internal register to the cache, and at some point the value is flushed from the cache back to main memory.

When the CPU needs to store something new in the cache layer, contents already held in the cache are usually flushed back to main memory to make room. The cache can have some of its memory updated at one time and some of it flushed at another; it never has to read or write the entire cache at once. Typically the cache is updated in small memory blocks called cache lines: one or more cache lines may be read into the cache, and one or more cache lines may be flushed back to main memory.

Bridging the Java memory model and the hardware memory architecture

As mentioned above, the Java memory model and the hardware memory architecture differ. The hardware memory architecture does not distinguish between thread stacks and the heap: to the hardware, all thread stacks and heap data live in main memory, and parts of them may at times appear in the CPU caches and in the CPU's internal registers, as shown in the following:

Abstract relationship between threads and main memory

Java memory model abstract structure diagram:

Shared variables between threads are stored in main memory, and each thread has a private local memory. Local memory is an abstract concept of the Java memory model, not a real entity: it covers caches, write buffers, registers, and other hardware and compiler optimizations, and it holds the thread's copies of the shared variables that the thread reads or writes.

At a lower level, main memory corresponds to the hardware's memory, and to achieve better speed, the virtual machine and the hardware may keep working memory preferentially in registers and caches.

The working memory of threads in the Java memory model is thus an abstract description of the CPU's registers and caches. The JVM's static memory layout (the JVM memory model), by contrast, is merely a physical partitioning of memory, and it is confined to the JVM's own memory.

If thread A and thread B are to communicate, two steps are required:

    1. First, thread A flushes the updated shared variables in local memory A to main memory.
    2. Then thread B reads from main memory the shared variables that thread A updated, which completes the communication between the two threads.

This is why thread-safety problems arise in a multithreaded environment. Suppose, for example, that we want to maintain a count. Thread A reads the variable's value, 1, from main memory and saves it into local memory A in order to increment it. Thread B does not wait for thread A to write the incremented result back to main memory before reading; instead it reads the same value directly from main memory and saves it into local memory B to increment. The data in the two threads is mutually invisible, and when both threads write their computed results back to main memory, one update overwrites the other and the result is wrong. In such cases we need some means of synchronization to guarantee that the program produces correct results in a concurrent environment.
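
One possible remedy for this exact lost-update pattern is sketched below using java.util.concurrent.atomic.AtomicInteger (one option among several, alongside synchronized; class and field names are invented): the read-modify-write becomes a single atomic step, so neither thread's increment can be lost.

import java.util.concurrent.atomic.AtomicInteger;

public class SafeCounter {
    private final AtomicInteger count = new AtomicInteger(0);

    void increment() {
        count.incrementAndGet();           // atomic read-modify-write
    }

    public static void main(String[] args) throws InterruptedException {
        SafeCounter c = new SafeCounter();
        Runnable task = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(c.count.get()); // always 200000
    }
}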

The eight memory-interaction operations behind these synchronization means

    • lock: acts on a main-memory variable; marks the variable as exclusively owned by one thread
    • unlock: acts on a main-memory variable; releases a locked variable so that it can be locked by another thread
    • read: acts on a main-memory variable; transfers the variable's value from main memory into the thread's working memory for the subsequent load action to use
    • load: acts on a working-memory variable; puts the value obtained by the read operation into the working-memory copy of the variable
    • use: acts on a working-memory variable; passes the variable's value in working memory to the execution engine. This operation is performed whenever the virtual machine encounters a bytecode instruction that needs to use the variable's value.
    • assign: acts on a working-memory variable; assigns a value received from the execution engine to the working-memory variable. This operation is performed whenever the virtual machine encounters a bytecode instruction that assigns to the variable.
    • store: acts on a working-memory variable; transfers the variable's value in working memory to main memory for the subsequent write operation to use
    • write: acts on a main-memory variable; puts the value obtained by the store operation into the main-memory variable (see the walk-through after this list)
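
To tie the eight operations together, here is a purely conceptual, hedged walk-through of how an ordinary increment maps onto them. These operations are JMM abstractions, not real bytecodes, and sharedVar and lockObject are invented names.

public class EightOps {
    private static final Object lockObject = new Object();
    private static int sharedVar = 0;

    static void increment() {
        // Conceptually, sharedVar = sharedVar + 1 decomposes into:
        //   read   -> move sharedVar's value from main memory toward the thread
        //   load   -> put that value into the working-memory copy
        //   use    -> hand the copy's value to the execution engine for the add
        //   assign -> write the engine's result into the working-memory copy
        //   store  -> move the copy's value back toward main memory
        //   write  -> install it into sharedVar in main memory
        // lock/unlock bracket the sequence when it is synchronized:
        synchronized (lockObject) {   // lock: the variable becomes thread-exclusive
            sharedVar = sharedVar + 1;
        }                             // unlock: preceded by store + write to main memory
    }
}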

Synchronization rules

    • To copy a variable from main memory into working memory, the read and load operations must be performed in order; to synchronize a variable from working memory back to main memory, the store and write operations must be performed in order. The Java memory model only requires that these operations be executed in order, not that they be executed consecutively.
    • read and load, and store and write, must occur in pairs; none of these operations may appear alone.
    • A thread is not allowed to discard its most recent assign operation; that is, after a variable has changed in working memory, it must be synchronized back to main memory.
    • A thread is not allowed to synchronize a variable from working memory back to main memory for no reason (that is, when no assign has occurred).
    • A new variable can only be born in main memory; working memory may not directly use a variable that has not been initialized (by load or assign). In other words, a load or assign must be performed on a variable before use or store may be performed on it.
    • A variable may be locked by only one thread at a time, but the lock operation can be repeated multiple times by the same thread; after locking multiple times, the variable is unlocked only after the same number of unlock operations. lock and unlock must therefore appear in pairs (see the sketch after this list).
    • Performing lock on a variable empties that variable's value in working memory; before the execution engine uses the variable, a load or assign must be performed again to initialize its value.
    • A variable that has not been locked by a lock operation may not be unlocked, and a thread may not unlock a variable that is locked by another thread.
    • Before performing unlock on a variable, the variable must first be synchronized back to main memory (the store and write operations performed).
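
The pairing and reentrancy rules map directly onto Java's synchronized keyword. A small, hedged sketch (class and field names invented):

public class ReentrancyDemo {
    private final Object monitor = new Object();
    private int value = 0;

    void outer() {
        synchronized (monitor) {   // lock: count 0 -> 1
            inner();               // the same thread re-enters without blocking
        }                          // unlock: count 1 -> 0, monitor released;
                                   // per the last rule above, value is flushed
                                   // (store + write) to main memory first
    }

    void inner() {
        synchronized (monitor) {   // lock: count 1 -> 2 (reentrant)
            value++;
        }                          // unlock: count 2 -> 1, monitor still held
    }
}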

Synchronization operations and rules:

