Non-blocking Solutions to Java Concurrency Problems

Source: Internet
Author: User

In a concurrent environment, an explicit locking mechanism (such as synchronized or ReentrantLock) is usually used to protect shared resources: it ensures that only one thread accesses the shared variables at any time, and that modifications to them are visible to threads that subsequently acquire the lock. Threads that cannot obtain the lock enter the blocked state and are suspended by the JVM and the operating system until they are scheduled again to retry the lock. This suspension and resumption of threads can incur heavy system overhead and long pauses.

Thread switching also causes context switches: the context of the current thread is saved and the context of the new thread is loaded, so a context switch is not free. In addition, the data required by the newly scheduled thread is unlikely to still be in the CPU cache, so a context switch causes a sharp increase in cache misses, and performance is relatively low when the new thread starts executing.

In highly concurrent, highly contended scenarios, the overhead of explicit locks can be very large and can seriously affect system performance. In some of these scenarios, using a non-blocking solution to the concurrency problem significantly improves system performance.

Volatile

A volatile variable is a lighter-weight synchronization mechanism than a lock, because it causes no context switching or thread scheduling. A volatile variable guarantees visibility: after it is modified by one thread, other threads will read the latest value. However, volatile has its own restriction: it does not provide atomicity. When the update of a variable depends on another variable or on its own current value (such as i++), volatile cannot guarantee a correct result.

Recently, a Memcached client needed to be wrapped in our project to implement hot standby and automatic failover. A variable currentCluster indicates whether the current cluster is the master or the slave cluster. Business requests use this variable to decide which cluster to access first, while a daemon thread keeps polling both clusters; as soon as a machine is found unavailable, it immediately changes the value of currentCluster to switch over. The currentCluster variable is therefore read and written by multiple threads and accessed frequently, but its modification does not depend on other variables or on itself. Only visibility needs to be guaranteed, which makes it a good fit for volatile:
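A minimal sketch of such a switch flag, assuming a hypothetical wrapper class (the class, enum, and method names below are illustrative, not the project's actual code):

```java
// Sketch of the failover flag described above. Only visibility is needed:
// each write is a plain assignment that depends on no other shared state.
class MemcachedFailover {
    enum ClusterType { MASTER, SLAVE }

    // Written by the polling daemon thread, read by business threads.
    // volatile guarantees that a switch becomes visible to readers immediately.
    private volatile ClusterType currentCluster = ClusterType.MASTER;

    public ClusterType getCurrentCluster() {
        return currentCluster;
    }

    // Called by the daemon thread when it detects the active cluster is down.
    public void switchTo(ClusterType cluster) {
        currentCluster = cluster;
    }
}
```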

 

Because main memory is much slower than the CPU, the CPU uses caches to reduce memory latency. The compiler and the CPU itself also optimize instructions and reorder their execution; as long as the final result is unchanged, this improves the CPU cache hit rate and the efficiency of the CPU pipeline. When data is immutable or confined to a single thread, CPU caching and instruction reordering are harmless. However, on a multi-core processor with concurrent access to shared mutable state, the data in different CPU caches may become inconsistent, and memory operations on shared variables may be reordered. These optimizations can make program behavior unpredictable and leave modifications to shared variables invisible to other threads. Adding the volatile keyword to a Java program effectively solves these problems.

At the processor level, a write to a shared variable modified by the volatile keyword triggers two things:

1. The data in the current processor's cache line is written back to system memory.

2. This write-back invalidates any data cached for the same memory address in other CPUs.

The JVM strengthens these semantics for the volatile keyword in Java: when the variable is accessed, memory barriers are inserted so that instructions before and after the access are not reordered across it. The volatile keyword in Java therefore guarantees visibility; after the shared variable is modified, other threads can immediately read the latest value.

When a volatile variable is modified, the corresponding data in the CPU caches is invalidated, and since the smallest unit of the CPU cache is the cache line, all data in the cache line containing the volatile variable becomes invalid. Pay attention to the "false sharing" problem here: if the volatile variable does not fill an entire cache line, padding should be inserted between adjacent frequently-modified variables; otherwise they end up in the same cache line and a large number of cache misses are generated. CPU details are not discussed further here; if you are interested, read the article "Optimization to CPU - Java and CPU Cache" published by Zhenhui in the April Rigel technology monthly.
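As a rough illustration, manual padding (before annotations like @Contended became available) looked something like the sketch below. The field names are illustrative; note also that some JVMs may eliminate unused fields, so real implementations sometimes reference the padding fields to prevent that.

```java
// Manual cache-line padding sketch: the seven extra longs (56 bytes) pad
// the 8-byte value toward its own 64-byte cache line, so an unrelated
// hot variable allocated next to it is less likely to share the line
// and cause false sharing.
class PaddedVolatileLong {
    public volatile long value;
    // padding; some JVMs may optimize away fields that are never read
    @SuppressWarnings("unused")
    private long p1, p2, p3, p4, p5, p6, p7;
}
```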

Atomic operations

The volatile keyword discussed in the previous section only guarantees visibility. If the modification of a variable depends on another variable or on its own current value, volatile cannot help. What is needed then is atomicity: the entire operation is indivisible and cannot be interrupted by other threads. Modern multi-core processors provide atomic instructions for this, such as CAS (compare-and-swap). The instruction has three operands: the memory address V to operate on, the expected original value oldValue, and the new value newValue to be written. When the CAS instruction performs an update, if the value at V equals oldValue, the value at V is atomically updated to newValue; if another thread has updated V after the current thread read oldValue, the current thread's CAS instruction fails and returns.

When multiple threads attempt to use CAS to update the same shared variable at the same time, one thread succeeds and the others fail. Because no lock is involved, the failing threads are not suspended or blocked; they are only told that the update failed and may retry or do something else. The following example is a non-blocking implementation of a counter: during the increment, CAS is attempted repeatedly until it succeeds. The atomic classes in the java.util.concurrent package use a similar mechanism.
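A minimal sketch of such a counter (the CasCounter name is illustrative; AtomicInteger already offers incrementAndGet, but the explicit loop makes the retry mechanism visible):

```java
import java.util.concurrent.atomic.AtomicInteger;

// A non-blocking counter: the CAS loop retries until the update succeeds,
// so no thread is ever suspended while contending for the count.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int get() {
        return value.get();
    }

    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            // compareAndSet is the CAS operation: it writes `next` only if
            // the value is still `current`; otherwise another thread got
            // there first and we retry with the fresh value.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }
}
```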

 

 

CAS instructions can be used to implement atomic operations on simple data types (such as AtomicInteger and AtomicLong in java.util.concurrent) as well as on complex data types. The key to implementing a non-blocking algorithm for a complex data type is to limit the scope of the atomic update to a single variable while maintaining data consistency. For example, in a stack, each element is a node (value, next) that points to exactly one other element. For the push method, a new node is created pointing to the current top of the stack, and java.util.concurrent.atomic.AtomicReference is used to swap the top reference: if the top element has not been modified by another thread, it is successfully replaced; otherwise, the top element is read again and the replacement is retried.
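A sketch of this non-blocking stack in the style of the Treiber stack (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber stack: the only atomically updated variable is the reference to
// the top node, so a single CAS on that reference keeps the whole stack
// consistent without locks.
class ConcurrentStack<E> {
    private static class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    public void push(E value) {
        Node<E> newHead = new Node<>(value);
        Node<E> oldHead;
        do {
            oldHead = top.get();
            newHead.next = oldHead;
            // Succeeds only if no other thread replaced the top meanwhile;
            // otherwise re-read the top and retry.
        } while (!top.compareAndSet(oldHead, newHead));
    }

    public E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = top.get();
            if (oldHead == null) {
                return null; // empty stack
            }
            newHead = oldHead.next;
        } while (!top.compareAndSet(oldHead, newHead));
        return oldHead.value;
    }
}
```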

 

Spin lock

When a program needs to keep several resources or variables consistent and the update cannot be limited to a single variable, an explicit locking mechanism must be used, such as the synchronized keyword or ReentrantLock. As described above, because blocking occurs, these explicit locks have high overhead, especially in high-concurrency scenarios. Here we introduce a non-blocking explicit locking mechanism: the spin lock.

The spin lock uses the AtomicBoolean class provided by the java.util.concurrent package to represent the lock state: false means no thread holds the lock, true means the lock is already held by some thread. When multiple threads call the lock() method at the same time and try to acquire the lock, only one can succeed; the others stay in the while (state.get()) {} loop. Only when the holding thread calls the unlock() method to release the lock, which sets state back to false, can another thread break out of the loop. Threads that fail to acquire the lock are never suspended, so there is no blocking-related overhead.
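A minimal spin lock along the lines described above (a test-and-test-and-set variant; the class name is illustrative):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A basic spin lock: state == true means the lock is held. Threads that
// fail to acquire it keep retrying instead of being suspended, so there
// is no blocking overhead.
class SpinLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    public void lock() {
        while (true) {
            // spin while the lock looks held (cheap cached reads)
            while (state.get()) { }
            // try to grab it; another spinner may win, in which case spin again
            if (!state.getAndSet(true)) {
                return;
            }
        }
    }

    public void unlock() {
        // releasing the lock lets exactly one spinner's getAndSet succeed
        state.set(false);
    }
}
```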

After a thread successfully acquires the lock, all the unsuccessful threads keep spinning, and under fierce contention this wastes CPU resources. A backoff mechanism can therefore be added to the lock to reduce CPU overhead.

When a thread fails to acquire the lock, it calls a backoff() method to sleep for a period of time. This prevents many threads from spinning at the same time, and because each thread wakes at a different moment, contention and CPU overhead are both reduced. The sleep duration of the backoff is critical: too short and the effect is negligible; too long and system throughput drops. Setting it based on the expected running time of the synchronized block is reasonable.

Summary

In practical application scenarios, non-blocking solutions to concurrency problems are needed to avoid blocking overhead. When the update can be limited to a single variable, use the volatile keyword or an atomic operation. If the consistency of multiple resources or variables must be guaranteed, consider a spin lock. However, in scenarios where the synchronized block runs for a long time or its running time varies widely, spin locks are unsuitable because excessive CPU overhead is hard to avoid; in those scenarios, you may as well use the synchronized keyword or ReentrantLock directly.

In fact, both the synchronized keyword and ReentrantLock implement spinning to some degree: at the start of contention a thread first tries to spin, returns directly if it gets the lock, and otherwise enters the blocked state. However, the spin duration there is not controllable. If the synchronized block is known to execute quickly (generally no I/O or complex computation), directly using a spin lock performs better. I will write a dedicated article later to discuss the internal implementation of synchronized and ReentrantLock in detail.

