Java Theory and Practice: Introduction to Non-Blocking Algorithms

Java™ 5.0 made it possible, for the first time, to develop non-blocking algorithms in the Java language, and the java.util.concurrent package makes extensive use of this capability. Non-blocking algorithms are concurrent algorithms that derive their thread safety not from locks but from low-level atomic hardware primitives such as compare-and-swap. Non-blocking algorithms can be extremely difficult to design and implement, but they can offer better throughput and greater resistance to liveness problems such as deadlock and priority inversion. In this installment of Java Theory and Practice, concurrency expert Brian Goetz demonstrates several simple non-blocking algorithms.

When more than one thread accesses a mutable variable, all threads must use synchronization, or very bad things can happen. The primary means of synchronization in Java is the synchronized keyword (also known as intrinsic locking), which enforces mutual exclusion and ensures that the actions of a thread executing a synchronized block are visible to other threads that later execute a synchronized block protected by the same lock. When used properly, intrinsic locking can make a program thread-safe, but locking can be a relatively heavyweight operation when it protects short code paths and threads contend for the lock frequently.

In the article "Going atomic," we looked at atomic variables, which provide atomic read-modify-write operations for safely updating shared variables without locks. Atomic variables have memory semantics similar to those of volatile variables, but because they can also be modified atomically, they can serve as the basis for concurrent algorithms that do not use locks.
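As a small illustration of an atomic read-modify-write (this sketch is not from the original article; the AtomicDemo class name is invented for the example), getAndAdd() returns the old value and installs the new one in a single atomic step:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    public static void main(String[] args) {
        AtomicInteger shared = new AtomicInteger(10);
        // getAndAdd() is an atomic read-modify-write: it returns the old
        // value and adds the delta with no window for interference.
        int old = shared.getAndAdd(5);
        System.out.println(old);          // 10
        System.out.println(shared.get()); // 15
    }
}
```

No lock is involved; the operation compiles down to the hardware primitives discussed below.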

Non-blocking counters

The Counter in Listing 1 is thread-safe, but the performance cost of the lock bothers some developers. The lock is needed, though, because although increment looks like a single operation, it is shorthand for three separate operations: fetch the value, add one to it, and write the value back. (Synchronization is also needed on the getValue method, to ensure that threads calling getValue see an up-to-date value. Ignoring the need for locking is not a good strategy, even though many developers manage to convince themselves that it is acceptable.)

When multiple threads request the same lock at the same time, one thread wins and acquires the lock, while the others block. JVMs implement blocking by suspending the blocked thread and rescheduling it later. The resulting context switches can cause a considerable delay relative to the few instructions protected by the lock.

Listing 1. Thread-safe counter using synchronization


public final class Counter {
    private long value = 0;

    public synchronized long getValue() {
        return value;
    }

    public synchronized long increment() {
        return ++value;
    }
}


NonblockingCounter in Listing 2 shows one of the simplest non-blocking algorithms: a counter that uses the compareAndSet() (CAS) method of AtomicInteger. The compareAndSet() method says, "Update this variable to this new value, but fail if some other thread has changed the value since I last looked." (See "Going atomic" for more on atomic variables and compare-and-set.)

Listing 2. Non-blocking counter using CAS


import java.util.concurrent.atomic.AtomicInteger;

public class NonblockingCounter {
    private AtomicInteger value = new AtomicInteger(0);

    public int getValue() {
        return value.get();
    }

    public int increment() {
        int v;
        do {
            v = value.get();
        } while (!value.compareAndSet(v, v + 1));
        return v + 1;
    }
}
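The success-or-fail contract of compareAndSet() can be seen directly on an AtomicInteger. This is a minimal sketch, not part of the original listing; the CasDemo class name is invented for the example:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasDemo {
    public static void main(String[] args) {
        AtomicInteger value = new AtomicInteger(5);
        // Succeeds: the current value matches the expected value 5.
        System.out.println(value.compareAndSet(5, 6)); // true
        // Fails: the value is now 6, so the expected value 5 is stale.
        System.out.println(value.compareAndSet(5, 7)); // false
        System.out.println(value.get());               // 6
    }
}
```

A failed CAS leaves the variable untouched, which is why the retry loop in Listing 2 simply rereads the value and tries again.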


Atomic variable classes are called atomic because they provide fine-grained atomic updates to numbers and object references, but they are also atomic in the sense of being basic building blocks for non-blocking algorithms. Non-blocking algorithms have been the subject of academic research for more than twenty years, but only with Java 5.0 did they become possible in the Java language.

Modern processors provide special instructions that atomically update shared data and can detect interference from other threads, and compareAndSet() uses these instead of locking. (If all you wanted to do was increment a counter, AtomicInteger offers incrementing methods, but they are built on compareAndSet(), just like NonblockingCounter.increment().)

The non-blocking version has several performance advantages over the lock-based one. First, it replaces the JVM's locking code path with a hardware primitive, synchronizing at a finer level of granularity (an individual memory location), and losing threads can retry immediately rather than being suspended and rescheduled. The finer granularity reduces the chance of contention, and the ability to retry without being rescheduled reduces the cost of contention. Even with a few failed CAS operations, this approach is still likely to be far faster than being rescheduled because of lock contention.

The NonblockingCounter example may be simple, but it illustrates a basic characteristic of all non-blocking algorithms: some algorithmic step is executed speculatively, with the knowledge that it may have to be redone if the CAS fails. Non-blocking algorithms are often called optimistic because they proceed on the assumption that there will be no interference; if interference is detected, they back off and try again. In the counter example, the speculative step is the increment: it fetches the old value and adds one to it, hoping that the value does not change while the update is being computed. If it does, it fetches the value again and redoes the increment.
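The retry loop from Listing 2 can be exercised with a small concurrency harness. This is a sketch, not part of the article; the CounterDemo class name and the thread and iteration counts are arbitrary. If increments could be lost, the final total would fall short of 40,000:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterDemo {
    // Same CAS retry loop as Listing 2, repeated here so the
    // example is self-contained.
    static class NonblockingCounter {
        private final AtomicInteger value = new AtomicInteger(0);

        public int increment() {
            int v;
            do {
                v = value.get();
            } while (!value.compareAndSet(v, v + 1));
            return v + 1;
        }

        public int getValue() {
            return value.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final NonblockingCounter counter = new NonblockingCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10000; j++)
                    counter.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads)
            t.join();
        // 4 threads x 10000 increments each: no updates lost.
        System.out.println(counter.getValue()); // 40000
    }
}
```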


Non-blocking Stack

A slightly more complicated example of a non-blocking algorithm is ConcurrentStack in Listing 3. The push() and pop() operations in ConcurrentStack are structurally similar to those in NonblockingCounter: do some work speculatively, and hope that the underlying assumptions have not been invalidated when the work is "committed." The push() method observes the current top node, constructs a new node to be put on the stack, and then, if the topmost node has not changed since the initial observation, installs the new node. If the CAS fails, it means another thread has modified the stack, and the process starts over.

Listing 3. Non-blocking stack using the Treiber Algorithm


import java.util.concurrent.atomic.AtomicReference;

public class ConcurrentStack<E> {
    AtomicReference<Node<E>> head = new AtomicReference<Node<E>>();

    public void push(E item) {
        Node<E> newHead = new Node<E>(item);
        Node<E> oldHead;
        do {
            oldHead = head.get();
            newHead.next = oldHead;
        } while (!head.compareAndSet(oldHead, newHead));
    }

    public E pop() {
        Node<E> oldHead;
        Node<E> newHead;
        do {
            oldHead = head.get();
            if (oldHead == null)
                return null;
            newHead = oldHead.next;
        } while (!head.compareAndSet(oldHead, newHead));
        return oldHead.item;
    }

    static class Node<E> {
        final E item;
        Node<E> next;

        public Node(E item) { this.item = item; }
    }
}
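A quick usage sketch (not part of the original listing; the StackDemo class name is invented, and the stack class from Listing 3 is repeated so the example is self-contained) shows the expected last-in, first-out behavior, with pop() returning null on an empty stack:

```java
import java.util.concurrent.atomic.AtomicReference;

public class StackDemo {
    static class ConcurrentStack<E> {
        final AtomicReference<Node<E>> head = new AtomicReference<Node<E>>();

        public void push(E item) {
            Node<E> newHead = new Node<E>(item);
            Node<E> oldHead;
            do {
                oldHead = head.get();
                newHead.next = oldHead;
            } while (!head.compareAndSet(oldHead, newHead));
        }

        public E pop() {
            Node<E> oldHead;
            Node<E> newHead;
            do {
                oldHead = head.get();
                if (oldHead == null)
                    return null;
                newHead = oldHead.next;
            } while (!head.compareAndSet(oldHead, newHead));
            return oldHead.item;
        }

        static class Node<E> {
            final E item;
            Node<E> next;
            Node(E item) { this.item = item; }
        }
    }

    public static void main(String[] args) {
        ConcurrentStack<String> stack = new ConcurrentStack<String>();
        stack.push("first");
        stack.push("second");
        System.out.println(stack.pop()); // second
        System.out.println(stack.pop()); // first
        System.out.println(stack.pop()); // null
    }
}
```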


Performance Considerations

Under light to moderate contention, non-blocking algorithms tend to outperform blocking ones, because most of the time the CAS succeeds on the first try, and the penalty for contention, when it does occur, involves no thread suspension or context switching, just a few more loop iterations. An uncontended CAS is cheaper than an uncontended lock acquisition (this must be true, because an uncontended lock acquisition involves a CAS plus additional processing), and a contended CAS involves a shorter delay than a contended lock acquisition.

Under high contention (when many threads are constantly pounding on a single memory location), lock-based algorithms start to offer better throughput than non-blocking ones, because when a thread blocks, it stops pounding and patiently waits its turn, avoiding further contention. However, contention levels this high are uncommon, since most of the time threads interleave thread-local computation with operations that contend for shared data, giving other threads a chance at the data. (Contention this high also suggests reexamining the algorithm and working toward less sharing of data.) The graphs in "Going atomic" were somewhat confusing in this regard, because the contention in the program being measured was unrealistically intense, making it appear that locking was a win even for small numbers of threads.

Non-blocking linked list

The examples so far (the counter and the stack) are very simple non-blocking algorithms and are easy to follow once the CAS-in-a-loop pattern is understood. For more sophisticated data structures, non-blocking algorithms are much more complicated than these simple examples, because modifying a linked list, tree, or hash table can involve updating more than one pointer. CAS supports an atomic conditional update of a single pointer, but not of two or more. So to build a non-blocking linked list, tree, or hash table, we need a way to update multiple pointers with CAS without leaving the data structure in an inconsistent state.

Inserting an element at the tail of a linked list typically involves updating two pointers: the "tail" pointer, which always refers to the last element in the list, and the "next" pointer of the previously last element, which must be made to refer to the newly inserted element. Because two pointers need updating, two CAS operations are needed, and updating two pointers in separate CASes raises two potential problems: what happens if the first CAS succeeds but the second fails, and what happens if another thread attempts to access the list between the first and second CAS?

The "trick" to building non-blocking algorithms for nontrivial data structures is to ensure that the data structure is always in a consistent state, even between the time a thread starts a modification and the time it finishes, and to ensure that other threads can tell not only whether the first thread has finished its update, but also what operations would be required to finish it if it has not. If a thread arrives to find the data structure in mid-update, it can "help" the thread performing the update by finishing the update for it, and then go about its own business. When the first thread gets around to trying to finish its own update, it will find the work no longer necessary and can simply return, because the CAS will detect the intervention of the helping thread (a constructive intervention, in this case).

This "help thy neighbor" requirement is what protects the data structure against the failure of individual threads. If a thread that found the data structure in mid-update simply waited for the updating thread to finish, it could wait forever if the other thread failed in the middle of its operation. Even in the absence of failure, this approach would perform poorly, because the newly arriving thread would have to give up the processor, incurring a context switch, or wait for its time slice to expire, which is worse.

The LinkedQueue in Listing 4 shows the insertion operation of the Michael-Scott non-blocking queue algorithm, which is the algorithm implemented by ConcurrentLinkedQueue:

Listing 4. Insertion in the Michael-Scott non-blocking queue algorithm


import java.util.concurrent.atomic.AtomicReference;

public class LinkedQueue<E> {
    private static class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next;

        Node(E item, Node<E> next) {
            this.item = item;
            this.next = new AtomicReference<Node<E>>(next);
        }
    }

    private AtomicReference<Node<E>> head
        = new AtomicReference<Node<E>>(new Node<E>(null, null));
    private AtomicReference<Node<E>> tail
        = new AtomicReference<Node<E>>(head.get());

    public boolean put(E item) {
        Node<E> newNode = new Node<E>(item, null);
        while (true) {
            Node<E> curTail = tail.get();
            Node<E> residue = curTail.next.get();
            if (curTail == tail.get()) {
                if (residue == null) /* A */ {
                    if (curTail.next.compareAndSet(null, newNode)) /* C */ {
                        tail.compareAndSet(curTail, newNode) /* D */;
                        return true;
                    }
                } else {
                    tail.compareAndSet(curTail, residue) /* B */;
                }
            }
        }
    }
}


Like many queue algorithms, an empty queue consists of a single dummy node. The head pointer always points to the dummy node, and the tail pointer always points to either the last node or the second-to-last node. Figure 1 shows a queue with two elements in the normal, quiescent state:

Figure 1. Queue with two elements in the quiescent state


As shown in Listing 4, inserting an element involves two pointer updates, both done with CAS: linking the new node from the current last node of the queue (C), and swinging the tail pointer around to the new last node (D). If the first step fails, the state of the queue is unchanged, and the inserting thread retries until it succeeds. Once that step succeeds, the insertion is considered to have taken effect, and other threads can see the modification. The tail pointer still needs to be swung to the new node, but that job can be viewed as "cleanup," because any thread that finds the queue in this state can tell that such cleanup is needed and knows how to do it.

The queue is always in one of two states: the normal, or quiescent, state (Figures 1 and 3) or the intermediate state (Figure 2). The queue is in the quiescent state before an insertion begins and after the second CAS (D) succeeds; it is in the intermediate state after the first CAS (C) succeeds. In the quiescent state, the next field of the node pointed to by the tail pointer is always null; in the intermediate state, it is non-null. So by comparing tail.next to null, any thread can determine the state of the queue, and this is the key to letting threads "finish" the operations of other threads.

Figure 2. Queue in the intermediate state, mid-insertion, after the new element is linked but before the tail pointer is updated


The insertion operation first checks whether the queue is in the intermediate state before attempting to insert a new element (A), as shown in Listing 4. If it is, some other thread must already be in the middle of inserting an element, between steps (C) and (D). Rather than wait for that thread to finish, the current thread can "help" it complete the operation by moving the tail pointer forward (B). It keeps checking the tail pointer, moving it forward if necessary, until the queue is in the quiescent state, at which point it can begin its own insertion.

The first CAS (C) could fail because two threads are contending for access to the current last element of the queue; in that case no modification takes place, and any thread that loses the CAS reloads the tail pointer and tries again. If the second CAS (D) fails, the inserting thread does not need to retry, because another thread has already completed the step for it in step (B)!

Figure 3. Queue in the quiescent state again, after the tail pointer has been updated
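To see that the insertion algorithm of Listing 4 loses no elements under concurrent puts, here is a self-contained sketch, not from the original article: the QueueDemo class name, the size() traversal helper, and the thread counts are all invented for the example, and tail is kept as its own AtomicReference, distinct from head, so that both initially refer to the same dummy node:

```java
import java.util.concurrent.atomic.AtomicReference;

public class QueueDemo {
    static class Node<E> {
        final E item;
        final AtomicReference<Node<E>> next;

        Node(E item, Node<E> next) {
            this.item = item;
            this.next = new AtomicReference<Node<E>>(next);
        }
    }

    static class LinkedQueue<E> {
        private final AtomicReference<Node<E>> head =
            new AtomicReference<Node<E>>(new Node<E>(null, null));
        private final AtomicReference<Node<E>> tail =
            new AtomicReference<Node<E>>(head.get());

        public boolean put(E item) {
            Node<E> newNode = new Node<E>(item, null);
            while (true) {
                Node<E> curTail = tail.get();
                Node<E> residue = curTail.next.get();
                if (curTail == tail.get()) {
                    if (residue == null) {                               // A: quiescent?
                        if (curTail.next.compareAndSet(null, newNode)) { // C: link new node
                            tail.compareAndSet(curTail, newNode);        // D: swing tail
                            return true;
                        }
                    } else {
                        tail.compareAndSet(curTail, residue);            // B: help finish
                    }
                }
            }
        }

        // Hypothetical helper (not in Listing 4): count elements by
        // walking the next chain from the dummy node.
        public int size() {
            int n = 0;
            for (Node<E> p = head.get().next.get(); p != null; p = p.next.get())
                n++;
            return n;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final LinkedQueue<Integer> queue = new LinkedQueue<Integer>();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++)
                    queue.put(j);
            });
            threads[i].start();
        }
        for (Thread t : threads)
            t.join();
        // 4 threads x 1000 puts each: every element is linked exactly once.
        System.out.println(queue.size()); // 4000
    }
}
```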


Non-Blocking Algorithms Behind the Scenes

If you dig into the JVM and the operating system, you will find non-blocking algorithms everywhere. The garbage collector uses them to accelerate concurrent and parallel garbage collection; the scheduler uses them to schedule threads and processes efficiently and to implement intrinsic locking. In Mustang (Java 6.0), the lock-based SynchronousQueue algorithm is being replaced with a new non-blocking version. Few developers use SynchronousQueue directly, but it is used as the work queue for thread pools constructed with the Executors.newCachedThreadPool() factory. Benchmark tests of cached thread pool performance suggest that the new non-blocking synchronous queue implementation offers close to three times the speed of the current one, and further improvements are planned for the release following Mustang (code-named Dolphin).
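For reference, here is the factory mentioned above in a minimal usage sketch (the PoolDemo class name and the task are invented for the example); the SynchronousQueue hand-off between submitters and worker threads happens entirely inside the pool:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolDemo {
    public static void main(String[] args) throws Exception {
        // newCachedThreadPool() uses a SynchronousQueue internally to
        // hand tasks directly from submitters to worker threads.
        ExecutorService pool = Executors.newCachedThreadPool();
        Future<Integer> result = pool.submit(() -> 6 * 7);
        System.out.println(result.get()); // 42
        pool.shutdown();
    }
}
```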

Conclusion

Non-blocking algorithms are considerably more complicated than lock-based ones. Developing them is a rather specialized discipline, and proving their correctness can be extremely difficult. But many of the improvements in concurrency performance across Java versions have come from the adoption of non-blocking algorithms, and as concurrency performance becomes ever more important, expect to see more non-blocking algorithms in future releases of the Java platform.