Contents
- 1. Preface
- 2. Internal structure of the queue
- 2.1 Definition of the node
- 2.2 Why the node's next pointer needs atomic updates
- 2.3 Internal member variables of the queue
- 3. Building a lock-free concurrent queue based on the CAS algorithm and a singly linked list
- 3.1 Dequeue method
- 3.2 Enqueue method
- 4. Performance Testing
- 5. Summary
1. Preface
In the earlier posts in this series on building lock-free concurrent containers (stacks and queues), we built our own lock-free queue based on the CAS algorithm, backed by a doubly linked list without a sentinel node. A doubly linked list keeps a reference from each node to its predecessor, which is occasionally useful: when a thread parked on a ReentrantLock is woken up, for example, the prev pointer is used to find the predecessor node, and whether that predecessor is the head node determines whether the thread may acquire the lock. In most cases, however, we only need the queue to provide basic enqueue and dequeue operations, and a doubly-linked-list implementation needlessly complicates the problem. The extra pointer manipulation during enqueue and dequeue also hurts performance to some extent. The performance tests of different queues in the previous article showed that the lock-free queue based on a doubly linked list is unsatisfactory under concurrency: it is not only far behind ConcurrentLinkedQueue, which is likewise built on CAS, but even slightly behind LinkedBlockingQueue, which is built on exclusive locks. From the standpoint of building a more efficient queue, the doubly linked list is not the optimal choice, useful as it is in special cases; a better approach is to use a singly linked list. The following discusses how to implement a lock-free concurrent queue based on the CAS algorithm and a singly linked list. For the complete code, see GitHub: beautiful-concurrent.
2. Internal structure of the queue

2.1 Definition of the node
Before we start coding, let's think about what structure a queue based on a singly linked list needs. The first question is whether the list should have a sentinel node. A sentinel node helps eliminate certain boundary conditions and unifies the programming model. Although no sentinel was used in the doubly-linked-list implementation of the queue, here we choose to use one. The sentinel node and the normal nodes in the list share the same data structure, shown below:
```java
// Linked list node
private static class Node<E> {
    // Unsafe object for CAS operations
    private static final sun.misc.Unsafe UNSAFE;
    // Offset of the next pointer field within the Node object
    private static final long nextOffset;

    static {
        try {
            // Executed at class load time: reflectively obtain the Unsafe object,
            // so that we can CAS-update the next pointer inside Node objects
            Class<?> unsafeClass = sun.misc.Unsafe.class;
            Field f = unsafeClass.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (sun.misc.Unsafe) f.get(null);
            // UNSAFE = sun.misc.Unsafe.getUnsafe();
            Class<?> k = LockFreeSingleLinkedQueue.Node.class;
            nextOffset = UNSAFE.objectFieldOffset(k.getDeclaredField("next"));
        } catch (Exception e) {
            throw new Error(e);
        }
    }

    // CAS-update the next pointer: expect cmp, update to val
    private boolean casNext(Node<E> cmp, Node<E> val) {
        return UNSAFE.compareAndSwapObject(this, nextOffset, cmp, val);
    }

    // The element actually stored in the node
    public E item;
    // Pointer to the next node
    public volatile Node<E> next;

    public Node(E item) {
        this.item = item;
    }

    @Override
    public String toString() {
        return "Node{" + "item=" + item + '}';
    }
}
```
The node definition looks long, but it has only two essential members: item, the generic element that carries the node's actual payload, and next, the pointer to the following node. Everything else exists to ensure that the next pointer is updated atomically. So why does the pointer to the next node need atomic updates?
2.2 Why the node's next pointer needs atomic updates
When the queue was implemented with a doubly linked list, the next pointer defined in Node did not need atomic updates; this follows from its particular structure and insertion procedure. Recall its enqueue process:
- 1. Point the new node's prev pointer at the current tail node
- 2. CAS the tail pointer to the new node
- 3. Point the original tail node's next pointer at the new node
Step 1 establishes the node's order in the queue, and step 2 decides whether the current thread's insertion succeeds; once step 2 succeeds, the new node has been inserted. Notice the advantage of using the new node's prev pointer to establish order: each node fixes its position before joining the queue by pointing its own prev at the current tail node. Because every node carries its own prev pointer, there is no overwriting under concurrency even when the prev pointers of several candidate nodes all point at the same tail node; they do not affect one another. In other words, before step 2 actually executes, all nodes that have completed step 1 are equal and have an equal chance to win the CAS competition of step 2 and join the queue. As shown in the figure below:
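The three steps above can be sketched in code. This is a minimal, single-file illustration assuming a tail pointer held in an AtomicReference; the Node and DoublyLinkedQueue names are illustrative, not the classes from the earlier article:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of the three-step doubly-linked enqueue described above.
public class DoublyLinkedEnqueueSketch {
    static class Node<E> {
        final E item;
        Node<E> prev;          // written before the CAS (step 1)
        volatile Node<E> next; // written after the CAS (step 3)
        Node(E item) { this.item = item; }
    }

    static class DoublyLinkedQueue<E> {
        final Node<E> first = new Node<>(null); // initial dummy tail
        final AtomicReference<Node<E>> tail = new AtomicReference<>(first);

        void enqueue(E e) {
            Node<E> newNode = new Node<>(e);
            for (;;) {
                Node<E> t = tail.get();
                newNode.prev = t;                     // step 1: fix order via prev
                if (tail.compareAndSet(t, newNode)) { // step 2: CAS decides the winner
                    t.next = newNode;                 // step 3: plain write, safe here
                    return;
                }
                // CAS lost: retry, re-reading the tail
            }
        }
    }

    public static void main(String[] args) {
        DoublyLinkedQueue<Integer> q = new DoublyLinkedQueue<>();
        q.enqueue(1);
        q.enqueue(2);
        // walk back from the tail via prev pointers
        Node<Integer> t = q.tail.get();
        System.out.println(t.item + "," + t.prev.item); // prints 2,1
    }
}
```

Note that the prev write in step 1 is private to the enqueueing thread until step 2 publishes the node, which is exactly why no atomicity is needed for it.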
Although the figure shows thread 2 winning the CAS and enqueueing its node, the situation is symmetric if thread 1 or thread 3 wins instead. What does enqueueing look like with a singly linked list? It is comparatively concise, with only two steps:
- 1. Point the original tail node's next pointer at the new node
- 2. CAS the tail pointer to the new node
Step 1 establishes the new node's order in the queue. Unlike the doubly linked list, which achieves this through the new node's own prev pointer, the singly linked list does it by modifying the original tail node's next pointer. Under concurrency this leads to an overwrite problem: a later thread executing step 1 overwrites the result of an earlier thread's step 1. For a new node to join the queue successfully, its thread must not only win the CAS competition but also ensure that no other thread overwrites its step 1, which is hard to guarantee under concurrency. For example:
Before thread 1 wins the CAS, thread 2 changes the tail node's next from thread 1's node to thread 2's node; even if thread 1 then performs its CAS successfully, the structure of the queue has already been corrupted. The remedy is to check, before modifying the tail node's next pointer, whether it already points at another node, so the modification of the next pointer must itself be atomic. The easiest way to achieve this is to use AtomicReference. But as pointed out in part 3 of this series on building lock-free concurrent containers (stacks and queues), although an atomic reference can logically be treated as a special kind of reference, it is in essence an object. Nodes are created in large numbers as elements are enqueued, so their internal AtomicReference objects would also be created in large numbers, which inevitably hurts performance under high concurrency. Do we really need to allocate an object just to update a member variable atomically? Not really. For all atomic variables, the CAS algorithm is ultimately implemented by the native methods of the sun.misc.Unsafe class. Although Unsafe was designed not to be used directly by application developers, reflection lets us easily bypass that restriction. In the Node class definition, a static initializer obtains the Unsafe object via reflection when the class is loaded and computes the offset of the next field within a Node object, which enables atomic updates of a node's next pointer. This logic lives in the casNext method, as follows:
```java
private boolean casNext(Node<E> cmp, Node<E> val) {
    return UNSAFE.compareAndSwapObject(this, nextOffset, cmp, val);
}
```
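As an aside, the JDK also offers a supported API that achieves the same goal as the reflective Unsafe trick: AtomicReferenceFieldUpdater, one shared updater object per class rather than one AtomicReference per node. A minimal sketch (the class names here are illustrative, not the article's code):

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// One static updater per class atomically updates the volatile next field of
// every Node instance, with no per-node wrapper object allocated.
public class NodeUpdaterSketch {
    static class Node<E> {
        E item;
        volatile Node<E> next; // must be volatile for the updater to accept it

        @SuppressWarnings("rawtypes")
        static final AtomicReferenceFieldUpdater<Node, Node> NEXT_UPDATER =
                AtomicReferenceFieldUpdater.newUpdater(Node.class, Node.class, "next");

        Node(E item) { this.item = item; }

        boolean casNext(Node<E> cmp, Node<E> val) {
            return NEXT_UPDATER.compareAndSet(this, cmp, val);
        }
    }

    public static void main(String[] args) {
        Node<String> a = new Node<>("a");
        Node<String> b = new Node<>("b");
        System.out.println(a.casNext(null, b)); // true: next was null
        System.out.println(a.casNext(null, b)); // false: next is already b
    }
}
```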
2.3 Internal member variables of the queue
The member variables of the queue are defined as follows:
```java
// Sentinel node carrying no data
Node<E> sentinel = new Node<>(null);
// Head pointer, pointing at the sentinel node
Node<E> head = sentinel;
// Tail pointer, an atomic reference
AtomicReference<Node<E>> tail = new AtomicReference<>(sentinel);
```
The sentinel node referenced by sentinel is initialized when the queue is created. The head pointer head points at the sentinel node from start to finish; since it is never modified, it does not need to be declared volatile. The tail pointer uses AtomicReference to guarantee atomic modification.
3. Building a lock-free concurrent queue based on the CAS algorithm and a singly linked list

3.1 Dequeue method
```java
/**
 * Removes the first element from the queue and returns it;
 * returns null if the queue is empty.
 *
 * @return the removed element, or null
 */
@Override
public E dequeue() {
    for (; ; ) {
        // First element node of the queue (the node after the sentinel)
        Node<E> headed = head.next;
        // Tail node of the queue
        Node<E> tailed = tail.get();
        if (headed == null) {
            // The queue is empty: return null
            return null;
        } else if (headed == tailed) {
            // The queue contains exactly one element:
            // CAS the sentinel's next pointer to null
            if (sentinel.casNext(headed, null)) {
                // CAS the tail pointer back to the sentinel; failure is fine
                tail.compareAndSet(tailed, sentinel);
                return headed.item;
            }
        } else if (sentinel.casNext(headed, headed.next)) {
            // The queue contains more than one element
            return headed.item;
        }
    }
}
```
The dequeue logic is relatively simple; only three cases need to be considered:
- 1. The queue is empty: return null directly.
- 2. The queue contains more than 1 element node (excluding the sentinel): only the sentinel's next pointer needs to be modified, to point at the node after the original first element node.
- 3. The queue contains exactly 1 element node (excluding the sentinel): both the tail pointer and the sentinel's next pointer must be modified (the head pointer head always points at the sentinel, and the first element node is the one referenced by the sentinel's next pointer).
Case 3 takes two steps: first CAS the sentinel's next pointer to null, and if that succeeds, CAS the tail pointer to point back at the sentinel. The second step may fail, or the thread may lose the CPU before executing it, leaving the queue in an inconsistent intermediate state. For example, consider the following scenario:
Considering dequeues alone, a dequeue normally only modifies the sentinel's next pointer; the tail pointer needs updating only when the queue contains exactly one element node. Even if that update is never executed or fails, then as shown in the figure, as long as the sentinel's next pointer is null the queue is already known to be empty, and subsequent dequeues will return null. So as far as dequeueing is concerned, whether the tail pointer update succeeds has no effect on the result. As for enqueueing, it can correctly detect that the queue is in this intermediate state and repair it before a new node is enqueued.
3.2 Enqueue method
```java
/**
 * Appends the element to the end of the queue.
 *
 * @param e element to be enqueued
 * @return true: enqueued successfully; false: enqueue failed
 */
@Override
public boolean enqueue(E e) {
    Node<E> newNode = new Node<>(e);
    // Spin until the node has been enqueued
    for (; ; ) {
        // Current tail node
        Node<E> tailed = tail.get();
        // Next node of the current tail node
        Node<E> tailedNext = tailed.next;
        // Check whether the queue is in an intermediate state caused by a dequeue
        if (sentinel.next == null && tailed != sentinel) {
            // CAS the tail pointer back to the sentinel; failure is fine
            tail.compareAndSet(tailed, sentinel);
        } else if (tailed == tail.get()) {
            // The tail pointer has not changed, i.e. no other thread has
            // finished inserting a node into the queue
            if (tailedNext != null) {
                // Another thread is mid-enqueue and the queue is in an intermediate
                // state: complete the tail pointer update on its behalf
                tail.compareAndSet(tailed, tailedNext);
            } else if (tailed.casNext(null, newNode)) {
                // No other thread is enqueueing: CAS the tail node's next pointer
                // to the new node, then CAS the tail pointer; failure is fine
                tail.compareAndSet(tailed, newNode);
                return true;
            }
        }
    }
}
```
Before formally performing the enqueue, the method "records" the current state of the queue: it reads the current tail node tailed and the tail node's next node tailedNext (possibly null). This is routine for CAS algorithms; after all, a CAS update succeeds only on the premise that no other thread has modified the state in the meantime. But there is a second purpose: it lets the thread discover whether the queue is in an enqueue-induced intermediate state, where some thread has already pointed the tail node's next at its new node but has not yet managed to update the tail pointer. When the current thread finds the queue in such a state, it can perform the tail-pointer CAS on the other thread's behalf. Before doing so, however, it must first make sure the queue is not in the intermediate state caused by a dequeue, in which the queue is empty but the tail pointer does not point at the sentinel. That is the purpose of
if (sentinel.next == null && tailed != sentinel)
which repairs the queue's state if necessary. After that, the condition
if (tailed == tail.get())
checks whether another thread has changed the tail pointer in the meantime; if so, the for loop is simply re-executed. Otherwise the queue must be in one of the following two situations:
- 1. The recorded tail node's next node (tailedNext) exists
This means some thread is mid-enqueue: it has finished the first step but has not yet had time to update the tail pointer, so we help it finish the remaining work. It is fine if this step fails: the other thread may have completed it itself, or some other helpful thread may have beaten us to it.
- 2. The recorded tail node's next node (tailedNext) does not exist
In this case we CAS the tail node's next pointer to the new node; if the CAS loses, some other thread was quicker and we re-run the for loop. If it wins, we then CAS the tail pointer, which may of course fail, but that does not matter: I help everyone, and everyone helps me, so some other thread is sure to finish the work for us.
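Putting the pieces together, the whole queue can be condensed into one self-contained sketch. As a deviation from the article's code, this sketch uses AtomicReferenceFieldUpdater instead of the reflective sun.misc.Unsafe for casNext so that it compiles on any JDK; the enqueue and dequeue logic follows the methods shown above, and the class name is illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Compact, self-contained version of the lock-free singly-linked queue.
public class LockFreeSingleLinkedQueueSketch<E> {
    static class Node<E> {
        final E item;
        volatile Node<E> next;

        @SuppressWarnings("rawtypes")
        static final AtomicReferenceFieldUpdater<Node, Node> NEXT =
                AtomicReferenceFieldUpdater.newUpdater(Node.class, Node.class, "next");

        Node(E item) { this.item = item; }

        boolean casNext(Node<E> cmp, Node<E> val) {
            return NEXT.compareAndSet(this, cmp, val);
        }
    }

    final Node<E> sentinel = new Node<>(null); // sentinel carrying no data
    final Node<E> head = sentinel;             // head always points at the sentinel
    final AtomicReference<Node<E>> tail = new AtomicReference<>(sentinel);

    public boolean enqueue(E e) {
        Node<E> newNode = new Node<>(e);
        for (;;) {
            Node<E> tailed = tail.get();
            Node<E> tailedNext = tailed.next;
            if (sentinel.next == null && tailed != sentinel) {
                // intermediate state left by a dequeue: repair the tail pointer
                tail.compareAndSet(tailed, sentinel);
            } else if (tailed == tail.get()) {
                if (tailedNext != null) {
                    // another thread is mid-enqueue: finish its tail update
                    tail.compareAndSet(tailed, tailedNext);
                } else if (tailed.casNext(null, newNode)) {
                    tail.compareAndSet(tailed, newNode); // failure is fine
                    return true;
                }
            }
        }
    }

    public E dequeue() {
        for (;;) {
            Node<E> headed = head.next;
            Node<E> tailed = tail.get();
            if (headed == null) {
                return null; // queue is empty
            } else if (headed == tailed) {
                // exactly one element: unlink it, then try to reset the tail
                if (sentinel.casNext(headed, null)) {
                    tail.compareAndSet(tailed, sentinel); // failure is fine
                    return headed.item;
                }
            } else if (sentinel.casNext(headed, headed.next)) {
                return headed.item; // more than one element
            }
        }
    }

    public static void main(String[] args) {
        LockFreeSingleLinkedQueueSketch<Integer> q = new LockFreeSingleLinkedQueueSketch<>();
        q.enqueue(1);
        q.enqueue(2);
        System.out.println(q.dequeue()); // 1
        System.out.println(q.dequeue()); // 2
        System.out.println(q.dequeue()); // null
    }
}
```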
4. Performance Testing
To amplify the differences in the results, this time we start 200 threads, each performing a mix of 10,000 enqueue and dequeue operations, and repeat the whole process 100 times to obtain the average execution time (in milliseconds). For the complete test code, see GitHub: beautiful-concurrent. The test results are as shown in the figure.
The results show that LockFreeSingleLinkedQueue, the lock-free queue based on a singly linked list, performs well, coming in second place. Although there is still a clear gap to ConcurrentLinkedQueue, which is also built on CAS, it is much better than LinkedBlockingQueue, which achieves thread safety with locks, and of course also better than the lock-free queue based on a doubly linked list.
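For reference, the general shape of such a benchmark can be sketched as follows, scaled down so it runs quickly. The thread and operation counts are placeholders, and ConcurrentLinkedQueue stands in as the queue under test; the real harness and the 200-thread numbers are in the beautiful-concurrent repository.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

// Sketch of a mixed enqueue/dequeue timing harness.
public class QueueBenchmarkSketch {
    static long time(Queue<Integer> queue, int threads, int opsPerThread)
            throws InterruptedException {
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await(); // wait so all threads begin together
                    // mix enqueue and dequeue operations
                    for (int j = 0; j < opsPerThread; j++) {
                        queue.offer(j);
                        queue.poll();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            }).start();
        }
        long begin = System.nanoTime();
        start.countDown(); // release all threads at once
        done.await();      // wait until every thread has finished
        return (System.nanoTime() - begin) / 1_000_000; // elapsed milliseconds
    }

    public static void main(String[] args) throws InterruptedException {
        long millis = time(new ConcurrentLinkedQueue<>(), 8, 10_000);
        System.out.println("elapsed(ms)=" + millis);
    }
}
```

A single timed run like this is noisy; the article's approach of repeating the run many times and averaging is what makes the comparison meaningful.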
5. Summary
The CAS algorithm narrows the scope of synchronization between threads down to a single variable, maximizing the degree of parallelism in program execution; to a large extent this scalability is the CAS algorithm's greatest advantage. But it also has a limitation: when modifying a data structure involves multiple variables, CAS cannot synchronize them all at once, so a thread may leave the data structure in some intermediate state, which in turn may prevent other threads from making progress. In such cases, cooperation between threads can itself be built on CAS: when a thread discovers that the data structure is in an intermediate state (meaning another thread is modifying it but may have lost the CPU before finishing), it can repair that state by performing the unfinished operation on the other thread's behalf, instead of spinning uselessly and burning CPU. This cooperation clearly improves performance, and the enqueue method above is a typical example of CAS-based cooperation. It relies on the nature of the CAS operation itself: it succeeds when the current value matches the expectation, and a failed attempt has no side effects. With this understood, we can make better use of the CAS algorithm to build efficient lock-free data structures.