Parallel Programming in .NET - 2. Implementation and Analysis of ConcurrentStack


The previous article, Parallel Programming in .NET - 1. Basic Knowledge, covered the fundamentals needed for multicore and parallel programming in .NET. Today, building on that background, we analyze the implementation of a relatively simple and commonly used lock-free concurrent data structure in the .NET class library: ConcurrentStack.

First, let's explain what "lock-free" means here.

"Lock-free" here means an ordinary stack implementation that relies on atomic operations instead of locks. In principle, an atomic operation works by having the CPU assert a signal on the system bus; when another thread accesses the same block of memory, its CPU sees the signal and waits for it to be released before touching that memory. Atomic operations are exposed through operating system APIs and backed by hardware instructions. The common ones are atomic increment, atomic decrement, and compare-exchange; ConcurrentStack is implemented with the compare-exchange (CAS) operation.
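To make the CAS idea concrete before we look at the stack itself, here is a minimal sketch (my own hypothetical example, not from the ConcurrentStack source) of a lock-free counter built on the same Interlocked.CompareExchange primitive: read a snapshot, compute the new value, and install it only if nothing changed in between, retrying otherwise.

```csharp
using System;
using System.Threading;

class CasCounter
{
    private int _value;

    public int Value => Volatile.Read(ref _value);

    // The CAS retry loop: snapshot, compute, attempt the swap; retry on a race.
    public void Increment()
    {
        while (true)
        {
            int snapshot = _value;  // take a snapshot of the current value
            // Install snapshot + 1 only if _value still equals snapshot;
            // CompareExchange returns the value that was there before the attempt.
            if (Interlocked.CompareExchange(ref _value, snapshot + 1, snapshot) == snapshot)
                return;             // nobody raced us: done
            // Another thread changed _value in between: loop and retry.
        }
    }
}

class Program
{
    static void Main()
    {
        var counter = new CasCounter();
        var threads = new Thread[4];
        for (int t = 0; t < threads.Length; t++)
        {
            threads[t] = new Thread(() =>
            {
                for (int i = 0; i < 100_000; i++) counter.Increment();
            });
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
        Console.WriteLine(counter.Value);  // 400000: no increments are lost
    }
}
```

Without the CAS loop (a plain `_value++`), some increments would be lost under contention; with it, all 400,000 survive.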

Benefits of using atomic operations:

First, because no locks are used, deadlocks cannot occur.

Second, atomic operations do not block other threads. If a thread is suspended (or a context switch occurs) in the middle of its operation, other threads can continue working; with a lock, the suspended thread never releases the lock, so every other thread that tries to acquire it blocks.

Third, because atomic operations are backed directly by hardware instructions, they generally perform better than ordinary locks.

Disadvantages of using atomic operations:

First, a failed atomic operation is typically retried with back-off, so livelock and thread starvation are easy to produce; techniques such as random back-off can mitigate these problems but not eliminate them.

Second, lock-free code is hard to develop and use correctly, and harder still to test.

Now, let's get started.

Because the ConcurrentStack code in .NET is fairly long, this article does not reproduce all of it; I analyze only the parts I consider important. The complete source can be viewed on Microsoft's reference source site:

http://referencesource.microsoft.com/#mscorlib/system/collections/concurrent/concurrentstack.cs

A traditional stack is usually implemented with a singly linked list (Stack&lt;T&gt; in .NET actually uses an array). A push replaces the head node with the new node; a pop points the head at the next node. So when a large number of threads access the stack concurrently, the race condition is on the head node: if we can make operations on the head node safe, the entire stack is safe.
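To ground the head-node mechanics described above, here is a plain, deliberately *not* thread-safe linked-list stack (a hypothetical sketch, not the BCL source): Push replaces the head, Pop advances it. The lock-free version we analyze next does exactly these two pointer moves, but with CAS.

```csharp
using System;

// A plain (not thread-safe) linked-list stack: Push replaces the head,
// Pop advances the head to the next node.
class SimpleStack<T>
{
    private sealed class Node
    {
        public readonly T Value;
        public Node Next;
        public Node(T value) { Value = value; }
    }

    private Node _head;

    public void Push(T item)
    {
        var node = new Node(item);
        node.Next = _head;   // the new node points at the old head
        _head = node;        // the head now points at the new node
    }

    public bool TryPop(out T result)
    {
        if (_head == null) { result = default; return false; }
        result = _head.Value;
        _head = _head.Next;  // the head moves to the next node
        return true;
    }
}

class Program
{
    static void Main()
    {
        var s = new SimpleStack<int>();
        s.Push(1); s.Push(2); s.Push(3);
        s.TryPop(out int top);
        Console.WriteLine(top); // 3: last in, first out
    }
}
```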

The push operation: public void Push(T item)
```csharp
public void Push(T item)
{
    Node newNode = new Node(item);
    newNode.m_next = m_head;
    if (Interlocked.CompareExchange(ref m_head, newNode, newNode.m_next) == newNode.m_next)
    {
        return;
    }

    // If we failed, go to the slow path and loop around until we succeed.
    PushCore(newNode, newNode);
}

private void PushCore(Node head, Node tail)
{
    SpinWait spin = new SpinWait();

    // Keep trying to CAS the existing head with the new node until we succeed.
    do
    {
        spin.SpinOnce();
        // Reread the head and link our new node.
        tail.m_next = m_head;
    }
    while (Interlocked.CompareExchange(ref m_head, head, tail.m_next) != tail.m_next);
}
```

(As the comments in the original source show, the stack uses the compare-exchange (CAS) atomic operation.)

The push operation proceeds in three steps:

A. To push, a new node is allocated, and newNode.m_next = m_head saves a snapshot of the head node as it exists at that moment. By the time we try to install the new node, another thread may already have changed the node m_head points to.

Note that the head-node field (m_head) is declared with the volatile keyword. volatile has two effects: it forbids the compiler and the CPU from reordering accesses to the field, and it forces every read of the field to reload the current value from memory rather than use a stale value in the CPU cache. m_head needs volatile for two reasons: first, a thread running on another core may change m_head, and without volatile our core's cached copy might not be refreshed in time; second, it prevents the statement newNode.m_next = m_head from being moved after the if statement, whether by the compiler or by the CPU's out-of-order execution, which would break the logic.

B. Compare the current head node with the snapshot we saved in newNode.m_next. If they are the same, install the new node as the head; if the comparison fails, fall through to step C.

Interlocked.CompareExchange(ref m_head, newNode, newNode.m_next) is logically equivalent to the following code, except that it executes atomically:

```csharp
// Non-atomic equivalent of Interlocked.CompareExchange(ref m_head, newNode, newNode.m_next)
Node original = m_head;
if (m_head == newNode.m_next)
{
    m_head = newNode;   // install the new node as the head
}
return original;        // the caller compares this against newNode.m_next
```

C. If step B fails, we enter PushCore(Node head, Node tail), which simply repeats the Interlocked.CompareExchange of step B in a loop until the new node is successfully installed. Inside the loop, spin.SpinOnce() is called to prevent livelock: a thread whose exchange failed backs off before retrying. It is like two people meeting in a corridor: you step left, he steps left; you step right, he steps right. To avoid colliding forever, one person stands still for a moment and then continues; if they keep mirroring each other, the pause grows, say 5 seconds, then 10 seconds, then long enough to go have a cup of tea. This idea is, in essence, random back-off. Of course, the real SpinOnce implementation is not as simple as that; I will explain in detail how SpinOnce works when we get to the synchronization primitive SpinWait. For now, understand it as letting the thread rest for a while.

(While analyzing SpinOnce, a small detail turned up: the difference between Thread.Sleep(0) and Thread.Sleep(1), which I find many people do not know. Thread.Sleep(0) hands over the current thread's time slice, letting a thread of the same or higher priority run, or the current thread continue; that is, if no thread of equal or higher priority is waiting, the current thread keeps running, which can cause thread starvation. Thread.Sleep(1) puts the thread to sleep for 1 millisecond regardless of other threads' priorities; in practice it does not sleep for exactly 1 millisecond but for around 13 milliseconds or longer, depending on the system clock interval.)
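The escalating back-off described above can be observed directly through SpinWait's public API: early SpinOnce calls busy-spin in user mode, and once NextSpinWillYield becomes true, subsequent calls start yielding the time slice instead of burning CPU. A small sketch (the exact spin count before yielding is implementation- and machine-dependent, so treat the number printed as illustrative):

```csharp
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        var spin = new SpinWait();
        int pureSpins = 0;

        // Spin until SpinWait decides the next call would yield the slice.
        while (!spin.NextSpinWillYield)
        {
            spin.SpinOnce();
            pureSpins++;
        }
        Console.WriteLine($"Spun {pureSpins} times before yielding; Count = {spin.Count}");

        // Thread.Sleep(0): give up the slice only to ready threads of equal
        // or higher priority (can starve lower-priority threads).
        Thread.Sleep(0);

        // Thread.Sleep(1): always sleep, for at least one system timer interval.
        Thread.Sleep(1);
    }
}
```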

The pop operation: public bool TryPop(out T result)

The pop operation also uses CAS: it repeatedly tries to point the head at the next node and return the old head's value, looping on failure. The loop likewise uses random back-off to reduce the probability of livelock, and the yield time grows with the number of retries. The pop API is designed as TryPop() with a bool return value because when several threads pop concurrently, another thread may empty the stack first, so a pop can fail.
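A sketch of that CAS-based pop, using hypothetical types of my own rather than the BCL source: read the head, then try to swing the head field to head.Next; if another thread moved the head first, the CAS fails and we back off and retry.

```csharp
using System;
using System.Threading;

class Node
{
    public readonly int Value;
    public Node Next;
    public Node(int value) { Value = value; }
}

class Program
{
    static Node _head;

    static bool TryPop(out int result)
    {
        var spin = new SpinWait();
        while (true)
        {
            Node head = _head;
            if (head == null) { result = 0; return false; }  // stack is empty: fail
            // Replace the head with its successor only if the head is unchanged.
            if (Interlocked.CompareExchange(ref _head, head.Next, head) == head)
            {
                result = head.Value;
                return true;
            }
            spin.SpinOnce();  // lost the race: back off, then retry
        }
    }

    static void Main()
    {
        // Build the list head -> 3 -> 2 -> 1, as if 1, 2, 3 had been pushed.
        _head = new Node(3) { Next = new Node(2) { Next = new Node(1) } };
        TryPop(out int a);
        TryPop(out int b);
        TryPop(out int c);
        Console.WriteLine($"{a} {b} {c}");   // 3 2 1: LIFO order
        Console.WriteLine(TryPop(out _));    // False: the stack is now empty
    }
}
```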

The bulk push operation: public void PushRange(T[] items, int startIndex, int count)

Bulk push is built on the same idea as a single push: the items are first linked into a local chain, and then a single CAS points the head node at that chain, so one bulk call is more efficient than repeated single calls.

A digression on a small coding detail: the bulk-push method first calls ValidatePushPopRangeInput(items, startIndex, count) to verify the arguments it was passed. This follows what I call the "operating room principle": by the time the surgeon enters the operating room, the gloves are on and everything has been disinfected, and all that remains is to concentrate on the operation. Extracting validation into its own method not only keeps the code clean but can also reduce branch mispredictions in the main path, so in day-to-day coding it is worth pulling heavy parameter checks out into a separate method.
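A quick usage example with the real ConcurrentStack&lt;T&gt; API: PushRange pushes a slice of the array as if each element had been pushed in turn, so the last element of the range ends up on top, but the head is swapped with a single CAS.

```csharp
using System;
using System.Collections.Concurrent;

class Program
{
    static void Main()
    {
        var stack = new ConcurrentStack<int>();
        int[] items = { 1, 2, 3, 4, 5 };

        // Push items[1..3], i.e. 2, 3, 4, in one atomic head swap.
        stack.PushRange(items, 1, 3);

        stack.TryPop(out int top);
        Console.WriteLine(top); // 4: the last element of the range is on top
    }
}
```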

Checking for emptiness: the IsEmpty property

The implementation of this operation is simple: it just checks whether the head node is null. Microsoft's documentation recommends using this property rather than Count == 0 to determine whether the stack is empty, because Count counts the elements by traversing the entire stack on every use, which is O(n), while IsEmpty is O(1). So Count == 0 is comparatively slow, especially with a large number of elements.
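The difference in a short example (both expressions give the same answer; the point is the cost, O(1) versus O(n)):

```csharp
using System;
using System.Collections.Concurrent;

class Program
{
    static void Main()
    {
        var stack = new ConcurrentStack<int>();

        Console.WriteLine(stack.IsEmpty);     // True: head is null, O(1)
        stack.Push(42);
        Console.WriteLine(stack.IsEmpty);     // False: O(1) head check
        Console.WriteLine(stack.Count == 0);  // False, but Count walked the whole list
    }
}
```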

The IEnumerable&lt;T&gt; interface member: GetEnumerator()

The implementation of this method takes the head node and then walks the linked list. Note that the method captures a snapshot of the entire stack at the moment it is called: elements pushed or popped during the traversal do not change what GetEnumerator() returns.
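The snapshot behavior can be demonstrated with the real API: take the enumerator, mutate the stack, then iterate. The element pushed after the enumerator was obtained does not appear.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var stack = new ConcurrentStack<int>();
        stack.Push(1);
        stack.Push(2);

        using IEnumerator<int> snapshot = stack.GetEnumerator(); // head captured here
        stack.Push(3);  // mutate after the snapshot was taken

        var seen = new List<int>();
        while (snapshot.MoveNext()) seen.Add(snapshot.Current);

        Console.WriteLine(string.Join(",", seen)); // 2,1: newest first, the 3 is absent
    }
}
```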

Other notes

1. When writing lock-free code in .NET, you usually do not need to worry about the ABA problem (where a CAS succeeds because a value changed from A to B and back to A, hiding an intervening modification), because garbage collection guarantees that a node's memory is not reused while it is still referenced. The exception is when object pooling is used, for example if the internally allocated nodes were handed out by an object pool.

2. In the. NET source code, the stack<t> internal use of the array implementation, and concurretstack internal use of the linked list, the main reason is that stack in the use of array expansion will have the cost of copying data, Especially in the case of large data volume, this performance loss is still relatively large, there is another reason is that the internal use of the list can avoid the ABA problem (if the allocation of internal nodes is not using the object pool), but the implementation of the list is not without shortcomings, such as the stack when we assign a new node, And this node will be collected by GC after the end of the stack, this node becomes the garbage node at this time, but in the implementation of the Concurretqueue because the queue of advanced features using another solution-linked list + array way, This method solves the problem of the garbage node and solves the performance cost of the array expansion replication data.

Finally, reading the .NET source turns up many classic coding techniques and a coding style that lets the code read from top to bottom in one flowing pass. That is also my preferred style: write your code like a poem, so that others read it like one.

That's it for this time. In the next article I'll analyze another classic concurrent data structure in .NET: the ConcurrentQueue implementation.

Given the author's limited ability, mistakes in the analysis are inevitable; corrections are welcome.

