Parallel Programming in .NET (2): Implementation and Analysis of ConcurrentStack
First of all, let's explain what "lock-free" means. A so-called lock-free implementation replaces the locks of an ordinary stack with atomic operations. An atomic operation works by having the CPU assert a signal on the system bus; when another thread tries to access the same memory, the CPU detects the signal, and that thread can access the memory only after the signal is released. Atomic operations are supported by the hardware at the lowest level and exposed through operating system APIs. Common atomic operations include atomic increment, atomic decrement, and compare-and-swap (CAS). The ConcurrentStack implementation relies on the compare-and-swap operation.

Advantages of atomic operations: first, because no lock is taken, deadlock is impossible. Second, atomic operations do not block other threads. For example, if the current thread is suspended (say, by a context switch) in the middle of its work, other threads can still make progress; with a lock, a suspended thread that holds the lock blocks everyone else. Third, because atomic operations are implemented directly by hardware instructions, they generally perform better than ordinary locks.

Disadvantages of atomic operations: first, when an atomic operation fails, the code must roll back and retry, so livelock and thread starvation become possible; random backoff can mitigate these problems but cannot eliminate them. Second, lock-free code is considerably harder to write, reason about, and test.

Now let's get to the subject.
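To make the three operations concrete, here is a small sketch in Java, using java.util.concurrent.atomic as a stand-in for .NET's Interlocked.Increment, Interlocked.Decrement, and Interlocked.CompareExchange. The class and method names are mine, invented for illustration:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the three atomic operations named above.
class AtomicOpsDemo {

    // Atomic increment and decrement: each call is a single indivisible step,
    // so concurrent callers can never lose an update.
    static int incrementThenDecrement() {
        AtomicInteger counter = new AtomicInteger(0);
        counter.incrementAndGet(); // counter: 1
        counter.incrementAndGet(); // counter: 2
        counter.decrementAndGet(); // counter: 1
        return counter.get();
    }

    // Compare-and-swap: write the new value only if the current value still
    // equals the expected one; otherwise fail and let the caller retry.
    static boolean[] casTwice() {
        AtomicReference<String> head = new AtomicReference<>("old");
        boolean first  = head.compareAndSet("old", "new");   // succeeds
        boolean second = head.compareAndSet("old", "newer"); // fails: head is now "new"
        return new boolean[] { first, second };
    }
}
```

The failing second CAS in casTwice is exactly the situation a lock-free stack must handle with a retry loop.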
.NET's ConcurrentStack contains a lot of code, so this article will not reproduce all of it; I will only analyze the parts that I think are important. The complete source is available on Microsoft's reference source site: http://referencesource.microsoft.com/#mscorlib/system/Collections/Concurrent/ConcurrentStack.cs

A traditional stack is generally implemented as a singly linked list (the non-concurrent .NET Stack<T> uses an array). A push replaces the head node with the new node, and a pop points the head at the next node. Therefore, when a large number of threads access the stack concurrently, the contention is entirely on the head node. In other words, if we can guarantee that operations on the head node are safe, the entire stack is safe.

The push operation, public void Push(T item):

```csharp
public void Push(T item)
{
    Node newNode = new Node(item);
    newNode.m_next = m_head;
    if (Interlocked.CompareExchange(ref m_head, newNode, newNode.m_next) == newNode.m_next)
    {
        return;
    }

    // If we failed, go to the slow path and loop around until we succeed.
    PushCore(newNode, newNode);
}

private void PushCore(Node head, Node tail)
{
    SpinWait spin = new SpinWait();

    // Keep trying to CAS the existing head with the new node until we succeed.
    do
    {
        spin.SpinOnce();
        // Reread the head and link our new node.
        tail.m_next = m_head;
    }
    while (Interlocked.CompareExchange(ref m_head, head, tail.m_next) != tail.m_next);
}
```

(As the comments in the original source show, pushing uses the compare-and-swap (CAS) atomic operation.) A push takes three steps:

a. Allocate a new node and make the current in-memory head node the new node's next node. newNode.m_next = m_head takes a snapshot of the current head; note that another thread may change the node m_head points to right after this assignment.
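For readers outside .NET, the same fast-path/slow-path shape can be sketched in Java, with AtomicReference.compareAndSet standing in for Interlocked.CompareExchange. This is my own minimal approximation, not the .NET source; all names in it are invented:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal Treiber-stack push, mirroring ConcurrentStack.Push's shape:
// one optimistic CAS on the fast path, then a retry loop on the slow path.
class TreiberStack<T> {
    static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T item) {
        Node<T> newNode = new Node<>(item);
        newNode.next = head.get();                       // snapshot the current head
        if (head.compareAndSet(newNode.next, newNode)) { // fast path: one CAS
            return;
        }
        pushCore(newNode);                               // slow path: loop until we win
    }

    private void pushCore(Node<T> newNode) {
        do {
            Thread.onSpinWait();        // spin hint, roughly SpinOnce's role here
            newNode.next = head.get();  // reread the head and relink our node
        } while (!head.compareAndSet(newNode.next, newNode));
    }

    // Walk the list from the head (most recently pushed item first).
    java.util.List<T> toList() {
        java.util.List<T> out = new java.util.ArrayList<>();
        for (Node<T> n = head.get(); n != null; n = n.next) out.add(n.item);
        return out;
    }
}
```

The key property is that newNode is invisible to other threads until the CAS succeeds, so the unsynchronized write to newNode.next is safe.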
Note that the head node field m_head is declared with the volatile keyword. We know that volatile does two things. First, it prevents the compiler and the CPU from reordering accesses to the field; without it, a statement such as newNode.m_next = m_head could be moved relative to the if statement that follows it, whether by compiler optimization or by out-of-order execution of CPU instructions, and the code would no longer follow the intended logic. Second, it forces the CPU cache to be kept fresh: every read of a volatile field reloads the value from main memory instead of reusing a possibly stale value from the CPU cache. m_head needs volatile because a thread running on another core may change m_head while the current core's cache has not yet been updated.

b. Compare the current head node with the snapshot we saved in newNode.m_next. If they are the same, install the new node as the head; otherwise the comparison fails and we go to step c. Interlocked.CompareExchange(ref m_head, newNode, newNode.m_next) is equivalent to the following code, except that the real operation executes as one indivisible atomic step:

```csharp
// Pseudo-equivalent of Interlocked.CompareExchange(ref m_head, newNode, newNode.m_next):
Node original = m_head;
if (m_head == newNode.m_next)
{
    m_head = newNode;
}
return original;
```

c. If step b fails, enter PushCore(Node head, Node tail), which simply repeats the CompareExchange from step b until the new node has been installed. SpinWait.SpinOnce() is called on each iteration of the loop; this API exists to prevent livelock. A thread whose CAS failed backs off before retrying, much like two people meeting head-on in a hallway: you step to the left, and so does he;
Then you both step to the right. To avoid colliding forever, one of you stops for a moment before continuing, and the waiting time changes each time the collision repeats: stop for 5 seconds first; if you still end up moving the same way, stop for 10 seconds; if it happens again, go have a cup of tea and try later. This idea is random backoff. Of course, the implementation of SpinOnce in a computer is not quite that simple; I will describe it in detail when analyzing the SpinWait synchronization primitive. For now, it is enough to understand it as letting the thread rest for a while.

(A small detail discovered while reading the SpinOnce source: many people do not know the difference between Thread.Sleep(0) and Thread.Sleep(1). Thread.Sleep(0) yields the current thread's time slice so that a ready thread of the same or higher priority can run; if no such thread is waiting, the current thread simply keeps running, which can starve lower-priority threads. Thread.Sleep(1) puts the thread to sleep for at least 1 ms, regardless of the priorities of other threads. In practice the sleep usually lasts noticeably longer than 1 ms, on the order of the system's clock interval.)

The pop operation, public bool TryPop(out T result), is also built on CAS: it keeps trying to swing the head to the next node and then return the old head. If the swap fails, it loops, and during the loop the same random-backoff technique is used to reduce the probability of livelock; the backoff time grows with the number of failed attempts.
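The pop side can be sketched in the same spirit: a minimal Java approximation (invented names, not the .NET source) that CASes the head to its next node and backs off a little longer after every failed attempt:

```java
import java.util.concurrent.atomic.AtomicReference;

// A tryPop sketch: swing head to head.next with CAS; on failure, retry
// with a backoff that grows with the number of failed attempts.
class PopDemo<T> {
    static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T item) {
        Node<T> n = new Node<>(item);
        do {
            n.next = head.get();
        } while (!head.compareAndSet(n.next, n));
    }

    // Returns null when the stack is empty: the same reason the .NET API
    // returns bool from TryPop instead of assuming a pop can succeed.
    T tryPop() {
        int failures = 0;
        while (true) {
            Node<T> old = head.get();
            if (old == null) return null;          // stack is empty
            if (head.compareAndSet(old, old.next)) {
                return old.item;                   // we won the race
            }
            backoff(++failures);
        }
    }

    // Escalating backoff: spin first, then yield the time slice, then sleep.
    // A crude cousin of what SpinWait.SpinOnce does internally.
    private static void backoff(int failures) {
        if (failures < 10) {
            Thread.onSpinWait();
        } else if (failures < 20) {
            Thread.yield();
        } else {
            try { Thread.sleep(1); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }
}
```

Note how the empty check and the CAS use the same head snapshot, so an empty stack is reported rather than retried forever.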
The TryPop() API returns bool by design: when multiple threads pop concurrently, one thread may empty the stack just before another tries to pop, so a pop can legitimately fail.

The batch push operation, public void PushRange(T[] items, int startIndex, int count), is built on the same mechanism as a single push: the items are first linked into a private chain, and then a single CAS points the head node at that chain. One batch call is therefore more efficient than pushing the same items one by one.

A small code-writing detail: the batch push method calls ValidatePushPopRangeInput(items, startIndex, count) to check the correctness of its input parameters. This reflects an important coding principle, which I call the operating-room principle: by the time the surgeon enters the operating room, the gloves and the body have already been sterilized and all preparation is complete, so the surgeon can concentrate entirely on the operation itself. Factoring the checks out this way not only keeps the main method clean but can also reduce the number of CPU branch predictions in the hot path, speeding up execution. We can use the same approach in everyday coding by extracting bulky parameter checks into separate methods.

The IsEmpty property is easy to implement: it only needs to check whether the head node is null. Microsoft's documentation recommends using IsEmpty rather than Count == 0 to test whether the stack is empty, because Count traverses the entire stack every time it is called.
Count is O(N) while IsEmpty is O(1), so testing Count == 0 is less efficient, especially when the amount of data is large.

IEnumerable<T> member implementation: GetEnumerator(). This method takes the head node and walks the entire stack in order. Note that it operates on a snapshot of the stack taken at the moment of the call: elements pushed or popped during the traversal do not change the sequence of values GetEnumerator() returns.

Other notes:

1. In .NET you generally do not need to consider the ABA problem when writing lock-free code, because garbage collection guarantees that a node's memory is not reused while any thread still holds a reference to it. The exception is when object-pooling techniques are used, for example when an object pool is responsible for allocating the internal nodes.

2. In the .NET source, the ordinary Stack<T> uses an array internally, while ConcurrentStack uses a linked list. The main reason is that an array-based stack pays the cost of copying its data whenever the array is resized, and with large amounts of data that loss is significant. Another reason is that a linked list avoids the ABA problem (again, provided internal nodes are not allocated from an object pool). The linked-list implementation has its own drawback, however: every push allocates a new node, and once that node is popped it becomes garbage for the GC to collect. ConcurrentQueue, exploiting the first-in-first-out nature of a queue, takes a different approach, a linked list of array segments, which addresses both the garbage-node problem and the resize-and-copy overhead of a plain array.

Finally, reading the .NET source is a wonderful way to pick up classic coding techniques and style. Good code flows from top to bottom in one breath, like drifting clouds and flowing water. That is also the style I most recommend: write your code the way you would write a poem, so that others can read it the way they read one.
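One last sketch before closing: the chain-then-publish trick that PushRange uses, as described above, can be approximated in Java like this (again with invented names, not the .NET source):

```java
import java.util.concurrent.atomic.AtomicReference;

// PushRange sketch: build a private chain of nodes (invisible to other
// threads), then publish it with ONE CAS that points head at the chain.
class PushRangeDemo<T> {
    static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    final AtomicReference<Node<T>> head = new AtomicReference<>();

    void pushRange(T[] items) {
        if (items.length == 0) return;
        // Link the items into a local chain so that the last item ends up
        // on top, matching the order of pushing them one at a time.
        Node<T> top = new Node<>(items[0]);
        Node<T> bottom = top;
        for (int i = 1; i < items.length; i++) {
            Node<T> n = new Node<>(items[i]);
            n.next = top;
            top = n;
        }
        // Publish: retry until a single CAS splices the whole chain in.
        do {
            bottom.next = head.get();
        } while (!head.compareAndSet(bottom.next, top));
    }

    // Walk the list from the head (top of the stack first).
    java.util.List<T> toList() {
        java.util.List<T> out = new java.util.ArrayList<>();
        for (Node<T> n = head.get(); n != null; n = n.next) out.add(n.item);
        return out;
    }
}
```

However many items are pushed, other threads contend with at most one CAS per batch, which is why a batch push beats a loop of single pushes.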