A Detailed Description of C# Multithread Synchronization

This article introduces the basics of C# thread synchronization. It has good reference value; follow along and take a look.

Multithreading roughly divides into two areas. One is asynchronous operation, done through dedicated threads, the thread pool, Task, Parallel, PLINQ, and so on, which involves worker threads and I/O threads. The other is thread synchronization, and that is the problem I am studying and exploring here.

Studying the relevant content of CLR via C# gave me a clearer framework for thread synchronization. In multithreading, thread synchronization is built out of synchronization constructs, which fall into two categories: primitive constructs and hybrid constructs. The so-called primitive constructs are the simplest constructs to use in code, and they divide into two kinds: user mode and kernel mode. A hybrid construct internally uses the primitive user-mode and kernel-mode constructs, switching between them according to a certain strategy, because user mode and kernel mode each have their pros and cons, and the hybrid constructs are designed to balance them. The whole thread synchronization architecture is listed below:

1. Primitive constructs

1.1 User Mode

1.1.1 Volatile

1.1.2 Interlocked

1.2 Kernel Mode

1.2.1 WaitHandle

1.2.2 ManualResetEvent and AutoResetEvent

1.2.3 Semaphore

1.2.4 Mutex

2. Hybrid constructs

2.1 The various Slim classes

2.2 Monitor

2.3 MethodImplAttribute and SynchronizationAttribute

2.4 ReaderWriterLock

2.5 Barrier (rarely used)

2.6 CountdownEvent (rarely used)

Start with the cause of the thread synchronization problem. Suppose there is an integer variable a in memory whose value is 2. When thread 1 executes, it takes the value of a from memory into a CPU register and assigns it 3; at that point thread 1's time slice ends and the CPU gives the time slice to thread 2. Thread 2 also reads a from memory into a register, but since thread 1 has not yet written the new value 3 back to memory, thread 2 reads the stale value (that is, dirty data) 2. If thread 2 then makes some judgment based on the value of a, unintended results can follow.
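As a minimal sketch of this kind of race (hypothetical code, not from the original article), two threads incrementing a shared counter without any synchronization will usually lose updates:

using System;
using System.Threading;

class RaceDemo
{
    static int s_count = 0; // shared variable; no synchronization at all

    static void Main()
    {
        Thread t1 = new Thread(Work);
        Thread t2 = new Thread(Work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        // Expected 2000000, but usually less: each ++ is a separate
        // read-modify-write that can interleave with the other thread's.
        Console.WriteLine(s_count);
    }

    static void Work()
    {
        for (int i = 0; i < 1000000; i++)
            s_count++;
    }
}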

A variety of approaches are commonly used to handle this resource-sharing problem. They are described below.

First, the user-mode primitive constructs. The advantage of user mode is that it executes relatively fast, because it is coordinated by a series of CPU instructions and any blocking it causes is extremely brief; as far as the operating system is concerned, the thread keeps running and is never blocked. The disadvantages are that only the system kernel can stop such a thread from running, and that because the thread spins instead of blocking, it keeps consuming the CPU, wasting CPU time.

The first user-mode primitive construct is the volatile construct. Much of what the web says about it is that it makes the CPU read the specified field (that is, the variable) from memory on every read and write it to memory on every write. In fact it is related to the compiler's code optimization. First look at the following code:

public class StrangeClass
{
    int m_flag = 0;
    int m_value = 0;

    public void Thread1()   // executed by one thread
    {
        m_value = 5;
        m_flag = 1;
    }

    public void Thread2()   // executed by another thread
    {
        if (m_flag == 1)
            Console.WriteLine(m_value);
    }
}

Students who understand multithreaded synchronization will know that if two threads execute the two methods above, there are two possible results:

1. Nothing is output;

2. 5 is output.

However, when the C# compiler compiles to IL, or the JIT compiler compiles to machine code, code optimization takes place. In the method Thread1, the compiler sees only two unrelated field assignments; it reasons purely from the perspective of single-threaded execution and does not take multithreading into account, so it may reorder the two lines, assigning 1 to m_flag first and 5 to m_value afterward. That opens up a third result: 0 is output. Unfortunately, I have been unable to reproduce this result in testing.

The solution to this phenomenon is the volatile construct. Its effect is that a read of a field using this construct is guaranteed to execute first in the original code order, and a write to such a field is guaranteed to execute last in the original code order.

There are now three implementations of the volatile construct. The first is the pair of static methods on Thread, VolatileRead and VolatileWrite, which MSDN analyzes as follows:

Thread.VolatileRead reads the value of a field. The value is the latest written by any processor in the computer, regardless of the number of processors or the state of processor caches.

Thread.VolatileWrite writes a value to a field immediately, so that the value is visible to all processors in the computer.

On multiprocessor systems, VolatileRead obtains the most recent value written to a memory location by any processor, which might require flushing the processor cache; VolatileWrite ensures that a value written to a memory location is immediately visible to all processors, which might also require flushing the processor cache.

Even on a single-processor system, VolatileRead and VolatileWrite ensure that a value is read from or written to memory and is not cached (in a processor register, for example). Therefore, you can use them to synchronize access to a field that can be updated by another thread, or by hardware.
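A minimal sketch applying these two methods to the earlier example (assuming .NET Framework; in later .NET versions they are marked obsolete in favor of the Volatile class):

using System;
using System.Threading;

public class StrangeClassFixed
{
    int m_flag = 0;
    int m_value = 0;

    public void Thread1()
    {
        m_value = 5;
        Thread.VolatileWrite(ref m_flag, 1);       // the write is guaranteed to happen last
    }

    public void Thread2()
    {
        if (Thread.VolatileRead(ref m_flag) == 1)  // the read is guaranteed to happen first
            Console.WriteLine(m_value);            // so if m_flag is 1, m_value is already 5
    }
}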

Nothing in the text above connects them to the code optimizer, so read on.

The volatile keyword is the second way to implement the volatile construct. It is a simplified version of VolatileRead and VolatileWrite: marking a field with the volatile modifier ensures that every access to that field goes through VolatileRead or VolatileWrite. MSDN's description of the volatile keyword is:

The volatile keyword indicates that a field might be modified by multiple concurrently executing threads. Fields declared volatile are not subject to compiler optimizations that assume access by a single thread. This ensures that the most up-to-date value is present in the field at all times.

From here you can see that it is indeed related to code optimization. Combining this with the introduction above yields two conclusions:

1. A field using the volatile construct is read from and written to memory directly, without going through a CPU register, so all threads' reads and writes of it stay in sync and there are no dirty reads. Each read is atomic, and each write is atomic.

2. A field modified (or accessed) via the volatile construct is executed in the order the code is written: a read is performed as early as possible, and a write is performed as late as possible.
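As a sketch, the same example using the volatile keyword (note that the C# compiler warns when a volatile field is passed by ref, which is one reason the method-based forms also exist):

using System;

public class StrangeClassKeyword
{
    volatile int m_flag = 0;   // every access to m_flag is now a volatile access
    int m_value = 0;

    public void Thread1()
    {
        m_value = 5;
        m_flag = 1;            // volatile write: cannot move before the m_value write
    }

    public void Thread2()
    {
        if (m_flag == 1)       // volatile read: cannot move after the m_value read
            Console.WriteLine(m_value);
    }
}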

The last implementation is the Volatile class added to the .NET Framework, which offers both Read and Write methods and is effectively equivalent to Thread's VolatileRead and VolatileWrite. This is best understood from the source; casually take one of the Volatile.Read overloads and look at it.

And then look at Thread's VolatileRead method.
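The source screenshots are not reproduced here; from the .NET Framework reference source, the two methods look roughly like this (a paraphrase, not the verbatim source):

// System.Threading.Volatile (paraphrased)
public static int Read(ref int location)
{
    var value = location;      // the read itself
    Thread.MemoryBarrier();    // full fence: later accesses cannot move before the read
    return value;
}

// System.Threading.Thread (paraphrased)
public static int VolatileRead(ref int address)
{
    int ret = address;
    MemoryBarrier();           // same pattern, so the two are effectively equivalent
    return ret;
}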

The other user-mode construct is Interlocked, which ensures that a read and a write happen together as one atomic operation. That is its biggest difference from volatile: volatile can only ensure a simple read or a simple write.

Why is Interlocked like this? A look at Interlocked's methods will tell:

Add(ref Int32, Int32)                     // internally calls an extern add method
CompareExchange(ref Int32, Int32, Int32)  // if argument 1 equals argument 3, replace argument 1 with argument 2; returns argument 1's original value
Decrement(ref Int32)                      // decrements and returns; calls Add
Exchange(ref Int32, Int32)                // sets argument 1 to argument 2; returns the original value
Increment(ref Int32)                      // increments; calls Add

Take just one of these methods, Add(ref Int32, Int32) (Increment and Decrement actually call Add internally): it reads the value of the first argument, sums it with the second argument, and writes the result back to the first argument. This whole process is one atomic operation that contains both a read and a write. As for how that atomicity is guaranteed, the Rotor source code would have to be consulted. In terms of code optimization, Interlocked ensures that all variable writes before it are performed before it executes, which guarantees the values it uses are up to date, and that all variable reads after it are performed after it, which guarantees later code uses the freshly changed values.

The CompareExchange method is very important. Although Interlocked provides only a few methods, they can be extended into many more. Below is an example that finds the maximum of two values, copied straight from Jeffrey's source:
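The code image from the original post is missing; Jeffrey's Maximum example from CLR via C# is essentially the following (reproduced from the book's pattern; details may vary):

using System;
using System.Threading;

public static class InterlockedEx
{
    public static int Maximum(ref int target, int value)
    {
        int currentVal = target, startVal, desiredVal;

        // Don't touch target in the loop except through CompareExchange,
        // because another thread may change it at any time.
        do
        {
            startVal = currentVal;                   // what we think target is
            desiredVal = Math.Max(startVal, value);  // the value we want to install

            // If target still equals startVal, replace it with desiredVal;
            // either way, return target's value as it was before the call.
            currentVal = Interlocked.CompareExchange(ref target, desiredVal, startVal);

            // If another thread changed target in the meantime, loop and retry.
        } while (startVal != currentVal);

        return desiredVal;
    }
}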

Look at the code above: it reads the value of target before entering the loop, and records that value at the start of every iteration. After computing the maximum, it checks whether target has changed; if it has, it records the new value and recomputes the maximum from it, repeating until target is observed unchanged. This satisfies Interlocked's promise: the write happens inside the Interlocked call, and the Interlocked call always reads the latest value.

Primitive kernel-mode constructs

Kernel mode relies on the operating system's kernel objects to handle thread synchronization. First its drawback: it is relatively slow. There are two reasons. One is that it is implemented through operating-system kernel objects and requires coordination inside the operating system. The other is that kernel objects are unmanaged objects; once you understand AppDomains you will know that an object outside the current AppDomain is accessed either by-value marshaling or by-reference marshaling, and this kind of unmanaged resource turns out to be marshaled by reference, which costs performance. Those two points together give kernel mode its drawback. But it also has advantages: 1. while waiting for a resource the thread blocks rather than spins, which saves CPU time, and the block can be given a timeout value; 2. it can synchronize Window threads with CLR threads, and can synchronize threads in different processes (I have no experience of the former; as for the latter, semaphores offer boundary-value resources); 3. security settings can be applied to prohibit access by unauthorized accounts (something I know little about).

The base class of all kernel-mode constructs is WaitHandle. The kernel-mode class hierarchy is as follows:

WaitHandle
    EventWaitHandle
        AutoResetEvent
        ManualResetEvent
    Semaphore
    Mutex

WaitHandle inherits MarshalByRefObject, which is the by-reference marshaling of unmanaged objects. WaitHandle mainly offers various Wait methods; a call to a Wait method blocks until a signal is received. WaitOne waits for a signal on the current handle, WaitAny(WaitHandle[] waitHandles) waits for a signal from any one of waitHandles, and WaitAll(WaitHandle[] waitHandles) waits for signals from all of waitHandles. Each of these methods has an overload that allows setting a timeout. The other kernel-mode constructs have similar Wait methods.
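A small sketch of these wait methods (hypothetical handles; the timeout overloads keep the example from blocking forever):

using System;
using System.Threading;

class WaitDemo
{
    static void Main()
    {
        WaitHandle[] handles =
        {
            new AutoResetEvent(false),
            new ManualResetEvent(false)
        };

        // Signal the second handle from a worker thread.
        new Thread(() => ((ManualResetEvent)handles[1]).Set()).Start();

        int index = WaitHandle.WaitAny(handles, 1000);  // index of the first signaled handle,
                                                        // or WaitHandle.WaitTimeout
        Console.WriteLine(index);

        bool all = WaitHandle.WaitAll(handles, 1000);   // true only if every handle is signaled
        Console.WriteLine(all);
    }
}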

EventWaitHandle maintains a Boolean internally. A Wait method blocks the thread while the Boolean is false, and the thread is released once the Boolean becomes true. The methods that manipulate the Boolean are Set() and Reset(); the former sets it to true, the latter to false. It works like a gate: a thread that executes Wait after Reset has been called pauses, and resumes once Set is called. EventWaitHandle has two subclasses that are used the same way, except that AutoResetEvent automatically calls Reset right after Set, so the gate closes again immediately, while ManualResetEvent needs a manual call to Reset to close the gate. The resulting effect is that an AutoResetEvent normally releases one thread each time, while a ManualResetEvent may let many threads through before Reset is called manually.
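A minimal sketch contrasting the two (hypothetical example):

using System;
using System.Threading;

class EventDemo
{
    static AutoResetEvent s_auto = new AutoResetEvent(false);   // gate starts closed

    static void Main()
    {
        for (int i = 0; i < 3; i++)
        {
            Thread t = new Thread(Worker);
            t.IsBackground = true;   // so a still-blocked worker does not keep the process alive
            t.Start(i);
        }

        Thread.Sleep(500);
        s_auto.Set();   // releases exactly ONE worker, then the gate closes again automatically
        Thread.Sleep(500);
        s_auto.Set();   // releases one more; with a ManualResetEvent a single Set would
                        // have released all three until Reset was called manually
    }

    static void Worker(object id)
    {
        s_auto.WaitOne();   // blocks while the gate is closed
        Console.WriteLine("worker {0} released", id);
    }
}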

Internally, a Semaphore maintains an integer count. When constructing a Semaphore object you specify the maximum count and the initial count. Each call to WaitOne decrements the count by 1, and once the count has dropped to 0 further callers block; each call to Release increments the count by one or more, releasing that many blocked threads. This maps directly onto the producer/consumer problem: let the count represent the free slots in a product queue. A producer calls WaitOne before putting a product into the queue; when the queue is full there are no free slots left and producers block. When a consumer takes a product it calls Release, returning one slot to the queue, and a blocked producer can then continue storing products.
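A sketch of that bounded-queue idea with a Semaphore counting free slots (hypothetical capacity of 3):

using System;
using System.Collections.Generic;
using System.Threading;

class BoundedQueue
{
    static Semaphore s_freeSlots = new Semaphore(3, 3);   // initial count 3, maximum 3
    static Queue<int> s_queue = new Queue<int>();
    static object s_sync = new object();

    static void Produce(int item)
    {
        s_freeSlots.WaitOne();            // take a free slot; blocks while the queue is full
        lock (s_sync) s_queue.Enqueue(item);
    }

    static void Consume()
    {
        int item;
        lock (s_sync)
        {
            if (s_queue.Count == 0) return;
            item = s_queue.Dequeue();
        }
        s_freeSlots.Release();            // give one slot back, waking a blocked producer
        Console.WriteLine(item);
    }
}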

The internals and rules of Mutex are slightly more complex than the previous two constructs. What it has in common with them is that WaitOne blocks the current thread and ReleaseMutex releases the block. The difference is that WaitOne lets the first calling thread through and blocks every other thread that calls WaitOne; the thread that got through may call WaitOne again repeatedly, but it must call ReleaseMutex the same number of times to release the mutex, otherwise the mismatched count leaves the other threads blocked forever. Compared with the previous constructs, this one gives threads the concepts of ownership and recursion, which the previous constructs simply cannot provide without extra wrapping.
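A sketch of Mutex ownership and recursion (hypothetical):

using System.Threading;

class MutexDemo
{
    static Mutex s_mutex = new Mutex();

    static void Outer()
    {
        s_mutex.WaitOne();        // first acquisition by this thread
        Inner();                  // re-entering on the same thread does NOT block
        s_mutex.ReleaseMutex();   // each WaitOne needs a matching ReleaseMutex
    }

    static void Inner()
    {
        s_mutex.WaitOne();        // recursion count goes to 2
        // ... protected work ...
        s_mutex.ReleaseMutex();   // recursion count back to 1
    }
}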

Hybrid constructs

The primitive constructs above are the simplest implementations. User mode is fast but wastes CPU time; kernel mode solves that but brings a performance cost. Each has its pros and cons, and the hybrid constructs gather the benefits of both: through a certain strategy they use user mode internally at the appropriate times and switch to kernel mode in other situations. The price of those layers of judgment is memory overhead. In thread synchronization there is no perfect construct; every construct has pros and cons and a reason to exist. Combined with a concrete application scenario there is always a best construct to use; we just have to weigh the pros and cons against the specific scenario.

First, the classes with the Slim suffix. In the System.Threading namespace you can see several classes ending in Slim: ManualResetEventSlim, SemaphoreSlim, ReaderWriterLockSlim. Apart from the last one, each has a same-named construct in primitive kernel mode. All three are simplified versions of the original constructs; the first two in particular are used the same way as the originals but avoid operating-system kernel objects, achieving a lightweight effect. For example, SemaphoreSlim does use the kernel construct ManualResetEvent internally, but it is lazily initialized and not created until actually needed. ReaderWriterLockSlim will be introduced later.

Monitor and lock. The lock keyword is the most widely known means of achieving multithreaded synchronization, so start from a piece of code:
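The original code screenshot is missing; a minimal stand-in for the kind of method being compiled (hypothetical names):

public class LockDemo
{
    private readonly object m_lock = new object();
    private int m_count;

    public void Increment()
    {
        lock (m_lock)   // what does the compiler turn this block into?
        {
            m_count++;
        }
    }
}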

The method is quite simple and meaningless; it exists only to see what the compiler compiles it into. Looking at the generated IL (screenshot not reproduced):

Notice the try...finally statement block and the Monitor.Enter and Monitor.Exit calls that appear in the IL. Then change the code to call Monitor directly, compile it again, and look at the IL:

(IL screenshot not reproduced.)

The IL looks similar, but it is not equivalent; the code actually equivalent to the lock statement block is as follows.
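In C# 4 and later the compiler expands the lock block above into roughly this pattern (shown with the same hypothetical fields):

// the compiler-generated equivalent of Increment
public void Increment()
{
    bool lockTaken = false;
    try
    {
        System.Threading.Monitor.Enter(m_lock, ref lockTaken);
        m_count++;
    }
    finally
    {
        if (lockTaken)
            System.Threading.Monitor.Exit(m_lock);
    }
}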

So lock is essentially a call to Monitor. How, then, does Monitor implement thread synchronization by locking an object? It turns out that every object on the managed heap has two fixed members: a pointer to the object's type, and a thread synchronization block index. The index points to an element of an array of synchronization blocks, and Monitor synchronizes threads by relying on this synchronization block. According to Jeffrey (the author of CLR via C#), a synchronization block has three fields: the owning thread's ID, a count of waiting threads, and a recursion count. (I learned from another article that a synchronization block has more members than these; interested students can read the two articles titled "Revealing the synchronization block index".) When Monitor needs to lock an object obj, it checks whether obj's synchronization block index is a valid index into the array. If it is -1, Monitor finds a free synchronization block in the array and associates it with obj, and the block's owning-thread ID records the current thread's ID. When a thread calls Monitor again, it checks whether the block's owning ID matches the current thread's ID: if it matches, the thread passes and the recursion count is incremented by 1; if not, the thread is thrown into a ready queue (which actually lives in the synchronization block) and blocked. When Exit is called, the block checks the recursion count and, once the recursion is finished, clears the owning-thread ID. The waiting-thread count tells it whether any threads are waiting; if so, it takes a thread out of the ready queue and releases it, otherwise it disassociates the synchronization block and lets the block wait to be used by the next locked object.

Monitor also has a pair of methods, Wait and Pulse. The former lets the thread that holds the lock release it temporarily; the current thread is blocked and placed in a waiting queue. Only when another thread calls Pulse is a thread moved from the waiting queue into the ready queue; when the lock is next released it may acquire the lock again, depending on the state of the ready queue.
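A sketch of the Wait/Pulse pairing in the classic blocking-queue shape (hypothetical):

using System.Collections.Generic;
using System.Threading;

class BlockingQueue
{
    private readonly object m_lock = new object();
    private readonly Queue<int> m_items = new Queue<int>();

    public int Take()
    {
        lock (m_lock)
        {
            while (m_items.Count == 0)
                Monitor.Wait(m_lock);   // releases the lock and blocks in the waiting queue
            return m_items.Dequeue();   // the lock is held again here
        }
    }

    public void Add(int item)
    {
        lock (m_lock)
        {
            m_items.Enqueue(item);
            Monitor.Pulse(m_lock);      // moves one waiter into the ready queue
        }
    }
}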

ReaderWriterLock is the reader-writer lock. The traditional lock keyword (equivalent to Monitor's Enter and Exit) puts a fully mutually exclusive lock on a shared resource: once locked, the resource is completely inaccessible to other threads.

ReaderWriterLock instead puts read locks and write locks on the mutually exclusive resource, similar to the shared and exclusive locks mentioned in databases. In general, a resource under a read lock can be accessed by multiple threads, while a resource under a write lock can be accessed by only one thread, and threads holding the two different kinds of lock can never access the resource at the same time. Strictly speaking, read-lock threads can access the resource together only if they are in the same queue, while threads in different queues cannot; for the write lock there is only one queue, from which only one thread at a time can access the resource. The criterion for whether two read-lock threads land in the same queue is whether another thread acquired a write lock in between: if no write lock intervened between this thread's read lock and the previous one, the two threads sit in the same read-lock queue.

ReaderWriterLockSlim is similar to ReaderWriterLock: it is the upgraded version, introduced in .NET Framework 3.5, and is said to optimize recursion and simplify the operations. I have not yet delved into its recursion policy. The commonly used methods of both are listed below.

Common methods of ReaderWriterLock

Acquire/Release plus ReaderLock/WriterLock, in their combinations (AcquireReaderLock, ReleaseWriterLock, and so on).

UpgradeToWriterLock/DowngradeFromWriterLock upgrades a read lock to a write lock and downgrades it back. During the upgrade the thread also switches from the read-lock queue to the write-lock queue, so it may have to wait.

ReleaseLock/RestoreLock releases all locks and restores the lock state.

ReaderWriterLockSlim implements the IDisposable interface, and its common methods follow this pattern:

TryEnter/Enter/Exit plus ReadLock/WriteLock/UpgradeableReadLock (TryEnterReadLock, EnterWriteLock, ExitUpgradeableReadLock, and so on).
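A sketch of the Slim version guarding a shared dictionary (hypothetical):

using System.Collections.Generic;
using System.Threading;

class Cache
{
    private static readonly ReaderWriterLockSlim s_rwLock = new ReaderWriterLockSlim();
    private static readonly Dictionary<int, string> s_map = new Dictionary<int, string>();

    public static string Read(int key)
    {
        s_rwLock.EnterReadLock();              // many readers may hold this concurrently
        try
        {
            string value;
            s_map.TryGetValue(key, out value);
            return value;
        }
        finally { s_rwLock.ExitReadLock(); }
    }

    public static void Write(int key, string value)
    {
        s_rwLock.EnterWriteLock();             // exclusive: excludes readers and writers
        try { s_map[key] = value; }
        finally { s_rwLock.ExitWriteLock(); }
    }
}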

CountdownEvent is a rarely used hybrid construct. It runs in the opposite direction from Semaphore: a Semaphore's WaitOne blocks once its count is exhausted, while CountdownEvent's Wait blocks as long as the internal count is above 0 and releases everyone once the count reaches 0. Its methods are:

AddCount  // increments the count
Signal    // decrements the count
Reset     // resets the count to the specified or the initial value
Wait      // does not block if the count is 0, otherwise blocks
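A minimal sketch (hypothetical three workers):

using System;
using System.Threading;

class CountdownDemo
{
    static void Main()
    {
        using (CountdownEvent countdown = new CountdownEvent(3))   // initial count 3
        {
            for (int i = 0; i < 3; i++)
            {
                new Thread(() =>
                {
                    Thread.Sleep(100);     // simulate work
                    countdown.Signal();    // decrement the count
                }).Start();
            }

            countdown.Wait();              // blocks until the count reaches 0
            Console.WriteLine("all workers finished");
        }
    }
}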

Barrier is also a rarely used hybrid construct, for coordinating multiple threads through a stepwise operation. It internally maintains a count representing the number of participants in the cooperation. When a thread calls SignalAndWait it signals that it has reached the barrier for the current step and is blocked; once the number of signals reaches the participant count, all blocked threads are released and the next step begins. If that is still unclear, just look at the example code from MSDN, reproduced below.
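The example image is missing; MSDN's Barrier sample is approximately the following (a reconstruction; details may differ from the original page):

using System;
using System.Threading;
using System.Threading.Tasks;

class BarrierDemo
{
    static void Main()
    {
        int count = 0;

        // 3 participants; the post-phase delegate runs each time a phase completes.
        Barrier barrier = new Barrier(3, b =>
        {
            Console.WriteLine("Post-Phase action: count={0}, phase={1}",
                count, b.CurrentPhaseNumber);
            if (b.CurrentPhaseNumber == 2)
                throw new Exception("D'oh!");   // makes phase 2's SignalAndWait throw
        });

        barrier.AddParticipants(2);   // now 5 participants
        barrier.RemoveParticipant();  // back to 4

        Action action = () =>
        {
            Interlocked.Increment(ref count);
            barrier.SignalAndWait();          // phase 0
            Interlocked.Increment(ref count);
            barrier.SignalAndWait();          // phase 1
            Interlocked.Increment(ref count);
            try
            {
                barrier.SignalAndWait();      // phase 2: the post-phase delegate throws
            }
            catch (BarrierPostPhaseException bppe)
            {
                Console.WriteLine("Caught BarrierPostPhaseException: {0}", bppe.Message);
            }
            Interlocked.Increment(ref count);
            barrier.SignalAndWait();          // phase 3
        };

        // The number of parallel actions must match the participant count (4).
        Parallel.Invoke(action, action, action, action);
        barrier.Dispose();
    }
}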

Here the Barrier is initialized with 3 participants, and the delegate is invoked each time a phase completes, printing the count and the phase index. The number of participants is then increased by two and decreased by one. Every participant's action is the same: atomically increment count, then call SignalAndWait to tell the Barrier that the current step is complete and wait for the next step to begin. The third time, because an exception is thrown in the post-phase callback, every participant's SignalAndWait call throws an exception. The parallel work is launched through Parallel; note that if the number of jobs Parallel starts differs from the number of Barrier participants, SignalAndWait will behave unexpectedly.

Next, two attributes. These are presumably not synchronization constructs themselves, but they can also play a role in thread synchronization.

MethodImplAttribute applies to methods. Given the parameter MethodImplOptions.Synchronized, it locks the entire method body: a thread calling the method blocks if it cannot acquire the lock, and wakes only after the owning thread releases it. For a static method this is equivalent to locking on the class's type object, that is, lock(typeof(ClassType)); for an instance method it is equivalent to locking on the instance, that is, lock(this). I was suspicious of the lock(this) conclusion at first, so I compiled it and looked at the IL: the code of the method body was no different, and reading some source gave no clue either. Later I found that its IL method header differs from an ordinary method's by one word: synchronized.
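A sketch of the attribute in use (hypothetical class):

using System.Runtime.CompilerServices;

public class SyncByAttribute
{
    [MethodImpl(MethodImplOptions.Synchronized)]
    public void InstanceMethod()
    {
        // effectively lock (this) around the whole body
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    public static void StaticMethod()
    {
        // effectively lock (typeof(SyncByAttribute)) around the whole body
    }
}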

So after digging through all kinds of material on the net, I finally found a post on "junchu25"'s blog that uses WinDbg to look at the JIT-generated code.

(WinDbg screenshots of the attribute-based version and of the lock-based version are not reproduced here.)

Even Jeffrey recommends against implementing thread synchronization with this attribute.

System.Runtime.Remoting.Contexts.SynchronizationAttribute applies to classes. Add this attribute to a class definition and derive the class from ContextBoundObject, and the same lock is applied to all of the class's methods. Compared with MethodImplAttribute its scope is wider: when a thread calls any method of such a class and fails to acquire the lock, the thread blocks.

There is a claim that it essentially calls lock. That claim is hard to prove and domestic resources on it are few; it involves AppDomains and thread contexts, and the final core is implemented by the class SynchronizedServerContextSink. AppDomains deserve an article of their own, but a little must be said here. I used to think memory held only thread stacks and heap memory, but that is only the most basic division: the heap is further divided into several AppDomains, each AppDomain has at least one context, and every object belongs to one context within one AppDomain. Objects cannot be accessed directly across AppDomains; they are either marshaled by value (equivalent to deep-copying the object into the calling AppDomain) or marshaled by reference. Marshaling by reference requires the class to inherit MarshalByRefObject, and calls on an object of such a class do not go to the object itself but through a proxy. Crossing contexts likewise requires by-reference marshaling. An object constructed normally lives in the default context of the process's default AppDomain, but an instance of a class using SynchronizationAttribute lives in another context, and a class inheriting the ContextBoundObject base class is accessed across contexts through a proxy, by reference, rather than directly. Whether an object is being accessed from outside its context can be determined with RemotingServices.IsObjectOutOfContext(obj).

SynchronizedServerContextSink is an internal class of mscorlib. When a thread calls a cross-context object, the call is encapsulated into a WorkItem object (another internal class in mscorlib). SynchronizedServerContextSink asks SynchronizationAttribute whether more than one WorkItem is currently requesting execution; depending on the answer, the current WorkItem either executes immediately or is put into a FIFO WorkItem queue and executed in turn. The queue is a member of the SynchronizationAttribute, and enqueueing, or deciding whether a WorkItem executes immediately, requires acquiring a lock, and the lock object is the WorkItem queue itself. This involves the interplay of several classes that I have not yet fully understood, so the process above may contain mistakes to be corrected after further analysis. In any case, intuition says that thread synchronization implemented through this attribute is also not recommended, mainly because of the performance loss and the relatively large scope of the lock.
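A sketch of how the attribute is applied (hypothetical class; this assumes .NET Framework, since remoting is absent from .NET Core and later):

using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Contexts;

[Synchronization]
public class SyncedByContext : ContextBoundObject
{
    // Only one thread at a time may be inside ANY method of an instance.
    public void MethodA() { /* ... */ }
    public void MethodB() { /* ... */ }
}

class ContextDemo
{
    static void Main()
    {
        SyncedByContext obj = new SyncedByContext();
        // The instance lives in another context and is used through a proxy:
        Console.WriteLine(RemotingServices.IsObjectOutOfContext(obj)); // True
        obj.MethodA();
    }
}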
