Multithreading: Thread Synchronization



Multithreading content roughly divides into two parts. One is asynchronous operation, which can be implemented through dedicated threads, the thread pool, Task, Parallel, PLINQ, and so on (this involves worker threads and I/O threads). The other is thread synchronization, and that is what I want to study and explore now.

By studying the relevant content in CLR via C#, a clear structure for thread synchronization takes shape. The thread synchronization constructs used in multithreading fall into two categories: primitive constructs and hybrid constructs. "Primitive" means the simplest constructs available to code. Primitive constructs come in two kinds: user-mode and kernel-mode. A hybrid construct uses the user-mode and kernel-mode primitive constructs internally according to certain policies; because user mode and kernel mode each have their own advantages and disadvantages, the hybrid constructs are designed to balance the two. The following describes the whole thread synchronization architecture.

Start from the cause of thread synchronization problems. Suppose there is an integer variable A in memory whose stored value is 2. When thread 1 executes, it loads the value of A from memory into a CPU register and assigns 3 to A. At that point thread 1's time slice ends, and the CPU gives a time slice to thread 2, which also loads A's value from memory into a register. Because thread 1 has not yet written the new value 3 back to memory, thread 2 reads the old value (that is, dirty data) 2. If thread 2 then makes decisions based on that value, unexpected results may follow.

Various methods are used to solve this shared-resource problem. They are introduced one by one below.

First, the user-mode constructs among the primitives. The advantage of user mode is that it is relatively fast, because the coordination is done by a series of CPU instructions and the blocking it causes lasts only a very short time; from the operating system's point of view the thread is always running, never blocked. The disadvantage is that only the operating system kernel can stop such a thread, and because the thread spins instead of blocking, it occupies the CPU and wastes CPU time.

The first primitive user-mode construct is the volatile construct. It makes the CPU read the specified field (that is, the variable) from memory on every read, and write to memory on every write. However, it also interacts with the compiler's code optimization. Consider the following code:

    public class StrangeClass
    {
        int mFlag = 0;
        int mValue = 0;

        public void Thread1()
        {
            mValue = 5;
            mFlag = 1;
        }

        public void Thread2()
        {
            if (mFlag == 1)
                Console.WriteLine(mValue);
        }
    }

If two threads execute the two methods above separately, there are two expected results: 1. nothing is output; 2. 5 is output. However, when the C# compiler compiles to IL, or the JIT compiler compiles to machine code, the code may be optimized: in the Thread1 method the compiler sees only two field assignments, considers single-threaded execution only, and takes no account of multithreading, so it may swap the execution order of the two lines, assigning 1 to mFlag before assigning 5 to mValue. That creates a third possible result: 0 is output. Unfortunately, I have never managed to reproduce this result in a test.

The volatile construct solves this problem. Its effect is: when you perform a read on a field through this construct, the read is guaranteed to happen first in the original code order; when you perform a write on a field through this construct, the write is guaranteed to happen last in the original code order.

There are currently three volatile constructs. The first is the pair of static methods on Thread, VolatileRead and VolatileWrite. MSDN describes them as follows:

Thread.VolatileRead reads a field value. Regardless of the number of processors or the state of the processor caches, the value returned is the latest one written by any processor on the computer.

Thread.VolatileWrite writes a value to a field immediately, so that the value is visible to all processors on the computer.

On multiprocessor systems, VolatileRead obtains the latest value written to a memory location by any processor, which may require flushing the processor cache; VolatileWrite ensures that the value written to a memory location is immediately visible to all processors, which may also require flushing the processor cache.

Even on uniprocessor systems, VolatileRead and VolatileWrite ensure that the value is read from or written to memory and never cached (for example, in a processor register). You can therefore use them to synchronize access to a field that is updated by another thread or by hardware.

From the text above, no connection to compiler code optimization can be seen. Let's look further.

The volatile keyword is another implementation of the volatile construct. It is a simplified version of VolatileRead and VolatileWrite: applying the volatile modifier to a field ensures that every access to that field behaves like a VolatileRead or VolatileWrite. MSDN describes the volatile keyword as follows:

The volatile keyword indicates that a field may be modified by multiple concurrently executing threads. Fields declared volatile are exempt from compiler optimizations that assume access by a single thread. This ensures the field always shows the most up-to-date value.

From this it is clear that volatile does have something to do with code optimization. Two conclusions follow:

1. Fields that use the volatile construct are read from and written to memory directly, never held in CPU registers, so every thread sees up-to-date values and there is no dirty read. Each individual read is atomic, and each individual write is atomic.

2. A field modified (or accessed) through the volatile construct is accessed in strict accordance with the order in which the code is written: a read is performed as early as possible, and a write is performed as late as possible.
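
To illustrate, here is the earlier example rewritten with the volatile keyword. This is a minimal sketch, not a listing from the book: marking mFlag as volatile prevents the reordering described above, so Thread2 can no longer print 0:

    public class StrangeClass
    {
        volatile int mFlag = 0;
        int mValue = 0;

        public void Thread1()
        {
            mValue = 5;
            mFlag = 1;   // volatile write: nothing before it may be moved after it
        }

        public void Thread2()
        {
            if (mFlag == 1)                 // volatile read: nothing after it may be moved before it
                Console.WriteLine(mValue);  // if mFlag is 1, mValue is guaranteed to be 5
        }
    }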

The last volatile construct is the Volatile class added to the .NET Framework (in version 4.5). Its methods are Read and Write, which are effectively equivalent to Thread's VolatileRead and VolatileWrite. Verifying that requires looking at the source; take Volatile's Read method as a reference.

Then compare Thread's VolatileRead method.
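
The decompiled bodies the original post showed here were not preserved. From memory of the .NET Framework reference source, Thread.VolatileRead and VolatileWrite are implemented along roughly these lines; treat this as a sketch rather than the exact mscorlib code:

    // Thread.VolatileRead: an ordinary read followed by a full memory barrier
    public static int VolatileRead(ref int address)
    {
        int ret = address;
        Thread.MemoryBarrier();   // keeps later reads/writes from moving before this read
        return ret;
    }

    // Thread.VolatileWrite: a full memory barrier followed by an ordinary write
    public static void VolatileWrite(ref int address, int value)
    {
        Thread.MemoryBarrier();   // keeps earlier reads/writes from moving after this write
        address = value;
    }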

Another user-mode construct is Interlocked. This construct guarantees that a read and a write happen together as a single atomic operation. That is the biggest difference from volatile: volatile only makes an individual read or an individual write atomic.

What does Interlocked look like? Here are Interlocked's methods:

Add(ref int location, int value) // adds value to location; internally calls the external method ExternAdd

CompareExchange(ref int location, int value, int comparand) // compares location with comparand; if they are equal, replaces location's value with value; returns location's original value

Decrement(ref int location) // decrements location and returns the new value; calls Add internally

Exchange(ref int location, int value) // sets location to value and returns location's original value

Increment(ref int location) // increments location and returns the new value; calls Add internally

Take Add(ref int location, int value) as an example (the Increment and Decrement methods actually call Add internally). It reads the value of the first parameter, sums it with the second parameter, and writes the result back to the first parameter. This entire process is one atomic operation containing both the read and the write. As for how that atomicity is guaranteed, you would have to check the Rotor source code. Regarding code optimization, an Interlocked operation also acts as a fence: all writes before the Interlocked call execute before it, which ensures the value the Interlocked operation uses is the latest, and all reads after the call execute after it, which ensures the code that follows uses the updated value.

The CompareExchange method is particularly important. Although Interlocked provides very few methods, CompareExchange can be used to build many others. The following example finds the maximum of two values, following Jeffrey's source code.
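
The listing itself was lost from this copy of the post. The following is a reconstruction from memory of the Maximum example in CLR via C#, so take it as a sketch of the pattern rather than a verbatim quote:

    public static int Maximum(ref int target, int value)
    {
        int currentVal = target;   // an initial guess at target's value
        int startVal, desiredVal;
        do
        {
            startVal = currentVal;                    // the value our computation is based on
            desiredVal = Math.Max(startVal, value);   // the maximum we want to store
            // If target still equals startVal, swap in desiredVal; either way,
            // CompareExchange returns the value target had just before the call.
            currentVal = Interlocked.CompareExchange(ref target, desiredVal, startVal);
        } while (startVal != currentVal);             // another thread changed target: retry
        return desiredVal;
    }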

Look at the code: at the start of every loop iteration the current target value is captured; after the maximum is computed, CompareExchange checks whether the target has changed in the meantime. If it has, the new value is recorded and the maximum is recomputed against it, looping until the target is seen unchanged. This matches the guarantee stated for Interlocked: all writes happen before the Interlocked call, and the Interlocked call reads the latest value.

Primitive kernel mode

Kernel mode handles thread synchronization through operating system kernel objects. First the disadvantages: it is comparatively slow, for two reasons. One, because it is implemented with OS kernel objects, the coordination happens inside the operating system kernel, requiring transitions between managed code and the kernel. Two, kernel objects are unmanaged objects; once you understand AppDomains you know that an object accessed from outside its own AppDomain is either marshaled by value or marshaled by reference, and these kernel wrappers are marshaled by reference, which costs performance. Together, those two points are the kernel mode's disadvantages. But it also has advantages: 1. a thread waiting on a resource blocks rather than spins, which saves CPU time, and a timeout can be set on the wait; 2. Windows threads and CLR threads can be synchronized, and threads in different processes can be synchronized (I have no experience with the former, but for the latter I know, for example, that a named Semaphore can guard a bounded resource across processes); 3. security settings can be applied, denying access to unauthorized accounts (I am not clear on the details of this one).

The base class of all kernel-mode constructs is WaitHandle. The class hierarchy of the kernel-mode constructs is as follows:

WaitHandle
    EventWaitHandle
        AutoResetEvent
        ManualResetEvent
    Semaphore
    Mutex

WaitHandle inherits from MarshalByRefObject, which means these unmanaged objects are marshaled by reference. WaitHandle mainly contains the various Wait methods; a thread that calls a Wait method blocks until a signal is received. WaitOne waits for a signal on the current handle; the static WaitAny(WaitHandle[] waitHandles) returns when any of the handles is signaled; the static WaitAll(WaitHandle[] waitHandles) returns when all of the handles are signaled. Each of these methods has an overload that accepts a timeout. The other kernel-mode constructs have similar Wait methods.

EventWaitHandle maintains a Boolean value internally; the Wait method blocks while the Boolean is false, releasing the thread only when it becomes true. The methods that manipulate this Boolean are Set() and Reset(): the former sets it to true, the latter to false. It works like a switch: after Reset is called, threads calling Wait stop at it until a Set lets them continue. EventWaitHandle has two subclasses, used in similar ways. The difference is that AutoResetEvent automatically calls Reset right after a Set, snapping the switch back to off, so each release lets exactly one waiting thread through; ManualResetEvent requires a manual Reset call to turn the switch off, so it may let multiple threads through before Reset is called.
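
As a minimal sketch (the scenario and names are mine, not from the original post), this is how an AutoResetEvent typically releases a single waiting thread:

    using System;
    using System.Threading;

    class AutoResetEventDemo
    {
        // false: the "switch" starts in the off position
        static readonly AutoResetEvent signal = new AutoResetEvent(false);

        static void Main()
        {
            var worker = new Thread(() =>
            {
                signal.WaitOne();                 // blocks until Set is called
                Console.WriteLine("Worker released");
            });
            worker.Start();

            Thread.Sleep(1000);                   // simulate some work
            signal.Set();                         // releases exactly one waiter, then auto-resets
            worker.Join();
        }
    }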

Semaphore maintains an integer internally. When a Semaphore object is constructed, you specify the maximum count and the initial count. Each call to WaitOne decrements the count by 1; when the count reaches 0, further waiting threads block. Each call to Release increments the count by one or more, releasing one or more blocked threads. This fits the producer/consumer problem: treat the count as the number of free slots in a product queue. The producer calls WaitOne before adding a product to the queue; when the queue is full, the free-slot count is exhausted and the producer blocks. When a consumer takes a product, it calls Release to free a slot in the queue, and the blocked producer can then continue storing products.
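
A minimal producer/consumer sketch along those lines (the queue, counts, and names are my own illustration, not from the original):

    using System;
    using System.Collections.Concurrent;
    using System.Threading;

    class SemaphoreDemo
    {
        static readonly ConcurrentQueue<int> products = new ConcurrentQueue<int>();
        // the count models free slots: starts with 5 free, maximum 5
        static readonly Semaphore freeSlots = new Semaphore(5, 5);

        static void Main()
        {
            var producer = new Thread(() =>
            {
                for (int i = 0; i < 10; i++)
                {
                    freeSlots.WaitOne();          // blocks when no free slot remains
                    products.Enqueue(i);
                    Console.WriteLine("produced " + i);
                }
            });
            var consumer = new Thread(() =>
            {
                for (int i = 0; i < 10; i++)
                {
                    int item;
                    while (!products.TryDequeue(out item)) Thread.Sleep(10);
                    freeSlots.Release();          // one slot freed; a blocked producer may continue
                }
            });
            producer.Start(); consumer.Start();
            producer.Join(); consumer.Join();
        }
    }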

Mutex's internal rules are a little more complex than the previous two. First, like them, WaitOne blocks the current thread and ReleaseMutex unblocks it. The difference is that WaitOne lets the first calling thread through, and other threads that call WaitOne are blocked. The thread that got through may call WaitOne multiple times recursively, but it must then call ReleaseMutex the same number of times to release the mutex; otherwise, because of the mismatched call counts, other threads stay blocked. Compared with the previous constructs, Mutex adds two concepts, thread ownership and recursion, which the preceding constructs cannot provide without extra encapsulation.
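
A short sketch of the ownership and recursion rule (the method names are illustrative):

    using System.Threading;

    class MutexDemo
    {
        static readonly Mutex mutex = new Mutex();

        static void Outer()
        {
            mutex.WaitOne();          // acquires the mutex; this thread now owns it
            Inner();                  // re-entering on the owning thread does not block
            mutex.ReleaseMutex();     // second release: count drops to 0, ownership ends
        }

        static void Inner()
        {
            mutex.WaitOne();          // same owning thread: recursion count goes to 2
            // ... protected work ...
            mutex.ReleaseMutex();     // first release: recursion count back to 1
        }
    }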

Hybrid constructs

The primitive constructs above are the simplest implementations. User mode is fast but wastes CPU time; kernel mode solves that problem but pays a performance cost for the transitions. Each has its own advantages and disadvantages, and the hybrid constructs combine the strengths of both: through certain policies they use user mode internally where appropriate and switch to kernel mode in other situations. The price of these layered decisions is some memory overhead. There is no perfect construct in thread synchronization; each has pros and cons, and each exists for a reason. Combined with a concrete application scenario, there is always a best construct available; the question is whether we can weigh the pros and cons for the specific case.

Classes with the Slim suffix. In the System.Threading namespace you can see several classes ending with the Slim suffix: ManualResetEventSlim, SemaphoreSlim, and ReaderWriterLockSlim. Except for the last, each corresponds to a primitive kernel-mode construct of the same name. All three are lightweight versions of the originals; the first two in particular are used the same way as their originals but avoid operating system kernel objects as much as possible, achieving the lightweight effect. In SemaphoreSlim, for example, the kernel construct ManualResetEvent is used, but it is initialized lazily and not created unless truly needed. ReaderWriterLockSlim is introduced later.

Monitor and lock. The lock keyword is the most widely known way to achieve multithreaded synchronization. Let's start with a piece of code.
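
The original snippet was not preserved in this copy; a minimal stand-in that serves the same purpose (the class and method names are mine) is:

    public class LockDemo
    {
        private readonly object syncObj = new object();

        public void DoSomething()
        {
            lock (syncObj)
            {
                // the body does nothing useful; we only care what the compiler emits
            }
        }
    }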

The method is quite simple and has no practical meaning; it exists only so we can see what the compiler turns the code into.

Note what appears in the IL: a try...finally block, Monitor.Enter, and Monitor.Exit. Now change the code, compile, and look at the IL again:

(The IL listing from the original post is not preserved here.)

The two versions look similar but are not equivalent. The code actually equivalent to the lock block is as follows:
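
The listing is missing from this copy; the expansion the C# compiler performs (since C# 4.0) is well documented and looks like this:

    bool lockTaken = false;
    try
    {
        Monitor.Enter(syncObj, ref lockTaken);   // lockTaken is set to true inside Enter
        // ... the body of the lock block ...
    }
    finally
    {
        if (lockTaken) Monitor.Exit(syncObj);    // released only if the lock was acquired
    }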

Since lock essentially calls Monitor, how does Monitor lock an object and implement thread synchronization? Every object on the managed heap has two fixed members: one points to the object's type, and the other is a thread synchronization block index. This index points to an element of a synchronization block array, and Monitor locks threads through that synchronization block. According to Jeffrey (the author of CLR via C#), a synchronization block has three fields: the owning thread's Id, a waiting-thread count, and a recursion count. However, I have learned from other articles that a thread synchronization block has more members than these; if you are interested, read the two-part article "Revealing the synchronization block index".

When Monitor needs to lock an object obj, it checks obj's synchronization block index. If it is -1, an idle synchronization block in the array is associated with obj, and that block's owning-thread Id records the current thread's Id. When another thread calls Monitor on obj, it checks whether the block's owning Id matches the current thread's Id: if it matches, the thread passes and the recursion count is increased by 1; if not, the thread is put into a ready queue (this queue actually lives in the synchronization block) and blocked. When Exit is called, the block checks the recursion count, and when the recursion unwinds to zero it clears the owning-thread Id. The waiting-thread count tells whether any thread is waiting: if so, one is taken from the ready queue and released; if not, the association with the synchronization block is removed, and the block waits to be used by the next locked object.

Monitor also has the method pair Wait and Pulse. Wait lets the thread that holds the lock release it temporarily; the calling thread blocks and is put into a waiting queue. Only when another thread calls Pulse is a thread moved from the waiting queue into the ready queue, where it can take the lock again the next time the lock is released; whether it actually gets it depends on the state of the queue.
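
A classic sketch of this hand-off, using a one-slot buffer (the scenario is my illustration, not from the original post):

    using System.Threading;

    class OneSlotBuffer
    {
        private readonly object syncObj = new object();
        private int item;
        private bool hasItem;

        public void Put(int value)
        {
            lock (syncObj)
            {
                while (hasItem)
                    Monitor.Wait(syncObj);   // release the lock and join the waiting queue
                item = value;
                hasItem = true;
                Monitor.Pulse(syncObj);      // move one waiter into the ready queue
            }
        }

        public int Take()
        {
            lock (syncObj)
            {
                while (!hasItem)
                    Monitor.Wait(syncObj);
                hasItem = false;
                Monitor.Pulse(syncObj);
                return item;
            }
        }
    }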

ReaderWriterLock. With the traditional lock keyword (equivalent to Monitor's Enter and Exit), the lock on a shared resource is fully mutually exclusive: once the resource is locked, it is completely inaccessible to other threads.

ReaderWriterLock is similar to the shared and exclusive locks found in databases. As a general rule, a resource under a read lock allows multiple threads to access it, while a resource under a write lock allows only one thread. Threads holding the two different kinds of lock cannot access a resource at the same time. Strictly speaking, threads holding read locks can access the resource together as long as they are in the same queue; threads in different queues cannot. Whether two read-lock threads land in the same queue is determined by one criterion: whether any thread applied for a write lock in the time between them. If no other thread applied for a write lock in that period, the two read-lock threads are in the same read-lock queue.

ReaderWriterLockSlim is similar to ReaderWriterLock; it is an upgraded version that appeared in .NET Framework 3.5, and it is said to optimize recursion and simplify the operations. I have not looked deeper into its recursion policy. For now, here are their common methods.

ReaderWriterLock:

AcquireReaderLock / AcquireWriterLock, ReleaseReaderLock / ReleaseWriterLock — acquire and release the read or write lock.

UpgradeToWriterLock / DowngradeFromWriterLock — upgrade from a read lock to a write lock and back; the upgrade moves the thread from the read-lock queue into the write-lock queue, so it may have to wait.

ReleaseLock / RestoreLock — release all locks, then restore the lock state.

ReaderWriterLockSlim implements the IDisposable interface. Its methods follow the pattern:

TryEnter / Enter / Exit combined with ReadLock, WriteLock, and UpgradeableReadLock (for example EnterReadLock, TryEnterWriteLock, ExitUpgradeableReadLock).

(The content above is referenced from another note of mine on ReaderWriterLock.)
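
A brief ReaderWriterLockSlim usage sketch (the cache scenario is my own illustration):

    using System.Collections.Generic;
    using System.Threading;

    class SimpleCache
    {
        private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
        private readonly Dictionary<string, string> data = new Dictionary<string, string>();

        public string Read(string key)
        {
            rwLock.EnterReadLock();           // many readers may hold this concurrently
            try
            {
                string value;
                data.TryGetValue(key, out value);
                return value;
            }
            finally { rwLock.ExitReadLock(); }
        }

        public void Write(string key, string value)
        {
            rwLock.EnterWriteLock();          // exclusive: blocks readers and writers
            try { data[key] = value; }
            finally { rwLock.ExitWriteLock(); }
        }
    }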

CountdownEvent. This is somewhat the opposite of Semaphore: Semaphore blocks threads when its internal count runs out, while CountdownEvent keeps waiting threads blocked until its internal count counts down to 0. Its methods are as follows:

AddCount // increment the count;

Signal // decrement the count;

Reset // reset the count to the specified or initial value;

Wait // blocks until the count reaches 0.
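
A minimal sketch of waiting for several workers to finish (the names are illustrative):

    using System;
    using System.Threading;

    class CountdownDemo
    {
        static void Main()
        {
            using (var done = new CountdownEvent(3))   // initial count: 3
            {
                for (int i = 0; i < 3; i++)
                {
                    int id = i;
                    new Thread(() =>
                    {
                        Console.WriteLine("worker " + id + " finished");
                        done.Signal();                 // count goes down by 1
                    }).Start();
                }
                done.Wait();                           // blocks until the count reaches 0
                Console.WriteLine("all workers done");
            }
        }
    }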

Barrier is another relatively rarely used hybrid construct, meant for multithreaded cooperation in step-by-step (phased) operations. It maintains an internal count representing the number of participants in the cooperation. When the threads call SignalAndWait, the number of signals for the current phase goes up by 1 and each calling thread blocks; the blocked threads are all released only once the signal count reaches the number of participants. If that is still unclear, look at the sample code from MSDN.
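
The sample itself is missing from this copy. The following is a reconstruction from memory of the MSDN Barrier example that the next paragraph describes, so details may differ from the original:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class BarrierDemo
    {
        static int count = 0;

        static void Main()
        {
            // 3 participants; the post-phase action runs after each completed step
            var barrier = new Barrier(3, b =>
            {
                Console.WriteLine("Post-phase: count={0}, phase={1}",
                                  count, b.CurrentPhaseNumber);
                if (b.CurrentPhaseNumber == 2)
                    throw new Exception("post-phase failure");   // thrown on the 3rd step
            });

            barrier.AddParticipants(2);   // participants: 3 + 2 = 5
            barrier.RemoveParticipant();  // participants: 5 - 1 = 4

            Action work = () =>
            {
                Interlocked.Increment(ref count);   // atomic increment of the shared count
                barrier.SignalAndWait();            // this phase is done; wait for the others
                Interlocked.Increment(ref count);
                barrier.SignalAndWait();
                Interlocked.Increment(ref count);
                try
                {
                    barrier.SignalAndWait();        // the post-phase action throws here
                }
                catch (BarrierPostPhaseException e)
                {
                    Console.WriteLine("caught: " + e.Message);
                }
            };

            // 4 parallel calls to match the 4 participants
            Parallel.Invoke(work, work, work, work);
            barrier.Dispose();
        }
    }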

Here Barrier is initialized with 3 participants, and every time a step completes the delegate is called, printing the count and the step (phase) index. The number of participants is then increased by two and decreased by one. Each participant performs the same operations: atomically increment count, then call SignalAndWait to tell the Barrier the current step is finished and wait for the next step to begin. On the third step, however, the callback throws an exception, and every participant's SignalAndWait call throws one in turn. The parallel work is started through Parallel; note that if the degree of parallelism differed from the number of Barrier participants, SignalAndWait would behave unexpectedly.

Next, two attributes. Strictly speaking these are probably not synchronization constructs, but they can also play a role in thread synchronization.

MethodImplAttribute. When given the parameter MethodImplOptions.Synchronized, this attribute applies to a method and locks the entire method body: any thread calling the method blocks if it cannot get the lock, and is woken up only when the thread holding the lock releases it. For a static method it is equivalent to locking on the type object of the class, i.e. lock(typeof(ClassType)); for an instance method it is equivalent to locking on the instance, i.e. lock(this). At first I suspected it called lock internally, so I compiled it to IL, but the code of the method body was no different. I checked some source code and found no clue; later I discovered that its IL method header differs from an ordinary method's in carrying an extra synchronized flag.
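
For reference, applying the attribute looks like this (a sketch; the class is illustrative):

    using System.Runtime.CompilerServices;

    public class SyncByAttribute
    {
        // equivalent to wrapping the body in lock(this)
        [MethodImpl(MethodImplOptions.Synchronized)]
        public void InstanceWork() { /* ... */ }

        // equivalent to wrapping the body in lock(typeof(SyncByAttribute))
        [MethodImpl(MethodImplOptions.Synchronized)]
        public static void StaticWork() { /* ... */ }
    }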

Searching around the Internet, I found the blog of "junchu25" [1][2], which mentions using WinDbg to view the code generated by the JIT.

(Screenshots of the JIT-generated code for the attribute-based call and the lock-based call are not preserved here.)

Even Jeffrey recommends against using this attribute for thread synchronization.

System.Runtime.Remoting.Contexts.SynchronizationAttribute. This attribute applies to classes. Adding it to the definition of a class that inherits from ContextBoundObject puts the same lock on every method in the class, a broader scope than MethodImplAttribute: when any thread calls any method of such a class and fails to get the lock, the thread blocks. There is a claim that it essentially calls lock; that claim is hard to verify, there are few resources on it in China, and AppDomains and thread contexts are involved. Ultimately the core is implemented by the class SynchronizedServerContextSink.

AppDomains should be covered in a separate article, but a little needs to be said here. We used to think of memory as holding thread stacks and heap memory. That is only the basic division: the heap is actually divided into several AppDomains, each AppDomain has at least one context, and every object belongs to one context within one AppDomain. Objects cannot be accessed directly across AppDomains; they are either marshaled by value (equivalent to copying the object into the calling AppDomain) or marshaled by reference. For marshal-by-reference, the class must inherit MarshalByRefObject, and calls on such objects go through a proxy rather than to the object itself. Crossing contexts likewise requires marshaling. Normally an object is constructed in the default context of the process's default AppDomain, but an instance of a class marked with the SynchronizationAttribute feature belongs to a different context, and a class inheriting ContextBoundObject is accessed across contexts through a proxy (by reference) rather than directly. Whether an access crosses contexts can be checked with the RemotingServices.IsObjectOutOfContext(obj) method.

SynchronizedServerContextSink is an internal class in mscorlib. When a thread calls a cross-context object, the call is wrapped by SynchronizedServerContextSink into a WorkItem object (another internal class in mscorlib). SynchronizedServerContextSink hands it to the SynchronizationAttribute, and the attribute decides, according to whether multiple WorkItem execution requests exist, whether the current WorkItem executes immediately or goes into a first-in-first-out WorkItem queue. That queue is a member of the SynchronizationAttribute; dequeuing a member, and the attribute's decision on whether to execute a WorkItem immediately, both require taking a lock, and the locked object is this very WorkItem queue. Several classes interact here, and I have not yet seen the whole picture; the process described above may contain mistakes and will be supplemented after further analysis. Still, judging intuitively from the above, thread synchronization implemented with this attribute is not recommended, mainly because of the performance loss and the relatively large scope of the lock.
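
Declaring such a class looks like the following sketch (the class body is illustrative):

    using System;
    using System.Runtime.Remoting.Contexts;

    // every method of this class is guarded by one and the same context-wide lock
    [Synchronization]
    public class SynchronizedClass : ContextBoundObject
    {
        public void MethodA() { /* only one thread at a time, across all methods */ }
        public void MethodB() { /* blocked while another thread is in MethodA */ }
    }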
