Introduction: multi-core, multithreaded programming has become fashionable, and lock-free programming is a hot topic within it. The Linux kernel is probably one of the largest and most complex parallel programs in existence today, and it provides an excellent case study for multi-core multithreading. Kernel designers have brought the latest lock-free programming techniques into the 2.6 series kernel; this article is based on version 2.6.10.
Introduction to non-blocking Synchronization
Protecting shared data correctly and efficiently is a central challenge in writing parallel programs. The usual means is synchronization, which falls into two categories: blocking synchronization and non-blocking synchronization.
Blocking synchronization means that when a thread reaches a critical section while another thread already holds the lock protecting the shared data, it blocks, unable to acquire the lock, until the other thread releases it. Common blocking primitives are the mutex and the semaphore. Used improperly, a blocking scheme can produce deadlock, livelock, priority inversion, and poor efficiency. To reduce these risks and improve performance, the industry proposed synchronization schemes that avoid locks altogether. Algorithms designed on this idea are called non-blocking algorithms; their essential property is that stopping one thread does not impede the progress of the other execution entities in the system.
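As a minimal sketch of blocking synchronization, the following C snippet uses a POSIX mutex: a thread that reaches the critical section while another thread holds the lock simply blocks until the lock is released. The names `counter` and `locked_increment` are illustrative, not from the original article.

```c
#include <pthread.h>

static long counter = 0;  /* shared data protected by the lock below */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Increment the shared counter under the lock and return the new value. */
long locked_increment(void)
{
    pthread_mutex_lock(&counter_lock);   /* blocks while another thread holds the lock */
    long v = ++counter;                  /* critical section: touch shared data */
    pthread_mutex_unlock(&counter_lock); /* wakes up any blocked waiters */
    return v;
}
```

With many threads contending, each caller serializes on `counter_lock`; this is exactly the behavior (and the cost) that the non-blocking schemes below try to avoid.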
There are three popular non-blocking synchronization implementation solutions:
1. Wait-free
Wait-free means that any operation of any thread completes in a finite number of steps, regardless of the execution speed of other threads. Wait-freedom is a per-thread guarantee and might be expected to imply freedom from starvation. Unfortunately, in practice wait-free programs do not come for free: memory consumption grows linearly with the number of threads, and only a very few non-blocking algorithms achieve this level.
2. Lock-free
Lock-free means that among all the threads executing an algorithm, at least one is guaranteed to make progress. Individual threads are not starvation-free, that is, any particular thread may be delayed arbitrarily; but because at least one thread makes progress at every step, the system as a whole keeps running, so the guarantee is system-wide rather than per-thread. Every wait-free algorithm is also lock-free.
3. obstruction-free
Obstruction-free means that at any point in time, a thread running in isolation completes any operation in a finite number of steps. As long as there is no contention, the thread keeps making progress. Once the shared data is modified concurrently, obstruction-freedom requires aborting the partially completed operation and rolling it back. Every lock-free algorithm is also obstruction-free.
In summary, obstruction-free is the weakest of the three non-blocking guarantees and wait-free the strongest, but wait-free is also the hardest to implement. Lock-free algorithms therefore strike the practical balance and are widely used in production software today, the Linux kernel among them.
Lock-free algorithms are generally implemented with atomic read-modify-write primitives. LL/SC (load-linked/store-conditional) is the ideal primitive pair in lock-free theoretical research, but such primitives require CPU instruction support, and unfortunately no CPU implements the ideal LL/SC semantics directly. Building on this theory, the industry introduced the well-known CAS (compare-and-swap) operation as the basis for atomic lock-free algorithms; Intel provides instructions of this kind, the CMPXCHG family (including CMPXCHG8B).
The CAS primitive compares the value at a memory address (typically one machine word) with an expected value; if they are equal, it replaces the value at that address with a new value. In pseudo code:
bool CAS(T *addr, T expected, T newValue)
{
    if (*addr == expected)
    {
        *addr = newValue;
        return true;
    }
    else
        return false;
}
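The pseudo code above maps directly onto C11's `atomic_compare_exchange_strong`. A hedged sketch, with illustrative names (`cas_int`, `cas_demo`) that are not from the original article; note one difference from the pseudo code: on failure, the C11 call writes the current value of `*addr` back into `expected` (here a local copy, so callers are unaffected).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* C11 rendering of the CAS pseudo code: compare *addr with expected
 * and, if equal, atomically store new_value. */
bool cas_int(atomic_int *addr, int expected, int new_value)
{
    return atomic_compare_exchange_strong(addr, &expected, new_value);
}

/* Small demonstration: one CAS that must succeed, then one that must fail. */
int cas_demo(void)
{
    atomic_int x = 0;
    if (!cas_int(&x, 0, 5)) return -1;  /* succeeds: x was 0, becomes 5 */
    if (cas_int(&x, 0, 7))  return -2;  /* fails: x is 5, not the expected 0 */
    return atomic_load(&x);             /* still 5 */
}
```

On x86, compilers lower `atomic_compare_exchange_strong` to the `LOCK CMPXCHG` instruction mentioned above.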
In actual development, CAS is used for synchronization as follows:
do {
    back up old data;
    construct new data based on old data;
} while (!CAS(memory address, backed-up old data, new data));
In other words, if the comparison finds the two values equal, the shared data has not been modified by anyone else, so it is replaced with the new value and execution continues; if they are not equal, the shared data was modified concurrently, so the work just done is discarded and the operation is retried from the start. It is easy to see that CAS rests on the assumption that the shared data will usually not be modified underneath us, adopting a commit-retry pattern similar to that of databases. When synchronization conflicts are rare, this assumption yields a large performance gain.
Lock level
Ranking schemes by code complexity, lock granularity, and running speed yields the following levels of locking:
[Figure: the spectrum of locking levels, from blocking synchronization down to non-blocking synchronization.]
Between adjacent levels, the difference between a lock-based and a lockless-based scheme is only one of lock granularity. The bottom-level scheme in the figure is the familiar mutex and semaphore: the code complexity is the lowest, but so is the running efficiency.
Michael and Scott's pseudo code:
http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html
Michael & Scott lock-free queue C++ implementation (tested on Linux):
http://www.cnblogs.com/napoleon_liu/archive/2010/08/07/1794566.html
Lock-free queue implementation in C++ and C# (depends on the Windows platform):
http://www.codeproject.com/Articles/23317/Lock-Free-Queue-implementation-in-C-and-C