Linux synchronization mechanisms: basic concepts (deadlock, livelock, starvation, priority inversion, lock convoys)

Source: Internet
Author: User
Tags: inheritance, mutex

Deadlock

A deadlock occurs when two or more processes, in the course of execution, end up waiting on each other because they are contending for resources; without outside intervention, none of them can proceed. The system is then said to be in a deadlock state, and the processes that are stuck waiting on one another are called deadlocked processes.

Although deadlock can occur while processes are running, it does not arise arbitrarily: all four of the following necessary conditions must hold for a deadlock to occur.

1) Mutual exclusion: a resource is allocated for exclusive use, i.e., for some period of time it is held by exactly one process. Any other process that requests the resource during that time must wait until the holder releases it.
2) Hold and wait: a process already holds at least one resource and then requests another resource that is held by some other process; the requesting process blocks, but keeps the resources it already holds.
3) No preemption: a resource that has been allocated cannot be forcibly taken away from the process holding it; it can only be released voluntarily by that process when it is finished with it.

4) Circular wait: at the time of the deadlock there must exist a circular chain of processes and resources, i.e., a set of processes {P0, P1, ..., Pn} in which P0 is waiting for a resource held by P1, P1 is waiting for a resource held by P2, ..., and Pn is waiting for a resource held by P0.

Understanding the causes of deadlock, and in particular these four necessary conditions, makes it possible to prevent, avoid, and break deadlocks as far as possible. In system design and process scheduling, the goal is therefore to keep at least one of the four conditions from holding, and to choose a resource-allocation algorithm that keeps processes from permanently occupying system resources. In addition, to keep waiting processes from tying up resources, the system can dynamically check every resource request issued while it runs and decide, based on the result, whether to grant it: if granting the request could lead to deadlock, the request is refused; otherwise it is granted. Resource allocation should therefore be planned carefully.

Deadlock-avoidance algorithm 1:

The ordered resource allocation method
This algorithm gives every resource in the system a uniform number according to some rule (for example, the printer is 1, the tape drive is 2, the disk is 3, and so on), and requests must be made in ascending order of those numbers. The system requires that a requesting process:
1) request, in one operation, all the resources of a given class that it will use;
2) when requesting resources of different classes, request them in the order of the class numbers.

For example:

Process PA uses the resources in the order R1, R2;

Process PB uses the resources in the order R2, R1.

With unconstrained dynamic allocation, a circular wait can form, resulting in deadlock.
Using the ordered resource allocation method, R1 is numbered 1 and R2 is numbered 2, so:
PA must request in the order R1, R2;
PB must also request in the order R1, R2.
This breaks the circular-wait condition and avoids the deadlock, as the sketch below shows.
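
A minimal sketch of this rule with POSIX threads (the function and variable names here are illustrative, not from the original article): both threads lock the two mutexes in ascending number order, so a circular wait cannot form. Compile with gcc -pthread.

    #include <pthread.h>
    #include <stdio.h>

    /* R1 and R2, numbered 1 and 2: every thread locks them in ascending order. */
    static pthread_mutex_t r1 = PTHREAD_MUTEX_INITIALIZER;  /* resource #1 */
    static pthread_mutex_t r2 = PTHREAD_MUTEX_INITIALIZER;  /* resource #2 */

    static void *pa(void *arg)
    {
        pthread_mutex_lock(&r1);            /* lower number first */
        pthread_mutex_lock(&r2);
        printf("PA holds R1 and R2\n");
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return NULL;
    }

    static void *pb(void *arg)
    {
        /* PB uses R2 before R1 in its own logic, but it still ACQUIRES
         * R1 first, so no circular wait with PA is possible. */
        pthread_mutex_lock(&r1);
        pthread_mutex_lock(&r2);
        printf("PB holds R2 and R1\n");
        pthread_mutex_unlock(&r2);
        pthread_mutex_unlock(&r1);
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, pa, NULL);
        pthread_create(&b, NULL, pb, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }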

Deadlock-avoidance algorithm 2:
The banker's algorithm
The most representative deadlock-avoidance algorithm is the banker's algorithm, proposed by E. W. Dijkstra in 1968.
The algorithm examines each requester's maximum demand for resources and grants a request only if the system's remaining resources can still satisfy it safely, that is, only if every process can still be driven to completion in some order.
The requester can then finish its computation promptly and release the resources it holds, which guarantees that all processes in the system can complete, and thus avoids deadlock.
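
The heart of the algorithm is a safety check: a request is granted only if, assuming it were granted, there still exists some order in which every process can obtain its remaining need and finish. A compact sketch of that check follows (the array shapes and example numbers are mine, purely for illustration):

    #include <stdbool.h>
    #include <stdio.h>

    #define NPROC 3
    #define NRES  2

    /* Safety check: can all processes still finish in some order?
     * work[] starts as the available resources; whenever a process's
     * remaining need fits in work[], pretend it runs to completion
     * and releases everything it holds. */
    static bool is_safe(int available[NRES],
                        int alloc[NPROC][NRES],
                        int need[NPROC][NRES])
    {
        int work[NRES];
        bool finished[NPROC] = { false };

        for (int r = 0; r < NRES; r++)
            work[r] = available[r];

        for (int done = 0; done < NPROC; ) {
            bool progress = false;
            for (int p = 0; p < NPROC; p++) {
                if (finished[p])
                    continue;
                bool can_run = true;
                for (int r = 0; r < NRES; r++)
                    if (need[p][r] > work[r]) { can_run = false; break; }
                if (can_run) {
                    for (int r = 0; r < NRES; r++)
                        work[r] += alloc[p][r];   /* release its holdings */
                    finished[p] = true;
                    progress = true;
                    done++;
                }
            }
            if (!progress)
                return false;   /* nobody can finish: unsafe state */
        }
        return true;
    }

    int main(void)
    {
        int available[NRES]    = { 1, 1 };
        int alloc[NPROC][NRES] = { {1, 0}, {0, 1}, {1, 1} };
        int need[NPROC][NRES]  = { {1, 1}, {1, 0}, {0, 0} };
        printf("state is %s\n", is_safe(available, alloc, need) ? "safe" : "unsafe");
        return 0;
    }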

Livelock

Livelock occurs when transaction T1 could use the resource but yields so that other transactions may use it first, and transaction T2 could also use the resource but likewise yields to others; both keep deferring politely, and neither ever gets to use the resource.

A simple way to avoid livelock is a first-come-first-served policy: when multiple transactions request locks on the same data object, the locking subsystem queues them in the order in which the requests arrived, and as soon as the lock on the object is released, it grants the lock to the first transaction in the queue.
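
One classic way to implement such a first-come-first-served discipline is a ticket lock; the sketch below uses C11 atomics and is a general technique, not something from the original post. Each arrival takes the next ticket number and waits until the lock is serving that number, so waiters are served strictly in arrival order and none can starve.

    #include <stdatomic.h>

    /* A ticket lock: strictly first-come-first-served. */
    typedef struct {
        atomic_uint next_ticket;   /* number handed to the next arrival */
        atomic_uint now_serving;   /* number currently allowed in */
    } ticket_lock_t;

    #define TICKET_LOCK_INITIALIZER { 0, 0 }

    static void ticket_lock(ticket_lock_t *l)
    {
        unsigned me = atomic_fetch_add(&l->next_ticket, 1);  /* take a number */
        while (atomic_load(&l->now_serving) != me)
            ;   /* spin until it is our turn */
    }

    static void ticket_unlock(ticket_lock_t *l)
    {
        atomic_fetch_add(&l->now_serving, 1);   /* admit the next waiter */
    }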

Starvation

Starvation works as follows: suppose transaction T1 holds a lock on data item R, and transaction T2 then requests a lock on R, so T2 waits. T3 also requests a lock on R; when T1 releases its lock, the system grants T3's request first, and T2 keeps waiting. Then T4 requests a lock on R, and when T3 releases its lock the system grants T4's request, and so on. T2 may end up waiting forever; that is starvation.

Priority Inversion

Priority inversion refers to the situation where a low-priority task holds a shared resource that a high-priority task needs. The high-priority task blocks for lack of the resource until the low-priority task releases it. Meanwhile the low-priority task gets little CPU time, and if a task of intermediate priority that does not need the shared resource becomes runnable, it will run ahead of both of them. Worse, if the high-priority task busy-waits for the resource instead of blocking, the resource may never become available at all, because the low-priority task cannot compete with the high-priority task for CPU time and so never runs long enough to release the resource. The net result is that the high-priority task cannot obtain the resource and cannot make progress.

Solutions:
(1) Priority ceiling: assign the critical section a high ceiling priority, and raise every process that enters the critical section to that priority. As long as any process that may try to enter the critical section has a base priority below the ceiling, priority inversion cannot occur.

(2) Priority inheritance: when a high-priority process waits for a resource held by a low-priority process, the low-priority process temporarily inherits the high priority, and returns to its original priority once the shared resource is released. The embedded operating system VxWorks uses this strategy.
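
On Linux, both strategies are available through the POSIX mutex protocol attributes. A minimal sketch, assuming a mutex named bus_mutex (error handling elided):

    #include <pthread.h>

    static pthread_mutex_t bus_mutex;

    /* Create a mutex that uses priority inheritance (strategy 2).
     * For the priority-ceiling strategy (1), use PTHREAD_PRIO_PROTECT
     * together with pthread_mutexattr_setprioceiling() instead. */
    static void init_pi_mutex(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&bus_mutex, &attr);
        pthread_mutexattr_destroy(&attr);
    }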

There is a famous anecdote here: in 1997, the United States' Mars Pathfinder (which ran VxWorks) hit exactly this priority-inversion problem. In short, the spacecraft had an information bus, with a high-priority bus task responsible for moving data on and off the bus; access to the bus was guarded by a mutex (the shared resource). There was also a low-priority, infrequently run meteorological task that needed to write its data onto the bus, and therefore also took the mutex. Finally, there was a medium-priority communication task with a fairly long run time. Normally the system ran without trouble, but one day, while the meteorological task held the mutex and was writing to the bus, an interrupt made the communication task ready, and it preempted the low-priority meteorological task. As luck would have it, at that moment the high-priority bus task was waiting for the meteorological task to finish writing and return the mutex; but because the communication task had taken the CPU and ran for a long time, the meteorological task got no CPU time and could not release the mutex, so the high-priority bus task could not run. The bus task's failure to run on time was treated by Pathfinder as a serious error, and the whole system was rebooted. VxWorks supports priority inheritance, but the unlucky engineers had turned that option off.


(3) The third method is to disable interrupts: the critical section is protected by turning interrupts off. A system using this strategy has only two priorities: ordinary preemptible priority and interrupt-disabled priority. The former is the priority at which processes normally run; the latter is the priority in effect while running inside a critical section. Mars Pathfinder failed because the meteorological task, while inside the critical section, was preempted when an interrupt made the communication task runnable; had the critical section been protected by disabling interrupts, the problem could not have occurred.
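
In the Linux kernel the corresponding idiom is to take a spinlock with local interrupts disabled. The fragment below is kernel-side code, shown only as a sketch of the pattern; the lock and function names are illustrative:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(res_lock);

    /* The critical section runs with local interrupts off, so no interrupt
     * handler on this CPU can preempt it and race on the shared data. */
    static void update_shared_resource(void)
    {
        unsigned long flags;

        spin_lock_irqsave(&res_lock, flags);        /* disable IRQs, take lock */
        /* ... touch the shared data ... */
        spin_unlock_irqrestore(&res_lock, flags);   /* restore IRQ state */
    }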

Lock Convoys

A lock convoy is a performance-degradation problem caused by lock usage in a multithreaded, concurrent environment.

Lock convoys arise when multiple threads of the same priority frequently contend for the same lock. In general, a lock convoy does not stop application logic the way a deadlock or livelock does; the system or application keeps making forward progress. But because the threads contend for the lock so often, there are far more thread context switches than necessary, and the system runs much less efficiently. Moreover, if there are threads at the same priority that do not take part in the lock contention, those threads receive relatively more processor time, which makes the system's scheduling unfair.

This article explains how the lock-convoy problem arises.

Suppose a group of threads acquires a lock frequently, where frequently means acquiring the lock several times within a single scheduling time slice. This happens easily in practice, for example in a Windows application where a critical section (CRITICAL_SECTION) protects a shared variable or keeps a piece of code from being re-entered.

Suppose thread A holds the lock when a scheduling interrupt occurs: its time slice has run out, so the scheduler hands execution to the next thread, say thread B. Because the lock is held by A, thread B must give up execution when it tries to acquire the lock, even though its time slice is not exhausted. In the same way, every equal-priority thread competing for this lock blocks in turn. The scheduler comes back to thread A, which soon releases the lock. In the operating system, releasing a lock means that a thread waiting for it in the kernel becomes runnable; suppose thread B's acquire now succeeds. At this point, however, the kernel merely marks thread B as the lock's owner, while thread A continues to execute. Soon thread A tries to acquire the lock again, but because the lock is already marked as owned by B, A has to give up its time slice and hand control back to the scheduler. The scheduler can finally pick thread B and give it the processor. When thread B releases the lock, the next waiting thread becomes the owner, and so on back around to thread A, starting the next round of contention. All the while, these threads keep surrendering execution before they have used up a full time slice. [Figure: execution of three threads contending for one lock.]

Assume a thread acquires and releases the lock several times over a full time slice. Once it releases the lock, then as long as there is any contention for it, the thread cannot reacquire the lock within the rest of its current time slice; it can only run until its next acquire attempt. If, for example, a competing thread on average runs for a third of a time slice before it must acquire the lock again, then its effective execution time becomes a third of a time slice, and the system's scheduling granularity shrinks to a third of the original interval. In each round, every competing thread gets only about a third of a time slice, which triples the number of thread context switches.

Besides making the scheduling granularity finer, the other problem with lock convoys is that the scheduler's allocation of time becomes unfair. Suppose another thread X runs at the same priority but does not take part in the lock contention. Then in each round of contention, thread X gets a full time slice, while the competing threads each get about a third of one. One can argue that this unfairness is self-inflicted by the contending threads, but purely in terms of time allocation it is unfair between the threads that compete for the lock and those that do not. [Figure: difference in execution time between thread X and threads A, B, and C.]

As the description above shows, a lock convoy exists when the competing threads acquire the lock frequently and, when one thread releases the lock, its ownership passes directly into the hands of another waiting thread. In operating systems where threads of the same priority are scheduled FIFO, the threads competing for the same lock acquire it one after another in FIFO order. These conditions can be met in modern operating systems, including Windows.

Lock convoys are not a fatal problem, but they do occur in real systems. Sue Loh describes a lock convoy observed in Windows CE in her blog post [1]. She also discusses a reasonable way to alleviate it: have each thread first try to acquire the lock without blocking, and only block if the attempt fails.
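
Her mitigation can be sketched with POSIX threads (the retry count below is an arbitrary illustrative choice): spin briefly with a non-blocking trylock, yielding between attempts, and fall back to a blocking acquire only if the attempts fail, so the lock does not have to be handed off through the scheduler on every acquisition.

    #include <pthread.h>
    #include <sched.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* Convoy mitigation: try the lock without blocking first, so a lock
     * that was just released can be taken immediately instead of being
     * handed to a sleeping waiter via a context switch. */
    static void acquire(void)
    {
        for (int i = 0; i < 100; i++) {
            if (pthread_mutex_trylock(&lock) == 0)
                return;            /* got it without sleeping */
            sched_yield();         /* let the current holder finish */
        }
        pthread_mutex_lock(&lock); /* fall back to a blocking wait */
    }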

References:

[1] Sue Loh, "Lock convoys and how to recognize them", http://blogs.msdn.com/b/sloh/archive/2005/05/27/Lock-convoys-and-how-to-recognize-them.aspx, 2005.

[2] "Lock convoy", http://en.wikipedia.org/wiki/Lock_convoy
