"Turn" deeply explores the difference between mutex and semaphore (bottom)

Original URL: http://blog.chinaunix.net/uid-23769728-id-3173282.html

This blog post has dragged on for quite a while. Although this is the follow-up article, it is still not finished: the benchmarking work is in progress, and there are some problems I am still discussing with other people ... So I think that, in addition to this "concluding" article (which I have basically finished, though the final conclusions still need further discussion), there should also be a "real performance test" installment ... The ideal is nice, but there is simply never enough time. So let me say this: if this blog ends up unfinished, fellow netizens, please throw your bricks at me and bury me in the spring ...
======================================================================

The call chain for mutex_lock's slow path is:
mutex_lock -> __mutex_lock_slowpath -> __mutex_lock_common. All of the performance-oriented code is concentrated in __mutex_lock_common, which is a bit long, so let's take it apart and go through it piece by piece ...
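For context, the entry points of this chain look roughly like the following in kernels of that era (a sketch reconstructed from the ~2.6.39/3.x source for illustration; details may differ from the exact version discussed here). The fast path is a single atomic 1->0 transition on lock->count, and only when that fails do we fall into the slow path:

void __sched mutex_lock(struct mutex *lock)
{
	might_sleep();
	/*
	 * The locking fastpath is the 1->0 transition from
	 * 'unlocked' into 'locked' state.
	 */
	__mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
	mutex_set_owner(lock);
}

static __used noinline void __sched
__mutex_lock_slowpath(atomic_t *lock_count)
{
	struct mutex *lock = container_of(lock_count, struct mutex, count);

	/* uninterruptible sleep, no lockdep nesting, caller's return address */
	__mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
}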

The basic premise of mutex_lock's slow path is that the process holding a mutex almost always releases it within a very short time. Based on this, the slow-path part of mutex_lock tries hard to avoid going to sleep: it spins briefly, waiting for the process that owns the mutex to release it. The main structure of __mutex_lock_common is two for loops, into which the logic for checking whether the lock can be acquired again is inserted.


/*
 * Lock a mutex (possibly interruptible), slowpath:
 */
static inline int __sched
__mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
		    struct lockdep_map *nest_lock, unsigned long ip)
{
	struct task_struct *task = current;
First of all, if we get here it means the current process has just lost a fight over the mutex; the owner field of the mutex records the task_struct of the process that successfully acquired the lock ...
Kernel preemption is disabled here, because in the processing that follows we do not want to be preempted by another process: the number one priority is for the current process to acquire the lock. Even if a higher-priority process becomes runnable at this moment, it will not preempt the current process, because the current process will shortly either go to sleep (and give up the CPU anyway) or acquire the lock and re-enable preemption.
	preempt_disable();
This is a kernel configuration option, enabled in the current default kernel configuration:
#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
What follows is an important comment in the source code. Its core point is the real-world observation on which the mutex optimization rests: a process that acquires a mutex is extremely likely to release it within a very short time. So, unlike the semaphore implementation, when a mutex cannot be acquired on the first attempt, and the process that owns the lock is found to be running on another processor, and there are no other waiters (that is, the current process is the only one waiting for the lock), the current process tries to spin (busy-wait). With high probability this eliminates the overhead of two context switches and lets the process re-acquire the mutex it just lost the race for: a performance win. In theory, the time spent spinning for the lock should not exceed the cost of two context switches, otherwise this optimization is pointless.

Below is the first for loop. There are two break statements inside it, plus one place that returns 0 directly:
	for (;;) {
		struct task_struct *owner;

		/*
		 * If there's an owner, wait for it to either
		 * release the lock or go to sleep.
		 */
		owner = ACCESS_ONCE(lock->owner);
		if (owner && !mutex_spin_on_owner(lock, owner))
			break;

		if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
			lock_acquired(&lock->dep_map, ip);
			mutex_set_owner(lock);
			preempt_enable();
			return 0;
		}

		/*
		 * When there's no owner, we might have preempted between the
		 * owner acquiring the lock and setting the owner field. If
		 * we're an RT task that will live-lock because we won't let
		 * the owner complete.
		 */
		if (!owner && (need_resched() || rt_task(task)))
			break;

		/*
		 * The cpu_relax() call is a compiler barrier which forces
		 * everything in this loop to be re-loaded. We don't need
		 * memory barriers as we'll eventually observe the right
		 * values at the cost of a few extra spins.
		 */
		arch_mutex_cpu_relax();
	} /* end of the first for loop */
Let's start with the first break. In the if condition, the owner test is normally satisfied; if owner is NULL, the process that held the lock has most likely just released it, so the loop can be left almost immediately. The mutex_spin_on_owner() in the if condition is a very interesting function: it guarantees that its internal while loop will be broken out of within a short time, on the order of jiffies. The design here is quite clever. The function can jump out of its while loop at jiffy granularity through if (need_resched()), but the real performance benefit of the optimization lives in owner_running(): the process holding the lock is expected to release it within a very short time (necessarily below the jiffy level, perhaps at the microsecond level or lower). If the loop is exited through if (need_resched()) instead, the optimization has basically failed, and performance has in fact regressed: even on a HZ=1000 system, jiffy granularity is very coarse, while the context-switch overhead of a modern processor may be only a few microseconds or a few tens of microseconds. If a process spins for several jiffies just to get a mutex, the cost is out of all proportion; had the current process been put to sleep instead, other processes could have used the CPU, and one jiffy is worth the cost of hundreds of context switches, so there is no reason to begrudge the cost of two.

IBM's Paul, however, holds that if a process runs for several jiffies while holding a mutex before releasing it, that is probably a bug. I do not think that is right: a mutex is not a spinlock, and there should be no restriction on how long the code between mutex_lock and mutex_unlock takes to run.

Through the while loop in mutex_spin_on_owner(), the code closely watches the process that is running with the lock held. Once it jumps out of the while loop, either that process has released the lock (detected through owner_running()), or the lock holder has already been running for quite a long time (possibly several jiffies). Finally it checks lock->owner again: if it is not NULL (the owner has changed), then, as the comment in the source puts it, that "is a sign for heavy contention": before the current process even got its turn, the lock was snatched away by a third party, and in that case it is best to go to sleep, otherwise the spinning time alone would be worth the cost of a context switch ...
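For reference, owner_running() and mutex_spin_on_owner() look roughly like this in kernels of that era (a sketch based on the ~2.6.39/3.x source; details vary between versions):

static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
{
	if (lock->owner != owner)
		return false;

	/*
	 * Ensure the owner->on_cpu load happens after checking that
	 * lock->owner still matches owner; if it no longer matches, owner
	 * might point to freed memory, and if it does match, the
	 * rcu_read_lock() in the caller keeps the memory valid.
	 */
	barrier();

	return owner->on_cpu;
}

/*
 * Look out! "owner" is an entirely speculative pointer
 * access and not reliable.
 */
static noinline
int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
{
	rcu_read_lock();
	while (owner_running(lock, owner)) {
		if (need_resched())
			break;

		arch_mutex_cpu_relax();
	}
	rcu_read_unlock();

	/*
	 * We break out of the loop above on need_resched() and when the
	 * owner changed, which is a sign for heavy contention. Return
	 * success only when lock->owner is NULL.
	 */
	return lock->owner == NULL;
}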

The return 0 in the middle is the happy case: while the current process was spinning, the owner released the lock. This is the simplest outcome: the second attempt acquires the lock and the function returns directly.

The next break handles the case where the current process preempted the lock holder in the window between it acquiring the lock and setting the owner field, or where the current process is a real-time task; at that point we have to move on to the second half of the processing.

Before the second for loop, it is clear from the code that the designer makes the final preparations for the current process to go to sleep. (If the code gets as far as the second for loop, the optimization has in effect failed; from a performance standpoint this path is certainly no faster than semaphore's, and at best no worse, because the process has after all spun for a while and still ends up sleeping, whereas a semaphore never spins: if it fails to get the lock on the first attempt, it goes straight to sleep.) Still, before entering the second for loop there is one more atomic_xchg, mainly to follow up on the two breaks from the first for loop and see whether, with a bit of luck, the lock can be grabbed after all.

The code of the second for loop is essentially the same as the semaphore slow-path implementation, so we can see that the mutex optimization is concentrated in the first for loop, where there is a good chance that the lock will be re-acquired.
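For reference, the second half of __mutex_lock_common looks roughly like this (an abridged sketch based on the ~2.6.39/3.x source, with the debug and lockdep calls trimmed; details vary between versions):

	spin_lock_mutex(&lock->wait_lock, flags);

	/* add waiting tasks to the end of the waitqueue (FIFO): */
	list_add_tail(&waiter.list, &lock->wait_list);
	waiter.task = task;

	/* one last lucky attempt before settling in to sleep */
	if (atomic_xchg(&lock->count, -1) == 1)
		goto done;

	for (;;) {
		/*
		 * Try to take the lock again; we xchg the count to -1 so
		 * that whoever releases the lock knows it has to wake up
		 * the remaining waiters.
		 */
		if (atomic_xchg(&lock->count, -1) == 1)
			break;

		/* got a signal? give up (only in the interruptible case) */
		if (unlikely(signal_pending_state(state, task))) {
			mutex_remove_waiter(lock, &waiter,
					    task_thread_info(task));
			spin_unlock_mutex(&lock->wait_lock, flags);
			preempt_enable();
			return -EINTR;
		}
		__set_task_state(task, state);

		/* didn't get the lock, go to sleep: */
		spin_unlock_mutex(&lock->wait_lock, flags);
		preempt_enable_no_resched();
		schedule();
		preempt_disable();
		spin_lock_mutex(&lock->wait_lock, flags);
	}

done:
	/* got the lock: dequeue ourselves, record ownership and return 0 */
	mutex_remove_waiter(lock, &waiter, current_thread_info());
	mutex_set_owner(lock);

	/* set count back to 0 if there are no waiters left: */
	if (likely(list_empty(&lock->wait_list)))
		atomic_set(&lock->count, 0);

	spin_unlock_mutex(&lock->wait_lock, flags);
	preempt_enable();

	return 0;
}

This is exactly the shape of the semaphore slow path: queue up as a waiter, then loop between trying the lock and calling schedule().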

We can see that the mutex optimization follows a general principle of code optimization: concentrate the optimization on the hot spot (extended here to the high-probability path) of the whole execution. In practice, most of the time the code between mutex_lock and mutex_unlock is short, so the lock-holding process releases the lock very quickly (from a performance standpoint this can also serve as a general guideline for using mutexes). If the majority of processes in a system hold their mutexes for a long time between mutex_lock and mutex_unlock, I believe a mutex can actually degrade system performance relative to a semaphore: with high probability the mutex path first goes through a spin (however short) and then ends up sleeping anyway, whereas a semaphore goes straight to sleep without any spinning.
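As a hypothetical illustration of this guideline (the lock and data names below are invented for the example, not taken from the kernel or the original post), a short critical section like this is a good fit for a mutex, because the holder exits well within the cost of a context switch, which is exactly what the optimistic-spin path is betting on:

static DEFINE_MUTEX(stats_lock);	/* hypothetical lock */
static u64 stats_bytes;			/* hypothetical shared data */

void stats_account(u64 delta)
{
	mutex_lock(&stats_lock);
	stats_bytes += delta;		/* short critical section: holder releases quickly */
	mutex_unlock(&stats_lock);
}

/*
 * By contrast, if the region between mutex_lock() and mutex_unlock()
 * regularly does long work (I/O, large scans), waiters will spin for a
 * while and then sleep anyway, and a semaphore (or a redesign of the
 * locking) may serve better.
 */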

"Turn" deeply explores the difference between mutex and semaphore (bottom)
