Synchronization Mechanism in Linux-futex

Source: Internet
Author: User
Synchronization Mechanism in Linux (1) -- futex

Introduction
When configuring a 2.6 kernel, you will see "[*] Enable futex support" among the options, and some sources warn that "if you do not select this option, programs that use glibc may not run correctly". What is a futex, and what does it have to do with glibc?

1. What is futex?
Futex is short for "Fast Userspace muTEXes". It was designed jointly by Hubertus Franke, Matthew Kirkwood, Ingo Molnar, and Rusty Russell, all of them Linux experts. Ingo Molnar is probably the most familiar name, since he is known for the real-time work, the O(1) scheduler, and CFS.

The name translates literally as "fast user-space mutex". The design idea is not hard to understand. In traditional Unix systems, inter-process synchronization mechanisms such as System V IPC (semaphores, message queues), sockets, and flock() all operate on a kernel object. That kernel object is visible to every process being synchronized and provides the shared state and the atomic operations. To synchronize, a process must enter the kernel through a system call (such as semop()).
However, studies found that many synchronization operations are uncontended: between the time a process enters a critical section and the time it leaves, there is usually no other process trying to enter the same critical section or requesting the same synchronization variable. Even so, the process still has to trap into the kernel on entry to check whether anyone is competing with it, and again on exit to check whether any process is waiting on the same synchronization variable. These unnecessary system calls (kernel traps) add significant overhead.
Futex was created to solve this problem. It is a synchronization mechanism that combines user mode and kernel mode. First, the processes to be synchronized share a piece of memory through mmap(). The futex variable lives in this shared memory and is operated on atomically. When a process tries to enter or leave the critical section, it first checks the futex variable in shared memory. If there is no contention, it only modifies the futex variable and makes no system call. Only when the futex variable indicates contention does it make a system call to do the corresponding work (wait or wake up). In short, futex checks in user mode first, and if there is no contention it never traps into the kernel, which greatly improves efficiency under low contention. Linux has supported futex since 2.5.7.

2. futex system call
Futex is a hybrid of user mode and kernel mode, so the two parts must cooperate. Linux provides the sys_futex system call to support synchronization when processes contend.
Its prototype and system call number (240 is the number on 32-bit x86):
#include <linux/futex.h>
#include <sys/time.h>
int futex(int *uaddr, int op, int val, const struct timespec *timeout, int *uaddr2, int val3);
#define __NR_futex 240

Although the parameter list is long, usually only the first three are used. The timeout that follows is self-explanatory, and the remaining parameters can often be ignored.
uaddr is the address of the futex in user-mode shared memory; it holds an aligned integer counter.
op is the operation type. Five operations are defined; two are introduced here, and the rest are described in man futex.
FUTEX_WAIT: atomically checks whether the counter at uaddr equals val. If so, the process sleeps until a FUTEX_WAKE or a timeout. In other words, the process is placed on the wait queue associated with uaddr.
FUTEX_WAKE: wakes up at most val processes waiting on uaddr.
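glibc does not export a futex() function, so programs reach the system call through syscall(2). The following is a minimal sketch of the two operations just described, assuming Linux and the headers <linux/futex.h> and <sys/syscall.h>. The helper names futex_wait and futex_wake are hypothetical, chosen only for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The kernel treats the futex word as a 32-bit integer. */
static_assert(sizeof(int) == 4, "futex words are 32 bits");

/* Hypothetical helper: sleep only if *uaddr still equals val (no timeout).
   Returns -1 with errno == EAGAIN when *uaddr != val at the time of the call. */
static long futex_wait(int *uaddr, int val)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);
}

/* Hypothetical helper: wake at most n waiters on uaddr.
   Returns the number of waiters actually woken. */
static long futex_wake(int *uaddr, int n)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAKE, n, NULL, NULL, 0);
}
```

Note that both helpers always enter the kernel. On their own they give none of the performance benefit; that comes from the user-mode check that decides whether to call them at all.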

As you can see, FUTEX_WAIT and FUTEX_WAKE only suspend or wake up processes, work that can of course only be done in kernel mode. Some people try to use the futex system call alone to implement process synchronization, hoping to gain futex's performance advantage. That is a mistake: the futex synchronization mechanism must be distinguished from the futex system call. The mechanism also includes the user-mode operations, which are covered in the next section.

3. futex synchronization mechanism
All futex synchronization operations start in user space. First, create a futex synchronization variable, that is, an integer counter located in shared memory.
When a process tries to acquire the lock or enter the critical section, it performs a "down" operation on the futex, that is, it atomically decrements the futex variable by 1. If the variable becomes 0, there is no contention and the process continues as usual. If the variable becomes negative, there is contention, and the process must call the futex system call with FUTEX_WAIT to put itself to sleep.
When the process releases the lock or leaves the critical section, it performs an "up" operation, atomically adding 1 to the futex variable. If the variable changes from 0 to 1, there is no contention and the process continues as usual. If the variable was negative before the addition, there is contention, and the process must call the futex system call with FUTEX_WAKE to wake up one or more waiting processes.
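The counter protocol above is subtle to get exactly right, so the sketch below swaps in a different, widely described three-state variant (0 = unlocked, 1 = locked, 2 = locked with possible waiters), following the second mutex in Ulrich Drepper's "Futexes Are Tricky". It is not the down/up counter from the text, but it shows the same idea: the uncontended path touches only user-space memory, and the system call happens only under contention. All function names here are hypothetical, and the code assumes Linux with C11 atomics.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: atomic_int has the layout of the plain 32-bit int the kernel expects. */
static_assert(sizeof(atomic_int) == sizeof(int), "atomic_int must be a plain int");

static long futex_op(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* States: 0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */
static void futex_lock(atomic_int *f)
{
    int c = 0;
    if (atomic_compare_exchange_strong(f, &c, 1))
        return;                        /* fast path: no system call */
    if (c != 2)
        c = atomic_exchange(f, 2);     /* announce contention */
    while (c != 0) {
        futex_op(f, FUTEX_WAIT, 2);    /* sleeps only while *f == 2 */
        c = atomic_exchange(f, 2);
    }
}

static void futex_unlock(atomic_int *f)
{
    if (atomic_fetch_sub(f, 1) != 1) { /* state was 2: someone may be asleep */
        atomic_store(f, 0);
        futex_op(f, FUTEX_WAKE, 1);
    }                                  /* state was 1: no system call */
}

/* Hypothetical demo: four threads increment a plain counter under the lock. */
static atomic_int demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < 100000; i++) {
        futex_lock(&demo_lock);
        demo_counter++;
        futex_unlock(&demo_lock);
    }
    return NULL;
}

static long run_demo(void)
{
    pthread_t t[4];
    demo_counter = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, demo_worker, NULL);
        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return demo_counter;
}
```

Calling run_demo() from main and running the program under strace should show futex calls only when the threads actually collide. With a single worker thread, the lock and unlock paths make none.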

The atomic increment and decrement here are usually implemented with CAS (compare and swap), whose implementation is platform-specific. The basic form is CAS(addr, old, new): if the value stored at addr equals old, replace it with new. On x86 there is a dedicated instruction for this: cmpxchg.
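In portable C, the same operation is available through C11 <stdatomic.h>. The sketch below builds an atomic decrement out of a CAS retry loop. The names cas and atomic_dec_cas are hypothetical. On x86, compilers typically implement atomic_compare_exchange_strong with a lock cmpxchg instruction, but that is a property of the compiler and target, not something the code depends on.

```c
#include <assert.h>
#include <stdatomic.h>

/* A CAS loop only avoids a hidden lock when int atomics are lock-free. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "int atomics must be lock-free");

/* CAS(addr, old, new): store new_val only if *addr == old.
   Returns 1 if the swap happened, 0 otherwise. */
static int cas(atomic_int *addr, int old, int new_val)
{
    return atomic_compare_exchange_strong(addr, &old, new_val);
}

/* Atomic decrement built from CAS: retry until no other thread
   changed the value between our load and our swap. Returns the new value. */
static int atomic_dec_cas(atomic_int *addr)
{
    int old;
    do {
        old = atomic_load(addr);
    } while (!cas(addr, old, old - 1));
    return old - 1;
}
```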

As you can see, a futex operation starts in user mode and is completed by user mode and kernel mode working together.

4. Process/thread synchronization with futex
Both processes and threads can use futex for synchronization.
For threads the situation is simple. Threads share one virtual address space, so a virtual address uniquely identifies a futex variable; that is, all threads access the futex variable through the same virtual address.
For processes the situation is more complex. Each process has its own virtual address space, so the futex variable is usable only after the processes share a region of memory through mmap(). Each process may access the futex through a different virtual address. It is enough that the kernel knows these virtual addresses all map to the same physical memory, and it uses that physical address to identify the futex variable uniquely.
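The process case can be sketched as follows: a parent and a forked child share one anonymous MAP_SHARED page, and the parent sleeps on a futex word in it until the child changes the value and wakes it. This is only an illustration under the assumption of Linux and C11 atomics; the function name shared_futex_demo is hypothetical. It uses the plain FUTEX_WAIT and FUTEX_WAKE operations, which work across processes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: atomic_int has the layout of the plain 32-bit int the kernel expects. */
static_assert(sizeof(atomic_int) == sizeof(int), "atomic_int must be a plain int");

/* Hypothetical demo: returns the value the parent saw after waking (1 on success). */
static int shared_futex_demo(void)
{
    /* One shared anonymous page: after fork(), parent and child
       map the same physical memory. */
    atomic_int *f = mmap(NULL, sizeof *f, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (f == MAP_FAILED)
        return -1;
    atomic_store(f, 0);

    pid_t pid = fork();
    if (pid < 0) {
        munmap(f, sizeof *f);
        return -1;
    }
    if (pid == 0) {
        atomic_store(f, 1);                                   /* change the state first */
        syscall(SYS_futex, f, FUTEX_WAKE, 1, NULL, NULL, 0);  /* then wake the parent */
        _exit(0);
    }

    /* Sleep only while the value is still 0, and re-check after every return:
       FUTEX_WAIT fails immediately if the child already stored 1. */
    while (atomic_load(f) == 0)
        syscall(SYS_futex, f, FUTEX_WAIT, 0, NULL, NULL, 0);

    int seen = atomic_load(f);
    waitpid(pid, NULL, 0);
    munmap(f, sizeof *f);
    return seen;
}
```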


Summary:
1. Properties of a futex variable: 1) it lives in shared user-space memory; 2) it is a 32-bit integer; 3) operations on it are atomic.
2. futex performs better than traditional synchronization mechanisms in programs with low contention.
3. Do not use the futex system call directly.
4. The futex synchronization mechanism can be used between processes as well as between threads.

Thread Synchronization Mechanism in Linux (2) -- in glibc

Synchronization is an unavoidable problem in multi-threaded development on Linux. The POSIX standard defines three thread synchronization mechanisms: mutexes, condition variables, and POSIX semaphores. NPTL essentially implements POSIX, and glibc uses NPTL as its thread library, so glibc includes implementations of all three (and of others as well, such as the read/write locks described in APUE).

Examples of commonly used thread synchronization methods in glibc:

Semaphore
Variable definition: sem_t sem;
Initialization: sem_init(&sem, 0, 1);
Enter and lock: sem_wait(&sem);
Exit and unlock: sem_post(&sem);

Mutex
Variable definition: pthread_mutex_t mut;
Initialization: pthread_mutex_init(&mut, NULL);
Enter and lock: pthread_mutex_lock(&mut);
Exit and unlock: pthread_mutex_unlock(&mut);

What is the relationship between these synchronization functions and futex? Let's take a look, using the semaphore as an example.
When entering the critical section, sem_wait(sem_t *sem) is executed. The implementation of sem_wait is as follows:
int sem_wait(sem_t *sem)
{
    int *futex = (int *) sem;
    if (atomic_decrement_if_positive(futex) > 0)
        return 0;
    int err = lll_futex_wait(futex, 0);
    return -1;
}
The semantics of atomic_decrement_if_positive() are: if the value passed in is positive, atomically decrement it by one and return immediately. By the semantics of a semaphore, a positive value means there is no contention, so the semaphore is simply decremented and the function returns.
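To make these semantics concrete, here is a sketch written with C11 atomics. It is not glibc's code. The name decrement_if_positive is hypothetical, and returning the value seen before the attempt is an assumption chosen to match the "> 0" test in the excerpt above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Sketch of the semantics only: decrement *p if it is positive.
   Returns the value observed before the attempt, so a result > 0
   means the decrement happened and the caller holds the semaphore. */
static int decrement_if_positive(atomic_int *p)
{
    assert(p != NULL);
    int old = atomic_load(p);
    /* A failed CAS reloads old with the current value; retry while positive. */
    while (old > 0 && !atomic_compare_exchange_weak(p, &old, old - 1))
        ;
    return old;
}
```

The decrement never takes the value below zero, which is why sem_wait can use zero versus non-zero as its only test.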

If the value passed in is not positive, there is contention, and lll_futex_wait(futex, 0) is called. lll_futex_wait is a macro:
#define lll_futex_wait(futex, val) \
  ({ \
    ... \
    __asm __volatile (LLL_EBX_LOAD \
                      LLL_ENTER_KERNEL \
                      LLL_EBX_LOAD \
                      : "=a" (__status) \
                      : "0" (SYS_futex), LLL_EBX_REG (futex), "S" (0), \
                        "c" (FUTEX_WAIT), "d" (_val), \
                        "i" (offsetof (tcbhead_t, sysinfo)) \
                      : "memory"); \
    ... \
  })
As you can see, when there is contention, sem_wait makes the sys_futex system call with FUTEX_WAIT and val = 0 to put the current thread to sleep.

This example shows that saying semaphores are implemented with futex does not merely mean they use the futex system call (to repeat: using the futex system call alone is not enough). They are built on the futex mechanism, which includes both the user-mode operations and the kernel-mode operations. The same is true of the other glibc synchronization mechanisms, all of which use futex as their foundation. That is why the futex manual says: "most programmers will in fact not be using futexes directly but instead rely on system libraries built on them, such as the NPTL pthreads implementation". And so, if you do not enable futex support when building the kernel, "programs that use glibc may not run correctly".

Summary:
1. The thread synchronization methods provided by glibc, such as the familiar mutex and semaphore, are mostly built on futex. Except in special cases, you do not need to implement your own futex synchronization primitives.
2. As the futex manual suggests, what you need to do is use the synchronization methods provided by glibc correctly, and be aware while using them that they rely on the futex mechanism and the Linux kernel to complete synchronization.

Thread Synchronization Mechanism in Linux (3) -- Practice

In glibc (NPTL), thread synchronization methods such as mutex and semaphore are all built on futex. So how do they behave in practice, and what problems come up?
Let's look at an example that synchronizes with a semaphore.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

sem_t sem_a;
void *task1(void *arg);

int main(void) {
    int ret = 0;
    pthread_t thrd1;
    sem_init(&sem_a, 0, 1);
    ret = pthread_create(&thrd1, NULL, task1, NULL); // create a child thread
    pthread_join(thrd1, NULL); // wait until the child thread ends
    return ret;
}

void *task1(void *arg)
{
    int sval = 0;
    (void) arg; // unused
    sem_wait(&sem_a); // hold the semaphore
    sleep(5); // do nothing
    sem_getvalue(&sem_a, &sval);
    printf("sem value = %d\n", sval);
    sem_post(&sem_a); // release the semaphore
    return NULL;
}

The program is very simple. The main thread (the thread that runs main) creates a thread and joins it to wait for it to finish. The child thread takes the semaphore, sleeps for a while, releases the semaphore, and exits.
Only one thread in this code uses the semaphore, so there is no contention between threads. According to futex theory, with no contention all lock operations should complete in user mode without making system calls or trapping into the kernel. We use strace to trace the system calls made while the program runs:
...
20533 futex(0xb7db1be8, FUTEX_WAIT, 20534, NULL <unfinished ...>
20534 futex(0x8049870, FUTEX_WAKE, 1) = 0
20533 <... futex resumed> ) = 0
...
20533 is the ID of the main thread and 20534 is the ID of its child thread. To our surprise, the program still makes two futex system calls. Let's analyze what causes them.

1. The unexpected "sem_post()"
20534 futex(0x8049870, FUTEX_WAKE, 1) = 0
The child thread still makes a FUTEX_WAKE system call: in sem_post(&sem_a); it asks the kernel to wake one thread waiting on sem_a. The return value of 0 shows that no thread was waiting on sem_a (naturally, since only this one thread uses sem_a), so this futex system call was wasted. This seems to contradict futex theory. Let's look at the implementation of sem_post.
int sem_post(sem_t *sem)
{
    int *futex = (int *) sem;
    int nr = atomic_increment_val(futex);
    int err = lll_futex_wake(futex, nr);
    return 0;
}
As you can see, when glibc implements sem_post, after atomically adding 1 to the futex it calls lll_futex_wake() regardless of the resulting value, that is, it makes the futex(FUTEX_WAKE) system call.
In the second part (see the previous article) we analyzed sem_wait: when there is no contention, it makes no futex call. That still holds. In sem_post, however, sys_futex() is called whether or not there is contention. Why? I think it has to be understood together with the semantics of a semaphore. By those semantics, sem_wait() means: "suspend the current thread until the value of the semaphore is non-zero, then atomically decrement the semaphore count."
So a semaphore uses zero versus non-zero to decide whether a thread blocks. No matter how many threads are competing for the lock, as long as the semaphore is held its value is 0. Therefore, when a thread leaves the critical section, calls sem_post(), and releases the semaphore, it changes the value from 0 to 1 but cannot tell whether any thread is blocked on the semaphore, so it has to call futex(uaddr, FUTEX_WAKE, 1) to try to wake one up. Conversely, in sem_wait(), if the semaphore goes from 1 to 0 there is no contention, so no futex system call is needed.
Suppose we set this semantics aside and allowed the semaphore value to go negative; then sem_post could follow the futex mechanism as well.
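Another way to give sem_post the missing information, without letting the value go negative, is to keep a separate count of sleeping waiters next to the semaphore value. The sketch below illustrates that idea only; it is not the glibc code shown above. The type struct wsem, the wsem_* functions, and the wsem_syscalls tally are all hypothetical names, and the tally exists only so the example can show how many futex calls were made.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: atomic_int has the layout of the plain 32-bit int the kernel expects. */
static_assert(sizeof(atomic_int) == sizeof(int), "atomic_int must be a plain int");

/* Hypothetical type: the count plus the number of threads that may be asleep. */
struct wsem {
    atomic_int value;
    atomic_int nwaiters;
};

static atomic_int wsem_syscalls;   /* demo-only tally of futex calls issued */

static long wsem_futex(atomic_int *uaddr, int op, int val)
{
    atomic_fetch_add(&wsem_syscalls, 1);
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void wsem_wait(struct wsem *s)
{
    for (;;) {
        int v = atomic_load(&s->value);
        while (v > 0)
            if (atomic_compare_exchange_weak(&s->value, &v, v - 1))
                return;                        /* fast path: no system call */
        atomic_fetch_add(&s->nwaiters, 1);     /* announce before sleeping */
        wsem_futex(&s->value, FUTEX_WAIT, 0);  /* sleeps only if value is still 0 */
        atomic_fetch_sub(&s->nwaiters, 1);
    }
}

static void wsem_post(struct wsem *s)
{
    atomic_fetch_add(&s->value, 1);
    if (atomic_load(&s->nwaiters) > 0)         /* wake only if someone announced */
        wsem_futex(&s->value, FUTEX_WAKE, 1);
}
```

With a single thread, a wsem_wait followed by a wsem_post leaves wsem_syscalls at 0. If a poster misses a waiter that has not announced itself yet, that waiter's FUTEX_WAIT sees a non-zero value, returns immediately, and the loop retries the decrement, so no wake-up is lost.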

2. "pthread_join()"
Where does the other futex system call come from? From pthread_join().
In glibc, pthread_join is also implemented with the futex system call. pthread_join(thrd1, NULL); corresponds to
20533 futex(0xb7db1be8, FUTEX_WAIT, 20534, NULL <unfinished ...>
To wait for the child thread (ID 20534) to end, the main thread calls futex(FUTEX_WAIT) with the val parameter set to the ID of the thread it is waiting for (20534), and sleeps on the futex variable at address 0xb7db1be8. When the child thread ends, the system wakes up the main thread, so at
20533 <... futex resumed> ) = 0
the main thread resumes running.
Note that if the thread being joined has already finished when pthread_join() is called, futex() is not called to block the current thread.

3. More competition
We change the program above slightly. In the main function:
int main(void) {
    ...
    sem_init(&sem_a, 0, 1);
    ret = pthread_create(&thrd1, NULL, task1, NULL);
    ret = pthread_create(&thrd2, NULL, task1, NULL);
    ret = pthread_create(&thrd3, NULL, task1, NULL);
    ret = pthread_create(&thrd4, NULL, task1, NULL);
    pthread_join(thrd1, NULL);
    pthread_join(thrd2, NULL);
    pthread_join(thrd3, NULL);
    pthread_join(thrd4, NULL);
    ...
}

In this way, more threads compete for sem_a. Let's analyze how many futex system calls such a program makes.
1) sem_wait()
The first thread makes no futex call; the other three threads block, so FUTEX_WAIT is called three times.
2) sem_post()
Every thread calls futex in sem_post, giving four futex(FUTEX_WAKE) calls.
3) pthread_join()
Do not forget pthread_join(). We join thread1, thread2, thread3, and thread4 in that order, but thread scheduling is not deterministic. If thread1 happens to finish last, only the join on thread1 makes a futex call. So pthread_join() causes between one and four futex calls. (Four is not guaranteed, but it is the most common case.)

Therefore, this program makes at most 3 + 4 + 4 = 11 futex system calls. We verify this with strace:
19710 futex(0xb7df1be8, FUTEX_WAIT, 19711, NULL <unfinished ...>
19712 futex(0x8049910, FUTEX_WAIT, 0, NULL <unfinished ...>
19713 futex(0x8049910, FUTEX_WAIT, 0, NULL <unfinished ...>
19714 futex(0x8049910, FUTEX_WAIT, 0, NULL <unfinished ...>
19711 futex(0x8049910, FUTEX_WAKE, 1 <unfinished ...>
19710 futex(0xb75f0be8, FUTEX_WAIT, 19712, NULL <unfinished ...>
19712 futex(0x8049910, FUTEX_WAKE, 1 <unfinished ...>
19710 futex(0xb6defbe8, FUTEX_WAIT, 19713, NULL <unfinished ...>
19713 futex(0x8049910, FUTEX_WAKE, 1 <unfinished ...>
19710 futex(0xb65eebe8, FUTEX_WAIT, 19714, NULL <unfinished ...>
19714 futex(0x8049910, FUTEX_WAKE, 1) = 0
(19710 is the main thread; 19711, 19712, 19713, and 19714 are the four child threads)

4. More questions
Is that the end of it? Try replacing the semaphore with a mutex. You will find that when there is no contention from start to finish, the mutex fully follows the futex mechanism: neither lock nor unlock makes a futex system call. When there is contention, the first pthread_mutex_lock makes no futex call, which looks normal. However, on the last pthread_mutex_unlock, even though no thread is waiting for the mutex, futex(FUTEX_WAKE) is still called. Why? Discussion is welcome!

Summary:
1. Although semaphore, mutex, and the other synchronization methods are built on the futex synchronization mechanism, their semantics limit them, and they do not follow the original futex design completely.
2. pthread_join() and some other functions are also implemented by calling futex.
3. Different synchronization methods have different semantics and different performance characteristics, and suit different scenarios. We need to know both what they have in common and how they differ. Only then can we understand multi-threaded scenarios well and write high-quality multi-threaded programs.

Reprinted from:

Http://blog.csdn.net/Javadino/archive/2008/09/06/2891385.aspx

Http://blog.csdn.net/Javadino/archive/2008/09/06/2891388.aspx

Http://blog.csdn.net/Javadino/archive/2008/09/06/2891399.aspx
