[Kernel synchronization] An analysis of the Linux kernel's synchronization mechanisms

Tags: mutex, semaphore

Transferred from: http://blog.csdn.net/fzubbsc/article/details/37736683?utm_source=tuicool&utm_medium=referral

I came across the concept of synchronization a long time ago, but it always remained a blur because I had never studied it in depth. I recently found time to read the relevant chapters of two books, a standard Linux kernel tutorial and "Deep Understanding of the Linux Device Driver Kernel Mechanism". Having just finished them, I will summarize the relevant content here. To figure out what is going on, you must understand the following three questions:

    • What are mutual exclusion and synchronization?
    • Why is a synchronization mechanism needed?
    • What methods does the Linux kernel provide to implement mutual exclusion and synchronization mechanisms?

1. What are mutual exclusion and synchronization? (an informal understanding)

    • Mutual exclusion and synchronization mechanisms are the mechanisms in a computer system that control how processes access certain specific resources.
    • Synchronization refers to the mechanisms used to make multiple processes access certain system resources according to given rules or in a given order.
    • Mutual exclusion refers to the mechanisms used to ensure that certain system resources can be accessed by only one process at any given time. Mutual exclusion is a special case of synchronization.
    • The synchronization mechanisms are essential for the Linux operating system to run efficiently and stably.

2. Why does Linux need synchronization mechanisms?

When an operating system introduces the concept of a process and processes become the scheduling entities, the system gains the ability to execute multiple processes concurrently, but this also leads to competition for, and sharing of, resources among the processes in the system. In addition, because of interrupts, the exception mechanism, and kernel preemption, these kernel execution paths run interleaved with one another. If the necessary synchronization measures are not taken for such interleaved kernel paths, key data structures may be modified in an interleaved fashion, leaving them in an inconsistent state and eventually crashing the system. Therefore, to ensure that the system runs efficiently and stably, Linux must adopt synchronization mechanisms.

3. What synchronization mechanisms does the Linux kernel provide?

Before studying the Linux kernel's synchronization mechanisms, you need to understand two prerequisite concepts: critical resources and concurrency sources.
In a Linux system, the code fragments through which we access shared resources are called critical sections. Whatever causes multiple processes to access the same shared resource concurrently is called a concurrency source.

The main concurrency sources in a Linux system are:

    • Interrupt handling: for example, a process is interrupted while accessing a critical resource, and the interrupt handler that then runs accesses the same critical resource. Although this is not concurrency in the strict sense, it still creates a race for the resource.
    • Kernel preemption: for example, kernel preemption occurs while a process is accessing a critical resource, and the higher-priority process that takes over also accesses the same critical resource, causing a race between the two processes.
    • Multiprocessor concurrency: on multiprocessor systems there is concurrency between processes in the strict sense. Each processor schedules processes on its own, so multiple processes genuinely run at the same time.

As mentioned earlier, the purpose of the synchronization mechanisms is to avoid concurrent access to the same critical resource by multiple processes.

Linux kernel synchronization mechanisms:

(1) Disabling interrupts (uniprocessor, non-preemptive systems)

As can be seen from the above, in a uniprocessor non-preemptive system the only concurrency source is interrupt handling. Therefore, disabling interrupts while accessing a critical resource and re-enabling them afterwards is enough to eliminate that concurrency. Linux provides two macros, local_irq_enable and local_irq_disable, to enable and disable interrupts. When protecting a critical section with this pair of macros, make sure the code between them does not execute for too long, otherwise system performance suffers because external interrupts cannot be serviced in time.

(2) Spin lock

Application background: the spin lock was originally designed to protect shared data in multiprocessor systems.

Spin lock design idea: a global variable v shared between the processors represents the lock; v = 1 means locked and v = 0 means unlocked. The spin lock is designed for multiprocessors and is a busy-waiting mechanism that allows only one execution path to hold the lock at a time. When code on processor A wants to enter the critical section, it first reads v. If v != 0, the lock is held, meaning code on another processor is accessing the shared data, and processor A busy-waits (spins). If v == 0, no code on any other processor is in the critical section; processor A sets v to 1, enters the critical section, and sets v back to 0 when the access is complete and it leaves the critical section.

Note: it is essential that processor A's "read v, test v, update v" sequence is an atomic operation. An atomic operation is one that, once started, cannot be interrupted until it completes.
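This "read v, test v, update v" step can be tried out in user space with C11 atomics, whose atomic_flag_test_and_set performs exactly such an indivisible read-and-write. A minimal sketch, with invented toy_spin_* names (the kernel's real implementation is architecture-specific):

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag locked;              /* set = locked, clear = unlocked */
} toy_spinlock_t;

static void toy_spin_init(toy_spinlock_t *l)
{
    atomic_flag_clear(&l->locked);
}

/* test_and_set returns the previous value and writes "set" in one
 * indivisible step -- the atomic "read v, test v, update v" above. */
static void toy_spin_lock(toy_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;                            /* busy-wait (spin) */
}

static bool toy_spin_trylock(toy_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

While one path holds the flag, any other path's test_and_set keeps returning "already set", so it spins; this is the whole busy-waiting idea in a few lines.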

Classification of spin locks:

2.1 Ordinary spin locks

The ordinary spin lock is represented by the data structure spinlock_t, defined in the file src/include/linux/spinlock_types.h as follows:

typedef struct {
        raw_spinlock_t raw_lock;
#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP)
        unsigned int break_lock;
#endif
} spinlock_t;

Member raw_lock: this member is the core of the spin lock type; after expansion it is essentially a volatile unsigned int variable. The actual locking process revolves around it, and its definition depends on the kernel option CONFIG_SMP (support for symmetric multiprocessing).

Member break_lock: dependent on the kernel options CONFIG_SMP and CONFIG_PREEMPT (support for kernel preemption), this member indicates whether the spin lock is currently being contended by multiple kernel execution paths at once.

On a uniprocessor system: when CONFIG_SMP is not selected, the type raw_spinlock_t degenerates into an empty struct, and the interface functions degenerate accordingly: the lock function spin_lock() and the unlock function spin_unlock() merely disable and re-enable kernel preemption.

On multiprocessor systems: when CONFIG_SMP is selected, the data type of the core member raw_lock is raw_spinlock_t, defined in the file src/include/asm-i386/spinlock_types.h as follows:

typedef struct { volatile unsigned int slock; } raw_spinlock_t;

As the definition shows, the structure contains a single variable used as a counter. When the member slock has the value 1, the spin lock is unlocked and available; otherwise it is locked and unavailable.

Interface functions for ordinary spin locks:

spin_lock_init(lock)   // initialize the spin lock to the unlocked state

spin_lock(lock)        // acquire the spin lock; returns on success, otherwise loops waiting for the lock to become free

spin_unlock(lock)      // release the spin lock, resetting it to the unlocked state

spin_is_locked(lock)   // test whether the lock is currently held; returns 1 if it is

spin_trylock(lock)     // try to acquire the spin lock; returns 0 on failure, 1 on success

spin_unlock_wait(lock) // loop until the spin lock becomes free

spin_can_lock(lock)    // test whether the spin lock is free

Summary of ordinary spin locks: the spin lock is designed for multiprocessor systems. On a uniprocessor system, locking and unlocking degenerate into disabling and enabling kernel preemption. On a multiprocessor system, locking first disables kernel preemption and then tries to take the lock, looping (spinning) on failure until the lock is released; unlocking first releases the lock and then re-enables kernel preemption.

2.2 Spin lock variants

The spin_lock discussed above solves the concurrency problem between multiple processors well. But consider the following scenario: process A, running on some processor, operates on a global linked list g_list, so it calls spin_lock to acquire the lock before the operation and then enters the critical section. If an external hardware interrupt arrives on that processor while process A is in the critical section, the system must suspend process A and enter the interrupt handler. If the interrupt handler also operates on g_list, it too must acquire the lock before accessing this shared resource. When the interrupt handler calls spin_lock, it spins because the lock is already held by process A. Now there is a serious problem: the interrupt handler cannot return because it cannot get the lock, and because the interrupt handler cannot return, process A cannot resume, finish, and release the lock. The result is a system deadlock. In other words, spin_lock is flawed in the presence of interrupt sources, which is why its variants were introduced:

spin_lock_irq(lock)

spin_unlock_irq(lock)

Compared with the ordinary spin lock, these additionally disable interrupts before locking and re-enable interrupts after unlocking.

2.3 Read-write spin locks (rwlock)

Application background: the ordinary spin_lock makes no distinction between the kinds of access performed in the critical section; the lock is taken whenever the shared resource is accessed. Sometimes, however, critical-section code only reads the shared data and never modifies it. If spin_lock() is used, only one process can read the shared data at any given time; if the system performs many read operations on such shared resources, spin_lock clearly degrades performance. Hence the concept of the read-write spin lock rwlock. Compared with the ordinary spin lock, a read-write spin lock allows several reader processes to enter the critical section at the same time and access the same critical resource concurrently, improving the system's concurrency and throughput.

The read-write spin lock is represented by the data structure rwlock_t, defined in src/include/linux/spinlock_types.h.

Interface functions for read-write spin locks:

DEFINE_RWLOCK(lock)   // declare the read-write spin lock lock and initialize it to the unlocked state

write_lock(lock)      // acquire the lock in write mode; returns on success, otherwise loops waiting

write_unlock(lock)    // release a write-mode lock, resetting it to the unlocked state

read_lock(lock)       // acquire the lock in read mode; returns on success, otherwise loops waiting

read_unlock(lock)     // release a read-mode lock, resetting it to the unlocked state

How read-write spin locks work:

A read-write spin lock rwlock allows any number of readers into the critical section at the same time, but writers must have mutually exclusive access. A process that wants to read first checks whether any process is writing; if so, it spins (busy-waits), otherwise it gets the lock. A process that wants to write first checks whether any process is reading or writing; if so, it spins, otherwise it gets the lock. The rules for read-write spin locks are therefore:

(1) If a process is currently writing, no other process may read or write.

(2) If a process is currently reading, other processes may read but not write.

2.4 Sequential spin locks (seqlock)

Application background: the sequential spin lock addresses a problem of the read-write spin lock: with a large number of reader processes, a writer may be unable to take the lock for a long time and starve. The main idea is to raise the writer's priority: when a write-lock request arrives it is satisfied immediately, regardless of whether readers are accessing the critical resource at that moment. However, a new write-lock request does not, and cannot, preempt a write lock already held by another writer.

Sequential lock design idea: reads of the shared data take no lock; writes take a lock. To let a reader detect that the shared data was updated by a writer while it was being read, an integer variable, the sequence counter, is introduced between readers and writers. The reader reads the sequence value before it starts reading and reads it again afterwards; if the two values differ, the data was updated during the read, so the read has failed and must be retried. The writer, accordingly, updates the sequence value when it begins (and again when it finishes) writing.

The sequential spin lock is represented by the data structure seqlock_t, defined in src/include/linux/seqlock.h.

Interface functions for sequential spin locks:

seqlock_init(seqlock)              // initialize to the unlocked state

read_seqbegin(), read_seqretry()   // guarantee data consistency on the read side

write_seqlock(lock)                // acquire the sequential lock for writing

write_sequnlock(lock)              // release the write lock, resetting the sequential lock to the unlocked state

How sequential spin locks work: a writer is never blocked by readers; that is, a writer may immediately lock a critical resource protected by a sequential spin lock and perform its update, without waiting for readers to finish their read access. Writers, however, still exclude one another: if one writer is writing, any other writer must loop and wait until the previous writer releases the spin lock. A sequential spin lock requires that the protected shared resource contain no pointers, because a writer may invalidate a pointer that a reader is just about to dereference, causing an error. And if a writer runs during a read operation, the reader must re-read the data to ensure that what it obtained is consistent.
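The sequence-counter idea can be sketched in user space with C11 atomics. This is a single-writer toy (names invented); the kernel's seqlock additionally serializes writers with a spin lock:

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint seq;   /* odd while a write is in progress */
    int a, b;          /* the protected data (no pointers!) */
} toy_seqlock_t;

static void toy_write(toy_seqlock_t *s, int a, int b)
{
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_acquire); /* now odd */
    s->a = a;
    s->b = b;
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_release); /* even again */
}

static void toy_read(toy_seqlock_t *s, int *a, int *b)
{
    unsigned start;
    do {
        /* wait for an even sequence value: no write in progress */
        do {
            start = atomic_load_explicit(&s->seq, memory_order_acquire);
        } while (start & 1);
        *a = s->a;
        *b = s->b;
        /* retry if a writer ran while we were reading */
    } while (atomic_load_explicit(&s->seq, memory_order_acquire) != start);
}
```

The reader takes no lock at all; it only re-checks the counter and retries, while the writer bumps the counter twice, once on entry and once on exit.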

(3) Semaphores (semaphore)

Application background: the spin lock described above is a busy-waiting mechanism, which is efficient when critical resources are locked only briefly. When a critical resource is held for a long or unpredictable time, however, busy-waiting wastes a great deal of valuable processor time. For such cases the Linux kernel provides the semaphore mechanism: when a process cannot obtain the critical resource, it immediately gives up the processor and sleeps on the wait queue associated with that resource; when the resource is released, the processes blocked on it are woken up. In addition, the semaphore mechanism does not disable kernel preemption, so a process holding a semaphore can be preempted, which means that semaphores do not harm the system's responsiveness or real-time behaviour.

Semaphore design idea: apart from initialization, a semaphore can be accessed only through two atomic operations, P() and V(), also known as down() and up(). The down() operation requests the semaphore by decreasing its counter by 1. If the result is 0 or greater, the task acquires the semaphore and may enter the critical section. If the result is negative, the task is placed on the wait queue and the processor runs other tasks. When access to the critical resource is complete, up() is called to release the semaphore, which increases its counter; if the semaphore's wait queue is not empty, a process blocked on the semaphore is woken.
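The down()/up() behaviour can be experimented with in user space via POSIX semaphores, whose sem_wait()/sem_post() play the same P/V roles (try_enter/leave are invented wrapper names):

```c
#include <semaphore.h>
#include <stdbool.h>

/* Userspace counterpart of down_trylock(): take the semaphore if its
 * counter is positive, without sleeping. Returns true on success. */
static bool try_enter(sem_t *sem)
{
    return sem_trywait(sem) == 0;
}

/* Userspace counterpart of up(): increment the counter and wake a waiter. */
static void leave(sem_t *sem)
{
    sem_post(sem);
}
```

With an initial counter of 2, two execution paths may enter before the third request would block (or, with the try variant, fail), mirroring the counter semantics described above.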

Classification of semaphores:

3.1 Ordinary semaphores

The ordinary semaphore is represented by the data structure struct semaphore, defined in src/include/asm-i386/semaphore.h.

The semaphore is defined as follows:

<include/linux/semaphore.h>

struct semaphore {
        spinlock_t lock;            /* spin lock used for atomic operations on count */
        unsigned int count;         /* number of execution paths allowed into the critical section through this semaphore */
        struct list_head wait_list; /* manages the processes sleeping on this semaphore */
};

Interface functions for ordinary semaphores:

sema_init(sem, val)   // initialize the semaphore counter to the value val

init_MUTEX(sem)       // initialize the semaphore as a mutex semaphore (count = 1)

down(sem)             // acquire the semaphore; on failure, sleep on the wait queue

up(sem)               // release the semaphore and wake up a process on the wait queue

Down operations: the Linux kernel provides several variants of the down operation on a semaphore:

void down(struct semaphore *sem);                       // uninterruptible

int down_interruptible(struct semaphore *sem);          // interruptible by signals

int down_killable(struct semaphore *sem);               // the sleeping process can be woken by a fatal signal, aborting the acquisition

int down_trylock(struct semaphore *sem);                // try to take the semaphore; returns 1 without sleeping if it is unavailable, 0 if it was acquired

int down_timeout(struct semaphore *sem, long jiffies);  // sleep for a limited time; returns an error code if the semaphore is still unavailable after jiffies

Of the functions above, the one drivers use most often is down_interruptible().

Up operation: the Linux kernel provides only one up function:

void up(struct semaphore *sem);

Locking process: locking is performed by the function down(), which tests the state of the semaphore, acquires it if it is available, and otherwise inserts the current process into the wait queue of the semaphore. The call chain is: down() -> __down_failed() -> __down(). The functions work as follows:

down(): locks the semaphore sem; if locking succeeds, i.e. the semaphore is acquired, it returns immediately; otherwise it calls __down_failed() to put the process to sleep on the wait queue of sem. __down(): called when locking fails; it inserts the process into the wait queue of sem and then invokes the scheduler, giving up the processor.

Unlocking process: unlocking an ordinary semaphore is performed by the function up(), which increases the counter count by 1, indicating that the semaphore has been released, and wakes a sleeping process from the wait queue if any process is blocked on the semaphore.

3.2 Read-write semaphores (rwsem)

Application background: to improve kernel concurrency, the kernel provides reader semaphores and writer semaphores. Their concept and implementation are similar to those of the read-write spin lock.

How it works: this semaphore allows all reader processes to access the semaphore-protected critical resource at the same time. When a process fails to lock the read-write semaphore, it is inserted into a first-in-first-out wait queue; when a process finishes accessing the critical resource, it is responsible for waking processes from that queue according to a fixed rule.

Wake-up rule: wake the process at the head of the FIFO wait queue. If the woken process is a writer, wake no further processes; if it is a reader, keep waking readers until the first writer in the queue is reached (that writer is not woken).

The read-write semaphore is defined as follows:

<include/linux/rwsem-spinlock.h>

struct rw_semaphore {
        __s32 activity;   /* the number of readers, or of writers */
        spinlock_t wait_lock;
        struct list_head wait_list;
};

Interface functions for read-write semaphores:

Reader up/down operations:

void up_read(struct rw_semaphore *sem);

void __sched down_read(struct rw_semaphore *sem);

int down_read_trylock(struct rw_semaphore *sem);

Writer up/down operations:

void up_write(struct rw_semaphore *sem);

void __sched down_write(struct rw_semaphore *sem);

int down_write_trylock(struct rw_semaphore *sem);

3.3 Mutex semaphores

In Linux, a common use of semaphores is to implement mutual exclusion: the semaphore's count is 1, so only one process may enter the critical section at any given time. For this purpose the Linux kernel source provides a macro, DECLARE_MUTEX, which defines and initializes such a semaphore:

<include/linux/semaphore.h>

#define DECLARE_MUTEX(name) \
        struct semaphore name = __SEMAPHORE_INITIALIZER(name, 1)

(4) Mutexes (mutex)

For count = 1 semaphores, the Linux kernel defines a new data structure, struct mutex, generally called a mutex. For this new type the kernel optimizes and extends the semaphore down and up operations according to their usage scenarios.
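In user space the closest relative of struct mutex is pthread_mutex_t. A small sketch of the count-1, sleep-on-contention behaviour (increment/try_increment and shared_counter are invented for illustration):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter;       /* the protected resource */

static void increment(void)
{
    pthread_mutex_lock(&m);      /* like mutex_lock(): sleep if contended */
    shared_counter++;
    pthread_mutex_unlock(&m);    /* like mutex_unlock() */
}

static bool try_increment(void)
{
    if (pthread_mutex_trylock(&m) != 0)   /* like mutex_trylock() */
        return false;
    shared_counter++;
    pthread_mutex_unlock(&m);
    return true;
}
```

Only one path at a time can be between the lock and unlock calls; a trylock while the mutex is held simply fails instead of entering.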

(5) RCU

RCU concept: RCU stands for read-copy-update, a lock-free synchronization mechanism provided by the Linux kernel. Like the read-write spin lock rwlock, the read-write semaphore rwsem, and the sequential lock discussed above, RCU targets systems in which readers and writers coexist. The difference is that in RCU, readers and writers need not exclude each other; mutual exclusion between writers, however, is still required.

RCU principle: simply put, readers and writers access the shared data through a pointer p. Readers access the data through p, and writers update the data by modifying p. To remain lock-free, both sides must follow certain rules.

Operations on the reader side (the RCU critical section)

A reader that wants to access the shared data must first call the functions rcu_read_lock and rcu_read_unlock to construct a read-side critical section, and then, inside the critical section, obtain the pointer to the shared data area; the actual read operation is a dereference of that pointer.

The rules a reader must follow are: (1) dereferences of the pointer must happen inside the critical section; no dereference may occur after leaving it. (2) Code inside the critical section must not cause any form of process switch (in practice, rcu_read_lock disables kernel preemption; interrupts need not be disabled).

Operations on the writer side

A writer that wants to update the data first allocates a new memory area to serve as the shared data area, copies the data from the old area into it, modifies the new area as needed, and then replaces the pointer to the old data area with a pointer to the new one. After the writer has swapped the pointer, the memory that the old pointer refers to is not released immediately (the reason is explained below). The writer must cooperate with the kernel to free the old memory once it is certain that all references to the old pointer have ended. To do so, the writer calls the call_rcu function to register a callback with the kernel; the kernel invokes the callback once it has determined that all references to the old pointer have ended, and the callback's job is to free the memory the old pointer refers to. The prototype of call_rcu is:

void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu));

The condition by which the kernel determines that no reader still references the old pointer is: at least one process switch has occurred on every processor in the system. All potentially inconsistent references to the shared-data pointer must occur in a reader's RCU critical section, and no process switch may occur inside such a critical section. So once a process switch has occurred on a CPU, all references to the old pointer on that CPU have ended, and any reader that subsequently enters an RCU critical section will see the new pointer.

Why the old pointer cannot be freed immediately: references to the old pointer may still exist in the system, mainly in two situations. (1) On a single processor: suppose a reader has entered its RCU critical section and just obtained the pointer to the shared area when an interrupt occurs; if the writer happens to run in the interrupt handler, then when the interrupt returns and the reader resumes inside its RCU critical section, it continues to reference the old pointer. (2) On a multiprocessor system: while a reader on processor A, inside its RCU critical section, has just obtained the shared-data pointer but not yet dereferenced it, a writer on processor B updates the pointer; the reader on processor A then still references the old pointer.

RCU features: as the discussion above shows, RCU is essentially an optimization of the read-write spin lock rwlock. RCU allows multiple readers and writers to work at the same time, but the writer side of RCU is considerably more expensive, so RCU is rarely used in drivers.

To use RCU in code, all RCU-related operations should go through the RCU API functions provided by the kernel, which center on pointer and linked-list operations, to ensure the mechanism is used correctly.
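At the heart of rcu_dereference and rcu_assign_pointer is publishing a pointer with the right memory ordering. A userspace sketch of just that part with C11 atomics (toy_* names invented; real RCU additionally tracks grace periods before the old copy may be freed):

```c
#include <stdatomic.h>
#include <stdlib.h>

struct shared_data { int a, b; };

static _Atomic(struct shared_data *) g_ptr;   /* the shared pointer p */

/* Reader side: one acquire load, then use only that snapshot. */
static struct shared_data *toy_dereference(void)
{
    return atomic_load_explicit(&g_ptr, memory_order_acquire);
}

/* Writer side: fill in the new copy first, then publish it with a
 * release store so readers never see a half-initialised object.
 * Returns the old pointer, which may be freed only after no reader
 * can still hold it (the grace period, elided in this toy). */
static struct shared_data *toy_assign(int a, int b)
{
    struct shared_data *newp = malloc(sizeof(*newp));
    newp->a = a;
    newp->b = b;
    return atomic_exchange_explicit(&g_ptr, newp, memory_order_acq_rel);
}
```

Readers take no lock and never block the writer; the writer prepares a complete new copy and swaps a single pointer, which is the read-copy-update pattern in miniature.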

Here is a typical usage example of RCU:

// Suppose struct shared_data is protected data shared between readers and the writer
struct shared_data {
        int a;
        int b;
        struct rcu_head rcu;
};

// reader-side code
static void demo_reader(struct shared_data *ptr)
{
        struct shared_data *p = NULL;

        rcu_read_lock();
        p = rcu_dereference(ptr);
        if (p)
                do_something_with_p(p);
        rcu_read_unlock();
}

// writer-side code
static void demo_del_oldptr(struct rcu_head *rh)    // callback function
{
        struct shared_data *p = container_of(rh, struct shared_data, rcu);
        kfree(p);
}

static void demo_writer(struct shared_data *ptr)
{
        struct shared_data *new_ptr = kmalloc(...);
        ...
        new_ptr->a = 10;
        new_ptr->b = 20;
        rcu_assign_pointer(ptr, new_ptr);     // replace the old pointer with the new one
        call_rcu(&ptr->rcu, demo_del_oldptr); // register a callback to free the old data area
}

(6) Completion interface (completion)

The Linux kernel also provides a synchronization mechanism called the completion interface (completion), which is used to synchronize multiple execution paths, i.e. to make them execute in a given order. It is not expanded upon here.

