Linux Kernel Synchronization Mechanisms

Source: Internet
Author: User

I. Introduction

In modern operating systems, multiple kernel execution paths may run at the same time. Therefore, as in multi-process and multi-threaded programming, the kernel needs synchronization mechanisms to coordinate access to shared data by each execution unit. On a multiprocessor system in particular, synchronization mechanisms are required to coordinate access to shared data by execution units running on different processors.

The mainline Linux kernel provides almost every synchronization mechanism found in modern operating systems, including atomic operations, semaphores (semaphore), read/write semaphores (rw_semaphore), spin locks (spinlock), the BKL (Big Kernel Lock), rwlock, brlock (2.4 kernel only), RCU (2.6 kernel only), and seqlock (2.6 kernel only).

II. Atomic operations

An atomic operation is one that cannot be interrupted by any other task or event before it completes. In other words, it is the smallest unit of execution and cannot be divided into smaller units; the term "atomic" borrows the physical concept of the atom as an indivisible particle of matter.

Atomic operations require hardware support, so they are architecture-specific. Their API and the atomic type are defined in include/asm/atomic.h in the kernel source tree. They are all implemented in assembly language, because the C language of that era had no way to express such operations.

Atomic operations are mainly used for counting resources; many reference counts (refcnt) are implemented with them. The atomic type is defined as follows:

typedef struct { volatile int counter; } atomic_t;

The volatile qualifier tells gcc not to optimize accesses to data of this type, so every access goes to memory rather than to a register.

Atomic operation APIs include:

atomic_read(atomic_t *v);

This function atomically reads the atomic variable v and returns its value.

atomic_set(atomic_t *v, int i);

This function sets the atomic variable v to i.

void atomic_add(int i, atomic_t *v);

This function adds i to the atomic variable v.

void atomic_sub(int i, atomic_t *v);

This function subtracts i from the atomic variable v.

int atomic_sub_and_test(int i, atomic_t *v);

This function subtracts i from the atomic variable v and tests whether the result is 0. It returns true if the result is 0 and false otherwise.

void atomic_inc(atomic_t *v);

This function atomically increments v by 1.

void atomic_dec(atomic_t *v);

This function atomically decrements v by 1.

int atomic_dec_and_test(atomic_t *v);

This function atomically decrements v by 1 and tests whether the result is 0. It returns true if the result is 0 and false otherwise.

int atomic_inc_and_test(atomic_t *v);

This function atomically increments v by 1 and tests whether the result is 0. It returns true if the result is 0 and false otherwise.

int atomic_add_negative(int i, atomic_t *v);

This function atomically adds i to v and tests whether the result is negative. It returns true if the result is negative and false otherwise.

int atomic_add_return(int i, atomic_t *v);

This function atomically adds i to v and returns the new value of v.

int atomic_sub_return(int i, atomic_t *v);

This function atomically subtracts i from v and returns the new value of v.

int atomic_inc_return(atomic_t *v);

This function atomically increments v by 1 and returns the new value of v.

int atomic_dec_return(atomic_t *v);

This function atomically decrements v by 1 and returns the new value of v.
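The return-value conventions of these functions are easy to mix up, so the following is a minimal user-space sketch that models a few of them with C11 <stdatomic.h>. It is not the kernel implementation, which is architecture-specific; the type my_atomic_t and all my_-prefixed helpers are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } my_atomic_t;

static inline int my_atomic_read(my_atomic_t *v)
{
    return atomic_load(&v->counter);
}

static inline void my_atomic_set(my_atomic_t *v, int i)
{
    atomic_store(&v->counter, i);
}

/* Adds i and returns the NEW value (atomic_fetch_add returns the old one). */
static inline int my_atomic_add_return(int i, my_atomic_t *v)
{
    return atomic_fetch_add(&v->counter, i) + i;
}

/* Subtracts i and returns the NEW value. */
static inline int my_atomic_sub_return(int i, my_atomic_t *v)
{
    return atomic_fetch_sub(&v->counter, i) - i;
}

/* True only when the decrement brought the counter to exactly 0. */
static inline bool my_atomic_dec_and_test(my_atomic_t *v)
{
    return my_atomic_sub_return(1, v) == 0;
}

/* True only when the increment brought the counter to exactly 0. */
static inline bool my_atomic_inc_and_test(my_atomic_t *v)
{
    return my_atomic_add_return(1, v) == 0;
}

/* True when the result of the addition is negative. */
static inline bool my_atomic_add_negative(int i, my_atomic_t *v)
{
    return my_atomic_add_return(i, v) < 0;
}
```

The point of the sketch is that the *_return variants yield an integer value, not a pointer, and that the *_and_test variants report on the result of the update, not on the value before it.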

Atomic operations are usually used to implement reference counting of resources. The IP fragment handling code of the TCP/IP protocol stack, for example, uses reference counting: the fragment queue structure struct ipq describes an IP fragment, and its refcnt field, of type atomic_t, is the reference counter. When an IP fragment is created (in the ip_frag_create function), atomic_set sets the counter to 1. Each time the IP fragment is referenced, atomic_inc adds 1 to the reference count.

When a reference to the IP fragment is no longer needed, ipq_put is used to release it. ipq_put calls atomic_dec_and_test to decrement the reference count by 1 and test whether it has reached 0; if so, the IP fragment is freed. The ipq_kill function removes an IP fragment from the ipq queue and decrements the reference count of the removed fragment by 1 (using atomic_dec).
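The create/get/put pattern described above can be sketched in user space as follows. This illustrates the pattern only and is not the kernel's struct ipq code: struct frag_queue, frag_create, frag_get, and frag_put are hypothetical names, C11 atomics stand in for atomic_t, and the freed_count variable exists only so that the behavior can be observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object with an embedded reference counter. */
struct frag_queue {
    atomic_int refcnt;
    int id;
};

/* Number of objects freed so far; for observation only, not thread-safe. */
static int freed_count;

/* Create the object holding one reference, owned by the creator. */
static struct frag_queue *frag_create(int id)
{
    struct frag_queue *q = malloc(sizeof(*q));
    if (!q)
        return NULL;
    atomic_init(&q->refcnt, 1);          /* plays the role of atomic_set(..., 1) */
    q->id = id;
    return q;
}

/* Take an additional reference. */
static void frag_get(struct frag_queue *q)
{
    atomic_fetch_add(&q->refcnt, 1);     /* plays the role of atomic_inc */
}

/* Drop one reference; free the object when the last one is dropped.
 * Returns 1 if the object was freed, 0 otherwise. */
static int frag_put(struct frag_queue *q)
{
    /* fetch_sub returns the old value, so old == 1 means the count is now 0;
     * this plays the role of atomic_dec_and_test. */
    if (atomic_fetch_sub(&q->refcnt, 1) == 1) {
        free(q);
        freed_count++;
        return 1;
    }
    return 0;
}
```

Because the decrement and the zero test happen as one atomic step, exactly one caller observes the count reaching 0, so the object is freed exactly once even when several holders drop their references concurrently.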

III. Semaphores (semaphore)

The Linux kernel semaphore is the same in concept and principle as the user-space System V IPC semaphore, but it can never be used outside the kernel, so it is unrelated to the System V IPC mechanism.

When a semaphore is created, it is given an initial value that indicates how many tasks may simultaneously access the shared resource it protects. With an initial value of 1, the semaphore becomes a mutex (Mutex): only one task at a time can access the protected resource.

To access the shared resource, a task must first acquire the semaphore. Acquiring decrements the semaphore value by 1. If the resulting value is negative, the semaphore cannot be acquired, and the task is suspended on the semaphore's wait queue until the semaphore becomes available. If the resulting value is non-negative, the semaphore has been acquired and the task can access the protected resource immediately.

When a task has finished accessing the protected resource, it must release the semaphore by incrementing its value by 1. If the resulting value is not positive, tasks are waiting for the semaphore, so the release also wakes up tasks waiting on it.

Semaphore APIs include:

DECLARE_MUTEX(name)

This macro declares a semaphore name and initializes its value to 1, that is, it declares a mutex.

DECLARE_MUTEX_LOCKED(name)

This macro declares a mutex name but sets its initial value to 0, so the lock is in the locked state when it is created. Such a lock is therefore typically released first and acquired afterwards.

void sema_init(struct semaphore *sem, int val);

This function initializes the semaphore sem and sets its value to val.

void init_MUTEX(struct semaphore *sem);

This function initializes a mutex, that is, it sets the value of the semaphore sem to 1.

void init_MUTEX_LOCKED(struct semaphore *sem);

This function also initializes a mutex, but it sets the value of the semaphore sem to 0, so the mutex starts out in the locked state.

void down(struct semaphore *sem);

This function acquires the semaphore sem. It may sleep, so it cannot be used in interrupt context (including IRQ context and softirq context). It decrements the value of sem by 1. If the resulting value is non-negative, it returns immediately; otherwise the caller is suspended and resumes only after another task releases the semaphore.

int down_interruptible(struct semaphore *sem);

This function is similar to down. The difference is that down cannot be interrupted by a signal, whereas down_interruptible can, so it has a return value that distinguishes a normal return from an interrupted one. It returns 0 if the semaphore was acquired normally and -EINTR if it was interrupted by a signal.

int down_trylock(struct semaphore *sem);

This function tries to acquire the semaphore sem. If the semaphore can be acquired immediately, it acquires it and returns 0; otherwise it returns a non-zero value to indicate that the semaphore could not be acquired. Because it never puts the caller to sleep, it can be used in interrupt context.

void up(struct semaphore *sem);

This function releases the semaphore sem, that is, it increments the value of sem by 1. If the resulting value is not positive, tasks are waiting for the semaphore, so the waiters are woken up.
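To make the acquire/release behavior concrete, here is a minimal user-space sketch of a counting semaphore built on POSIX threads. It is an analogy under stated assumptions, not the kernel implementation: struct my_sem and the my_-prefixed functions are hypothetical names, and unlike the kernel counter described above, this model keeps the count non-negative and tracks waiters through a condition variable.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical user-space counting semaphore; count is the number of free slots. */
struct my_sem {
    pthread_mutex_t lock;
    pthread_cond_t  wait;
    int             count;
};

/* Plays the role of sema_init(): set the initial number of free slots. */
static void my_sema_init(struct my_sem *sem, int val)
{
    pthread_mutex_init(&sem->lock, NULL);
    pthread_cond_init(&sem->wait, NULL);
    sem->count = val;
}

/* Plays the role of down(): sleeps until a slot is free, then takes it. */
static void my_down(struct my_sem *sem)
{
    pthread_mutex_lock(&sem->lock);
    while (sem->count <= 0)
        pthread_cond_wait(&sem->wait, &sem->lock);
    sem->count--;
    pthread_mutex_unlock(&sem->lock);
}

/* Plays the role of down_trylock(): never sleeps.
 * Returns 0 on success and 1 on failure, matching the convention above. */
static int my_down_trylock(struct my_sem *sem)
{
    int failed = 1;

    pthread_mutex_lock(&sem->lock);
    if (sem->count > 0) {
        sem->count--;
        failed = 0;
    }
    pthread_mutex_unlock(&sem->lock);
    return failed;
}

/* Plays the role of up(): frees a slot and wakes one waiter, if any. */
static void my_up(struct my_sem *sem)
{
    pthread_mutex_lock(&sem->lock);
    sem->count++;
    pthread_cond_signal(&sem->wait);
    pthread_mutex_unlock(&sem->lock);
}
```

Note the return convention of the trylock variant: 0 means the semaphore was acquired. A caller would typically write `if (my_down_trylock(&sem) == 0) { /* critical section */ my_up(&sem); }`.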

Semaphores are used as mutexes in most cases. The console driver subsystem is used below as an example of how semaphores are used.

In kernel/printk.c in the kernel source tree, the DECLARE_MUTEX macro declares a mutex console_sem, which protects the console driver list console_drivers and synchronizes access to the entire console driver subsystem.

The function acquire_console_sem is defined to acquire console_sem, release_console_sem to release it, and try_acquire_console_sem to try to acquire it. These three functions are simple wrappers around down, up, and down_trylock.

Code that needs to access the console_drivers list calls acquire_console_sem to protect the list and, when it has finished, calls release_console_sem to release the semaphore console_sem.

The console_unblank, console_device, console_stop, console_start, register_console, and unregister_console functions all need to access console_drivers, so they all use acquire_console_sem and release_console_sem to protect it.

IV. Read/write semaphores (rw_semaphore)

A read/write semaphore divides its users into readers and writers. While holding the read/write semaphore, a reader may only read the shared resource it protects. A task that needs to write as well as read must be classified as a writer and must acquire the semaphore as a writer before accessing the resource; a writer that finds it no longer needs write access can downgrade to a reader. The number of simultaneous readers is unlimited: any number of readers can hold a read/write semaphore at the same time.

If a read/write semaphore is not held by a writer and no writer is waiting for readers to release it, any reader can acquire it successfully; otherwise the reader is suspended until the writer releases the semaphore. If a read/write semaphore is not held by any reader or writer and no writer is waiting for it, a writer can acquire it successfully; otherwise the writer is suspended until no users remain. Writers are therefore exclusive.

Read/write semaphores have two implementations. One is generic and independent of the hardware architecture, so a new architecture does not need to reimplement it, but its performance is lower and acquiring and releasing the semaphore costs more. The other is architecture-specific: acquiring and releasing are cheaper, but each new architecture must implement it again. A kernel configuration option selects which implementation is used.

APIs related to read/write semaphores include:

DECLARE_RWSEM(name)

This macro declares a read/write semaphore name and initializes it.

void init_rwsem(struct rw_semaphore *sem);

This function initializes the read/write semaphore sem.

void down_read(struct rw_semaphore *sem);

A reader calls this function to acquire the read/write semaphore sem. It may put the caller to sleep, so it can only be used in process context.

int down_read_trylock(struct rw_semaphore *sem);

This function is similar to down_read, but it does not put the caller to sleep. It tries to acquire the read/write semaphore sem: if the semaphore can be acquired immediately, it acquires it and returns 1; otherwise it returns 0. It can therefore also be used in interrupt context.

void down_write(struct rw_semaphore *sem);

A writer calls this function to acquire the read/write semaphore sem. It may also put the caller to sleep, so it can only be used in process context.

int down_write_trylock(struct rw_semaphore *sem);

This function is similar to down_write, but it does not put the caller to sleep. It tries to acquire the read/write semaphore: if the semaphore can be acquired immediately, it acquires it and returns 1; otherwise it returns 0. It can be used in interrupt context.

void up_read(struct rw_semaphore *sem);

A reader calls this function to release the read/write semaphore sem. It is paired with down_read or down_read_trylock. If down_read_trylock returned 0, up_read must not be called, because the semaphore was never acquired.

void up_write(struct rw_semaphore *sem);

A writer calls this function to release the read/write semaphore sem. It is paired with down_write or down_write_trylock. If down_write_trylock returned 0, up_write must not be called, because a return value of 0 means the semaphore was not acquired.

void downgrade_write(struct rw_semaphore *sem);

This function downgrades a writer to a reader, which is sometimes necessary. Because writers are exclusive, no other reader or writer can access the protected resource while a writer holds the read/write semaphore. When a writer no longer needs write access under the current conditions, downgrading lets waiting readers gain access immediately, which increases concurrency and improves efficiency.

Read/write semaphores suit situations where reads are frequent and writes are rare. In the Linux kernel, access to the structure that describes a process's memory image is protected by a read/write semaphore.

In Linux, each process is described by a structure of type task_t (struct task_struct). Its mm field, of type struct mm_struct, describes the memory image of the process. In particular, the mmap field of mm_struct maintains the list of memory regions of the whole process, and this list is traversed or modified heavily during the lifetime of the process.

The mm_struct structure therefore has a field mmap_sem, a read/write semaphore, that protects access to mmap. The proc file system provides many interfaces for viewing the memory usage of a process, and the free, ps, and top commands obtain memory usage information through proc. These proc interfaces use down_read and up_read when reading the mmap information of a process.

When a process dynamically allocates or frees memory, mmap must be modified to reflect the new memory image, so dynamic allocation and release must acquire mmap_sem as a writer in order to update mmap. The brk and munmap system calls use down_write and up_write to protect access to mmap.

V. Spin locks (spinlock)

A spin lock is similar to a mutex, but it never puts the caller to sleep. If the spin lock is already held by another execution unit, the caller keeps looping, checking whether the holder has released the lock. This "spinning" is what gives the lock its name.

Because a spin lock is usually held for a very short time, spinning rather than sleeping is the right choice, and in that case a spin lock is far more efficient than a mutex.

Semaphores and read/write semaphores are suited to long hold times. They can put the caller to sleep, so they can be used only in process context (the _trylock variants can also be used in interrupt context). A spin lock is suited to very short hold times and can be used in any context.

If the protected shared resource is accessed only in process context, a semaphore is a good way to protect it; if the access time is very short, a spin lock can also be used. However, if the shared resource must be accessed in interrupt context (including the top half, that is, the interrupt handler, and the bottom half, that is, the softirq), a spin lock must be used.

While a spin lock is held, preemption is disabled, whereas a task holding a semaphore or read/write semaphore can be preempted. A spin lock is really needed only when the kernel is preemptible or SMP; on a uniprocessor, non-preemptible kernel, all spin lock operations are no-ops.

As with a mutex, an execution unit must first acquire the lock before accessing the shared resource protected by a spin lock, and must release the lock after the access. If no execution unit holds the lock when it is requested, the lock is acquired immediately. If the lock is already held, the acquire operation spins until the holder releases it.

Whether it is a mutex or a spin lock, the lock has at most one holder at any time; that is, at most one execution unit can hold the lock at any moment.
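The spinning behavior described above can be illustrated with a minimal user-space test-and-set lock built on C11 atomic_flag. This is a sketch of the idea only, under stated assumptions: the kernel's spin lock is architecture-specific and also disables preemption, which this model does not attempt, and my_spinlock_t and the my_-prefixed functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spin lock: set flag means "held". */
typedef struct { atomic_flag locked; } my_spinlock_t;

#define MY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Loop until the flag was clear at the moment we set it. */
static inline void my_spin_lock(my_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* spin: another execution unit holds the lock */
}

/* One attempt only: true if the lock was taken, false if it was already held. */
static inline bool my_spin_trylock(my_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Clear the flag so that one spinning waiter can take the lock. */
static inline void my_spin_unlock(my_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The test-and-set is a single atomic step, which is why at most one caller can ever see the flag clear and become the holder. Unlike the semaphore trylock, the spin lock trylock reports success as true.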

The spin lock APIs include:

spin_lock_init(x)

This macro initializes the spin lock x. A spin lock must be initialized before it is actually used. This macro is used for dynamic initialization.

DEFINE_SPINLOCK(x)

This macro declares a spin lock x and initializes it. It was first defined in 2.6.11 and does not exist in earlier kernels.

SPIN_LOCK_UNLOCKED

This macro is used to initialize a spin lock statically.

DEFINE_SPINLOCK(x) is equivalent to spinlock_t x = SPIN_LOCK_UNLOCKED.

spin_is_locked(x)

This macro tests whether the spin lock x is currently held by some execution unit (that is, locked). It returns true if so and false otherwise.

spin_unlock_wait(x)

This macro waits until the spin lock x is not held by any execution unit. If no execution unit holds the lock, the macro returns immediately; otherwise it loops until the holder releases the lock.

spin_trylock(lock)

This macro tries to acquire the spin lock. If the lock can be acquired immediately, it acquires the lock and returns true; otherwise it returns false immediately. It does not spin waiting for the lock to be released.

spin_lock(lock)

This macro acquires the spin lock. If the lock can be acquired immediately, it returns at once; otherwise it spins until the holder releases the lock, then acquires the lock and returns. In short, it returns only after the lock has been acquired.

spin_lock_irqsave(lock, flags)

This macro acquires the spin lock and, at the same time, saves the value of the flags register into the variable flags and disables local interrupts.

spin_lock_irq(lock)

This macro is similar to spin_lock_irqsave, except that it does not save the value of the flags register.

spin_lock_bh(lock)

This macro acquires the spin lock and, at the same time, disables local softirqs.

spin_unlock(lock)

This macro releases the spin lock. It is paired with spin_trylock or spin_lock. If spin_trylock returned false, the lock was not acquired, so there is no need to call spin_unlock.

spin_unlock_irqrestore(lock, flags)

This macro releases the spin lock and restores the flags register to the value saved in the variable flags. It is paired with spin_lock_irqsave.

spin_unlock_irq(lock)

This macro releases the spin lock and enables local interrupts. It is paired with spin_lock_irq.

spin_unlock_bh(lock)

This macro releases the spin lock and enables local softirqs. It is paired with spin_lock_bh.

spin_trylock_irqsave(lock, flags)

If this macro acquires the spin lock, it also saves the value of the flags register into the variable flags and disables local interrupts. If it does not acquire the lock, it does nothing.

Therefore, if the lock can be acquired immediately, the macro is equivalent to spin_lock_irqsave; if the lock cannot be acquired, it is equivalent to spin_trylock. If the macro acquired the spin lock, use spin_unlock_irqrestore to release it.

spin_trylock_irq(lock)

This macro is similar to spin_trylock_irqsave, except that it does not save the flags register. If the macro acquired the spin lock, use spin_unlock_irq to release it.

spin_trylock_bh(lock)

If this macro acquires the spin lock, it also disables local softirqs. If it does not acquire the lock, it does nothing. Therefore, if the lock is acquired, the macro is equivalent to spin_lock_bh; if not, it is equivalent to spin_trylock. If the macro acquired the spin lock, use spin_unlock_bh to release it.

spin_can_lock(lock)

This macro tests whether the spin lock can be locked. It is in effect the inverse of spin_is_locked: it returns true if the lock is not held and false otherwise. It was first defined in 2.6.11 and does not exist in earlier kernels.

There are several variants for acquiring and releasing a spin lock, so it is worth explaining which ones to use in which circumstances.

If the shared resource is accessed only in process context and softirq context, an access in process context may be interrupted by a softirq, which may then access the same resource from softirq context. In this case, access to the shared resource must be protected with spin_lock_bh and spin_unlock_bh.

Of course, spin_lock_irq and spin_unlock_irq, or spin_lock_irqsave and spin_unlock_irqrestore, can also be used: they disable local hardware interrupts, and disabling hardware interrupts implicitly disables softirqs as well. But spin_lock_bh and spin_unlock_bh are the most appropriate choice because they are faster than the other two pairs.

If the shared resource is accessed only in process context and in tasklet or timer context, use the same macros as above to acquire and release the lock, because tasklets and timers are implemented with softirqs.

If the shared resource is accessed only in the context of a single tasklet or timer, no spin lock is needed, because the same tasklet or timer can run on only one CPU at a time, even in an SMP environment. In fact, when tasklet_schedule is called to mark a tasklet for scheduling, the tasklet is bound to the current CPU, so the same tasklet can never run on other CPUs at the same time.

A timer is likewise bound to the current CPU when it is added to the timer queue with add_timer, so the same timer cannot run on other CPUs. And of course two instances of the same tasklet cannot run on the same CPU at the same time.

If the shared resource is accessed only in the contexts of two or more tasklets or timers, spin_lock and spin_unlock are sufficient; the _bh variants are not needed, because while a tasklet or timer is running, no other tasklet or timer can run on the current CPU.

If the shared resource is accessed only in the context of a single softirq (other than a tasklet or timer), it must be protected with spin_lock and spin_unlock, because the same softirq can run on different CPUs at the same time.

If the shared resource is accessed in the contexts of two or more softirqs, it must of course also be protected with spin_lock and spin_unlock, because different softirqs can run on different CPUs at the same time.

If the shared resource is accessed in softirq (including tasklet and timer) or process context and also in hardware interrupt context, an access in softirq or process context may be interrupted by a hardware interrupt, whose handler then accesses the same resource. In this case, the process or softirq context must use spin_lock_irq and spin_unlock_irq to protect access to the shared resource.

Which variant to use inside the interrupt handler depends on the situation. If only one interrupt handler accesses the shared resource, spin_lock and spin_unlock are sufficient inside the handler, because the handler cannot be interrupted by a softirq or a process on the same CPU while it is running.

However, if several different interrupt handlers access the shared resource, the handlers must use spin_lock_irq and spin_unlock_irq to protect access to it.
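The rules of thumb above can be condensed into a small decision function. The sketch below is a deliberately simplified summary, not kernel code: the enum names and choose_variant are hypothetical, it ignores the special case of a single tasklet or timer that needs no lock at all, and wherever it answers "irq" the irqsave/irqrestore pair is an equally valid choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the lock/unlock pairs discussed above. */
enum lock_variant {
    USE_SPIN_LOCK,      /* spin_lock / spin_unlock */
    USE_SPIN_LOCK_BH,   /* spin_lock_bh / spin_unlock_bh */
    USE_SPIN_LOCK_IRQ   /* spin_lock_irq / spin_unlock_irq (or irqsave/irqrestore) */
};

/* The context in which the locking code itself runs. */
enum context { CTX_PROCESS, CTX_SOFTIRQ, CTX_HARDIRQ };

/*
 * here:                     context of the code taking the lock
 * used_in_softirq:          resource is also touched from softirq/tasklet/timer
 * used_in_hardirq:          resource is also touched from an interrupt handler
 * several_hardirq_handlers: more than one interrupt handler touches it
 */
static enum lock_variant choose_variant(enum context here,
                                        bool used_in_softirq,
                                        bool used_in_hardirq,
                                        bool several_hardirq_handlers)
{
    if (here == CTX_HARDIRQ)
        return several_hardirq_handlers ? USE_SPIN_LOCK_IRQ : USE_SPIN_LOCK;

    /* Process or softirq code that can be interrupted by a handler
     * touching the same resource must disable local interrupts. */
    if (used_in_hardirq)
        return USE_SPIN_LOCK_IRQ;

    /* Process code that can be interrupted by a softirq touching the
     * same resource must disable local softirqs. */
    if (here == CTX_PROCESS && used_in_softirq)
        return USE_SPIN_LOCK_BH;

    /* Only cross-CPU exclusion is needed. */
    return USE_SPIN_LOCK;
}
```

For example, process-context code sharing data with a tasklet gets USE_SPIN_LOCK_BH, while the tasklet side of the same data only needs USE_SPIN_LOCK.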

Wherever spin_lock_irq and spin_unlock_irq are used, spin_lock_irqsave and spin_unlock_irqrestore can be substituted. Which pair to use depends on the situation. If you can be sure that interrupts are enabled before the shared resource is accessed, spin_lock_irq is the better choice because it is faster than spin_lock_irqsave.

If you cannot be sure whether interrupts are enabled, spin_lock_irqsave and spin_unlock_irqrestore are the better choice, because they restore the interrupt flag to the state it had before the shared resource was accessed, rather than unconditionally enabling interrupts.
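The difference between the two pairs can be shown with a small user-space model in which a boolean stands in for the CPU's interrupt-enable flag. This is purely an illustration under stated assumptions: irqs_enabled and the model_ functions are hypothetical, the lock itself is omitted, and unlike the real spin_lock_irqsave macro (which writes into its flags argument) the model returns the saved state.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated interrupt-enable flag of the local CPU; a model, not hardware state. */
static bool irqs_enabled = true;

/* Model of spin_lock_irq: disable interrupts without remembering the old state. */
static void model_lock_irq(void)
{
    irqs_enabled = false;
}

/* Model of spin_unlock_irq: enable interrupts unconditionally. */
static void model_unlock_irq(void)
{
    irqs_enabled = true;
}

/* Model of spin_lock_irqsave: remember the old state, then disable interrupts. */
static unsigned long model_lock_irqsave(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;

    irqs_enabled = false;
    return flags;
}

/* Model of spin_unlock_irqrestore: put back whatever state was saved. */
static void model_unlock_irqrestore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}
```

If interrupts were already disabled on entry, the irqsave/irqrestore pair leaves them disabled on exit, whereas the irq pair turns them on regardless, which is why the latter is safe only when interrupts are known to be enabled beforehand.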

Of course, in some cases interrupts must be disabled while the shared resource is accessed and must be enabled once the access is finished. In such cases spin_lock_irq and spin_unlock_irq are the best choice.

Note that spin_lock itself prevents unsynchronized access caused by execution units on different CPUs accessing the shared resource simultaneously and by process contexts preempting one another, whereas disabling interrupts and disabling softirqs prevents unsynchronized access caused by softirqs or interrupts on the same CPU.
