Linux kernel synchronization: semaphores, sequential locks, RCU, completions, and disabling interrupts (repost)

Reposted from: http://blog.csdn.net/goodluckwhh/article/details/9006065

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.

Contents

    I. Semaphores
      1. The concept of a semaphore
      2. Semaphore data structures and related APIs
        1. Data structure
        2. Initialization
        3. Acquiring and releasing semaphores
      3. The concept of a read/write semaphore
      4. Data structures for read/write semaphores
    II. Sequential locks
      1. The concept of a sequential lock
      2. Data structures
        1. Write operations
        2. Read operations
    III. Read-Copy Update (RCU)
      1. Write operations
      2. Read operations
      3. Releasing the old version
    IV. Completions
      1. The concept of a completion
      2. Data structures and related APIs
    V. Disabling local interrupts
    VI. Enabling and disabling deferrable functions
    VII. Choosing a synchronization technique
I. Semaphores

1. The concept of a semaphore

A semaphore is also a kind of lock. When a semaphore is unavailable, a task that tries to acquire it is suspended (put to sleep) until the semaphore becomes available. Because acquiring a semaphore may put the caller to sleep, interrupt service routines and deferrable functions must not acquire one.

For semaphores, it is important to note:

    1. Only operations on the semaphore's count are atomic.
    2. The semaphore's spin lock protects only the semaphore's wait queue.
    3. Semaphores are unusual in that up need not be called by the task that called down. Viewed as a lock, a semaphore does not have to be released by the task that holds it; any other task may release it by calling up.

Thus down and up on the same semaphore can execute concurrently. Because the count and the wait queue are both protected, this concurrency causes no problems for up and down themselves.

Because down may put the caller to sleep, it must not be called where sleeping is forbidden, such as in interrupt context; up can be called in any context. Code in interrupt context that needs a semaphore can call down_trylock, which attempts the acquisition but returns 1 immediately instead of waiting if the semaphore is unavailable, and returns 0 if the semaphore was acquired.
2. Semaphore data structures and related APIs

1. Data structure

A semaphore is represented by struct semaphore, which contains the following fields:
    • count: The count of the resource protected by the semaphore; a value greater than 0 means the resource is available. Operations on count must be atomic. Where a check is followed by an update, the two together must also be atomic, as with the compare and decrement in down.
    • wait_list: The list of tasks waiting for the resource protected by this semaphore.
    • lock: The spin lock that protects the list of waiting tasks.

2. Initialization

init_MUTEX() initializes the semaphore's count field to 1, meaning the resource is currently available.
init_MUTEX_LOCKED() initializes the semaphore's count field to 0, meaning the resource is currently unavailable.
DECLARE_MUTEX performs the same initialization as init_MUTEX, and additionally statically allocates the semaphore.
DECLARE_MUTEX_LOCKED performs the same initialization as init_MUTEX_LOCKED, and additionally statically allocates the semaphore.
The count can also be initialized to a different positive value, for example with sema_init(&sem, n).
3. Acquiring and releasing semaphores

up() releases the semaphore. If the semaphore's wait queue is empty, that is, no task is waiting for the semaphore, up increments the count and returns; otherwise it wakes the first task on the wait queue.
down() acquires the semaphore. If the count is greater than 0, down decrements it and returns; otherwise the caller is appended to the end of the wait queue and sleeps until it is woken, at which point it holds the resource.

    void down(struct semaphore *sem);
    int down_trylock(struct semaphore *sem);               /* tries to acquire the semaphore; returns 1 without waiting if it is unavailable, 0 if it was acquired */
    int down_interruptible(struct semaphore *sem);         /* acquires the semaphore, but the wait can be interrupted; returns -EINTR if interrupted while waiting */
    int down_timeout(struct semaphore *sem, long jiffies); /* acquires the semaphore, waiting at most the given number of jiffies; returns -ETIME if it is not acquired in time */
    void up(struct semaphore *sem);
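
The requirement that the check and the decrement of the count form one atomic step can be illustrated with a small userspace model. This is only a sketch built on C11 atomics, not the kernel's implementation: the names model_sem, model_sema_init, model_down_trylock, and model_up are hypothetical, and the wait queue is left out, so only the non-sleeping paths are modeled.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of a semaphore's count field (not kernel code). */
struct model_sem {
    atomic_int count;   /* > 0 means the resource is available */
};

static void model_sema_init(struct model_sem *sem, int val)
{
    assert(val >= 0);
    atomic_init(&sem->count, val);
}

/*
 * Same return convention as down_trylock: 0 if acquired, 1 if unavailable.
 * The check (count > 0) and the decrement are made atomic as a pair by the
 * compare-and-swap loop.
 */
static int model_down_trylock(struct model_sem *sem)
{
    int old = atomic_load(&sem->count);

    while (old > 0) {
        /* On failure, old is reloaded with the current count and re-checked. */
        if (atomic_compare_exchange_weak(&sem->count, &old, old - 1))
            return 0;
    }
    return 1;
}

/* Models up() when no task is waiting: just increment the count. */
static void model_up(struct model_sem *sem)
{
    atomic_fetch_add(&sem->count, 1);
}
```

With a count initialized to 1, the first model_down_trylock call succeeds, a second one fails until model_up is called, and any caller may perform that model_up, not only the one that acquired the semaphore.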

3. The concept of a read/write semaphore

A read/write semaphore is similar to a read/write spin lock: it is a semaphore optimized for workloads with many reads and few writes. Unlike a spin lock, a task that cannot acquire it sleeps instead of spinning.
Multiple tasks can hold a read/write semaphore for reading concurrently, but only one task at a time can hold it for writing. An attempt to acquire the semaphore for writing therefore succeeds only when no task holds it for either reading or writing.
The kernel keeps the waiting tasks in a FIFO wait queue:
    1. A reader or writer that cannot acquire the read/write semaphore is added to the end of the wait queue.
    2. When the semaphore is released, the first task in the wait queue is woken (exactly which task is woken first depends on the implemented policy; the code is the best description, so consult the actual source).
    3. If the first task woken is a writer, the other tasks keep sleeping. If the first task woken is a reader, all readers queued after it are woken as well, up to the first writer; that writer and every task behind it keep sleeping.
Because it is also a semaphore, a read/write semaphore is subject to the same usage restrictions as an ordinary semaphore.
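
The wake-up rule in the list above can be written down as a tiny function. This is a hypothetical model rather than kernel code: the wait queue is represented as a string read from the head, with 'R' for a waiting reader and 'W' for a waiting writer, and the function name model_rwsem_wake_count is made up for illustration. As the text notes, the real policy depends on the kernel version.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: the wait queue is a string, head first,
 * 'R' = waiting reader, 'W' = waiting writer.
 * Returns how many tasks at the head of the queue are woken on release.
 */
static size_t model_rwsem_wake_count(const char *queue)
{
    size_t n = 0;

    assert(queue != NULL);
    if (queue[0] == '\0')
        return 0;               /* nobody is waiting */
    if (queue[0] == 'W')
        return 1;               /* a writer at the head is woken alone */
    while (queue[n] == 'R')     /* wake readers up to the first writer */
        n++;
    return n;
}
```

For the queue "RRWR", the two leading readers are woken while the writer and the reader behind it keep sleeping.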
4. Data structures for read/write semaphores

A read/write semaphore is represented by struct rw_semaphore, which contains fields such as the following:
    • activity: 0 means no task holds the semaphore for reading or writing; a value greater than 0 is the number of tasks currently reading; -1 means a task is writing.
    • wait_list: The list of tasks waiting for the semaphore.
    • wait_lock: The spin lock that protects the list of waiting tasks.
A read/write semaphore can be initialized with init_rwsem(sem), or declared and initialized with DECLARE_RWSEM(name).

    void down_read(struct rw_semaphore *sem);
    int down_read_trylock(struct rw_semaphore *sem);    /* tries to acquire the semaphore for reading without waiting; returns 1 on success, 0 if it is unavailable (the opposite convention to down_trylock) */
    void down_write(struct rw_semaphore *sem);
    int down_write_trylock(struct rw_semaphore *sem);   /* tries to acquire the semaphore for writing without waiting; returns 1 on success, 0 if it is unavailable */
    void up_read(struct rw_semaphore *sem);
    void up_write(struct rw_semaphore *sem);
    void downgrade_write(struct rw_semaphore *sem);     /* converts a held write lock into a read lock and wakes the readers waiting on the queue; the caller must hold the semaphore for writing */
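
The meaning of the activity field can be sketched as a small state machine. The following is a single-threaded model under stated assumptions, not the kernel's code: the model_ names are hypothetical, the spin lock and the wait queue are omitted, and the trylock functions ignore queued waiters, which a real implementation may also take into account.

```c
#include <assert.h>

/* Hypothetical single-threaded model of the activity field (no locking shown). */
struct model_rwsem {
    int activity;   /* 0: idle, >0: number of readers, -1: one writer */
};

/* Like down_read_trylock: 1 on success, 0 if a writer holds the semaphore. */
static int model_down_read_trylock(struct model_rwsem *sem)
{
    if (sem->activity < 0)
        return 0;
    sem->activity++;
    return 1;
}

/* Like down_write_trylock: succeeds only when nobody holds the semaphore. */
static int model_down_write_trylock(struct model_rwsem *sem)
{
    if (sem->activity != 0)
        return 0;
    sem->activity = -1;
    return 1;
}

static void model_up_read(struct model_rwsem *sem)
{
    assert(sem->activity > 0);
    sem->activity--;
}

static void model_up_write(struct model_rwsem *sem)
{
    assert(sem->activity == -1);
    sem->activity = 0;
}

/* Like downgrade_write: the writer becomes the only reader. */
static void model_downgrade_write(struct model_rwsem *sem)
{
    assert(sem->activity == -1);
    sem->activity = 1;
}
```

After a downgrade the former writer counts as one reader, so further readers can join but another writer still has to wait.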
II. Sequential locks

1. The concept of a sequential lock

With a read/write spin lock, readers and writers have the same priority. The 2.6 kernel introduced the sequential lock (seqlock), which resembles a read/write spin lock except that it gives writers higher priority: a writer may proceed even while readers are reading. The advantage of a sequential lock is that a writer never waits for readers; the disadvantage is that a reader may have to read several times before it obtains valid data.

Not all data types can be protected with sequential locks, and if you want to use sequential locks, the following principles must be followed:

    1. The protected data structure must not contain pointers that writers modify and readers dereference.
    2. The code in the reader's critical section must not have side effects, because it may be executed more than once.
    3. Reader critical sections should be very short, and writers should acquire the sequential lock rarely; otherwise the repeated reads hurt performance.
2. Data structures

A sequential lock is represented by the seqlock_t data structure, which contains two fields:
    • lock: a spin lock
    • sequence: an integer sequence number
The sequence number acts as a sequence counter. Each reader must read it at least twice, once before reading the data and once after. If both reads return the same sequence number, the data read is valid; otherwise a writer updated the data while the reader was reading, and the reader must read again.
There are two ways to initialize a sequential lock:
    1. seqlock_t lock1 = SEQLOCK_UNLOCKED;
    2. seqlock_t lock2; seqlock_init(&lock2);
Both methods initialize the sequential lock to the unlocked state.

1. Write operations

A writer must first acquire the lock, then perform its update, and then release the lock.
write_seqlock: acquires the sequential lock for writing; it takes the spin lock inside the sequential lock and then increments the sequence number.
write_sequnlock: releases the sequential lock; it increments the sequence number again and then releases the spin lock inside the sequential lock.
This design guarantees that the sequence number is odd while a writer is in the middle of an update, and even when no writer is modifying the data.

2. Read operations

A reader must use a sequence of operations of the following form:

    unsigned int seq;
    do {
        seq = read_seqbegin(&seqlock);
        /* ... critical region ... */
    } while (read_seqretry(&seqlock, seq));

read_seqbegin: returns the current sequence number of the sequential lock.
read_seqretry: returns 1 if the given value differs from the current sequence number of the sequential lock, or if the value is odd.
Note that a reader does not disable kernel preemption, whereas a writer does, because it acquires the spin lock inside the sequential lock.
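
The odd/even protocol can be tried out in a userspace model. The sketch below is not the kernel's seqlock_t: the model_ names and struct point are hypothetical, the spin lock is omitted on the assumption of a single writer, and it is meant to be stepped through in one thread to show how the counter drives the retry decision, not to serve as a production lock-free structure.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of seqlock_t; the spin lock is omitted
 * because this sketch assumes a single writer. */
struct model_seqlock {
    atomic_uint sequence;
};

struct point { int x, y; };

static void model_write_seqlock(struct model_seqlock *sl)
{
    atomic_fetch_add(&sl->sequence, 1);     /* now odd: write in progress */
}

static void model_write_sequnlock(struct model_seqlock *sl)
{
    assert((atomic_load(&sl->sequence) & 1u) == 1u);
    atomic_fetch_add(&sl->sequence, 1);     /* even again: write finished */
}

static unsigned model_read_seqbegin(struct model_seqlock *sl)
{
    return atomic_load(&sl->sequence);
}

/* Nonzero if the read began during a write or a write has happened since. */
static int model_read_seqretry(struct model_seqlock *sl, unsigned start)
{
    return (start & 1u) || atomic_load(&sl->sequence) != start;
}

/* Reader loop in the form shown above; returns a copy of the data. */
static struct point model_read_point(struct model_seqlock *sl,
                                     const struct point *shared)
{
    struct point copy;
    unsigned seq;

    do {
        seq = model_read_seqbegin(sl);
        copy = *shared;
    } while (model_read_seqretry(sl, seq));
    return copy;
}
```

A sequence number sampled before a write no longer matches afterwards, and a number sampled during a write is odd, so in both cases the reader goes around the loop again.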

III. Read-Copy Update (RCU)

RCU is another synchronization technique, designed to protect data structures that are mostly read in SMP environments. RCU allows multiple readers and multiple writers to run concurrently. RCU uses no locks and no counters shared by multiple CPUs, which is a significant advantage over read/write spin locks and sequential locks.
RCU has great advantages, but it also restricts the data structures it can protect:
    • The protected resource must be dynamically allocated and accessed through pointers, and every reference to it must be held only by atomic code (code that cannot sleep).
    • Code must not sleep inside a critical section protected by RCU.
RCU consists of three operations: the read operation, the write operation, and the release of the old version.

1. Write operations

The principle is as follows. When the data structure needs to change, the writer makes a copy, modifies the copy, and then updates the relevant pointer to point to the new version. A memory barrier is required so that other CPUs see the updated pointer only after the copy has been fully modified. Once the kernel has confirmed that no CPU still references the old version, the old version can be freed.
2. Read operations

When kernel code wants to read a data structure protected by RCU, it:
    1. Calls rcu_read_lock (equivalent to preempt_disable)
    2. Performs the read access
    3. Calls rcu_read_unlock (equivalent to preempt_enable)
Note that the code between steps 1 and 3 must not sleep.
3. Releasing the old version

The key question in RCU is when the old version can be freed. Because code on other processors may still hold references to the old data, it cannot be freed immediately. The kernel must free the old version only once it is certain that no references to it remain. In practice, the old copy can be freed only after all readers have called rcu_read_unlock. The kernel requires every reader to call rcu_read_unlock before any of the following happens:
    • The CPU performs a process switch
    • The CPU starts executing in user mode
    • The CPU starts executing the idle process
Once every CPU has gone through one of these events (a quiescent state), no reader can still hold a reference to the old version.
The writer calls call_rcu to dispose of the old data structure. Its prototype is as follows (this is the form used by early 2.6 kernels; later kernels pass the rcu_head pointer to the callback instead of a separate argument):
    void call_rcu(struct rcu_head *head, void (*func)(void *arg), void *arg);
    • head: a pointer to an rcu_head structure, which is usually embedded in the protected data structure.
    • func: the callback invoked to free the old data structure once it is safe to free it.
    • arg: the argument passed to func.
call_rcu stores the address of the callback function and its argument in the rcu_head descriptor, then inserts the descriptor into a per-CPU list of callbacks. The kernel periodically checks whether the old data structures can be freed and, if so, a tasklet invokes the callbacks.
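
The copy, publish, and deferred-free sequence described in this section can be modeled in userspace. The sketch below is an illustration under stated assumptions, not the kernel's RCU: all model_ names, struct config, and free_config are hypothetical, a single pending list stands in for the per-CPU callback lists, and model_grace_period_elapsed simply assumes that every CPU has already passed a quiescent state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical callback record, shaped after the call_rcu prototype above. */
struct model_rcu_head {
    struct model_rcu_head *next;
    void (*func)(void *arg);
    void *arg;
};

struct config {
    int value;
    struct model_rcu_head rcu;   /* embedded in the protected structure */
};

static _Atomic(struct config *) current_config;
static struct model_rcu_head *pending;   /* callbacks waiting for a grace period */
static int freed_count;

static void free_config(void *arg)
{
    free(arg);
    freed_count++;
}

/* Record the callback and its argument, then queue the descriptor. */
static void model_call_rcu(struct model_rcu_head *head,
                           void (*func)(void *arg), void *arg)
{
    head->func = func;
    head->arg = arg;
    head->next = pending;
    pending = head;
}

/* Writer: copy, modify the copy, publish it, defer freeing the old version. */
static int model_update_config(int new_value)
{
    struct config *old = atomic_load_explicit(&current_config, memory_order_relaxed);
    struct config *copy = malloc(sizeof(*copy));

    if (!copy)
        return -1;
    if (old)
        *copy = *old;
    copy->value = new_value;
    /* Release ordering plays the role of the barrier before the pointer update. */
    atomic_store_explicit(&current_config, copy, memory_order_release);
    if (old)
        model_call_rcu(&old->rcu, free_config, old);
    return 0;
}

/* Reader: one pointer load, then plain reads through that pointer. */
static int model_read_config(void)
{
    struct config *c = atomic_load_explicit(&current_config, memory_order_acquire);

    assert(c != NULL);
    return c->value;
}

/* Stand-in for "every CPU has passed a quiescent state": run the callbacks. */
static void model_grace_period_elapsed(void)
{
    while (pending) {
        struct model_rcu_head *head = pending;

        pending = head->next;    /* read next before func frees the object */
        head->func(head->arg);
    }
}
```

The old version stays allocated after model_update_config returns, so a reader that loaded the old pointer could still use it; it is freed only when model_grace_period_elapsed runs the queued callback.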

IV. Completions

1. The concept of a completion

A common pattern in the kernel is to start another task from the current task and then wait for that task to finish something. Consider using a semaphore for this:

    struct semaphore sem;
    init_MUTEX_LOCKED(&sem);
    start_external_task(&sem);
    down(&sem);

The external task calls up(&sem) when it has completed the work we are waiting for.
But a semaphore is not well suited to this scenario. Semaphores are optimized for the "available" case: in normal use, a semaphore is expected to be available most of the time. In the scenario above, however, down almost always takes the path where the semaphore is unavailable.
The completion is a mechanism designed to solve this problem. A completion lets one task tell another task that a piece of work is done.
2. Data structures and related APIs

The kernel uses struct completion to represent a completion.
    void init_completion(struct completion *x);
This function initializes the completion x. Alternatively, DECLARE_COMPLETION(work) declares and initializes a completion named work.
    void wait_for_completion(struct completion *c);
This function waits on c and cannot be interrupted. If wait_for_completion is called and no task ever calls complete, it waits forever.
    int wait_for_completion_interruptible(struct completion *x);
This function waits on x, but the wait can be interrupted; it returns -ERESTARTSYS if it returns because of an interruption, and 0 otherwise.
    void complete(struct completion *c);
    void complete_all(struct completion *c);
These two functions wake tasks waiting on c. The difference is that complete wakes only one waiting task, whereas complete_all wakes all of them.
A typical use of the completion mechanism is terminating a kernel thread when a module exits. For this case there is the function
    void complete_and_exit(struct completion *c, long retval);
which wakes the task waiting on c and exits the current task with the return value retval.
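
The difference between complete and complete_all can be illustrated by modeling a completion as a counter of posted events. This is a hypothetical single-threaded model, not the kernel's implementation: the model_ names are made up, the wait queue is omitted, and model_try_wait only reports whether a wait would return immediately instead of actually sleeping.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model: only a counter of posted completions; no wait queue. */
struct model_completion {
    unsigned int done;
};

static void model_init_completion(struct model_completion *x)
{
    x->done = 0;
}

/* Models complete(): lets exactly one waiter through. */
static void model_complete(struct model_completion *x)
{
    assert(x->done < UINT_MAX);
    x->done++;
}

/* Models complete_all(): every current and later waiter passes. */
static void model_complete_all(struct model_completion *x)
{
    x->done = UINT_MAX;
}

/*
 * Non-blocking stand-in for wait_for_completion(): returns 1 if the wait
 * would return immediately, 0 if the caller would have to sleep.
 */
static int model_try_wait(struct model_completion *x)
{
    if (x->done == 0)
        return 0;
    if (x->done != UINT_MAX)
        x->done--;          /* consume one complete() */
    return 1;
}
```

In this model a complete issued before the wait is not lost: the counter remembers it, and the next wait passes straight through.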

V. Disabling local interrupts

Disabling interrupts is one way to establish a critical section. While interrupts are disabled, even a hardware interrupt cannot interrupt the running code, so disabling them protects data structures that are also accessed by interrupt service routines. Note, however, that disabling local interrupts does not protect data structures that may be accessed from multiple CPUs. On SMP systems, shared resources used by interrupt service routines therefore usually need to be protected by disabling interrupts together with a spin lock.
The local_irq_disable() macro disables interrupts on the local CPU.
The local_irq_enable() macro enables interrupts on the local CPU; on x86 it uses the sti assembly language instruction. The problem with these two macros is that, while it is always safe to disable interrupts when needed, it is not always correct to simply enable them afterwards: interrupts may already have been disabled when we disabled them, and in that case unconditionally enabling them can cause problems.
local_irq_save: disables interrupts and saves the previous interrupt state (the flags word).
local_irq_restore: restores the interrupt state from the given saved flags word.
These two macros solve the problem, because local_irq_restore simply restores interrupts to the state they were in when local_irq_save was called.
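
Why restoring is safer than unconditionally enabling can be shown with a model in which a boolean stands in for the CPU's interrupt-enable flag. Everything here is hypothetical (the model_ names and the two helper functions), and unlike the kernel macros, which take an unsigned long flags variable, model_local_irq_save returns the saved state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the local CPU's interrupt-enable flag. */
static bool model_irqs_enabled = true;

static void model_local_irq_disable(void) { model_irqs_enabled = false; }
static void model_local_irq_enable(void)  { model_irqs_enabled = true; }

/* Returns the previous state, then disables (models local_irq_save). */
static bool model_local_irq_save(void)
{
    bool flags = model_irqs_enabled;

    model_irqs_enabled = false;
    return flags;
}

/* Puts back exactly the saved state (models local_irq_restore). */
static void model_local_irq_restore(bool flags)
{
    model_irqs_enabled = flags;
}

/* A helper that may be called with interrupts already disabled. */
static void model_helper_with_enable(void)
{
    model_local_irq_disable();
    /* ... critical section ... */
    model_local_irq_enable();        /* wrong if the caller had them disabled */
}

static void model_helper_with_restore(void)
{
    bool flags = model_local_irq_save();

    assert(!model_irqs_enabled);
    /* ... critical section ... */
    model_local_irq_restore(flags);  /* the caller's state is preserved */
}
```

If a caller that has already disabled interrupts invokes model_helper_with_enable, it gets control back with interrupts unexpectedly enabled; model_helper_with_restore leaves the caller's state untouched in both cases.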
VI. Enabling and disabling deferrable functions

Data structures that can be accessed by deferrable functions also need protection, because deferrable functions run at unpredictable points in time.
The simplest way to prevent deferrable functions from running is to disable local interrupts: since no interrupt service routine can run, nothing can trigger a deferrable function.
Because softirqs are not executed while they are disabled, and tasklets are built on softirqs, deferrable functions can be disabled on the local CPU simply by disabling local softirqs.
The local_bh_disable macro adds 1 to the local CPU's softirq counter, which disables local softirqs.
The local_bh_enable macro subtracts 1 from the local CPU's softirq counter in order to re-enable local softirqs.
Both macros can be nested: however many times local_bh_disable has been called, local_bh_enable must be called the same number of times before softirqs are enabled again.
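
The nesting rule follows directly from the counter. A minimal model, assuming a plain integer in place of the per-CPU softirq counter and using hypothetical model_ names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the local CPU's softirq-disable counter. */
static unsigned int model_bh_count;

static void model_local_bh_disable(void)
{
    model_bh_count++;
}

static void model_local_bh_enable(void)
{
    assert(model_bh_count > 0);     /* every enable must match a disable */
    model_bh_count--;
}

/* Deferrable functions may run only when the counter is back to zero. */
static bool model_softirqs_allowed(void)
{
    return model_bh_count == 0;
}
```

After two calls to model_local_bh_disable, a single model_local_bh_enable is not enough: softirqs stay disabled until the second matching call brings the counter back to zero.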
VII. Choosing a synchronization technique

There are many techniques for avoiding race conditions when accessing shared data, but they affect system performance differently. As a rule, choose the technique that allows the highest level of concurrency in the given scenario. The concurrency level of the system depends on:

    • The number of I/O devices that can operate concurrently
    • The number of CPUs doing useful work
To maximize I/O throughput, interrupts should be disabled for as short a time as possible.

To use the CPUs efficiently, avoid spin locks whenever possible: spinning not only keeps the CPU busy-waiting but also has an adverse effect on the caches.

The synchronization technique required depends on the kinds of kernel tasks that access the shared data:

Tasks accessing the data                          | Uniprocessor technique    | Additional technique on multiprocessor
--------------------------------------------------+---------------------------+---------------------------------------
Sleepable tasks (kernel threads, system calls)    | Semaphore                 | None needed
Interrupts                                        | Disable local interrupts  | Spin lock
Deferrable functions                              | None needed               | None needed, or spin lock (depending on whether different tasklets access the same data structure)
Sleepable tasks + interrupts                      | Disable local interrupts  | Spin lock
Sleepable tasks + deferrable functions            | Disable local softirqs    | Spin lock
Interrupts + deferrable functions                 | Disable local interrupts  | Spin lock
Sleepable tasks + interrupts + deferrable functions | Disable local interrupts | Spin lock
