Linux kernel synchronization: per-CPU variables, atomic operations, memory barriers, spin locks (repost)


Transferred from: http://blog.csdn.net/goodluckwhh/article/details/9005585

Copyright notice: this is an original article by the author; do not reproduce without the author's permission.

Contents

    1. Per-CPU variables
    2. Atomic operations
    3. Optimization and memory barriers
    4. Spin locks
      1. Spin locks
      2. Spin lock data structures, macros, and functions
      3. Read/write spin locks
      4. Read/write spin lock functions

The various "Tasks" in the Linux kernel can see the kernel address space, so they also need to be synchronized and mutually exclusive. The Linux kernel supports synchronous/mutex methods including:

Technology                       Description                                                           Scope
Per-CPU variables                Duplicate one copy of the data for each CPU                           All CPUs
Atomic operations                Atomic read-modify-write instructions on a counter                    All CPUs
Memory barriers                  Prevent instructions from being reordered                             Local CPU or all CPUs
Spin locks                       Lock with busy waiting                                                All CPUs
Semaphores                       Lock with blocking wait (sleep)                                       All CPUs
Sequential locks (seqlocks)      Locking based on an access counter                                    All CPUs
RCU                              Lock-free access to shared data structures through pointers           All CPUs
Completions                      Notify/wait until another task completes                              All CPUs
Disabling local interrupts       Disable interrupts on a single CPU (the local one)                    Local CPU
Disabling local soft interrupts  Disable execution of deferred functions on a single CPU (the local one)  Local CPU

1. Per-CPU variables

The first thing to understand is that the best synchronization/mutual-exclusion technique is not needing any: every synchronization technique carries a performance cost.
A per-CPU variable is the simplest synchronization mechanism. It is in fact an array of data structures, with one element for each CPU in the system.
With per-CPU variables, each CPU may only access its own element, so per-CPU variables can be used only in special cases.
Per-CPU variables are laid out in main memory so that each element maps to a different hardware cache line. This guarantees that concurrent accesses to a per-CPU variable do not cause cache-line snooping and invalidation, which would be expensive.
While per-CPU variables protect against concurrent accesses from different CPUs, they offer no protection against asynchronous accesses such as interrupts and deferred functions. Moreover, if kernel preemption is enabled, a race condition can arise on a per-CPU variable (a task could migrate to another CPU mid-access). The kernel should therefore disable preemption while accessing a per-CPU variable.
Macros and functions for per-CPU variables (a short usage sketch follows the list):
    • DEFINE_PER_CPU(type, name): statically allocates a per-CPU variable called name of type type
    • per_cpu(name, cpu): selects the element of the per-CPU variable name corresponding to the specified CPU
    • __get_cpu_var(name): selects the element of the per-CPU variable name corresponding to the local CPU
    • get_cpu_var(name): disables kernel preemption, then selects the element of the per-CPU variable name corresponding to the local CPU
    • put_cpu_var(name): re-enables kernel preemption (name is not used)
    • alloc_percpu(type): dynamically allocates a per-CPU variable of type type and returns its address
    • free_percpu(pointer): releases a dynamically allocated per-CPU variable; pointer is its address
    • per_cpu_ptr(pointer, cpu): returns the address of the element corresponding to cpu of the per-CPU variable whose address is pointer
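A minimal sketch of a per-CPU counter built from these macros. The variable packet_count and both functions are illustrative, and the sketch uses the get_cpu_var()/put_cpu_var() API described above (recent kernels prefer the this_cpu_* accessors):

    /* A per-CPU statistics counter. */
    #include <linux/percpu.h>
    #include <linux/cpumask.h>

    static DEFINE_PER_CPU(long, packet_count);

    static void count_packet(void)
    {
            /* get_cpu_var() disables preemption so the task cannot
             * migrate to another CPU while updating its element. */
            get_cpu_var(packet_count)++;
            put_cpu_var(packet_count);      /* re-enables preemption */
    }

    static long total_packets(void)
    {
            long sum = 0;
            int cpu;

            /* Reading other CPUs' elements is racy, which is
             * acceptable for statistics-style counters. */
            for_each_possible_cpu(cpu)
                    sum += per_cpu(packet_count, cpu);
            return sum;
    }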
2. Atomic operations

Many assembly instructions are of the "read-modify-write" type: the instruction accesses memory twice, once to read the old value and once to write the new one. If two or more CPUs issue such an operation on the same location at the same time, the final result can be wrong (each CPU reads the old value, modifies it, and writes it back, so the last write wins; if both CPUs were adding 1, the value ends up incremented only once). The simplest way to avoid this problem is to make the operation atomic at the chip level.
When writing C code we cannot guarantee that the compiler will emit atomic instructions. Linux therefore provides the special type atomic_t together with functions and macros that act on atomic_t and are implemented as single, atomic assembly instructions.
Atomic operations in Linux (a usage sketch follows the list):
    • atomic_read(v): returns the value of *v
    • atomic_set(v, i): sets the value of *v to i
    • atomic_add(i, v): adds i to *v
    • atomic_sub(i, v): subtracts i from *v
    • atomic_sub_and_test(i, v): subtracts i from *v and returns 1 if the new value is 0, 0 otherwise
    • atomic_inc(v): adds 1 to *v
    • atomic_dec(v): subtracts 1 from *v
    • atomic_dec_and_test(v): subtracts 1 from *v and returns 1 if the new value is 0, 0 otherwise
    • atomic_inc_and_test(v): adds 1 to *v and returns 1 if the new value is 0, 0 otherwise
    • atomic_add_negative(i, v): adds i to *v and returns 1 if the new value is negative, 0 otherwise
    • atomic_inc_return(v): adds 1 to *v and returns the new value
    • atomic_dec_return(v): subtracts 1 from *v and returns the new value
    • atomic_add_return(i, v): adds i to *v and returns the new value
    • atomic_sub_return(i, v): subtracts i from *v and returns the new value
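A minimal sketch of a reference counter built on atomic_t (the names refcount, get_ref, and put_ref are illustrative):

    /* A simple reference counter using the operations above. */
    #include <linux/atomic.h>
    #include <linux/printk.h>

    static atomic_t refcount = ATOMIC_INIT(1);

    static void get_ref(void)
    {
            atomic_inc(&refcount);
    }

    static void put_ref(void)
    {
            /* atomic_dec_and_test() returns 1 only for the caller that
             * brings the counter to 0, so exactly one path cleans up. */
            if (atomic_dec_and_test(&refcount))
                    pr_info("last reference dropped\n");
    }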
There are also atomic operations that act on bit masks (see the sketch after this list):
    • test_bit(nr, addr): returns the nr-th bit of *addr
    • set_bit(nr, addr): sets the nr-th bit of *addr to 1
    • clear_bit(nr, addr): clears the nr-th bit of *addr to 0
    • change_bit(nr, addr): inverts the nr-th bit of *addr
    • test_and_set_bit(nr, addr): sets the nr-th bit of *addr to 1 and returns its old value
    • test_and_clear_bit(nr, addr): clears the nr-th bit of *addr to 0 and returns its old value
    • test_and_change_bit(nr, addr): inverts the nr-th bit of *addr and returns its old value
    • atomic_clear_mask(mask, addr): clears all bits of *addr that are set in mask
    • atomic_set_mask(mask, addr): sets all bits of *addr that are set in mask
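A minimal sketch of a small ID allocator built on these bit operations (MAX_IDS, id_map, and both functions are illustrative):

    /* Allocate small integer IDs from a bitmap, lock-free. */
    #include <linux/bitops.h>

    #define MAX_IDS 64
    static unsigned long id_map[BITS_TO_LONGS(MAX_IDS)];

    /* Returns a free ID, or -1 if all are taken. */
    static int alloc_id(void)
    {
            int id;

            for (id = 0; id < MAX_IDS; id++) {
                    /* test_and_set_bit() returns the old value, so 0
                     * means this caller just claimed the bit atomically. */
                    if (!test_and_set_bit(id, id_map))
                            return id;
            }
            return -1;
    }

    static void free_id(int id)
    {
            clear_bit(id, id_map);
    }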
3. Optimization and memory barriers

When compiler optimization is enabled, the order in which instructions execute is not necessarily the order in which they appear in the code. In addition, modern CPUs usually execute several instructions in parallel and may reorder memory accesses.
Where synchronization is involved, however, such reordering is a problem: if an instruction placed after a synchronization primitive is executed before the primitive, things can go wrong. In fact, all synchronization primitives act as optimization and memory barriers.
The optimization barrier primitive tells the compiler that every memory value cached in a CPU register before the barrier is invalid after the barrier. As a result, the compiler cannot move read/write operations from after the barrier to before it, or vice versa. In Linux the optimization barrier is the barrier() macro. Note that this primitive says nothing about the order in which the CPU itself executes instructions (because of parallel execution, a later instruction may still finish first).
A memory barrier primitive guarantees that the operations placed before the primitive finish before the operations placed after it begin.
Linux provides several memory barrier primitives, each of which also acts as an optimization barrier (a usage sketch follows the list). Read memory barriers order only read operations; write memory barriers order only write operations.
    • mb(): acts as a memory barrier on both uniprocessor and multiprocessor systems
    • rmb(): acts as a read memory barrier on both uniprocessor and multiprocessor systems
    • wmb(): acts as a write memory barrier on both uniprocessor and multiprocessor systems
    • smp_mb(): acts as a memory barrier only on multiprocessor systems
    • smp_rmb(): acts as a read memory barrier only on multiprocessor systems
    • smp_wmb(): acts as a write memory barrier only on multiprocessor systems
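A minimal sketch of the classic publish/consume pattern these barriers support (the variables data and ready and both functions are illustrative):

    /* Producer/consumer ordering with smp_wmb()/smp_rmb(). */
    #include <asm/barrier.h>

    static int data;
    static int ready;

    /* Producer: publish the data, then set the flag. */
    static void producer(void)
    {
            data = 42;
            smp_wmb();      /* order the data store before the flag store */
            ready = 1;
    }

    /* Consumer: check the flag, then read the data. */
    static int consumer(void)
    {
            if (ready) {
                    smp_rmb();      /* order the flag load before the data load */
                    return data;
            }
            return -1;      /* not published yet */
    }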
4. Spin locks

1. Spin locks

The spin lock is a widely used synchronization technique: when the kernel wants to access a shared data structure or enter a critical section, it must first acquire a lock. When the kernel wants to access a resource protected by a lock, it tries to acquire the lock; if nobody currently holds it, access is granted, and if someone already holds it, access is denied. Locks are thus cooperative by nature: every task that needs a resource follows the discipline of first obtaining permission, then using the resource, then releasing it.
Spin locks are locks designed for multiprocessor environments. If the lock is currently held and cannot be acquired, the requesting task busy-waits for the lock to be released (that is, the current CPU spins, waiting for the release).
As a rule, kernel preemption is disabled inside a critical section protected by a spin lock. On a uniprocessor system the spin lock itself does not act as a lock; the spin-lock primitives merely disable or enable kernel preemption. Note also that kernel preemption remains enabled during the busy wait, so a task waiting for a spin lock can be replaced by a higher-priority task.
Besides the busy waiting, a spin lock has another cost worth noting: since spin locks are used mainly for synchronization between CPUs in an SMP system, every CPU operating on a spin lock must see its latest value in memory, so there is a cache-coherency cost. Spin locks should therefore protect only short code fragments.
2. Spin lock data structures, macros, and functions

A Linux spin lock is represented by the spinlock_t data structure, whose main field is:
    • slock: the state of the spin lock; 1 means "unlocked", while 0 and negative values mean "locked"
Spin lock macros (all based on atomic operations; a usage sketch follows below):
    • spin_lock_init(): initializes the spin lock to 1 (unlocked)
    • spin_lock(): acquires the spin lock; if it cannot be acquired, busy-waits until it can
    • spin_unlock(): releases the spin lock
    • spin_unlock_wait(): waits until the spin lock is released
    • spin_is_locked(): returns 1 if the spin lock is held, 0 otherwise
    • spin_trylock(): tries to acquire the spin lock and returns immediately instead of busy-waiting; returns nonzero if the lock was acquired, 0 otherwise
Besides these, there are variants for interrupt and softirq contexts (interrupt version: spin_lock_irq; version that saves the interrupt state word: spin_lock_irqsave; softirq version: spin_lock_bh).
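A minimal sketch of protecting a shared list with a spin lock (my_lock, my_list, struct item, and add_item are illustrative):

    /* A shared list touched by both process and interrupt context. */
    #include <linux/spinlock.h>
    #include <linux/list.h>

    static LIST_HEAD(my_list);
    static DEFINE_SPINLOCK(my_lock);

    struct item {
            struct list_head node;
            int value;
    };

    /* An interrupt handler may also take this lock, so the irqsave
     * variant is used to avoid deadlocking against a local IRQ that
     * arrives while the lock is held. */
    static void add_item(struct item *it)
    {
            unsigned long flags;

            spin_lock_irqsave(&my_lock, flags);     /* lock + disable local IRQs */
            list_add(&it->node, &my_list);
            spin_unlock_irqrestore(&my_lock, flags);
    }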
3. Read/write spin locks

Read/write spin locks exist to increase concurrency in the kernel. As long as no kernel control path is modifying a data structure, several kernel control paths may read it at the same time; a kernel control path that wants to write the data structure must acquire the write lock. In short: writes are exclusive, reads are shared.
A read/write spin lock is represented by the rwlock_t data structure. Its lock field is a 32-bit field divided into two parts:
    • A 24-bit counter (bits 0-23) of the kernel control paths currently reading the protected data structure.
    • An "unlocked" flag (bit 24), set when no kernel control path is reading or writing, and cleared otherwise.
Thus 0x01000000 means unlocked, 0x00000000 means write-locked, 0x00ffffff means one reader, 0x00fffffe means two readers, and so on.
4. Read/write spin lock functions

    • read_lock: acquires the spin lock for reading. It is similar to spin_lock() (it, too, disables kernel preemption), except that it allows concurrent readers. It atomically decrements the lock value; if the result is non-negative, the lock is acquired, otherwise it atomically increments the lock value to undo the decrement, busy-waits until the value becomes positive, and then retries.
    • read_unlock: releases a spin lock held for reading. It atomically increments the lock field and then re-enables kernel preemption.

Note: the kernel may be built without preemption support, in which case the disable/enable-preemption steps can be ignored.

    • write_lock: acquires the spin lock for writing. It is similar to spin_lock() and read_lock() (it also disables kernel preemption). It subtracts 0x01000000 from the lock field; if the result is 0, the write lock is acquired, otherwise it atomically adds 0x01000000 back to undo the subtraction, busy-waits until the value becomes 0x01000000, and then retries.
    • write_unlock: releases a spin lock held for writing. It atomically adds 0x01000000 to the lock field and then re-enables kernel preemption.
Like plain spin locks, read/write spin locks have interrupt and softirq variants (interrupt version: read_lock_irq; version that saves the interrupt state word: read_lock_irqsave; softirq version: read_lock_bh). A usage sketch follows.
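A minimal sketch of write-exclusive/read-shared access with a read/write spin lock (cfg_lock, cfg_value, and both functions are illustrative):

    /* Many readers, occasional writers. */
    #include <linux/spinlock.h>

    static DEFINE_RWLOCK(cfg_lock);
    static int cfg_value;

    /* Any number of readers may run this concurrently. */
    static int read_cfg(void)
    {
            int v;

            read_lock(&cfg_lock);
            v = cfg_value;
            read_unlock(&cfg_lock);
            return v;
    }

    /* A writer excludes both readers and other writers. */
    static void write_cfg(int v)
    {
            write_lock(&cfg_lock);
            cfg_value = v;
            write_unlock(&cfg_lock);
    }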
