The Linux kernel's queued spin lock (FIFO ticket spinlock)

Reposted from: http://www.ibm.com/developerworks/cn/linux/l-cn-spinlock/index.html
Introduction

A spin lock (spinlock) is a low-level synchronization mechanism widely used in the Linux kernel. A spin lock is a special lock that only matters in a multiprocessor environment; in a uniprocessor build the spin lock operations compile down to no-ops. When a kernel thread running on a processor requests a spin lock and the lock is available, it acquires the lock, executes the critical section, and then releases the lock. If the lock is already held, the thread does not go to sleep; instead it busy-waits on the lock, and once the lock is released, the first thread to notice acquires it.
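For example, a typical use of the spin lock API in kernel code looks roughly like this (the lock and counter names are invented for illustration):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);	/* protects demo_count */
static int demo_count;

static void demo_increment(void)
{
	spin_lock(&demo_lock);		/* busy-waits if another CPU holds the lock */
	demo_count++;			/* critical section */
	spin_unlock(&demo_lock);	/* releases the lock for the next requester */
}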

For a long time, attention has focused on the safety and efficiency of spin locks while their "fairness" has been ignored. A traditional spin lock is essentially an integer whose value 1 means the lock is free. This unordered competition means a thread cannot know when it will obtain the lock, and some threads may have to wait a very long time. As the number of processors in a machine grows, this "unfairness" problem becomes increasingly serious.

The queued spin lock (FIFO ticket spinlock) is a new kind of spin lock introduced in Linux kernel 2.6.25. It solves the "unfairness" of the traditional spin lock by recording the order in which threads request the lock. The queued spin lock was implemented by Linux kernel developer Nick Piggin; at the time it targeted only the x86 architecture (both IA32 and x86_64), and ports to other platforms were expected to follow.

Implementation and shortcomings of the traditional spin lock

The underlying data structure of the Linux kernel spin lock is raw_spinlock_t, defined as follows: Listing 1. raw_spinlock_t data structure

typedef struct {
	unsigned int slock;
} raw_spinlock_t;

Although slock is declared as an unsigned integer, it is actually used as a signed integer. A slock value of 1 means the lock is free; 0 or a negative value means the lock is held. slock is set to 1 at initialization.
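As an illustration of this convention only (these helpers are not kernel code), the same semantics can be written out in plain C:

#include <assert.h>

typedef struct {
	unsigned int slock;
} raw_spinlock_t;

/* Illustrative helpers: slock == 1 means free; 0 or a negative value
 * (when read as a signed integer) means the lock is held. */
static void demo_lock_init(raw_spinlock_t *lock)
{
	lock->slock = 1;
}

static int demo_lock_is_free(const raw_spinlock_t *lock)
{
	return (int)lock->slock > 0;
}

int main(void)
{
	raw_spinlock_t lock;

	demo_lock_init(&lock);
	assert(demo_lock_is_free(&lock));
	return 0;
}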

A thread requests a spin lock through the macro spin_lock. Ignoring kernel preemption, spin_lock calls the __raw_spin_lock function, shown in the following code: Listing 2. __raw_spin_lock function

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	asm volatile("\n1:\t"
		     LOCK_PREFIX " ; decb %0\n\t"
		     "jns 3f\n"
		     "2:\t"
		     "rep;nop\n\t"
		     "cmpb $0,%0\n\t"
		     "jle 2b\n\t"
		     "jmp 1b\n"
		     "3:\n\t"
		     : "+m" (lock->slock) : : "memory");
}

LOCK_PREFIX is defined as follows: Listing 3. LOCK_PREFIX macro

#ifdef CONFIG_SMP
#define LOCK_PREFIX \
		".section .smp_locks,\"a\"\n"	\
		_ASM_ALIGN "\n"			\
		_ASM_PTR "661f\n" /* address */	\
		".previous\n"			\
		"661:\n\tlock; "

#else /* ! CONFIG_SMP */
#define LOCK_PREFIX ""
#endif

In a multiprocessor kernel LOCK_PREFIX is effectively the "lock" instruction prefix; on a uniprocessor kernel it is empty.

The x86 "lock" prefix provides a means of locking the bus while an instruction executes. The processor has a LOCK# pin; if an assembly instruction (ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, or XCHG) is preceded by the "lock" prefix, the resulting machine code makes the processor assert the LOCK# pin while executing the instruction, locking the bus so that other processors, or peripherals capable of DMA, temporarily cannot access memory over that bus.

Starting with the P6 family, if the memory region accessed by the instruction is already in the processor's cache, the "lock" prefix does not assert the LOCK# pin; instead the processor locks its own cache line and relies on the cache coherence protocol to guarantee the atomicity of the operation.

The decb instruction decrements slock by 1. Because the decrement is a read-modify-write sequence rather than an atomic operation, threads on other processors requesting the lock at the same time could interfere with it, so the "lock" prefix must be added.
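The difference can be illustrated in user space with GCC's __sync builtins, which emit a lock-prefixed read-modify-write on x86 (this sketch is not kernel code):

#include <stdio.h>

/* Non-atomic decrement: separate load, subtract and store, so two
 * processors can interleave and both observe the old value. */
static void unsafe_dec(unsigned int *v)
{
	*v = *v - 1;
}

/* Atomic decrement: GCC emits a lock-prefixed instruction on x86,
 * the moral equivalent of LOCK_PREFIX "decb" in Listing 2. */
static void atomic_dec(unsigned int *v)
{
	__sync_fetch_and_sub(v, 1);
}

int main(void)
{
	unsigned int slock = 1;

	atomic_dec(&slock);			/* safe on SMP */
	printf("slock = %d\n", (int)slock);	/* prints 0: lock now held */
	unsafe_dec(&slock);			/* racy if other CPUs touch slock */
	return 0;
}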

The jns instruction checks the SF (sign) flag in the EFLAGS register. If it is 0, the original value of slock was 1, the thread has obtained the lock, and it jumps to label 3 to finish the call. If the SF flag is 1, the original value of slock was 0 or negative and the lock is held, so the thread proceeds to label 2 and repeatedly compares slock with 0: while slock is less than or equal to 0 it jumps back to label 2 and keeps busy-waiting; once slock is greater than 0, the lock has been released, and the thread jumps back to label 1 to try to acquire it again.

A thread releases the spin lock through the macro spin_unlock, which calls the __raw_spin_unlock function: Listing 4. __raw_spin_unlock function

static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
	asm volatile("movb $1,%0" : "+m" (lock->slock) : : "memory");
}

As can be seen, __raw_spin_unlock executes a single assembly instruction: it sets slock back to 1.
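Putting the two halves together, the behaviour of Listings 2 and 4 can be approximated in C with GCC builtins. This is a user-space sketch for illustration only, not the kernel's code, and it assumes an x86 target for the pause intrinsic:

typedef struct {
	volatile int slock;		/* 1 = free, 0 or negative = held */
} demo_spinlock_t;

static demo_spinlock_t demo_lock = { 1 };	/* starts in the free state */

static void demo_spin_lock(demo_spinlock_t *lock)
{
	for (;;) {
		/* "lock; decb" + "jns": if the decremented value is still
		 * non-negative, the old value was 1 and the lock is ours. */
		if (__sync_sub_and_fetch(&lock->slock, 1) >= 0)
			return;

		/* "rep;nop" / "cmpb" / "jle" loop: read-only busy wait
		 * until the lock looks free again... */
		while (lock->slock <= 0)
			__builtin_ia32_pause();

		/* ...then "jmp 1b": retry the atomic decrement. */
	}
}

static void demo_spin_unlock(demo_spinlock_t *lock)
{
	/* "movb $1,%0" with a "memory" clobber: compiler barrier, then a
	 * plain store of 1. */
	__asm__ __volatile__("" ::: "memory");
	lock->slock = 1;
}

int main(void)
{
	demo_spin_lock(&demo_lock);
	/* critical section */
	demo_spin_unlock(&demo_lock);
	return 0;
}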

Despite being simple, convenient to use, and fast, the spin lock has its own shortcomings:

Because of the unordered nature of the traditional spin lock, a kernel thread cannot know when it will obtain the lock, and some threads may have to wait a long time, leading to an "unfairness" problem. There are two reasons for this:

As the number of processors increases, the competition for spin locks intensifies, which naturally results in longer waiting times.

The store that releases a spin lock invalidates the corresponding cache line on all the other busy-waiting processors, so a processor that is close to the lock owner in the processor topology may refresh its cache sooner and therefore has a better chance of acquiring the lock.

Because every processor requesting the spin lock busy-waits on the same global variable slock, the cache-coherence traffic between processors places a heavy load on the system bus and lowers overall system performance.

Design principle of the queued spin lock

The "unfair" problem with traditional spin locks is particularly serious in the highly competitive server system, so the Linux kernel developer Nick Piggin has introduced a queued spin lock in the Linux kernel version 2.6.25: Solving the "unfairness" problem by saving sequential information about the execution thread request lock.

The queued spin lock keeps the original raw_spinlock_t data structure but gives the slock field a new meaning. To record ordering information, slock is split into two parts that hold the ticket numbers of the current lock owner and of the next lock requester, respectively, as shown in the figure: Figure 1. Next and Owner fields

If the number of processors is at most 256, the Owner field occupies bits 0-7 of slock and the Next field bits 8-15, with the upper 16 bits of slock unused. If the number of processors exceeds 256, the Owner and Next fields are 16 bits each, with Owner in the low 16 bits of slock. The queued spin lock can therefore support at most 2^16 = 65,536 processors.
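For illustration, the two layouts can be written out with shifts and masks (the macro names below are invented for this example):

#include <stdio.h>

/* NR_CPUS <= 256: Owner in bits 0-7, Next in bits 8-15, upper 16 bits unused. */
#define OWNER8(slock)	((slock) & 0xff)
#define NEXT8(slock)	(((slock) >> 8) & 0xff)

/* NR_CPUS > 256: Owner in bits 0-15, Next in bits 16-31. */
#define OWNER16(slock)	((slock) & 0xffff)
#define NEXT16(slock)	(((slock) >> 16) & 0xffff)

int main(void)
{
	unsigned int slock = 0x0302;	/* 8-bit layout: Next = 3, Owner = 2 */

	printf("owner = %u, next = %u\n", OWNER8(slock), NEXT8(slock));
	return 0;
}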

The lock is unused only when the Next field equals the Owner field (no request is outstanding). When the queued spin lock is initialized, slock is set to 0, that is, both Owner and Next are 0. When a kernel thread requests the spin lock, it atomically adds 1 to the Next field and takes the previous value as its own ticket number. If the returned ticket equals the Owner value at the time of the request, the lock is unused and the thread acquires it directly; otherwise the thread busy-waits, checking whether the Owner field has become equal to its ticket number, and as soon as it has, the lock is its turn. When a thread releases the lock, it adds 1 to the Owner field; the next waiting thread sees the change and leaves the busy-wait loop. Threads therefore acquire the queued spin lock strictly in request order, which completely eliminates the "unfairness" problem.
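The whole scheme can be sketched in C, assuming the 8-bit layout, GCC's atomic builtins, and a little-endian x86 target; this illustrates the algorithm rather than reproducing the kernel source:

typedef struct {
	volatile unsigned int slock;	/* Next in bits 8-15, Owner in bits 0-7 */
} demo_ticket_lock_t;

static void demo_ticket_lock(demo_ticket_lock_t *lock)
{
	/* Atomically add 1 to Next and fetch the old value; the old Next
	 * byte is this thread's ticket number. */
	unsigned int old = __sync_fetch_and_add(&lock->slock, 0x0100);
	unsigned char ticket = (old >> 8) & 0xff;

	/* Busy-wait until the Owner byte reaches our ticket number. */
	while ((lock->slock & 0xff) != ticket)
		__builtin_ia32_pause();
}

static void demo_ticket_unlock(demo_ticket_lock_t *lock)
{
	/* Bump only the Owner byte (like "incb"); on little-endian x86 the
	 * first byte of slock is the Owner field.  Only the lock holder
	 * writes this byte, so a plain increment suffices here. */
	volatile unsigned char *owner = (volatile unsigned char *)&lock->slock;

	__asm__ __volatile__("" ::: "memory");	/* compiler barrier */
	*owner = (unsigned char)(*owner + 1);
}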

Implementation of the queued spin lock

The queued spin lock does not change the original spin lock calling interface, which is provided to developers as C macros. The following table lists the six main APIs and their underlying implementation functions: Table 1. Queued spin lock API

Macro               Underlying function        Description
spin_lock_init      none                       Sets the lock to the initial unused state (value 0)
spin_lock           __raw_spin_lock            Busy-waits until the Owner field equals the local ticket number
spin_unlock         __raw_spin_unlock          Adds 1 to the Owner field, passing the lock to the next waiting thread
spin_unlock_wait    __raw_spin_unlock_wait     Does not request the lock; waits until the lock becomes unused
spin_is_locked      __raw_spin_is_locked       Tests whether the lock is in use
spin_trylock        __raw_spin_trylock         Acquires the lock if it is unused; otherwise returns immediately

The implementation details of the 3 underlying functions are described below, assuming the number of processors is no more than 256.

Listing 5. __raw_spin_is_locked function

static inline int __raw_spin_is_locked(raw_spinlock_t *lock)
{
	int tmp = *(volatile signed int *)(&(lock)->slock);

	return (((tmp >> 8) & 0xff) != (tmp & 0xff));
}

This function checks whether the Next and Owner fields are equal: if they are, the spin lock is unused and the function returns 0; otherwise it returns 1.

The seemingly convoluted assignment to tmp reads the value through a volatile pointer, forcing a fresh load from memory instead of letting the compiler reuse a stale copy.

Listing 6. __raw_spin_lock function

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
	short inc = 0x0100;

	__asm__ __volatile__ (
		LOCK_PREFIX "xaddw %w0, %1\n"
		"1:\t"
		"cmpb %h0, %b0\n\t"
		"je 2f\n\t"
		"rep ; nop\n\t"
		"movb %1, %b0\n\t"
		/* don't need lfence here, because loads are in-order */
		"jmp 1b\n"
		"2:"
		: "+Q" (inc), "+m" (lock->slock)
		:
		: "memory", "cc");
}

The LOCK_PREFIX macro was described earlier; it is the "lock" instruction prefix.

The xaddw instruction exchanges the values of slock and inc and stores their sum into slock. In other words, after the instruction executes, inc holds the original value of slock, which serves as the ticket number, and the Next field of slock has been incremented by 1.

cmpb compares the high and the low byte of inc; if they are equal, the lock was unused, and the code jumps straight to label 2 to leave the function.

If the lock is in use, the current Owner field of slock is copied into the low byte of inc (the movb instruction), and the comparison is repeated from label 1. Once the high and low bytes of inc become equal, it is this thread's turn to acquire the spin lock.
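A concrete trace may help; the following user-space sketch uses __sync_fetch_and_add (which GCC compiles to an xadd-style instruction on x86) to play through an invented two-CPU scenario:

#include <stdio.h>

int main(void)
{
	unsigned short slock = 0;	/* Owner = 0, Next = 0: lock unused */
	unsigned short old;

	/* "CPU A" requests the lock: xaddw returns the old slock (0x0000),
	 * so A's ticket (0) equals Owner (0) and A acquires the lock at once. */
	old = __sync_fetch_and_add(&slock, 0x0100);
	printf("A: ticket=%u owner=%u\n", (old >> 8) & 0xff, old & 0xff);

	/* "CPU B" requests the lock while A holds it: ticket 1 != Owner 0,
	 * so B would keep comparing %h0 (its ticket) with %b0 (Owner). */
	old = __sync_fetch_and_add(&slock, 0x0100);
	printf("B: ticket=%u owner=%u\n", (old >> 8) & 0xff, old & 0xff);

	/* A releases the lock: incb raises Owner to 1, B's ticket, so B
	 * leaves its busy-wait loop. */
	slock += 1;
	printf("after unlock: owner=%u next=%u\n",
	       slock & 0xff, (slock >> 8) & 0xff);
	return 0;
}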

Listing 7. __raw_spin_unlock function

static inline void __raw_spin_unlock(raw_spinlock_t *lock)
{
	__asm__ __volatile__ (
		UNLOCK_LOCK_PREFIX "incb %0"
		: "+m" (lock->slock)
		:
		: "memory", "cc");
}

Under the IA32 architecture, if the kernel is built for a Pentium Pro SMP system or with X86_OOSTORE enabled, UNLOCK_LOCK_PREFIX is defined as the "lock" prefix; otherwise it is empty.

The incb instruction adds 1 to the lowest byte of slock, that is, to the Owner field.
