Mutual Exclusion in the kernel

Source: Internet
Author: User

Some personal notes on the spinlock in the kernel
Because the 2.6 kernel can be preemptible, drivers should use preempt_disable() and preempt_enable() to keep a code segment from being preempted. (Note that disabling preemption by itself does not disable interrupts.)

Here I mainly want to write down my own understanding of the spinlock in the kernel, not to teach anyone anything (since I am not sure everything I say is right); rather, I hope readers will confirm what is correct and point out what is wrong.

There are two main files related to the spinlock. One is include/linux/spinlock.h, which provides the main hardware-independent spinlock functions; the other is include/asm-xxx/spinlock.h, which provides the hardware-dependent parts. In the 2.6 kernel a third file was added, include/linux/preempt.h, to support the newly added kernel-preemption feature.

The role of the spinlock: this family of functions protects critical data (shared data that must not be accessed by two paths at once) by locking it, thereby achieving synchronization between tasks. If the data is currently locked, the caller waits until it becomes accessible.

Prerequisites for using the spinlock functions: the spinlock functions can only be used in the kernel, that is, in kernel mode. Kernels earlier than 2.6 are not preemptible: while code runs in kernel mode, it will not be switched out to another process. In 2.6 and later, a compile-time option configures whether the kernel is preemptible, which is why preempt.h was added after 2.6.

The spinlock API mainly includes the following functions:
spin_lock
spin_unlock
spin_lock_irqsave
spin_lock_irq
spin_unlock_irqrestore
spin_unlock_irq
There are many others, such as the reader/writer set, the bottom-half set (I have not read the bottom-half code), and a set of bit-based locking functions. Since they all follow the same idea, I will not cover them here (I only meant to write a short note; I did not expect there would be so much, and my hands are nearly frozen: the winter south of the Yangtze really is hard to bear :)
The spinlock functions come in two variants depending on the machine configuration: single CPU and multiple CPUs. Let's look at the single-CPU case first.
In the single-CPU case, spin_lock and spin_unlock are defined as empty operations (do { } while (0)) because, as mentioned above, the kernel is not preemptible. So on a single CPU, as long as you can guarantee that the critical data you want to protect is never touched from interrupt context, the data is already safe and no further action is needed. In the 2.6 kernel these two functions are no longer so simple: since the kernel may now be preempted, we must also temporarily forbid the scheduler from switching away from this code path, that is, temporarily disable preemptive task scheduling. Hence the additional preempt_disable and preempt_enable. spin_lock_irq provides what spin_lock does but goes one step further and disables interrupts; it is used when the protected data may also be touched from an interrupt handler. spin_lock_irqsave does the same as spin_lock_irq, but additionally records the current interrupt state so it can be restored later.

In a multi-CPU environment things are more complicated, because several code paths may genuinely run at the same time, so a variable must be defined to act as the lock. Linux's convention is that when this variable is 1 the protected data may be accessed, and when it is 0 the protected critical data may not be accessed. Changing the value of the lock variable itself also takes care: the update must be atomic, so that two CPUs cannot both observe the lock free and take it at the same time.

It should be clarified that the choice of mutual-exclusion primitive is not based on the size of the critical section, but on the nature of the critical section and on which kernel execution paths compete for the data.

Strictly speaking, semaphores and the spinlock_xxx family are mutual-exclusion mechanisms at different levels; the implementation of the former depends on the latter. This is a bit like the relationship between HTTP and TCP: both are protocols, but they sit at different layers.

A semaphore works at the process level: it provides mutual exclusion over resources between processes. Although it lives in the kernel, the kernel execution path that uses it belongs to a process and competes for resources on that process's behalf. If the resource cannot be obtained, a context switch occurs: the process can sleep, but the CPU does not stop; it goes on running other execution paths. Conceptually this has nothing directly to do with whether there is one CPU or many; a spinlock is used internally only to guarantee atomic access to the semaphore structure itself.

Inside the kernel, data accesses from the kernel's various execution paths must be mutually exclusive; this is the most basic form of mutual exclusion, namely keeping data modifications atomic. The implementation of semaphores also depends on it. On a single CPU the main sources of interference are interrupts and bottom halves, so disabling and re-enabling interrupts suffices. On multiple CPUs, interference from the other CPUs is added, so a spinlock is needed as well. Combining these two parts yields the spinlock_xxx family. Its characteristic is that once a CPU enters spinlock_xxx, it spins until it acquires the lock. This means the critical region guarded by spinlock_xxx must not block or context-switch: access the data and get out quickly, so that the other execution paths spinning idly can obtain the lock. That is the principle of the spinlock. If the current execution path needs a context switch, it must release the spinlock before calling schedule(); otherwise deadlock is likely, because interrupts and bottom halves have no process context and cannot context-switch: they can only spin waiting for the lock, and if the lock holder has been switched away, who knows when it will ever return.

Because the original intent and purpose of the spinlock is to guarantee atomicity of data modification, there is no reason to linger in the locked critical section.

The spinlock_xxx family takes many forms, including:

spin_lock()/spin_unlock()
spin_lock_irq()/spin_unlock_irq()
spin_lock_irqsave()/spin_unlock_irqrestore()
spin_lock_bh()/spin_unlock_bh()

local_irq_disable()/local_irq_enable()
local_bh_disable()/local_bh_enable()

So which form should be used under which circumstances? It depends on which kernel execution paths need to be mutually exclusive with which others. The main execution paths in the kernel are:


1. The kernel-mode side of a user process, which has process context; this mainly means the process is executing a system call.

2. Interrupts, exceptions, and traps. Conceptually these have no process context and cannot context-switch.

3. Bottom halves. Conceptually these also have no process context.

4. The same execution path may be running simultaneously on other CPUs.




Taking these four factors into account, work out which of these paths can access the data you want to protect. If you only need mutual exclusion with other CPUs, spin_lock/spin_unlock suffices. If you need mutual exclusion with IRQs as well as other CPUs, use spin_lock_irq/spin_unlock_irq; if, in addition, the interrupt-flag (eflags) state must be saved and restored, use spin_lock_irqsave/spin_unlock_irqrestore. If you need mutual exclusion with bottom halves and other CPUs, use spin_lock_bh/spin_unlock_bh. If you do not need mutual exclusion with other CPUs but only with IRQs, local_irq_disable/local_irq_enable is enough; if only with bottom halves, local_bh_disable/local_bh_enable; and so on. It is worth noting that mutual exclusion on the same data may take different forms on different kernel execution paths (see the examples below).

For example: in the interrupt code there is an array of irq_desc_t structures, irq_desc[]; each element describes one IRQ and contains, among other things, that IRQ's response functions. Inside irq_desc_t there is a spinlock to keep accesses (modifications) mutually exclusive.

For a given IRQ, irq_desc[irq] is reached by two kernel paths. One is setup_irq(), which usually runs during module initialization or during system initialization; the other is the interrupt response path, do_IRQ(). The code is as follows:


int setup_irq(unsigned int irq, struct irqaction * new)
{
	int shared = 0;
	unsigned long flags;
	struct irqaction *old, **p;
	irq_desc_t *desc = irq_desc + irq;

	/*
	 * Some drivers like serial.c use request_irq() heavily,
	 * so we have to be careful not to interfere with a
	 * running system.
	 */
	if (new->flags & SA_SAMPLE_RANDOM) {
		/*
		 * This function might sleep, we want to call it first,
		 * outside of the atomic block.
		 * Yes, this might clear the entropy pool if the wrong
		 * driver is attempted to be loaded, without actually
		 * installing a new handler, but is this really a problem,
		 * only the sysadmin is able to do this.
		 */
		rand_initialize_irq(irq);
	}

	/*
	 * The following block of code has to be executed atomically
	 */
[1]	spin_lock_irqsave(&desc->lock, flags);
	p = &desc->action;
	if ((old = *p) != NULL) {
		/* Can't share interrupts unless both agree to */
		if (!(old->flags & new->flags & SA_SHIRQ)) {
[2]			spin_unlock_irqrestore(&desc->lock, flags);
			return -EBUSY;
		}

		/* add new interrupt at end of irq queue */
		do {
			p = &old->next;
			old = *p;
		} while (old);
		shared = 1;
	}

	*p = new;

	if (!shared) {
		desc->depth = 0;
		desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
		desc->handler->startup(irq);
	}
[3]	spin_unlock_irqrestore(&desc->lock, flags);

	register_irq_proc(irq);
	return 0;
}



asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{
	/*
	 * We ack quickly, we don't want the irq controller
	 * thinking we're snobs just because some other CPU has
	 * disabled global interrupts (we have already done the
	 * INT_ACK cycles, it's too late to try to pretend to the
	 * controller that we aren't taking the interrupt).
	 *
	 * 0 return value means that this irq is already being
	 * handled by some other CPU. (or is disabled)
	 */
	int irq = regs.orig_eax & 0xff;	/* high bits used in ret_from_ code */
	int cpu = smp_processor_id();
	irq_desc_t *desc = irq_desc + irq;
	struct irqaction *action;
	unsigned int status;

	kstat.irqs[cpu][irq]++;
[4]	spin_lock(&desc->lock);
	desc->handler->ack(irq);
	/*
	 * REPLAY is when Linux resends an IRQ that was dropped earlier
	 * WAITING is used by probe to mark irqs that are being tested
	 */
	status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
	status |= IRQ_PENDING;	/* we _want_ to handle it */

	/*
	 * If the IRQ is disabled for whatever reason, we cannot
	 * use the action we have.
	 */
	action = NULL;
	if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
		action = desc->action;
		status &= ~IRQ_PENDING;	/* we commit to handling */
		status |= IRQ_INPROGRESS;	/* we are handling it */
	}
	desc->status = status;

	/*
	 * If there is no IRQ handler or it was disabled, exit early.
	 * Since we set PENDING, if another processor is handling
	 * a different instance of this same irq, the other processor
	 * will take care of it.
	 */
	if (!action)
		goto out;

	/*
	 * Edge triggered interrupts need to remember
	 * pending events.
	 * This applies to any hw interrupts that allow a second
	 * instance of the same irq to arrive while we are in do_IRQ
	 * or in the handler. But the code here only handles the _second_
	 * instance of the irq, not the third or fourth. So it is mostly
	 * useful for irq hardware that does not mask cleanly in an
	 * SMP environment.
	 */
	for (;;) {
[5]		spin_unlock(&desc->lock);
		handle_IRQ_event(irq, &regs, action);
[6]		spin_lock(&desc->lock);

		if (!(desc->status & IRQ_PENDING))
			break;
		desc->status &= ~IRQ_PENDING;
	}
	desc->status &= ~IRQ_INPROGRESS;
out:
	/*
	 * The ->end() handler has to deal with interrupts which got
	 * disabled while the handler was running.
	 */
	desc->handler->end(irq);
[7]	spin_unlock(&desc->lock);

	if (softirq_pending(cpu))
		do_softirq();
	return 1;
}

In setup_irq(), other CPUs may be running setup_irq() or do_IRQ() at the same time, and the local CPU may take an interrupt and run do_IRQ(), which modifies desc->status. To guard against both interference from other CPUs and the local IRQ, spin_lock_irqsave/spin_unlock_irqrestore is used, as shown at [1] [2] [3].

In do_IRQ(), the code itself runs in interrupt context with interrupts not yet re-enabled, so a local interrupt cannot interfere. Whether other CPUs are running setup_irq() or are themselves handling an interrupt makes no difference to the local do_IRQ(): both are simply interference from other CPUs. Therefore plain spin_lock/spin_unlock suffices, see [4] [5] [6] [7]. It is worth noting that at [5] the spinlock is released before the actual response function is called.

Another example:


static void tasklet_hi_action(struct softirq_action *a)
{
	int cpu = smp_processor_id();
	struct tasklet_struct *list;

[8]	local_irq_disable();
	list = tasklet_hi_vec[cpu].list;
	tasklet_hi_vec[cpu].list = NULL;
[9]	local_irq_enable();

	while (list) {
		struct tasklet_struct *t = list;

		list = list->next;

		if (tasklet_trylock(t)) {
			if (!atomic_read(&t->count)) {
				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
					BUG();
				t->func(t->data);
				tasklet_unlock(t);
				continue;
			}
			tasklet_unlock(t);
		}

[10]	local_irq_disable();
		t->next = tasklet_hi_vec[cpu].list;
		tasklet_hi_vec[cpu].list = t;
		__cpu_raise_softirq(cpu, HI_SOFTIRQ);
[11]	local_irq_enable();
	}
}

Here the modifications to tasklet_hi_vec[cpu] involve no competition between CPUs, because each CPU has its own data; the only interference to guard against is from local IRQs, so local_irq_disable/local_irq_enable is enough, as shown at [8] [9] [10] [11].
