About the Linux spinlock


The choice of mutual-exclusion mechanism is determined not by the size of the critical section, but by the nature of the critical section and by which pieces of code, i.e. which kernel execution paths, contend for it.

Strictly speaking, semaphores and the spinlock_XXX family are mutual-exclusion mechanisms at different levels: the former is implemented on top of the latter. This is a bit like the relationship between HTTP and TCP; both are protocols, but they sit at different layers.

Consider the semaphore first. It is process-level and is used for mutual exclusion over a resource among multiple processes. Although this happens inside the kernel, the kernel execution path involved acts in the identity of a process and contends for the resource on that process's behalf. If it loses the contention there is a context switch: the process may go to sleep, but the CPU does not stall, it goes on running other execution paths. Conceptually this has no direct connection to single-CPU versus multi-CPU; it is only in the implementation of the semaphore itself that, on SMP, a spinlock is needed to guarantee atomic access to the semaphore structure.
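As a concrete illustration (a minimal sketch, not taken from the article, assuming a 2.4-era kernel like the one quoted below; the names my_sem, shared_counter and update_counter are hypothetical), the semaphore pattern in process context looks roughly like this:

#include <linux/errno.h>
#include <asm/semaphore.h>

static DECLARE_MUTEX(my_sem);          /* hypothetical semaphore, initial count 1 */
static int shared_counter;             /* hypothetical data shared between processes */

int update_counter(void)
{
        /* Process context only: down_interruptible() may sleep, so it must
         * never be called from interrupt or bottom-half context. */
        if (down_interruptible(&my_sem))
                return -ERESTARTSYS;   /* woken by a signal while sleeping */

        shared_counter++;              /* critical section; sleeping here is allowed */

        up(&my_sem);
        return 0;
}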

Inside the kernel, the more common requirement is to keep data accesses mutually exclusive between the kernel's various execution paths. This is the most fundamental mutual-exclusion problem: keeping data modifications atomic. The semaphore implementation itself also relies on it. On a single CPU the problem comes mainly from interrupts and bottom halves, so disabling and re-enabling interrupts is enough. On SMP, interference from other CPUs is added on top of that, so a spinlock is needed as well. Combining these two parts gives the spinlock_XXX family. Its characteristic is that once a CPU enters spinlock_XXX it does nothing else: it spins until it acquires the lock. This in turn dictates that a critical section protected by spinlock_XXX must not block, let alone context switch; finish accessing the data and get out quickly, so that the other execution paths spinning on the lock can acquire it. That is the guiding principle of the spinlock. If the current execution path really must context switch, it has to release the spinlock before calling schedule(); otherwise a deadlock is likely, because in interrupt and bottom-half context there is no process context and no context switch is possible, so those paths can only spin waiting for the spinlock, and if you context switch away, who knows when you will come back.

Since the very intent and purpose of a spinlock is to guarantee the atomicity of data modification, there is also no reason to linger inside a spinlock-protected critical section.
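A minimal sketch of that rule (hypothetical names, not from the article): if a path holding a spinlock decides it must wait, it has to drop the lock before calling schedule() and re-take it afterwards, re-checking any state that may have changed in the meantime. A real driver would normally use a wait queue here; this sketch only shows the lock discipline:

#include <linux/spinlock.h>
#include <linux/sched.h>

static spinlock_t my_lock = SPIN_LOCK_UNLOCKED;   /* hypothetical lock */
static int resource_ready;                        /* hypothetical protected flag */

void wait_for_resource(void)
{
        spin_lock(&my_lock);
        while (!resource_ready) {
                /* Never call schedule() with a spinlock held: release it,
                 * let other paths run, then re-take it and re-check. */
                spin_unlock(&my_lock);
                schedule();
                spin_lock(&my_lock);
        }
        /* ... use the resource under the lock, without sleeping ... */
        spin_unlock(&my_lock);
}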

spinlock_XXX comes in many forms (a brief usage sketch follows the list):

  spin_lock()/spin_unlock(),
  spin_lock_irq()/spin_unlock_irq(),
  spin_lock_irqsave()/spin_unlock_irqrestore(),
  spin_lock_bh()/spin_unlock_bh(),

  local_irq_disable()/local_irq_enable(),
  local_bh_disable()/local_bh_enable()
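A brief sketch of the calling conventions (hypothetical data and function names, assuming a 2.4-era kernel): note in particular that the irqsave variant takes a caller-supplied flags variable in which the interrupt state (EFLAGS) is saved and later restored:

#include <linux/spinlock.h>

static spinlock_t data_lock = SPIN_LOCK_UNLOCKED;   /* hypothetical lock */
static int shared_data;                              /* hypothetical protected data */

void touch_from_process_context(void)
{
        unsigned long flags;

        /* Excludes local irqs and other CPUs; saves/restores the irq flag. */
        spin_lock_irqsave(&data_lock, flags);
        shared_data++;
        spin_unlock_irqrestore(&data_lock, flags);
}

void touch_from_irq_handler(void)
{
        /* Interrupts are already off locally; only other CPUs can interfere. */
        spin_lock(&data_lock);
        shared_data++;
        spin_unlock(&data_lock);
}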

 

So, in a given situation, which one should be used? That depends on which kernel execution path we are in and which kernel execution paths we need to exclude. The kernel's execution paths are mainly:

1  A user process running in kernel mode: there is process context, and the path mainly executes system calls and the like on behalf of the process.
2  Interrupts, exceptions, traps and so on: conceptually there is no process context here, and no context switch is possible.
3  Bottom halves: conceptually there is no process context here either.
4  In addition, the same execution paths may be running simultaneously on other CPUs.

 

Taking these four factors into account, we can decide which form of spinlock to use by judging which of them may access the data we need to protect. If we only need to exclude other CPUs, use spin_lock/spin_unlock. If we need to exclude irqs as well as other CPUs, use spin_lock_irq/spin_unlock_irq. If we need to exclude irqs and other CPUs and also preserve the EFLAGS state, use spin_lock_irqsave/spin_unlock_irqrestore. If we need to exclude bottom halves and other CPUs, use spin_lock_bh/spin_unlock_bh. If there is no need to exclude other CPUs and only irqs must be excluded, local_irq_disable/local_irq_enable is enough; if only bottom halves must be excluded, local_bh_disable/local_bh_enable; and so on. It is worth pointing out that, for mutual exclusion on the same piece of data, different kernel execution paths may use different forms (see the example below).

Here is an example. The interrupt code has an array variable irq_desc[] of type irq_desc_t; each element of the array is the descriptor of one irq and holds, among other things, that irq's handler functions. The irq_desc_t structure contains a spinlock used to guarantee mutual exclusion when accessing (modifying) the entry.

For a specific entry, irq_desc[irq], there are two kernel execution paths that access it: one is when the irq's handler is installed (setup_irq), which usually happens during module initialization or system initialization; the other is the interrupt dispatch function (do_IRQ). The code is as follows:

int setup_irq(unsigned int irq, struct irqaction * new)
{
        int shared = 0;
        unsigned long flags;
        struct irqaction *old, **p;
        irq_desc_t *desc = irq_desc + irq;

        /*
         * Some drivers like serial.c use request_irq() heavily,
         * so we have to be careful not to interfere with a
         * running system.
         */
        if (new->flags & SA_SAMPLE_RANDOM) {
                /*
                 * This function might sleep, we want to call it first,
                 * outside of the atomic block.
                 * Yes, this might clear the entropy pool if the wrong
                 * driver is attempted to be loaded, without actually
                 * installing a new handler, but is this really a problem,
                 * only the sysadmin is able to do this.
                 */
                rand_initialize_irq(irq);
        }

        /*
         * The following block of code has to be executed atomically
         */
[1]     spin_lock_irqsave(&desc->lock,flags);
        p = &desc->action;
        if ((old = *p) != NULL) {
                /* Can't share interrupts unless both agree to */
                if (!(old->flags & new->flags & SA_SHIRQ)) {
[2]                     spin_unlock_irqrestore(&desc->lock,flags);
                        return -EBUSY;
                }

                /* add new interrupt at end of irq queue */
                do {
                        p = &old->next;
                        old = *p;
                } while (old);
                shared = 1;
        }

        *p = new;

        if (!shared) {
                desc->depth = 0;
                desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT |IRQ_WAITING);
                desc->handler->startup(irq);
        }
[3]     spin_unlock_irqrestore(&desc->lock,flags);

        register_irq_proc(irq);
        return 0;
}

asmlinkage unsigned int do_IRQ(struct pt_regs regs)
{       
        /*
         * We ack quickly, we don't want the irq controller
         * thinking we're snobs just because some other CPU has
         * disabled global interrupts (we have already done the
         * INT_ACK cycles, it's too late to try to pretend to the
         * controller that we aren't taking the interrupt).
         *
         * 0 return value means that this irq is already being
         * handled by some other CPU. (or is disabled)
         */
        int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code  */
        int cpu = smp_processor_id();
        irq_desc_t *desc = irq_desc + irq;
        struct irqaction * action;
        unsigned int status;

        kstat.irqs[cpu][irq]++;
[4]     spin_lock(&desc->lock);
        desc->handler->ack(irq);
        /*
           REPLAY is when Linux resends an IRQ that was dropped earlier
           WAITING is used by probe to mark irqs that are being tested
           */
        status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
        status |= IRQ_PENDING; /* we _want_ to handle it */

        /*
         * If the IRQ is disabled for whatever reason, we cannot
         * use the action we have.
         */
        action = NULL;
        if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
                action = desc->action;
                status &= ~IRQ_PENDING; /* we commit to handling */
                status |= IRQ_INPROGRESS; /* we are handling it */
        }
        desc->status = status;

        /*
         * If there is no IRQ handler or it was disabled, exit early.
           Since we set PENDING, if another processor is handling
           a different instance of this same irq, the other processor
           will take care of it.
         */
        if (!action)
                goto out;

        /*
         * Edge triggered interrupts need to remember
         * pending events.
         * This applies to any hw interrupts that allow a second
         * instance of the same irq to arrive while we are in do_IRQ
         * or in the handler. But the code here only handles the _second_
         * instance of the irq, not the third or fourth. So it is mostly
         * useful for irq hardware that does not mask cleanly in an
         * SMP environment.
         */
        for (;;) {
[5]             spin_unlock(&desc->lock);
                handle_IRQ_event(irq, &regs, action);
[6]             spin_lock(&desc->lock);
               
                if (!(desc->status & IRQ_PENDING))
                        break;
                desc->status &= ~IRQ_PENDING;
        }
        desc->status &= ~IRQ_INPROGRESS;
out:
        /*
         * The ->end() handler has to deal with interrupts which got
         * disabled while the handler was running.
         */
        desc->handler->end(irq);
[7]     spin_unlock(&desc->lock);

        if (softirq_pending(cpu))
                do_softirq();
        return 1;
}

 

In setup_irq(), other CPUs may be running setup_irq() at the same time, or, while setup_irq() is running, a local irq may arrive and do_IRQ() may run and modify desc->status. To guard simultaneously against interference from other CPUs and from local irqs, spin_lock_irqsave/spin_unlock_irqrestore is used, as marked at [1][2][3].

In do_IRQ(), on the other hand, do_IRQ() itself already runs in interrupt context and interrupts have not yet been re-enabled, so nothing on the local CPU can interrupt it. Other CPUs may be running setup_irq(), or may themselves be handling an interrupt, but as far as the local do_IRQ() is concerned these two cases are no different: both are interference from other CPUs, so spin_lock/spin_unlock is all that is needed, as marked at [4][5][6][7]. Note in particular [5]: the spinlock is released before the actual handler is invoked.

Another example:

static void tasklet_hi_action(struct softirq_action *a)
{
        int cpu = smp_processor_id();
        struct tasklet_struct *list;

[8]     local_irq_disable();
        list = tasklet_hi_vec[cpu].list;
        tasklet_hi_vec[cpu].list = NULL;
[9]     local_irq_enable();

        while (list) {
                struct tasklet_struct *t = list;

                list = list->next;

                if (tasklet_trylock(t)) {
                        if (!atomic_read(&t->count)) {
                                if (!test_and_clear_bit(TASKLET_STATE_SCHED,&t->state))
                                        BUG();
                                t->func(t->data);
                                tasklet_unlock(t);
                                continue;
                        }
                        tasklet_unlock(t);
                }

[10]            local_irq_disable();
                t->next = tasklet_hi_vec[cpu].list;
                tasklet_hi_vec[cpu].list = t;
                __cpu_raise_softirq(cpu, HI_SOFTIRQ);
[11]            local_irq_enable();
        }
}

 

Here, modifications to tasklet_hi_vec[cpu] involve no contention between CPUs, because each CPU has its own independent data; all that is needed is protection against irq interference, so local_irq_disable/local_irq_enable is sufficient, as marked at [8][9][10][11].
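As a generic illustration of this per-CPU pattern (a minimal sketch with hypothetical names, not from the article, assuming a 2.4-era kernel), data indexed by smp_processor_id() only ever needs protection against local interrupts:

#include <linux/smp.h>
#include <linux/threads.h>
#include <asm/system.h>

static int pending_count[NR_CPUS];     /* hypothetical per-CPU data */

void bump_pending(void)
{
        int cpu = smp_processor_id();

        /* No other CPU touches pending_count[cpu], so no spinlock is needed;
         * only local irq handlers could race with this path, so disabling
         * local interrupts is enough. */
        local_irq_disable();
        pending_count[cpu]++;
        local_irq_enable();
}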
