An In-Depth Look at the Linux Kernel wait_queue

Last updated: 2018-12-05 Source: Internet

Uploaded by: User


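A few days ago, while reading driver code, I ran into wait queues. I searched online, then read the kernel code alongside what I found, and learned a lot. wait_queue is used widely in the kernel: device drivers, semaphores and more are implemented with it. That makes it one of the more basic data structures in the kernel.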

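First, we need to understand that every process in Linux is managed through a task_struct structure. A task_struct is allocated when a process is created, and the process is managed through it from then on. The task_struct lives in the flat address space, so the kernel can refer to the management information of every process at any time. Kernel stacks also live in the flat address space. ("Flat" here means "a separate, contiguous region".)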

The main members of task_struct are:

--------------------------------------------------------------------------------

struct task_struct {
    struct files_struct *files;     // file descriptors
    struct signal_struct *sig;      // signal control (signal handlers)
    struct mm_struct *mm;           // memory management
    long state;                     // process state
    struct list_head runlist;       // links the task into the run queue
    long priority;                  // base priority
    long counter;                   // dynamic priority
    char comm[];                    // command name
    struct thread_struct tss;       // saved context
    ...
};

For now we only need to understand its state field, which takes the following values:

State                   Description
TASK_RUNNING            Runnable
TASK_INTERRUPTIBLE      Waiting; can receive signals
TASK_UNINTERRUPTIBLE    Waiting; cannot receive signals
TASK_ZOMBIE             Zombie; the state after exit
TASK_STOPPED            Stopped (suspended)

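Keep in mind that code running in the kernel is not time-sliced the way user processes are: at any moment there is effectively only one flow of execution in the kernel (I am not sure how this works on SMP). This differs from user space, where one process can spin in while(1) and other processes still get to run. In the kernel that does not work, for the reason just given.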

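Suppose we create a buffer in the kernel, and users read from it or write to it through system calls such as read and write. What should happen when a user writes to the buffer while it is already full? One option is to return an error to the user saying the buffer is full and nothing more can be written. The other is to block the user's request until someone reads data out of the buffer and frees some space, and then let the write proceed. The question is how to block the request. Would you write

while ( is_full );
write_to_buffer();

code like this? Consider what would happen. First, the kernel would keep executing inside this while loop. Second, while it is stuck in the loop it cannot keep the rest of the system running, so the system is effectively hung. Here is_full is a variable. You could of course turn is_full into a function that does other work so the kernel keeps running, and then the system would not hang; that is one approach. You could also read the buffer contents out inside the while loop and then update is_full, but that risks consuming important data at a moment when we do not want it read, which is awkward and inflexible. With a wait_queue the code looks cleaner and is easier to understand: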

DECLARE_WAIT_QUEUE_HEAD(wq);    /* global variable: declares and initializes wq */

while ( is_full ) {
    interruptible_sleep_on( &wq );
}
write_to_buffer();

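interruptible_sleep_on( &wq ) puts the current process, the one asking to write to the buffer, onto the wait queue wq. Inside interruptible_sleep_on, the last step is a call to schedule(), which performs the actual scheduling: whoever calls schedule() lies down and lets others run, and when it is woken it gets up where it fell and continues with the code after schedule(). So when does the process that called schedule() wake up? That takes another function, wake_up_interruptible(), used as follows: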

if ( !is_empty ) {
    read_from_buffer();
    wake_up_interruptible( &wq );
}
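The same block-until-there-is-room pattern can be tried out in user space. The sketch below is an analogy, not kernel code: it assumes POSIX threads, where a mutex plus a condition variable play the role of the wait queue, pthread_cond_wait stands in for going to sleep, and pthread_cond_signal stands in for wake_up_interruptible. The names buf_write, buf_read, run_demo and BUF_SIZE are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>

#define BUF_SIZE 4                      /* hypothetical capacity */

static int buf[BUF_SIZE];
static int count, head, tail;           /* count == BUF_SIZE means "is_full" */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;   /* plays the role of wq */
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* Writer: sleep while the buffer is full instead of spinning. */
void buf_write(int value)
{
    pthread_mutex_lock(&lock);
    while (count == BUF_SIZE)           /* re-check the condition after every wake-up */
        pthread_cond_wait(&not_full, &lock);
    buf[tail] = value;
    tail = (tail + 1) % BUF_SIZE;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

/* Reader: take one item, then wake a writer that may be sleeping. */
int buf_read(void)
{
    int value;

    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    value = buf[head];
    head = (head + 1) % BUF_SIZE;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return value;
}

/* Producer thread: writes 0..99 and blocks whenever the buffer is full. */
static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100; i++)
        buf_write(i);
    return NULL;
}

/* Runs one producer against the calling thread and returns the sum of what was read. */
int run_demo(void)
{
    pthread_t t;
    int sum = 0;
    int rc = pthread_create(&t, NULL, producer, NULL);

    assert(rc == 0);
    for (int i = 0; i < 100; i++)
        sum += buf_read();
    pthread_join(t, NULL);
    return sum;
}
```

Because the buffer holds only four items, the producer has to sleep many times during run_demo(), yet no CPU time is burned in a busy loop and no value is lost.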

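That is how wait_queue is used, and it is easy enough to follow. But how does it actually work? wait_queue_head_t is a fairly simple structure, defined in the kernel headers as follows: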

--------------------------------------------------------------------------------

struct __wait_queue_head {
    wq_lock_t lock;
    struct list_head task_list;
#if WAITQUEUE_DEBUG
    long __magic;
    long __creator;
#endif
};

typedef struct __wait_queue_head wait_queue_head_t;

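Here task_list is a list of the sleeping processes. Each item on the list is of type wait_queue_t, and the list itself is the kernel's generic linked list (struct list_head). The code for wait_queue_t is: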

struct __wait_queue {
    unsigned int flags;
#define WQ_FLAG_EXCLUSIVE 0x01
    struct task_struct * task;
    struct list_head task_list;
#if WAITQUEUE_DEBUG
    long __magic;
    long __waker;
#endif
};

typedef struct __wait_queue wait_queue_t;
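Because the list node is embedded inside wait_queue_t rather than pointing at it, the kernel recovers the enclosing entry from a node pointer with list_entry(). A minimal user-space sketch of that idea follows. It makes several assumptions: the structures are cut down to the fields used here, struct task_struct is reduced to a hypothetical pid field, list_entry is rewritten with offsetof as a stand-in for the kernel macro, and task_of and demo_pid are illustrative helpers.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Stand-in for the kernel macro: step back from a member to its containing struct. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct task_struct { int pid; };        /* hypothetical, reduced to one field */

typedef struct wait_queue {
    unsigned int flags;
    struct task_struct *task;
    struct list_head task_list;         /* embedded node, not a pointer */
} wait_queue_t;

/* Given a node taken from a wait queue's list, return the sleeping task. */
struct task_struct *task_of(struct list_head *node)
{
    wait_queue_t *entry;

    assert(node != NULL);
    entry = list_entry(node, wait_queue_t, task_list);
    return entry->task;
}

/* Builds one entry and finds its task again starting from the list node only. */
int demo_pid(void)
{
    struct task_struct t = { 42 };
    wait_queue_t w = { 0, &t, { NULL, NULL } };

    return task_of(&w.task_list)->pid;
}
```

This is why a single generic list type can link wait queue entries, run queue entries and many other structures without knowing anything about them.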

The key structure, really, is wait_queue_t. Now let us look at the code of interruptible_sleep_on:

--------------------------------------------------------------------------------

/* build a wait_queue_t for the current process */
#define SLEEP_ON_VAR \
    unsigned long flags; \
    wait_queue_t wait; \
    init_waitqueue_entry(&wait, current);

/* put wait at the head of q's list of wait_queue_t entries */
#define SLEEP_ON_HEAD \
    spin_lock_irqsave(&q->lock,flags); \
    __add_wait_queue(q, &wait); \
    spin_unlock(&q->lock);

#define SLEEP_ON_TAIL \
    spin_lock_irq(&q->lock); \
    __remove_wait_queue(q, &wait); \
    spin_unlock_irqrestore(&q->lock, flags);

void interruptible_sleep_on(wait_queue_head_t *q)
{
    SLEEP_ON_VAR

    current->state = TASK_INTERRUPTIBLE;

    SLEEP_ON_HEAD
    schedule();     /* a process in state TASK_INTERRUPTIBLE will not be run */
    SLEEP_ON_TAIL
}

static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new)
{
#if WAITQUEUE_DEBUG
    if (!head || !new)
        WQ_BUG();
    CHECK_MAGIC_WQHEAD(head);
    CHECK_MAGIC(new->__magic);
    if (!head->task_list.next || !head->task_list.prev)
        WQ_BUG();
#endif
    list_add(&new->task_list, &head->task_list);
}

static inline void __remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
{
#if WAITQUEUE_DEBUG
    if (!old)
        WQ_BUG();
    CHECK_MAGIC(old->__magic);
#endif
    list_del(&old->task_list);
}

static inline void __list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;
    prev->next = next;
}

/**
 * list_del - deletes entry from list.
 * @entry: the element to delete from the list
 * Note: list_empty on entry does not return true after this, the entry is in an undefined state.
 */
static inline void list_del(struct list_head *entry)
{
    __list_del(entry->prev, entry->next);
    entry->next = (void *) 0;
    entry->prev = (void *) 0;
}
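To see what these list operations do to a wait queue, here is a small user-space sketch. It is a re-implementation written for illustration, not the kernel source: list_add is written out as an insert directly after the head (the behaviour __add_wait_queue relies on), and init_list_head, list_len and demo_list are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* An empty list is a head that points at itself. */
static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

/* Insert new_node right after head, so the newest entry ends up at the front. */
static void list_add(struct list_head *new_node, struct list_head *head)
{
    new_node->next = head->next;
    new_node->prev = head;
    head->next->prev = new_node;
    head->next = new_node;
}

/* Unlink entry and clear its pointers, mirroring the list_del() shown above. */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = NULL;
    entry->prev = NULL;
}

/* Number of nodes on the list, not counting the head itself. */
int list_len(const struct list_head *head)
{
    int n = 0;

    assert(head != NULL);
    for (const struct list_head *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* Adds a then b, removes b; returns 1 if every intermediate check held. */
int demo_list(void)
{
    struct list_head q, a, b;
    int ok = 1;

    init_list_head(&q);
    list_add(&a, &q);
    list_add(&b, &q);
    ok &= (q.next == &b);               /* newest entry sits at the front */
    ok &= (q.prev == &a);               /* oldest entry is at the back */
    ok &= (list_len(&q) == 2);

    list_del(&b);
    ok &= (q.next == &a && list_len(&q) == 1);
    ok &= (b.next == NULL && b.prev == NULL);
    return ok;
}
```

The cleared pointers in a removed entry are the reason for the note in the comment above: after list_del the entry no longer looks like an empty list, so it must be re-initialized before it is reused.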

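The code above should be fairly easy to follow. We first build a wait_queue_t for the current process, set the current process's state to TASK_INTERRUPTIBLE, and add the wait_queue_t to q, the wait queue head that was declared and initialized earlier. Then schedule() is called. Because current is no longer in the TASK_RUNNING state, the scheduler picks some other process to run and current makes no further progress. When current is later woken up, it resumes right after schedule(), that is, at SLEEP_ON_TAIL. It should now be clear that the job of wake_up_interruptible is to set the process's state back to running. Its code is: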

--------------------------------------------------------------------------------

#define wake_up_interruptible(x) __wake_up((x),TASK_INTERRUPTIBLE, 1)

void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
{
    unsigned long flags;

    if (unlikely(!q))
        return;

    spin_lock_irqsave(&q->lock, flags);
    __wake_up_common(q, mode, nr_exclusive, 0);
    spin_unlock_irqrestore(&q->lock, flags);
}

static inline void __wake_up_common(wait_queue_head_t *q, unsigned int mode, int nr_exclusive, int sync)
{
    struct list_head *tmp;
    unsigned int state;
    wait_queue_t *curr;
    task_t *p;

    list_for_each(tmp, &q->task_list) {
        curr = list_entry(tmp, wait_queue_t, task_list);
        p = curr->task;
        state = p->state;
        if ((state & mode) && try_to_wake_up(p, mode, sync) &&
            ((curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive))
            break;
    }
}

static int try_to_wake_up(task_t * p, unsigned int state, int sync)
{
    unsigned long flags;
    int success = 0;
    long old_state;
    runqueue_t *rq;

    sync &= SYNC_WAKEUPS;
repeat_lock_task:
    rq = task_rq_lock(p, &flags);
    old_state = p->state;
    if (old_state & state) {    /* only act if the state matches the mask */
        if (!p->array) {
            /*
             * Fast-migrate the task if it's not running or runnable
             * currently. Do not violate hard affinity.
             */
            if (unlikely(sync && !task_running(rq, p) &&
                (task_cpu(p) != smp_processor_id()) &&
                (p->cpus_allowed & (1UL << smp_processor_id())))) {
                set_task_cpu(p, smp_processor_id());
                task_rq_unlock(rq, &flags);
                goto repeat_lock_task;
            }
            if (old_state == TASK_UNINTERRUPTIBLE)
                rq->nr_uninterruptible--;
            if (sync)
                __activate_task(p, rq);
            else {
                activate_task(p, rq);
                resched_task(rq->curr);
            }
            success = 1;
        }
        if (p->state >= TASK_ZOMBIE)
            BUG();
        p->state = TASK_RUNNING;
    }
    task_rq_unlock(rq, &flags);

    return success;
}
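Stripped of locking and CPU migration, try_to_wake_up makes two separate decisions: whether to reset the state to TASK_RUNNING, and whether to report success because the task actually had to be put back on a run queue. The model below is a simplified sketch of just that logic. The struct task_model type, its on_runqueue field (standing in for p->array being non-NULL) and the constant values are assumptions made for illustration.

```c
#include <assert.h>

#define TASK_RUNNING         0          /* values assumed for this model */
#define TASK_INTERRUPTIBLE   1
#define TASK_UNINTERRUPTIBLE 2

struct task_model {
    long state;
    int on_runqueue;                    /* stands in for p->array != NULL */
};

/* Mirrors the state handling of try_to_wake_up, without locks or migration. */
int try_to_wake_up_model(struct task_model *p, unsigned int state)
{
    int success = 0;

    if (p->state & state) {
        if (!p->on_runqueue) {
            p->on_runqueue = 1;         /* activate_task() in the real code */
            success = 1;
        }
        p->state = TASK_RUNNING;
    }
    return success;
}

/* Encodes three outcomes as one decimal number, one digit per task. */
int demo_try_to_wake_up(void)
{
    struct task_model asleep   = { TASK_INTERRUPTIBLE,   0 };
    struct task_model about_to = { TASK_INTERRUPTIBLE,   1 };   /* has not reached schedule() yet */
    struct task_model deep     = { TASK_UNINTERRUPTIBLE, 0 };

    int a = try_to_wake_up_model(&asleep,   TASK_INTERRUPTIBLE);
    int b = try_to_wake_up_model(&about_to, TASK_INTERRUPTIBLE);
    int c = try_to_wake_up_model(&deep,     TASK_INTERRUPTIBLE);

    assert(about_to.state == TASK_RUNNING);         /* state reset even though b == 0 */
    assert(deep.state == TASK_UNINTERRUPTIBLE);     /* mask did not match: untouched */
    return 100 * a + 10 * b + c;
}
```

The middle case is the interesting one: a task that has set its state to TASK_INTERRUPTIBLE but is still on the run queue gets its state put back to TASK_RUNNING, so its upcoming schedule() call will not put it to sleep, yet the function returns 0 because nothing had to be activated.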

The code of schedule() is fairly long, so it is not reproduced here.

This article was originally written in Chinese and published on aliyun.com.
