Linux Kernel Component Analysis (11): waitqueue and Thread Blocking

When a complex system has to coordinate all of its parts and flexibly support many different mechanisms and policies, even simple problems become complicated. Linux is such a system. To understand it, we try to follow how things are handled from the point of view of the underlying principles, and we try not to be distracted, or crushed, by the mass of details. (Those details and heavyweight mechanisms may well be part of what makes Linux shine, but we should approach them with care.)

Principle

Now let's consider thread blocking in Linux. The principle is simple. We have a thread A to be blocked, a thread B that will wake it up (it could just as well be an interrupt service routine, ISR), and a wait queue Q (perhaps belonging to a semaphore or some other primitive). First, thread A blocks: to join queue Q it allocates a queue node N containing a pointer to A's thread control block (TCB); A then sets its state to blocked and calls schedule() to take itself off the CPU run queue. Some time later, thread B wants to wake up the threads waiting on Q. It only needs to take thread A's TCB pointer from the node and set A's state back to ready. Once thread A resumes running, it removes node N from queue Q, which completes the whole cycle from blocking to recovery.
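
To make this concrete, here is a minimal sketch of the pattern written against the waitqueue API analyzed below. It is only an illustration: the queue my_q, the flag data_ready and both functions are hypothetical, and a real driver would protect the condition with proper locking or barriers.

#include <linux/wait.h>
#include <linux/sched.h>

static DECLARE_WAIT_QUEUE_HEAD(my_q);	/* the queue Q */
static int data_ready;			/* hypothetical wait condition */

/* thread A: block until the condition becomes true */
static void consumer(void)
{
	DECLARE_WAITQUEUE(wait, current);	/* node N, pointing at A's TCB */

	add_wait_queue(&my_q, &wait);		/* put N on Q */
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (data_ready)
			break;
		schedule();			/* give up the CPU */
	}
	set_current_state(TASK_RUNNING);
	remove_wait_queue(&my_q, &wait);	/* take N off Q after waking */
}

/* thread B (or an ISR): make the condition true and wake A */
static void producer(void)
{
	data_ready = 1;
	wake_up(&my_q);
}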

The principle sounds dry, so let's look at the code. We will stay away from the complications of task state transitions and scheduling, and analyze the wait queue from the basics outward. The code lives in three places: include/linux/wait.h, kernel/wait.c, and kernel/sched.c. wait.h is obviously the header and wait.c the implementation, while sched.c contains one application of waitqueue (completion). To analyze completion properly we also need include/linux/completion.h.

Waitqueue implementation

As usual, we look at the data structures first.

struct __wait_queue_head {
	spinlock_t lock;
	struct list_head task_list;
};
typedef struct __wait_queue_head wait_queue_head_t;

typedef int (*wait_queue_func_t)(wait_queue_t *wait, unsigned mode, int flags, void *key);
int default_wake_function(wait_queue_t *wait, unsigned mode, int flags, void *key);

struct __wait_queue {
	unsigned int flags;
#define WQ_FLAG_EXCLUSIVE	0x01
	void *private;
	wait_queue_func_t func;
	struct list_head task_list;
};
typedef struct __wait_queue wait_queue_t;

wait_queue_head_t is the wait queue head, and wait_queue_t is a queue node.

wait_queue_head_t contains a spinlock and a circular doubly linked list task_list, which is just what we would expect.

wait_queue_t carries more fields, so let's go through them one by one.

flags can only be 0 or WQ_FLAG_EXCLUSIVE, and it only matters at wakeup time: if WQ_FLAG_EXCLUSIVE is set, at most one such waiter is woken per wakeup; if it is 0, there is no such limit.

The private pointer is in fact a pointer to the TCB (the task_struct).

func points to the function used to wake up the thread behind this node. A default, default_wake_function(), is provided, but the wakeup function can be replaced freely.

task_list is the doubly linked list node that links this entry into the wait queue.

As is our habit, after the data structures come a variety of initialization helpers. There are quite a few, so we list them in segments.

#define __WAITQUEUE_INITIALIZER(name, tsk) {				\
	.private	= tsk,						\
	.func		= default_wake_function,			\
	.task_list	= { NULL, NULL } }

#define DECLARE_WAITQUEUE(name, tsk)					\
	wait_queue_t name = __WAITQUEUE_INITIALIZER(name, tsk)

#define __WAIT_QUEUE_HEAD_INITIALIZER(name) {				\
	.lock		= __SPIN_LOCK_UNLOCKED(name.lock),		\
	.task_list	= { &(name).task_list, &(name).task_list } }

#define DECLARE_WAIT_QUEUE_HEAD(name) \
	wait_queue_head_t name = __WAIT_QUEUE_HEAD_INITIALIZER(name)

These are macro definitions that initialize the variable at the point where it is declared.

void __init_waitqueue_head(wait_queue_head_t *q, struct lock_class_key *key)
{
	spin_lock_init(&q->lock);
	lockdep_set_class(&q->lock, key);
	INIT_LIST_HEAD(&q->task_list);
}

#define init_waitqueue_head(q)				\
	do {						\
		static struct lock_class_key __key;	\
							\
		__init_waitqueue_head((q), &__key);	\
	} while (0)

#ifdef CONFIG_LOCKDEP
# define __WAIT_QUEUE_HEAD_INIT_ONSTACK(name) \
	({ init_waitqueue_head(&name); name; })
# define DECLARE_WAIT_QUEUE_HEAD_ONSTACK(name) \
	wait_queue_head_t name = __WAIT_QUEUE_HEAD_INIT_ONSTACK(name)
#else
# define DECLARE_WAIT_QUEUE_HEAD_ONSTACK(name) DECLARE_WAIT_QUEUE_HEAD(name)
#endif

This piece of code does not really deserve much attention, but since it is only a simple detail we cover it anyway.

init_waitqueue_head() initializes a wait queue head.

The macro DECLARE_WAIT_QUEUE_HEAD_ONSTACK is also defined; what it expands to depends on whether CONFIG_LOCKDEP is configured.

The spinlock machinery is quite involved. When CONFIG_LOCKDEP is configured, a local static variable __key is defined and used to check that the spinlock is used correctly. The checking process is complicated, but since it is only a check, we can safely ignore it here. Because the key is a local static variable, a wait queue head defined on the stack must be declared with DECLARE_WAIT_QUEUE_HEAD_ONSTACK so that it is initialized at run time. You will run into this kind of check in many places where spinlocks are used.
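
As a purely illustrative sketch (the function and its use of wait_event_timeout() are assumptions, not taken from this article), an on-stack wait queue head would be declared like this:

#include <linux/wait.h>

static int wait_briefly(void)
{
	/* an on-stack head needs the _ONSTACK variant so that lockdep gets a
	 * run-time lock_class_key instead of the file-local static one */
	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(tmp_q);

	/* sleep on the temporary queue for at most one jiffy; the condition
	 * is never true here, so this simply times out and returns 0 */
	return wait_event_timeout(tmp_q, 0, 1);
}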

static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p)
{
	q->flags = 0;
	q->private = p;
	q->func = default_wake_function;
}

static inline void init_waitqueue_func_entry(wait_queue_t *q,
					     wait_queue_func_t func)
{
	q->flags = 0;
	q->private = NULL;
	q->func = func;
}

static inline int waitqueue_active(wait_queue_head_t *q)
{
	return !list_empty(&q->task_list);
}

init_waitqueue_entry() and init_waitqueue_func_entry() initialize a wait queue node: the former stores the task pointer and the default wakeup function, while the latter installs a caller-supplied wakeup function.
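
For illustration, here is a hedged sketch of installing a custom wakeup callback with init_waitqueue_func_entry(); the callback my_wake, the extra private assignment and the decision to chain to default_wake_function() are all assumptions of this sketch.

#include <linux/kernel.h>
#include <linux/wait.h>
#include <linux/sched.h>

/* hypothetical callback: log the wakeup, then do the default wakeup */
static int my_wake(wait_queue_t *wait, unsigned mode, int flags, void *key)
{
	pr_info("waitqueue entry %p woken\n", wait);
	return default_wake_function(wait, mode, flags, key);
}

static void wait_with_callback(wait_queue_head_t *q)
{
	wait_queue_t wait;

	init_waitqueue_func_entry(&wait, my_wake);
	wait.private = current;	/* default_wake_function() needs the TCB */
	add_wait_queue(q, &wait);
	/* ... block here and later remove_wait_queue() as usual ... */
}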

waitqueue_active() checks whether there are any waiters on the queue.
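
A common idiom, shown here only as a hedged sketch (the function and the condition pointer are hypothetical, and real code needs careful memory ordering between setting the condition and testing the queue):

/* hypothetical fast path: skip the wake_up() call if the queue is empty */
static void signal_waiters(wait_queue_head_t *q, int *cond)
{
	*cond = 1;
	if (waitqueue_active(q))
		wake_up(q);
}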

static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new)
{
	list_add(&new->task_list, &head->task_list);
}

/*
 * Used for wake-one threads:
 */
static inline void __add_wait_queue_tail(wait_queue_head_t *head,
					 wait_queue_t *new)
{
	list_add_tail(&new->task_list, &head->task_list);
}

static inline void __remove_wait_queue(wait_queue_head_t *head,
				       wait_queue_t *old)
{
	list_del(&old->task_list);
}

__add_wait_queue() adds a node at the head of the wait queue.

__add_wait_queue_tail() adds a node at the tail of the wait queue.

__remove_wait_queue() removes a node from the wait queue.

All three are straightforward wrappers around the linked-list operations.

void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
{
	unsigned long flags;

	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
	spin_lock_irqsave(&q->lock, flags);
	__add_wait_queue(q, wait);
	spin_unlock_irqrestore(&q->lock, flags);
}

void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
{
	unsigned long flags;

	wait->flags |= WQ_FLAG_EXCLUSIVE;
	spin_lock_irqsave(&q->lock, flags);
	__add_wait_queue_tail(q, wait);
	spin_unlock_irqrestore(&q->lock, flags);
}

void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
{
	unsigned long flags;

	spin_lock_irqsave(&q->lock, flags);
	__remove_wait_queue(q, wait);
	spin_unlock_irqrestore(&q->lock, flags);
}

add_wait_queue() clears WQ_FLAG_EXCLUSIVE and adds the node at the head of the wait queue.

add_wait_queue_exclusive() sets WQ_FLAG_EXCLUSIVE and adds the node at the tail of the wait queue.

remove_wait_queue() removes the node from the wait queue.

The biggest difference from the previous three functions is that these take the queue's spinlock with interrupts disabled. They also illustrate a Linux coding convention: functions prefixed with a double underscore are meant mainly for internal use, and any external caller must understand their preconditions. For example, __add_wait_queue() above may only be called while the spinlock is already held with interrupts disabled; the point is to avoid taking the lock twice. add_wait_queue() and its companions are the safer, self-contained versions.
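
To make the exclusive flag concrete, here is a hedged sketch of a worker pool in which every worker registers with add_wait_queue_exclusive(), so each wake_up() releases exactly one of them. The queue pool_q, the counter pending and both functions are hypothetical, and a real implementation would need stricter synchronization between the counter and the queue.

#include <linux/wait.h>
#include <linux/sched.h>
#include <linux/atomic.h>

static DECLARE_WAIT_QUEUE_HEAD(pool_q);
static atomic_t pending = ATOMIC_INIT(0);	/* hypothetical work counter */

/* each worker is an exclusive waiter at the tail of the queue */
static void worker_wait_for_item(void)
{
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue_exclusive(&pool_q, &wait);
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (atomic_read(&pending) > 0)
			break;
		schedule();
	}
	set_current_state(TASK_RUNNING);
	remove_wait_queue(&pool_q, &wait);
	atomic_dec(&pending);
}

/* the producer queues one item and wakes exactly one worker */
static void producer_add_item(void)
{
	atomic_inc(&pending);
	wake_up(&pool_q);	/* nr_exclusive == 1: one exclusive waiter wakes */
}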

It may seem hard to believe, but waitqueue really is that simple. Now let's look at how it is used to implement completion.

Waitqueue usage: completion

completion is a simple semaphore-like mechanism built on top of waitqueue. Its interface is small and its semantics even simpler, which makes it the best example of a thin encapsulation over waitqueue.
struct completion {
	unsigned int done;
	wait_queue_head_t wait;
};

The completion structure is very simple: done is the count, and wait holds the wait queue.
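
Before digging into the implementation, here is a hedged sketch of how completion is typically used, for example to wait until a worker thread has finished its setup. The completion name, the thread function and the use of kthread_run() are illustrative assumptions, not something taken from this article.

#include <linux/completion.h>
#include <linux/kthread.h>

static struct completion setup_done;

/* hypothetical worker: signal the waiter once initialization is finished */
static int worker_fn(void *data)
{
	/* ... do the initialization ... */
	complete(&setup_done);		/* done++, wake one waiter */
	return 0;
}

static void start_and_wait(void)
{
	init_completion(&setup_done);
	kthread_run(worker_fn, NULL, "worker");
	wait_for_completion(&setup_done);	/* block until complete() */
}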

#define COMPLETION_INITIALIZER(work) \
	{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }

#define COMPLETION_INITIALIZER_ONSTACK(work) \
	({ init_completion(&work); work; })

#define DECLARE_COMPLETION(work) \
	struct completion work = COMPLETION_INITIALIZER(work)

#ifdef CONFIG_LOCKDEP
# define DECLARE_COMPLETION_ONSTACK(work) \
	struct completion work = COMPLETION_INITIALIZER_ONSTACK(work)
#else
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
#endif

static inline void init_completion(struct completion *x)
{
	x->done = 0;
	init_waitqueue_head(&x->wait);
}

/* reinitialize completion */
#define INIT_COMPLETION(x)	((x).done = 0)

The above are the initializer macros and the initialization function for the completion structure, plus INIT_COMPLETION(), which reinitializes a completion that has already been used. CONFIG_LOCKDEP appears once more, and by now it is familiar.

/**
 * wait_for_completion: - waits for completion of a task
 * @x:  holds the state of this particular completion
 *
 * This waits to be signaled for completion of a specific task. It is NOT
 * interruptible and there is no timeout.
 *
 * See also similar routines (i.e. wait_for_completion_timeout()) with timeout
 * and interrupt capability. Also see complete().
 */
void __sched wait_for_completion(struct completion *x)
{
	wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
}

static long __sched
wait_for_common(struct completion *x, long timeout, int state)
{
	might_sleep();

	spin_lock_irq(&x->wait.lock);
	timeout = do_wait_for_common(x, timeout, state);
	spin_unlock_irq(&x->wait.lock);
	return timeout;
}

static inline long __sched
do_wait_for_common(struct completion *x, long timeout, int state)
{
	if (!x->done) {
		DECLARE_WAITQUEUE(wait, current);

		wait.flags |= WQ_FLAG_EXCLUSIVE;
		__add_wait_queue_tail(&x->wait, &wait);
		do {
			if (signal_pending_state(state, current)) {
				timeout = -ERESTARTSYS;
				break;
			}
			__set_current_state(state);
			spin_unlock_irq(&x->wait.lock);
			timeout = schedule_timeout(timeout);
			spin_lock_irq(&x->wait.lock);
		} while (!x->done && timeout);
		__remove_wait_queue(&x->wait, &wait);
		if (!x->done)
			return timeout;
	}
	x->done--;
	return timeout ?: 1;
}

wait_for_completion() blocks the calling thread on the completion; the actual blocking happens in do_wait_for_common(), and only when the count done is 0. do_wait_for_common() first defines an initialized wait_queue_t with DECLARE_WAITQUEUE() and calls __add_wait_queue_tail() to add the node at the tail of the wait queue. It then calls signal_pending_state() to check the wait state against pending signals: if the state allows signal delivery and a signal is pending for the thread, it returns -ERESTARTSYS directly instead of blocking again. Otherwise it calls __set_current_state() to set the thread state (a blocked thread is either TASK_INTERRUPTIBLE, which signals can interrupt, or TASK_UNINTERRUPTIBLE, which they cannot) and calls schedule_timeout() to swap the current thread out of the run queue. Note that on wakeup the loop re-checks whether the count can actually be taken; a thread that is woken but cannot take the count has to block again. Once the count is obtained, __remove_wait_queue() removes the local node from the wait queue.

The statement on the last line of do_wait_for_common() is not standard C; it uses a GCC extension: timeout ?: 1 evaluates to 1 if timeout is 0 and to the value of timeout otherwise. schedule_timeout() puts the current thread to sleep for at least timeout jiffies. If timeout is MAX_SCHEDULE_TIMEOUT, no timer is set and the thread sleeps until it is explicitly woken. Otherwise the return value is the number of jiffies left: 0 if the timeout expired, or the remaining count if the thread was woken early (for example by a wakeup or a signal).
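
To spell the extension out, the following standalone helper is equivalent to the function's last line (the helper name is made up for illustration):

/* "t ?: 1" is GCC's two-operand conditional: it means t ? t : 1, with t
 * evaluated only once */
static long gcc_elvis_equiv(long t)
{
	return t ? t : 1;
}
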
unsigned long __sched
wait_for_completion_timeout(struct completion *x, unsigned long timeout)
{
	return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
}

int __sched wait_for_completion_interruptible(struct completion *x)
{
	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
	if (t == -ERESTARTSYS)
		return t;
	return 0;
}

unsigned long __sched
wait_for_completion_interruptible_timeout(struct completion *x,
					   unsigned long timeout)
{
	return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
}

int __sched wait_for_completion_killable(struct completion *x)
{
	long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
	if (t == -ERESTARTSYS)
		return t;
	return 0;
}

wait_for_completion_timeout() blocks with a timeout.

wait_for_completion_interruptible() blocks in a way that can be interrupted by signals. wait_for_completion_interruptible_timeout() combines signal interruption with a timeout. wait_for_completion_killable() blocks in a way that only fatal signals can interrupt. All four are variants of wait_for_completion(), and all of them reach do_wait_for_common() through wait_for_common().
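
As a hedged illustration of how the return values of these variants are usually handled (the completion io_done, the one-second timeout and the error mapping are assumptions of this sketch):

#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

static struct completion io_done;

static int wait_for_io(void)
{
	long t;

	/* wait up to one second, and let signals interrupt the wait */
	t = wait_for_completion_interruptible_timeout(&io_done, HZ);
	if (t == -ERESTARTSYS)
		return -ERESTARTSYS;	/* interrupted by a signal */
	if (t == 0)
		return -ETIMEDOUT;	/* the second elapsed, no complete() */
	return 0;			/* completed with t jiffies to spare */
}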

void complete(struct completion *x)
{
	unsigned long flags;

	spin_lock_irqsave(&x->wait.lock, flags);
	x->done++;
	__wake_up_common(&x->wait, TASK_NORMAL, 1, 0, NULL);
	spin_unlock_irqrestore(&x->wait.lock, flags);
}

/*
 * The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
 * wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
 * number) then we wake all the non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
 * zero in this (rare) case, and we handle it by continuing to scan the queue.
 */
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
			int nr_exclusive, int wake_flags, void *key)
{
	wait_queue_t *curr, *next;

	list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
		unsigned flags = curr->flags;

		if (curr->func(curr, mode, wake_flags, key) &&
				(flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			break;
	}
}

complete() increments the count and wakes one blocked thread by calling __wake_up_common() with nr_exclusive set to 1.

Here, curr->func() points to default_wake_function().

int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
			  void *key)
{
	return try_to_wake_up(curr->private, mode, wake_flags);
}

default_wake_function() wakes the sleeping thread by calling try_to_wake_up(). The internals of try_to_wake_up() involve TCB state and the scheduler, so we skip them here.

void complete_all(struct completion *x)
{
	unsigned long flags;

	spin_lock_irqsave(&x->wait.lock, flags);
	x->done += UINT_MAX/2;
	__wake_up_common(&x->wait, TASK_NORMAL, 0, 0, NULL);
	spin_unlock_irqrestore(&x->wait.lock, flags);
}

complete_all() wakes every thread waiting on the completion: it adds a huge value (UINT_MAX/2) to done and passes nr_exclusive == 0 so that all waiters, exclusive or not, are woken.
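
A hedged sketch of how complete_all() might be used to broadcast an event to every waiter; the names are hypothetical, and note that after complete_all() the inflated count means the completion must be reinitialized with INIT_COMPLETION() (shown earlier) before it can be reused.

#include <linux/completion.h>

/* assume init_completion(&shutdown_event) was called earlier */
static struct completion shutdown_event;

/* release every thread currently blocked in wait_for_completion() */
static void broadcast_shutdown(void)
{
	complete_all(&shutdown_event);
}

/* before the next round, reset the count that complete_all() inflated */
static void rearm_shutdown_event(void)
{
	INIT_COMPLETION(shutdown_event);
}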

bool try_wait_for_completion(struct completion *x)
{
	int ret = 1;

	spin_lock_irq(&x->wait.lock);
	if (!x->done)
		ret = 0;
	else
		x->done--;
	spin_unlock_irq(&x->wait.lock);
	return ret;
}

try_wait_for_completion() tries to take the count without blocking; it returns true on success and false if the count is not available.
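
A small hedged illustration of the non-blocking path: try the count first and fall back to blocking only if it is not yet available (the helper name is made up):

#include <linux/completion.h>

static void consume_event(struct completion *x)
{
	if (try_wait_for_completion(x))
		return;			/* complete() already happened */
	wait_for_completion(x);		/* otherwise block as usual */
}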

/**
 * completion_done - Test to see if a completion has any waiters
 * @x:	completion structure
 *
 * Returns: 0 if there are waiters (wait_for_completion() in progress)
 *	    1 if there are no waiters.
 *
 */
bool completion_done(struct completion *x)
{
	int ret = 1;

	spin_lock_irq(&x->wait.lock);
	if (!x->done)
		ret = 0;
	spin_unlock_irq(&x->wait.lock);
	return ret;
}

completion_done() checks whether anything may be blocked on the completion. The implementation is rather loose: it only looks at done, so a return value of 0 does not guarantee that a thread is actually blocked. It is only suitable for special or tolerant use cases.
