Linux work queues and concurrency-managed work queues


The slow work mechanism and concurrency-managed work queues

The mainline kernel once briefly contained a slow work mechanism, but it has since been entirely replaced by concurrency-managed work queues (CMWQ) and has faded out of mainline.

In kernel code, you often want to postpone part of your work to some time in the future. There are many reasons for doing so: for example, you are holding a lock and the work is large or time-consuming; you want to aggregate work to get the performance of a batch; or you need to call a function that might sleep, which makes it very inappropriate to do the work at the current point of execution.

The kernel provides a number of mechanisms for deferred execution: the bottom-half mechanisms of interrupt processing (softirqs and tasklets) allow work to be deferred while remaining in interrupt context; timers let you specify that some work will run after a given interval; work queues allow execution to be deferred into process context. In addition, the kernel has briefly contained a slow work mechanism and asynchronous function calls, and there have been various private thread-pool implementations. Of all the kernel infrastructure components listed above, the work queue is the most widely used.


Work Queues (workqueues)

Before the discussion, let us define the terms the kernel uses around work queues, to make the description easier to follow.

Workqueue: all work items (the pieces of work that need to be performed) are arranged on a queue, hence the name work queue (workqueue).

Worker thread: a kernel thread that executes the work items on the work queue one after another; when the queue holds no work items, it becomes idle.

Single threaded (ST): one mode of the worker threads: system-wide, a single worker thread serves the work queue.

Multi threaded (MT): the other mode: on a multi-CPU system, each CPU has one worker thread serving the work queue.

The work queue became the most used deferred-execution mechanism thanks to some interesting properties of its implementation.

The interface is straightforward. For users there are basically only three things to do: create a work queue (this step can be omitted if you use the kernel's default work queue); create a work item; and submit the work item to a work queue. Execution then takes place in process context, so the work can sleep, be scheduled, and be preempted.

Execution in process context is a very big advantage. The other deferral mechanisms basically run in interrupt context, where sleeping and blocking are impossible: interrupt context is not associated with any process, so if code slept there, the scheduler would be unable to wake it again. Code in interrupt context therefore must not do anything that could put the kernel to sleep, such as taking a semaphore or performing a non-atomic memory allocation. Work queues run in process context (they execute through kernel threads), so they are fully capable of sleeping and can be scheduled and preempted like other processes. They are also very friendly to multicore environments.

Compared with the tasklet mechanism, work queues have the advantage that they can run concurrently on different CPUs. This makes them well suited to multicore machines; there have even been discussions on the kernel mailing list about replacing tasklets, which cannot execute on multiple CPUs at once, with softirqs and work queues.

In general, work queues and timer functions are handled somewhat similarly: both are deferred-execution callback mechanisms. But unlike a timer handler, which executes only once (it can re-register itself during execution for repeated invocation, but that must be done explicitly) and runs in the clock-interrupt environment, where restrictions mean the callback cannot be too complex, work queue items are executed by kernel threads: the work item remains valid, can be resubmitted repeatedly, and may sleep during execution. Work queues are therefore ideal for tasks that are not very urgent, such as garbage-collection-style processing.
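
To make the contrast concrete, here is a minimal sketch of a periodic, non-urgent task implemented as a self-rearming delayed work item rather than a timer; the names gc_work, gc_work_func, and my_gc_scan are hypothetical. Because it runs in process context, the callback may sleep:

 #include <linux/workqueue.h>
 #include <linux/jiffies.h>

 static void gc_work_func(struct work_struct *work);
 static DECLARE_DELAYED_WORK(gc_work, gc_work_func);

 static void gc_work_func(struct work_struct *work)
 {
     my_gc_scan();    /* hypothetical scan; may sleep, unlike a timer callback */

     /* re-arm ourselves: run again in 10 seconds */
     schedule_delayed_work(&gc_work, 10 * HZ);
 }

 /* kick off the first round; called e.g. from the module init function */
 static void gc_start(void)
 {
     schedule_delayed_work(&gc_work, 10 * HZ);
 }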

Using work queues, and notes on some deficiencies

Before 2.6.20 the interface for creating work items did not look like this; at 2.6.20 the work queue interface was put on a diet, for the simple reason that work queues were being used more and more, and every byte saved in such a widely embedded structure is good news for the kernel. Work items are divided at creation time into ordinary work items and work items whose execution must be delayed for some period of time.

The ease of use of work queues was discussed briefly above. Following the steps for using a work queue, the interfaces provided before 2.6.36 are listed below together with some usage notes. Because the work queue implementation already provides a default shared work queue, there are two choices when selecting an interface: either use the shared work queue the kernel provides, or create a work queue yourself.

If you choose to use the shared work queue, the basic steps are:

1. Create Work Items

Work items can be created statically or dynamically. The interfaces are:
Listing 1. Creating a work item statically

 typedef void (*work_func_t)(struct work_struct *work);

 DECLARE_WORK(name, func);
 DECLARE_DELAYED_WORK(name, func);

These macros statically create a work item named name and set its callback function to func.
Listing 2. Creating work items dynamically

 INIT_WORK(struct work_struct *work, work_func_t func);
 PREPARE_WORK(struct work_struct *work, work_func_t func);
 INIT_DELAYED_WORK(struct delayed_work *work, work_func_t func);
 PREPARE_DELAYED_WORK(struct delayed_work *work, work_func_t func);

These macros initialize the work item work at run time and set its callback function to func.
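
A common pattern with the dynamic interface is to embed the work item in a larger driver structure and recover the enclosing object in the callback via container_of. A minimal sketch, in which struct my_dev and its fields are hypothetical:

 #include <linux/workqueue.h>
 #include <linux/kernel.h>

 struct my_dev {
     int pending_events;              /* hypothetical device state */
     struct work_struct event_work;   /* embedded work item */
 };

 static void my_dev_event_work(struct work_struct *work)
 {
     /* recover the enclosing structure from the embedded work item */
     struct my_dev *dev = container_of(work, struct my_dev, event_work);

     dev->pending_events = 0;         /* process events; may sleep here */
 }

 static void my_dev_setup(struct my_dev *dev)
 {
     INIT_WORK(&dev->event_work, my_dev_event_work);
 }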

2. Scheduling Work Items
Listing 3. Scheduling work items

 int schedule_work(struct work_struct *work);
 int schedule_delayed_work(struct delayed_work *work, unsigned long delay);

Both functions add the work item to the shared work queue; the work item is then executed at a suitable time.
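
Putting the two steps together, here is a minimal sketch of deferring the slow half of an interrupt handler to the shared work queue; all names are hypothetical:

 #include <linux/workqueue.h>
 #include <linux/interrupt.h>

 static void my_bh_func(struct work_struct *work)
 {
     /* the deferred half of the interrupt handling; may sleep here */
 }
 static DECLARE_WORK(my_bh_work, my_bh_func);

 static irqreturn_t my_irq_handler(int irq, void *dev_id)
 {
     /* do the minimal urgent part, then hand the rest to the
      * shared work queue */
     schedule_work(&my_bh_work);
     return IRQ_HANDLED;
 }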

If for some reason you do not want to or cannot use the kernel-provided shared work queue, for example because you need to run long-blocking tasks, you have to create a work queue yourself; the steps above and the interfaces used then change slightly:

3. Create a Work queue

Before 2.6.36, every work queue in the kernel had dedicated kernel threads serving it. When creating a work queue there are two choices: a single system-wide ST thread, or MT with one kernel thread per CPU. The interfaces are:
Listing 4. Creating a work queue

 create_singlethread_workqueue(name)
 create_workqueue(name)

Like create_singlethread_workqueue, create_workqueue also allocates a wq work queue. The difference is that on a multi-CPU system, create_workqueue creates a per-CPU cwq structure for each active CPU, and a new worker_thread is spawned for each cwq.

4. Create Work Items

The interface for creating work items is the same as when using the kernel's default shared work queue.

5. Submit work items to the work queue
Listing 5. Submitting work items to the work queue

 int queue_work(struct workqueue_struct *queue, struct work_struct *work);
 int queue_delayed_work(struct workqueue_struct *queue,
                        struct delayed_work *work, unsigned long delay);

Both submit the work item work to the work queue queue, but the second function guarantees that the work will not execute before a minimum delay of delay jiffies has elapsed. In the MT case, when a work item is submitted with queue_work, it is added to the worklist of the cwq belonging to whichever active CPU the function happens to be called on.
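
A hedged sketch of the private-work-queue variant of the same steps (all names are hypothetical); the cancel and release interfaces used at teardown are described in Listings 6 and 7 below:

 #include <linux/workqueue.h>
 #include <linux/init.h>
 #include <linux/errno.h>

 static void my_slow_func(struct work_struct *work)
 {
     /* long-running or blocking task (hypothetical body) */
 }
 static DECLARE_WORK(my_slow_work, my_slow_func);

 static struct workqueue_struct *my_wq;

 static int __init my_init(void)
 {
     my_wq = create_singlethread_workqueue("my_wq");
     if (!my_wq)
         return -ENOMEM;
     queue_work(my_wq, &my_slow_work);
     return 0;
 }

 static void __exit my_exit(void)
 {
     flush_workqueue(my_wq);     /* wait for outstanding work items */
     destroy_workqueue(my_wq);   /* then release the queue (Listing 7) */
 }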

If you need to cancel a work item in a pending work queue, you can call:
Listing 6. Canceling a pending work item in a work queue

 int cancel_delayed_work(struct delayed_work *work);

If the work item is canceled before it begins executing, the return value is nonzero, and the kernel guarantees that the work item will not execute after a successful cancel_delayed_work. If cancel_delayed_work returns 0, the work item may already be running on a different processor and may still execute after the call returns. To be absolutely sure the work function is not running anywhere after cancel_delayed_work returns 0, follow the call with flush_workqueue; after flush_workqueue returns, any work function submitted before the call will no longer be running anywhere on the system.
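
As a short sketch of the guaranteed-termination pattern just described (my_dwork and my_wq are hypothetical):

 /* stop a delayed work item and make sure it is not running anywhere */
 if (!cancel_delayed_work(&my_dwork)) {
     /* 0: the item was no longer pending, so it may be running on
      * another CPU right now; drain the queue it was submitted to */
     flush_workqueue(my_wq);
 }
 /* from here on, my_dwork's work function is not running anywhere */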

Once you have finished using a work queue, you can release the related resources with the following function:
Listing 7. Releasing a work queue

 void destroy_workqueue(struct workqueue_struct *queue);

The advantages of work queues over the interrupt-context-based deferral mechanisms were compared earlier, but work queues are not without drawbacks. First, the public shared work queue does not help much in practice: if any work item on it blocks, the work items behind it will not execute, so in actual use, users create their own work queues. That in turn makes the kernel's thread count grow very quickly under MT work queues, which brings problems of its own: PIDs are in fact a global resource, and occupying large numbers of them is bad news for servers; and the large number of worker threads competing for resources leads to needless scheduling work, putting extra pressure on the scheduler. Furthermore, the existing work queue mechanism tends to deadlock in some cases, especially when there are dependencies between two work items; anyone who has debugged such an occasional deadlock knows how frustrating it is.


Concurrency-managed work queues (concurrency-managed workqueues, CMWQ)

In the work queues before 2.6.36, the core of the design was that each work queue owned dedicated kernel threads serving it: one system-wide thread for ST, or one kernel thread per CPU for MT. The new CMWQ changes this: work queues no longer own dedicated threads; instead there are now (number of online CPUs + 1) thread pools serving all work queues, so the management of threads moves from work queue users back into the kernel. When a work item is created and queued, it is handed to one of the pool threads at a suitable time. The most interesting change in CMWQ is that work items submitted to the same work queue on the same CPU may now execute concurrently, which is what gives concurrency-managed work queues their name.

The implementation of CMWQ follows several principles: stay compatible with the original work queue interface, changing only the interface for creating work queues, so migration to the new interface is easy; let work queues share the per-CPU thread pools, providing a flexible level of concurrency without wasting large amounts of resources; and balance the worker thread pools and concurrency levels automatically, so work queue users no longer need to care about so many details.

From the user's point of view, CMWQ changes only the back end behind the interface for creating work queues. Compared with the previous work queues, the new interface is:
Listing 8. The CMWQ back-end interface for creating work queues

 struct workqueue_struct *alloc_workqueue(char *name, unsigned int flags,
                                          int max_active);

where:

name: the name of the work queue itself, rather than of the kernel threads that actually served the work queue, as it was before 2.6.36.

flags: the properties of the work queue. The following flags can be set:

WQ_NON_REENTRANT: by default, a work queue guarantees non-reentrancy only on the same CPU: a work item cannot be executed concurrently by multiple worker threads on one CPU, but may run concurrently on several CPUs. With this flag the work item is non-reentrant across CPUs as well: a work item queued on a non-reentrant work queue is guaranteed to be executing on at most one worker thread system-wide.

WQ_UNBOUND: work items are served by a special gcwq that is not bound to any specific CPU. An unbound work queue behaves like a simple execution context and is not concurrency-managed; the unbound gcwq tries to start executing work items as quickly as possible.

WQ_FREEZEABLE: a freezable work queue participates in system suspend operations: its work items are paused and no new work item executes until the work queue is thawed.

WQ_MEM_RECLAIM: every work queue that may be used on the memory-reclaim path must set this flag; it guarantees at least one execution context regardless of memory pressure.

WQ_HIGHPRI: high-priority work items are queued at the head of the worklist and executed without regard to the concurrency level; in other words, as long as resources are available, high-priority work items execute as quickly as possible, in the order they were submitted.

WQ_CPU_INTENSIVE: CPU-intensive work items do not count toward the concurrency level; in other words, a runnable CPU-intensive work item will not block other work items. This is useful for work items that are expected to burn many CPU cycles, so that scheduling their execution is left to the system scheduler.

Migrating code

In older code, some users relied on the strict ordering of execution in ST; under CMWQ the same behavior is obtained by setting max_active to 1 and flags to WQ_UNBOUND.

max_active: determines the maximum number of work items of this work queue that can execute per CPU. For example, max_active = 16 means at most 16 work items of the work queue may execute concurrently per CPU. In the current implementation, the upper bound for bound work queues is 512, and max_active = 0 is treated as 256; for unbound work queues the upper bound is max(512, 4 * num_possible_cpus()). Unless there is a specific reason to throttle, just set it to 0.
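
For example, both recipes with the new interface (the queue names are hypothetical):

 /* behaves like the old ST queue: at most one work item in flight,
  * executed in strict submission order */
 struct workqueue_struct *ordered_wq =
     alloc_workqueue("my_ordered", WQ_UNBOUND, 1);

 /* typical bound queue: max_active = 0 picks the default limit
  * (currently 256 per CPU) */
 struct workqueue_struct *events_wq =
     alloc_workqueue("my_events", 0, 0);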

CMWQ essentially provides the implementation of a common kernel thread pool. Its interface is basically compatible with the previous one; only the back end of the function that creates work queues changes. The former one-to-one binding between work queues and kernel threads becomes kernel-managed thread creation, so under CMWQ creating a work queue no longer necessarily means that a kernel thread is created.

The previous interfaces are now implemented on top of alloc_workqueue:
Listing 9. Implementation based on the new back-end interface

 #define create_workqueue(name)                    \
     alloc_workqueue((name), WQ_MEM_RECLAIM, 1)
 #define create_freezeable_workqueue(name)         \
     alloc_workqueue((name), WQ_FREEZEABLE | WQ_UNBOUND | WQ_MEM_RECLAIM, 1)
 #define create_singlethread_workqueue(name)       \
     alloc_workqueue((name), WQ_UNBOUND | WQ_MEM_RECLAIM, 1)


Hook functions in the scheduler

To know when a worker thread is about to sleep or be woken, the kernel adds a PF_WQ_WORKER task flag, indicating that the task is a worker thread, and adds two hook functions to the scheduler.
Listing 10. The hook functions in the scheduler

 void wq_worker_waking_up(struct task_struct *task, unsigned int cpu);
 struct task_struct *wq_worker_sleeping(struct task_struct *task,
                                        unsigned int cpu);

wq_worker_waking_up is invoked from try_to_wake_up/try_to_wake_up_local when a worker thread is woken. wq_worker_sleeping is called in schedule(), indicating that the worker thread is about to sleep; its return value is a task on the same CPU that can be woken with try_to_wake_up_local. For now the two hook functions are hard-coded into the kernel scheduler; subsequent changes may implement them in some other way.
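
The shape of the integration, as a simplified sketch rather than the verbatim scheduler code: when a worker is about to block inside schedule(), wq_worker_sleeping may return another idle worker of the same CPU to wake, so the CPU does not stall:

 /* sketch: inside schedule(), before switching away from prev */
 if (prev->flags & PF_WQ_WORKER) {
     struct task_struct *to_wakeup;

     /* the worker is going to sleep; ask the workqueue code whether
      * another worker on this CPU should run in its place */
     to_wakeup = wq_worker_sleeping(prev, cpu);
     if (to_wakeup)
         try_to_wake_up_local(to_wakeup);
 }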

The gcwq back end of concurrency-managed work queues

The most important piece of the CMWQ implementation is the gcwq back end:
Listing 11. gcwq

 /*
  * Global per-cpu workqueue.  There's one and only one for each cpu
  * and all works are queued and processed here regardless of their
  * target workqueues.
  */
 struct global_cwq {
     spinlock_t          lock;          /* the gcwq lock */
     struct list_head    worklist;      /* L: list of pending works */
     unsigned int        cpu;           /* I: the associated cpu */
     unsigned int        flags;         /* L: GCWQ_* flags */

     int                 nr_workers;    /* L: total number of workers */
     int                 nr_idle;       /* L: currently idle ones */

     /* workers are chained either in the idle_list or busy_hash */
     struct list_head    idle_list;     /* X: list of idle workers */
     struct hlist_head   busy_hash[BUSY_WORKER_HASH_SIZE];
                                        /* L: hash of busy workers */

     struct timer_list   idle_timer;    /* L: worker idle timeout */
     struct timer_list   mayday_timer;  /* L: SOS timer for dworkers */

     struct ida          worker_ida;    /* L: for worker IDs */

     struct task_struct  *trustee;      /* L: for gcwq shutdown */
     unsigned int        trustee_state; /* L: trustee state */
     wait_queue_head_t   trustee_wait;  /* trustee wait */
     struct worker       *first_idle;   /* L: first idle worker */
 } ____cacheline_aligned_in_smp;

It is used to manage the thread pool: there is one gcwq per CPU, plus one specific gcwq serving the work items of unbound work queues. Note that CMWQ thus has (number of online CPUs + 1) thread pools. Since CPU numbering starts at 0, the extra slot used for the unbound pool is NR_CPUS. Because of CPU hotplug, worker threads exist only for the pools bound to online CPUs (and for the unbound pool).

Some of the important fields in the structure are as follows:

worklist: all pending work items are linked on this list.

cpu: which CPU the thread pool is bound to. The implementation has one gcwq that is not bound to any CPU; it is marked WORK_CPU_UNBOUND. To let the code treat the unbound gcwq uniformly with the CPU-bound gcwqs, the implementation defines WORK_CPU_UNBOUND = NR_CPUS, one of the little tricks in the code (see the sketch after this list).

nr_workers: total number of worker threads.

nr_idle: current number of idle worker threads.

idle_list: idle worker threads are linked on this list.

busy_hash[BUSY_WORKER_HASH_SIZE]: worker threads that are executing work items are kept in this hash table.
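
The WORK_CPU_UNBOUND trick mentioned in the cpu field above lets a single loop visit the bound and the unbound gcwqs uniformly. A simplified sketch of the idea (the real code iterates with the for_each_gcwq_cpu helper used in Listing 12 below):

 /* sketch: the unbound gcwq is addressed as one extra "CPU" slot */
 #define WORK_CPU_UNBOUND    NR_CPUS    /* one slot past the real CPUs */

 /* visit the per-CPU gcwqs, then the unbound one, in a single loop */
 for (cpu = 0; cpu <= WORK_CPU_UNBOUND; cpu++) {
     struct global_cwq *gcwq = get_gcwq(cpu);
     /* ... identical handling for bound and unbound pools ... */
 }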

With the basics above in place, we can start looking at the implementation of CMWQ; as in the past, we begin with the initialization code:
Listing 12. CMWQ initialization

 static int __init init_workqueues(void)
 {
     unsigned int cpu;
     int i;

     /* Register the notifier chain for CPU events, mainly to handle
      * CPU hotplug by moving the work of an offlined CPU to an online
      * CPU.  In CMWQ this mechanism is called the trustee. */
     cpu_notifier(workqueue_cpu_callback, CPU_PRI_WORKQUEUE);

     /* initialize (number of CPUs + 1) gcwqs */
     for_each_gcwq_cpu(cpu) {
         struct global_cwq *gcwq = get_gcwq(cpu);
         ......
     }

     /* Initialize (number of online CPUs + 1) worker thread pools.
      * The created threads are named as follows: for a thread bound
      * to a CPU, ps shows [kworker/cpu_id:thread_id], where cpu_id is
      * the CPU number and thread_id the ID of the created worker
      * thread; threads of the unbound pool show as
      * [kworker/u:thread_id]. */
     for_each_online_gcwq_cpu(cpu) {
         ......
         worker = create_worker(gcwq, true);
         ......
         start_worker(worker);
         ......
     }

     /* create 4 global work queues */
     system_wq = alloc_workqueue("events", 0, 0);
     system_long_wq = alloc_workqueue("events_long", 0, 0);
     system_nrt_wq = alloc_workqueue("events_nrt", WQ_NON_REENTRANT, 0);
     system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
                                         WQ_UNBOUND_MAX_ACTIVE);
     ......
     return 0;
 }

Management of worker thread pools

To implement the worker thread pool, each worker thread is wrapped in a management structure, struct worker, shown below:
Listing 13. The worker management structure

 struct worker {
     /* Depending on the worker thread's state: when idle, the list
      * head entry links the worker into gcwq->idle_list; when busy,
      * the hash node hentry links it into gcwq->busy_hash. */
     union {
         struct list_head    entry;     /* L: while idle */
         struct hlist_node   hentry;    /* L: while busy */
     };
     ......
     /* List of scheduled work items.  Note that work items are only
      * moved onto this list; they are queued through the work queue. */
     struct list_head    scheduled;     /* L: scheduled works */
     /* the schedulable entity: to the kernel scheduler, a worker
      * thread is just another task */
     struct task_struct  *task;         /* I: worker task */
     struct global_cwq   *gcwq;         /* I: the associated gcwq */
     /* 64 bytes boundary on 64bit, 32 on 32bit */
     /* time of last activity, used to decide whether the worker
      * thread can be destroyed */
     unsigned long       last_active;   /* L: last active timestamp */
     unsigned int        flags;         /* X: flags */
     /* worker thread ID; the concrete value can be seen from user
      * space with the ps command */
     int                 id;            /* I: worker id */
     struct work_struct  rebind_work;   /* L: rebind worker to cpu */
 };

Topics not discussed

This article does not discuss the handling of CPU hotplug or of the memory-reclaim path; these use CMWQ's trustee and rescuer mechanisms respectively. Interested readers can refer to the code or the documentation.

The main body of a worker thread in the pool is worker_thread; its execution flow is as follows:
Listing 14. Management of worker threads

 static int worker_thread(void *__worker)
 {
     struct worker *worker = __worker;
     struct global_cwq *gcwq = worker->gcwq;

     /* tell the scheduler that this is a worker thread */
     worker->task->flags |= PF_WQ_WORKER;
 woke_up:
     spin_lock_irq(&gcwq->lock);
     ......
     /* leave the idle state: a newly created worker thread starts out
      * idle and must do the related bookkeeping when it goes to work */
     worker_leave_idle(worker);
 recheck:
     /* Do we need more worker threads?  The criteria: there is
      * high-priority work, or the work queue has work to do but the
      * CPU's global queue has no idle kernel thread to process it. */
     if (!need_more_worker(gcwq))
         goto sleep;

     /* may_start_working checks whether gcwq has idle worker threads;
      * manage_workers is covered in detail later */
     if (unlikely(!may_start_working(gcwq) && manage_workers(worker)))
         goto recheck;

     /* make sure the worker thread's scheduled list is empty */
     BUG_ON(!list_empty(&worker->scheduled));

     /* mark the worker as about to process work items (a busy flag) */
     worker_clr_flags(worker, WORKER_PREP);

     /* Basic flow: move work items onto the worker thread's scheduled
      * list, then process the scheduled list in order. */
     do {
         struct work_struct *work =
             list_first_entry(&gcwq->worklist,
                              struct work_struct, entry);

         if (likely(!(*work_data_bits(work) & WORK_STRUCT_LINKED))) {
             /* optimization path, not strictly necessary; the else
              * branch below is where the logic really lives.
              * process_one_work handles a single work item. */
             process_one_work(worker, work);
             if (unlikely(!list_empty(&worker->scheduled)))
                 process_scheduled_works(worker);
         } else {
             /* move the gcwq's work items onto the worker thread's
              * scheduled list and process them in turn */
             move_linked_works(work, &worker->scheduled, NULL);
             process_scheduled_works(worker);
         }
     } while (keep_working(gcwq));

     worker_set_flags(worker, WORKER_PREP, false);
 sleep:
     /* no work items need processing; put the worker thread to sleep */
     ......
 }

manage_workers takes care of worker threads that need to be destroyed, as well as creating new worker threads when necessary. maybe_destroy_workers decides whether the number of worker threads is considered too high (this is by nature a policy question; the implementers decided that if the idle worker threads exceed a quarter of the busy ones and a worker has been idle for 5 minutes, that worker can be destroyed). maybe_create_worker decides whether a new worker thread must be created to serve the work queues: if there is high-priority work, or there is work to do in the work queue but the CPU's global queue has no idle kernel thread left to process it, then a new worker thread is needed.
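
The idle-excess test corresponds roughly to the following sketch; this is a simplification (the real check lives in too_many_workers and operates on a gcwq), with the constants taken from the policy cited above:

 #define MAX_IDLE_WORKERS_RATIO  4           /* 1/4 of busy may idle */
 #define IDLE_WORKER_TIMEOUT     (300 * HZ)  /* 5 minutes */

 /* sketch: too many workers when the idle ones (beyond a small floor
  * of 2) exceed a quarter of the busy ones */
 static bool too_many_workers(int nr_workers, int nr_idle)
 {
     int nr_busy = nr_workers - nr_idle;

     return nr_idle > 2 && (nr_idle - 2) * MAX_IDLE_WORKERS_RATIO >= nr_busy;
 }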



The outlook for concurrency-managed work queues

Concurrency-managed work queues entered mainline only recently, yet they have already quickly replaced the old work queue interface and the slow work mechanism. That is not the only goal: the long-term aim is to provide a generic thread-pool mechanism in the kernel, which would widen the applicability of work queues even further.

Resources

See "sched: prepare for cmwq, take#2", which describes the hook functions added to the kernel scheduler.

See the article "Concurrency-managed workqueues", in which Jonathan Corbet describes the original CMWQ proposal and gives a preliminary overview of the principles and challenges of Tejun Heo's solution.

See the article "Working on workqueues", which explains the new interface.

See "Concurrency Managed Workqueue (cmwq)", the description written by Tejun Heo, CMWQ's main contributor.

