This article walks through the ThreadPool source code in .NET 4.5 to explain the internals of the .NET thread pool, and summarizes the strengths and weaknesses of its design.
The role of the thread pool
The thread pool is, as the name implies, an object pool of threads. Both Task and the TPL build on the thread pool, so understanding its internals can help you write better programs. As space is limited, I only explain the following core concepts:
The size of the thread pool
How to queue a task to the thread pool
How the thread pool executes tasks
ThreadPool also supports IOCP threads, but we won't study those here; the parts involving Task and the TPL are covered in their respective posts.
The size of the thread pool
Whatever kind of pool it is, it always has a size, and ThreadPool is no exception. ThreadPool provides four methods to adjust its size:
SetMaxThreads
GetMaxThreads
SetMinThreads
GetMinThreads
SetMaxThreads specifies the maximum number of threads the thread pool may have, and GetMaxThreads naturally retrieves this value. SetMinThreads specifies the minimum number of threads kept alive in the thread pool, and GetMinThreads retrieves that value.
Why set a maximum and a minimum? The size of the thread pool ultimately depends on several factors, such as the size of the virtual address space. For example, if your machine has 4 GB of memory and the default stack size of a thread is 1 MB, you can create at most 4 GB / 1 MB = 4096 threads (ignoring memory used by the operating system itself and other processes). Threads carry memory overhead, so a pool holding many threads that are not fully used simply wastes memory; limiting the maximum size of the thread pool therefore makes sense.
So what is the minimum for? A thread pool is an object pool of threads, and the greatest benefit of object pooling is reuse. Why reuse threads? Because creating and destroying a thread costs a lot of CPU time. Under high concurrency, the pool avoids repeatedly creating and destroying threads, which saves that cost and improves the responsiveness and throughput of the system. The minimum lets you tune how many threads stay alive to suit different high-concurrency scenarios.
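For illustration, here is a minimal sketch of reading and raising these limits; the `+ 4` delta and the console output are arbitrary demo choices, not a recommendation:

```csharp
using System;
using System.Threading;

class MinThreadsDemo
{
    static void Main()
    {
        // Read the current pool limits (worker threads and IOCP threads).
        ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
        Console.WriteLine($"min: {minWorker}/{minIocp}, max: {maxWorker}/{maxIocp}");

        // Raise the minimum so a burst of work doesn't have to wait for the
        // pool's gradual thread-injection heuristic. SetMinThreads returns
        // false if the requested minimum is out of range (e.g. above the max).
        int newMin = Math.Min(minWorker + 4, maxWorker);
        bool applied = ThreadPool.SetMinThreads(newMin, minIocp);
        Console.WriteLine(applied);
    }
}
```

Note that raising the minimum trades memory for responsiveness, which is exactly the tension between the two limits described above.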
How to queue a task to the thread pool
The thread pool provides two main entry points: QueueUserWorkItem and UnsafeQueueUserWorkItem.
The two methods share essentially the same code and differ only in their security attributes: QueueUserWorkItem can be called by partially trusted code, while UnsafeQueueUserWorkItem can only be called by fully trusted code.
public static bool QueueUserWorkItem(WaitCallback callBack)
{
    StackCrawlMark stackMark = StackCrawlMark.LookForMyCaller;
    return ThreadPool.QueueUserWorkItemHelper(callBack, (object)null, ref stackMark, true);
}
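For reference, a minimal, self-contained use of this entry point might look as follows; the doubling callback and the state value 21 are just demo choices:

```csharp
using System;
using System.Threading;

class QueueDemo
{
    static void Main()
    {
        int result = 0;
        using (var done = new ManualResetEventSlim(false))
        {
            // The (WaitCallback, object) overload supplies the state object
            // that the parameterless overload above passes as (object)null.
            ThreadPool.QueueUserWorkItem(state =>
            {
                result = (int)state * 2;   // runs on a pool thread
                done.Set();
            }, 21);

            done.Wait();                   // block until the work item has run
        }
        Console.WriteLine(result);         // prints 42
    }
}
```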
QueueUserWorkItemHelper first calls ThreadPool.EnsureVMInitialized() to make sure the CLR virtual machine is initialized (VM here is a generic term, not specifically the Java virtual machine; it can also refer to the CLR's execution engine). It then instantiates ThreadPoolWorkQueue, and finally calls ThreadPoolWorkQueue.Enqueue, passing in the callback and true.
[SecurityCritical]
public void Enqueue(IThreadPoolWorkItem callback, bool forceGlobal)
{
    ThreadPoolWorkQueueThreadLocals queueThreadLocals = (ThreadPoolWorkQueueThreadLocals)null;
    if (!forceGlobal)
        queueThreadLocals = ThreadPoolWorkQueueThreadLocals.threadLocals;
    if (this.loggingEnabled)
        FrameworkEventSource.Log.ThreadPoolEnqueueWorkObject((object)callback);
    if (queueThreadLocals != null)
    {
        queueThreadLocals.workStealingQueue.LocalPush(callback);
    }
    else
    {
        ThreadPoolWorkQueue.QueueSegment comparand = this.queueHead;
        while (!comparand.TryEnqueue(callback))
        {
            Interlocked.CompareExchange<ThreadPoolWorkQueue.QueueSegment>(
                ref comparand.Next, new ThreadPoolWorkQueue.QueueSegment(),
                (ThreadPoolWorkQueue.QueueSegment)null);
            for (; comparand.Next != null; comparand = this.queueHead)
                Interlocked.CompareExchange<ThreadPoolWorkQueue.QueueSegment>(
                    ref this.queueHead, comparand.Next, comparand);
        }
    }
    this.EnsureThreadRequested();
}
ThreadPoolWorkQueue consists of two kinds of "queue" (actually arrays): QueueSegment (the global work queue) and WorkStealingQueue (the local work queue). The specific differences between the two will be explained in the Task/TPL post, so I won't cover them here.
Because forceGlobal is true here, execution reaches comparand.TryEnqueue(callback), i.e. QueueSegment.TryEnqueue. The enqueue starts from the head of the queue (queueHead); if it fails, it keeps trying, and on success the segment is assigned back to queueHead.
Let's take a look at the source code of QueueSegment:
public QueueSegment()
{
    this.nodes = new IThreadPoolWorkItem[256];
}

public bool TryEnqueue(IThreadPoolWorkItem node)
{
    int upper;
    int lower;
    this.GetIndexes(out upper, out lower);
    while (upper != this.nodes.Length)
    {
        if (this.CompareExchangeIndexes(ref upper, upper + 1, ref lower, lower))
        {
            Volatile.Write<IThreadPoolWorkItem>(ref this.nodes[upper], node);
            return true;
        }
    }
    return false;
}
So the global work queue is actually an array of IThreadPoolWorkItem, capped at 256 entries. Why 256? Perhaps to align with the IIS thread pool (which also has 256 threads)? Using Interlocked operations together with the Volatile.Write memory barrier to keep nodes consistent performs far better than a conventional lock.
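To make the idea concrete, here is a simplified sketch, not the BCL code (`BoundedSegment` is a made-up name): a fixed 256-slot array whose reservation index advances with a CAS, and whose element write is then published with Volatile.Write so a consumer never observes a half-published slot:

```csharp
using System;
using System.Threading;

// Simplified model of QueueSegment's enqueue path (illustrative only).
class BoundedSegment<T> where T : class
{
    private readonly T[] nodes = new T[256];
    private int upper;  // index of the next free slot

    public bool TryEnqueue(T node)
    {
        while (true)
        {
            int slot = Volatile.Read(ref upper);
            if (slot == nodes.Length)
                return false;                        // segment is full
            // Reserve the slot with a CAS first, then publish the element.
            if (Interlocked.CompareExchange(ref upper, slot + 1, slot) == slot)
            {
                Volatile.Write(ref nodes[slot], node);
                return true;
            }
            // CAS lost a race with another producer: retry with a fresh index.
        }
    }
}

class Program
{
    static void Main()
    {
        var seg = new BoundedSegment<string>();
        int enqueued = 0;
        for (int i = 0; i < 300; ++i)
            if (seg.TryEnqueue("item" + i))
                ++enqueued;
        Console.WriteLine(enqueued);  // only 256 of the 300 items fit
    }
}
```

When a segment fills up, the real implementation links a new QueueSegment onto `Next`, which is what the CompareExchange loop in Enqueue above does.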
Finally, Enqueue calls EnsureThreadRequested. EnsureThreadRequested issues a QCall that sends the request to the CLR, and the CLR then dispatches the thread pool threads.
How the thread pool executes tasks
After a thread is dispatched, it executes callbacks through the ThreadPoolWorkQueue.Dispatch method.
internal static bool Dispatch()
{
    ThreadPoolWorkQueue threadPoolWorkQueue = ThreadPoolGlobals.workQueue;
    int tickCount = Environment.TickCount;
    threadPoolWorkQueue.MarkThreadRequestSatisfied();
    threadPoolWorkQueue.loggingEnabled =
        FrameworkEventSource.Log.IsEnabled(EventLevel.Verbose, (EventKeywords));
    bool flag1 = true;
    IThreadPoolWorkItem callback = (IThreadPoolWorkItem)null;
    try
    {
        ThreadPoolWorkQueueThreadLocals tl = threadPoolWorkQueue.EnsureCurrentThreadHasQueue();
        while ((long)(Environment.TickCount - tickCount) < (long)ThreadPoolGlobals.tpQuantum)
        {
            try { }
            finally
            {
                bool missedSteal = false;
                threadPoolWorkQueue.Dequeue(tl, out callback, out missedSteal);
                if (callback == null)
                    flag1 = missedSteal;
                else
                    threadPoolWorkQueue.EnsureThreadRequested();
            }
            if (callback == null)
                return true;
            if (threadPoolWorkQueue.loggingEnabled)
                FrameworkEventSource.Log.ThreadPoolDequeueWorkObject((object)callback);
            if (ThreadPoolGlobals.enableWorkerTracking)
            {
                bool flag2 = false;
                try
                {
                    try { }
                    finally
                    {
                        ThreadPool.ReportThreadStatus(true);
                        flag2 = true;
                    }
                    callback.ExecuteWorkItem();
                    callback = (IThreadPoolWorkItem)null;
                }
                finally
                {
                    if (flag2)
                        ThreadPool.ReportThreadStatus(false);
                }
            }
            else
            {
                callback.ExecuteWorkItem();
                callback = (IThreadPoolWorkItem)null;
            }
            if (!ThreadPool.NotifyWorkItemComplete())
                return false;
        }
        return true;
    }
    catch (ThreadAbortException ex)
    {
        if (callback != null)
            callback.MarkAborted(ex);
        flag1 = false;
    }
    finally
    {
        if (flag1)
            threadPoolWorkQueue.EnsureThreadRequested();
    }
    return true;
}
The while loop keeps executing the next callback as long as the elapsed time is still below the quantum (ThreadPoolGlobals.tpQuantum, about 30 ms). The reasoning is that a thread context switch on most machines costs roughly 30 ms of scheduling granularity; if a thread ran for less than that before being switched out, too much CPU would be wasted on the switch itself.
Dequeue is responsible for finding the next callback to execute:
public void Dequeue(ThreadPoolWorkQueueThreadLocals tl, out IThreadPoolWorkItem callback, out bool missedSteal)
{
    callback = (IThreadPoolWorkItem)null;
    missedSteal = false;
    ThreadPoolWorkQueue.WorkStealingQueue workStealingQueue1 = tl.workStealingQueue;
    workStealingQueue1.LocalPop(out callback);
    if (callback == null)
    {
        for (ThreadPoolWorkQueue.QueueSegment comparand = this.queueTail;
             !comparand.TryDequeue(out callback) && comparand.Next != null && comparand.IsUsedUp();
             comparand = this.queueTail)
            Interlocked.CompareExchange<ThreadPoolWorkQueue.QueueSegment>(
                ref this.queueTail, comparand.Next, comparand);
    }
    if (callback != null)
        return;
    ThreadPoolWorkQueue.WorkStealingQueue[] current = ThreadPoolWorkQueue.allThreadQueues.Current;
    int num = tl.random.Next(current.Length);
    for (int length = current.Length; length > 0; --length)
    {
        ThreadPoolWorkQueue.WorkStealingQueue workStealingQueue2 =
            Volatile.Read<ThreadPoolWorkQueue.WorkStealingQueue>(ref current[num % current.Length]);
        if (workStealingQueue2 != null && workStealingQueue2 != workStealingQueue1 &&
            workStealingQueue2.TrySteal(out callback, ref missedSteal))
            break;
        ++num;
    }
}
Because we added the callback to the global work queue, the local work queue lookup (WorkStealingQueue.LocalPop) finds nothing; the local work queue will be explained in the Task post. The search then moves to the global work queue, scanning segments from the oldest end toward the newest, so callbacks in the global work queue execute in FIFO order.
public bool TryDequeue(out IThreadPoolWorkItem node)
{
    int upper;
    int lower;
    this.GetIndexes(out upper, out lower);
    while (lower != upper)
    {
        // The decompiler emitted explicit reference operations here; the
        // effective call is: advance lower by one while leaving upper unchanged.
        if (this.CompareExchangeIndexes(ref upper, upper, ref lower, lower + 1))
        {
            SpinWait spinWait = new SpinWait();
            while ((node = Volatile.Read<IThreadPoolWorkItem>(ref this.nodes[lower])) == null)
                spinWait.SpinOnce();
            this.nodes[lower] = (IThreadPoolWorkItem)null;
            return true;
        }
    }
    node = (IThreadPoolWorkItem)null;
    return false;
}
Spin waiting and the Volatile.Read memory barrier avoid switching between kernel mode and user mode, improving the performance of acquiring a callback. If there is still no callback after checking both queues, the thread picks a random starting point among all local work queues and tries to "steal" a task (callback) from another thread's local queue.
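The owner/thief access pattern can be modeled with a toy queue. This is an illustration, not the lock-free BCL implementation: `StealQueue` and its internal lock are made up, but the ends match the real design, where the owner pushes and pops at one end (LIFO) while thieves steal from the opposite end (FIFO), keeping owner and thief mostly out of each other's way:

```csharp
using System;
using System.Collections.Generic;

// Toy model of a work-stealing deque (illustrative only; the real
// WorkStealingQueue is lock-free and uses spinning plus memory barriers).
class StealQueue<T>
{
    private readonly LinkedList<T> items = new LinkedList<T>();
    private readonly object gate = new object();

    // Owner thread: push and pop at the tail.
    public void LocalPush(T item) { lock (gate) items.AddLast(item); }

    public bool LocalPop(out T item)
    {
        lock (gate)
        {
            if (items.Count == 0) { item = default(T); return false; }
            item = items.Last.Value; items.RemoveLast(); return true;
        }
    }

    // Thief thread: steal from the head, the opposite end.
    public bool TrySteal(out T item)
    {
        lock (gate)
        {
            if (items.Count == 0) { item = default(T); return false; }
            item = items.First.Value; items.RemoveFirst(); return true;
        }
    }
}

class Program
{
    static void Main()
    {
        var q = new StealQueue<int>();
        q.LocalPush(1); q.LocalPush(2); q.LocalPush(3);
        int popped, stolen;
        q.LocalPop(out popped);   // owner takes the newest item: 3
        q.TrySteal(out stolen);   // thief takes the oldest item: 1
        Console.WriteLine(popped + " " + stolen);  // prints "3 1"
    }
}
```

In Dequeue above, victim selection simply starts at a random index and walks all registered local queues once, skipping the thread's own queue.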
Once a callback is obtained, the thread executes it via callback.ExecuteWorkItem() and then reports completion.
Summary
ThreadPool provides a way to adjust the minimum number of live threads to cope with different concurrency scenarios. ThreadPool maintains two kinds of work queues: one global and one local per thread.
A thread first takes work from its local queue, then from the global queue, and finally picks a random local queue to steal a task from; the global queue executes in FIFO order.
Each work queue is actually an array, relying heavily on spin waits and memory barriers instead of locks to improve performance. But the stealing step could be smarter: choosing a victim local queue purely at random is too casual.
First, the victim queue should be one known to hold runnable tasks; second, preferring a local queue whose owner thread is not currently scheduled reduces spinning contention and speeds up the steal. Finally, consider stealing half of the tasks in a queue, as Go's scheduler does: if only one task is stolen, then after it finishes, the thread may again find nothing to execute the next time it is dispatched and must steal again, which both wastes CPU time and leaves tasks unevenly distributed across threads, reducing system throughput.
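A sketch of the steal-half idea follows. The names are hypothetical: `Queue<T>` stands in for a victim's local work queue, and the round-up division mirrors in spirit what Go's scheduler does when grabbing work from another processor's run queue:

```csharp
using System;
using System.Collections.Generic;

static class StealHalf
{
    // Take half (rounded up) of the victim's backlog in a single steal,
    // so the thief has a batch to work through before stealing again.
    public static List<T> StealHalfOf<T>(Queue<T> victim)
    {
        var stolen = new List<T>();
        int take = (victim.Count + 1) / 2;
        for (int i = 0; i < take; ++i)
            stolen.Add(victim.Dequeue());
        return stolen;
    }
}

class Program
{
    static void Main()
    {
        var victim = new Queue<int>(new[] { 1, 2, 3, 4, 5 });
        var stolen = StealHalf.StealHalfOf(victim);
        Console.WriteLine(stolen.Count);  // 3 tasks stolen
        Console.WriteLine(victim.Count);  // 2 tasks remain with the victim
    }
}
```

Batching steals this way spreads load more evenly and reduces how often the expensive victim-scanning path runs.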
In addition, disabling logging and ETW tracing can push ThreadPool performance a step further.
That concludes this look inside the .NET thread pool; for more related content, see topic.alibabacloud.com (www.php.cn).