.NET Thread Pool Insider
This article analyzes the ThreadPool source code of .NET 4.5 to reveal the inner workings of the .NET thread pool, and summarizes the strengths and weaknesses of the ThreadPool design.
Role of Thread Pool
Thread Pool, as its name implies, is a pool of thread objects. Task and TPL are both built on the thread pool, so understanding the thread pool's details helps you write better programs. Due to limited space, I will only explain the following core concepts:
- Thread pool size
- How to call a thread pool to add a task
- How to execute tasks in the thread pool
ThreadPool also manages IOCP (I/O completion port) threads, but we will not study those here. Task and TPL will be covered in their own posts.
Thread pool size
Regardless of the pool, there is always a size, and ThreadPool is no exception. ThreadPool provides four methods to adjust the size of the thread pool:
- SetMaxThreads
- GetMaxThreads
- SetMinThreads
- GetMinThreads
SetMaxThreads specifies the maximum number of threads in the thread pool, and GetMaxThreads reads that value back. SetMinThreads specifies the minimum number of threads kept alive in the pool, and GetMinThreads reads that value back.
Why have a maximum and a minimum at all? The natural upper bound on a thread pool's size depends on several factors, such as the size of the virtual address space. For example, if your machine has 4 GB of memory and each thread's initial stack is 1 MB, you can create at most about 4 GB / 1 MB = 4096 threads (ignoring the memory used by the operating system itself and other processes). Because every thread carries memory overhead, a pool holding many threads it never fully uses simply wastes memory, so capping the maximum number of threads makes sense.
Why a minimum? A thread pool is an object pool of threads, and the main point of an object pool is reuse. Why reuse threads? Because creating and destroying a thread costs significant CPU time. Under high concurrency, the pool avoids that creation/destruction cost, improving the system's responsiveness and throughput. The minimum lets you tune how many threads stay alive to suit different high-concurrency scenarios.
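As a concrete illustration, the sketch below (my own example, not from the article's source) reads and adjusts these limits. The exact numbers printed vary by machine, and the target minimum of twice the processor count is just an arbitrary choice for the demo.

```csharp
using System;
using System.Threading;

class PoolLimitsDemo
{
    static void Main()
    {
        // Read the current limits; worker threads and I/O completion
        // threads are reported separately.
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        Console.WriteLine($"max: {maxWorker} workers / {maxIo} IO");
        Console.WriteLine($"min: {minWorker} workers / {minIo} IO");

        // Raise the minimum so a known burst of work does not wait for the
        // pool's gradual thread-injection heuristic. SetMinThreads returns
        // false if the requested value is out of range (e.g. above the max).
        bool ok = ThreadPool.SetMinThreads(Environment.ProcessorCount * 2, minIo);
        Console.WriteLine($"SetMinThreads: {ok}");
    }
}
```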
How to call a thread pool to add a task
The thread pool mainly exposes two methods for queuing work: QueueUserWorkItem and UnsafeQueueUserWorkItem.
The two methods share essentially the same code; they differ only in their security attributes: QueueUserWorkItem can be called from partially trusted code, while UnsafeQueueUserWorkItem requires full trust.
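Before diving into the internals, here is a minimal sketch of how the public API is typically called; the WaitCallback delegate runs on a pool thread once the CLR dispatches one.

```csharp
using System;
using System.Threading;

class QueueDemo
{
    static void Main()
    {
        using (var done = new ManualResetEvent(false))
        {
            // The delegate is placed on the global work queue (as we will
            // see, forceGlobal is true on this path) and later executed by
            // a thread-pool thread.
            ThreadPool.QueueUserWorkItem(state =>
            {
                Console.WriteLine($"on pool thread: {Thread.CurrentThread.IsThreadPoolThread}");
                ((ManualResetEvent)state).Set();
            }, done);

            done.WaitOne(); // block until the work item has run
        }
    }
}
```

This prints "on pool thread: True", confirming the callback executed on a pool thread rather than the main thread.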
public static bool QueueUserWorkItem(WaitCallback callBack)
{
    StackCrawlMark stackMark = StackCrawlMark.LookForMyCaller;
    return ThreadPool.QueueUserWorkItemHelper(callBack, (object) null, ref stackMark, true);
}
QueueUserWorkItemHelper first calls ThreadPool.EnsureVMInitialized() to make sure the CLR virtual machine is initialized (VM here is a generic term for the CLR execution engine, not the Java virtual machine), then instantiates ThreadPoolWorkQueue, and finally calls ThreadPoolWorkQueue's Enqueue method, passing in the callback and true.
[SecurityCritical]
public void Enqueue(IThreadPoolWorkItem callback, bool forceGlobal)
{
    ThreadPoolWorkQueueThreadLocals queueThreadLocals = (ThreadPoolWorkQueueThreadLocals) null;
    if (!forceGlobal)
        queueThreadLocals = ThreadPoolWorkQueueThreadLocals.threadLocals;
    if (this.loggingEnabled)
        FrameworkEventSource.Log.ThreadPoolEnqueueWorkObject((object) callback);
    if (queueThreadLocals != null)
    {
        queueThreadLocals.workStealingQueue.LocalPush(callback);
    }
    else
    {
        ThreadPoolWorkQueue.QueueSegment comparand = this.queueHead;
        while (!comparand.TryEnqueue(callback))
        {
            Interlocked.CompareExchange<ThreadPoolWorkQueue.QueueSegment>(ref comparand.Next, new ThreadPoolWorkQueue.QueueSegment(), (ThreadPoolWorkQueue.QueueSegment) null);
            for (; comparand.Next != null; comparand = this.queueHead)
                Interlocked.CompareExchange<ThreadPoolWorkQueue.QueueSegment>(ref this.queueHead, comparand.Next, comparand);
        }
    }
    this.EnsureThreadRequested();
}
ThreadPoolWorkQueue mainly contains two kinds of "queue" (actually arrays): QueueSegment (the global work queue) and WorkStealingQueue (the per-thread local work queue). The specific differences between the two are described in the Task/TPL posts.
Because forceGlobal is true here, comparand.TryEnqueue(callback), i.e. QueueSegment.TryEnqueue, is executed. comparand starts at the head of the queue (queueHead); if the enqueue fails because the segment is full, a new segment is linked in, queueHead is advanced to it, and the enqueue is retried.
Let's take a look at the source code of QueueSegment:
public QueueSegment()
{
    this.nodes = new IThreadPoolWorkItem[256];
}

public bool TryEnqueue(IThreadPoolWorkItem node)
{
    int upper;
    int lower;
    this.GetIndexes(out upper, out lower);
    while (upper != this.nodes.Length)
    {
        if (this.CompareExchangeIndexes(ref upper, upper + 1, ref lower, lower))
        {
            Volatile.Write<IThreadPoolWorkItem>(ref this.nodes[upper], node);
            return true;
        }
    }
    return false;
}
This so-called global work queue is actually an array of IThreadPoolWorkItem, capped at 256 entries per segment. Why 256? Perhaps to align with the IIS thread pool, which also has 256 threads. Interlocked operations and the memory write barrier Volatile.Write are used to keep the nodes array consistent, which performs far better than taking a conventional lock. Finally, EnsureThreadRequested is called; it issues a QCall to the CLR, and the CLR then schedules a ThreadPool thread.
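The claim-then-publish pattern in TryEnqueue can be distilled into a standalone sketch. This is my own simplification, not the BCL type: a slot index is claimed with Interlocked.CompareExchange, and only the winning thread publishes into that slot.

```csharp
using System;
using System.Threading;

// Minimal sketch (a simplification, not the BCL's QueueSegment) of a
// fixed-size, lock-free enqueue: atomically claim a slot index, then
// publish the item into that slot with a write barrier.
class BoundedQueueSketch
{
    private readonly object[] nodes = new object[256];
    private int upper; // index of the next free slot

    public bool TryEnqueue(object node)
    {
        while (true)
        {
            int slot = Volatile.Read(ref upper);
            if (slot == nodes.Length)
                return false; // segment full; the real pool links a new segment
            // Claim the slot; if another thread raced us, retry.
            if (Interlocked.CompareExchange(ref upper, slot + 1, slot) == slot)
            {
                Volatile.Write(ref nodes[slot], node);
                return true;
            }
        }
    }

    static void Main()
    {
        var q = new BoundedQueueSketch();
        int accepted = 0;
        for (int i = 0; i < 300; i++)
            if (q.TryEnqueue(i)) accepted++;
        Console.WriteLine($"accepted {accepted} of 300"); // 256 fit, the rest are rejected
    }
}
```

Note that the compare-exchange serializes only the index update; the array write itself needs no lock because each winner owns a distinct slot.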
How to execute tasks in the thread pool
After a thread is scheduled, the callback is executed through ThreadPoolWorkQueue's Dispatch method.
internal static bool Dispatch()
{
    ThreadPoolWorkQueue threadPoolWorkQueue = ThreadPoolGlobals.workQueue;
    int tickCount = Environment.TickCount;
    threadPoolWorkQueue.MarkThreadRequestSatisfied();
    threadPoolWorkQueue.loggingEnabled = FrameworkEventSource.Log.IsEnabled(EventLevel.Verbose, (EventKeywords) 18);
    bool flag1 = true;
    IThreadPoolWorkItem callback = (IThreadPoolWorkItem) null;
    try
    {
        ThreadPoolWorkQueueThreadLocals tl = threadPoolWorkQueue.EnsureCurrentThreadHasQueue();
        while ((long) (Environment.TickCount - tickCount) < (long) ThreadPoolGlobals.tpQuantum)
        {
            try
            {
            }
            finally
            {
                bool missedSteal = false;
                threadPoolWorkQueue.Dequeue(tl, out callback, out missedSteal);
                if (callback == null)
                    flag1 = missedSteal;
                else
                    threadPoolWorkQueue.EnsureThreadRequested();
            }
            if (callback == null)
                return true;
            if (threadPoolWorkQueue.loggingEnabled)
                FrameworkEventSource.Log.ThreadPoolDequeueWorkObject((object) callback);
            if (ThreadPoolGlobals.enableWorkerTracking)
            {
                bool flag2 = false;
                try
                {
                    try
                    {
                    }
                    finally
                    {
                        ThreadPool.ReportThreadStatus(true);
                        flag2 = true;
                    }
                    callback.ExecuteWorkItem();
                    callback = (IThreadPoolWorkItem) null;
                }
                finally
                {
                    if (flag2)
                        ThreadPool.ReportThreadStatus(false);
                }
            }
            else
            {
                callback.ExecuteWorkItem();
                callback = (IThreadPoolWorkItem) null;
            }
            if (!ThreadPool.NotifyWorkItemComplete())
                return false;
        }
        return true;
    }
    catch (ThreadAbortException ex)
    {
        if (callback != null)
            callback.MarkAborted(ex);
        flag1 = false;
    }
    finally
    {
        if (flag1)
            threadPoolWorkQueue.EnsureThreadRequested();
    }
    return true;
}
The while condition checks whether less than tpQuantum (30 ms) has elapsed; if so, the thread keeps executing the next callback. The rationale is that a thread switch takes on the order of 30 ms on most machines: if a thread ran for only a fraction of that and then sat waiting to be switched back in, CPU resources would simply be wasted.
Dequeue is responsible for finding the callback to be executed:
public void Dequeue(ThreadPoolWorkQueueThreadLocals tl, out IThreadPoolWorkItem callback, out bool missedSteal)
{
    callback = (IThreadPoolWorkItem) null;
    missedSteal = false;
    ThreadPoolWorkQueue.WorkStealingQueue workStealingQueue1 = tl.workStealingQueue;
    workStealingQueue1.LocalPop(out callback);
    if (callback == null)
    {
        for (ThreadPoolWorkQueue.QueueSegment comparand = this.queueTail; !comparand.TryDequeue(out callback) && comparand.Next != null && comparand.IsUsedUp(); comparand = this.queueTail)
            Interlocked.CompareExchange<ThreadPoolWorkQueue.QueueSegment>(ref this.queueTail, comparand.Next, comparand);
    }
    if (callback != null)
        return;
    ThreadPoolWorkQueue.WorkStealingQueue[] current = ThreadPoolWorkQueue.allThreadQueues.Current;
    int num = tl.random.Next(current.Length);
    for (int length = current.Length; length > 0; --length)
    {
        ThreadPoolWorkQueue.WorkStealingQueue workStealingQueue2 = Volatile.Read<ThreadPoolWorkQueue.WorkStealingQueue>(ref current[num % current.Length]);
        if (workStealingQueue2 != null && workStealingQueue2 != workStealingQueue1 && workStealingQueue2.TrySteal(out callback, ref missedSteal))
            break;
        ++num;
    }
}
Because we added the callback to the global work queue, the local work queue lookup (workStealingQueue.LocalPop(out callback)) finds nothing; local queues come into play with Task. The search then moves to the global work queue, scanning from its starting segment toward the end, so callbacks in the global work queue execute in FIFO order.
public bool TryDequeue(out IThreadPoolWorkItem node)
{
    int upper;
    int lower;
    this.GetIndexes(out upper, out lower);
    while (lower != upper)
    {
        // The decompiler emits explicit reference operations here; they are
        // equivalent to: CompareExchangeIndexes(ref upper, upper, ref lower, lower + 1)
        if (this.CompareExchangeIndexes(ref upper, upper, ref lower, lower + 1))
        {
            SpinWait spinWait = new SpinWait();
            while ((node = Volatile.Read<IThreadPoolWorkItem>(ref this.nodes[lower])) == null)
                spinWait.SpinOnce();
            this.nodes[lower] = (IThreadPoolWorkItem) null;
            return true;
        }
    }
    node = (IThreadPoolWorkItem) null;
    return false;
}
A spin wait and the memory read barrier Volatile.Read are used instead of a kernel lock, avoiding switches between kernel and user mode and improving dequeue performance. If no callback is found in the global queue either, one of the local work queues is chosen at random, and a task (callback) is "stolen" from it.
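The random-start victim scan at the end of Dequeue can be sketched with ordinary collections standing in for WorkStealingQueue. This is an illustrative simplification of mine; the real type is a lock-free deque stolen from the opposite end to the owner's pushes and pops.

```csharp
using System;
using System.Collections.Concurrent;

class StealSketch
{
    // Walk every queue exactly once, starting at a random index, skipping
    // our own queue, and take the first item found (mirrors the loop in
    // ThreadPoolWorkQueue.Dequeue).
    public static bool TrySteal(ConcurrentQueue<string>[] all,
                                ConcurrentQueue<string> mine,
                                Random random,
                                out string callback)
    {
        callback = null;
        int num = random.Next(all.Length);
        for (int length = all.Length; length > 0; --length)
        {
            var victim = all[num % all.Length];
            if (victim != null && victim != mine && victim.TryDequeue(out callback))
                return true;
            ++num;
        }
        return false;
    }

    static void Main()
    {
        var mine = new ConcurrentQueue<string>();
        var other = new ConcurrentQueue<string>();
        other.Enqueue("work");
        var all = new[] { mine, other, new ConcurrentQueue<string>() };
        Console.WriteLine(TrySteal(all, mine, new Random(), out string cb)
            ? $"stole: {cb}" : "nothing to steal");
    }
}
```

Starting at a random index spreads contention: if every idle thread scanned from index 0, they would all hammer the same victim first.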
After a callback is obtained, callback.ExecuteWorkItem() runs it, and ThreadPool.NotifyWorkItemComplete() then reports completion.
Summary
ThreadPool provides methods to adjust the minimum number of live threads in the pool to cope with different concurrency scenarios. ThreadPool has two kinds of work queue, one global and one local per thread. During execution, a task is taken first from the local queue, then from the global queue, and finally by stealing from a randomly chosen local queue; the global queue executes in FIFO order. The work queues are actually arrays, and spin waits and memory barriers are used heavily to improve performance. However, picking a random victim for stealing is rather casual. First, the chosen queue should actually hold runnable tasks. Second, preferring a local queue whose owning thread is not currently scheduled would reduce spin-lock contention and speed up the steal. Finally, stealing half of the victim's queue, as Go's scheduler does, is worth considering: otherwise, after the single stolen task finishes, the next time this thread runs it may again find nothing to execute and have to steal once more, wasting CPU time and leaving tasks unevenly distributed across threads, which reduces system throughput.
In addition, disabling logging and ETW tracing can further improve ThreadPool performance.