In this post we set aside our suspicion of Aliyun and look at the problem purely from the ASP.NET side, to see whether we can find a more reasonable explanation for the symptoms.
The main symptoms of the "black 30 seconds" problem are: queued requests (Requests Queued) rise, the number of requests reaching HTTP.sys (Arrival Rate) drops, QPS (Requests/Sec) drops, CPU usage drops, and Current Connections rises.
One "black 30 seconds" incident happened last night at around 18:08, so we use it as the case for this analysis.
1, Why does Requests Queued suddenly increase?
The most direct cause is that ASP.NET has no thread available to handle the current request. Why is no thread available? ASP.NET's available threads are limited after all. It may be that too many concurrent requests arrived at once and ASP.NET could not create enough threads in time to handle them.
Let's look at the ASP.NET thread-related settings: the processModel element in machine.config (located in C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config).
There are 4 related settings: maxWorkerThreads (default 20), maxIoThreads (default 20), minWorkerThreads (default 1), and minIoThreads (default 1). (These values are per CPU core.)
We use the default settings, and our web server has 8 cores, so the effective maxWorkerThreads is 160, the effective maxIoThreads is 160, the effective minWorkerThreads is 8, and the effective minIoThreads is 8.
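The per-core arithmetic above can be checked with a small sketch. It is written in Python purely for illustration and calls no ASP.NET API. The per-core maximum of 20 is inferred from the 160 threads on 8 cores quoted in the text, and the function name `effective_limits` is hypothetical.

```python
# Per-core processModel values, as inferred from the figures in the text
# (160 max threads on 8 cores -> 20 per core; 8 min threads -> 1 per core).
PER_CORE_DEFAULTS = {
    "maxWorkerThreads": 20,
    "maxIoThreads": 20,
    "minWorkerThreads": 1,
    "minIoThreads": 1,
}


def effective_limits(per_core, cores):
    """Multiply each per-core setting by the number of CPU cores."""
    return {name: value * cores for name, value in per_core.items()}


if __name__ == "__main__":
    for name, value in effective_limits(PER_CORE_DEFAULTS, cores=8).items():
        print(f"{name}: {value}")
```

The same function can be reused to see what any other per-core value would mean on a given server.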
Given these settings, will requests start queuing as soon as there are 169 simultaneous requests? No, ASP.NET is not that naive. The CLR can only create about 2 threads per second, so if it waited until the threads ran out before creating new ones, it would be far too late. We speculate that ASP.NET simply uses these settings to estimate whether the thread pool is running short, whether new threads need to be created, and how many.
So what happens during the "black 30 seconds", when many requests are queued? Suppose the number of concurrent requests is normally 300 and suddenly jumps to 600, exceeding the number of available threads that ASP.NET estimated. Requests that do not get a thread are queued, waiting for executing requests to release their threads and for the CLR to create new ones. As time passes, the released threads plus the newly created threads become sufficient to handle the queued requests, and things return to normal.
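The speculation above can be turned into a toy model. This is a rough sketch under stated assumptions, not a description of how the CLR thread pool actually schedules work: time advances in 1-second steps, every thread completes a fixed number of requests per second, and the pool grows by 2 threads per second (the figure quoted above) only while requests are waiting. The function name, the starting pool size of 30 and the rate of 10 requests per thread per second are all made up for illustration.

```python
def simulate_burst(initial_threads, max_threads, per_thread_rate,
                   arrivals_per_second, threads_added_per_second=2):
    """Return the queue length at the end of each simulated second.

    arrivals_per_second: list with the number of requests arriving each second.
    per_thread_rate: requests one thread completes per second (assumed constant).
    """
    threads = initial_threads
    queued = 0
    history = []
    for arrivals in arrivals_per_second:
        capacity = threads * per_thread_rate
        # Whatever cannot be served this second stays in the queue.
        queued = max(0, queued + arrivals - capacity)
        history.append(queued)
        # The pool only grows slowly, and only while requests are waiting.
        if queued > 0 and threads < max_threads:
            threads = min(max_threads, threads + threads_added_per_second)
    return history


if __name__ == "__main__":
    # Hypothetical load: 300 req/s, a 5-second burst of 600 req/s, then 300 req/s.
    load = [300] * 3 + [600] * 5 + [300] * 20
    for second, queued in enumerate(simulate_burst(30, 160, 10, load)):
        print(f"t={second:2d}s queued={queued}")
```

In this model the queue builds up during the burst and then drains as threads are added. Starting with a larger pool (for example `initial_threads=60`) absorbs the same burst without any queuing, which is the effect that raising the minimum thread counts is meant to achieve.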
So how do we verify this speculation? By changing the maxWorkerThreads, maxIoThreads, minWorkerThreads, and minIoThreads settings so that ASP.NET has more threads available. We currently use the following setting:
<processModel enable="true" requestQueueLimit="5000" maxWorkerThreads="..." maxIoThreads="..." minWorkerThreads="..." minIoThreads="50"/>
If the "black 30 seconds" phenomenon almost never appears with this setting, that will confirm the problem lies here. The main site www.cnblogs.com is now running with this setting, and we need to observe it for some time to verify.
Takeaways
1) Monitoring \ASP.NET\Requests Queued in Windows Performance Monitor gives a direct view of the throughput of an ASP.NET application.
2) ASP.NET asynchronous programming (async/await) can effectively reduce the request queuing caused by a shortage of available threads.
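The idea behind point 2 can be shown outside ASP.NET. The following is only an analogy in Python's asyncio, not ASP.NET code: a handler that awaits its I/O does not hold a thread while waiting, so many in-flight requests can share very few threads. The handler name and the 10 ms simulated I/O delay are assumptions for illustration.

```python
import asyncio
import threading


async def handle_request(request_id, threads_seen):
    """Hypothetical handler that waits on simulated I/O without blocking a thread."""
    threads_seen.add(threading.get_ident())
    await asyncio.sleep(0.01)  # stands in for a database or network call
    threads_seen.add(threading.get_ident())
    return request_id


async def serve(concurrent_requests):
    """Run all requests concurrently; return (requests handled, threads used)."""
    threads_seen = set()
    results = await asyncio.gather(
        *(handle_request(i, threads_seen) for i in range(concurrent_requests))
    )
    return len(results), len(threads_seen)


if __name__ == "__main__":
    handled, threads_used = asyncio.run(serve(500))
    print(f"handled {handled} requests using {threads_used} thread(s)")
```

With blocking calls, each of those waiting requests would occupy its own thread for the whole wait, which is exactly the situation that exhausts the pool.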
2, Why does Arrival Rate drop?
(The orange line in the picture above)
This is the most puzzling part of the "black 30 seconds" problem: how can requests queuing inside ASP.NET reduce the number of requests that reach HTTP.sys? At first we simply did not believe that request queuing caused the Arrival Rate to drop, but the monitoring chart is irrefutable evidence.
Just before writing this post, we suddenly figured it out. We had overlooked one thing: when you open a blog page, the first request is for the HTML page. If that request gets a normal response, the browser issues several AJAX requests while loading the page. If the first request is queued and the browser is left waiting, the subsequent AJAX requests are never issued, and this reduces the number of requests reaching HTTP.sys. It also explains why the Arrival Rate sometimes rises in the middle of a "black 30 seconds": the pages behind the queued requests contain a lot of AJAX, so when those requests finish queuing, many follow-up AJAX requests reach HTTP.sys at once (and many of them may be queued in turn).
We therefore believe that request queuing caused the drop in Arrival Rate.
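The reasoning above boils down to simple arithmetic, sketched here in Python. It assumes a simplified browser that sends its AJAX requests only after the HTML page has been served. The function name and all the numbers are hypothetical.

```python
def arrival_rate(page_requests, served_pages, ajax_per_page):
    """Requests reaching the server in one interval.

    page_requests: HTML page requests sent by browsers in the interval.
    served_pages: pages that actually received a response in the interval.
    ajax_per_page: AJAX requests a browser issues once a page has loaded.
    """
    return page_requests + served_pages * ajax_per_page


if __name__ == "__main__":
    # Normal operation: every page is served, so every page triggers its AJAX calls.
    print("normal:", arrival_rate(page_requests=100, served_pages=100, ajax_per_page=5))
    # During queuing: most pages are still waiting, so their AJAX calls never start.
    print("queued:", arrival_rate(page_requests=100, served_pages=20, ajax_per_page=5))
```

Even though browsers keep sending the same number of page requests, the total number of arrivals falls because the follow-up requests are never sent.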
Takeaway
We should not limit our view to the one problem in front of us, but consider how the various symptoms relate to each other and weigh all the factors together.
3, QPS drops
The reasoning is the same as for Arrival Rate: QPS (Requests/Sec) is directly related to Arrival Rate and proportional to it.
So QPS dropped because requests were queued.
4, CPU usage drops
The same reasoning applies again: when Arrival Rate and QPS drop, the CPU has less work to do, so its usage naturally falls.
So CPU usage dropped because requests were queued.
5, Current Connections rises
Current Connections directly reflects the request queue: while a request has not finished executing, its connection stays open.
So Current Connections also rose because requests were queued.
6, A new indicator: Requests Executing
(The green line in the picture above shows Requests Executing)
While requests were queued, the number of requests being executed by ASP.NET (Requests Executing) kept increasing. This shows that as threads were released and more new threads were created, more and more of the queued requests were being executed. It also indirectly suggests that the executing threads were working normally and were not stuck. (The IIS log information below confirms this further.)
So Requests Executing also increased because requests were queued, and it shows that the threads were working normally with nothing stuck.
7, Time-taken of requests in the IIS log
During the "black 30 seconds", the IIS log contains no request with a time-taken over 1 second. What does that mean? It means the requests being executed were handled quickly and nothing was stuck. The only problem was that requests were queued because there were not enough available threads.
So the IIS log indicates that everything was fine apart from the queuing.
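A check like this can be scripted. The sketch below is a minimal Python example, assuming the log is in the W3C extended format with a `#Fields:` header and that the `time-taken` field (in milliseconds) is enabled. The function name and the sample log entries are made up for illustration and are not taken from our real logs.

```python
def slow_requests(log_lines, threshold_ms=1000):
    """Return (uri, time_taken_ms) for every entry slower than threshold_ms."""
    fields = []
    slow = []
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the space-separated columns of the entries below it.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        entry = dict(zip(fields, line.split()))
        if "time-taken" not in entry:
            continue
        taken = int(entry["time-taken"])
        if taken > threshold_ms:
            slow.append((entry.get("cs-uri-stem", "-"), taken))
    return slow


# Made-up sample entries, for illustration only.
SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time cs-method cs-uri-stem sc-status time-taken
2014-01-01 18:08:01 GET /index.html 200 46
2014-01-01 18:08:02 GET /ajax/feed 200 1530
2014-01-01 18:08:03 GET /about 200 15
"""

if __name__ == "__main__":
    for uri, taken in slow_requests(SAMPLE_LOG.splitlines()):
        print(f"{uri} took {taken} ms")
```

An empty result for the time window of an incident corresponds to the observation above that no request took longer than 1 second.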
Summary
If the "black 30 seconds" problem is attributed to ASP.NET threading, everything is reasonably explained except for why it lasts about 30 seconds.
Before writing this post, we put the probability that the "black 30 seconds" is caused by ASP.NET threading at 80%. After the 7-point analysis above, we put it at 99%, unless the "black 30 seconds" analyzed here is not the same problem as the earlier "black 30 seconds".
What remains is to validate this with the new settings (maxWorkerThreads="...", maxIoThreads="...", minWorkerThreads="...", minIoThreads="50").
The finale is approaching. What matters is not the outcome but the process, and sharing that problem-solving process with you.