A case of solving a high CPU problem without WinDbg

Source: Internet
Author: User

Scenario: A web application serving 400-700 requests/sec ran into a high CPU problem after a major revision. As the load increases, the system slows down accordingly.

Basics: This article focuses on the process of analyzing the problem. The book on efficient troubleshooting of Windows user-mode programs by lixiong makes the point from the very beginning: the key to finding a problem is not using WinDbg, but analyzing the problem.

First of all, this is a web application for web chat and message delivery, so in general it does not transfer or process large amounts of data. CPU usage fluctuates between 75% and 90%. In addition, the program uses the lion exception capture component, which means that every situation that would show up as an ASP.NET yellow screen of death gets captured. So when an ASP.NET program has a problem like this, we should first look at the situation in perfmon.

Open perfmon and add several counters. Wait for a while and check the average value:

Requests/Sec: 450
Requests In Application Queue: 0 (no hang)
Virtual Bytes and Private Bytes: both stable, with Virtual Bytes about 4 times Private Bytes
# Bytes in all Heaps: fluctuating
......
......

When a web system responds slowly, I generally first consider whether a lock is causing a hang. If every request is waiting, Requests In Application Queue is no longer 0, and once this value grows past a certain point, visitors get a 503 error directly. Here, however, the value is 0, so a hang is very unlikely.

Virtual Bytes at 4 times Private Bytes looks somewhat suspicious. Generally, Virtual Bytes is no more than about twice Private Bytes; otherwise, check for a fragmentation problem. I checked web.config and found debug="false", so fragmentation is relatively unlikely at this point.
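The rule of thumb above (Virtual Bytes normally no more than about twice Private Bytes) can be written as a small check. This is only a sketch in Python: the function names and the `threshold` parameter are illustrative assumptions, and the byte values are made-up numbers chosen to reproduce the 4x ratio described in this article, not measurements.

```python
def virtual_to_private_ratio(virtual_bytes: int, private_bytes: int) -> float:
    """Return Virtual Bytes divided by Private Bytes."""
    if private_bytes <= 0:
        raise ValueError("private_bytes must be positive")
    return virtual_bytes / private_bytes


def worth_checking_fragmentation(virtual_bytes: int, private_bytes: int,
                                 threshold: float = 2.0) -> bool:
    """Apply the article's rule of thumb: a ratio above ~2 deserves a closer look."""
    return virtual_to_private_ratio(virtual_bytes, private_bytes) > threshold


# Hypothetical sample values (not taken from the article's server).
sample_virtual = 1_600 * 1024 * 1024
sample_private = 400 * 1024 * 1024

ratio = virtual_to_private_ratio(sample_virtual, sample_private)
flag = worth_checking_fragmentation(sample_virtual, sample_private)
print(f"ratio = {ratio:.1f}, check fragmentation: {flag}")
```

A ratio above the threshold is only a hint to investigate further, not proof of fragmentation.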

The value of # Bytes in all Heaps should change only when a GC occurs. But Virtual Bytes and Private Bytes are both stable, so a memory leak is basically ruled out, and the program does not use unmanaged code. So why does # Bytes in all Heaps change so frequently?

Since the commonly used counters could not explain anything more, I added several .NET-related counters. The value of # of Exceps Thrown / sec took me by surprise: 480 exceptions thrown per second!

The # of Exceps Thrown / sec counter shows the number of exceptions thrown per second, including exceptions that are caught. Generally, a value of around 20 is normal. As mentioned above, the exception capture component has not logged anywhere near this many exceptions, which means these exceptions are thrown inside .NET or ASP.NET. I mention .NET and ASP.NET separately for two reasons:

    1. If an unmanaged resource called by the .NET platform returns an error HRESULT, an exception is thrown.
    2. The ASP.NET framework also throws exceptions in some places but handles them itself, so we never notice them.

OK. Do not rush to capture a dump to find the specific cause; keep looking at perfmon. Remove some counters and keep a few. After adjusting the colors and scale, observe the trend for a while: the curves for CPU, number of exceptions, and number of requests move roughly together, and the gaps between them stay basically constant. From this we can at least see that:

    1. The exceptions certainly play a part in the high CPU.
    2. Almost every request throws an exception, but certainly not every request does (otherwise the two counters would be close to strictly proportional).
    3. Some requests always throw more than one exception (also inferred from the non-proportional relationship).
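The kind of arithmetic behind these inferences can be sketched with the two averages quoted earlier in this article (450 requests per second and 480 exceptions per second). The helper name and the "exactly k exceptions per throwing request" model are assumptions made for illustration; the real distribution is unknown at this stage of the analysis.

```python
requests_per_sec = 450      # average of the requests counter quoted above
exceptions_per_sec = 480    # average of the exceptions counter quoted above

# Average number of exceptions per request.
avg_exceptions_per_request = exceptions_per_sec / requests_per_sec

# If every request threw at most one exception, the average could not exceed 1,
# so an average above 1 means at least some requests throw more than one.
some_request_throws_multiple = avg_exceptions_per_request > 1.0


def share_of_throwing_requests(avg_per_request: float, k: int) -> float:
    """Share of requests that throw, assuming (hypothetically) that each
    throwing request throws exactly k exceptions and the rest throw none."""
    if k <= 0:
        raise ValueError("k must be positive")
    return avg_per_request / k


print(f"average exceptions per request: {avg_exceptions_per_request:.3f}")
print(f"some request throws more than one: {some_request_throws_multiple}")
print(f"share of throwing requests if k = 2: "
      f"{share_of_throwing_requests(avg_exceptions_per_request, 2):.3f}")
```

Because the two averages were read off perfmon by eye, the result should be treated as a rough indication rather than an exact figure.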

Usually a large web application has only a fixed number of pages with heavy traffic (assuming crawler activity is not frequent). What kind of code gets called this frequently? Generally there are three possibilities: an HttpModule, an HttpHandler, or something like a custom page base class.

Check them one by one. Eventually I found one place: an HttpHandler calls context.Server.Transfer(). The Transfer() method of the HttpServerUtility class can be viewed with Reflector:

public void Transfer(string path, bool preserveForm)
{
    Page handler = this._context.Handler as Page;
    // Transfer is not allowed while a page callback is being processed.
    if ((handler != null) && handler.IsCallback)
    {
        throw new ApplicationException(SR.GetString("Transfer_not_allowed_in_callback"));
    }
    this.Execute(path, null, preserveForm);
    // After executing the target page, the current response is ended.
    this._context.Response.End();
}

Here is the problem: if the handler is a page processing a callback, an ApplicationException is thrown. Otherwise Response.End() is called, and Response.End() throws as well, because it aborts the current thread (thanks to overred for the correction):

public void End()
{
    if (this._context.IsInCancellablePeriod)
    {
        InternalSecurityPermissions.ControlThread.Assert();
        // Aborting the current thread raises a ThreadAbortException.
        Thread.CurrentThread.Abort(new HttpApplication.CancelModuleException(false));
    }
    else if (!this._flushing)
    {
        this.Flush();
        this._ended = true;
        if (this._context.ApplicationInstance != null)
        {
            this._context.ApplicationInstance.CompleteRequest();
        }
    }
}

These handlers serve frequently requested pages, and each request throws at least two exceptions.

I would like to leave two points to discuss with you:

    1. Why is the CPU so high? Apart from other factors, creating an exception is expensive because the call stack and related data have to be collected. The GC then has to clean these objects up, which is also expensive.
    2. At the beginning, Virtual Bytes was about 4 times Private Bytes. Why is this ratio so high? Is it because there are many CPUs? (taskmgr shows 4 CPUs, x86.)
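The first point can be illustrated by timing a loop that ends each unit of work by raising and catching an exception against a loop that simply returns. This is an analogy in Python, not a measurement of .NET: the function names are hypothetical, and the absolute numbers will not carry over to the CLR, where stack capture and thread aborts have their own costs.

```python
import time


def finish_with_exception(n: int) -> int:
    """End each simulated request by raising and catching an exception."""
    completed = 0
    for _ in range(n):
        try:
            raise RuntimeError("request ended")
        except RuntimeError:
            completed += 1
    return completed


def finish_with_return(n: int) -> int:
    """End each simulated request normally, without any exception."""
    completed = 0
    for _ in range(n):
        completed += 1
    return completed


def timed(fn, n: int):
    """Run fn(n) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - start


n = 100_000
count_exc, secs_exc = timed(finish_with_exception, n)
count_ret, secs_ret = timed(finish_with_return, n)
print(f"exception path: {count_exc} requests in {secs_exc:.4f}s")
print(f"return path:    {count_ret} requests in {secs_ret:.4f}s")
```

Both loops complete the same number of simulated requests; only the way each request ends differs, which is what makes the timing comparison meaningful.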

Please share your ideas and analysis. Thank you.
