In the afternoon, a friend on MSN sent a dump file, saying that the CPU usage of the Web server is at 100%. The problem could not be found. Let's take a look, let's upload the dump file and find out where the problem is.
Framework2.0 is a 2 k Windows OS.
After loading the dump file, load the SOS extension debugging module of Version 2.0:
. Load c: \ windows \ Microsoft. NET \ framework \ v2.0.50727 \ SOS. dll
Habitually list managed threads:
! Threads
The following error is reported:
Unable to load Image
C: \ windows \ Assembly \ nativeimages_v2.0.50727_64 \ mscorlib \ 339a2c337cdafd47f2ba740bb03f9718 \ mscorlib. Ni. dll, Win32 Error 2
* ** Warning: unable to verify checksum for mscorlib. Ni. dll
Lm and below, I found that the symbol file was not loaded, so. Reload, the path of the symbol file I configured is as follows:
C: \ windows \ symbols; C: \ Program Files \ Microsoft Visual studio8
\ SDK \ V2.0 \ symbols; SRV * E: \ Bak \ symbols * http://msdl.microsoft.com/download/symbols
The network traffic is reduced to 30 kb/s. At this time, the symbolic file is being downloaded from the symbol file server. After a while, all the required symbolic files have been downloaded:
First, let's look at the situation in the thread pool:
0: 000>! Threadpool
CPU utilization 100%
Worker thread: Total: 1 running: 0 idle: 1 maxlimit: 100 minlimit: 1
Work request in queue: 0
--------------------------------------
Number of Timers: 8
--------------------------------------
Completion port thread: Total: 1 free: 1 maxfree: 2 Currentlimit: 1 maxlimit: 100 minlimit: 1
Er, the CPU usage is indeed 100%.
Check the CPU time usage:
0: 000>. Time
Debug session time: Thu Jun 12 08:59:14. 0002008 (GMT + 8)
System uptime: 2 days 1:01:08. 081
Process uptime: 2 days 0:30:27. 000
Kernel Time: 0 days 0:00:22. 000
User time: 0 days 0:09:40. 000
Well, the main reason is that the user's application occupies the process time. Then let's take a look at the thread that used all the CPU time of the user mode:
0: 000>! Runaway
User Mode Time
Thread time
12: b9c 0 days 0:06:27. 109
2: b54 0 days 0:00:50. 468
15: f98 0 days 0:00:09. 937
10: b94 0 days at 0:00:03. 531
5: b6c 0 days 0:00:01. 937
11: b98 0 days 0:00:01. 671
3: b58 0 days 0:00:01. 171
0: b44 0 days at 0:00:00. 640
9: B84 0 days 0:00:00. 265
7: b7c 0 days 0:00:00. 203
14: 388 0 days 0:00:00. 000
13: b68 0 days 0:00:00. 000
8: b80 0 days 0:00:00. 000
6: b78 0 days 0:00:00. 000
4: b5c 0 days 0:00:00. 000
1: B50 0 days 0:00:00. 000
User time is a total of 0 days 0:09:40. 000 so much time, thread 12 takes up so much time for 0 days 0:06:27. 109, okay, let's see what we did in private:
0: 000> ~ 12 s
Eax = 000007ff EBX = 00000000 ECx = 0130d894 edX = 013055ac ESI = 013055ac EDI = dbb83133
EIP = 051dc0be ESP = 06a8ef48 EBP = 06a8efb4 iopl = 0 NV up EI PL NZ AC PE NC
Cs = 001b Ss = 0023 DS = 0023 es = 0023 FS = 003b GS = 0000 EFL = 00000216
051dc0be 3bf8 cmp edi, eax
After switching, view the call Stack:
0: 012> K
Childebp retaddr
Warning: frame IP not in any known module. Following frames may be wrong.
06a8efb4 051d4b92 0x51dc0be
06a8f068 0127d1f 0x51d4b92
06a8f080 79e7c74b mscorlib_ni + 0x2f0d1f
06a8f090 79e7c6cc mscorwks! Calldescrworker + 0x33
06a8f110 79e7c8e1 mscorwks! Calldescrworkerwithhandler + 0xa3
06a8f24c 79e7c783 mscorwks! Methoddesc: calldescr + 0x19c
06a8f268 79e7c90d mscorwks! Methoddesc: calltargetworker + 0x1f
06a8f27c 79eb300f mscorwks! Methoddesccallsite: call_retargslot + 0x18
06a8f448 79eb2f31 mscorwks! Executecodewithguaranteedcleanuphelper + 0x9b
06a8f4f8 034f3ff7
Mscorwks! Reflectioninvocation: executecodewithguaranteedcleanup + 0xf9
06a8f510 034f3ede mscorlib_ni + 0x2b3ff7
06a8f528 03530c68 mscorlib_ni + 0x2b3ede
06a8f540 79e7c74b mscorlib_ni + 0x2f0c68
06a8f550 79e7c6cc mscorwks! Calldescrworker + 0x33
06a8f5d0 79e7c8e1 mscorwks! Calldescrworkerwithhandler + 0xa3
06a8f710 79e7c783 mscorwks! Methoddesc: calldescr + 0x19c
06a8f72c 79e7c90d mscorwks! Methoddesc: calltargetworker + 0x1f
06a8f740 79fc58cd mscorwks! Methoddesccallsite: call_retargslot + 0x18
06a8f928 79ef3207 mscorwks! Threadnative: kickoffthread_worker + 0x190
06a8f93c 79ef31a3 mscorwks! Thread: doadcallback + 0x32a
Until calldescrworker, no exception was found. The first part of the call stack of a standard worker thread that initializes and executes a specific task is normal. Then there are two calls at the top of the stack:
06a8efb4 051d4b92 0x51dc0be
06a8f068 0127d1f 0x51d4b92
This does not show what the method is. Well, use SOS to list the managed method call Stack:
0: 012>! Clrstack
OS thread ID: 0xb9c (12)
ESP EIP
06a8ef48 051dc0be rswebgis. rsprotocolclass. getdatalong (byte [], int32)
06a8ef68 051dbf74 rswebgis. rsprotocolclass. parsepackage (byte [])
06a8efbc 051d4b92 rswebgis. rsprotocolclass. doanalysiswork ()
06a8f070 01_d1f system. Threading. threadhelper. threadstart_context (system. Object)
06a8f078 034f40ab system. Threading. executioncontext. runtrycode (system. Object)
06a8f49c 79e7c74b [helpermethodframe_protectobj: 06a8f49c]
System. runtime. compilerservices. runtimehelpers. executecodewithguaranteedcleanup (trycode, cleanupcode, system. Object)
06a8f504 034f3ff7
System. Threading. executioncontext. runinternal (system. Threading. executioncontext, system. Threading. contextcallback, system. Object)
06a8f51c 034f3ede
System. Threading. executioncontext. Run (system. Threading. executioncontext, system. Threading. contextcallback, system. Object)
06a8f534 03530c68 system. Threading. threadhelper. threadstart ()
06a8f760 79e7c74b [gcframe: 06a8f760]
06a8fa50 79e7c74b [contexttransitionframe: 06a8fa50]
At this time, the top three call estimates are the key to the problem. Based on past experience, there are several common performance debugging scenarios:
L high CPU usage, low memory usage
L high CPU usage. memory usage of a module or thread is outrageous
L CPU and memory usage are not high, that is, webpage access is particularly slow
The general causes of these problems and a common analysis routine are as follows. It is useful to know where the problem may occur and how to judge the problem at this time. :)
From the above call stack, there should be a problem with the program written in the code. Okay, is my mother-in-law bullied?
Syntax for separating a module from a dump file:
! Savemodule <base address> <FILENAME>
<Base address> there are several possible solutions. You can run the dumpdomain command to check which modules are loaded in the appdomain. The following is the screenshot! A record of dumpdomain. Based on the above stack display, you can find the rswebgis module:
Assembly: 001de3d0 [c: \ winnt \ Microsoft. NET \ framework \ v2.0.50727 \ temporary
ASP. netfiles \ rswebgis \ cbfaa214 \ c8518bb9 \ Assembly \ dl3 \ b000bb67 \
2dae9f2e_ab93c801 \ rswebgis. dll
Classloader: 001dc518
Securitydescriptor: 001de348
Module name
05860010 c: \ winnt \ Microsoft. NET \ framework \ v2.0.50727 \ temporary
ASP. NET files \ rswebgis \ cbfaa214 \ c8518bb9 \ Assembly \ dl3 \ b000bb67 \
2dae9f2e_ab93c801 \ rswebgis. dll
Here, 05860010 is the base address of the rswebgis module. Of course, this base address can also be obtained by using the LM command to view the loaded module.
Okay, strip it out and save it to drive C:
0: 000>! Savemodule 05880000 c: \ rswebgis. dll. dll
4 sections in file
SECTION 0-Va = 1000, vasize = e82da9, fileaddr = 400, filesize = e82e00
Section 1-Va = e84000, vasize = 24d24, fileaddr = e83200, filesize = ec00
Section 2-Va = ea9000, vasize = 5a8, fileaddr = e91e00, filesize = 600
Section 3-Va = eaa000, vasize = c183c, fileaddr = e92400, filesize = c1a00
A total of four sections are left empty.
Copy reflector decompilation, first find
06a8ef48 051dc0be rswebgis. rsprotocolclass. getdatalong (byte [], int32)
This implementation:
Private int getdatalong (byte [] bytes, int count)
{
If (count> (bytes. length-1) | (count> (bytes. length-2) | (count> (bytes. length-3) | (count> (bytes. length-4) | (count <0 ))
{
Return 0;
}
Return bitconverter. toint32 (bytes, count );
}
My day, how are there so many parentheses? I think it's a blind eye. At the very beginning, it was estimated that the problem was located in this place, where a logic error occurred. After a long time, I couldn't find a reason, and there was no problem with this judgment, A few parentheses may cause a slight deviation from the normal judgment logic, but it will not be 100% of the CPU usage. Then I asked zhaochangj's brother about the program code for this method. The result is:
# Region obtains a long integer from the specified position in the byte array.
/// <Summary>
/// Obtain the data packet size
/// </Summary>
/// <Param name = "bytes"> specified data packet </param>
/// <Param name = "count"> Start position </param>
Private int getdatalong (byte [] bytes, int count)
{
If (count> bytes. Length-1 | count> bytes. Length-2 | count> bytes. Length-3
| Count> bytes. Length-4 | count <0)
It seems that this reflector will be a little short-circuited when dealing with such many parentheses.
Well, since there is no problem, let's look at how the above calls this:
Private void parsepackage (byte [] bytedata)
{
Int datalong;
Int length = bytedata. length;
For (INT I = 0; I <length; I + = datalong)
{
Datalong = This. getdatalong (bytedata, I );
If (I + datalong) <= length) & (datalong> 0 ))
{
Long num4;
Int num5;
Int num6;
String STR;
Byte [] buffer;
Byte [] buffer2;
This. copybytes (bytedata, I, datalong, out buffer2 );
Int start = 0;
If (buffer2.length-Start)> = 15) & (this. tcpmsgheadproc (buffer2, start,
Out num4, out num5, out STR, out num6, out buffer) = 0l ))
{
This. m_nobackcount = 0;
This. tcpdataproc (num4, num5, STR, num6, buffer );
}
}
}
}
Er, you can see the for loop. Well, you need to pay special attention to the fact that this situation may often be an endless loop in a specific situation. Of course, it is not all.
I am wondering whether the defined datalong variable is assigned a value for the first time. In this case, the compiler prompts an error, in addition, I + = datalong is the first time I saw it in the for loop... On MSN, I asked zhaochangj about the code and sent the source code of this function:
While (offset <bytestotal)
{
Int length = getdatalong (bytedata, offset );
Byte [] tmpmsg;
If (Offset + Length <= bytestotal) & (length> 0 ))
{
// Omitted
Finally, the problem is found: If the returned value of the getdatalong method is 0, the loop will continue, that is, for (INT I = 0; I <length; I + = datalong) in the program, if datalong returns 0, that is, when formatting a byte string with a length less than 4, the loop will continue.
Sum: It only describes the general idea of finding a problem with windbg in the case of slow CPU, memory, and access speed after the website is released, you may have specific problems in specific situations.