Background:
One of our DB servers recently started raising occasional CPU alarms. My email alert threshold is set to 15%. At first I paid little attention, assuming some statistics-type query was responsible, but then the alarms became more and more frequent.
Investigation:
I decided to find out what was going on. My troubleshooting went in the following order:
1. First I opened the Cacti monitoring graphs and found that the average CPU had suddenly risen on one recent day. System\Processor Queue Length and Process(sqlservr)\% Processor Time had also changed noticeably.
2. I started with the easiest suspect, inefficient SQL: had there been any recent business changes? I connected to the SQL instance, opened Activity Monitor, expanded "Recent Expensive Queries" and sorted by CPU time in descending order. No obviously resource-intensive query showed up. From personal experience: if the CPU time there has 4 digits and the executions per minute have 3 digits, the server CPU is probably above 10%. If the CPU time has 5 digits and the executions per minute are also high, in the hundreds, the CPU generally will not be calm. (Screenshot for illustration only.)
3. No resource-hungry SQL. This is the result a DBA least wants to see, because it suggests SQL Server is under internal or external pressure and is spending too much time dealing with the operating system. Common SQL Server performance problems of the inefficient-query kind mostly come from memory or disk, and sometimes you need to compare against a baseline side by side to decide which is the cause and which is the effect. Here we looked at SQL Server memory usage first, and when I opened the performance counters, my colleagues and I were stunned: on a database server with 64 GB of memory installed, SQL Server's Target Server Memory was only a little over 500 MB! Stolen pages took up more than 200 MB, which left the database data pages only a little over 200 MB to use. I hate to borrow the overused "where did it go" line, but: where did my memory go? We also noticed that Page Life Expectancy was only 26 (on a server with enough memory this value should be at least in the tens of thousands), while the "cache hit ratio" we used to be so fond of still held at a relatively high 98! This case shows that the cache hit ratio counter often fails to reveal a problem.
4. So who was holding the memory that belonged to my dear SQL Server? We opened Windows Task Manager, selected the Processes tab, clicked "Show processes from all users", and found that one svchost.exe was holding the vast majority of the memory, about 60 GB!
5. What was that svchost.exe? We used the Process Monitor tool. After opening, it automatically loads all Windows processes. We sorted by memory and hovered the mouse over the svchost.exe process, which showed that it was hosting the Remote Registry service.
6. At this point things were fairly clear: this was most likely a Windows memory leak bug. So I searched Google for the keywords: Windows Server 2008 R2 Remote Registry memory leak
That turned up the following link: http://support.microsoft.com/kb/2699780/en-us
Sure enough: Assume that you query performance counters on a remote computer by using an application on a computer that is running Windows 7 or Windows Server 2008 R2. In this situation, the memory usage of the Remote Registry service on the local computer increases until memory is exhausted.
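Steps 4 and 5 above can also be done from the command line. The following is only a minimal sketch, assuming the Windows command `tasklist /svc /fo csv`, which lists each process together with the services it hosts. The function names and the sample PIDs are hypothetical and are not taken from the incident. The header row and the "N/A" marker may be localized on non-English systems, so the parser reads columns by position.

```python
import csv
import io
import subprocess


def parse_tasklist_services(csv_text):
    """Map PID -> list of hosted service names from `tasklist /svc /fo csv` output."""
    services_by_pid = {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row (its text may be localized)
    for row in reader:
        if len(row) < 3:
            continue  # ignore malformed or empty rows
        pid, services = row[1], row[2]
        if not pid.isdigit() or services == "N/A":
            continue  # process hosts no services
        services_by_pid[int(pid)] = [s.strip() for s in services.split(",") if s.strip()]
    return services_by_pid


def services_for_pid(pid):
    """Run tasklist (Windows only) and return the services hosted by the given PID."""
    result = subprocess.run(
        ["tasklist", "/svc", "/fo", "csv"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_services(result.stdout).get(pid, [])


# Illustrative sample only; the PIDs are made up.
SAMPLE = '''"Image Name","PID","Services"
"System Idle Process","0","N/A"
"svchost.exe","812","DcomLaunch,PlugPlay,Power"
"svchost.exe","4321","RemoteRegistry"
'''
```

With the PID of the large svchost.exe taken from Task Manager, `services_for_pid(pid)` would show whether it is the instance hosting RemoteRegistry.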
Workaround:
1. Reboot the server and install the hotfix.
2. Because restarting the server would affect the business, I figured that restarting only the RemoteRegistry service should also solve the problem temporarily, since this bug presumably occurs only under certain specific conditions.
Then, at a suitable time, I restarted the service. SQL Server's Target Server Memory went back to more than 60 GB and the CPU returned to normal. So far the problem has not recurred.
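The service restart can be scripted. This is a sketch under assumptions, not what was actually run in this incident: it uses the standard `net stop` and `net start` commands with the service name RemoteRegistry, and the helper names are hypothetical. The `run` parameter is injectable so the sequence can be checked as a dry run before touching a real server.

```python
import subprocess


def restart_commands(service_name):
    """Return the stop and start commands for a Windows service, in order."""
    return [["net", "stop", service_name], ["net", "start", service_name]]


def restart_service(service_name, run=subprocess.run):
    """Stop, then start the service. Raises if a command fails (check=True)."""
    for command in restart_commands(service_name):
        run(command, check=True)


# On the affected server, from an elevated prompt:
# restart_service("RemoteRegistry")
```

Restarting only the service avoids the reboot, but it is a temporary measure; the leak can return until the hotfix is installed.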
Follow-up:
A DBA's job is hard to sum up: it is easy and it is not. Finding a problem and fixing it is not enough; we also have to recognize our own shortcomings. In this case I had no monitoring on SQL Server memory, so I did not see how serious the illness was at the start. Fortunately this server did not carry important business; otherwise the consequences could have been disastrous, and it might have collapsed much earlier. What worries me is that if it had crashed, we would naturally have restarted the server, and by then we would not even have had the original scene to examine. When my leader asked what happened, I would have been left scratching my head.
After this incident I set up SQL Server memory monitoring. One day later I checked the new monitoring data and found another server with the same problem! What I am glad about is not that the server stayed up, but that I did the right thing.
With memory monitoring in place, you can see that after the service restart SQL Server's Total pages kept rising and gradually stabilized, and Page Life Expectancy grew larger and larger. The CPU also showed that the illness had been cured. I was pleased.
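A memory check of this kind might look like the sketch below. It is not the monitoring actually deployed here. The class and function names are hypothetical, the 10,000-second Page Life Expectancy floor comes from the rule of thumb in step 3, and the 50% target-memory floor is an assumed threshold. The cache hit ratio is recorded but deliberately not used for alerting, since in this case it stayed at 98 while memory was starved.

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    installed_mb: float             # physical memory on the server
    target_server_memory_mb: float  # SQL Server "Target Server Memory"
    page_life_expectancy_s: float   # SQL Server "Page life expectancy"
    buffer_cache_hit_ratio: float   # recorded only; not a reliable signal here


def memory_warnings(sample, min_target_fraction=0.5, min_ple_s=10_000):
    """Return warning strings for one sample; thresholds are assumptions."""
    warnings = []
    if sample.target_server_memory_mb < min_target_fraction * sample.installed_mb:
        warnings.append("target server memory far below installed memory")
    if sample.page_life_expectancy_s < min_ple_s:
        warnings.append("page life expectancy low")
    return warnings


# Values roughly matching the incident: 64 GB installed, about 500 MB target,
# Page Life Expectancy of 26, hit ratio still at 98.
incident = MemorySample(64 * 1024, 500, 26, 98)
print(memory_warnings(incident))
```

For the incident values both warnings fire even though the hit ratio looks healthy, which is the point of alerting on target memory and Page Life Expectancy instead.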
Summary:
Most servers show early signs before a performance problem, especially with memory leaks: memory is squeezed little by little until a limit is reached, and then SQL Server suddenly crashes, leaving you nothing but a dump while Microsoft smiles. An experienced doctor should spot clues in everyday aches and pains, analyze them further, and predict a major illness in advance. That is the value of a DBA. This case taught me to pay attention to small abnormal changes on a server, so that problems can be nipped in the bud.