Source: Click to open the link
Scenario:
Recently, a DB Server occasionally experienced CPU alarms. My mail alarm threshold (please read y threshold) is set to 15%. It didn't take the case at the beginning. I thought it was a statistical query, later, it became more and more frequent.
Exploration:
I decided to check what the fault was. The Troubleshooting sequence is as follows:
1. First enable Cacti monitoring and find that the CPU average has suddenly increased after a certain day, and the System \ Processor Queue Length and sqlservr \ % ProcessorTime are also significantly changing.
2. Starting from the inefficient SQL statement which is the easiest way to start, do you have made any changes to your business recently? Connect to the SQL instance, open the activity monitor, expand the "query that consumes a large amount of resources recently", and run the CPU time in reverse order. No real-time resource-consuming query is found here. In my personal experience, if the value here is 4 digits and the number of execution times in a minute is 3 digits, the average server CPU is more than 10%. If the cpu time is 5 digits, in addition, the number of executions within a minute is also high. If the number of executions exceeds several hundred times, the CPU is generally not relaxed. The image is for demonstration only.
3. There is no resource-consuming SQL statement, which is the most undesirable result of DBA, because perhaps the SQL Server is under pressure from internal or external sources, so that I spent too much time dealing with the communication with the operating system. The vast majority of non-query inefficient performance problems in SQL Server come from memory or hard disks, and sometimes the two need to study and compare baselines at the same time to determine who is the cause and who is the result. Here, we first check the memory usage of SQL Server. When the performance counter is turned on, my friends and I are stunned ...... A database with 64 GB memory is installed. The TargetMemory of SQL Server is only over 500 mb! Among them, StolenPage also occupies more than 200 MB, and the database DataPage only has more than 200 MB of memory for use. Oh, Shit! Although I do not want to use the word "Where are I going", but "Where is my memory?" At the same time, we also noticed that the PageLifeExpectancy value is only 26 (a server with sufficient memory, this value should be at least W ), the "Cache Hit Ration" We talked about a long time ago still maintains a relatively high level of 98! This case tells us that the performance counter of the cache hit rate cannot be explained in many cases.
4. OK. In this case, who occupies the memory that should have belonged to my dear SQL Server? We will continue to go to The wiindowstask Management page, select a route lag card, and click it to show all user flows. We will find that svchost.exe occupies the vast majority of 60 GB of memory!
5. What is svchost.exe? The Remote Registry Service is displayed on the worker process.
6. Check that something has reached a certain level. Most of this is a windows memory leakage Bug. google keywords: windows server 2008 r2 remote registry memory leak
Sure enough: Assume that you query performance counters on a remote computer by using an application on a computer that is running Windows 7 or Windows Server 2008 R2. In this situation, the memory usage of the Remote Registry service on the local computer increases until the available memory is exhausted.
Solution:
1. Restart the server and install hotfix.
2. Because restarting the server will affect the business, I am trying to restart the RemoteRegistry Service. This bug should also be fixed in some fixed scenarios.
Then, at the right time, I restarted the service. The TargetMemory of SQL Server was restored to over 60 GB, And the CPU was normal. So far, this problem has not occurred again.
Follow-up:
DBA's work is hard and hard to say, and it is easy to say. It is not enough to discover and solve problems. We also need to be aware of our shortcomings. In this case, I have not set up SQL Server Memory monitoring before, so I did not immediately find the severity of the illness. Fortunately, this Server does not undertake important services, otherwise the consequences will be unimaginable, it may have been a long time before the crash. The fear is that, if the crash occurs, the server will be restarted. At that time, we won't even have the first scene. When the leader asks, I should have scratched my head again.
After this event, I set up the SQL Server Memory monitoring. One day later, I found the same problem on one Server from the new monitoring data! I am glad that the server is not down, but that I am doing the right thing.
With a memory monitoring chart, we can see that after the service is restarted, the Total Pages of SQL Server has been rising and gradually stabilized, and Page life expectancy is also growing, I am glad that the CPU can also indicate that my illness has been eliminated.
Summary:
Before a performance problem occurs on the server, the majority of the problems occur in advance, especially Memory leakage, because the memory is squeezed a little bit, and the last limit is reached, SQL Server will suddenly Crash off, and then leave you with a dump, and Microsoft will laugh. Experienced doctors should see some clues from their daily back pain, and then further analyze and predict the occurrence of major diseases in advance. This is the value of DBA. This case tells me that only by focusing on the changes in the details of server exceptions can we prevent them from happening before they happen.