[Repost] Detecting SQL Server database CPU bottlenecks and memory bottlenecks

Source: Internet
Author: User

In SQL Server 2000, you can see the memory footprint of the SQL Server process in Task Manager. In SQL Server 2005, you cannot view the memory footprint of the SQL Server process in Task Manager.

The following statement looks at the actual memory consumption of SQL Server:

SELECT * FROM sysperfinfo WHERE counter_name LIKE '%memory%'

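Of these counters, Total Server Memory (KB) shows the memory that SQL Server is currently using.

-- sys.dm_os_process_memory is available in SQL Server 2008 and later
SELECT locked_page_allocations_kb FROM sys.dm_os_process_memory
SELECT SUM(awe_allocated_kb) AS [AWE allocated, KB] FROM sys.dm_os_memory_clerks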

To limit the memory footprint of SQL Server properly, set a maximum memory value for SQL Server. Do not give all of the physical memory to SQL Server; leave an appropriate amount for the Windows operating system.

For servers with large physical memory running Windows Server 2003, add the /PAE parameter to boot.ini (if the CPU is from AMD, also add the /usepmtimer parameter). Set "Lock pages in memory" in the local Group Policy, then in SQL Server 2005 enable AWE, set the minimum and maximum memory values, and restart the server or the SQL Server service. For details, refer to the Help documentation on "Enabling support for physical memory above 4GB" in Windows Server 2003 and SQL Server 2005.

First, SQL database CPU bottleneck

A SQL Server worker can be in a number of states. The three main states are running (RUNNING), runnable (RUNNABLE), and suspended (SUSPENDED).

You can detect a CPU bottleneck by watching the System Monitor counter Processor: % Processor Time. If the value of this counter stays high, for example above 80% for 15-20 minutes, the CPU is a bottleneck.
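
The "over 80% for 15-20 minutes" rule can be checked mechanically against collected counter samples. The following is a minimal Python sketch, not part of the original article: the function name, the (timestamp, value) sample format, and the sample data are all assumptions made for illustration, and the 15-minute lower bound is used as the duration.

```python
from typing import Iterable, Tuple


def sustained_above(samples: Iterable[Tuple[float, float]],
                    threshold: float = 80.0,
                    min_duration_s: float = 15 * 60) -> bool:
    """Return True if the value stays above `threshold` for at least
    `min_duration_s` seconds without dropping back to or below it."""
    run_start = None  # timestamp at which the current high run began
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts
            if ts - run_start >= min_duration_s:
                return True
        else:
            run_start = None  # the run is broken, start over
    return False


# Hypothetical samples: (seconds since start, % Processor Time), one per minute.
busy = [(60 * i, 92.0) for i in range(20)]
spiky = [(60 * i, 92.0 if i % 5 else 40.0) for i in range(20)]

print(sustained_above(busy))
print(sustained_above(spiky))
```

A short spike resets the run, so only load that stays high for the whole window is reported.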

When you suspect that computer hardware is the main cause of poor SQL Server performance, you can monitor the load on the relevant hardware with Performance Monitor to verify your suspicion and identify the system bottleneck. Some commonly used objects and their counters are described below.

Memory: Page Faults/sec

If the value is occasionally high, threads were competing for memory at that moment. If it stays high, memory may be a bottleneck.

Process: Working Set

For the SQL Server process, this value should be very close to the amount of memory assigned to SQL Server. In the SQL Server settings, if "set working set size" is 0, Windows NT determines the size of the SQL Server working set. If "set working set size" is 1, the working set is forced to the size of the memory allocated to SQL Server. In general, it is best not to change the default value of "set working set size".

Process: % Processor Time

If the value of this counter stays above 95%, the CPU is the bottleneck. Consider adding a processor or replacing it with a faster one.

Processor: % Privileged Time

If this value and the "Physical Disk" counter values are both consistently high, there is an I/O problem. Consider switching to a faster disk system. In addition, measures such as setting tempdb in RAM and lowering "max async IO" and "max lazy writer IO" will all reduce this value.

Processor: % User Time

Represents CPU-intensive database operations such as sorting and executing aggregate functions. If the value is high, consider adding indexes, using simpler table joins, and horizontally partitioning large tables to reduce it.

Physical Disk: Avg. Disk Queue Length

The value should be no more than 1.5 to 2 times the number of disks. To improve performance, you can add disks.

Note: A RAID volume actually consists of more than one disk.
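
As a rough illustration of the "1.5 to 2 times the number of disks" guideline, here is a small Python sketch. The function name and the three labels are assumptions made for this example, not anything defined by SQL Server or Windows; for a RAID volume, pass the number of physical disks behind it.

```python
def disk_queue_status(avg_queue_length: float, disk_count: int) -> str:
    """Classify Avg. Disk Queue Length against 1.5x and 2x the disk count."""
    if disk_count < 1:
        raise ValueError("disk_count must be at least 1")
    low = 1.5 * disk_count   # below this the queue is comfortably short
    high = 2.0 * disk_count  # above this the guideline is exceeded
    if avg_queue_length <= low:
        return "ok"
    if avg_queue_length <= high:
        return "borderline"
    return "too high"


# Hypothetical example: a RAID volume made of 4 physical disks.
for queue in (5.0, 7.0, 9.0):
    print(queue, disk_queue_status(queue, disk_count=4))
```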

SQLServer: Cache Hit Ratio

The higher the value, the better. If it stays below 80%, consider adding memory. Note that this value is accumulated from the time SQL Server starts, so after the server has been running for a while it no longer reflects the current state of the system.



Another way to detect CPU pressure is to count the workers in the RUNNABLE state, which can be obtained by executing the following DMV query:

SELECT COUNT(*) AS workers_waiting_for_cpu, t2.scheduler_id
FROM sys.dm_os_workers AS t1, sys.dm_os_schedulers AS t2
WHERE t1.state = 'RUNNABLE' AND t1.scheduler_address = t2.scheduler_address
AND t2.scheduler_id < 255
GROUP BY t2.scheduler_id

You can also run the following query to get the time that workers spend in the runnable state:

SELECT SUM(signal_wait_time_ms) FROM sys.dm_os_wait_stats

The following query finds the top 100 queries by average CPU cost per execution:

SELECT TOP 100 total_worker_time/execution_count AS avg_cpu_cost, plan_handle, execution_count,
(SELECT SUBSTRING(text, statement_start_offset/2 + 1,
(CASE WHEN statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), text)) * 2
ELSE statement_end_offset END - statement_start_offset)/2)
FROM sys.dm_exec_sql_text(sql_handle)) AS query_text
FROM sys.dm_exec_query_stats
ORDER BY avg_cpu_cost DESC
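
The SUBSTRING arithmetic in this query is easy to misread: the statement offsets in sys.dm_exec_query_stats are byte positions into Unicode text (two bytes per character), the start offset is 0-based, and an end offset of -1 means "to the end of the batch". The Python sketch below mirrors that arithmetic on an ordinary string. It is only an illustration: the function name and sample batch are made up, and it assumes every character occupies exactly two bytes.

```python
def statement_text(batch_text: str, start_offset: int, end_offset: int) -> str:
    """Mirror SUBSTRING(text, start/2 + 1, (end - start)/2) on a Python string.

    Offsets are byte positions into 2-byte-per-character text, so they are
    halved to get character positions. Assumes no characters outside the
    Basic Multilingual Plane.
    """
    if end_offset == -1:
        # -1 means the statement runs to the end of the batch.
        end_offset = len(batch_text) * 2
    start_char = start_offset // 2  # 0-based here; T-SQL adds 1 (1-based SUBSTRING)
    length = (end_offset - start_offset) // 2
    return batch_text[start_char:start_char + length]


# Hypothetical batch containing two statements.
batch = "SELECT 1; SELECT 2;"
print(statement_text(batch, 0, 18))
print(statement_text(batch, 20, -1))
```

Many published variants of this query add 1 to the computed length; the sketch follows the formula exactly as it appears in the query above.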

With a slight change, the same query finds the most frequently run queries:

SELECT TOP 100 total_worker_time/execution_count AS avg_cpu_cost, plan_handle, execution_count,
(SELECT SUBSTRING(text, statement_start_offset/2 + 1,
(CASE WHEN statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), text)) * 2
ELSE statement_end_offset END - statement_start_offset)/2)
FROM sys.dm_exec_sql_text(sql_handle)) AS query_text
FROM sys.dm_exec_query_stats
ORDER BY execution_count DESC

You can use the following System Monitor performance counters to view the rate of compilation and recompilation:

1. SQLServer: SQL Statistics: Batch Requests/sec (number of batch requests per second)

2. SQLServer: SQL Statistics: SQL Compilations/sec (number of SQL compilations per second)

3. SQLServer: SQL Statistics: SQL Re-Compilations/sec (number of SQL recompilations per second)

You can also get the time that SQL Server spends optimizing query plans by using the following statement:

SELECT * FROM sys.dm_exec_query_optimizer_info
WHERE counter = 'optimizations' OR counter = 'elapsed time'

The following query finds the top 10 query plans that have been recompiled the most:

SELECT TOP 10 plan_generation_num, execution_count,
(SELECT SUBSTRING(text, statement_start_offset/2 + 1,
(CASE WHEN statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), text)) * 2
ELSE statement_end_offset END - statement_start_offset)/2)
FROM sys.dm_exec_sql_text(sql_handle)) AS query_text
FROM sys.dm_exec_query_stats
WHERE plan_generation_num > 1
ORDER BY plan_generation_num DESC

Second, SQL database memory bottleneck

When memory is under pressure, a query plan may have to be removed from memory. If the query is submitted again, it must be optimized again, and because query optimization is CPU intensive, this puts pressure on the CPU. Similarly, under memory pressure, database pages may have to be removed from the buffer pool. If these pages are needed again soon afterwards, the result is more physical I/O.

Generally speaking, memory refers to the physical memory (RAM) available on the server. There is another kind of memory called virtual address space (VAS), or virtual memory. On Windows systems, all 32-bit applications have a 4 GB process address space that is used to access a maximum of 4 GB of physical memory. Out of that 4 GB, a process gets 2 GB of VAS in user mode, while the remaining 2 GB is reserved and accessible only in kernel mode. To change this configuration, you can use the /3GB switch in the boot.ini file.

A common operating system mechanism is paging, which uses a page file to store portions of process memory that have not been used recently. When that memory is referenced again, it is read (paged in) from the page file back into physical memory.

The following counters can be monitored through Performance Monitor:

1. Memory: Available Bytes

2. SQL Server: Buffer Manager: Buffer cache hit ratio. The percentage of pages found directly in the buffer pool without having to read from disk. For most production workloads, this value should be high (the higher the better).

3. SQL Server: Buffer Manager: Page life expectancy. The number of seconds that a page stays in the buffer pool without being referenced. If the value is low, the buffer pool is short of memory.

4. SQL Server: Buffer Manager: Checkpoint pages/sec. The number of pages flushed by a checkpoint or by other operations that require all dirty pages to be flushed. It shows how much buffer pool activity the workload is generating.

5. SQL Server: Buffer Manager: Lazy writes/sec. The number of buffers written by the buffer manager's lazy writer. Its meaning is similar to that of Checkpoint pages/sec above.

When you suspect that there is not enough memory:

Method 1:

"Monitoring indicators": Memory\Available MBytes, Memory\Pages/sec, Page Reads/sec, Page Faults/sec

"Reference values":

If Page Reads/sec consistently stays at 5 or higher, memory may be insufficient.

Pages/sec: the recommended range is 0-20 (this value stays high if the server does not have enough memory to handle its workload). A value greater than 80 indicates a problem.
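
The two reference values above can be turned into simple checks. The following Python sketch is only an illustration: the function names and labels are invented for this example, the band between 20 and 80 is labelled "elevated" because the text does not name it, and "remains at 5" is read as "every sample is 5 or higher", which is an assumption.

```python
from typing import Sequence


def pages_per_sec_status(pages_per_sec: float) -> str:
    """Classify Memory Pages/sec: 0-20 recommended, above 80 is a problem."""
    if pages_per_sec <= 20:
        return "normal"
    if pages_per_sec <= 80:
        return "elevated"  # label assumed; the source does not name this band
    return "problem"


def page_reads_suspect(samples: Sequence[float], limit: float = 5.0) -> bool:
    """True if every Page Reads/sec sample is at or above `limit`."""
    return len(samples) > 0 and all(s >= limit for s in samples)


# Hypothetical readings taken at regular intervals.
print(pages_per_sec_status(12))
print(pages_per_sec_status(95))
print(page_reads_suspect([6.0, 7.5, 5.0, 9.2]))
print(page_reads_suspect([6.0, 1.0, 5.0, 9.2]))
```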

Method 2: Analyze the bottleneck based on the Physical Disk values

"Monitoring indicators": Memory\Available MBytes, Page Reads/sec, % Disk Time, and Avg. Disk Queue Length

"Reference value": recommended threshold for % Disk Time is 90%

When memory is low, some processes are paged out to disk, which causes a sharp drop in performance. A memory-starved system also often shows high CPU utilization, because it has to scan memory constantly and move pages from memory to disk.

When a memory leak is suspected

"Monitoring indicators": Memory\Available MBytes, Process\Private Bytes, Process\Working Set, PhysicalDisk\% Disk Time

"Description":

In Windows resource monitoring, if the values of the Process\Private Bytes and Process\Working Set counters keep rising over a long period while the value of the Memory\Available Bytes counter keeps falling, a memory leak is likely. Memory leaks should be tested over a long period, to observe how the application responds when all memory is exhausted.
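
The pattern described here, Private Bytes and Working Set trending up while Available Bytes trends down, can be checked with a simple trend estimate. The Python sketch below uses a least-squares slope over equally spaced samples. This is one possible way to quantify "keeps rising", not a method prescribed by the article, and all names and sample values are hypothetical.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values sampled at equal intervals."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def leak_suspected(private_bytes: Sequence[float],
                   working_set: Sequence[float],
                   available_bytes: Sequence[float]) -> bool:
    """True when both process counters trend up and available memory trends down."""
    return (slope(private_bytes) > 0
            and slope(working_set) > 0
            and slope(available_bytes) < 0)


# Hypothetical hourly samples in MB.
print(leak_suspected([100, 110, 125, 140], [90, 95, 105, 120], [800, 780, 760, 730]))
print(leak_suspected([100, 101, 99, 100], [90, 91, 90, 91], [800, 805, 798, 801]))
```

A positive result only flags a trend worth investigating; a long-running test is still needed to confirm a leak.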

CPU Bottleneck problems

1. System\% Total Processor Time: if the value stays above 90% and the processor is blocked, the whole system is facing a processor bottleneck.

Note: In some multi-CPU systems, the value itself may not be large, but if the load is distributed very unevenly across the CPUs, this should also be treated as a processor bottleneck.

2. After ruling out memory as a factor, if the value of the Processor\% Processor Time counter is high while the network card and hard disk values are low, you can conclude that the CPU is the bottleneck. (When memory is low, some processes are paged out to disk, which causes a sharp drop in performance. A memory-starved system often shows high CPU utilization, because it has to scan memory constantly and move pages from memory to disk.)

Causes of high CPU usage:

Programs that run frequently and perform complex operations consume a lot of CPU

Complex database queries with many WHERE clauses, ORDER BY, GROUP BY, sorting, and so on easily lead to CPU bottlenecks

Insufficient memory and disk I/O problems increase CPU overhead

CPU Analysis

"Monitoring indicators":

System\% Total Processor Time and Processor\% Processor Time (CPU utilization)

Processor\% User Time and Processor\% Privileged Time

System\Processor Queue Length

Context Switches/sec and % Privileged Time

"Reference values":

System\% Total Processor Time should not stay above 90%. If the server is dedicated to SQL Server, the maximum acceptable value is 80-85%, and a reasonable range is 60% to 70%.

Processor\% Processor Time: less than 75%

System\Processor Queue Length: less than the total number of CPUs + 1
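
The CPU reference values can be combined into one small check. This Python sketch is illustrative only: the function name, parameter names, and the result keys are assumptions, and for a dedicated SQL Server machine it uses 85%, the upper end of the 80-85% range, as the limit.

```python
from typing import Dict


def cpu_checks(total_processor_time: float,
               processor_queue_length: float,
               cpu_count: int,
               dedicated_sql_server: bool = False) -> Dict[str, bool]:
    """Compare sustained CPU readings with the reference values above."""
    # 90% in general; 85% (top of the 80-85% range) on a dedicated SQL Server box.
    limit = 85.0 if dedicated_sql_server else 90.0
    return {
        "processor_time_ok": total_processor_time <= limit,
        "queue_length_ok": processor_queue_length < cpu_count + 1,
    }


# Hypothetical readings on a 4-CPU server.
print(cpu_checks(88.0, 3, cpu_count=4))
print(cpu_checks(88.0, 6, cpu_count=4, dedicated_sql_server=True))
```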

Disk I/O analysis

"Monitoring indicators": PhysicalDisk\% Disk Time, PhysicalDisk\% Idle Time, PhysicalDisk\Avg. Disk Queue Length, Disk sec/Transfer

"Reference value": recommended threshold for % Disk Time is 90%

In Windows resource monitoring, if % Disk Time and Avg. Disk Queue Length are high while Page Reads/sec is low, there may be a disk bottleneck.

Processor\% Privileged Time: if this value stays high and, among the Physical Disk counters, only % Disk Time is large while the others are moderate, the hard disk may be a bottleneck. If several of the values are large, the hard disk is not the bottleneck. If the value continues to exceed 80%, there may be a memory leak. If this counter (Processor\% Privileged Time) is also high whenever the Physical Disk counters are high, consider using a faster or more efficient disk subsystem.

Disk sec/Transfer: in general, a value below 15 ms is best, 15-30 ms is good, 30-60 ms is acceptable, and above 60 ms you need to consider replacing the hard disk or changing the RAID mode.
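
The latency bands for Disk sec/Transfer map naturally onto a small lookup. In the Python sketch below, the function name and labels are assumptions, and so is the handling of the exact boundary values (15 ms is treated as "good", 30 ms as "good", 60 ms as "acceptable"), since the text gives overlapping ranges.

```python
def transfer_latency_rating(latency_ms: float) -> str:
    """Rate a Disk sec/Transfer reading, expressed in milliseconds."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms < 15:
        return "best"
    if latency_ms <= 30:
        return "good"
    if latency_ms <= 60:
        return "acceptable"
    return "needs attention"  # consider replacing the disk or changing RAID mode


# Hypothetical readings.
for ms in (10, 20, 45, 75):
    print(ms, transfer_latency_rating(ms))
```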

Average Transaction Response Time: if the system processes transactions more and more slowly as the test goes on, the application's performance will tend to degrade the longer it runs in production.

Transactions per Second (TPS): as the load increases, if the hits per second / TPS curve changes slowly or flattens out, the server has probably started to hit a bottleneck.

Hits per Second: you can judge whether the system is stable by looking at the number of hits. A drop in the hit rate usually means that the server is responding slowly, and further analysis is needed to find the system bottleneck.

Throughput: based on the server's throughput, you can assess the amount of load generated by the virtual users, the server's ability to handle traffic, and whether there is a bottleneck.

Connections: when the number of connections reaches a plateau and the transaction response time increases rapidly, adding connections can greatly improve performance (the transaction response time will drop).

Time to First Buffer Breakdown (Over Time): can be used to determine when a server or network problem occurred during a scenario or session step run.

Performance issues that I have encountered:
    • 1. Processing failures under high concurrency (for example: the database connection pool is too small, the number of server connections exceeds the limit, or database locking was not designed carefully enough)
    • 2. Memory leaks (for example: memory is not released properly during long runs, leading to outages)
    • 3. Abnormal CPU usage (for example: high concurrency leads to high CPU usage)
    • 4. Too much logging, so that the server runs out of disk space
How to locate these performance issues:

1. Check the system log. Logs are the best tool for locating problems: if logging is thorough, the problem is easy to find in the log.

For example, if the system goes down and the system log shows an out-of-memory error raised while a particular method was executing, you can follow that lead and quickly locate the cause of the memory overflow.

2. Use performance monitoring tools. For example, for a B/S (browser/server) project developed in Java, you can monitor server performance with the JDK's jconsole or with JProfiler. jconsole can remotely monitor the server's CPU, memory, threads, and other state, and plot them as graphs over time.

Use Spotlight to monitor database usage.

The performance points we need to focus on are CPU load, memory utilization, network I/O, and so on.

3. Tools and logs are only a means. In addition, you need to design reasonable performance test scenarios.

Specific scenarios include performance testing, load testing, stress testing, stability testing, spike testing, and so on.

Good test scenarios let you find and locate bottlenecks more quickly.

4. Understand the system's parameter configuration, so that you can tune performance later.

As an aside, I would like to say something about the use of performance testing tools.

When I first started using LoadRunner and JMeter for high-concurrency tests, there were cases where the server had not yet been overwhelmed but the tools themselves went down first.

If you run into this problem, you can solve it by remotely invoking multiple client machines, which spreads the load of the performance testing tool across clients.

The point is that when doing performance testing, you must make sure that the bottleneck is not in your own test scripts or test tools.
