Performance counters monitored by LoadRunner (LR)

Source: Internet
Author: User
Tags disk usage

From: http://bbs.51testing.com/thread-2196-1-1.html

 

Hello everyone. In upcoming posts we will discuss how LR monitors performance counters and how to design LR scenarios.
Today I will start by posting some counters and their recommended thresholds. These counters target the Windows operating system, client/server (C/S) applications, SQL Server databases, and the web platform; they are the counters I used while testing .NET products. If you have tested Oracle databases, J2EE architectures, or WebLogic on UNIX platforms, please add your own counters so that everyone can share them.
That is enough introduction. I hope this thread helps everyone analyze their test results.
Memory:
Memory usage may be the most important factor in system performance. If the system pages frequently, memory is insufficient. Paging moves fixed-size blocks of code and data, called pages, from RAM to disk in order to free memory. Some paging is acceptable, because it lets Windows 2000 use more memory than is physically installed, but frequent paging degrades system performance, and reducing it noticeably improves response time. To monitor for a memory shortage, start with the following object counters:
Available MBytes: the amount of physical memory available, in megabytes. If the value is small (4 MB or less), the computer may have too little memory overall, or a program is not releasing memory.
Pages/sec: the number of pages read from disk to resolve hard page faults, or written to disk to free space in the working set. If Pages/sec stays above a few hundred, investigate the paging activity further; you may need to add memory to reduce paging (multiply this number by 4 KB to estimate the disk traffic that paging causes). A large Pages/sec value does not necessarily indicate a memory problem; it may be caused by programs that use memory-mapped files.
Page Reads/sec: hard page faults, a subset of Pages/sec. This is the number of times the disk had to be read to resolve memory references. The threshold is 5, and the lower the value, the better. A large value indicates that reads are being served from disk rather than from cache.
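
The Pages/sec entry above suggests multiplying the counter by 4 KB to estimate paging traffic. A minimal sketch of that arithmetic follows; the function name and the sample reading of 300 pages/sec are illustrative assumptions, not values from the original post.

```python
# Page size assumed in the text above (4 KB).
PAGE_SIZE_BYTES = 4 * 1024


def paging_traffic_bytes_per_sec(pages_per_sec: float) -> float:
    """Estimate the disk traffic caused by paging from a Pages/sec reading."""
    return pages_per_sec * PAGE_SIZE_BYTES


# Hypothetical reading: 300 pages/sec sustained.
traffic = paging_traffic_bytes_per_sec(300)
print(f"{traffic / 1024:.0f} KB/s of disk traffic from paging")
```

The result is only an estimate of the traffic attributable to paging, not a measurement of total disk throughput.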
Because heavy paging makes heavy use of the hard disk, a memory shortage that causes paging can be confused with a disk bottleneck that results from paging. Therefore, when you investigate paging that is not obviously caused by a memory shortage, track the following disk usage counters along with the memory counters:
PhysicalDisk\% Disk Time
PhysicalDisk\Avg. Disk Queue Length
For example, watch Page Reads/sec, % Disk Time, and Avg. Disk Queue Length together. If the page read rate is low while % Disk Time and Avg. Disk Queue Length are high, there may be a disk bottleneck. However, if the queue length increases while the page read rate does not decrease, memory is insufficient.
To determine the impact of excessive paging on disk activity, multiply the values of the PhysicalDisk\Avg. Disk sec/Transfer and Memory\Pages/sec counters. If the product exceeds 0.1, paging is taking more than 10 percent of disk access time. If this persists for a long time, you may need more memory.
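
The 0.1 rule above is easiest to read as the product of the two counters: seconds per transfer times pages per second gives the fraction of each second spent on paging I/O. The sketch below works under that reading; the function names and sample values are assumptions for illustration.

```python
def paging_disk_share(avg_disk_sec_per_transfer: float, pages_per_sec: float) -> float:
    """Fraction of disk access time consumed by paging (product of the two counters)."""
    return avg_disk_sec_per_transfer * pages_per_sec


def paging_is_excessive(avg_disk_sec_per_transfer: float,
                        pages_per_sec: float,
                        threshold: float = 0.1) -> bool:
    """True when paging takes more than `threshold` (10% by default) of disk time."""
    return paging_disk_share(avg_disk_sec_per_transfer, pages_per_sec) > threshold


# Hypothetical readings: 10 ms per transfer and 20 pages/sec.
share = paging_disk_share(0.010, 20)
print(f"paging uses about {share:.0%} of disk access time")
```

A single reading above the threshold is not conclusive; the text asks for the condition to hold over a long period before adding memory.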
Page Faults/sec: the number of page faults per second, including soft faults that are satisfied directly from memory and hard faults that must be read from disk. Unlike Pages/sec, it only indicates that the data was not immediately available in the specified working set.
Cache Bytes: the size of the file system cache. By default it is 50% of the available physical memory. If IIS 5.0 runs short of memory, it trims the cache automatically. Watch the trend of this counter.
If you suspect a memory leak, monitor Memory\Available Bytes and Memory\Committed Bytes to observe memory behavior, and monitor Process\Private Bytes, Process\Working Set, and Process\Handle Count for the processes that may be leaking memory. If you suspect that a kernel-mode process is causing the leak, also monitor Memory\Pool Nonpaged Bytes, Memory\Pool Nonpaged Allocs, and Process(process_name)\Pool Nonpaged Bytes.
Pages per second: the number of pages retrieved per second. The number should be less than one page per second.
Process:
% Processor Time: the percentage of processor time consumed by the process. If the server is dedicated to SQL Server, the maximum acceptable value is 80-85%.
Page Faults/sec: compare the page faults generated by a process with those generated by the whole system to determine the process's share of system page faults.
Working Set: the memory pages recently touched by the threads of the process, which reflects how many memory pages each process uses. If the server has enough free memory, pages stay in the working set; when free memory falls below a certain threshold, pages are trimmed from the working set.
Inetinfo: Private Bytes: the number of bytes currently allocated by this process that cannot be shared with other processes. If system performance degrades over time, this counter can be the best indicator of a memory leak.
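
One way to turn "Private Bytes keeps growing" into something checkable is to fit a straight line through equally spaced samples and look at the slope. The sketch below uses an ordinary least-squares slope. This is a technique chosen for illustration, not one the original post prescribes, and `min_growth` is a hypothetical threshold you would tune for your own test.

```python
def growth_per_sample(samples):
    """Least-squares slope of equally spaced samples (units per sample interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    x_bar = (n - 1) / 2          # mean of 0, 1, ..., n-1
    y_bar = sum(samples) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(samples))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def looks_like_leak(samples, min_growth):
    """True when the fitted trend grows faster than `min_growth` per sample."""
    return growth_per_sample(samples) > min_growth


# Hypothetical Private Bytes readings (MB), one per sample interval.
private_mb = [100, 110, 120, 130]
print(growth_per_sample(private_mb))
```

A positive slope during ramp-up is normal; the trend matters only once the load has been steady for a while.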

 

 

Processor:
Monitor the counters of the Processor and System objects for valuable information about processor usage and to help determine whether a bottleneck exists.
% Processor Time: if this value stays above 95%, the CPU is the bottleneck. Consider adding a processor or switching to a faster one.
% User Time: reflects CPU-intensive database operations such as sorting and aggregate functions. If the value is very high, consider adding indexes, using simpler table joins, and partitioning tables horizontally to bring it down.
% Privileged Time: (CPU kernel time) the percentage of time that threads spend executing code in privileged mode. If this value and the Physical Disk counter values both remain high, there is an I/O problem; consider switching to a faster disk subsystem. Measures such as setting tempdb in RAM and lowering "max async IO" and "max lazy writer IO" can also reduce this value.
In addition, the Server Work Queues\Queue Length counter, which tracks the current length of the computer's server work queue, can reveal a processor bottleneck. A queue length greater than 4 indicates possible processor congestion. This counter is an instantaneous value, not an average over a period of time.
% DPC Time: the lower the better. On a multiprocessor system, if this value is greater than 50% and Processor: % Processor Time is very high, adding a network adapter may improve performance, provided the network is not already saturated.
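
Several thresholds above apply only when a counter "continues" to exceed a limit, and the queue length counter is an instantaneous value, so single spikes should not be counted. A minimal sketch of a sustained-threshold check follows. The post does not say how long "continues" is, so `min_consecutive` is an assumed parameter, and the sample series are made up.

```python
def sustained_above(samples, threshold, min_consecutive):
    """True if `samples` contains at least `min_consecutive` readings in a row above `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical % Processor Time samples, checked against the 95% limit.
cpu = [96, 97, 98, 50, 99]
print(sustained_above(cpu, threshold=95, min_consecutive=3))

# Hypothetical Server Work Queues\Queue Length samples, checked against 4.
queue = [5, 6, 5]
print(sustained_above(queue, threshold=4, min_consecutive=3))
```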

 

 

Thread:
Context Switches/sec: (instantiate for the inetinfo and dllhost processes) if you decide to increase the size of the thread pool, monitor these three counters (including the one above). Increasing the number of threads may increase the number of context switches to the point where performance drops instead of rising. If the context-switch values for the ten instances are very high, reduce the size of the thread pool.

 

 

Physical Disk:
% Disk Time: the percentage of time the selected disk drive is busy servicing read or write requests. If all three counters are large, the hard disk is not the bottleneck. If only % Disk Time is large and the other two are moderate, the hard disk may be a bottleneck. Before recording this counter, run diskperf -yd in a Windows 2000 command prompt window. If the value stays above 80%, there may be a memory leak.
Avg. Disk Queue Length: the average number of read and write requests queued for the selected disk during the sample interval. This value should not exceed 1.5 to 2 times the number of disks. To improve performance, add disks. Note: a RAID array actually consists of multiple disks.
Avg. Disk Read Queue Length / Avg. Disk Write Queue Length: the average number of read (write) requests queued.
Disk Reads/sec and Disk Writes/sec: the number of read and write operations on the physical disk per second. Their sum should be less than the maximum capacity of the disk device.
Avg. Disk sec/Read: the average time, in seconds, required to read data from this disk.
Avg. Disk sec/Transfer: the average time, in seconds, required to complete a transfer (read or write) on this disk.
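
The queue-length guideline above scales with the number of disks behind the counter instance. A minimal sketch of that check follows; the function names are illustrative, and `factor` is left as a parameter because the text gives a range of 1.5 to 2 rather than a single value.

```python
def queue_limit(disk_count: int, factor: float = 2.0) -> float:
    """Upper bound for Avg. Disk Queue Length: `factor` times the number of disks."""
    return disk_count * factor


def disk_queue_too_long(avg_queue_length: float, disk_count: int, factor: float = 2.0) -> bool:
    """True when the average queue length exceeds the guideline for this many disks."""
    return avg_queue_length > queue_limit(disk_count, factor)


# Hypothetical 4-disk RAID array with an average queue length of 7.
print(disk_queue_too_long(7, disk_count=4, factor=2.0))   # lenient end of the range
print(disk_queue_too_long(7, disk_count=4, factor=1.5))   # strict end of the range
```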
Network Interface:
Bytes Total/sec: the rate at which bytes are sent and received, including framing characters. To judge whether the network connection is a bottleneck, compare this counter's value with the current network bandwidth.
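
Comparing Bytes Total/sec with the link bandwidth requires converting bytes to bits, since link speeds are normally quoted in bits per second. The sketch below shows that conversion; the function name and the 100 Mbit/s link are assumptions for illustration.

```python
def network_utilization(bytes_total_per_sec: float, link_bits_per_sec: float) -> float:
    """Fraction of the link's nominal bandwidth in use (0.0 to 1.0 on a healthy link)."""
    return bytes_total_per_sec * 8 / link_bits_per_sec


# Hypothetical reading: 6,250,000 bytes/sec on a 100 Mbit/s link.
usage = network_utilization(6_250_000, 100_000_000)
print(f"link utilization: {usage:.0%}")
```

The original post gives no utilization threshold, so the sketch only reports the ratio and leaves the judgment to the tester.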

 

 

SQL Server performance counters:
Access Methods: monitors the methods used to access logical pages in the database.
. Full Scans/sec: the number of unrestricted full scans per second, which can be either base table scans or full index scans. If this counter shows a value greater than 1 or 2, analyze your queries to determine whether the full table scans are really needed and whether the SQL queries can be optimized.
. Page Splits/sec: the number of page splits per second caused by data update operations.
Buffer Manager: monitors how Microsoft SQL Server uses memory to store data pages, internal data structures, and the procedure cache, and monitors physical I/O as SQL Server reads database pages from disk and writes them to disk. Monitoring the memory and the counters used by SQL Server helps determine:
Whether a bottleneck exists because there is too little physical memory to keep frequently accessed data in the cache. If so, SQL Server must retrieve the data from disk.
Whether query performance can be improved by adding more memory, or by making more memory available to the data cache or to SQL Server internal structures.
How often SQL Server needs to read data from disk. Compared with other operations, such as memory access, physical I/O takes a long time. Minimizing physical I/O can improve query performance.
. Page Reads/sec: the number of physical database page reads per second. This statistic is the total number of physical page reads across all databases. Because physical I/O is expensive, you may be able to minimize the cost by using a larger data cache, intelligent indexes, more efficient queries, or a different database design.
. Page Writes/sec: the number of physical database page writes per second.
. Buffer Cache Hit Ratio: the percentage of pages that were found in the buffer cache (buffer pool) without having to be read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups since the SQL Server instance started. After a long period of time, the ratio changes very little. Because reading from the cache is much cheaper than reading from disk, you generally want this value to be high. You can usually raise the hit ratio by increasing the memory available to SQL Server. The right value depends on the application, but 90% or higher is best. Add memory until the value stays above 90%, which means that more than 90% of data requests are satisfied from the data buffer.
. Lazy Writes/sec: the number of buffers written per second by the lazy writer process. Ideally the value is 0.
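
The Buffer Cache Hit Ratio described above is cache hits divided by cache lookups, expressed as a percentage. A minimal sketch of that calculation and the 90% target follows; the function names and sample counts are illustrative assumptions, and the raw hit and lookup totals are taken as given inputs.

```python
def hit_ratio_percent(hits: int, lookups: int) -> float:
    """Cache hit ratio as a percentage; 0.0 when no lookups have happened yet."""
    if lookups == 0:
        return 0.0
    return 100.0 * hits / lookups


def below_target(hits: int, lookups: int, target_percent: float = 90.0) -> bool:
    """True when the hit ratio is under the target (90% for the buffer cache)."""
    return hit_ratio_percent(hits, lookups) < target_percent


# Hypothetical totals since the instance started.
print(hit_ratio_percent(950, 1000))
print(below_target(850, 1000))
```

The target is a parameter, so the same check can be run with a different percentage for other hit-ratio counters.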
Cache Manager: provides counters to monitor how Microsoft SQL Server uses memory to store objects such as stored procedures, ad hoc and prepared Transact-SQL statements, and triggers.
. Cache Hit Ratio: the ratio of cache hits to cache lookups. This is the overall hit ratio across all caches; in SQL Server the cache can include the log cache, the buffer cache, and the procedure cache. It is a good counter for seeing how effectively the SQL Server cache works on your system. If the value is very low and stays below 80%, add more memory.
Latches: monitors the internal SQL Server resource locks called latches. Monitoring latches to determine user activity and resource usage helps identify performance bottlenecks.
. Average Latch Wait Time (ms): the average time, in milliseconds, that a SQL Server thread has to wait for a latch. If the value is high, you may be experiencing serious contention.
. Latch Waits/sec: the number of latch waits per second. If the value is high, you are experiencing heavy contention for resources.
Locks: provides information about SQL Server locks on individual resource types. Locks are placed on SQL Server resources (such as rows read or modified during a transaction) to prevent multiple transactions from using the resources concurrently. For example, if a transaction places an exclusive (X) lock on a row of a table, no other transaction can modify that row until the lock is released. Using as few locks as possible improves concurrency and therefore performance. Multiple instances of the Locks object can be monitored at the same time, each representing a lock on one resource type.
. Number of Deadlocks/sec: the number of lock requests per second that resulted in a deadlock.
. Average Wait Time (ms): the average time, in milliseconds, that threads waited for a lock of a given type.
. Lock Requests/sec: the number of lock requests of a given type per second.
Memory Manager: monitors overall server memory usage in order to gauge user activity and resource usage, which helps identify performance bottlenecks. Monitoring the memory used by the SQL Server instance helps determine:
Whether a bottleneck exists because there is too little physical memory to keep frequently accessed data in the cache. If so, SQL Server must retrieve the data from disk.
Whether query performance can be improved by adding more memory, or by making more memory available to the data cache or to SQL Server internal structures.
Lock Blocks: the number of lock blocks on the server. Locks are held on resources such as pages, rows, and tables. You do not want to see this value grow.
Total Server Memory: the total amount of dynamic memory that SQL Server is currently using.

 

 

Counters to monitor for IIS
Internet Information Services Global:
File Cache Hits %, File Cache Flushes, File Cache Hits
File Cache Hits % is the percentage of cache hits out of all cache requests and reflects how well the IIS file cache settings are working. For a website that consists mostly of static pages, this value should be around 80%. File Cache Hits is the absolute number of file cache hits. File Cache Flushes is the number of file cache flushes since the server started. If flushing is too slow, memory is wasted; if it is too fast, objects are discarded from the cache too often and caching has little effect. Comparing File Cache Hits with File Cache Flushes gives the ratio of cache hits to cache flushes. Observe the two values to arrive at an appropriate flush setting (see the IIS settings ObjectCacheTTL, MemCacheSize, and MaxCachedFileSize).
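
The comparison of File Cache Hits with File Cache Flushes can be expressed as a simple ratio. The sketch below computes hits per flush; the function name and sample values are illustrative assumptions, and the post gives no target for the ratio, so the code only reports it.

```python
def hits_per_flush(file_cache_hits: int, file_cache_flushes: int) -> float:
    """Average number of cache hits served per cache flush since the server started."""
    if file_cache_flushes == 0:
        return float("inf")  # nothing has been flushed yet
    return file_cache_hits / file_cache_flushes


# Hypothetical readings taken at the end of a test run.
print(hits_per_flush(1000, 50))
```

Sampling the ratio before and after changing a cache setting, under the same load, shows whether the change made the cache more effective.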
Web Service:
Bytes Total/sec: the total number of bytes sent and received by the web server per second. A low value indicates that IIS is transferring data at a low rate.
Connection Refused: the lower the value, the better. A high value indicates a bottleneck in the network adapter or the processor.
Not Found Errors: the number of requests that the server could not satisfy because the requested file was not found (HTTP status code 404).

 

Additional notes:

Monitor memory counters
To monitor for a memory shortage, start with the following object counters:
Memory information:
Memory\Available Bytes
Memory\Pages/sec
If you suspect a memory leak, monitor Memory\Available Bytes and Memory\Committed Bytes to observe memory behavior, and monitor Process\Private Bytes, Process\Working Set, and Process\Handle Count for the processes that may be leaking memory. If you suspect that a kernel-mode process is causing the leak, also monitor Memory\Pool Nonpaged Bytes, Memory\Pool Nonpaged Allocs, and Process(process_name)\Pool Nonpaged Bytes.

CPU information:
Use Processor\% Processor Time to obtain processor usage.
You can also monitor Processor\% User Time and % Privileged Time for more detail.
The Server Work Queues\Queue Length counter can reveal a processor bottleneck. A queue length greater than 4 indicates possible processor congestion.
System\Processor Queue Length is used for bottleneck detection.
Process\% Processor Time and Process\Working Set show usage per process.
Process\% Processor Time is the sum of processor time on each processor for all threads of the process.

Hard disk information:
PhysicalDisk\% Disk Time
PhysicalDisk\Avg. Disk Queue Length
For example, watch Page Reads/sec, % Disk Time, and Avg. Disk Queue Length together. If the page read rate is low while % Disk Time and Avg. Disk Queue Length are high, there may be a disk bottleneck. However, if the queue length increases while the page read rate does not decrease, memory is insufficient.
Observe the Processor\Interrupts/sec counter, which measures the rate of service requests from input/output (I/O) devices. If this counter rises sharply without a corresponding increase in system activity, a hardware problem is indicated.
PhysicalDisk\Disk Reads/sec and Disk Writes/sec
PhysicalDisk\Current Disk Queue Length
PhysicalDisk\% Disk Time
LogicalDisk\% Free Space
When testing disk performance, log the performance data to another disk or computer so that logging does not interfere with the disk you are testing.
Additional counters worth watching include PhysicalDisk\Avg. Disk sec/Transfer, Avg. Disk Bytes/Transfer, and Disk Bytes/sec.
Avg. Disk sec/Transfer reflects how long the disk takes to complete a request. A high value may mean that the disk controller is repeatedly retrying the disk because of failures, which increases the average transfer time. For most disks, an average transfer time above 0.3 seconds counts as high.
You can also look at Avg. Disk Bytes/Transfer. A value greater than 20 KB indicates that the disk drive is generally performing well; low values result when an application accesses the disk inefficiently. For example, applications that access the disk randomly increase Avg. Disk sec/Transfer, because random transfers require more seek time.
Disk Bytes/sec gives the throughput of the disk system.
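
The two transfer guidelines above (more than 0.3 seconds per transfer is high, more than 20 KB per transfer is healthy) can be combined into one small check. This is a sketch only; the function name, the wording of the findings, and the sample readings are assumptions for illustration.

```python
SLOW_TRANSFER_SEC = 0.3          # Avg. Disk sec/Transfer above this counts as high
HEALTHY_TRANSFER_BYTES = 20 * 1024  # Avg. Disk Bytes/Transfer above this is considered good


def assess_transfers(avg_sec_per_transfer: float, avg_bytes_per_transfer: float) -> list:
    """Return a list of findings based on the two guidelines; empty means nothing flagged."""
    findings = []
    if avg_sec_per_transfer > SLOW_TRANSFER_SEC:
        findings.append("slow transfers: possible controller retries")
    if avg_bytes_per_transfer < HEALTHY_TRANSFER_BYTES:
        findings.append("small transfers: possibly inefficient or random access")
    return findings


# Hypothetical readings: 0.5 s per transfer, 4 KB per transfer.
for finding in assess_transfers(0.5, 4096):
    print(finding)
```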
Determining workload balance
To balance the load on network servers, you need to know how busy the server disk drives are. Use the PhysicalDisk\% Disk Time counter, which shows the percentage of time a drive is active. If % Disk Time is high (over 90%), check the PhysicalDisk\Current Disk Queue Length counter to see how many system requests are waiting for disk access. The number of waiting I/O requests should stay at no more than 1.5 to 2 times the number of spindles that make up the physical disk.

Most disks have one spindle, although redundant array of inexpensive disks (RAID) devices usually have more. A hardware RAID device appears as one physical disk in System Monitor, while RAID devices created through software appear as multiple drives (instances). You can monitor the PhysicalDisk counters for each physical drive (other than RAID), or use the _Total instance to monitor data for all of the computer's drives.

Use the Current Disk Queue Length and % Disk Time counters to detect disk subsystem bottlenecks. If both values are consistently high, consider upgrading the disk drive or moving some files to another disk or server.

 
