SQL Server memory-related counters and memory pressure diagnostics


On a database server, memory is one of the most important resources the database relies on to serve requests.

This is true not only of SQL Server but of other databases as well, such as Oracle and MySQL; all of them are memory-hungry applications.

Ideally, SQL Server would cache all of the data it needs in memory, but this is often impractical because the data is usually larger than the available physical memory.

Memory pressure can directly determine whether the database runs efficiently. It also affects CPU usage and storage performance: when memory suffers, the other resources suffer with it.

So how do we identify memory pressure and determine whether a server has a memory bottleneck?

Starting with SQL Server 2012, memory management was substantially reworked, and nearly all memory management is now controlled by the max server memory setting.

The max server memory setting can be seen in the server properties (the value on my local test instance says nothing about how memory should be configured).
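
The setting can also be read with a query. This is a minimal sketch that uses the sys.configurations catalog view; the row name 'max server memory (MB)' is the name SQL Server gives this option, and value_in_use is the value currently in effect.

```sql
-- Show the configured and the currently active max server memory, in MB.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)';
```

On an instance where the option was never changed, the value is the default of 2147483647 MB, which effectively means "no limit".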

Most of the memory measurements below involve SQL Server's buffer pool, an internal cache manager,

which, like nearly all memory from SQL Server 2012 onward, falls under the control of max server memory.

While it runs, SQL Server also records a range of memory-related metrics, which are a valuable reference for judging how much memory pressure it is under.

Some of the counters mentioned below are exposed in the sys.dm_os_performance_counters system view.

The most important ones are explained here:

    1. Page life expectancy
    2. Buffer cache hit ratio
    3. Page reads/sec
    4. Page writes/sec
    5. Lazy writes/sec
    6. Total Server Memory
    7. Target Server Memory
    8. Paging File % Usage

Note that a memory bottleneck cannot be diagnosed from any single one of these values. The counters are related to one another, and several of them must be combined for careful analysis and judgment.

    • Page Life Expectancy

Page life expectancy, abbreviated PLE, is the average time, in seconds, that a page stays in memory. It can be found in the sys.dm_os_performance_counters system view.

Note that it is not the maximum or minimum lifetime of any one page, but an average calculated over the time all pages stay in the buffer pool.

The larger the value, the more likely SQL Server is to find data directly in the buffer pool when retrieving it.

When SQL Server can retrieve data directly from the buffer pool, it does not need to go to disk, and reading from memory is far more efficient than reading from disk.

Memory latency is measured in nanoseconds while disk latency is measured in milliseconds, a gap of several orders of magnitude, which shows how much it matters for performance whether data comes from the cache or from disk.

So what PLE value is normal? Many sources get this wrong and quote 300 seconds. That was a reference value from more than 10 years ago, based on servers limited to about 4 GB of memory.

Today's servers often have 100 GB of memory or more, so applying the same standard is clearly inaccurate; the reference value has to be calculated from the memory configuration of the specific server.

I will not explain this further here; see the following link.

https://simplesqlserver.com/2013/08/19/fixing-page-life-expectancy-ple/

One commonly cited reference formula is: max buffer pool size (GB) / 4 * 300 (seconds).

The current value can be queried directly from the sys.dm_os_performance_counters system view.

Select * from sys.dm_os_performance_counters where object_name like '%Buffer Manager%' and counter_name = 'Page life expectancy'

For example, suppose your server has 64 GB of memory and the maximum memory allocated to SQL Server (the max server memory setting described above) is 60 GB.

The reference value for PLE is then 60 / 4 * 300 = 4500 s, about 75 minutes. In other words, the data in memory should be fully exchanged with disk no more often than once every 75 minutes.

If PLE on your server stays below the calculated reference value for a long time, or swings sharply over some period, you should suspect a memory bottleneck.
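
The comparison can be sketched in one query. This is an illustration only: it assumes that max server memory is a reasonable stand-in for the "max buffer pool" in the formula, and it reads that setting from sys.configurations. The column aliases are made up for this example.

```sql
-- Compare the current PLE with the rule-of-thumb reference value:
-- max server memory (GB) / 4 * 300 seconds.
SELECT
    pc.cntr_value AS current_ple_seconds,
    CAST(c.value_in_use AS bigint) / 1024.0 / 4 * 300 AS reference_ple_seconds
FROM sys.dm_os_performance_counters AS pc
CROSS JOIN sys.configurations AS c
WHERE pc.object_name LIKE '%Buffer Manager%'
  AND pc.counter_name = 'Page life expectancy'
  AND c.name = 'max server memory (MB)';
```

With max server memory set to 61440 MB (60 GB), reference_ple_seconds works out to 4500, matching the example above. If max server memory is still at its default (2147483647 MB), the reference value is meaningless, and the physical memory of the server is a more sensible input.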

If you actually measure it, you will find that the value differs greatly from one environment to another.

On a test server that few people use, or on a server under very light load or no load at all, the cached data may simply stay in memory,

so the value can be very large, possibly tens of thousands of seconds.

(The original post showed a screenshot of such a value here.)

On a heavily loaded production server, by contrast, the value may drop to a few tens of seconds even with dozens of GB of memory; that is the case at my company.

PLE is therefore one of the most important indicators for determining whether memory is a bottleneck.

    • Buffer Cache Hit Ratio

The buffer cache hit ratio is, taken literally, the percentage of the data needed by queries that is read directly from memory rather than from disk.

Because of the way the value is calculated, however, it has little reference value for diagnosing memory pressure.
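
For reference, the percentage is not stored as a single number in sys.dm_os_performance_counters. The view exposes a raw 'Buffer cache hit ratio' value and a companion 'Buffer cache hit ratio base' value, and the ratio is the first divided by the second. A minimal sketch (the column alias is made up for this example):

```sql
-- Derive the hit ratio percentage from the raw counter and its base counter.
SELECT
    100.0 * r.cntr_value / NULLIF(b.cntr_value, 0) AS buffer_cache_hit_ratio_pct
FROM sys.dm_os_performance_counters AS r
JOIN sys.dm_os_performance_counters AS b
  ON b.object_name = r.object_name
WHERE r.object_name LIKE '%Buffer Manager%'
  AND r.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base';
```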

If the buffer cache hit ratio has so little reference value, why include it here?

Because it is a very popular counter that is mentioned in a great many sources.

Many of those sources give thresholds such as 90% or 95%, and those thresholds are in fact wrong.

People who watched the real numbers raised this question long ago (see the title below): PLE and the buffer cache hit ratio can lead to fundamentally inconsistent conclusions. When studying a topic we should stay rigorous and not simply follow the crowd.

How should we understand a buffer cache hit ratio of 99% when page life expectancy is below 200?

I will not elaborate here; my other blog post covers it in detail.

http://www.cnblogs.com/wy123/p/5272675.html

I also admire the overseas author who analyzed the buffer cache hit ratio down to its fundamentals; studying a topic that seriously is not easy.

    • Page Reads (Writes)/sec

These two counters represent the number of pages physically read from and written to disk per second. In sys.dm_os_performance_counters the stored value is cumulative. The unit is pages, and each page is 8 KB, so the figures can be converted to KB or MB.

The fact that the counter is cumulative does not prevent us from calculating an average over a time interval.

In other words:

For Page reads/sec: when a query runs and the data it needs is not in the buffer pool, the data has to be read from disk.

For Page writes/sec: memory pages are written to disk, for example to free up memory under memory pressure (checkpoints also write pages).

As noted above, the time difference between reading data from the buffer pool in memory and reading it from disk is huge, and the effect on performance is obvious.

We see this often in practice: some queries are slow the first time they run and much faster when run again.

SET STATISTICS IO also shows that the first execution performs physical reads, and the difference in speed is quite noticeable.
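
A minimal way to see this for yourself is sketched below. The table name dbo.SomeLargeTable is hypothetical; substitute any table whose pages are not yet in the buffer pool.

```sql
-- Run the same query twice and compare the "physical reads" figures
-- reported on the Messages tab.
SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.SomeLargeTable;  -- first run: may need physical reads

SELECT COUNT(*) FROM dbo.SomeLargeTable;  -- second run: pages are likely cached

SET STATISTICS IO OFF;
```

If the pages were already cached before the first run, both executions report logical reads only.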

These two values can be obtained with the following SQL query:

Select * from sys.dm_os_performance_counters where object_name like '%Buffer Manager%' and (counter_name = 'Page reads/sec' or counter_name = 'Page writes/sec')
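
Because the stored values are cumulative, a per-second rate has to be derived from two samples. The following is a sketch of one way to do that; the 10-second interval, the temporary table name and the column aliases are arbitrary choices for this example.

```sql
-- Sample the cumulative counters twice and compute the rate in between.
DECLARE @interval_seconds int = 10;  -- must match the WAITFOR DELAY below

SELECT counter_name, cntr_value
INTO #first_sample
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name IN ('Page reads/sec', 'Page writes/sec');

WAITFOR DELAY '00:00:10';

SELECT
    RTRIM(s2.counter_name) AS counter_name,
    (s2.cntr_value - s1.cntr_value) * 1.0 / @interval_seconds AS pages_per_sec,
    -- each page is 8 KB, so pages * 8 / 1024 gives MB
    (s2.cntr_value - s1.cntr_value) * 8 / 1024.0 / @interval_seconds AS mb_per_sec
FROM sys.dm_os_performance_counters AS s2
JOIN #first_sample AS s1
  ON s1.counter_name = s2.counter_name
WHERE s2.object_name LIKE '%Buffer Manager%';

DROP TABLE #first_sample;
```

The same two-sample approach works for any other cumulative per-second counter in this view.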

If large volumes of physical I/O occur frequently on a server, you should watch for memory problems.

Frequent heavy physical I/O can seriously slow down SQL execution. Ideally these values should not be too large. Some sources say they should not stay above 0, which I personally find too absolute.

In fact there is no absolute standard. As long as the values stay at a stable, low level, with no sustained high volume of writes (to disk) or reads (from disk into memory), they are acceptable.

Conversely, if they stay high for a long time and PLE is also not holding within its reference range, memory may be a bottleneck.

  

    • Lazy writes/sec

Lazy writes/sec is the number of buffers written per second by the buffer manager's lazy writer.

The lazy writer is a system process that flushes batches of dirty pages from memory to disk and, in the same action, frees the memory those pages occupied.

Under memory pressure, the lazy writer is triggered to clean out dirty pages and cached plans that have not been used for a long time.

If it is triggered frequently, memory may be a bottleneck.

Note that the Lazy writes/sec value returned from sys.dm_os_performance_counters by the query below is a cumulative value.

That does not prevent us from working out the lazy writes per second over a given interval: take two samples and divide the difference by the elapsed seconds.

Select * from sys.dm_os_performance_counters where object_name like '%Buffer Manager%' and counter_name = 'Lazy writes/sec'

Other mechanisms also write dirty pages to disk and clear out aged cached plans to reclaim the memory they occupy;

the lazy writer, by contrast, is triggered under memory pressure.

If Lazy writes/sec stays nonzero over some interval, combine it with PLE and Page reads (writes)/sec to judge whether there is enough memory.

Having covered PLE, Page reads (writes)/sec and Lazy writes/sec, we can summarize briefly.

As noted above, judging a memory bottleneck usually requires combining several values. If your PLE is below the calculated reference value and is accompanied by a large volume of Page reads (writes)/sec,

then you can almost conclude that your server has a memory bottleneck.

When PLE falls short of the expected value, a large amount of the required data is probably not in the cache.

Reading that data means physical reads from disk, which shows up as high Page reads (writes)/sec.

The data that is physically read takes up cache space (only then can it be returned to the client).

The data that previously occupied that cache space is cleared out by the lazy writer. As data is loaded from disk into the cache and existing cached data is evicted, PLE cannot rise.

So by combining these three values, you can basically determine whether memory is a bottleneck.
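
For a quick first look at the three values together, the sketch below divides each cumulative counter by the instance uptime. It assumes the cumulative counters started from zero when the instance started, and it takes the start time from sys.dm_os_sys_info. A long-run average hides short bursts, so treat it as a rough orientation rather than a measurement of current load.

```sql
-- PLE plus the since-startup averages of the cumulative per-second counters.
SELECT
    RTRIM(pc.counter_name) AS counter_name,
    pc.cntr_value AS raw_value,
    CASE
        WHEN pc.counter_name = 'Page life expectancy' THEN NULL  -- already in seconds
        ELSE pc.cntr_value * 1.0
             / NULLIF(DATEDIFF(SECOND, si.sqlserver_start_time, GETDATE()), 0)
    END AS avg_per_sec_since_startup
FROM sys.dm_os_performance_counters AS pc
CROSS JOIN sys.dm_os_sys_info AS si
WHERE pc.object_name LIKE '%Buffer Manager%'
  AND pc.counter_name IN ('Page life expectancy', 'Page reads/sec',
                          'Page writes/sec', 'Lazy writes/sec');
```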

Beyond these three counters, there is more information that helps diagnose memory, so let us continue.

    • Total Server Memory / Target Server Memory

Total Server Memory is the memory that the SQL Server memory manager has currently committed, that is, memory already in use. Target Server Memory is the maximum memory available to the SQL Server memory manager.

Both values can also be queried from sys.dm_os_performance_counters.

Select * from sys.dm_os_performance_counters where object_name like '%Memory Manager%' and counter_name in ('Target Server Memory (KB)','Total Server Memory (KB)')

When Total Server Memory is less than Target Server Memory, SQL Server knows that the system still has RAM available, and when it needs memory it simply requests more from the operating system.

Total Server Memory then gradually grows.

When Total Server Memory is close to or equal to Target Server Memory, however, SQL Server recognizes that it has used up the RAM available to it.

If no new memory can be allocated when memory is needed, SQL Server has to clean up memory that is already in use and reuse the freed space for new data.

This ties back to PLE and the lazy writer.
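
To see at a glance how close the two values are, they can be placed side by side. This is a minimal sketch; the column aliases are made up for this example.

```sql
-- Total vs. Target Server Memory in GB, plus Total as a percentage of Target.
SELECT
    MAX(CASE WHEN counter_name = 'Total Server Memory (KB)'
             THEN cntr_value END) / 1024.0 / 1024 AS total_gb,
    MAX(CASE WHEN counter_name = 'Target Server Memory (KB)'
             THEN cntr_value END) / 1024.0 / 1024 AS target_gb,
    100.0 * MAX(CASE WHEN counter_name = 'Total Server Memory (KB)'
                     THEN cntr_value END)
          / NULLIF(MAX(CASE WHEN counter_name = 'Target Server Memory (KB)'
                            THEN cntr_value END), 0) AS total_pct_of_target
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');
```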

Of course, system memory is usually smaller than the data: your database files may total 500 GB while the server has 32 GB or 64 GB of memory.

So after the database has been running for a while, Total Server Memory is always close to or equal to Target Server Memory.

What, then, do Total Server Memory and Target Server Memory tell us?

As noted above, the data in the data files is usually larger than the available physical memory (extreme cases aside, such as only 2 GB of data and 32 GB of memory),

so the data most likely cannot all fit in memory. At the very least, though, cached data should stay in memory for a reasonable time before it is cleared out to make room for new data; it is a matter of degree.

Total Server Memory being constantly close to or equal to Target Server Memory does not by itself mean that memory is insufficient; cleaning up memory is normal.

Suppose it is normal for cached data to last 75 minutes.

If Total Server Memory is close to or equal to Target Server Memory and PLE is far below the calculated reference value, down to a few minutes or even one or two minutes,

and at the same time memory and disk are frequently exchanging data through physical I/O, that also indicates a potential memory bottleneck.

    • Paging File % Usage

The page file, also known as virtual memory (a name you have surely heard), uses physical disk space as memory space.

On Windows, the virtual memory file is usually stored on the C drive as a file named pagefile.sys, which is hidden by default.

First, two questions:

1. Does SQL Server use the page file?

The answer is: yes.

2. Can SQL Server control whether it uses physical memory or the page file?

The answer is: no. For a Windows application, whether physical memory or the page file is used is by default decided entirely by the operating system; the application itself cannot decide which kind of memory it uses.

So how do we find out how much page file space is in use? The sys.dm_os_sys_memory view can be queried for this.

On my own machine the usage is not particularly high. One column is total_page_file_kb and another is available_page_file_kb.

As the names imply, total minus available is the amount used.
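
The subtraction can be written as a query. This is a minimal sketch; the column aliases are made up for this example. One caveat: as far as I know, Microsoft's documentation describes total_page_file_kb as the commit limit reported by the operating system, so the difference is closer to "committed memory" than to bytes actually written to pagefile.sys, and it should be read with that in mind.

```sql
-- Physical memory and page file figures from the operating system, in MB.
SELECT
    total_physical_memory_kb / 1024 AS total_physical_mb,
    available_physical_memory_kb / 1024 AS available_physical_mb,
    total_page_file_kb / 1024 AS total_page_file_mb,
    available_page_file_kb / 1024 AS available_page_file_mb,
    (total_page_file_kb - available_page_file_kb) / 1024 AS used_page_file_mb
FROM sys.dm_os_sys_memory;
```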

  

So what does page file usage have to do with memory bottlenecks?

How much an application uses the page file is not under its own control; it is decided entirely by the operating system, and SQL Server is no exception.

What does the amount of page file usage reflect?

Heavy page file usage indirectly reflects the gap between the server's physical memory and its actual memory demand; the larger the gap, the more severe the memory shortage.

Page file usage is scheduled by the Windows operating system, and SQL Server cannot determine whether its cached data lives in physical memory or in the page file.

This is a black box, and I do not know the specific algorithm.

In actual tests, physical memory and the page file are consumed together.

For example, while a very large query runs, sys.dm_os_sys_memory shows clearly that

virtual memory is consumed along with physical memory. How the two are apportioned, or what relationship holds between them, I am not sure, and I would welcome pointers from experts.

What is clear is that on some production servers short of physical memory, for example a machine with 32 GB, page file usage has reached a very high level (more than 30 GB), even more than the physical memory itself.

This should be an abnormal state, but I have no authoritative figure for this value, and I hope readers who know more will contribute.

There may well be environments with even heavier page file usage that I simply have not seen.

Judging from page file usage, if a large amount of page file is in use, even more than the physical memory itself, you can roughly see how far the memory SQL Server actually needs exceeds what is installed.

It can also serve as one of the reference indicators when judging a memory bottleneck.

  

Summary

The indicators above for judging memory pressure cover only some of the memory-related counters; there are others beyond these.

When diagnosing a memory bottleneck, more reference values can be used.

As said earlier,

SQL Server is a self-tuning application, and under memory pressure the counter values are correlated, with several performance counters typically showing consistent behavior.

For example, when memory is low: PLE stays below the expected value, Page reads (writes)/sec remain at a high level, and Lazy writes/sec occur persistently.

If other reference indicators point the same way, the case is of course more convincing.

Even so, the values above are enough to tell roughly whether memory pressure is a bottleneck.

I will continue summarizing other memory-related counters when time permits.

What is covered here is only a very rough analysis of SQL Server memory; each category of memory can be analyzed and discussed in finer detail.

This article covers some points about SQL Server memory bottlenecks. It surely has shortcomings; please point them out, thank you. I hope it helps those interested in SQL Server, and that we can learn together.

  
