SQL Server Performance Optimization: The SLEEP_BPOOL_FLUSH Wait Type


Objective:

There is a database used for historical archiving (the "history database" for short). After data accumulated for a while, its data files grew to more than 700 GB. It was later decided that part of the data did not need to be kept, so that data was truncated, freeing more than 600 GB of space; in theory, after shrinking, the database should occupy only a little over 100 GB. To that end, I planned to rebuild the clustered index of every table first (there are only a few tables, but a single table holds tens of millions of rows) and then shrink the data file.

But when the shrink started, even with each shrink step reduced to 500 MB, it was extremely slow and often unresponsive for hours. Looking at the wait information, I found that a session with spid = 18 (SPID <= 50 indicates a system session) constantly showed a wait status of SLEEP_BPOOL_FLUSH and was blocking the shrink operation.
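The article does not show how the wait information was viewed; one way to reproduce that diagnosis is to query sys.dm_exec_requests, roughly like this:

```sql
-- List sessions that are currently waiting, including the wait type
-- and who (if anyone) is blocking them. SPIDs <= 50 are system sessions.
SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.wait_time,            -- milliseconds spent in the current wait
       r.blocking_session_id
FROM sys.dm_exec_requests AS r
WHERE r.wait_type IS NOT NULL
ORDER BY r.wait_time DESC;
```

In the scenario above, the shrink session would show up blocked while session 18 sits in SLEEP_BPOOL_FLUSH.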


So, even though this is a low-probability event (this wait type is common but rarely worth noticing), since it has appeared, it is worth studying.

Note: the environment is SQL Server 2008 R2.

The source of this article: http://blog.csdn.net/dba_huangzj/article/details/50455543

Brief introduction:

Since this has become a problem, the first step is to understand what the SLEEP_BPOOL_FLUSH wait state is. Microsoft's official documentation (https://technet.microsoft.com/zh-cn/library/ms179984(v=sql.105).aspx) gives only a brief description: it occurs when a checkpoint throttles the issuing of new I/O in order to avoid flooding the disk subsystem. Obviously, this explanation is inadequate, so I dug through blogs and books by well-known SQL Server experts and summarized the following:

This wait state is directly related to the checkpoint process. Checkpoint deals mainly with pages in the memory buffer pool that have been changed since they were loaded into memory (so-called dirty pages). When a checkpoint is triggered, the dirty pages are written from memory back to the data files on disk.

So it is natural to think of checkpoint first. But given how this wait behaves, it can also mean that your disk subsystem has a performance problem.

Checkpoint Introduction:

To understand the SLEEP_BPOOL_FLUSH wait type, you first need to understand checkpoint. It is a system process triggered in the background by SQL Server, and you can also run it manually by issuing the CHECKPOINT command.

This process is responsible for writing the modified pages of the buffer pool to the data files; you most commonly see it during backups. One of its important roles is to speed up database recovery after a failure. When a database fails, SQL Server must restore it to its last consistent state as far as possible. SQL Server uses the transaction log to redo or undo modifications that have not yet been written to the data files. If a data page was modified but not yet written to the data file, SQL Server must redo the modification. If a checkpoint occurred beforehand and the dirty pages were already written to the data file, this step can be skipped, speeding up recovery.

When a transaction modifies a data page, the modification is first recorded in the transaction log (not written directly to the LDF file, but into an area of memory called the log buffer, which is later flushed to the LDF file on disk; that flush is represented by the WRITELOG and LOGBUFFER wait types). The corresponding data page in the buffer pool is then marked as a dirty page. When the checkpoint process fires, all dirty pages accumulated since the last checkpoint are physically written to the data files on disk, regardless of the state of the transactions that dirtied them (committed, uncommitted, or rolled back).
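As a rough illustration, the dirty pages currently sitting in the buffer pool can be counted per database from sys.dm_os_buffer_descriptors (note that scanning this DMV can itself be expensive on servers with large amounts of memory):

```sql
-- Count dirty (modified-in-memory) pages per database in the buffer pool.
-- Each page is 8 KB, so dirty_pages * 8 / 1024 approximates the dirty MB
-- that the next checkpoint will have to flush to the data files.
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*)             AS dirty_pages,
       COUNT(*) * 8 / 1024  AS dirty_mb
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY database_id
ORDER BY dirty_pages DESC;
```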

Typically, checkpoint is run automatically and periodically by SQL Server (roughly every minute by default). That does not mean it fires exactly once a minute, however. The user can change the cycle, but do not modify it unless you are sure the root cause of your problem lies there, because checkpoint also analyzes the current I/O requests, latency, and other conditions before triggering, thereby avoiding unnecessarily high I/O overhead.


In SQL Server, there are several types of checkpoint (they will be described in detail in a subsequent article):

    1. Internal checkpoint: not configurable; triggered automatically under certain circumstances, such as during a backup.
    2. Automatic checkpoint: if the relevant SQL Server configuration is unchanged, it is triggered roughly every minute. The interval can be modified, but the setting is instance-level.
    3. Manual checkpoint: initiated by issuing the CHECKPOINT command from SSMS or another client. The command accepts a number of seconds, asking SQL Server to try to complete the checkpoint within that time. This operation is database-level. For example, CHECKPOINT 10 means SQL Server will attempt to complete the checkpoint within 10 seconds. Details: https://technet.microsoft.com/zh-cn/library/ms188748(v=sql.105).aspx
    4. Indirect checkpoint: a database-level option introduced in SQL Server 2012. If the value is greater than 0, it overrides the default automatic checkpoint configuration for that database. It can be set with the following command:

ALTER DATABASE [database name] SET TARGET_RECOVERY_TIME = [number] {SECONDS | MINUTES};
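For example, a manual checkpoint with a target duration can be run in the context of the target database (the database name here is illustrative):

```sql
USE [HistoryDB];   -- illustrative database name
-- Ask SQL Server to try to complete the checkpoint within 10 seconds.
-- CHECKPOINT is database-scoped: it flushes dirty pages for the
-- current database only.
CHECKPOINT 10;
```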

As mentioned earlier, SQL Server analyzes the current system pressure, and when it decides that checkpoint I/O should not proceed at full speed at that moment, it throttles the checkpoint, preventing the disk subsystem from getting even worse. The time during which checkpoint is throttled is recorded under the SLEEP_BPOOL_FLUSH wait type.

Under normal circumstances, this wait state should be as close to 0 as possible.
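The cumulative time spent in this wait since instance startup (or since the wait statistics were last cleared) can be checked in sys.dm_os_wait_stats:

```sql
-- Cumulative SLEEP_BPOOL_FLUSH statistics for the whole instance.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       max_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'SLEEP_BPOOL_FLUSH';
```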

Reducing SLEEP_BPOOL_FLUSH waits:

Since there is a problem, it should be fixed, even though it usually does not cause much of a performance issue. When you encounter this wait, it is recommended to check the configuration and the statements first, and not to modify the default configuration unless necessary. You can query the configured value with the following statement:

SELECT * FROM sys.configurations WHERE name = 'recovery interval (min)';

A "value" of 0 means the default configuration. This value is in minutes; the lower the value, the higher the checkpoint frequency, and the more likely SLEEP_BPOOL_FLUSH waits are to be triggered. In addition, running the CHECKPOINT command frequently inside transactions can easily trigger this kind of wait.
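If the value has been changed and needs to be put back, the standard sp_configure sequence applies; 'recovery interval (min)' is an advanced option, so advanced options must be visible first:

```sql
-- Make advanced options visible so 'recovery interval (min)' can be set.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Reset the recovery interval to its default of 0, letting SQL Server
-- manage the automatic checkpoint frequency itself.
EXEC sp_configure 'recovery interval (min)', 0;
RECONFIGURE;
```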

Beyond these scenarios, there may be a performance problem with the disk subsystem where the data files reside. As mentioned earlier, a checkpoint writes the dirty pages of the buffer pool to disk; if the disk is already under heavy load, checkpoint operations will be throttled frequently, producing SLEEP_BPOOL_FLUSH waits.

Back to the problem:

The meaning and cause of this wait state were described above, so now back to my problem, which still has to be solved. After checking, the default configuration was fine, and what I was doing was shrinking a data file, so the problem should lie in the shrink itself.

There are three potential problems with shrinking a data file:

    1. The logic of shrinking is to move data toward the front of the data file, because space is reclaimed from the end of the file. Moving all that data consumes a great deal of time and system resources. During this process, SQL Server uses a large amount of CPU to decide where data can be moved and how much space is available, and a large amount of I/O to read data from the file and write it to its new physical location. In addition, if a table has no clustered index, the leaf nodes of its nonclustered indexes store RID information, so moving the data forces updates to those indexes. Note that every row of every nonclustered index is affected. Needless to say, this is a very expensive operation.
    2. Log file growth: regardless of the current recovery model, SQL Server logs every data movement, every allocation or deallocation of data pages and extents, and every change to every index. This logging adds to the resource overhead of the first problem and causes the log file to grow rapidly. An MVP blog describes how much log space shrinking a data file requires: http://www.karaszi.com/SQLServer/info_dont_shrink.asp
    3. Increased fragmentation of tables and indexes: note that fragmentation is not always a bad thing, because it exists for a reason, and many operations are unaffected by it. Microsoft's white paper covers this, describing the types of fragmentation and the scenarios in which it needs attention: https://technet.microsoft.com/en-us/library/cc966523.aspx
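Whether fragmentation actually matters for a given table can be measured with sys.dm_db_index_physical_stats (the table name here is illustrative):

```sql
-- Average logical fragmentation per index of one table in the current
-- database. LIMITED mode reads only the B-tree levels, not every page.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.ArchiveTable'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```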

From the previous analysis, and given the disks the history server uses (ordinary SAS disks), the problem could be preliminarily identified as disk I/O performance. Because the clustered index of every table had already been rebuilt (there are no heap tables), data density should be high enough. This is the most troublesome kind of problem: you cannot simply replace the disks just because shrinking is slow, and even if you could, the financial approval process is cumbersome. So the goal becomes making each read and write operation as small as possible.

This history database handles many month-end and year-end settlement queries at the end of the month (when this article was written), so shrinking files frequently during that period is clearly unreasonable; the operation should be scheduled for idle periods (idle is not necessarily night time; it depends on the system type and its busy hours). In addition, each shrink step should be as small as possible. To avoid one huge statement, the following script can automate the shrink:

DECLARE @sql  nvarchar(1024);
DECLARE @size int = 758000;   -- current size, in MB
DECLARE @end  int = 1024;     -- stop size, in MB

WHILE @size > @end            -- until the stop size is reached
BEGIN
    SET @sql = N'DBCC SHRINKFILE (N''data file name'', '
             + CAST(@size AS nvarchar(20)) + N')';
    -- PRINT @sql
    EXEC (@sql);
    SET @size = @size - 500;  -- shrink 500 MB per iteration
END

The commented-out PRINT statement is used to check that the generated command is correct. This is only a starting point; readers can modify it or add other functionality to suit their situation. The line SET @size = @size - 500 means each iteration shrinks 500 MB; readers can test what works best in their own environment. Perhaps 100 MB per step turns out to be fastest, in which case use SET @size = @size - 100.

By adjusting the size of each shrink step, scheduling the work for idle periods, and manually running CHECKPOINT regularly, the wait state is still present (after all, disk performance is limited), but the shrink progress became satisfactory.

The most important measure is still to run it while the server is idle. After repeated testing: after 11 o'clock at night, when the server maintenance jobs are not running and the users have gone home, even shrinking 100 GB takes only about an hour.

Although the result is somewhat unsatisfactory, and readers may wish to see a thorough solution, this is after all a production environment and cannot be experimented with casually. Besides the approach above, there are other options to choose from as needed:

    1. Split the data file and move files to a low-load or high-performance disk. However, this must take into account merging the data back later.
    2. Some databases can be taken offline briefly: move the database to a better-performing disk, attach it, and then shrink.
    3. In fact... not shrinking at all is not a bad option.

Summary:

The SLEEP_BPOOL_FLUSH wait is closely related to SQL Server's checkpoint process, which is primarily responsible for writing dirty pages to disk. Before a checkpoint fires, SQL Server analyzes the current load on the server; if the disk subsystem is under too much pressure, the checkpoint is throttled, and SQL Server records this state as the SLEEP_BPOOL_FLUSH wait.

In a normal system this wait time should not be long, but it still has the potential to affect performance. Running the CHECKPOINT command too frequently, or setting "recovery interval" too low, can trigger SLEEP_BPOOL_FLUSH waits; poor performance of the disk subsystem holding the data files can also cause this wait.

Therefore, when this wait state occurs frequently or the wait time is long, the SQL Server configuration, the statements, and the disk subsystem all need to be checked.


The next article will discuss checkpoint, which came up repeatedly here, in more detail.
