Miscellaneous thoughts on SQL performance: reflections on a DELETE


Scenario introduction:

We have a table dedicated to generating auto-incrementing IDs for the business. The table structure is as follows:

CREATE TABLE TB001(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    DT DATETIME)

Each time the business needs a new ID, it runs the following SQL statement:

INSERT INTO TB001(DT) SELECT GETDATE();
SELECT @@IDENTITY
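As an aside (not part of the original setup), a minimal sketch of how this call might be wrapped in a stored procedure; the procedure name is made up here, and SCOPE_IDENTITY() is used because, unlike @@IDENTITY, it is not affected by inserts performed inside triggers:

-- Hypothetical wrapper procedure (illustration only, not from the article)
CREATE PROCEDURE dbo.usp_GetNewID
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO TB001(DT) VALUES(GETDATE());
    -- SCOPE_IDENTITY() returns the identity generated in this scope only
    SELECT CAST(SCOPE_IDENTITY() AS INT) AS NewID;
END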

Since the data only needs to be retained for the most recent day, it is easy to create a SQL Agent job that deletes old data on a schedule:

DELETE TOP(10000) FROM TB001
WHERE DT < GETDATE()-1

The job runs once every 10 seconds for 2 hours each day, so it can delete at most 720 runs x 10,000 rows = about 7.2 million rows per day.

Problem:

Because the front-end page has no anti-abuse (rate-limiting) mechanism, malicious users attacked it with scripts, and the table now grows by nearly 100 million rows per day (now that is a number to brag about!!!). The current job cannot delete data at anywhere near that rate, so it has to be adjusted.

 

Solution:

Under the constraint that the application itself cannot be modified, we first came up with the following options:

1: Increase the number of rows deleted per statement. This causes lock blocking, and serious blocking affects the business, so it is unacceptable.

2:

3: Increase the deletion frequency. Worth considering, but the frequency needs to be tested.

 

Because option 2 can only be increased by a small amount, we focused on testing option 3. The minimum schedule interval for a SQL Agent job is 10 seconds, so instead we modified the script the job calls: each job run now issues several DELETE statements, with WAITFOR between them so they execute intermittently:

DELETE FROM TB001 WHERE DT < GETDATE()-1
WAITFOR DELAY '0:0:05'
DELETE FROM TB001 WHERE DT < GETDATE()-1

Testing showed little impact on the business, so the change was approved and released.

Result: the first night the job ran, R&D immediately received alerts that application requests were severely delayed. Checking the server showed lock waits exceeding 500,000 milliseconds, and sys.dm_exec_requests showed more than 300 sessions waiting on the same lock resource. After the job was stopped, the application immediately returned to normal.
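The exact diagnostic query is not given in the source; a minimal sketch of the kind of sys.dm_exec_requests check described above might look like this:

-- List sessions currently waiting on a lock and who is blocking them
SELECT r.session_id,
       r.command,
       r.wait_type,
       r.wait_time,            -- milliseconds spent waiting so far
       r.wait_resource,
       r.blocking_session_id
FROM sys.dm_exec_requests AS r
WHERE r.wait_type LIKE 'LCK%'
ORDER BY r.wait_time DESC;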

Let's reproduce it in a test environment and figure out why.

First, prepare the test data.

CREATE TABLE TB001(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    DT DATETIME)
GO
INSERT INTO TB001(DT)
SELECT GETDATE()-1 FROM SYS.all_columns
GO
INSERT INTO TB001(DT)
SELECT GETDATE()-1 FROM TB001
GO 13

Then try to delete data inside an open transaction:

BEGIN TRAN
DELETE TOP(10000) FROM TB001
WHERE DT < GETDATE()-1

View the lock status:

-- The session ID of the open transaction above is 55.
EXEC sp_lock 55

Too many rows are deleted in a single statement, the locks escalate to a table lock, and the application's inserts are blocked. Solution: reduce the number of rows deleted per statement.

PS: SQL Server attempts lock escalation when a single statement has acquired roughly 5,000 locks on one rowset, and it also attempts escalation when it is under lock-memory pressure.
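sp_lock is the legacy way to look at this; a sketch (mine, not the article's) of the same check with the sys.dm_tran_locks DMV, assuming the session ID 55 from the example above. If the DELETE escalated, the session holds a single OBJECT lock in X mode instead of thousands of KEY/PAGE locks:

-- Summarize the locks held or requested by session 55
SELECT resource_type,
       request_mode,
       request_status,
       COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = 55
GROUP BY resource_type, request_mode, request_status;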

So the only option left is to delete more frequently in smaller batches, and the deletion code was changed as follows:

DECLARE @ID INT
SET @ID=0
WHILE(@ID<100)
BEGIN
    DELETE TOP(100) FROM TB001
    WHERE DT < GETDATE()-1
    WAITFOR DELAY '00:00:00.400'   -- pause 400 ms between batches
    SET @ID=@ID+1
END

PS: 100 rows per delete is only a trial value; there is probably no single "optimal" batch size. The principled answer is that the value has to balance how many pages each delete scans, how many times the delete has to run, how many indexes are on the table, how much log each delete writes, lock and blocking behavior, and so on. The down-to-earth answer is: test repeatedly until you find a value that meets your needs.
For example, if deleting 90 rows writes about 60 KB of log on average (roughly the maximum size of a single log flush), then deleting 100 rows per batch takes two physical log writes instead of one. Why does that matter? Think it through.
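One rough way (my sketch, not the article's method) to sanity-check the per-batch log volume is to snapshot the log file's write counters around one batch; this assumes the log file has file_id 2, which is the usual layout:

-- Measure physical log writes and bytes generated by one DELETE batch
DECLARE @writes BIGINT, @bytes BIGINT;

SELECT @writes = num_of_writes, @bytes = num_of_bytes_written
FROM sys.dm_io_virtual_file_stats(DB_ID(), 2);   -- file_id 2 = transaction log

DELETE TOP(100) FROM TB001 WHERE DT < GETDATE()-1;

SELECT num_of_writes        - @writes AS log_writes_for_batch,
       num_of_bytes_written - @bytes  AS log_bytes_for_batch
FROM sys.dm_io_virtual_file_stats(DB_ID(), 2);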

 

I tested the modified version. It was fast, life was beautiful, and I pushed it to the production server. Let the storm rage harder!!!

 

Sure enough, that was not the end of the story. Tragedy struck: execution time became unstable. A batch of statements that normally finishes in 40 seconds sometimes took 4 minutes. That's not scientific; I tested it several times!!!

 

Time to take a closer look at the statements. I can't blame anyone else for my own sloppy SQL, so I read the code again:

DELETE TOP(100) FROM TB001
WHERE DT < GETDATE()-1

The statement follows the business logic and there is nothing wrong with it as such, but there is no index on DT. Because DT and ID both increase monotonically, the delete scans the clustered index in ascending ID order; the rows with the smallest IDs were inserted earliest and are exactly the rows to be deleted. In theory a handful of logical reads should be enough to find 100 qualifying rows, so the cost should be minimal. But the ideal is plump and reality is bony.

After the delete has been running frequently for a while, turn on SET STATISTICS IO ON and look at the statement. The execution plan is always the same.

However, the logical I/O it causes is completely different, ranging from four reads to several thousand. The phenomenon is especially obvious when the delete runs at high frequency (during testing you can run 10,000 DELETE statements back to back to see it).
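For readers reproducing this, the measurement itself is just the standard STATISTICS IO switch; the "logical reads" figure for TB001 in the Messages tab is the number that swings from a handful to several thousand as the loop keeps running:

SET STATISTICS IO ON;

DELETE TOP(100) FROM TB001
WHERE DT < GETDATE()-1;

SET STATISTICS IO OFF;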

 

Let's try another approach and force the lookup to walk the ID (clustered) index in order:

DECLARE @ID INT
SET @ID=0
WHILE(@ID<10000)
BEGIN
    ;WITH T1 AS(
        SELECT TOP(100) * FROM TB001
        WHERE DT < GETDATE()-1
        ORDER BY ID
    )
    DELETE FROM T1
    SET @ID=@ID+1
END

The test still shows the same problem. Is there really no solution?

Thinking about it again: we can first look up the largest ID that qualifies for deletion and then delete the rows whose ID is smaller than that value. This also removes a hidden risk: because DT has no index, once all of yesterday's data has been cleared, a job that keeps running still has to look for 100 qualifying rows on each attempt, and that search becomes a full table scan that burns a large amount of logical I/O.

DECLARE @MaxID INT
SELECT @MaxID=MAX(ID) FROM TB001 WITH(NOLOCK)
WHERE DT < GETDATE()-1
DECLARE @ID INT
SET @ID=0
WHILE(@ID<10000)
BEGIN
    ;WITH T1 AS(
        SELECT TOP(100) * FROM TB001
        WHERE ID < @MaxID
        ORDER BY ID
    )
    DELETE FROM T1
    SET @ID=@ID+1
END

Judging by logical I/O, performance does not improve much, but CPU usage drops noticeably. Two guesses as to why (a quick way to probe guess 1 is sketched after this list):

1: Comparing a DATETIME costs more than comparing an INT (a datetime is stored much like a floating-point value and needs extra CPU to process).

2: Because the index is ordered by ID, it may not be necessary to compare every row on a page one by one to decide whether it qualifies (a personal guess, don't take it as gospel).
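A rough way to probe guess 1, assuming the TB001 test data built earlier; the ID upper bound of 1000000 is arbitrary and only needs to match plenty of rows. Compare the CPU time reported for the two equivalent lookups:

SET STATISTICS TIME ON;

-- Lookup filtered on the DATETIME column
SELECT TOP(100) * FROM TB001
WHERE DT < GETDATE()-1
ORDER BY ID;

-- Same lookup filtered on the INT primary key
SELECT TOP(100) * FROM TB001
WHERE ID < 1000000
ORDER BY ID;

SET STATISTICS TIME OFF;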

 

Because ID is a continuously increasing identity value (there may be gaps caused by transaction rollbacks or DBA intervention, but that is not the point), and because we do not actually have to delete exactly 100 rows each time, we can delete by ID range and drop the TOP clause altogether:

DECLARE @MaxID INT
DECLARE @MinID INT
SELECT @MaxID=MAX(ID),@MinID=MIN(ID)
FROM TB001 WITH(NOLOCK)
WHERE DT < GETDATE()-1
DECLARE @ID INT
SET @ID=0
WHILE(@ID<10000)
BEGIN
    DELETE FROM TB001
    WHERE ID >= @MinID+@ID*100
    AND ID < @MinID+(@ID+1)*100
    AND ID < @MaxID
    SET @ID=@ID+1
END

Testing shows that the logical I/O of each delete is now stable and very low. Perfect!!

-- ==================================

Summary:

What looks like a simple SQL statement turns out to have many aspects worth considering. No amount of tinkering is wasted, and puzzling over problems against the basic principles pays off. Make bold guesses, argue them carefully, and test repeatedly; testing is the only way to verify an inference.

 

Additional Words:

1. About the business: in many cases it is a bad idea for a DBA to optimize without understanding the business. Moreover, many of the best optimizations belong in the application rather than the database, and pushing back on developers' so-called "business requirements" is also a necessary DBA skill. During one optimization I found the application sorting and paging over tens of millions of rows; when asked, development replied that "the user did not enter any filter conditions". If the user enters nothing, can't we apply a default condition? If they want the latest records, query only the last three days by default.

2. About scenarios: some beginners try to draw absolute conclusions without considering the scenario, without testing, and jump straight to a verdict. That is just as bad. The solution that fits your scenario is the best solution.

 

Legacy problems:

1. For the business scenario described in this article there are other options, for example partitioning the table and switching partitions out on a schedule before dropping the data, or using the SEQUENCE object added in SQL Server 2012 (a minimal sketch follows after this list).

2. My guess at the root cause of the problem above is how SQL Server implements row deletion: a deleted row is only marked as deleted (ghosted) rather than being physically removed from the page right away. When deletes run frequently and without pause, these emptied data pages are not cleaned up in time, so SQL Server still scans pages whose rows have all been deleted, which drives logical reads up. Searching by ID range avoids scanning those pages and jumps directly to the pages that actually need to be touched. If the deletion frequency were lower (for example once every 3 seconds), the problem would not appear.
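A minimal sketch of the SEQUENCE alternative mentioned in point 1 (SQL Server 2012 and later); the sequence name is made up for illustration. No table rows are written at all, so there is nothing to delete:

-- Hypothetical sequence replacing the TB001 identity table
CREATE SEQUENCE dbo.BusinessID
    AS BIGINT              -- BIGINT to survive ~100 million IDs per day
    START WITH 1
    INCREMENT BY 1;
GO
-- Each caller obtains a new ID with:
SELECT NEXT VALUE FOR dbo.BusinessID;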

