Deleting large amounts of data in SQL Server

Source: Internet
Author: User

What should you pay attention to when deleting data from a database? How should you write the code?

Most of you will probably look at these two questions without a second thought: just use a DELETE statement, and take care that the deletion condition does not remove the wrong rows. What else is there to do?

Yes, but there is one more question to ask. A delete operation writes to the transaction log. Do you have enough disk space to hold that log?

Content of this article:

1. Describe an actual problem I ran into;

2. Review the statements and syntax for deleting data in T-SQL;

3. Solve the problem from my work and summarize the results.

Let me start with an example from my own work. We have an application that collects data on a regular schedule every day and writes it to the database. A batch of tables (more than a dozen) stores this data. Because the data volume is large, with more than 100 million rows in a single table over three months, the tables keep only 90 days of data. What the application itself does is not important here. Every day, a job deletes the data that is older than 90 days. This was set up long ago by my predecessors, who have since left the company. All of these tables are cleaned up with statements like the following:

 
 
    DELETE FROM dbo.S5_BinTest_Detail
    WHERE BinTestID IN (SELECT BinTestID
                        FROM dbo.S5_BinTest_Info
                        WHERE TS < DATEADD(dd, -90, GETDATE())
                       )
      AND DATEDIFF(dd, (SELECT MIN(TS) FROM S5_BinTest_Info), TS) <= 0;

    DELETE FROM dbo.S5_BinTest_Info
    WHERE TS < DATEADD(dd, -90, GETDATE())
      AND DATEDIFF(dd, (SELECT MIN(TS) FROM S5_BinTest_Info), TS) <= 0;

This deletes only one day of data per run: the earliest day among the data older than 90 days. Last night the server raised a disk space alarm.

Looking at the space usage of each disk on the server: disk D is the disk where the log file is located, and it normally has about 95 GB of free space.

Disk E is the disk where the data files are stored. The daily inserts and deletes roughly balance out, so its usage does not change much.

The database files are large, and the tables hold hundreds of millions of rows.

The log file is normally a few hundred MB.

The alarm said that free space on disk D had dropped below 40 GB, meaning the log file had grown by more than 45 GB, and the job that deletes data was running at the time. When I checked the statements behind these deletions, I found the problem: nobody had considered the relationship between log growth and disk space when deleting a large amount of data. That is probably what caught up with us this time.

Let's talk about how to delete data:

T-SQL provides two statements for deleting rows from a table: DELETE and TRUNCATE.

DELETE is a standard SQL statement that deletes rows from a table based on a specified predicate (condition). The standard statement has only two clauses: the FROM clause, which names the target table, and the WHERE clause, which specifies the filter condition. Only rows for which the predicate evaluates to true are deleted.

For example:

 
 
    DELETE FROM dbo.S5_BinTest_Info WHERE TS < DATEADD(dd, -90, GETDATE());

This deletes the rows in dbo.S5_BinTest_Info whose TS time column is more than 90 days old.

The DELETE statement is fully logged: every deleted row is recorded in the transaction log. When a large amount of data is deleted, it can take a long time and require a lot of log storage space.
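One way to see this effect is to compare log usage before and after a large delete. The following is only a sketch: it uses the built-in DBCC SQLPERF(LOGSPACE) command, which reports the log size and the percentage in use for every database on the instance, and it assumes the login has permission to view server state.

```sql
-- Run once before the delete and once after it, then compare the
-- "Log Size (MB)" and "Log Space Used (%)" columns for your database.
DBCC SQLPERF(LOGSPACE);
```

If the percentage used climbs sharply during a single DELETE, the log file will have to grow once it is full, and that growth is what consumes disk space.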

The TRUNCATE statement was not part of the older SQL standards (TRUNCATE TABLE was only added in SQL:2008). It deletes all rows in the table. Unlike DELETE, TRUNCATE takes no filter condition, for example:

 
 
    TRUNCATE TABLE dbo.S5_BinTest_Info;

Compared with DELETE, TRUNCATE has the advantage of being minimally logged, so its performance is markedly better than that of the fully logged DELETE statement; TRUNCATE is the fastest way to empty a table. When the table has an identity column, DELETE does not reset the identity value, while TRUNCATE resets it to the original seed.
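The identity behavior can be checked with a small experiment. This is a minimal sketch on a throwaway temporary table; the name #IdentityDemo and its columns are made up for illustration and are not part of the application described here.

```sql
-- Hypothetical scratch table with an identity column seeded at 1
CREATE TABLE #IdentityDemo
(
    ID  int IDENTITY(1, 1) PRIMARY KEY,
    Val varchar(10) NOT NULL
);

INSERT INTO #IdentityDemo (Val) VALUES ('a'), ('b'), ('c');  -- IDs 1, 2, 3

-- DELETE removes the rows but keeps the current identity value
DELETE FROM #IdentityDemo;
INSERT INTO #IdentityDemo (Val) VALUES ('d');
SELECT ID, Val FROM #IdentityDemo;  -- the new row gets ID 4

-- TRUNCATE removes the rows and resets the identity to its seed
TRUNCATE TABLE #IdentityDemo;
INSERT INTO #IdentityDemo (Val) VALUES ('e');
SELECT ID, Val FROM #IdentityDemo;  -- the new row gets ID 1

DROP TABLE #IdentityDemo;
```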

In addition, T-SQL supports a join-based DELETE syntax, which is not standard SQL. The join itself acts as a filter, because it has a predicate-based filter (the ON clause). Through the join you can also access the attributes (columns) of related rows in another table and reference them in the WHERE clause, which means you can delete rows from one table based on attributes defined in another table. For example:

 
 
    DELETE FROM S5
    FROM dbo.S5_BinTest_Info AS S5
    JOIN dbo.S5_BinTest_Info_Dtl AS DTL
    ON S5.ID = DTL.ID
    WHERE DTL.QTY = 1;

This is very similar to a SELECT statement. Logically, the first clause the DELETE statement processes is the FROM clause (the second line, FROM dbo.S5_BinTest_Info AS S5), then the WHERE clause, and finally the DELETE clause.

The same thing can also be done with a subquery:
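Because the logical processing order matches that of a query, a join-based DELETE can be previewed by swapping the DELETE clause for a SELECT. The sketch below reuses the tables and columns from the example above and only illustrates the idea.

```sql
-- Preview of the rows the join-based DELETE would target.
-- Same FROM, JOIN and WHERE clauses; only the first clause differs.
SELECT S5.*
FROM dbo.S5_BinTest_Info AS S5
JOIN dbo.S5_BinTest_Info_Dtl AS DTL
    ON S5.ID = DTL.ID
WHERE DTL.QTY = 1;
```

Note that the SELECT can return the same S5 row more than once when several detail rows match, whereas the DELETE removes each matching row only once.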

 
 
    DELETE FROM dbo.S5_BinTest_Info
    WHERE EXISTS (SELECT 1 FROM dbo.S5_BinTest_Info_Dtl AS DTL
                  WHERE S5_BinTest_Info.ID = DTL.ID AND DTL.QTY = 1);

The subquery form is standard SQL, and I prefer standard SQL.

Back to my problem at work: I want to delete the data in batches, removing a reasonable amount of data each time over several passes.

Because my tables have an ID, I delete the records for one ID at a time. The following is the source code of the revised, loop-based method.

 
 
    -- Because deleting a large amount of data at once caused problems,
    -- the deletion now runs in batches.
    DECLARE @minid int;
    DECLARE @n int;
    -- Collect the IDs of the records older than 90 days that are to be deleted
    SELECT BinTestID INTO #s5id
    FROM dbo.S5_BinTest_Info
    WHERE TS < DATEADD(dd, -90, GETDATE());
    -- The IDs may not be consecutive, so the loop is driven by the number of IDs left
    SELECT @n = (SELECT COUNT(1) FROM #s5id);
    WHILE (@n > 0)
    BEGIN
        -- Delete the data for one ID per iteration
        SELECT @minid = MIN(BinTestID) FROM #s5id;
        DELETE FROM dbo.S5_BinTest_Detail
        WHERE BinTestID = @minid;
        DELETE FROM dbo.S5_BinTest_Info
        WHERE BinTestID = @minid;
        -- Remove the processed ID from the temporary table
        DELETE #s5id WHERE BinTestID = @minid;
        -- Recount the IDs still to be deleted; this is the loop variable
        SELECT @n = (SELECT COUNT(1) FROM #s5id);
    END
    DROP TABLE #s5id;

If a table has no ID column, we can delete in batches with TOP instead.
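A minimal sketch of the TOP approach follows. The table name dbo.SomeTableWithoutId and the batch size of 5000 are assumptions made for illustration; only the TS column and the 90-day cutoff are taken from the examples above. The DECLARE with an initial value requires SQL Server 2008 or later.

```sql
-- Hypothetical batch size; tune it so one batch fits comfortably in the log
DECLARE @BatchSize int = 5000;
DECLARE @Rows int = 1;

WHILE (@Rows > 0)
BEGIN
    -- Delete at most @BatchSize qualifying rows per statement
    DELETE TOP (@BatchSize)
    FROM dbo.SomeTableWithoutId          -- hypothetical table name
    WHERE TS < DATEADD(dd, -90, GETDATE());

    -- @@ROWCOUNT is 0 once no qualifying rows remain, which ends the loop
    SET @Rows = @@ROWCOUNT;
END
```

Each DELETE runs as its own short transaction, so no single statement has to log the whole 90-day backlog at once.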

When I run the deletion this way, the log barely grows, because each statement deletes only a small amount of data. Once a statement has committed, the log space it used can be released and reused (after a checkpoint under the simple recovery model, or after a log backup under the full recovery model).

When you delete data, have you run into cases where the syntax or conditions you used caused the log to grow by a large amount?
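Whether log space really is reused between batches depends on the recovery model of the database. As a rough check, and assuming you are connected to the database in question, the sys.databases catalog view shows the recovery model and what, if anything, is currently preventing log reuse.

```sql
-- recovery_model_desc: SIMPLE, FULL or BULK_LOGGED
-- log_reuse_wait_desc: why log space cannot be reused right now,
--   for example NOTHING, CHECKPOINT, LOG_BACKUP or ACTIVE_TRANSACTION
SELECT name,
       recovery_model_desc,
       log_reuse_wait_desc
FROM sys.databases
WHERE name = DB_NAME();
```

A value of LOG_BACKUP, for instance, suggests that the log will keep growing during batch deletes until a log backup is taken.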
