Delete duplicate records in SQL

Source: Internet
Author: User

Duplicate records may occur in the Database due to some reasons, such as user input or data import failure. if you do not use primary keys, constraints, or other mechanisms to implement data integrity, record them in your database. now let's look at how to delete these records in SQL SERVER 2008. First, we can simulate some simple duplicate records:

The code is as follows: Copy code

Create Table dbo. Employee

([Id] int Primary KEY,

[Name] varchar (50 ),

[Age] int,

[Sex] bit default 1)

 

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (1, 'James ', 25, default)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (2, 'James ', 25, default)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (3, 'James ', 25, default)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (4, 'Lisa ', 24, 0)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (5, 'Lisa ', 24, 0)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (6, 'Lisa ', 24, 0)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (7, 'mirsa)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (8, 'mirsa)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (9, 'mirsa)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (10, 'John', 26, default)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (11, 'Abraham', 28, default)

Insert Into Employee ([Id], [Name], [Age], [Sex]) Values (12, 'Lincoln ', 30, default)

 

Select * From dbo. Employee

First, we use the most common method:

The code is as follows: Copy code

Delete From Employee Where Name in (

Select Name

From Employee Group By Name Having Count (Name)> 1 );


Then use RowNumber ():

The code is as follows: Copy code

Delete T From (

Select Row_Number () Over (Partition By [Name] Order By (SELECT 0) As RowNumber, * From Employee) T

Where T. RowNumber> 1;

With Dups

(

Select ROW_NUMBER () Over (Partition by [Name] Order by (SELECT 0) as rn

FROM Employee

)

Delete From Dups

Where rn> 1;

WITH Dups

(

Select [ID], [Name], [Age], [Sex]

, ROW_NUMBER () OVER (Partition By [Name] Order By (SELECT 0) AS rn

, RANK () OVER (Partition By [Name] Order By (SELECT 0) AS rnk

FROM Employee

)

Delete from Dups

WHERE rn <> rnk;

The execution plans for these four T-SQL queries are as follows:

 

You can see that the CTE method does not have the largest overhead, mainly in Table Spool, where the overhead is 44%, and Table Spool is a physical operator.

The Table Spool operator scans input and places a copy of each row in a hidden offline Table. This Table is stored in the tempdb database and only exists during the query period. If you repeat this operator (for example, repeat through the Nested Loops operator) without any re-binding, the offline data is used instead of scanning the input again.

 

Note that the above method only applies when there are few duplicate records, if there are many duplicate records. the DELETE operation will be very slow. The best way is to copy the target data to another new table, DELETE the original table, and rename the new table as the original table. or use a temporary table to reduce database transaction logs. see the following T-SQL:

The code is as follows: Copy code

WITH Dups

(

Select [ID], [Name], [Age], [Sex]

, ROW_NUMBER () OVER (Partition By [ID] Order By (SELECT 0) AS rn

FROM Employee

)

Select [ID], [Name], [Age], [Sex]

INTO dbo. EmployeeDupsTmp

FROM Dups

WHERE rn = 1

 

Drop table dbo. Employee;

 

EXEC sp_rename 'dbo. Employeedupstmp', 'employe'

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.