Detailed method of SQL delete duplicate data

Last Update:2017-02-28 Source: Internet

Author: User

Tags getdate

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Duplicate data, usually two: one is a completely duplicate record, that is, the values of all fields are the same, and the second is the record that some of the field values are duplicated.

I. Deletion of fully duplicated records

Data that is completely duplicated, usually because the primary key/Unique key constraint is not set.

Test data:

If object_id (' Duplicate_all ') is not null

drop table Duplicate_all

CREATE TABLE Duplicate_all

(

C1 int,

C2 int,

C3 varchar (100)

)

INSERT INTO Duplicate_all

Select 1,100, ' AAA ' UNION ALL

Select 2,200, ' BBB ' UNION ALL

Select 3,300, ' CCC ' UNION ALL

Select 4,400, ' DDD ' UNION ALL

Select 5,500, ' Eee '

(1) The use of temporary tables

Use distinct to get a single record, delete the source data, and then lead back the record.

If the table is not large, you can export all the records once, and then truncate the table back, so that you can avoid the log operation of Delete.

If object_id (' tempdb.. #tmp ') is not null

drop table #tmp

SELECT DISTINCT * into #tmp

From Duplicate_all

where C1 = 1

Delete Duplicate_all where C1 = 1

INSERT INTO Duplicate_all

SELECT * FROM #tmp

(2) using Row_number

With TMP

(

Select *,row_number () over (PARTITION by c1,c2,c3 Order by (GETDATE ()) as Num

From Duplicate_all

where C1 = 1

)

Delete tmp where num > 1

If multiple tables have completely duplicate rows, consider combining multiple tables with a union and inserting them into a new, same-structured table, and SQL Server helps remove duplicate rows between tables and tables.

Two. Delete part of duplicate records

Partial columns duplicate data, usually with a primary key on the table, possibly because the program logic causes multiple rows of data column values to be duplicated.

Test data:

If object_id (' Duplicate_col ') is not null

drop table Duplicate_col

CREATE TABLE Duplicate_col

(

C1 int PRIMARY KEY,

C2 int,

C3 varchar (100)

)

INSERT INTO Duplicate_col

Select 1,100, ' AAA ' UNION ALL

Select 2,100, ' AAA ' UNION ALL

Select 3,100, ' AAA ' UNION ALL

Select 4,100, ' AAA ' UNION ALL

Select 5,500, ' Eee '

(1) Unique index

Unique indexes have an option to ignore duplicate builds, and you can use this index option when creating a PRIMARY KEY constraint/Unique key constraint.

If OBJECT_ID (' tmp ') is not null

DROP TABLE tmp

CREATE TABLE tmp

(

C1 int,

C2 int,

C3 varchar (100),

Constraint uq_01 Unique (C2,C3) with (Ignore_dup_key = ON)

)

INSERT INTO TMP

SELECT * FROM Duplicate_col

SELECT * FROM TMP

(2) Using primary key/unique key to delete

The maximum/minimum retention of PRIMARY key/unique keys is usually selected, and other rows are deleted. The following only retains the smallest C1 row in the duplicate record.

Delete from Duplicate_col

where exists (select 1 from duplicate_col b where Duplicate_col.c1 > B.c1 and (duplicate_col.c2 = B.c2 and Duplicate_col . c3 = B.c3))

--or

Delete from Duplicate_col

where C1 not in (select Min. (c1) from Duplicate_col Group by C2,C3)

If you want to keep the nth row in the duplicate record, refer to 05 for a few lines in the group.

(3) Row_number

It's basically the same as deleting a full duplicate record.

With TMP

(

Select *,row_number () over (PARTITION by c2,c3 Order by (GETDATE ()) as Num

From Duplicate_col

)

Delete tmp where num > 1

SELECT * FROM Duplicate_col

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More