How to delete duplicate data using SQL statements

Last Update:2013-10-22 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. delete completely Repeated Records

Completely duplicated data is usually caused by the absence of primary key/unique key constraints.
Test data:
Copy codeThe Code is as follows:
If OBJECT_ID ('duplicate _ all') is not null
Drop table duplicate_all
GO
Create table duplicate_all
(
C1 int,
C2 int,
C3 varchar (100)
)
GO
Insert into duplicate_all
Select 1,100, 'aaa' union all
Select 1,100, 'aaa' union all
Select 1,100, 'aaa' union all
Select 1,100, 'aaa' union all
Select 1,100, 'aaa' union all
Select 2,200, 'bbb 'union all
Select 3,300, 'ccc 'union all
Select 4,400, 'ddd 'union all
Select 5,500, 'eee'
GO

(1) use temporary tables

You can use DISTINCT to obtain a single record, delete the source data, and export back records that do not repeat.
If the table is not large, you can export all the records once, and then export them back to the truncate table to avoid delete log operations.
Copy codeThe Code is as follows:
If OBJECT_ID ('tempdb .. # tmp ') is not null
Drop table # tmp
GO
Select distinct * into # tmp
From duplicate_all
Where c1 = 1
GO
Delete duplicate_all where c1 = 1
GO
Insert into duplicate_all
Select * from # tmp

(2) Use ROW_NUMBER
Copy codeThe Code is as follows:
With tmp
As
(
Select *, ROW_NUMBER () OVER (partition by c1, c2, c3 order by (getdate () as num
From duplicate_all
Where c1 = 1
)
Delete tmp where num> 1

If multiple tables have completely repeated rows, you can use UNION to join multiple tables to a new table with the same structure, SQL Server helps remove duplicate rows between tables.

Ii. delete some duplicate records

Repeated data of some columns. Generally, the table has a primary key, which may be caused by repeated values of multiple data columns in the program logic.
Test data:
Copy codeThe Code is as follows:
If OBJECT_ID ('duplicate _ col') is not null
Drop table duplicate_col
GO
Create table duplicate_col
(
C1 int primary key,
C2 int,
C3 varchar (100)
)
GO
Insert into duplicate_col
Select 1,100, 'aaa' union all
Select 2,100, 'aaa' union all
Select 3,100, 'aaa' union all
Select 4,100, 'aaa' union all
Select 5,500, 'eee'
GO

(1) unique index

The unique index has an option to ignore duplicate creation. You can use this option when creating a primary key constraint or a unique key constraint.
Copy codeThe Code is as follows:
If OBJECT_ID ('tmp ') is not null
Drop table tmp
GO
Create table tmp
(
C1 int,
C2 int,
C3 varchar (100 ),
Constraint UQ_01 unique (c2, c3) with (IGNORE_DUP_KEY = ON)
)
GO
Insert into tmp
Select * from duplicate_col
Select * from tmp

(2) Use primary key/unique key to delete
The Maximum/minimum values of the primary key/unique key are usually retained, and other rows are deleted. Only the rows with the smallest c1 record are retained below.
Copy codeThe Code is as follows:
Delete from duplicate_col
Where exists (select 1 from duplicate_col B where duplicate_col.c1> B. c1 and (duplicate_col.c2 = B. c2 and duplicate_col.c3 = B. c3 ))

-- Or
Copy codeThe Code is as follows:
Delete from duplicate_col
Where c1 not in (select min (c1) from duplicate_col group by c2, c3)

If you want to retain the nth row in the record, you can refer to 05. Fetch certain rows in the group.
(3) ROW_NUMBER
The statement is basically the same as that for deleting a completely repeated record.
Copy codeThe Code is as follows:
With tmp
As
(
Select *, ROW_NUMBER () OVER (partition by c2, c3 order by (getdate () as num
From duplicate_col
)
Delete tmp where num> 1
Select * from duplicate_col

Only one SQL statement is retained to delete duplicate data (the following code, many netizens report errors, and we will test it more)

Use SQL statements to delete duplicate items and retain only one
There are some identical records in thousands of records. How can I use SQL statements to delete duplicates?
1. Search for redundant duplicate records in the Table. duplicate records are determined based on a single field (peopleId ).
Select * from people
Where peopleId in (select peopleId from people group by peopleId having count (peopleId)> 1)
2. Delete unnecessary duplicate records in the Table. Repeat records are determined based on a single field (eagleid), leaving only the records with the smallest rowid
Delete from people
Where peopleName in (select peopleName from people group by peopleName having count (peopleName)> 1)
And peopleId not in (select min (peopleId) from people group by peopleName having count (peopleName)> 1)
3. Search for redundant duplicate records in the table (multiple fields)
Select * from vitae
Where (a. peopleId, a. seq) in (select peopleId, seq from vitae group by peopleId, seq having count (*)> 1)
4. Delete redundant record (multiple fields) in the table, leaving only the records with the smallest rowid
Delete from vitae
Where (a. peopleId, a. seq) in (select peopleId, seq from vitae group by peopleId, seq having count (*)> 1)
And rowid not in (select min (rowid) from vitae group by peopleId, seq having count (*)> 1)
5. Search for redundant duplicate records (multiple fields) in the table, excluding records with the smallest rowid
Select * from vitae
Where (a. peopleId, a. seq) in (select peopleId, seq from vitae group by peopleId, seq having count (*)> 1)
And rowid not in (select min (rowid) from vitae group by peopleId, seq having count (*)> 1)
6. Remove the first place on the left of a field:
Update tableName set [Title] = Right ([Title], (len ([Title])-1) where Title like 'village %'
7. Remove the first digit on the right of a field:
Update tableName set [Title] = left ([Title], (len ([Title])-1) where Title like '% cune'
8. The redundant duplicate records (multiple fields) in the table are deleted without the records with the smallest rowid.
Update vitae set ispass =-1
Where peopleId in (select peopleId from vitae group by peopleId

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

How to delete duplicate data using SQL statements

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

How to delete duplicate data using SQL statements

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support