Detailed methods for deleting duplicate data in SQL (MSSQL)

Source: Internet
Author: User
Tags getdate

I. Deletion of fully duplicated records

Rows are usually completely duplicated because no primary key or unique key constraint was defined on the table.
Test data:

IF OBJECT_ID('Duplicate_all') IS NOT NULL
    DROP TABLE Duplicate_all
GO
CREATE TABLE Duplicate_all
(
    c1 int,
    c2 int,
    c3 varchar(100)
)
GO
INSERT INTO Duplicate_all
SELECT 1, 100, 'AAA' UNION ALL
SELECT 1, 100, 'AAA' UNION ALL
SELECT 1, 100, 'AAA' UNION ALL
SELECT 1, 100, 'AAA' UNION ALL
SELECT 1, 100, 'AAA' UNION ALL
SELECT 2, 200, 'BBB' UNION ALL
SELECT 3, 300, 'CCC' UNION ALL
SELECT 4, 400, 'DDD' UNION ALL
SELECT 5, 500, 'EEE'
GO
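
Before deleting anything, it can help to confirm which rows are actually duplicated. The following is a minimal sketch against the test table above; the column alias cnt is illustrative and not part of the original.

```sql
-- List each fully duplicated row together with the number of copies.
SELECT c1, c2, c3, COUNT(*) AS cnt
FROM Duplicate_all
GROUP BY c1, c2, c3
HAVING COUNT(*) > 1
```

With the test data above, this returns the single group (1, 100, 'AAA') with cnt = 5.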

(1) Using a temporary table

Use DISTINCT to copy one instance of each duplicated row into a temporary table, delete the duplicated rows from the source table, and then insert the saved rows back.
If the table is not large, you can instead export all distinct rows at once, TRUNCATE the table, and load the rows back, which avoids the per-row logging of DELETE.

IF OBJECT_ID('tempdb..#tmp') IS NOT NULL
    DROP TABLE #tmp
GO
SELECT DISTINCT * INTO #tmp
FROM Duplicate_all
WHERE c1 = 1
GO
DELETE Duplicate_all WHERE c1 = 1
GO
INSERT INTO Duplicate_all
SELECT * FROM #tmp
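
The whole-table variant mentioned above, which uses TRUNCATE instead of DELETE, might look like the sketch below. The temporary table name #all_rows is an assumption made for illustration. Note that TRUNCATE TABLE is not permitted on a table referenced by a foreign key.

```sql
-- Sketch only: deduplicate the entire table by way of TRUNCATE.
IF OBJECT_ID('tempdb..#all_rows') IS NOT NULL
    DROP TABLE #all_rows
GO
-- Save one copy of every distinct row.
SELECT DISTINCT * INTO #all_rows
FROM Duplicate_all
GO
-- Empty the source table without logging each row deletion.
TRUNCATE TABLE Duplicate_all
GO
-- Load the distinct rows back.
INSERT INTO Duplicate_all
SELECT * FROM #all_rows
```

On a live system it is safer to wrap the TRUNCATE and the reload in one transaction, so that a failure between the two steps does not leave the table empty.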

(2) Using ROW_NUMBER
Number the rows within each group of identical values, then delete every row whose number is greater than 1.

WITH tmp AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY c1, c2, c3 ORDER BY (GETDATE())) AS num
    FROM Duplicate_all
    WHERE c1 = 1
)
DELETE tmp WHERE num > 1

If completely duplicated rows are spread across several tables, consider combining the tables with UNION and inserting the result into a new table of the same structure. UNION, unlike UNION ALL, removes duplicate rows both within and across the tables.
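
A minimal sketch of the UNION approach follows. It assumes a second table, hypothetically named Duplicate_all_2, with the same columns as Duplicate_all, and writes the result to a new table hypothetically named Duplicate_merged. Neither name comes from the original.

```sql
-- Duplicate_all_2 and Duplicate_merged are hypothetical names used for illustration.
-- UNION (not UNION ALL) keeps one copy of each distinct row across both tables.
SELECT u.c1, u.c2, u.c3
INTO Duplicate_merged
FROM (
    SELECT c1, c2, c3 FROM Duplicate_all
    UNION
    SELECT c1, c2, c3 FROM Duplicate_all_2
) AS u
```

SELECT ... INTO creates Duplicate_merged with column types taken from the query, so the new table must not already exist.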

II. Deletion of partially duplicated records

Here only some of the columns hold duplicated values. The table usually has a primary key, and the duplicates typically arise because application logic inserted several rows with the same values in the other columns.
Test data:

IF OBJECT_ID('Duplicate_col') IS NOT NULL
    DROP TABLE Duplicate_col
GO
CREATE TABLE Duplicate_col
(
    c1 int PRIMARY KEY,
    c2 int,
    c3 varchar(100)
)
GO
INSERT INTO Duplicate_col
SELECT 1, 100, 'AAA' UNION ALL
SELECT 2, 100, 'AAA' UNION ALL
SELECT 3, 100, 'AAA' UNION ALL
SELECT 4, 100, 'AAA' UNION ALL
SELECT 5, 500, 'EEE'
GO

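(1) Unique index

A unique index has an option to ignore duplicate keys (IGNORE_DUP_KEY), and the same option can be set when creating a PRIMARY KEY or UNIQUE constraint. With the option on, an INSERT that would produce duplicate keys discards the offending rows with a warning instead of failing.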

IF OBJECT_ID('tmp') IS NOT NULL
    DROP TABLE tmp
GO
CREATE TABLE tmp
(
    c1 int,
    c2 int,
    c3 varchar(100),
    CONSTRAINT uq_01 UNIQUE (c2, c3) WITH (IGNORE_DUP_KEY = ON)
)
GO
INSERT INTO tmp
SELECT * FROM Duplicate_col
SELECT * FROM tmp
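
The same option can also be set on a standalone unique index rather than a constraint. The sketch below assumes a hypothetical staging table tmp_ix and index name ix_tmp_c2_c3, neither of which appears in the original.

```sql
-- tmp_ix and ix_tmp_c2_c3 are hypothetical names used for illustration.
IF OBJECT_ID('tmp_ix') IS NOT NULL
    DROP TABLE tmp_ix
GO
CREATE TABLE tmp_ix
(
    c1 int,
    c2 int,
    c3 varchar(100)
)
GO
-- IGNORE_DUP_KEY = ON turns a duplicate-key error into a warning and skips the row.
CREATE UNIQUE INDEX ix_tmp_c2_c3 ON tmp_ix (c2, c3) WITH (IGNORE_DUP_KEY = ON)
GO
INSERT INTO tmp_ix
SELECT * FROM Duplicate_col
GO
SELECT * FROM tmp_ix
```

The option does not let you choose which of the duplicate rows survives, so the key-based or ROW_NUMBER methods are a better fit when a particular row must be kept.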

(2) Deleting by primary key/unique key
Usually the row with the largest or smallest primary key/unique key value in each group of duplicates is kept and the other rows are deleted. The following keeps only the row with the smallest c1 in each group.

DELETE FROM Duplicate_col
WHERE EXISTS (SELECT 1 FROM Duplicate_col b WHERE Duplicate_col.c1 > b.c1 AND Duplicate_col.c2 = b.c2 AND Duplicate_col.c3 = b.c3)

-- or

DELETE FROM Duplicate_col
WHERE c1 NOT IN (SELECT MIN(c1) FROM Duplicate_col GROUP BY c2, c3)

If you want to keep the Nth row of each group of duplicates instead, refer to article 05 on selecting particular rows within a group.
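
The referenced article is not reproduced here. As one possible approach, and not necessarily the one it describes, the sketch below keeps the second row (ordered by c1) of each duplicated group and leaves groups with a single row untouched. The names ranked, num and cnt are illustrative.

```sql
-- Sketch only: keep the 2nd row (by c1) in every (c2, c3) group that has duplicates.
WITH ranked AS
(
    SELECT c1,
           ROW_NUMBER() OVER (PARTITION BY c2, c3 ORDER BY c1) AS num,
           COUNT(*) OVER (PARTITION BY c2, c3) AS cnt
    FROM Duplicate_col
)
DELETE FROM Duplicate_col
WHERE c1 IN (SELECT c1 FROM ranked WHERE cnt > 1 AND num <> 2)
```

For a general N, a group with fewer than N rows would lose all of its rows under this pattern, so add a guard such as cnt >= N if that can happen.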
(3) Using ROW_NUMBER
This is essentially the same as for fully duplicated records, except that the partition lists only the duplicated columns.

WITH tmp AS
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY c2, c3 ORDER BY (GETDATE())) AS num
    FROM Duplicate_col
)
DELETE tmp WHERE num > 1
SELECT * FROM Duplicate_col


Deleting duplicate data with SQL and keeping only one row (many readers have reported errors with the following code, so test it carefully before use)

Using SQL statements to delete duplicates and keep only one row
Among thousands of records some are identical. How can the duplicates be deleted with SQL statements?
1. Find redundant records in the table, where duplicates are judged by a single field (peopleid):
SELECT * FROM people
WHERE peopleid IN (SELECT peopleid FROM people GROUP BY peopleid HAVING COUNT(peopleid) > 1)
2. Delete redundant records in the table, where duplicates are judged by a single field (peoplename in the statement below), keeping only the record with the smallest peopleid in each group:
DELETE FROM people
WHERE peoplename IN (SELECT peoplename FROM people GROUP BY peoplename HAVING COUNT(peoplename) > 1)
AND peopleid NOT IN (SELECT MIN(peopleid) FROM people GROUP BY peoplename HAVING COUNT(peoplename) > 1)
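3. Find redundant records in the table (multiple fields). The row-value IN comparison used here, like the rowid pseudocolumn in the items that follow, is Oracle syntax and is not accepted by SQL Server:
SELECT * FROM vitae a
WHERE (a.peopleid, a.seq) IN (SELECT peopleid, seq FROM vitae GROUP BY peopleid, seq HAVING COUNT(*) > 1)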
4. Delete redundant records in the table (multiple fields), keeping only the record with the smallest rowid:
DELETE FROM vitae a
WHERE (a.peopleid, a.seq) IN (SELECT peopleid, seq FROM vitae GROUP BY peopleid, seq HAVING COUNT(*) > 1)
AND rowid NOT IN (SELECT MIN(rowid) FROM vitae GROUP BY peopleid, seq HAVING COUNT(*) > 1)
5. Find redundant records in the table (multiple fields), excluding the record with the smallest rowid:
SELECT * FROM vitae a
WHERE (a.peopleid, a.seq) IN (SELECT peopleid, seq FROM vitae GROUP BY peopleid, seq HAVING COUNT(*) > 1)
AND rowid NOT IN (SELECT MIN(rowid) FROM vitae GROUP BY peopleid, seq HAVING COUNT(*) > 1)
6. Remove the first character on the left of a field:
UPDATE tableName SET [Title] = RIGHT([Title], LEN([Title]) - 1) WHERE [Title] LIKE 'Village%'
7. Remove the first character on the right of a field:
UPDATE tableName SET [Title] = LEFT([Title], LEN([Title]) - 1) WHERE [Title] LIKE '%Village'
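8. Flag, rather than physically delete, redundant records (multiple fields) in the table, excluding the record with the smallest rowid (the statement below is cut off in the source):
UPDATE vitae SET ispass = -1
WHERE peopleid IN (SELECT peopleid FROM vitae GROUP BY peopleid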
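
Because items 3 to 5 rely on Oracle-only syntax (row-value IN and rowid), they fail on SQL Server as written. A rough SQL Server equivalent is sketched below. It assumes that vitae has some unique column to stand in for rowid, hypothetically named id here, and the aliases d and ranked are also illustrative.

```sql
-- Assumption: vitae has a unique column; the name id is hypothetical.
-- Find every row that belongs to a duplicated (peopleid, seq) group.
SELECT a.*
FROM vitae AS a
JOIN (
    SELECT peopleid, seq
    FROM vitae
    GROUP BY peopleid, seq
    HAVING COUNT(*) > 1
) AS d
    ON a.peopleid = d.peopleid AND a.seq = d.seq
GO
-- Delete all but the row with the smallest id in each group.
WITH ranked AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY peopleid, seq ORDER BY id) AS num
    FROM vitae
)
DELETE ranked WHERE num > 1
```

The join treats NULL values in peopleid or seq as non-matching, so rows with NULLs in those columns are not reported by the first query.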
