Original: MSSQL How to delete duplicate data in a table without a primary key
In order to experiment with duplicate data, a design is not very good (no primary key) table and insert some duplicate data:
Create Database testdbuse testdb; gocreate table DUPSNOPK (Col1 int null, Col2 char (5) null); Goinsert DUPSNOPK (col1,col2) Va Lues (1, ' abc '), (2, ' abc '), (2, ' abc '), (2, ' abc '), (7, ' XYZ '), ( 7, ' XYZ ');
To verify that the table does have duplicate data, the following query uses a GROUP BY and HAVING clause to return only duplicate rows and count the replicas:
Select Col1,col2,count (*) as Dupcountfrom Dupsnopkgroup by Col1,col2having Count (*) >1;
Results:
Here is the use of windowing to delete duplicate rows:
The key to this approach is to use windowing, with the Row_number () function and the over () clause of the partition. Each new partition is renumbered. Set the Over () clause to partition by each column to check for duplicate data. In this case, each column will be checked.
To run a windowing query, first show how the method applies to the line number:
Select Col1,col2, Row_number () over (partition by col1,col2 ORDER by Col1) as RN from DUPSNOPK
Results:
Each duplicate row has a value of Rn greater than 1, so it is easier to delete a copy:
With Dupsnumberedas ( select Col1,col2, row_number () over (partition by col1,col2 ORDER by Col1) as RN from DUPSNOPK ) Delete dupsnumbered where rn>1;
Results:
After executing the above statement, test the result of deleting duplicate data with a SELECT statement: