When writing SQL statements, you often have the following requirement: there are multiple duplicate data records in a table, how to effectively extract non-duplicate data, or delete duplicate data, or retrieve the first data repeat of a column.
The new functions now_number and CTE of SQL Server 2005 can be well implemented.
Example:
Create test data:
Code
create table Dup1
(
Col1 int null,
Col2 varchar(20) null
)
insert into Dup1 values
(1, 'aaa'),
(2, 'aaa'),
(2, 'aaa'),
(2, 'aaa'),
(3, 'bbb'),
(3, 'bbb'),
(4, 'ccc'),
(4, 'ddd'),
(5, 'eee')
select * from Dup1
Duplicate data can be viewed as follows:
SELECT Col1, Col2, COUNT(*) AS DupCount
FROM Dup1
GROUP BY Col1, Col2
HAVING COUNT(*) > 1
Next we will introduce how to delete duplicate data:
1. now_number: A good ranking function (now_number, rank, dense_rank, ntile) is added to SQL Server 2005. now_number () over (partition Gy) is the most direct and convenient method, you cannot modify tables or generate extra columns.
First, a column number is assigned, Which is sorted by col1 and col2.
SELECT Col1, Col2,
ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col1) AS rn
FROM Dup1
The sequence number is as follows:
Obviously, duplicate columns are grouped and sorted by group. You only need to delete the sorting sequence number> 1.
-- Use ctewith dupsd
As (
Select col1, col2,
Row_number () over (partition by col1, col2 order by col1) as Rn
From dup1
)
Delete dupsd
Where rn> 1; -- or delete a from (
Select col1, col2,
Row_number () over (partition by col1, col2 order by col1) as Rn
From dup1) A where a. Rn> 1
2. Create a table with a unique ID key.
ALTER TABLE dbo.Dup1
ADD
PK INT IDENTITY
NOT NULL
CONSTRAINT PK_Dup1 PRIMARY KEY;
SELECT *
FROM Dup1;
Delete the records that are the same as those of col1, col2 and larger than dup1.pk, that is, keep the records with the minimum pK value.
DELETE Dup1
WHERE EXISTS ( SELECT *
FROM Dup1 AS D1
WHERE D1.Col1 = Dup1.Col1
AND D1.Col2 = Dup1.Col2
AND D1.PK > Dup1.PK );
3. Select distant into. This method transfers non-duplicated result sets to the new table by using a new table.
SELECT distinct Col1, Col2 INTO NoDups
FROM Dup1;
select * from NoDups
The first and third methods are recommended. The first method is more common in programming of T-SQL, and the third method is often used in ETL.