How to delete duplicate rows in an SQL Server table

Original article: http://support.microsoft.com/kb/139444/
Summary
Microsoft SQL Server tables should never contain duplicate rows or non-unique primary keys. For brevity, this article sometimes refers to the primary key as "key" or "PK", but this always means "primary key". Duplicate PKs violate entity integrity and should not be allowed in a relational system. SQL Server has various mechanisms for enforcing entity integrity, including indexes, UNIQUE constraints, PRIMARY KEY constraints, and triggers.

Even so, duplicate primary keys can occur in unusual circumstances, and when they do they must be removed. One way this happens is that duplicate PKs exist in non-relational data outside SQL Server, and the data is imported while PK uniqueness is not being enforced. Another is a database design error, such as not enforcing entity integrity on every table.

Duplicate primary keys are often noticed when you try to create a unique index, because index creation aborts if a duplicate key is found and the following message is displayed:
Msg 1505, Level 16, State 1 Create unique index aborted on duplicate key.
If you are using SQL Server 2000 or SQL Server 2005, you receive the following error message instead:
Msg 1505, Level 16, State 1 CREATE UNIQUE INDEX terminated because a duplicate key was found for object name '%.*ls' and index name '%.*ls'. The duplicate key value is %ls.
This article describes how to find and remove duplicate primary keys from a table. However, you should also closely examine the process that allowed the duplicates to occur, so that they do not recur.

More information
This example uses the following table, which contains duplicate PK values. The primary key of the table is the pair of columns (col1, col2). A unique index or PRIMARY KEY constraint cannot be created because two rows have the same PK. The procedure below shows how to identify and remove the duplicates.
create table t1 (col1 int, col2 int, col3 char(50))
insert into t1 values (1, 1, 'data value one')
insert into t1 values (1, 1, 'data value one')
insert into t1 values (1, 2, 'data value two')

The first step is to identify which rows have duplicate primary key values:
select col1, col2, count(*)
from t1
group by col1, col2
having count(*) > 1

This returns one row for each set of duplicate PK values in the table. The last column of the result, which has no name, is the number of rows that share that particular PK value.

col1   col2   (no column name)
1      1      2
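To try the identification query without a SQL Server instance, here is a minimal sketch that runs it against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a stand-in (an assumption of this sketch, not part of the original article). The GROUP BY / HAVING query itself is unchanged.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INT, col2 INT, col3 CHAR(50))")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 1, "data value one"),
        (1, 1, "data value one"),
        (1, 2, "data value two"),
    ],
)

# One row per key value that occurs more than once, with its row count.
duplicate_keys = conn.execute(
    "SELECT col1, col2, COUNT(*) FROM t1 "
    "GROUP BY col1, col2 HAVING COUNT(*) > 1"
).fetchall()

print(duplicate_keys)  # [(1, 1, 2)]
```

The single tuple matches the result shown above: key (1, 1) appears twice.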

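If there are only a few sets of duplicate PK values, the simplest approach is to delete them manually, one set at a time. For example:
-- limit the next statement to one affected row
set rowcount 1
delete from t1
where col1 = 1 and col2 = 1
-- remove the limit so that later statements are not affected
set rowcount 0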

The rowcount value must be one less than the number of rows that share the given key value. In this example two rows share the key, so rowcount is set to 1. The col1 and col2 values come from the result of the GROUP BY query above. If the GROUP BY query returns several rows, the "set rowcount" query must be run once for each of them, each time with rowcount set to the row count for that PK value minus 1.
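The "count minus 1" rule can be sketched in a runnable form as follows. This is an illustration only: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, and because SQLite has no SET ROWCOUNT, the limit is expressed with a rowid subquery and LIMIT instead. The helper name delete_extra_copies is hypothetical.

```python
import sqlite3


def delete_extra_copies(conn, col1, col2):
    """Delete all but one row of t1 with the given key; return rows deleted."""
    n = conn.execute(
        "SELECT COUNT(*) FROM t1 WHERE col1 = ? AND col2 = ?", (col1, col2)
    ).fetchone()[0]
    if n < 2:
        return 0  # nothing to remove for this key
    # SQLite substitute for SET ROWCOUNT: pick n - 1 rows by rowid.
    cur = conn.execute(
        "DELETE FROM t1 WHERE rowid IN ("
        " SELECT rowid FROM t1 WHERE col1 = ? AND col2 = ? LIMIT ?)",
        (col1, col2, n - 1),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INT, col2 INT, col3 CHAR(50))")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 1, "data value one"),
        (1, 1, "data value one"),
        (1, 2, "data value two"),
    ],
)

deleted = delete_extra_copies(conn, 1, 1)
remaining = conn.execute(
    "SELECT col1, col2, col3 FROM t1 ORDER BY col1, col2"
).fetchall()
print(deleted, remaining)
```

Computing the count inside the helper avoids typing the wrong limit by hand, which is the main risk of the manual approach.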

Before deleting any rows, verify that the entire row is duplicated. Although it is unlikely, the PK values may be duplicated while the rows as a whole are not. For example, a table that uses the Social Security number (SSN) as its primary key might contain two different people (that is, rows) with the same number, each with their own attributes. In such a case, whatever fault caused the duplicate key may also have put valid, unique data into the row. Copy this data out and preserve it for study and possible reconciliation before you delete anything.
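One way to run this check is to collapse exact duplicates first and then look for keys that still have more than one row. The sketch below does this against an in-memory SQLite database through Python's sqlite3 module, again only as a stand-in for SQL Server. The rows with key (2, 1) are hypothetical sample data added here to show a key whose rows differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INT, col2 INT, col3 CHAR(50))")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 1, "data value one"),
        (1, 1, "data value one"),  # exact duplicate: safe to collapse
        (2, 1, "first person"),
        (2, 1, "second person"),   # same key, different data: needs review
    ],
)

# SELECT DISTINCT removes exact duplicate rows; any key that still occurs
# more than once afterwards has rows that differ in a non-key column.
conflicting_keys = conn.execute(
    "SELECT col1, col2, COUNT(*) "
    "FROM (SELECT DISTINCT * FROM t1) AS d "
    "GROUP BY col1, col2 HAVING COUNT(*) > 1"
).fetchall()

print(conflicting_keys)  # [(2, 1, 2)]
```

Key (1, 1) is not reported because its rows are identical, while key (2, 1) is reported and its rows should be saved and reconciled by hand.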

If the table contains many distinct sets of duplicate PK values, removing them one by one may take too long. In that case, you can use the following procedure:

1. Run the GROUP BY query above to determine how many sets of duplicate PK values exist and how many rows each set contains.

2. Select the duplicate key values into a holding table. For example:
select col1, col2, col3 = count(*)
into holdkey
from t1
group by col1, col2
having count(*) > 1

3. Select the duplicate rows into a second holding table, eliminating duplicates in the process. For example:
select distinct t1.*
into holddups
from t1, holdkey
where t1.col1 = holdkey.col1
and t1.col2 = holdkey.col2

4. At this point, the holddups table should have unique PKs. However, this is not the case if t1 contained duplicate PKs whose rows differ (as in the SSN example above). Verify that each key in holddups is unique, that is, that no key is duplicated across rows that differ. If any key is, you must stop here and decide which of the rows to keep for that key value. For example, the following query:
select col1, col2, count(*)
from holddups
group by col1, col2

should return a count of 1 for every row. If it does, continue with step 5. If it does not, some keys are duplicated across differing rows, and you need to decide which rows to save. This usually means either discarding a row or creating a new unique key value for it. Take one of these two actions for each such duplicate PK in the holddups table.

5. Delete the duplicate rows from the original table. For example:
delete t1
from t1, holdkey
where t1.col1 = holdkey.col1
and t1.col2 = holdkey.col2

6. Put the unique rows back into the original table. For example:
insert t1 select * from holddups
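Steps 2 through 6 can be rehearsed end to end with the sketch below. It is an illustration under stated assumptions, not the article's own script: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, so SELECT ... INTO becomes CREATE TABLE ... AS SELECT and the joined DELETE becomes DELETE ... WHERE EXISTS. The function name remove_duplicates is hypothetical.

```python
import sqlite3


def remove_duplicates(conn):
    """Run steps 2-6 on table t1; return the number of surplus rows removed."""
    before = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    # Step 2: duplicated key values go into holdkey.
    conn.execute(
        "CREATE TABLE holdkey AS "
        "SELECT col1, col2, COUNT(*) AS col3 FROM t1 "
        "GROUP BY col1, col2 HAVING COUNT(*) > 1"
    )
    # Step 3: one copy of each duplicated row goes into holddups.
    conn.execute(
        "CREATE TABLE holddups AS "
        "SELECT DISTINCT t1.* FROM t1 JOIN holdkey "
        "ON t1.col1 = holdkey.col1 AND t1.col2 = holdkey.col2"
    )
    # Step 4: stop if any key still has more than one (differing) row.
    conflicts = conn.execute(
        "SELECT col1, col2, COUNT(*) FROM holddups "
        "GROUP BY col1, col2 HAVING COUNT(*) > 1"
    ).fetchall()
    if conflicts:
        raise ValueError(f"keys with differing rows: {conflicts}")
    # Steps 5 and 6 share one transaction: both are committed together,
    # or rolled back together if either statement raises an error.
    with conn:
        conn.execute(
            "DELETE FROM t1 WHERE EXISTS ("
            " SELECT 1 FROM holdkey"
            " WHERE holdkey.col1 = t1.col1 AND holdkey.col2 = t1.col2)"
        )
        conn.execute("INSERT INTO t1 SELECT * FROM holddups")
    after = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    return before - after


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INT, col2 INT, col3 CHAR(50))")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, 1, "data value one"),
        (1, 1, "data value one"),
        (1, 2, "data value two"),
    ],
)
conn.commit()  # sample data is committed before the procedure starts

removed = remove_duplicates(conn)
rows = conn.execute(
    "SELECT col1, col2, col3 FROM t1 ORDER BY col1, col2"
).fetchall()
print(removed, rows)
```

Running the delete and the re-insert in a single transaction is a design choice of this sketch: between step 5 and step 6 the duplicated rows exist only in holddups, so an interruption at that point would otherwise leave t1 incomplete.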
