Microsoft SQL Server tables should never contain duplicate rows or non-unique primary keys. For brevity, this article sometimes refers to the primary key as "key" or "PK"; both always mean "primary key". Duplicate PKs violate entity integrity and are not allowed in a relational system. SQL Server has several mechanisms for enforcing entity integrity, including indexes, UNIQUE constraints, PRIMARY KEY constraints, and triggers.
Nevertheless, duplicate primary keys can occur in some circumstances, and they must be cleaned up when they do. One way they arise is when duplicate PKs exist in non-relational data outside SQL Server and PK uniqueness is not enforced when the data is imported. Another is a database design error, such as not enforcing entity integrity on each table.
Duplicate PKs are typically discovered when you try to create a unique index: if a duplicate key is found, the index creation is aborted and the following message appears:
Msg 1505, Level 16, State 1 Create unique index aborted on duplicate key.
If you are using SQL Server 2000 or SQL Server 2005, you receive the following error message:
Msg 1505, Level 16, State 1 CREATE UNIQUE INDEX terminated because a duplicate key was found for object name '%.*ls' and index name '%.*ls'. The duplicate key value is %ls.
This article discusses how to find and delete duplicate primary keys in a table. However, you should also closely examine the process that allowed the duplicates to occur, in order to prevent a recurrence.
More information
This example uses the following table, which has duplicate PK values. In this table, the primary key is the two columns (col1, col2). We cannot create a unique index or PRIMARY KEY constraint because two rows have the same PK. The procedure below shows how to identify and delete the duplicate primary keys.
CREATE TABLE t1 (col1 int, col2 int, col3 char(50))
INSERT INTO t1 VALUES (1, 1, 'data value one')
INSERT INTO t1 VALUES (1, 1, 'data value one')
INSERT INTO t1 VALUES (1, 2, 'data value two')
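To see the problem first-hand, you can try to enforce uniqueness on the sample table. The following is a minimal sketch; the index name ix_t1_col1_col2 is a hypothetical name chosen for illustration and does not come from the original procedure.

```sql
-- ix_t1_col1_col2 is a hypothetical index name used only for illustration.
-- With the sample rows above, this statement is expected to fail with
-- error 1505, because the key (1, 1) appears in two rows.
CREATE UNIQUE INDEX ix_t1_col1_col2 ON t1 (col1, col2);
```

Because the statement fails, no index is created and the table is left unchanged.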
The first step is to identify which rows have duplicate primary key values:
SELECT col1, col2, COUNT(*)
FROM t1
GROUP BY col1, col2
HAVING COUNT(*) > 1
This returns one row for each set of duplicate PK values in the table. The last column of the result is the number of rows that share that particular PK value.
col1    col2    (no column name)
1       1       2
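The GROUP BY query shows only the key values and their counts. If you also want to see the complete rows behind each duplicate key, a windowed count is one option. This is a sketch that assumes SQL Server 2005 or later (for common table expressions and the OVER clause); the names counted and copies are illustrative.

```sql
-- List every row that shares its (col1, col2) key with at least one other row.
-- "counted" and "copies" are illustrative names, not part of the original article.
WITH counted AS (
    SELECT col1, col2, col3,
           COUNT(*) OVER (PARTITION BY col1, col2) AS copies
    FROM t1
)
SELECT col1, col2, col3, copies
FROM counted
WHERE copies > 1
ORDER BY col1, col2;
```

For the sample data, this lists the two rows with key (1, 1), each with copies = 2.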
If there are only a few sets of duplicate PK values, the best approach is to delete them manually, one set at a time. For example:
SET ROWCOUNT 1
DELETE FROM t1
WHERE col1 = 1 AND col2 = 1
-- Turn the row limit off again so that later statements are not affected
SET ROWCOUNT 0
The ROWCOUNT value should be the number of rows with the given key value minus 1. In this example, two rows share the duplicate key, so ROWCOUNT is set to 1. The col1/col2 values come from the result of the GROUP BY query above. If the GROUP BY query returns multiple rows, the SET ROWCOUNT query must be run once for each of those rows. Each time you run it, set ROWCOUNT to the number of rows with that particular PK value minus 1.
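On SQL Server 2005 and later, the same per-key delete can be written with the TOP clause instead of SET ROWCOUNT. Microsoft's documentation recommends TOP for limiting DELETE, INSERT, and UPDATE statements, so the following sketch may be preferable on newer versions. As with SET ROWCOUNT, which of the matching rows is removed is not defined.

```sql
-- Delete all but one of the rows with key (1, 1).
-- TOP (n) plays the role of SET ROWCOUNT n: n = number of rows with the key - 1.
DELETE TOP (1)
FROM t1
WHERE col1 = 1 AND col2 = 1;
```

Because the limit is part of the statement itself, there is no session setting to reset afterwards.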
Before you delete any rows, you should verify that the entire row is duplicated. Although it is unlikely, the PK value may be duplicated while the row as a whole is not. An example is a table with an ID number such as a Social Security number (SSN) as the primary key, in which two different people (rows) have the same number but each has unique attributes. In such a case, whatever problem caused the duplicate key may also have caused valid, unique data to be placed in the row. Copy this data and save it for study and possible reconciliation before you delete it.
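The check and the backup described above could look like the following sketch. The table name t1_dup_backup is hypothetical, and the second query assumes that col3 is the only non-key column, as in the sample table.

```sql
-- Save a copy of every row that has a duplicated key before changing anything.
-- t1_dup_backup is a hypothetical table name.
SELECT t1.*
INTO t1_dup_backup
FROM t1
INNER JOIN (
    SELECT col1, col2
    FROM t1
    GROUP BY col1, col2
    HAVING COUNT(*) > 1
) AS dup
    ON t1.col1 = dup.col1
   AND t1.col2 = dup.col2;

-- List duplicate keys whose rows differ in the non-key column.
-- Note: COUNT(DISTINCT col3) ignores NULLs, so rows that differ only by a
-- NULL in col3 are not reported here.
SELECT col1, col2,
       COUNT(*) AS row_count,
       COUNT(DISTINCT col3) AS distinct_col3
FROM t1
GROUP BY col1, col2
HAVING COUNT(*) > 1
   AND COUNT(DISTINCT col3) > 1;
```

If the second query returns no rows, as it does for the sample data, every duplicated key belongs to rows that are identical in col3, and any one of them can be kept.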
If the table has many distinct sets of duplicate PK values, it can be too time-consuming to delete them individually. In that case, you can use the following method:
1. First, run the GROUP BY query above to determine how many sets of duplicate PK values exist and how many rows are in each set.
2. Select the duplicate key values into a holding table. For example:
SELECT col1, col2, col3 = COUNT(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING COUNT(*) > 1
3. Select the duplicate rows into a second holding table, eliminating the duplicates in the process. For example:
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
4. At this point, the holddups table should have unique PKs. This will not be the case, however, if t1 had duplicate PKs on rows that were otherwise unique (as in the SSN example above). Verify that each key in holddups is unique and that no key is duplicated across rows that differ. If you find such keys, you must stop here and decide which rows to keep for each duplicate key value. For example, run the following query:
SELECT col1, col2, COUNT(*)
FROM holddups
GROUP BY col1, col2
This query should return a count of 1 for every row. If it does, proceed to step 5 below. If any count is greater than 1, the key is duplicated but the rows are unique, and you need to decide which rows to keep. Usually this means either discarding a row or creating a new unique key value for it. Take one of these two actions for each such duplicate PK in the holddups table.
5. Delete the duplicate rows from the original table. For example:
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
6. Put the unique rows back into the original table. For example:
INSERT t1 SELECT * FROM holddups
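Between steps 5 and 6, the affected rows exist only in holddups. One way to reduce the risk is to run both steps in a single transaction and then enforce uniqueness so that the duplicates cannot return. The following is a sketch rather than part of the original procedure. It assumes that holdkey and holddups were built and checked as described in steps 2 through 4, and the index name ix_t1_col1_col2 is hypothetical.

```sql
-- Assumes holdkey and holddups were built in steps 2-3 and verified in step 4.
SET XACT_ABORT ON;   -- roll back the whole transaction if a run-time error occurs

BEGIN TRANSACTION;

    -- Step 5: remove every row whose key is duplicated.
    DELETE t1
    FROM t1
    INNER JOIN holdkey
        ON t1.col1 = holdkey.col1
       AND t1.col2 = holdkey.col2;

    -- Step 6: put back one copy of each row.
    INSERT INTO t1 (col1, col2, col3)
    SELECT col1, col2, col3
    FROM holddups;

COMMIT TRANSACTION;

-- With the duplicates gone, uniqueness can now be enforced.
-- ix_t1_col1_col2 is a hypothetical index name.
CREATE UNIQUE INDEX ix_t1_col1_col2 ON t1 (col1, col2);

-- Drop the holding tables once they are no longer needed.
DROP TABLE holdkey;
DROP TABLE holddups;
```

A unique index is used here instead of a PRIMARY KEY constraint because the sample table does not declare col1 and col2 as NOT NULL, which a primary key requires.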