If a table contains many groups of duplicate primary key (PK) values, deleting them one group at a time takes a long time. In that case, you can use the following procedure:
1. First, run the GROUP BY query above to find out how many PK values are repeated and how many duplicates each group contains.
2. Select the duplicate key values into a holding table. For example:
SELECT col1, col2, col3 = count(*) INTO holdkey FROM t1 GROUP BY col1, col2 HAVING count(*) > 1
3. Select the duplicate rows into a second holding table, eliminating the duplicates in the process with DISTINCT. For example:
SELECT DISTINCT t1.* INTO holddups FROM t1, holdkey WHERE t1.col1 = holdkey.col1 AND t1.col2 = holdkey.col2
4. At this point, holddups should have a unique PK. However, this is not the case if t1 has a duplicate PK whose rows are otherwise different (as in the SSN example above). Verify that each key in holddups is unique, that is, that you do not have duplicate keys with unique rows. If you do, you must stop here and decide which of the rows you want to keep for each such duplicate key value. For example, run the following query:
SELECT col1, col2, count(*) FROM holddups GROUP BY col1, col2
It should return a count of 1 for every row. If it does, continue with step 5. If any count is greater than 1, that key is duplicated even though its rows are different, and you need to decide which rows to keep. This usually means either discarding a row or creating a new, unique key value for it. Take one of these two actions for each such duplicate PK in the holddups table.
5. Delete the duplicate rows (every copy of them) from the original table. For example:
DELETE t1 FROM t1, holdkey WHERE t1.col1 = holdkey.col1 AND t1.col2 = holdkey.col2
6. Put the unique rows back into the original table. For example:
INSERT t1 SELECT * FROM holddups
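The whole procedure can be tried out end to end without a SQL Server instance. The sketch below is only an illustration and assumes a few things not in the original. It runs the steps against SQLite through Python's standard `sqlite3` module, so the SQL Server forms are translated: `SELECT ... INTO` becomes `CREATE TEMP TABLE ... AS SELECT`, and the joined `DELETE t1 FROM t1, holdkey` becomes `DELETE ... WHERE EXISTS`. The table layout `t1(col1, col2, col3)` with `(col1, col2)` as the key, the sample data, and the function name `remove_duplicate_keys` are all hypothetical. Step 4 is reduced to a guard that stops before anything is deleted.

```python
import sqlite3


def remove_duplicate_keys(conn: sqlite3.Connection) -> None:
    """Hypothetical helper: steps 2-6 on t1, keyed on (col1, col2), in SQLite syntax."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS holdkey")
    cur.execute("DROP TABLE IF EXISTS holddups")
    # Step 2: keys that occur more than once, with their counts (SELECT ... INTO in SQL Server)
    cur.execute(
        "CREATE TEMP TABLE holdkey AS "
        "SELECT col1, col2, count(*) AS col3 FROM t1 "
        "GROUP BY col1, col2 HAVING count(*) > 1"
    )
    # Step 3: one copy of every duplicated row
    cur.execute(
        "CREATE TEMP TABLE holddups AS "
        "SELECT DISTINCT t1.* FROM t1, holdkey "
        "WHERE t1.col1 = holdkey.col1 AND t1.col2 = holdkey.col2"
    )
    # Step 4: stop (t1 still untouched) if a key repeats with differing rows
    conflicts = cur.execute(
        "SELECT col1, col2, count(*) FROM holddups "
        "GROUP BY col1, col2 HAVING count(*) > 1"
    ).fetchall()
    if conflicts:
        raise ValueError(f"duplicate keys with different rows: {conflicts}")
    # Steps 5 and 6 commit together, or roll back together on error
    with conn:
        # Step 5: delete every copy of each duplicated row
        cur.execute(
            "DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM holdkey "
            "WHERE t1.col1 = holdkey.col1 AND t1.col2 = holdkey.col2)"
        )
        # Step 6: put exactly one copy back
        cur.execute("INSERT INTO t1 SELECT * FROM holddups")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 1, "a"), (1, 1, "a"),  # key (1, 1) stored three times
     (2, 2, "b"), (2, 2, "b"),               # key (2, 2) stored twice
     (3, 3, "c")],                           # unique key, left alone
)
conn.commit()
remove_duplicate_keys(conn)
rows = sorted(conn.execute("SELECT * FROM t1").fetchall())
print(rows)  # [(1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c')]
```

Deleting and reinserting inside one transaction means a failure between steps 5 and 6 cannot leave t1 without the rows that were removed.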
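For the step 4 case, where a key is duplicated but its rows differ, the "discard the row" option could look roughly like this. It is a sketch under stated assumptions and not the article's own method. It uses SQLite through Python's `sqlite3` module. The holddups contents are made up to mimic the SSN situation, and the check query adds `HAVING count(*) > 1` so that only the offending keys are listed. The rule for which row survives, keeping the row with the smallest `col3`, is purely hypothetical; in practice you would choose the surviving row based on what the data means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical holddups after step 3: key (2, 2) appears twice with different col3 values
conn.execute("CREATE TABLE holddups (col1 INTEGER, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO holddups VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 2, "b"), (2, 2, "x")],
)

# Step 4 check, restricted to keys whose count is above 1
check = (
    "SELECT col1, col2, count(*) FROM holddups "
    "GROUP BY col1, col2 HAVING count(*) > 1"
)
conflicts = conn.execute(check).fetchall()
print(conflicts)  # [(2, 2, 2)]

# Hypothetical policy: drop any row for which another row with the same key
# has a smaller col3, so only the smallest col3 per key survives
conn.execute(
    "DELETE FROM holddups WHERE EXISTS ("
    " SELECT 1 FROM holddups AS h"
    " WHERE h.col1 = holddups.col1 AND h.col2 = holddups.col2"
    " AND h.col3 < holddups.col3)"
)
conn.commit()

remaining = sorted(conn.execute("SELECT * FROM holddups").fetchall())
print(remaining)  # [(1, 1, 'a'), (2, 2, 'b')]
```

Once the check query returns no rows, holddups has a unique key again and steps 5 and 6 can proceed as written.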