Business requirements
Recently made a small tool for the company to bring data from one database (data source) into another (target database). The data required to import the target database cannot be duplicated. But the situation is that the data source itself has duplicate data. So you need to clear the data source data first.
So we summarize the query and processing of the duplicate data. This is only a database-based solution. The implementation of the program is not considered.
Environment is: SQL Server 2008
Database-based Solutions
Database Test table dbo. Member
First, single-row repetition, a group query method with having condition
(1) Querying a column for duplicate records
Statement:
SELECT Name from dbo. Member T WHERE name in (the SELECT name from dbo. Member GROUP by Name has COUNT (Name) >1) ORDER by T.name
Query Result:
(2) Querying a column for records that are not duplicates
Statement:
SELECT * FROM dbo. Member WHERE ID in (the SELECT MIN (ID) from dbo. Member GROUP by Name)
Query Result:
(3) Clear a column of duplicate data
Statement:
DELETE from dbo. Member WHERE ID not in (the SELECT MIN (ID) from dbo. Member GROUP by Name)
Execution Result:
Explanation: The above example only holds the minimum value for the respective name.
Second, the use of DISTINCT
Warm reminder:
Multi-column statistics not supported
Oracle and DB2 databases are also available
Use the DISTINCT keyword to return a unique different value
(1) Querying a column for non-repeating data
Statement:
SELECT DISTINCT Name from dbo. Member
Result set:
(2) DISTINCT query multiple columns are not duplicated (if any one of the columns of the query is not duplicated, this record is considered not to be duplicated)
Statement:
SELECT DISTINCT name,uid from dbo. Member
Query results
DISTINCT for Statistics
Statement
SELECT COUNT (DISTINCT (Name)) from dbo. Member
Second, multi-column repetition
Data table structure
Find original_id and match_id rows with two columns of values repeating
SQL statements
SELECT m.* from dbo. Match m, (SELECT original_id,match_idfrom dbo. Match GROUP by Original_id,match_idhaving COUNT (1) >1) as M1where m.original_id=m1. ORIGINAL_ID and M.MATCH_ID=M1. match_id
Query results
SQL Server One or more columns of duplicate data queries, deleting