No matter what database I use, duplicate data exists. Next I will introduce in detail various methods and Performance Analysis for deleting duplicate records in Oracle tables, for more information, see.
Method:
The Code is as follows: |
Copy code |
Group by XX having count (*)> 1, rowid, distinct, temporary table, procedure |
The following statement can query the data that is duplicated:
The Code is as follows: |
Copy code |
Select Field 1, Field 2, count (*) from table name group by field 1, Field 2 having count (*)> 1 |
Change the ">" number above to "=" to query the non-duplicated data.
To delete the duplicate data, use the following statement.
The Code is as follows: |
Copy code |
Delete from table name a where field 1, Field 2 in (Select Field 1, Field 2, count (*) from table name group by field 1, Field 2 having count (*)> 1) |
Example
1. query duplicate data in the table
A. Repeat a field
B. Repeat Multiple Fields
C. Repeat the entire line
Create a test table:
The Code is as follows: |
Copy code |
Create table cfa (businessid number, customer varchar2 (50), branchcode varchar2 (10), data_date varchar2 (10 )); Insert into cfa values (1, 'Albert ', 'scb', '2017-11-11 '); Insert into cfa values (2, 'andy ', 'db', '2017-11-12 '); Insert into cfa values (3, 'allen ', 'HSBC', '2017-11-13 '); --------------- The following are duplicate data ---------------------------------------------- Insert into cfa values (1, 'Alex ', 'icbc', '2017-11-14 '); Insert into cfa values (1, 'Albert ', 'ctbk', '2017-11-15 '); Insert into cfa values (1, 'Albert ', 'scb', '2017-11-11 '); |
In case of a, only businessid is repeated.
The Code is as follows: |
Copy code |
Select * from cfa where businessid in (select businessid from cfa group by businessid having count (businessid)> 1 ); |
In the case of B, both businessid and name are repeated.
The Code is as follows: |
Copy code |
Select * from cfa where (businessid, customer) in (select businessid, customer from cfa group by businessid, customer having count (*)> 1 ); |
For c, repeat the entire line.
Refer to Method B:
The Code is as follows: |
Copy code |
Select * from cfa where (businessid, customer, branchcode, data_date) in (select * from cfa group by businessid, customer, branchcode, data_date having count (*)> 1 ); |
2. Delete duplicate data in the table
In case a, the redundant duplicate records in the table are deleted. The duplicate records are determined based on a single field (businessid), leaving only the records with the smallest rowid
You can also keep the rowid instead of the minimum record. You need to change the value of min in the code to max.
The Code is as follows: |
Copy code |
Delete from cfa Where businessid in (select businessid From cfa Group by businessid Having count (businessid)> 1) And rowid not in (select min (rowid) From cfa Group by businessid Having count (businessid)> 1 ); |
Alternatively, use the following simple and efficient statements:
The Code is as follows: |
Copy code |
Delete from cfa t WHERE t. ROWID> (Select min (X. ROWID) FROM cfa x where x. businessid = t. businessid ); |
B. Delete the redundant record (multiple fields) in the table and only keep the records with the smallest rowid.
The Code is as follows: |
Copy code |
Delete from cfa Where (businessid, customer) in (select businessid, customer From cfa Group by businessid, customer Having count (*)> 1) And rowid not in (select min (rowid) From cfa Group by businessid, customer Having count (*)> 1 ); |
Alternatively, use the following simple and efficient statements:
The Code is as follows: |
Copy code |
Delete from cfa t WHERE t. ROWID> (select min (X. ROWID) FROM cfa X Where x. businessid = t. businessid And x. customer = t. customer ); |
The preceding statement is very simple, that is, to delete the queried data. However, the deletion execution efficiency is very low, and the database may be suspended for a large amount of data. Therefore, we recommend that you first Insert the queried duplicate data into a temporary table and then delete it. In this way, you do not need to perform another query When deleting the data. As follows:
Create table temporary TABLE
The Code is as follows: |
Copy code |
(Select Field 1, Field 2, count (*) from table name group by field 1, Field 2 having count (*)> 1)
|
The above statement creates a temporary table and inserts the queried data into it.
You can perform the following deletion operations:
The Code is as follows: |
Copy code |
Delete from table name a where field 1, Field 2 in (select Field 1, Field 2 from temporary table );
|
It is much more efficient to create a temporary table and then delete it.
C. This is simple. Use the temporary table method.
The Code is as follows: |
Copy code |
Create table cfabak as select distinct * from cfa; Truncate table cfa; -- it is best to back up the table for Production Insert into cfa select * from cfabak; Commit; |
We want to delete the duplicate data and keep only the latest one. We can write it like this:
The Code is as follows: |
Copy code |
Delete from table name Where a. rowid! = ( Select max (B. rowid) from table name B Where a. Field 1 = B. Field 1 and A. Field 2 = B. Field 2 ) |