During normal development, we often encounter repeated data in data tables. How can this problem be solved? Here we will introduce the data deduplication methods in two cases: 1. Completely duplicate data deduplication; 2. Partial field data deduplication. 1. Use the following SQL statement to deduplicate data in a table. CodeCREATETABL
During normal development, we often encounter repeated data in data tables. How can this problem be solved? Here we will introduce the data deduplication methods in two cases: 1. Completely duplicate data deduplication; 2. Partial field data deduplication. 1. Use the following SQL statement to deduplicate data in a table. Code CREATETABL
During normal development, we often encounter repeated data in data tables. How can this problem be solved? Here we will introduce the data deduplication methods in two cases: 1. Completely duplicate data deduplication; 2. Partial field data deduplication.
I. Completely duplicate data deduplication Method
You can use the following SQL statement to deduplicate data in a table.
Code
CREATETABLE "# temp" AS (SELECTDISTINCT * FROM Table Name); -- creates a temporary table and inserts the data after DISTINCT deduplication into the temporary table
TruncateTABLE table name; -- clear the original table data
INSERTINTO table name (SELECT * FROM "# temp"); -- inserts temporary table data into the original table
DROPTABLE "# temp"; -- delete a temporary table
The specific idea is to first create a temporary table, then insert the table data after DISTINCT into the temporary table, then clear the original table data, and then insert the data in the temporary table into the original table; finally, delete the temporary table.
Ii. Data deduplication Methods
First, find duplicate data
Select Field 1, Field 2, count (*) from table name groupby Field 1, Field 2 havingcount (*)> 1
Change the ">" number above to "=" to query the non-duplicated data.
To delete the duplicate data, use the following statement:
Deletefrom table name a where field 1, Field 2 in
(Select Field 1, Field 2, count (*) from table name groupby Field 1, Field 2 havingcount (*)> 1)
The preceding statement is very simple, that is, to delete the queried data. However, the deletion execution efficiency is very low. For large data volumes, the database may be stuck.
Based on the above situation, you can first Insert the queried duplicate data into a temporary table and then delete the data. In this way, you do not need to perform another query When deleting the data. As follows:
CREATETABLE temporary table
(Select Field 1, Field 2, count (*) from table name groupby Field 1, Field 2 havingcount (*)> 1)
You can perform the following deletion operations:
Deletefrom table name a where field 1, Field 2 in (select Field 1, Field 2 from temporary table );
Creating a temporary table before deleting it is much more efficient than simply using a statement to delete it.
The preceding statement deletes all the duplicates. in oracle, an automatic rowid is hidden, and each record is given a unique rowid, if we want to keep the latest record, we can use this field to keep the record with the largest rowid In the duplicate data.
The following is an example of querying duplicate data:
Select a. rowid, a. * from table name
Where a. rowid! =
(
Selectmax (B. rowid) from table name B
Where a. Field 1 = B. Field 1 and
A. Field 2 = B. Field 2
)
The statement in the brackets above is to query the record with the largest rowid in the repeated data. In addition, duplicate data except the maximum rowid is queried.
As a result, we need to delete the duplicate data and keep only the latest data record. Then we can write it like this:
Deletefrom table name
Where a. rowid! =
(
Selectmax (B. rowid) from table name B
Where a. Field 1 = B. Field 1 and
A. Field 2 = B. Field 2
)
Similarly, the execution efficiency of the above Code is low after all. Therefore, we can consider creating a temporary table to insert duplicate fields and rowids into the temporary table and then compare them during deletion.
Createtable temporary table
Select a. Field 1, a. Field 2, MAX (a. ROWID) dataid from formal Table a GROUPBY a. Field 1, a. Field 2;
Deletefrom table name
Where a. rowid! =
(
Select B. dataid from temporary table B
Where a. Field 1 = B. Field 1 and
A. Field 2 = B. Field 2
);
Commit;