We may run into this situation: a table in an Oracle database was poorly designed from the start, so its data ends up duplicated. How, then, do we delete the duplicate data?
Duplicate data comes in two forms. In the first, only some of the fields are the same across rows; in the second, two rows are identical in every field.
One. Deleting data duplicated on some fields
First, let's look at how to query for duplicate data.
The following statement finds the duplicated data:
SELECT field1, field2, COUNT(*) FROM table_name GROUP BY field1, field2 HAVING COUNT(*) > 1;
Change the > sign above to = to find the data that has no duplicates.
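To try the query without an Oracle instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The table t, its columns, and the sample rows are invented for this illustration; only the GROUP BY ... HAVING pattern comes from the text above.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, field2 TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "x", "first"),
        ("a", "x", "second"),
        ("b", "y", "only"),
        ("c", "z", "first"),
        ("c", "z", "second"),
        ("c", "z", "third"),
    ],
)


def groups(op):
    # op is ">" for duplicated groups or "=" for groups with a single row.
    sql = (
        "SELECT field1, field2, COUNT(*) FROM t "
        "GROUP BY field1, field2 HAVING COUNT(*) " + op + " 1 "
        "ORDER BY field1, field2"
    )
    return conn.execute(sql).fetchall()


duplicated = groups(">")
unique = groups("=")
print(duplicated)  # [('a', 'x', 2), ('c', 'z', 3)]
print(unique)      # [('b', 'y', 1)]
```

The third column of each result is the number of rows sharing that pair of field values, which is handy for judging how much data a cleanup will touch.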
To delete the duplicate data, you can use the following statement:
DELETE FROM table_name a WHERE (a.field1, a.field2) IN (SELECT field1, field2 FROM table_name GROUP BY field1, field2 HAVING COUNT(*) > 1);
The statement above is simple: it deletes whatever the query returns. However, deleting this way is very inefficient and, on large amounts of data, may hang the database. So I suggest first inserting the duplicate data found by the query into a temporary table and then deleting, so that the query does not have to be run again during the delete. As follows:
CREATE TABLE temp_table AS (SELECT field1, field2, COUNT(*) AS cnt FROM table_name GROUP BY field1, field2 HAVING COUNT(*) > 1);
The statement above creates the temporary table and inserts the query results into it. (The COUNT(*) column needs an alias such as cnt, because CREATE TABLE ... AS SELECT requires a name for every expression column.)
With that in place, the delete can be done like this:
DELETE FROM table_name a WHERE (a.field1, a.field2) IN (SELECT field1, field2 FROM temp_table);
Building the temporary table first and then deleting is much more efficient than deleting directly with a single statement.
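The two-step approach can be sketched as follows, again with Python's sqlite3 standing in for Oracle. The names t and dup_keys and the sample rows are assumptions made up for the example, and the sketch says nothing about performance; it only shows what the statements do.

```python
import sqlite3

# Hypothetical table and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, field2 TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("a", "x", "first"),
        ("a", "x", "second"),
        ("b", "y", "only"),
        ("c", "z", "first"),
        ("c", "z", "second"),
    ],
)

# Step 1: store the duplicated key pairs once in a helper table.
conn.execute(
    "CREATE TABLE dup_keys AS "
    "SELECT field1, field2, COUNT(*) AS cnt FROM t "
    "GROUP BY field1, field2 HAVING COUNT(*) > 1"
)

# Step 2: delete every row whose key pair appears in the helper table.
# The (field1, field2) IN (...) row-value form needs SQLite 3.15 or later.
conn.execute(
    "DELETE FROM t WHERE (field1, field2) IN "
    "(SELECT field1, field2 FROM dup_keys)"
)
conn.commit()

remaining = conn.execute("SELECT field1, field2, note FROM t").fetchall()
print(remaining)  # [('b', 'y', 'only')]
```

Every copy of a duplicated row is removed here, not just the extra ones; only the row that never had a duplicate survives.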
At this point you may well object: what? Running this statement deletes every one of the duplicated rows, and we wanted to keep the latest record of each duplicate! Don't worry, here is how to do that.
In Oracle, every row has a hidden, automatically maintained ROWID that is unique to that record. If we want to keep the latest record, we can use this pseudocolumn and keep, within each set of duplicates, the record with the largest ROWID.
(ROWID is a physical row address, so treating the largest one as the latest record is a heuristic rather than a guarantee.)
Here is an example that queries the duplicate data:
SELECT a.rowid, a.* FROM table_name a WHERE a.rowid != (SELECT MAX(b.rowid) FROM table_name b WHERE a.field1 = b.field1 AND a.field2 = b.field2);
To explain: the subquery in parentheses finds the largest ROWID among the rows that share the same field values.
The outer query then returns every other duplicate row, that is, all rows except the one with the largest ROWID.
So, to delete the duplicate data and keep only the latest record, we can write:
DELETE FROM table_name a WHERE a.rowid != (SELECT MAX(b.rowid) FROM table_name b WHERE a.field1 = b.field1 AND a.field2 = b.field2);
Incidentally, the statement above executes very slowly. Consider creating a temporary table instead: insert the fields used to judge duplication together with the ROWID to keep, and then compare against it when deleting.
CREATE TABLE temp_table AS SELECT a.field1, a.field2, MAX(a.rowid) dataid FROM table_name a GROUP BY a.field1, a.field2;
DELETE FROM table_name a WHERE a.rowid != (SELECT b.dataid FROM temp_table b WHERE a.field1 = b.field1 AND a.field2 = b.field2);
COMMIT;
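As a rough check that the direct delete and the helper-table delete keep the same rows, the sketch below runs both on identical sample data using Python's sqlite3. This is a stand-in under stated assumptions: SQLite's rowid is an auto-assigned integer rather than Oracle's physical ROWID, and the table t, the helper table keep_ids, and the data are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; insertion order determines SQLite's rowid.
ROWS = [
    ("a", "x", "first"),
    ("a", "x", "second"),
    ("b", "y", "only"),
    ("c", "z", "first"),
    ("c", "z", "second"),
    ("c", "z", "third"),
]


def fresh_table():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (field1 TEXT, field2 TEXT, note TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)
    return conn


def survivors(conn):
    sql = "SELECT field1, field2, note FROM t ORDER BY rowid"
    return conn.execute(sql).fetchall()


# Variant 1: correlated subquery, keeping the largest rowid per key pair.
direct = fresh_table()
direct.execute(
    "DELETE FROM t WHERE rowid != ("
    "SELECT MAX(b.rowid) FROM t AS b "
    "WHERE b.field1 = t.field1 AND b.field2 = t.field2)"
)
direct.commit()

# Variant 2: precompute the rowid to keep for each key pair, then delete.
staged = fresh_table()
staged.execute(
    "CREATE TABLE keep_ids AS "
    "SELECT field1, field2, MAX(rowid) AS dataid FROM t "
    "GROUP BY field1, field2"
)
staged.execute(
    "DELETE FROM t WHERE rowid != ("
    "SELECT k.dataid FROM keep_ids AS k "
    "WHERE k.field1 = t.field1 AND k.field2 = t.field2)"
)
staged.commit()

direct_rows = survivors(direct)
staged_rows = survivors(staged)
print(direct_rows)
# [('a', 'x', 'second'), ('b', 'y', 'only'), ('c', 'z', 'third')]
print(direct_rows == staged_rows)  # True
```

Both variants leave one row per pair of field values. One point worth keeping in mind: rows whose compared fields are NULL never satisfy the equality test, so neither variant would delete them.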
Two. Deleting fully duplicated records
For rows in a table that are completely identical, the following statement returns the records with the duplicates removed:
SELECT DISTINCT * FROM table_name;
You can place the query results in a temporary table, then delete the records of the original table, and finally load the data from the temporary table back into the original table. As follows:
CREATE TABLE temp_table AS (SELECT DISTINCT * FROM table_name);
TRUNCATE TABLE table_name;
INSERT INTO table_name (SELECT * FROM temp_table);
DROP TABLE temp_table;
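The copy-out, empty, copy-back sequence can be sketched with Python's sqlite3 as a stand-in for Oracle. The table t, the helper table t_dedup, and the sample rows are invented for the example, and the original table is emptied with a plain DELETE so that it still exists when the data is loaded back.

```python
import sqlite3

# Hypothetical table containing fully identical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x"), ("b", "y"), ("c", "z")],
)

# 1. Copy the distinct rows into a helper table.
conn.execute("CREATE TABLE t_dedup AS SELECT DISTINCT * FROM t")
# 2. Empty the original table, keeping its definition in place.
conn.execute("DELETE FROM t")
# 3. Load the distinct rows back, then drop the helper table.
conn.execute("INSERT INTO t SELECT * FROM t_dedup")
conn.execute("DROP TABLE t_dedup")
conn.commit()

rows = conn.execute(
    "SELECT field1, field2 FROM t ORDER BY field1, field2"
).fetchall()
print(rows)  # [('a', 'x'), ('b', 'y'), ('c', 'z')]
```

Because the original table is never dropped, anything attached to its definition stays intact while the data is swapped.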
If you want to remove duplicates from a table, you can also first create a temporary table, import the deduplicated data into it, and then import the data from the temporary table back into the formal table, as follows:
INSERT INTO t_table_bak SELECT DISTINCT * FROM t_table;