This tutorial covers how to quickly delete duplicate records in Oracle. While working on a project, a colleague importing data accidentally duplicated everything in a table; that is, every record in the table now has a duplicate. The table holds tens of millions of records and belongs to a production system. In other words, you cannot simply delete all the records; you have to remove the duplicates quickly.
Prompted by this case, we summarize the methods for deleting duplicate records and the pros and cons of each.
For ease of presentation, assume the table is named tbl and has three columns, col1, col2, and col3, where col1 and col2 together form the primary key and there is an index on col1, col2.
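Before choosing a method, it can help to measure how widespread the duplication is. The queries below are a minimal sketch, assuming the tbl(col1, col2, col3) layout described above; the aliases copies and surplus_rows are illustrative names, not part of the original.

```sql
-- List every (col1, col2) key that occurs more than once, with its copy count.
SELECT col1, col2, COUNT(*) AS copies
FROM tbl
GROUP BY col1, col2
HAVING COUNT(*) > 1;

-- Total number of surplus rows a cleanup would have to remove
-- (one row per key is kept, so each key contributes copies - 1).
-- Returns NULL when the table has no duplicated keys.
SELECT SUM(copies - 1) AS surplus_rows
FROM (SELECT COUNT(*) AS copies
      FROM tbl
      GROUP BY col1, col2
      HAVING COUNT(*) > 1);
```

If only a small share of keys is duplicated, the index-friendly statement in the last method is likely the better fit; if nearly every key is duplicated, as in the incident described here, the ROWID-based deletes touch about half the table either way.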
1. Using a temporary table
You can copy the data into a temporary table, empty the original table, and then load the data back into the original table. The SQL statements are as follows:
CREATE TABLE tbl_tmp AS SELECT DISTINCT * FROM tbl;
TRUNCATE TABLE tbl; -- empty the original table
INSERT INTO tbl SELECT * FROM tbl_tmp; -- load the data from the temporary table back in
This approach meets the requirement, but it is clearly slow for a table with tens of millions of records, and on a production system the cost may be too high to be practical.
2. Using ROWID
In Oracle, every record has a ROWID that is unique across the whole database. The ROWID identifies the data file, block, and row in which Oracle stores the record. Duplicate records may have identical content in every column, but their ROWIDs are never the same. The SQL statement is as follows:
DELETE FROM tbl WHERE rowid IN (SELECT a.rowid FROM tbl a, tbl b WHERE a.rowid > b.rowid AND a.col1 = b.col1 AND a.col2 = b.col2);
This statement is a good fit if you already know that each record has exactly one duplicate. It deletes every row that has a twin with a smaller ROWID, so it still leaves one row per key when there are more copies, but the self-join then produces a pair for every combination of copies. If each record has N duplicates and N is unknown, consider the following method.
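Before running a ROWID-based DELETE like the one above on a production table, it can be useful to preview the rows it would remove. The query below is a sketch under the same tbl(col1, col2, col3) assumption; the alias doomed_rowid is an illustrative name. It uses EXISTS instead of the self-join, which is a swapped-in formulation of the same condition: a row is listed if some other row with the same key has a smaller ROWID.

```sql
-- Preview: rows that have a same-key twin with a smaller ROWID.
-- These are the rows the ROWID-based DELETE would remove;
-- the row with the smallest ROWID for each key is not listed.
SELECT a.rowid AS doomed_rowid, a.col1, a.col2, a.col3
FROM tbl a
WHERE EXISTS (SELECT 1
              FROM tbl b
              WHERE b.col1 = a.col1
                AND b.col2 = a.col2
                AND b.rowid < a.rowid);
```

Comparing the number of rows returned here with the number of surplus rows you expect is a cheap sanity check before deleting anything.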
3. Using the MAX or MIN function
This method also uses ROWID; the difference from the method above is that it combines ROWID with the MAX or MIN function. The SQL statement is as follows:
DELETE FROM tbl a WHERE a.rowid NOT IN (SELECT MAX(b.rowid) FROM tbl b WHERE a.col1 = b.col1 AND a.col2 = b.col2); -- MIN works here as well as MAX
Or use the following statement:
DELETE FROM tbl a WHERE a.rowid < (SELECT MAX(b.rowid) FROM tbl b WHERE a.col1 = b.col1 AND a.col2 = b.col2); -- if you change MAX to MIN here, change "<" to ">" in the WHERE clause
4. Using GROUP BY to increase efficiency
The idea is basically the same as in the method above, but using GROUP BY removes the explicit comparison conditions and improves efficiency. The SQL statement is as follows:
DELETE FROM tbl WHERE rowid NOT IN (SELECT MAX(rowid) FROM tbl t GROUP BY t.col1, t.col2);
There is also a method suited to the case where only a small share of the records in the table have duplicates and an index exists. Assuming there is an index on col1, col2 and relatively few records in tbl have duplicates, the SQL statement is as follows:
DELETE FROM tbl WHERE (col1, col2) IN (SELECT col1, col2 FROM tbl GROUP BY col1, col2 HAVING COUNT(*) > 1) AND rowid NOT IN (SELECT MIN(rowid) FROM tbl GROUP BY col1, col2 HAVING COUNT(*) > 1);
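Because DELETE is ordinary DML in Oracle, a GROUP BY-based cleanup can be checked before it is made permanent. The following is a rough sketch of such a workflow, assuming the same tbl layout and a session that does not auto-commit; the alias remaining_duplicate_keys is an illustrative name.

```sql
-- Step 1: keep the row with the largest ROWID for each key, delete the rest.
DELETE FROM tbl
WHERE rowid NOT IN (SELECT MAX(rowid)
                    FROM tbl t
                    GROUP BY t.col1, t.col2);

-- Step 2: count keys that still occur more than once.
-- A result of 0 means every (col1, col2) key is now unique.
SELECT COUNT(*) AS remaining_duplicate_keys
FROM (SELECT col1, col2
      FROM tbl
      GROUP BY col1, col2
      HAVING COUNT(*) > 1);

-- Step 3: if the check is clean, make the change permanent.
COMMIT;
-- Otherwise undo the delete instead:
-- ROLLBACK;
```

This safety net applies only to the DELETE-based methods. TRUNCATE, used in the temporary-table method, is DDL in Oracle and commits implicitly, so it cannot be rolled back.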