The data de-duplication technology can provide more backup capacity, achieve longer data retention, maintain continuous verification of backup data, improve the level of data recovery service, and facilitate data disaster tolerance. Duplicate data may have two scenarios, the first one being only some of the fields in the table, and the second is exactly the same as two rows of records. The Oracle Database De-duplication technology has the following advantages: greater backup capacity, continuous data validation, higher levels of data recovery services, and easy implementation of backup data disaster tolerance.
One, delete partial field duplicate data
Let's talk about how to query for duplicate data.
The following statement can query that the data is duplicated:
Select field 1, Field 2,count (*) from table name Group By field 1, Field 2 having count (*) > 1
Change the above > number to = number to query for no duplicate data.
To delete these duplicate data, you can delete it using the following statement:
Delete from table name a where field 1, Field 2 in
(select field 1, Field 2,count (*) from table name Group By field 1, Field 2 having count (*) > 1)
The above statement is very simple, that is, the query to delete the data. However, this deletion is very inefficient and may hang the database for large amounts of data. So I suggest that you insert the duplicate data from the query into a temporary table, and then delete it, so that you don't have to do a query again when you perform the deletion. As follows:
CREATE Table temporary table as
(select field 1, Field 2,count (*) from table name Group By field 1, Field 2 having count (*) > 1)
The above sentence is to create a temporary table, and the query to insert the data.
The following can be done with this delete operation:
Delete from table name a where field 1, Field 2 in (Select field 1, Field 2 from temporary table);
This is a much more efficient way to delete the first temporary table than to remove it directly with one statement.
At this time, everyone may jump out and say, what? You told us to execute this statement, and that's not to delete all the duplicates? And we want to keep the latest record in the duplicate data! Don't worry, let me tell you how to do this.
In Oracle, there is a hidden automatic rowid, which gives each record a unique rowid, and if we want to keep the latest record, we can use this field to keep the rowid largest record in the duplicate data.
Here is an example of querying for duplicate data:
Select a.rowid,a.* from table name a
where A.rowid!=
(
Select Max (b.rowid) from table name B
Where a. Field 1 = B. Field 1 and
A. Field 2 = B. Field 2
)
Let me just explain that the statement in parentheses above is the largest record of ROWID in the duplicate data.
And the outside is to query out other than ROWID the largest number of other duplicate data.
So, we're going to delete the duplicate data and just keep the latest one, so we can write this:
Delete from table name a
where A.rowid!=
(
Select Max (b.rowid) from table name B
Where a. Field 1 = B. Field 1 and
A. Field 2 = B. Field 2
)
See more highlights of this column: http://www.bianceng.cnhttp://www.bianceng.cn/database/Oracle/