Oracle Database deduplication

Source: Internet
Author: User

Deduplication frees up backup capacity so that data can be retained longer, supports continuous verification of backup data, raises the level of data recovery service, and makes disaster recovery of backup data easier. Duplicate data in a table comes in two forms: rows that match only on some fields, and rows that are identical in every field. The two cases are handled separately below.
I. Delete rows duplicated on some fields
First, let's look at how to find the duplicate data.
The following statement returns each combination of field values that occurs more than once:
SELECT field1, field2, COUNT(*) FROM table_name GROUP BY field1, field2 HAVING COUNT(*) > 1;
Changing "> 1" to "= 1" returns the combinations that are not duplicated.
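As a concrete illustration, the sketch below shows which groups the HAVING clause returns. The table dedup_demo and its columns id, name, and city are hypothetical names chosen for this example only.

```sql
-- Hypothetical demo table; all names are illustrative only.
CREATE TABLE dedup_demo (
  id   NUMBER,
  name VARCHAR2(30),
  city VARCHAR2(30)
);

INSERT INTO dedup_demo VALUES (1, 'Ann', 'Oslo');
INSERT INTO dedup_demo VALUES (2, 'Ann', 'Oslo');
INSERT INTO dedup_demo VALUES (3, 'Bob', 'Rome');
INSERT INTO dedup_demo VALUES (4, 'Ann', 'Rome');
COMMIT;

-- Combinations of (name, city) that occur more than once.
SELECT name, city, COUNT(*) AS cnt
FROM dedup_demo
GROUP BY name, city
HAVING COUNT(*) > 1;
-- Only the (Ann, Oslo) group qualifies, with cnt = 2.
```

Rows 3 and 4 each share one field with another row but not both, so they do not count as duplicates under this definition.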
To delete the duplicate data, use the following statement:
DELETE FROM table_name a WHERE (a.field1, a.field2) IN
(SELECT field1, field2 FROM table_name GROUP BY field1, field2 HAVING COUNT(*) > 1);
The statement above is simple: it deletes the rows returned by the duplicate query. However, it runs slowly, and on a large table the database may appear to hang. We therefore recommend inserting the duplicate combinations into a temporary table first and then deleting against that table, so that the grouping query does not have to run again during the delete. As follows:
CREATE TABLE temp_table AS
(SELECT field1, field2, COUNT(*) AS cnt FROM table_name GROUP BY field1, field2 HAVING COUNT(*) > 1);
The statement above creates a temporary table and fills it with the duplicate combinations. The COUNT(*) column needs an alias (cnt here), because CREATE TABLE ... AS SELECT requires every expression column to be named.
You can then perform the deletion:
DELETE FROM table_name a WHERE (a.field1, a.field2) IN (SELECT field1, field2 FROM temp_table);
Deleting against the temporary table is usually much more efficient than repeating the grouping query.
At this point you may object: wouldn't that statement delete every duplicated row, including the one we want to keep? We want to keep the latest record in each group of duplicates! Don't worry, here is how to do that.
Oracle gives every row a hidden rowid pseudocolumn that uniquely identifies the row. If we want to keep the latest record,
we can use it to keep the row with the largest rowid in each group of duplicates. Note that rowid reflects where the row is physically stored, so the largest rowid is often, but not necessarily, the most recently inserted row.
The following is an example of querying duplicate data:
 
SELECT a.rowid, a.* FROM table_name a
WHERE a.rowid !=
(
SELECT MAX(b.rowid) FROM table_name b
WHERE a.field1 = b.field1 AND
a.field2 = b.field2
);
 
The subquery in parentheses finds the largest rowid among the rows that share the same field values.
The outer query then returns every duplicate row except the one with the largest rowid.
So, to delete the duplicate data and keep only the latest record, we can write:
 
DELETE FROM table_name a
WHERE a.rowid !=
(
SELECT MAX(b.rowid) FROM table_name b
WHERE a.field1 = b.field1 AND
a.field2 = b.field2
);
 
Put simply, the statement above also runs slowly on large tables. Consider using a temporary table again: store the duplicated fields and the rowid to keep in the temporary table, then compare against it when deleting.
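For example, the following shorter form keeps only the row with the largest rowid for each tname value in table t:
DELETE FROM t WHERE t.rowid NOT IN (
SELECT MAX(tt.rowid) FROM t tt GROUP BY tt.tname);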
 
 
CREATE TABLE temp_table AS
SELECT a.field1, a.field2, MAX(a.rowid) dataid FROM table_name a GROUP BY a.field1, a.field2;
DELETE FROM table_name a
WHERE a.rowid !=
(
SELECT b.dataid FROM temp_table b
WHERE a.field1 = b.field1 AND
a.field2 = b.field2
);
COMMIT;
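
If the table carries a column that records when each row was written, an analytic function can choose the row to keep explicitly instead of relying on rowid order. The following is a minimal sketch of that alternative technique, not the method described above; it assumes a hypothetical created_at column, and table_name, field1, and field2 are placeholders for the real table and fields.

```sql
-- Assumption: the table has a created_at column (hypothetical).
-- Rank the rows in each (field1, field2) group, newest first,
-- then delete every row that is not ranked first.
DELETE FROM table_name
WHERE rowid IN (
  SELECT rid
  FROM (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (
             PARTITION BY field1, field2
             ORDER BY created_at DESC NULLS LAST
           ) AS rn
    FROM table_name
  )
  WHERE rn > 1
);
COMMIT;
```

ROW_NUMBER assigns exactly one row per group the rank 1, so one row always survives even when two rows share the same created_at value; in that case which of the tied rows is kept is arbitrary.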
 
II. Delete completely duplicated records
If a table contains rows that are identical in every column, the following statement returns the records after deduplication:
SELECT DISTINCT * FROM table_name;
You can place the result in a temporary table, delete the records from the original table, and then load the data from the temporary table back into the original table. As follows:
 
CREATE TABLE temp_table AS (SELECT DISTINCT * FROM table_name);
TRUNCATE TABLE table_name; -- Note: use TRUNCATE here, not DROP; the table itself must remain
INSERT INTO table_name (SELECT * FROM temp_table);
DROP TABLE temp_table;
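
Because TRUNCATE is a DDL statement in Oracle and cannot be rolled back, it may be worth checking that the temporary table holds what you expect before the truncate step runs. A minimal sketch of such a check, where temp_table and table_name are placeholders for the temporary and original tables:

```sql
-- Run after the CREATE TABLE ... AS SELECT step and before TRUNCATE.
-- The two counts should be equal.
SELECT (SELECT COUNT(*) FROM temp_table) AS rows_in_temp,
       (SELECT COUNT(*)
          FROM (SELECT DISTINCT * FROM table_name)) AS distinct_rows_in_source
FROM dual;
```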
 
If a backup table with the same structure as the original already exists (t_table_bak here), you can load the deduplicated data into it with INSERT ... SELECT and then load the data from it back into the formal table, as follows:

INSERT INTO t_table_bak
SELECT DISTINCT * FROM t_table;
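
The rowid technique from part I can also remove fully identical rows in place, without truncating and reloading the table, by grouping on every column. A minimal sketch, assuming t_table has exactly three columns named col1, col2, and col3 (hypothetical names; list all of the table's real columns in the GROUP BY):

```sql
-- Keep one row (the one with the smallest rowid) from each group
-- of identical rows and delete the rest.
DELETE FROM t_table
WHERE rowid NOT IN (
  SELECT MIN(rowid)
  FROM t_table
  GROUP BY col1, col2, col3
);
COMMIT;
```

Since the rows in each group are identical, it makes no difference which one is kept; MIN(rowid) is simply a convenient way to pick exactly one.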



Author: linw._213
