Remove duplicate data in Oracle Database

Source: Internet
Author: User

During normal development, we often encounter repeated data in data tables. How can this problem be solved? Here we will introduce the data deduplication methods in two cases: 1. Completely duplicate data deduplication; 2. Partial field data deduplication.

I. Completely duplicate data deduplication Method

You can use the following SQL statement to deduplicate data in a table.

Code

CREATETABLE "# temp" AS (SELECTDISTINCT * FROM Table Name); -- creates a temporary table and inserts the data after DISTINCT deduplication into the temporary table

TruncateTABLE table name; -- clear the original table data

INSERTINTO table name (SELECT * FROM "# temp"); -- inserts temporary table data into the original table

DROPTABLE "# temp"; -- delete a temporary table

The specific idea is to first create a temporary table, then insert the table data after DISTINCT into the temporary table, then clear the original table data, and then insert the data in the temporary table into the original table; finally, delete the temporary table.

Ii. Data deduplication Methods

First, find duplicate data

Select Field 1, Field 2, count (*) from table name groupby Field 1, Field 2 havingcount (*)> 1

Change the ">" number above to "=" to query the non-duplicated data.

To delete the duplicate data, use the following statement:

Deletefrom table name a where field 1, Field 2 in

(Select Field 1, Field 2, count (*) from table name groupby Field 1, Field 2 havingcount (*)> 1)

The preceding statement is very simple, that is, to delete the queried data. However, the deletion execution efficiency is very low. For large data volumes, the database may be stuck.

Based on the above situation, you can first Insert the queried duplicate data into a temporary table and then delete the data. In this way, you do not need to perform another query When deleting the data. As follows:

CREATETABLE temporary table

(Select Field 1, Field 2, count (*) from table name groupby Field 1, Field 2 havingcount (*)> 1)

You can perform the following deletion operations:

Deletefrom table name a where field 1, Field 2 in (select Field 1, Field 2 from temporary table );

Creating a temporary table before deleting it is much more efficient than simply using a statement to delete it.

The preceding statement deletes all the duplicates. in oracle, an automatic rowid is hidden, and each record is given a unique rowid, if we want to keep the latest record, we can use this field to keep the record with the largest rowid In the duplicate data.

The following is an example of querying duplicate data:

Select a. rowid, a. * from table name

Where a. rowid! =

(

Selectmax (B. rowid) from table name B

Where a. Field 1 = B. Field 1 and

A. Field 2 = B. Field 2

)

The statement in the brackets above is to query the record with the largest rowid in the repeated data. In addition, duplicate data except the maximum rowid is queried.

As a result, we need to delete the duplicate data and keep only the latest data record. Then we can write it like this:

Deletefrom table name

Where a. rowid! =

(

Selectmax (B. rowid) from table name B

Where a. Field 1 = B. Field 1 and

A. Field 2 = B. Field 2

)

Similarly, the execution efficiency of the above Code is low after all. Therefore, we can consider creating a temporary table to insert duplicate fields and rowids into the temporary table and then compare them during deletion.

Createtable temporary table

Select a. Field 1, a. Field 2, MAX (a. ROWID) dataid from formal Table a GROUPBY a. Field 1, a. Field 2;

Deletefrom table name

Where a. rowid! =

(

Select B. dataid from temporary table B

Where a. Field 1 = B. Field 1 and

A. Field 2 = B. Field 2

);

Commit;

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.