Oracle Learning: Removing duplicate data from a table

Source: Internet
Author: User

Duplicate data comes in two forms. In the first, rows match only on some fields of the table; in the second, two rows are identical in every field.
First, deleting rows that duplicate on some fields
Let's start with how to query for duplicate data.
The following statement finds the field values that occur more than once:
select field1, field2, count(*) from table_name group by field1, field2 having count(*) > 1;
Change the > sign to = to find the data that is not duplicated.
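The grouping query above can be tried out without an Oracle instance. The following is only a minimal sketch in Python, assuming SQLite (through the standard sqlite3 module) as a stand-in for Oracle; the table t, its columns field1, field2 and note, and the helper find_duplicates are hypothetical names chosen for illustration.

```python
import sqlite3


def find_duplicates(conn):
    """Return (field1, field2, count) for every key that occurs more than once."""
    return conn.execute(
        "select field1, field2, count(*) from t "
        "group by field1, field2 having count(*) > 1 "
        "order by field1, field2"
    ).fetchall()


# Hypothetical sample data: key (a, x) occurs twice, (c, z) three times.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (field1 text, field2 text, note text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("a", "x", "first"),
        ("a", "x", "second"),
        ("b", "y", "only"),
        ("c", "z", "1"),
        ("c", "z", "2"),
        ("c", "z", "3"),
    ],
)

dups = find_duplicates(conn)
print(dups)
```

The key (b, y) does not appear in the result because it occurs only once.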
To delete the duplicated data, you can use the following statement:
delete from table_name a where (field1, field2) in
(select field1, field2 from table_name group by field1, field2 having count(*) > 1);
The statement above is simple: it deletes the data that the query finds. However, it executes inefficiently and may hang the database on large data volumes. So I recommend first inserting the duplicate data found by the query into a temporary table and then deleting against that table, so that the query does not have to be run again during the delete. As follows:
create table temp_table as
(select field1, field2, count(*) cnt from table_name group by field1, field2 having count(*) > 1);
The statement above creates a temporary table and inserts the queried data into it.
The delete can then be written like this:
delete from table_name a where (field1, field2) in (select field1, field2 from temp_table);
Deleting against a pre-built temporary table is much more efficient than deleting directly with a single statement.
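The two-step approach (stage the duplicated keys, then delete against the staged table) can be sketched as follows. This is an illustration only, assuming SQLite via Python's sqlite3 module in place of Oracle; the names t, dup_keys, field1, field2 and note are hypothetical, and the row-value form of IN used here needs SQLite 3.15 or later. It says nothing about Oracle's performance; it only shows the shape of the steps and that every copy of a duplicated key is removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (field1 text, field2 text, note text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [("a", "x", "first"), ("a", "x", "second"), ("b", "y", "only")],
)

# Step 1: run the grouping query once and stage the duplicated keys.
conn.execute(
    "create temp table dup_keys as "
    "select field1, field2, count(*) as cnt from t "
    "group by field1, field2 having count(*) > 1"
)

# Step 2: delete every row whose key was staged.
# Note that this removes ALL copies of a duplicated key, not all but one.
deleted = conn.execute(
    "delete from t where (field1, field2) in "
    "(select field1, field2 from dup_keys)"
).rowcount
conn.commit()

remaining = conn.execute(
    "select field1, field2, note from t order by field1"
).fetchall()
print(deleted, remaining)
```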

At this point someone may object: what? Running that statement deletes all of the duplicates, and we want to keep the latest record in each group of duplicates! Don't worry, here is how to do that.
In Oracle, every record has a hidden, automatically assigned rowid that is unique to it. If we want to keep the latest record,
we can use this column and keep the record with the largest rowid in each group of duplicates. Note that rowid reflects where a row is physically stored, so the largest rowid is not guaranteed to be the most recently inserted row.
Here is an example that queries the duplicate data:
select a.rowid, a.* from table_name a
where a.rowid !=
(
select max(b.rowid) from table_name b
where a.field1 = b.field1 and
a.field2 = b.field2
);
To explain: the subquery in parentheses finds the largest rowid among the rows that share the same field values as the current row.
The outer query then returns every row except the one with the largest rowid in its group, that is, the surplus duplicates.
Thus, to delete the duplicate data and keep only the latest row, we can write:
delete from table_name a
where a.rowid !=
(
select max(b.rowid) from table_name b
where a.field1 = b.field1 and
a.field2 = b.field2
);
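The keep-the-largest-rowid delete can be checked on a small data set. The sketch below is illustrative only and assumes SQLite (via Python's sqlite3 module) instead of Oracle; SQLite also exposes an implicit rowid on ordinary tables, but its rowid is an integer key rather than Oracle's physical row address. The names t, field1, field2 and note are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (field1 text, field2 text, note text)")
# Rows get rowid 1..4 in insertion order in this fresh SQLite table.
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("a", "x", "old"),
        ("b", "y", "only"),
        ("a", "x", "new"),
        ("a", "x", "newest"),
    ],
)

# For each row, the subquery finds the largest rowid among rows with the
# same (field1, field2); any row that is not that one is deleted.
conn.execute(
    "delete from t where rowid != ("
    "  select max(b.rowid) from t b"
    "  where b.field1 = t.field1 and b.field2 = t.field2)"
)
conn.commit()

remaining = conn.execute(
    "select rowid, field1, field2, note from t order by rowid"
).fetchall()
print(remaining)
```

One row per (field1, field2) key survives, and it is the one with the largest rowid in its group.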

By the way, the statement above executes very slowly. Consider creating a temporary table that holds the fields used to detect duplicates together with the rowid to keep, and then compare against that table when deleting.
create table temp_table as
select a.field1, a.field2, max(a.rowid) dataid from table_name a group by a.field1, a.field2;
delete from table_name a
where a.rowid !=
(
select b.dataid from temp_table b
where a.field1 = b.field1 and
a.field2 = b.field2
);
commit;
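The temporary-table variant can be sketched the same way: compute the rowid to keep for each key once, then delete every row whose rowid differs from the staged one. This is a hedged illustration that assumes SQLite via Python's sqlite3 module rather than Oracle; the names t, keepers, dataid, field1, field2 and note are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (field1 text, field2 text, note text)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        ("a", "x", "old"),
        ("a", "x", "new"),
        ("b", "y", "only"),
        ("c", "z", "old"),
        ("c", "z", "new"),
    ],
)

# Step 1: one row per key, holding the rowid that should survive.
conn.execute(
    "create temp table keepers as "
    "select field1, field2, max(rowid) as dataid from t "
    "group by field1, field2"
)

# Step 2: delete rows whose rowid is not the staged keeper for their key.
deleted = conn.execute(
    "delete from t where rowid != ("
    "  select k.dataid from keepers k"
    "  where k.field1 = t.field1 and k.field2 = t.field2)"
).rowcount
conn.commit()

remaining = conn.execute(
    "select field1, field2, note from t order by field1"
).fetchall()
print(deleted, remaining)
```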

Second, deleting completely duplicate records
When two rows in a table are exactly the same, the following statement returns the records with the duplicates removed:
select distinct * from table_name;
You can put the result of this query into a temporary table, delete the records of the original table, and finally copy the data from the temporary table back into the original table. As follows:
create table temp_table as (select distinct * from table_name);
truncate table table_name;
insert into table_name (select * from temp_table);
drop table temp_table;
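The four steps above (copy the distinct rows aside, empty the original table, copy them back, drop the copy) can be sketched as follows. This is only an illustration, assuming SQLite via Python's sqlite3 module in place of Oracle. SQLite has no truncate statement, so a plain delete stands in for it here; the names t, t_dedup, field1 and field2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (field1 text, field2 text)")
conn.executemany(
    "insert into t values (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x")],
)

# 1. Copy the distinct rows into a temporary table.
conn.execute("create temp table t_dedup as select distinct * from t")
# 2. Empty the original table (delete stands in for Oracle's truncate).
conn.execute("delete from t")
# 3. Copy the deduplicated rows back.
conn.execute("insert into t select * from t_dedup")
# 4. Drop the temporary table.
conn.execute("drop table t_dedup")
conn.commit()

remaining = conn.execute(
    "select field1, field2 from t order by field1, field2"
).fetchall()
print(remaining)
```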

If you want to delete duplicate data from a table, you can first create a temporary table, load the deduplicated data into it, and then load the data from the temporary table back into the formal table. The load into the temporary table looks like this:
insert into t_table_bak
select distinct * from t_table;
