How can I delete duplicate records in a database?

Source: Internet
Author: User

I. Additional topic

Today, we were so sad that when we made datapatch for user data, we imported one more copy of the data each month, and we were shocked by the cold sweat... this is a product environment. If the boss knows it, it will be dead. Remove the duplicate records and write the following article for future use.


Ii. Preparation

1. First create a student table student:

create table student(       id varchar(10) not null,       name varchar(10) not null,       age number not null);

2. Insert several data entries to the student table:

insert into student values('1', 'zhangs', 20);insert into student values('1', 'zhangs', 20);insert into student values('2', 'zhangs', 20);insert into student values('3', 'lisi', 20);insert into student values('4', 'lisi', 30);insert into student values('5', 'wangwu', 30);



Iii. Processing

1. Use rowid

① Query:

select *  from student s1 where rowid != (select max(rowid)                   from student s2                  where s1.id = s2.id                    and s1.name = s2.name                    and s1.age = s2.age)

Note: rowid is the id that uniquely identifies the physical location of the record, and the largest rowid in the brackets is queried.

② Delete:

delete from student s1 where rowid != (select max(rowid)                   from student s2                  where s1.id = s2.id                    and s1.name = s2.name                    and s1.age = s2.age)

2. Use group by and having

① Query:

select id, name, age, count(*)  from student group by id, name, agehaving count(*) > 1;

② Delete:

delete from student where rowid in (select min(rowid)                   from student                  group by id, name, age                 having count(*) > 1)

3. Use distinct

Create table stud_temp as select distinct * from student; -- create a temporary table stud_temptruncate table student; -- clear the student table insert into student select * from stud_temp; -- import temporary table data to student table drop table stud_temp; -- delete temporary table

Note: distinct is only applicable to small tables. For tables with tens of millions of data, use rowid because it is unique and more efficient.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.