I. Additional topic
Today, we were so sad that when we made datapatch for user data, we imported one more copy of the data each month, and we were shocked by the cold sweat... this is a product environment. If the boss knows it, it will be dead. Remove the duplicate records and write the following article for future use.
Ii. Preparation
1. First create a student table student:
create table student( id varchar(10) not null, name varchar(10) not null, age number not null);
2. Insert several data entries to the student table:
insert into student values('1', 'zhangs', 20);insert into student values('1', 'zhangs', 20);insert into student values('2', 'zhangs', 20);insert into student values('3', 'lisi', 20);insert into student values('4', 'lisi', 30);insert into student values('5', 'wangwu', 30);
Iii. Processing
1. Use rowid
① Query:
select * from student s1 where rowid != (select max(rowid) from student s2 where s1.id = s2.id and s1.name = s2.name and s1.age = s2.age)
Note: rowid is the id that uniquely identifies the physical location of the record, and the largest rowid in the brackets is queried.
② Delete:
delete from student s1 where rowid != (select max(rowid) from student s2 where s1.id = s2.id and s1.name = s2.name and s1.age = s2.age)
2. Use group by and having
① Query:
select id, name, age, count(*) from student group by id, name, agehaving count(*) > 1;
② Delete:
delete from student where rowid in (select min(rowid) from student group by id, name, age having count(*) > 1)
3. Use distinct
Create table stud_temp as select distinct * from student; -- create a temporary table stud_temptruncate table student; -- clear the student table insert into student select * from stud_temp; -- import temporary table data to student table drop table stud_temp; -- delete temporary table
Note: distinct is only applicable to small tables. For tables with tens of millions of data, use rowid because it is unique and more efficient.