PostgreSQL usually finds one of the duplicate data and removes other duplicate values with a unique condition. Oracle deduplication is implemented in many ways. It is commonly used to perform deduplication Based on rowid. How does the PostgreSQL database remove duplicate data from a single table? You can use ctid. The following is the experiment process. 1. Create a test table david
PostgreSQL usually finds one of the duplicate data and removes other duplicate values with a unique condition. Oracle deduplication is implemented in many ways. It is commonly used to perform deduplication Based on rowid. How does the PostgreSQL database remove duplicate data from a single table? You can use ctid. The following is the experiment process. 1. Create a test table david
PostgreSQL deletes duplicate data
Deduplication is generally used to locate one of the duplicate data and remove other duplicate values with a unique condition.
Oracle deduplication is implemented in many ways. It is commonly used to perform deduplication Based on rowid.
How does the PostgreSQL database remove duplicate data from a single table? You can use ctid. The following is the experiment process.
I. Create a test table
David = # create table emp (
David (# id int,
David (# name varchar );
CREATE TABLE
David = #
2. Insert Test Data
David = # insert into emp values (1, 'David ');
INSERT 0 1
David = # insert into emp values (1, 'David ');
INSERT 0 1
David = # insert into emp values (1, 'David ');
INSERT 0 1
David = # insert into emp values (2, 'sandy ');
INSERT 0 1
David = # insert into emp values (2, 'sandy ');
INSERT 0 1
David = # insert into emp values (3, 'renee ');
INSERT 0 1
David = # insert into emp values (4, 'jack ');
INSERT 0 1
David = # insert into emp values (5, 'Rose ');
INSERT 0 1
David = #
Iii. query initialization data
David = # select ctid, * from emp;
Ctid | id | name
------- + ---- + -------
(0, 1) | 1 | david
(0, 2) | 1 | david
(0, 3) | 1 | david
(0, 4) | 2 | sandy
(0, 5) | 2 | sandy
(0, 6) | 3 | renee
(0, 7) | 4 | jack
(0, 8) | 5 | rose
(8 rows)
David = #
Query repeated data count
David = # select distinct id, count (*) from emp group by id having count (*)> 1;
Id | count
---- + -------
1 | 3
2 | 2
(2 rows)
David = #
Three records with id 1 and two records with id 2 are found.
4. query the data to be retained
Take min (ctid) or max (ctid) as the standard.
David = # select ctid, * from emp where ctid in (select min (ctid) from emp group by id );
Ctid | id | name
------- + ---- + -------
(0, 1) | 1 | david
(0, 4) | 2 | sandy
(0, 6) | 3 | renee
(0, 7) | 4 | jack
(0, 8) | 5 | rose
(5 rows)
David = #
5. Delete duplicate data
David = # delete from emp where ctid not in (select min (ctid) from emp group by id );
DELETE 3
David = #
6. view the final result
David = # select ctid, * from emp;
Ctid | id | name
------- + ---- + -------
(0, 1) | 1 | david
(0, 4) | 2 | sandy
(0, 6) | 3 | renee
(0, 7) | 4 | jack
(0, 8) | 5 | rose
(5 rows)
David = #
Note: If the table already has a unique sequence primary key value, you can replace this value with the above ctid and delete it directly.
7. Other Methods
You can also use the following SQL statement to delete duplicate data.
David = # delete from emp
David-# where a. ctid <>
David -#(
David (# select min (B. ctid) from emp B
David (# where a. id = B. id
David (#);
DELETE 3
David = #
Note: This deletion method is highly efficient when the table data volume is large.