How does the PostgreSQL database remove duplicate data from a single table? You can use ctid. The following is the experiment process.
I. Create a test table
The code is as follows: |
Copy code |
David = # create table emp ( David (# id int, David (# name varchar ); CREATE TABLE David = # |
2. Insert test data
The code is as follows: |
Copy code |
David = # insert into emp values (1, 'David '); INSERT 0 1 David = # insert into emp values (1, 'David '); INSERT 0 1 David = # insert into emp values (1, 'David '); INSERT 0 1 David = # insert into emp values (2, 'Sandy '); INSERT 0 1 David = # insert into emp values (2, 'Sandy '); INSERT 0 1 David = # insert into emp values (3, 'Renee '); INSERT 0 1 David = # insert into emp values (4, 'Jack '); INSERT 0 1 David = # insert into emp values (5, 'rose '); INSERT 0 1 David = #
|
III. Query initialization data
The code is as follows: |
Copy code |
David = # select ctid, * from emp; Ctid | id | name ------- + ---- + ------- (0, 1) | 1 | david (0, 2) | 1 | david (0, 3) | 1 | david (0, 4) | 2 | sandy (0, 5) | 2 | sandy (0, 6) | 3 | renee (0, 7) | 4 | jack (0, 8) | 5 | rose (8 rows)
David = # |
Query repeated data count
The code is as follows: |
Copy code |
David = # select distinct id, count (*) from emp group by id having count (*)> 1; Id | count ---- + ------- 1 | 3 2 | 2 (2 rows)
David = # |
Three records with id 1 and two records with id 2 are found.
4. Query the data to be retained
Take min (ctid) or max (ctid) as the standard.
The code is as follows: |
Copy code |
David = # select ctid, * from emp where ctid in (select min (ctid) from emp group by id ); Ctid | id | name ------- + ---- + ------- (0, 1) | 1 | david (0, 4) | 2 | sandy (0, 6) | 3 | renee (0, 7) | 4 | jack (0, 8) | 5 | rose (5 rows)
David = # |
5. Delete duplicate data
The code is as follows: |
Copy code |
David = # delete from emp where ctid not in (select min (ctid) from emp group by id ); DELETE 3 David = # |
6. View the final result
The code is as follows: |
Copy code |
David = # select ctid, * from emp; Ctid | id | name ------- + ---- + ------- (0, 1) | 1 | david (0, 4) | 2 | sandy (0, 6) | 3 | renee (0, 7) | 4 | jack (0, 8) | 5 | rose (5 rows)
|
David = # Note: If the table already has a unique sequence primary key value, you can replace the value with ctid and delete it directly.