Deduplication is generally used to locate one of the duplicate data and remove other duplicate values with a unique condition.
OracleThere are many de-duplication methods, which are commonly used based on rowid.
How to remove duplicate data from a single table in the PostgreSQL databaseWhat about it? You can use ctid. The following is the experiment process.
I. Create a test table
David=#Create TableEMP (David (# IDInt, David (# NameVarchar);Create TableDavid=#
2. Insert Test Data
David = # Insert Into EMP Values ( 1 , ' David ' ); Insert 0 1 David = # Insert Into EMP Values ( 1 , ' David ' ); Insert 0 1 David = # Insert Into EMP Values ( 1 , ' David ' ); Insert 0 1 David = # Insert Into EMP Values (2 , ' Sandy ' ); Insert 0 1 David = # Insert Into EMP Values ( 2 , ' Sandy ' ); Insert 0 1 David = # Insert Into EMP Values ( 3 , ' Renee ' ); Insert 0 1 David = # Insert Into EMP Values ( 4 , ' Jack ' ); Insert 0 1 David = # Insert Into EMP Values ( 5 , ' Rose ' ); Insert 0 1 David = #
Iii. query initialization data
David = # Select Ctid, * From EMP; ctid | ID | Name -- ----- + ---- + ------- ( 0 ,1 ) | 1 | David ( 0 , 2 ) | 1 | David ( 0 , 3 ) | 1 | David ( 0 , 4 ) | 2 | Sandy ( 0 , 5 ) | 2 | Sandy ( 0 , 6 ) | 3 | Renee ( 0 , 7 ) | 4 | Jack ( 0 , 8 )| 5 | Rose ( 8 Rows) David = #
Query repeated data count
David = # Select Distinct ID, Count ( * ) From EMP Group By ID Having Count ( * ) > 1 ; ID | Count -- -- + ------- 1 | 3 2 | 2 ( 2 Rows) David = #
Three records with ID 1 and two records with ID 2 are found.
4. query the data to be retained
Take Min (ctid) or max (ctid) as the standard.
David = # Select Ctid, * From EMP Where CtidIn ( Select Min (Ctid) From EMP Group By ID); ctid | ID | Name -- ----- + ---- + ------- ( 0 , 1 ) | 1 | David ( 0 , 4 ) | 2 | Sandy ( 0 , 6 ) | 3 | Renee ( 0 , 7 ) | 4 | Jack ( 0 , 8 ) | 5 | Rose ( 5 Rows) David = #
5. Delete duplicate data
David=#Delete FromEMPWhereCtidNot In(Select Min(Ctid)FromEMPGroup ByID );Delete 3David=#
6. view the final result
David = # Select Ctid, * From EMP; ctid | ID | Name -- ----- + ---- + ------- ( 0 , 1 ) | 1 | David ( 0 , 4 ) | 2 | Sandy ( 0 , 6 ) | 3 | Renee ( 0 , 7 ) | 4 | Jack ( 0 , 8 ) | 5 | Rose ( 5 Rows) David = #
Note: If the table already has a unique sequence primary key value, you can replace this value with the above ctid and delete it directly.
7. Other Methods
You can also use the following SQL statement to delete duplicate data.
David = # Delete from EMP adavid - # where . ctid David - # (David (# select min (B. ctid) from EMP bdavid (# where . ID = B. iddavid (#); Delete 3 David =#
Note: This method is highly efficient when the table data volume is large.