PostgreSQL deletes duplicate data

Source: Internet
Author: User

Deduplication is generally used to locate one of the duplicate data and remove other duplicate values with a unique condition.

OracleThere are many de-duplication methods, which are commonly used based on rowid.

How to remove duplicate data from a single table in the PostgreSQL databaseWhat about it? You can use ctid. The following is the experiment process.

I. Create a test table

 
David=#Create TableEMP (David (# IDInt, David (# NameVarchar);Create TableDavid=#

2. Insert Test Data

David = # Insert   Into EMP Values ( 1 , '  David  '  );  Insert   0   1  David = # Insert   Into EMP Values ( 1 , '  David  '  );  Insert   0   1  David  = # Insert  Into EMP Values ( 1 , '  David  '  );  Insert   0   1  David  = # Insert   Into EMP Values (2 , '  Sandy  '  );  Insert   0   1  David  = # Insert   Into EMP Values ( 2 , ' Sandy  '  );  Insert   0   1  David  = # Insert   Into EMP Values ( 3 , '  Renee  '  ); Insert   0   1  David  = # Insert   Into EMP Values ( 4 , '  Jack  '  );  Insert   0  1  David  = # Insert   Into EMP Values ( 5 , '  Rose  '  );  Insert   0   1  David = #

Iii. query initialization data

David = # Select Ctid, *   From  EMP; ctid  | ID |  Name  --  ----- + ---- + ------- ( 0 ,1 ) |    1   |  David (  0 , 2 ) |    1   |  David (  0 , 3 ) |   1   |  David (  0 , 4 ) |    2   |  Sandy (  0 , 5 ) |    2   |  Sandy ( 0 , 6 ) |    3   |  Renee (  0 , 7 ) |    4   |  Jack (  0 , 8 )|    5   |  Rose (  8  Rows) David  = #

Query repeated data count

David = # Select   Distinct ID, Count ( * ) From EMP Group  By ID Having   Count ( * ) >   1  ; ID  |   Count   --  -- + -------    1   |       3    2   |      2  (  2  Rows) David  = #

Three records with ID 1 and two records with ID 2 are found.

4. query the data to be retained

Take Min (ctid) or max (ctid) as the standard.

David = # Select Ctid, *   From EMP Where CtidIn ( Select   Min (Ctid) From EMP Group   By  ID); ctid  | ID |  Name  --  ----- + ---- + ------- ( 0 , 1 ) |   1   |  David (  0 , 4 ) |    2   |  Sandy (  0 , 6 ) |    3   |  Renee ( 0 , 7 ) |    4   |  Jack (  0 , 8 ) |    5   |  Rose (  5  Rows) David  = #

5. Delete duplicate data

 
David=#Delete FromEMPWhereCtidNot In(Select Min(Ctid)FromEMPGroup ByID );Delete 3David=#

6. view the final result

David = # Select Ctid, *   From  EMP; ctid  | ID |  Name  --  ----- + ---- + ------- ( 0 , 1 ) |    1  |  David (  0 , 4 ) |    2   |  Sandy (  0 , 6 ) |    3   |  Renee (  0 , 7 ) |    4   |  Jack (  0 , 8 ) |    5   |  Rose (  5  Rows) David  = #

Note: If the table already has a unique sequence primary key value, you can replace this value with the above ctid and delete it directly.

7. Other Methods

You can also use the following SQL statement to delete duplicate data.

 David = #  Delete   from   EMP adavid  - #  where . ctid    David  - # (David (#   select   min  (B. ctid)  from   EMP bdavid (#   where . ID =  B. iddavid (#);   Delete   3   David  =#

Note: This method is highly efficient when the table data volume is large.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.