PostgreSQL deletes duplicate data rows in a table

Source: Internet
Author: User

Take PostgreSQL 9.2 official document as an example:

Create table weather (
City varchar (80 ),
Temp_lo int, -- low temperature
Temp_hi int, -- high temperature
Prcp real, -- precipitation
Date
);

Insert into weather VALUES
('San Francisco ', 46, 50, 0.25, '2017-11-27 '),
('San Francisco ', 43, 57, 0, '2017-11-29 '),
('Hayward ', 37, 54, NULL, '2017-11-29 '),
('Hayward ', 37, 54, NULL, '2017-11-29'); --- duplicated row

Here are the three methods:

Method 1: Replacement

-- Remove duplicate rows and store them to the new table weather_temp.
Select distinct city, temp_lo, temp_hi, prcp, date
INTO weather_temp
FROM weather;
-- Delete the original table
Drop table weather;
-- Rename the new table as weather
Alter table weather_temp rename to weather;

Or

-- Create a weather_temp table that is the same as weather.
Create table weather_temp (LIKE weather including constraints );
-- Fill in weather_temp with data that removes duplicate rows
Insert into weather_temp select distinct * FROM weather;
-- Delete the original table
Drop table weather;
-- Rename it weather.
Alter table weather_temp rename to weather;

Easy to understand, there are a lot of destructive operations such as DROP, and when the data volume is large, it takes a lot of time and space. Not recommended.

Method 2: Add a field

-- Add a new field with the serial type
Alter table weather add column id SERIAL;
-- Delete duplicate rows
Delete from weather WHERE id
Not in (
SELECT max (id)
FROM weather
Group by city, temp_lo, temp_hi, prcp, date
);
-- Delete the added Field
Alter table weather drop column id;

You need to add a field. "I don't know how Postgres handles adding fields. Do you want to append fields directly to the original table or copy the original table to form a new table ?」, If it is an append to the original table, it may be caused by the addition of new fields (generally block: 8 k). If it is a copy, it is a sin. Not good.

Third: System field [view System Columns]

Delete from weather
WHERE ctid
Not in (
SELECT max (ctid)
FROM weather
Group by city, temp_lo, temp_hi, prcp, date
);

Highly targeted [exclusive to ipvs], but simple.

PostgreSQL details: click here
PostgreSQL: click here

Recommended reading:

How to delete duplicate records in PostgreSQL

Configuration and installation of LAPP (Linux + Apache + PostgreSQL + PHP) Environment in Ubuntu

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.