Delete duplicate data from a table and retain the one inserted for the first time and optimize SQL statements.

Source: Internet
Author: User

Business: A table contains a lot of data (ID is the auto-incrementing primary key), and some of the data contains duplicate data.

Objective: To delete the duplicate data and retain the data inserted for the first time, and keep other data unaffected.

Problem solving process:

Step 1: Find the ID (save_id) of all data to be retained)

SELECT id as save_id  FROM yujing.alarm_event_info_snapshot aeis where aeis.event_id in       (SELECT ae.id          FROM yujing.alarm_event ae         where ae.event_uuid like 'yuanwtj_%') group by (aeis.event_id)

After optimization:

SELECT aeis.id as save_id  FROM yujing.alarm_event ae right join yujing.alarm_event_info_snapshot aeis    on aeis.event_id = ae.id where ae.event_uuid like 'yuanwtj_%' group by (aeis.event_id)

Step 2: Obtain the ID (all_id) of all related data)

SELECT aeis.id as all_id  FROM yujing.alarm_event_info_snapshot aeis where aeis.event_id in       (SELECT ae.id          FROM yujing.alarm_event ae         where ae.event_uuid like 'yuanwtj_%') order by aeis.event_id

After optimization:

SELECT aeis.id as all_id  FROM yujing.alarm_event ae right join yujing.alarm_event_info_snapshot aeis    on aeis.event_id = ae.id where ae.event_uuid like 'yuanwtj_%'

Step 3: Get the ID (del_id) of the data to be deleted)

select ad.all_id as del_id  from (SELECT aeis.id as all_id          FROM yujing.alarm_event_info_snapshot aeis         where aeis.event_id in               (SELECT ae.id                  FROM yujing.alarm_event ae                 where ae.event_uuid like 'yuanwtj_%')) as ad where ad.all_id not in (SELECT id as save_id                           FROM yujing.alarm_event_info_snapshot aeis                          where aeis.event_id in                                (SELECT ae.id                                   FROM yujing.alarm_event ae                                  where ae.event_uuid like 'yuanwtj_%')                          group by (aeis.event_id))

After optimization:

select ad.all_id as del_id  from (SELECT aeis.id as all_id          FROM yujing.alarm_event ae         right join yujing.alarm_event_info_snapshot aeis            on aeis.event_id = ae.id         where ae.event_uuid like 'yuanwtj_%') as ad  left join (SELECT aeis.id as save_id               FROM yujing.alarm_event ae              right join yujing.alarm_event_info_snapshot aeis                 on aeis.event_id = ae.id              where ae.event_uuid like 'yuanwtj_%'              group by (aeis.event_id)) as sd    on ad.all_id = sd.save_id where sd.save_id is null

Step 4: delete all nodes by ID. Note that if a large amount of data in MySQL needs to be deleted in batches, I finally use the ETL Tool to delete nodes in batches.

Summary: in MySQL databases, it is recommended that you do not dynamically obtain matching values in queries with the in or not in keywords in SQL statements. When there is a large amount of data, they are very inefficient to use, you can use left and right connections instead of in operations, which will increase the efficiency by many times, especially in large data volumes.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.