Business: A table contains a lot of data (ID is the auto-incrementing primary key), and some of the data contains duplicate data.
Objective: To delete the duplicate data and retain the data inserted for the first time, and keep other data unaffected.
Problem solving process:
Step 1: Find the ID (save_id) of all data to be retained)
SELECT id as save_id FROM yujing.alarm_event_info_snapshot aeis where aeis.event_id in (SELECT ae.id FROM yujing.alarm_event ae where ae.event_uuid like 'yuanwtj_%') group by (aeis.event_id)
After optimization:
SELECT aeis.id as save_id FROM yujing.alarm_event ae right join yujing.alarm_event_info_snapshot aeis on aeis.event_id = ae.id where ae.event_uuid like 'yuanwtj_%' group by (aeis.event_id)
Step 2: Obtain the ID (all_id) of all related data)
SELECT aeis.id as all_id FROM yujing.alarm_event_info_snapshot aeis where aeis.event_id in (SELECT ae.id FROM yujing.alarm_event ae where ae.event_uuid like 'yuanwtj_%') order by aeis.event_id
After optimization:
SELECT aeis.id as all_id FROM yujing.alarm_event ae right join yujing.alarm_event_info_snapshot aeis on aeis.event_id = ae.id where ae.event_uuid like 'yuanwtj_%'
Step 3: Get the ID (del_id) of the data to be deleted)
select ad.all_id as del_id from (SELECT aeis.id as all_id FROM yujing.alarm_event_info_snapshot aeis where aeis.event_id in (SELECT ae.id FROM yujing.alarm_event ae where ae.event_uuid like 'yuanwtj_%')) as ad where ad.all_id not in (SELECT id as save_id FROM yujing.alarm_event_info_snapshot aeis where aeis.event_id in (SELECT ae.id FROM yujing.alarm_event ae where ae.event_uuid like 'yuanwtj_%') group by (aeis.event_id))
After optimization:
select ad.all_id as del_id from (SELECT aeis.id as all_id FROM yujing.alarm_event ae right join yujing.alarm_event_info_snapshot aeis on aeis.event_id = ae.id where ae.event_uuid like 'yuanwtj_%') as ad left join (SELECT aeis.id as save_id FROM yujing.alarm_event ae right join yujing.alarm_event_info_snapshot aeis on aeis.event_id = ae.id where ae.event_uuid like 'yuanwtj_%' group by (aeis.event_id)) as sd on ad.all_id = sd.save_id where sd.save_id is null
Step 4: delete all nodes by ID. Note that if a large amount of data in MySQL needs to be deleted in batches, I finally use the ETL Tool to delete nodes in batches.
Summary: in MySQL databases, it is recommended that you do not dynamically obtain matching values in queries with the in or not in keywords in SQL statements. When there is a large amount of data, they are very inefficient to use, you can use left and right connections instead of in operations, which will increase the efficiency by many times, especially in large data volumes.