How to Delete Duplicate MySQL Data

Source: Internet
Author: User


This article describes how to delete duplicate MySQL data, for your reference. The specific method is as follows.

Project Background

In a recent Linux performance-collection project, we found that the collector threads were slow to write to the database. On closer inspection, the database contained a large amount of redundant data: for a given device there should be only one record per time point, yet multiple records were being stored for the same device and time. How this happened has not been determined, but to fix the slow inserts, the redundant data had to be removed first.

Problem description

The table structure of the database is very simple, as follows:

The code is as follows:
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id             | varchar(255) | NO   | PRI | NULL    |       |
| conf_id        | varchar(255) | NO   | MUL | NULL    |       |
| insert_time    | datetime     | YES  |     | NULL    |       |
| cpu_usage      | float(11,2)  | YES  |     | NULL    |       |
| memory_usage   | float(11,2)  | YES  |     | NULL    |       |
| io_usage_write | float(11,2)  | YES  |     | NULL    |       |
| io_usage_read  | float(11,2)  | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
Count all rows:

The code is as follows: select count(*) from perf_linux;

Output: 427366

Count the distinct (device, time point) combinations:

The code is as follows: select count(distinct conf_id, insert_time) from perf_linux;

Output: 42387

The two counts show roughly tenfold redundancy (427366 / 42387 ≈ 10).

Now look at the data grouped by time:

The code is as follows: select id, conf_id, insert_time from perf_linux order by insert_time, conf_id;
Output:

| 2a79f7cd-43a9-4c7b-adb2-450b6c04283e | 1 | 15:09:14 |
| 50d6f6c2-9c8b-45fd-98fd-2be211221cfd | 1 | 15:09:14 |
| 740b52e1-e868-4074-ba36-74e2634401b3 | 1 | 15:09:14 |
| 8b0096a4-9e85-492b-a131-e3505ca79a9c | 1 | 15:09:14 |
| 90a9e882-5220-4268-a56f-8d4ab4a7929b | 1 | 15:09:14 |
| d17403ed-24a4-45e8-b51b-2a95118383d9 | 1 | 15:09:14 |
| 0c2da917-579b-4080-857d-7159f38b44ac | 2 | 15:09:14 |
| 263083eb-8f63-4d2b-a03f-3320aa678735 | 2 | 15:09:14 |
| d6c57a38-080b-465a-a55a-beafd9daf32d | 2 | 15:09:14 |
| f672227b-1fb8-4b85-880d-2cc34b02880d | 2 | 15:09:14 |
| f80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2 | 15:09:14 |
| ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2 | 15:09:14 |
| 5c41e48a-abfc-4268-a00e-ca7def7d5a5a | 3 | 15:09:14 |
| 60b7ab9e-c91a-4020-a6d3-7bceb1dc47c5 | 3 | 15:09:14 |
| 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3 | 15:09:14 |
| d53a3df5-08c4-4604-8fac-cb51077935f6 | 3 | 15:09:14 |
| d9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3 | 15:09:14 |
| f56f82f6-32a7-47f7-ae07-b13168743884 | 3 | 15:09:14 |
| 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4 | 15:09:14 |
| 2a90ad9e-11a5-4707-95e8-78491da658ad | 4 | 15:09:14 |
| 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4 | 15:09:14 |
| 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4 | 15:09:14 |
| 8404d281-f9e5-4263-a47e-128c05386758 | 4 | 15:09:14 |
| e042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4 | 15:09:14 |
+--------------------------------------+---+----------+
As the output shows, the same device has many rows for the same time point. These redundant rows now need to be removed.

Solution

The idea is as follows: group the rows by conf_id and time point (GROUP BY), then keep exactly one row from each group. Grouping is simple, but how do we keep just one row per group? I used an intermediate table.
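Before starting, a grouping query like the following (a sketch added for illustration, not from the original procedure) shows which (conf_id, insert_time) groups actually contain duplicates, and how many copies each holds:

```sql
-- Sketch: list the duplicated groups and their row counts.
-- Assumes the perf_linux table described above.
SELECT conf_id, insert_time, COUNT(*) AS copies
FROM perf_linux
GROUP BY conf_id, insert_time
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

Each returned group should end up with exactly one surviving row after deduplication.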

Create an intermediate table and copy the data into it:

The code is as follows:
create table perf_linux_t like perf_linux;
insert into perf_linux_t select * from perf_linux;
Add an auto-increment column to the intermediate table and make it the primary key:

The code is as follows:
alter table `perf_linux_t`
add column `auto_id` int not null AUTO_INCREMENT,
drop primary key,
add primary key (`auto_id`);
Delete the redundant data

First, find the one row to keep in each group (note that the grouping must include conf_id as well as insert_time, otherwise rows from different devices at the same time point would be conflated):

The code is as follows: select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id;

Then try to delete every other row:

The code is as follows: delete from perf_linux_t where auto_id not in (select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id);
This is slow and fails with an error:

You can't specify target table 'perf_linux_t' for update in FROM clause

MySQL will not let a DELETE read from its own target table in a subquery, so here we fall back on a second intermediate table.
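As an aside, this error (MySQL error 1093) can also be worked around without a second table by wrapping the subquery in a derived table, which forces MySQL to materialize the id list before the delete runs. The following is a sketch of that alternative, not the approach taken in this article:

```sql
-- Alternative sketch: the extra SELECT ... FROM (...) AS t materializes
-- the subquery, so perf_linux_t is no longer read and deleted at once.
DELETE FROM perf_linux_t
WHERE auto_id NOT IN (
    SELECT auto_id FROM (
        SELECT MIN(auto_id) AS auto_id
        FROM perf_linux_t
        GROUP BY insert_time, conf_id
    ) AS t
);
```

On large tables this can still be slow, which is another reason the copy-into-a-new-table route below is attractive.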

Create another intermediate table:

The code is as follows: create table tmp like perf_linux_t;

Instead of deleting the non-conforming rows, save the conforming rows into this new table:

The code is as follows: insert into tmp select * from perf_linux_t where auto_id in (select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id);
Drop the helper column (and its primary key) from this table:

The code is as follows:
alter table `tmp`
drop column `auto_id`,
drop primary key;
Copy the data back

Empty the original table:

The code is as follows: truncate table perf_linux;

Insert the deduplicated data:

The code is as follows: insert into perf_linux select * from tmp;

Drop the intermediate tables:

The code is as follows:
drop table tmp;
drop table perf_linux_t;
Summary

With this method the table is reduced to 42387 rows and the redundant data is gone. The program defect that produced the duplicates has still not been located, however, and needs further observation.
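While the collector bug is being investigated, one way to keep the duplicates from coming back (a suggestion of mine, not part of the original procedure; the index name uk_conf_time and the sample values are hypothetical) is to enforce uniqueness at the schema level and have inserts skip duplicates:

```sql
-- Sketch: enforce one row per (conf_id, insert_time) in the schema.
ALTER TABLE perf_linux
    ADD UNIQUE KEY uk_conf_time (conf_id, insert_time);

-- The collector can then ignore duplicate inserts instead of failing:
INSERT IGNORE INTO perf_linux
    (id, conf_id, insert_time, cpu_usage, memory_usage,
     io_usage_write, io_usage_read)
VALUES
    ('some-uuid', '1', NOW(), 0.50, 0.30, 0.10, 0.20);
```

Whether INSERT IGNORE is appropriate depends on whether a duplicate should be silently dropped or treated as an error in the collector.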

I hope this article helps you in your MySQL database work.
