How to delete MySQL duplicate data: mysql duplicate data
This example describes how to delete duplicate MySQL Data. Share it with you for your reference. The specific method is as follows:
Project Background
In a recent linux performance collection project, it was found that the thread program was slow to store in the database, and then carefully located and found a lot of redundant data in the database. During the collection, only one data point at the same time should be available for the same device. However, multiple data entries are stored in the database. I have no idea about how to cause this result. But to solve the problem of slow warehouse receiving, We need to delete redundant data first.
Problem description
The table structure of the database is very simple, as follows:
Copy codeThe Code is as follows: + ---------------- + ------------ + ------ + ----- + --------- + ------- +
| Field | Type | Null | Key | Default | Extra |
+ ---------------- + -------------- + ------ + ----- + --------- + ------- +
| Id | varcharar (255) | NO | PRI | NULL |
| Conf_id | varchar (255) | NO | MUL | NULL |
| Insert_time | datetime | YES | NULL |
| Cpu_usage | float (11, 2) | YES | NULL |
| Memory_usage | float (11, 2) | YES | NULL |
| Io_usage_write | float (11,2) | YES | NULL |
| Io_usage_read | float (11,2) | YES | NULL |
+ ---------------- + -------------- + ------ + ----- + --------- + ------- +
Query all data volumes
Copy codeThe Code is as follows: select count (*) from perf_linux;
427366 output
Query the data volume of different devices at all time points
Copy codeThe Code is as follows: select count (distinct conf_id, insert_time) from perf_linux;
42387 output
The data above shows that the data redundancy is about 10 times.
Let's look at the time group:
Copy codeThe Code is as follows: select id, conf_id, insert_time from perf_linux order by insert_time, conf_id;
Output:
Copy codeThe Code is as follows: | 2a79f7cd-43a9-4c7b-adb2-450b6c04283e | 1 | 15:09:14 |
| 50d6f6c2-9c8b-45fd-98fd-2be211221cfd | 1 | 15:09:14 |
| 740b52e1-e868-4074-ba36-74e2634401b3 | 1 | 15:09:14 |
| 8b0096a4-9e85-492b-a131-e3505ca79a9c | 1 | 15:09:14 |
| 90a9e882-5220-4268-a56f-8d4ab4a7929b | 1 | 15:09:14 |
| D17403ed-24a4-45e8-b51b-2a95118383d9 | 1 | 15:09:14 |
| 0c2da917-579b-4080-857d-7159f38b44ac | 2 | 15:09:14 |
| 263083eb-8f63-4d2b-a03f-3320aa678735 | 2 | 15:09:14 |
| D6c57a38-080b-465a-a55a-beafd9daf32d | 2 | 15:09:14 |
| F672227b-1fb8-4b85-880d-2cc34b02880d | 2 | 15:09:14 |
| F80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2 | 15:09:14 |
| Ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2 | 15:09:14 |
| 5c41e48a-abfc-4268-a00e-ca7def7d5a5a | 3 | 15:09:14 |
| 60b7ab9e-c91a-4020-a6d3-7bceb1dc47c5 | 3 | 15:09:14 |
| 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3 | 15:09:14 |
| D53a3df5-08c4-4604-8fac-cb51077935f6 | 3 | 15:09:14 |
| D9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3 | 15:09:14 |
| F56f82f6-32a7-47f7-ae07-b13168743884 | 3 | 15:09:14 |
| 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4 | 15:09:14 |
| 2a90ad9e-11a5-4707-95e8-78491da658ad | 4 | 15:09:14 |
| 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4 | 15:09:14 |
| 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4 | 15:09:14 |
| 8404d281-f9e5-4263-a47e-128c05386758 | 4 | 15:09:14 |
| E042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4 | 15:09:14 |
+ -------------------------------------- + --------- + --------------------- +
It can be seen that the data of the same device at the same time point is redundant. Now we need to remove the redundant data.
Solution
The idea is as follows: first, you should judge by conf_id and time point to query groups (group by). You can obtain one in each group. Grouping is simple, but how can we get one group? I used an intermediate table.
Create an intermediate table and import the data to the intermediate table
Copy codeThe Code is as follows: create table perf_linux_t like perf_linux;
Insert into perf_linux_t select * from perf_linux;
Add a field to the intermediate table. This field is self-incrementing.
Copy codeThe Code is as follows: alter table 'perf _ linux_t'
Add column 'Auto _ id' int not null AUTO_INCREMENT,
Drop primary key,
Add primary key ('Auto _ id ');
Delete useless data
First Query
Copy codeThe Code is as follows: select min (auto_id) as auto_id from perf_linux_t group by insert_time;
Delete incorrect data
Copy codeThe Code is as follows: delete from perf_linux_t where auto_id not in (select min (auto_id) as auto_id from perf_linux_t group by insert_time );
Slow, output error:
You can't specify target table 'perf _ linux_t 'for update in FROM clause
You can only create an intermediate table.
Create another intermediate table
Copy codeThe Code is as follows: create table tmp like perf_linux_t;
Instead of deleting non-conforming data, you can save the conforming data to this new table.
Copy codeThe Code is as follows: insert into tmp select * from perf_linux_t where auto_id in (select min (auto_id) as auto_id from perf_linux_t group by insert_time, conf_id );
Delete useless columns in this table
Copy codeThe Code is as follows: alter table 'tmp'
Drop column 'Auto _ id ',
Drop primary key;
Export back data
Delete original data
Copy codeThe Code is as follows: truncate table perf_linux;
Insert data
Copy codeThe Code is as follows: insert into perf_linux select * from tmp;
Delete intermediate table
Copy codeThe Code is as follows: drop table tmp;
Drop table perf_linux_t;
Summary
In this method, the data is changed to 42387 records, and redundant data is deleted. But in fact, the program problem is not completely located, and it still needs to be observed to locate the problem.
I hope this article will help you design mysql database programs.