This article describes how to delete duplicate rows from a MySQL table, shared here for your reference. The specific method is as follows:
Project Background
In a recent Linux performance collection project, the thread that writes to the database was found to be slow. Careful investigation revealed a lot of redundant data in the database: during collection, there should be only one data point per device at any given time, yet multiple rows were being stored for it. How this came about has not been determined yet, but to fix the slow writes, the redundant data had to be deleted first.
Problem Description
The table structure of the database is very simple, as follows:
+----------------+---------------+------+-----+---------+-------+
| Field          | Type          | Null | Key | Default | Extra |
+----------------+---------------+------+-----+---------+-------+
| id             | varchar(255)  | NO   | PRI | NULL    |       |
| conf_id        | varchar(255)  | NO   | MUL | NULL    |       |
| insert_time    | datetime      | YES  |     | NULL    |       |
| cpu_usage      | float(11,2)   | YES  |     | NULL    |       |
| memory_usage   | float(11,2)   | YES  |     | NULL    |       |
| io_usage_write | float(11,2)   | YES  |     | NULL    |       |
| io_usage_read  | float(11,2)   | YES  |     | NULL    |       |
+----------------+---------------+------+-----+---------+-------+
Query the total number of rows:
select count(*) from perf_linux;
Output: 427366
Query the number of distinct (device, time point) pairs:
select count(distinct conf_id, insert_time) from perf_linux;
Output: 42387
The numbers above show roughly tenfold data redundancy.
Let's look at the data grouped by time:
select id, conf_id, insert_time from perf_linux order by insert_time, conf_id;
Output:
+--------------------------------------+---------+-------------+
| id                                   | conf_id | insert_time |
+--------------------------------------+---------+-------------+
| 2a79f7cd-43a9-4c7b-adb2-450b6c04283e | 1       | 15:09:14    |
| 50d6f6c2-9c8b-45fd-98fd-2be211221cfd | 1       | 15:09:14    |
| 740b52e1-e868-4074-ba36-74e2634401b3 | 1       | 15:09:14    |
| 8b0096a4-9e85-492b-a131-e3505ca79a9c | 1       | 15:09:14    |
| 90a9e882-5220-4268-a56f-8d4ab4a7929b | 1       | 15:09:14    |
| d17403ed-24a4-45e8-b51b-2a95118383d9 | 1       | 15:09:14    |
| 0c2da917-579b-4080-857d-7159f38b44ac | 2       | 15:09:14    |
| 263083eb-8f63-4d2b-a03f-3320aa678735 | 2       | 15:09:14    |
| d6c57a38-080b-465a-a55a-beafd9daf32d | 2       | 15:09:14    |
| f672227b-1fb8-4b85-880d-2cc34b02880d | 2       | 15:09:14    |
| f80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2       | 15:09:14    |
| ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2       | 15:09:14    |
| 5c41e48a-abfc-4268-a00e-ca7def7d5a5a | 3       | 15:09:14    |
| 60b7ab9e-c91a-4020-a6d3-7bceb1dc47c5 | 3       | 15:09:14    |
| 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3       | 15:09:14    |
| d53a3df5-08c4-4604-8fac-cb51077935f6 | 3       | 15:09:14    |
| d9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3       | 15:09:14    |
| f56f82f6-32a7-47f7-ae07-b13168743884 | 3       | 15:09:14    |
| 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4       | 15:09:14    |
| 2a90ad9e-11a5-4707-95e8-78491da658ad | 4       | 15:09:14    |
| 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4       | 15:09:14    |
| 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4       | 15:09:14    |
| 8404d281-f9e5-4263-a47e-128c05386758 | 4       | 15:09:14    |
| e042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4       | 15:09:14    |
+--------------------------------------+---------+-------------+
As you can see, the same device has multiple redundant rows at the same time point. Now we need to remove that redundant data.
Solution
The idea is as follows: group the rows by conf_id and time point (GROUP BY), then keep exactly one row from each group. Grouping is simple, but how do we keep just one row per group? I used an intermediate table.
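To see the grouping idea in isolation, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows and simplified table layout are assumptions for illustration only. SQLite's implicit rowid plays the role of the auto-increment column added later, and SQLite (unlike MySQL) allows the target table to appear in the subquery, so the direct DELETE works here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified stand-in for the perf_linux table.
cur.execute("CREATE TABLE perf_linux (id TEXT, conf_id TEXT, insert_time TEXT)")
cur.executemany("INSERT INTO perf_linux VALUES (?, ?, ?)", [
    ("a1", "1", "15:09:14"), ("a2", "1", "15:09:14"),  # duplicates for device 1
    ("b1", "2", "15:09:14"), ("b2", "2", "15:09:14"),  # duplicates for device 2
    ("c1", "3", "15:09:15"),                           # already unique
])

# Keep only the row with the smallest rowid in each (conf_id, insert_time) group.
cur.execute("""
    DELETE FROM perf_linux
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM perf_linux GROUP BY conf_id, insert_time
    )
""")
conn.commit()
print(cur.execute("SELECT count(*) FROM perf_linux").fetchone()[0])  # prints 3
```

One row survives per group; MySQL rejects this one-statement form, which is exactly why the intermediate tables described below the error message are needed there.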
Create an intermediate table and import the data into it:
create table perf_linux_t like perf_linux;
insert into perf_linux_t select * from perf_linux;
Add an auto-increment field to the intermediate table and make it the primary key:
alter table `perf_linux_t`
add column `auto_id` int not null auto_increment,
drop primary key,
add primary key (`auto_id`);
Delete the redundant data
First query the rows to keep:
select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id;
Then try to delete the rest:
delete from perf_linux_t where auto_id not in (select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id);
This was slow, and it failed with an error:
You can't specify target table 'perf_linux_t' for update in FROM clause
MySQL does not allow the table being deleted from to also appear in the subquery, so another intermediate table is needed.
Create another intermediate table:
create table tmp like perf_linux_t;
Instead of deleting the non-conforming rows, save the rows we want to keep into this new table:
insert into tmp select * from perf_linux_t where auto_id in (select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id);
Delete the now-useless column from this table:
alter table `tmp`
drop column `auto_id`,
drop primary key;
Export the data back
Delete the original data:
truncate table perf_linux;
Insert the deduplicated data:
insert into perf_linux select * from tmp;
Delete the intermediate tables:
drop table tmp;
drop table perf_linux_t;
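The whole sequence above (intermediate table with an auto_id, keep MIN(auto_id) per group in a second table, copy back, drop the temporaries) can be rehearsed end to end. This is a sketch using Python's sqlite3 as a stand-in for MySQL; the sample rows and simplified columns are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE perf_linux (id TEXT, conf_id TEXT, insert_time TEXT)")
cur.executemany("INSERT INTO perf_linux VALUES (?, ?, ?)", [
    ("a1", "1", "15:09:14"), ("a2", "1", "15:09:14"),  # duplicate group
    ("b1", "2", "15:09:14"), ("c1", "3", "15:09:15"),
])

# Intermediate table with an explicit auto-increment key, as in the article.
cur.execute("""CREATE TABLE perf_linux_t (
    auto_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id TEXT, conf_id TEXT, insert_time TEXT)""")
cur.execute("""INSERT INTO perf_linux_t (id, conf_id, insert_time)
               SELECT id, conf_id, insert_time FROM perf_linux""")

# Save one row per (conf_id, insert_time) group into tmp.
cur.execute("CREATE TABLE tmp (id TEXT, conf_id TEXT, insert_time TEXT)")
cur.execute("""INSERT INTO tmp
               SELECT id, conf_id, insert_time FROM perf_linux_t
               WHERE auto_id IN (SELECT MIN(auto_id) FROM perf_linux_t
                                 GROUP BY insert_time, conf_id)""")

# Export the data back and drop the temporaries.
cur.execute("DELETE FROM perf_linux")  # SQLite's equivalent of TRUNCATE
cur.execute("INSERT INTO perf_linux SELECT * FROM tmp")
cur.execute("DROP TABLE tmp")
cur.execute("DROP TABLE perf_linux_t")
conn.commit()
print(cur.execute("SELECT count(*) FROM perf_linux").fetchone()[0])  # prints 3
```

The duplicate for device 1 is gone and one row remains per (device, time) pair, mirroring what the article's MySQL statements achieve.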
Summary
With this method, the table was reduced to 42387 rows and the redundant data was removed. The root cause in the program has still not been fully located, however, and further observation is needed to pin it down.
I hope this article helps you with your MySQL database work.