This article works through an example of removing duplicate data from a MySQL table, shared here for your reference. The specific steps are as follows:
Project background
In a recent Linux performance-collection project, I noticed that the thread writing results to the database was very slow. Careful investigation showed a large amount of redundant data in the database: during collection there should be only one row per device per point in time, yet the database held several. I had not yet worked out how this happened, but to fix the slow inserts the first step was to remove the redundant data.
Problem description
The table structure of the database is simple, as follows:
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id             | varchar(255) | NO   | PRI | NULL    |       |
| conf_id        | varchar(255) | NO   | MUL | NULL    |       |
| insert_time    | datetime     | YES  |     | NULL    |       |
| cpu_usage      | float(11,2)  | YES  |     | NULL    |       |
| memory_usage   | float(11,2)  | YES  |     | NULL    |       |
| io_usage_write | float(11,2)  | YES  |     | NULL    |       |
| io_usage_read  | float(11,2)  | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
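The original article does not show the table's DDL. For readers who want to reproduce the setup, a minimal CREATE TABLE sketch matching the structure above might look like this (the index name idx_conf_id is an assumption; only the MUL flag on conf_id is given above):

-- Hypothetical DDL reconstructed from the DESCRIBE output above
CREATE TABLE perf_linux (
  id             VARCHAR(255) NOT NULL,         -- primary key (PRI)
  conf_id        VARCHAR(255) NOT NULL,         -- device/configuration reference (MUL)
  insert_time    DATETIME     DEFAULT NULL,     -- collection timestamp
  cpu_usage      FLOAT(11,2)  DEFAULT NULL,
  memory_usage   FLOAT(11,2)  DEFAULT NULL,
  io_usage_write FLOAT(11,2)  DEFAULT NULL,
  io_usage_read  FLOAT(11,2)  DEFAULT NULL,
  PRIMARY KEY (id),
  KEY idx_conf_id (conf_id)                     -- assumed name for the secondary index
);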
Query the total number of rows:
SELECT COUNT(*) FROM perf_linux;
Output: 427366
Query the number of distinct (device, time) combinations:
SELECT COUNT(DISTINCT conf_id, insert_time) FROM perf_linux;
Output: 42387
These figures show that the data is redundant by roughly a factor of 10.
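Before removing anything, a query along these lines (not shown in the original article) can list the (conf_id, insert_time) groups that contain more than one row, which makes the redundancy visible group by group:

-- Extra check: how many rows does each (device, time) pair actually have?
SELECT conf_id, insert_time, COUNT(*) AS row_count
FROM perf_linux
GROUP BY conf_id, insert_time
HAVING COUNT(*) > 1
ORDER BY row_count DESC
LIMIT 10;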
Listing the rows ordered by time and device to take a closer look:
SELECT id, conf_id, insert_time FROM perf_linux ORDER BY insert_time, conf_id;
Output:
+--------------------------------------+---------+---------------------+
| id                                   | conf_id | insert_time         |
+--------------------------------------+---------+---------------------+
| 2a79f7cd-43a9-4c7b-adb2-316b6c04283e | 1       | 2014-12-09 15:09:14 |
| 50d6f6c2-9c8b-45fd-98fd-2be211221cfd | 1       | 2014-12-09 15:09:14 |
| 740b52e1-e868-4074-ba36-74e2634401b3 | 1       | 2014-12-09 15:09:14 |
| 8b0096a4-9e85-417b-a131-e3505ca79a9c | 1       | 2014-12-09 15:09:14 |
| 90a9e882-5220-4508-a56f-8d4ab4a7929b | 1       | 2014-12-09 15:09:14 |
| d17403ed-24a4-45e8-b51b-2a95118383d9 | 1       | 2014-12-09 15:09:14 |
| 0c2da917-579b-4080-857d-7159f38b44ac | 2       | 2014-12-09 15:09:14 |
| 263083eb-8f63-4d2b-a03f-3320aa678735 | 2       | 2014-12-09 15:09:14 |
| d6c57a38-080b-465a-a55a-beafd9daf32d | 2       | 2014-12-09 15:09:14 |
| f672227b-1fb8-4b85-880d-2cc34b02880d | 2       | 2014-12-09 15:09:14 |
| f80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2       | 2014-12-09 15:09:14 |
| ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2       | 2014-12-09 15:09:14 |
| 5c41e48a-abfc-4108-a00e-ca7def7d5a5a | 3       | 2014-12-09 15:09:14 |
| 60b7ab9e-c91a-4020-a6d3-7bceb1dc47c5 | 3       | 2014-12-09 15:09:14 |
| 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3       | 2014-12-09 15:09:14 |
| d53a3df5-08c4-4604-8fac-cb51077935f6 | 3       | 2014-12-09 15:09:14 |
| d9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3       | 2014-12-09 15:09:14 |
| f56f82f6-32a7-47f7-ae07-b13168743884 | 3       | 2014-12-09 15:09:14 |
| 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4       | 2014-12-09 15:09:14 |
| 2a90ad9e-11a5-4707-95e8-78491da658ad | 4       | 2014-12-09 15:09:14 |
| 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4       | 2014-12-09 15:09:14 |
| 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4       | 2014-12-09 15:09:14 |
| 8404d281-f9e5-4153-a47e-128c05386758 | 4       | 2014-12-09 15:09:14 |
| e042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4       | 2014-12-09 15:09:14 |
+--------------------------------------+---------+---------------------+
As the output above shows, there are multiple rows for the same device at the same point in time, and now we want to remove the redundant ones.
Solution
The idea is this: group the rows by conf_id and insert_time, and keep only one row from each group. Grouping is easy, but how do we keep exactly one row per group? I used an intermediate table.
Create an intermediate table and copy the data into it:
CREATE TABLE perf_linux_t LIKE perf_linux;
INSERT INTO perf_linux_t SELECT * FROM perf_linux;
Add an auto-increment column to the intermediate table and make it the new primary key:
ALTER TABLE `perf_linux_t`
  ADD COLUMN `auto_id` INT NOT NULL AUTO_INCREMENT,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`auto_id`);
Delete the redundant data
First, check which rows should be kept:
SELECT MIN(auto_id) AS auto_id FROM perf_linux_t GROUP BY insert_time, conf_id;
Then try to delete everything else:
DELETE FROM perf_linux_t WHERE auto_id NOT IN (SELECT MIN(auto_id) AS auto_id FROM perf_linux_t GROUP BY insert_time, conf_id);
But wait, this fails with an error:
You can't specify target table 'perf_linux_t' for update in FROM clause
MySQL will not delete from a table that is also being read by a subquery in the same statement, so we cannot delete in place; another intermediate table is needed.
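As an aside, a common workaround for this error (error 1093), and not the route taken in this article, is to wrap the subquery in a derived table so MySQL materializes it before the DELETE runs. A sketch, assuming the same grouping:

-- Alternative workaround (not used below): materialize the keeper IDs first
DELETE FROM perf_linux_t
WHERE auto_id NOT IN (
  SELECT auto_id FROM (
    SELECT MIN(auto_id) AS auto_id
    FROM perf_linux_t
    GROUP BY insert_time, conf_id
  ) AS keep_rows   -- the derived table sidesteps the target-table restriction
);

The article instead continues with a second intermediate table, as follows.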
Build a second intermediate table
CREATE TABLE tmp LIKE perf_linux_t;
Change of plan: instead of deleting the redundant rows, copy the rows we want to keep into the new table:
INSERT INTO tmp SELECT * FROM perf_linux_t WHERE auto_id IN (SELECT MIN(auto_id) AS auto_id FROM perf_linux_t GROUP BY insert_time, conf_id);
Remove the now-unneeded auto_id column from the new table:
ALTER TABLE `tmp`
  DROP COLUMN `auto_id`,
  DROP PRIMARY KEY;
Import the data back
First, empty the original table:
TRUNCATE TABLE perf_linux;
Then insert the deduplicated data:
INSERT INTO perf_linux SELECT * FROM tmp;
Finally, drop the intermediate tables:
DROP TABLE tmp;
DROP TABLE perf_linux_t;
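As a final sanity check (not part of the original steps), the two counts from earlier can be re-run; after deduplication both should return the same figure, 42,387:

SELECT COUNT(*) FROM perf_linux;                              -- expected: 42387
SELECT COUNT(DISTINCT conf_id, insert_time) FROM perf_linux;  -- expected: 42387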
Summary
The table now holds 42,387 rows and the redundant data is gone. The underlying bug in the collection program has not been fully tracked down yet, though; that will take further observation.
I hope this article is helpful for your work with MySQL databases.