Ways to delete MySQL duplicate data


This article describes how to delete duplicate data in MySQL, for your reference. The specific method is as follows:

Project background

In a recent Linux performance-data collection project, I found that the program's database-insert thread was very slow. On closer inspection, the database contained a large amount of redundant data: for a given device, there should be only one row per point in time, yet multiple rows were being stored. I could not immediately work out how this happened, but to fix the slow inserts, the first step was to remove the redundant rows.

Problem description

The table structure of the database is simple, as follows:

Copy code as follows:
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id             | varchar(255) | NO   | PRI | NULL    |       |
| conf_id        | varchar(255) | NO   | MUL | NULL    |       |
| insert_time    | datetime     | YES  |     | NULL    |       |
| cpu_usage      | float(11,2)  | YES  |     | NULL    |       |
| memory_usage   | float(11,2)  | YES  |     | NULL    |       |
| io_usage_write | float(11,2)  | YES  |     | NULL    |       |
| io_usage_read  | float(11,2)  | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+

Query the total number of rows:
Copy code as follows:
SELECT COUNT(*) FROM perf_linux;

Output: 427366

Query the number of distinct (device, time point) combinations:

Copy code as follows:
SELECT COUNT(DISTINCT conf_id, insert_time) FROM perf_linux;

Output: 42387

From these two numbers, the data is redundant by a factor of roughly 10.

Now inspect the rows ordered by time and device:

Copy code as follows:
SELECT id, conf_id, insert_time FROM perf_linux ORDER BY insert_time, conf_id;

Output:
Copy code as follows:
+--------------------------------------+---------+---------------------+
| id                                   | conf_id | insert_time         |
+--------------------------------------+---------+---------------------+
| 2a79f7cd-43a9-4c7b-adb2-316b6c04283e | 1       | 2014-12-09 15:09:14 |
| 50d6f6c2-9c8b-45fd-98fd-2be211221cfd | 1       | 2014-12-09 15:09:14 |
| 740b52e1-e868-4074-ba36-74e2634401b3 | 1       | 2014-12-09 15:09:14 |
| 8b0096a4-9e85-417b-a131-e3505ca79a9c | 1       | 2014-12-09 15:09:14 |
| 90a9e882-5220-4508-a56f-8d4ab4a7929b | 1       | 2014-12-09 15:09:14 |
| d17403ed-24a4-45e8-b51b-2a95118383d9 | 1       | 2014-12-09 15:09:14 |
| 0c2da917-579b-4080-857d-7159f38b44ac | 2       | 2014-12-09 15:09:14 |
| 263083eb-8f63-4d2b-a03f-3320aa678735 | 2       | 2014-12-09 15:09:14 |
| d6c57a38-080b-465a-a55a-beafd9daf32d | 2       | 2014-12-09 15:09:14 |
| f672227b-1fb8-4b85-880d-2cc34b02880d | 2       | 2014-12-09 15:09:14 |
| f80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2       | 2014-12-09 15:09:14 |
| ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2       | 2014-12-09 15:09:14 |
| 5c41e48a-abfc-4108-a00e-ca7def7d5a5a | 3       | 2014-12-09 15:09:14 |
| 60b7ab9e-c91a-4020-a6d3-7bceb1dc47c5 | 3       | 2014-12-09 15:09:14 |
| 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3       | 2014-12-09 15:09:14 |
| d53a3df5-08c4-4604-8fac-cb51077935f6 | 3       | 2014-12-09 15:09:14 |
| d9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3       | 2014-12-09 15:09:14 |
| f56f82f6-32a7-47f7-ae07-b13168743884 | 3       | 2014-12-09 15:09:14 |
| 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4       | 2014-12-09 15:09:14 |
| 2a90ad9e-11a5-4707-95e8-78491da658ad | 4       | 2014-12-09 15:09:14 |
| 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4       | 2014-12-09 15:09:14 |
| 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4       | 2014-12-09 15:09:14 |
| 8404d281-f9e5-4153-a47e-128c05386758 | 4       | 2014-12-09 15:09:14 |
| e042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4       | 2014-12-09 15:09:14 |
+--------------------------------------+---------+---------------------+

As the output above shows, rows for the same device at the same point in time are duplicated; we now want to remove the redundant copies.
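Before removing anything, it can also help to confirm which (device, time point) groups are duplicated and how badly. The following query (my addition, using the same table and columns as above) lists only the duplicated groups with their row counts:

```sql
-- List each (conf_id, insert_time) pair that appears more than once,
-- worst offenders first.
SELECT conf_id, insert_time, COUNT(*) AS cnt
FROM perf_linux
GROUP BY conf_id, insert_time
HAVING cnt > 1
ORDER BY cnt DESC;
```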

Solution

The idea is this: group the rows by conf_id and insert_time, then keep exactly one row from each group. Grouping is simple, but how do you keep just one row per group? I used an intermediate table.

Create an intermediate table and copy the data into it:

Copy code as follows:
CREATE TABLE perf_linux_t LIKE perf_linux;
INSERT INTO perf_linux_t SELECT * FROM perf_linux;

Add an auto-increment column to the intermediate table:
Copy code as follows:
ALTER TABLE `perf_linux_t`
ADD COLUMN `auto_id` INT NOT NULL AUTO_INCREMENT,
DROP PRIMARY KEY,
ADD PRIMARY KEY (`auto_id`);

Delete the unwanted data

First, check which rows to keep: the minimum auto_id for each device at each time point:

Copy code as follows:
SELECT MIN(auto_id) AS auto_id FROM perf_linux_t GROUP BY conf_id, insert_time;

Then delete the redundant rows:
Copy code as follows:
DELETE FROM perf_linux_t WHERE auto_id NOT IN (SELECT MIN(auto_id) AS auto_id FROM perf_linux_t GROUP BY conf_id, insert_time);

But this fails with an error:

You can't specify target table 'perf_linux_t' for update in FROM clause

MySQL will not let a DELETE read from the same table it is deleting from, so we have to build yet another intermediate table.
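For reference, MySQL raises this error (error 1093) because the DELETE's subquery selects from the table being modified. A commonly used workaround, shown here as a sketch rather than the route the author took, is to wrap the subquery in a derived table so MySQL materializes it first:

```sql
-- Workaround sketch: the extra SELECT layer forces MySQL to materialize
-- the subquery result, sidestepping error 1093 without a second helper table.
DELETE FROM perf_linux_t
WHERE auto_id NOT IN (
  SELECT t.auto_id
  FROM (
    SELECT MIN(auto_id) AS auto_id
    FROM perf_linux_t
    GROUP BY conf_id, insert_time
  ) AS t
);
```

The author instead continues with a second intermediate table, which also works.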

Build another intermediate table

Copy code as follows:
CREATE TABLE tmp LIKE perf_linux_t;

Change of approach: instead of deleting the redundant rows, copy the rows we want to keep into the new table.
Copy code as follows:
INSERT INTO tmp SELECT * FROM perf_linux_t WHERE auto_id IN (SELECT MIN(auto_id) AS auto_id FROM perf_linux_t GROUP BY insert_time, conf_id);

Drop the helper column (and its primary key) from this table:
Copy code as follows:
ALTER TABLE `tmp`
DROP COLUMN `auto_id`,
DROP PRIMARY KEY;

Import the data back

Delete the original data:

Copy code as follows:
TRUNCATE TABLE perf_linux;

Insert the deduplicated data:
Copy code as follows:
INSERT INTO perf_linux SELECT * FROM tmp;

Drop the intermediate tables:
Copy code as follows:
DROP TABLE tmp;
DROP TABLE perf_linux_t;

Summary

The table now holds 42,387 rows and the redundant data has been removed. The root cause in the collection program, however, has not been fully pinpointed; further observation is needed to locate the bug.
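To keep the duplicates from coming back while the root cause is investigated, one option (my suggestion, not part of the original fix) is a unique key over the (conf_id, insert_time) pair, so the database itself enforces one row per device per time point:

```sql
-- Suggested guard (assumes one row per device per timestamp is the intended invariant):
ALTER TABLE perf_linux ADD UNIQUE KEY uk_conf_time (conf_id, insert_time);

-- The collector can then insert idempotently; a duplicate (conf_id, insert_time)
-- pair is skipped instead of piling up as a redundant row:
INSERT IGNORE INTO perf_linux (id, conf_id, insert_time, cpu_usage,
                               memory_usage, io_usage_write, io_usage_read)
VALUES (UUID(), '1', '2014-12-09 15:09:14', 1.00, 2.00, 0.10, 0.20);
```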

I hope this article is helpful for your MySQL database design.
