How to delete MySQL duplicate data-mysql tutorial

Source: Internet
Author: User
This example describes how to delete duplicate MySQL data. Share it with you for your reference. The specific method is as follows: Project background in a recently developed linux performance collection project, the thread program is found to be slow in receiving the database, and then carefully located to find a lot of redundant data in the database. Because in the collection, for the same device, there should be only one

This example describes how to delete duplicate MySQL data. Share it with you for your reference. The specific method is as follows: Project background in a recently developed linux performance collection project, the thread program is found to be slow in receiving the database, and then carefully located to find a lot of redundant data in the database. Because in the collection, for the same device, there should be only one

This example describes how to delete duplicate MySQL data. Share it with you for your reference. The specific method is as follows:

Project Background

In a recent linux performance collection project, it was found that the thread program was slow to store in the database, and then carefully located and found a lot of redundant data in the database. During the collection, only one data point at the same time should be available for the same device. However, multiple data entries are stored in the database. I have no idea about how to cause this result. but to solve the problem of slow warehouse receiving, we need to delete redundant data first.

Problem Description

The table structure of the database is very simple, as follows:

+ ---------------- + -------------- + ------ + ----- + --------- + ------- +
| Field | Type | Null | Key | Default | Extra |
+ ---------------- + -------------- + ------ + ----- + --------- + ------- +
| Id | varcharar (255) | NO | PRI | NULL |
| Conf_id | varchar (255) | NO | MUL | NULL |
| Insert_time | datetime | YES | NULL |
| Cpu_usage | float (11, 2) | YES | NULL |
| Memory_usage | float (11, 2) | YES | NULL |
| Io_usage_write | float (11,2) | YES | NULL |
| Io_usage_read | float (11,2) | YES | NULL |
+ ---------------- + -------------- + ------ + ----- + --------- + ------- +


Query all data volumes

Select count (*) from perf_linux;


427366 output

Query the data volume of different devices at all time points

Select count (distinct conf_id, insert_time) from perf_linux;


42387 output

The data above shows that the data redundancy is about 10 times.

Let's look at the time group:

Select id, conf_id, insert_time from perf_linux order by insert_time, conf_id;


Output:

| 2a79f7cd-43a9-4c7b-adb2-450b6c04283e | 1 | 15:09:14 |
| 50d6f6c2-9c8b-45fd-98fd-2be211221cfd | 1 | 15:09:14 |
| 740b52e1-e868-4074-ba36-74e2634401b3 | 1 | 15:09:14 |
| 8b0096a4-9e85-492b-a131-e3505ca79a9c | 1 | 15:09:14 |
| 90a9e882-5220-4268-a56f-8d4ab4a7929b | 1 | 15:09:14 |
| D17403ed-24a4-45e8-b51b-2a95118383d9 | 1 | 15:09:14 |
| 0c2da917-579b-4080-857d-7159f38b44ac | 2 | 15:09:14 |
| 263083eb-8f63-4d2b-a03f-3320aa678735 | 2 | 15:09:14 |
| D6c57a38-080b-465a-a55a-beafd9daf32d | 2 | 15:09:14 |
| F672227b-1fb8-4b85-880d-2cc34b02880d | 2 | 15:09:14 |
| F80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2 | 15:09:14 |
| Ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2 | 15:09:14 |
| 5c41e48a-abfc-4268-a00e-ca7def7d5a5a | 3 | 15:09:14 |
| 60b7ab9e-c91a-4020-a6d3-7bceb1dc47c5 | 3 | 15:09:14 |
| 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3 | 15:09:14 |
| D53a3df5-08c4-4604-8fac-cb51077935f6 | 3 | 15:09:14 |
| D9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3 | 15:09:14 |
| F56f82f6-32a7-47f7-ae07-b13168743884 | 3 | 15:09:14 |
| 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4 | 15:09:14 |
| 2a90ad9e-11a5-4707-95e8-78491da658ad | 4 | 15:09:14 |
| 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4 | 15:09:14 |
| 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4 | 15:09:14 |
| 8404d281-f9e5-4263-a47e-128c05386758 | 4 | 15:09:14 |
| E042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4 | 15:09:14 |
+ -------------------------------------- + --------- + --------------------- +


It can be seen that the data of the same device at the same time point is redundant. now we need to remove the redundant data.

Solution

The idea is as follows: first, you should judge by conf_id and time point to query groups (group by). you can obtain one in each group. Grouping is simple, but how can we get one group? I used an intermediate table.

Create an intermediate table and import the data to the intermediate table

Create table perf_linux_t like perf_linux;
Insert into perf_linux_t select * from perf_linux;


Add a field to the intermediate table. This field is self-incrementing.

Alter table 'Perf _ linux_t'
Add column 'auto _ id' int not null AUTO_INCREMENT,
Drop primary key,
Add primary key ('auto _ id ');


Delete useless data

First query

Select min (auto_id) as auto_id from perf_linux_t group by insert_time;


Delete incorrect data

Delete from perf_linux_t where auto_id not in (select min (auto_id) as auto_id from perf_linux_t group by insert_time );


Slow, output error:

You can't specify target table 'Perf _ linux_t 'for update in FROM clause

You can only create an intermediate table.

Create another intermediate table

Create table tmp like perf_linux_t;


Instead of deleting non-conforming data, you can save the conforming data to this new table.

Insert into tmp select * from perf_linux_t where auto_id in (select min (auto_id) as auto_id from perf_linux_t group by insert_time, conf_id );


Delete useless columns in this table

Alter table 'tmp'
Drop column 'auto _ id ',
Drop primary key;


Export back data

Delete original data

Truncate table perf_linux;


Insert data

Insert into perf_linux select * from tmp;


Delete intermediate table

Drop table tmp;
Drop table perf_linux_t;


Summary

In this method, the data is changed to 42387 records, and redundant data is deleted. But in fact, the program problem is not completely located, and it still needs to be observed to locate the problem.

I hope this article will help you design mysql database programs.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.