This article describes how to delete duplicate rows from a MySQL table, shared here for your reference. The specific method is as follows:
Project Background
In a recent Linux performance collection project, the thread that writes to the database was found to be slow. Careful investigation revealed a lot of redundant data in the database: during collection, there should be only one data point per device at any given time, yet multiple rows were being stored for it. How this came about has not been determined yet, but to fix the slow writes, the redundant data had to be deleted first.
Problem Description
The table structure of the database is very simple, as follows:
+----------------+---------------+------+-----+---------+-------+
| Field          | Type          | Null | Key | Default | Extra |
+----------------+---------------+------+-----+---------+-------+
| id             | varchar(255)  | NO   | PRI | NULL    |       |
| conf_id        | varchar(255)  | NO   | MUL | NULL    |       |
| insert_time    | datetime      | YES  |     | NULL    |       |
| cpu_usage      | float(11,2)   | YES  |     | NULL    |       |
| memory_usage   | float(11,2)   | YES  |     | NULL    |       |
| io_usage_write | float(11,2)   | YES  |     | NULL    |       |
| io_usage_read  | float(11,2)   | YES  |     | NULL    |       |
+----------------+---------------+------+-----+---------+-------+
Query the total number of rows:
select count(*) from perf_linux;
Output: 427366
Query the number of distinct (device, time point) pairs:
select count(distinct conf_id, insert_time) from perf_linux;
Output: 42387
The numbers above show roughly tenfold data redundancy.
Let's look at the data grouped by time:
select id, conf_id, insert_time from perf_linux order by insert_time, conf_id;
Output:
+--------------------------------------+---------+-------------+
| id                                   | conf_id | insert_time |
+--------------------------------------+---------+-------------+
| 2a79f7cd-43a9-4c7b-adb2-450b6c04283e | 1       | 15:09:14    |
| 50d6f6c2-9c8b-45fd-98fd-2be211221cfd | 1       | 15:09:14    |
| 740b52e1-e868-4074-ba36-74e2634401b3 | 1       | 15:09:14    |
| 8b0096a4-9e85-492b-a131-e3505ca79a9c | 1       | 15:09:14    |
| 90a9e882-5220-4268-a56f-8d4ab4a7929b | 1       | 15:09:14    |
| d17403ed-24a4-45e8-b51b-2a95118383d9 | 1       | 15:09:14    |
| 0c2da917-579b-4080-857d-7159f38b44ac | 2       | 15:09:14    |
| 263083eb-8f63-4d2b-a03f-3320aa678735 | 2       | 15:09:14    |
| d6c57a38-080b-465a-a55a-beafd9daf32d | 2       | 15:09:14    |
| f672227b-1fb8-4b85-880d-2cc34b02880d | 2       | 15:09:14    |
| f80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2       | 15:09:14    |
| ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2       | 15:09:14    |
| 5c41e48a-abfc-4268-a00e-ca7def7d5a5a | 3       | 15:09:14    |
| 60b7ab9e-c91a-4020-a6d3-7bceb1dc47c5 | 3       | 15:09:14    |
| 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3       | 15:09:14    |
| d53a3df5-08c4-4604-8fac-cb51077935f6 | 3       | 15:09:14    |
| d9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3       | 15:09:14    |
| f56f82f6-32a7-47f7-ae07-b13168743884 | 3       | 15:09:14    |
| 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4       | 15:09:14    |
| 2a90ad9e-11a5-4707-95e8-78491da658ad | 4       | 15:09:14    |
| 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4       | 15:09:14    |
| 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4       | 15:09:14    |
| 8404d281-f9e5-4263-a47e-128c05386758 | 4       | 15:09:14    |
| e042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4       | 15:09:14    |
+--------------------------------------+---------+-------------+
As you can see, the same device has multiple redundant rows at the same time point. Now we need to remove that redundant data.
Solution
The idea is as follows: group the rows by conf_id and time point (GROUP BY), then keep exactly one row from each group. Grouping is simple, but how do we keep just one row per group? I used an intermediate table.
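To see the grouping idea in isolation, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows and simplified table layout are assumptions for illustration only. SQLite's implicit rowid plays the role of the auto-increment column added later, and SQLite (unlike MySQL) allows the target table to appear in the subquery, so the direct DELETE works here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Simplified stand-in for the perf_linux table.
cur.execute("CREATE TABLE perf_linux (id TEXT, conf_id TEXT, insert_time TEXT)")
cur.executemany("INSERT INTO perf_linux VALUES (?, ?, ?)", [
    ("a1", "1", "15:09:14"), ("a2", "1", "15:09:14"),  # duplicates for device 1
    ("b1", "2", "15:09:14"), ("b2", "2", "15:09:14"),  # duplicates for device 2
    ("c1", "3", "15:09:15"),                           # already unique
])

# Keep only the row with the smallest rowid in each (conf_id, insert_time) group.
cur.execute("""
    DELETE FROM perf_linux
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM perf_linux GROUP BY conf_id, insert_time
    )
""")
conn.commit()
print(cur.execute("SELECT count(*) FROM perf_linux").fetchone()[0])  # prints 3
```

One row survives per group; MySQL rejects this one-statement form, which is exactly why the intermediate tables described below the error message are needed there.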
Create an intermediate table and import the data into it:
create table perf_linux_t like perf_linux;
insert into perf_linux_t select * from perf_linux;
Add an auto-increment field to the intermediate table and make it the primary key:
alter table `perf_linux_t`
add column `auto_id` int not null auto_increment,
drop primary key,
add primary key (`auto_id`);
Delete the redundant data
First query the rows to keep:
select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id;
Then try to delete the rest:
delete from perf_linux_t where auto_id not in (select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id);
This was slow, and it failed with an error:
You can't specify target table 'perf_linux_t' for update in FROM clause
MySQL does not allow the table being deleted from to also appear in the subquery, so another intermediate table is needed.
Create another intermediate table:
create table tmp like perf_linux_t;
Instead of deleting the non-conforming rows, save the rows we want to keep into this new table:
insert into tmp select * from perf_linux_t where auto_id in (select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id);
Delete the now-useless column from this table:
alter table `tmp`
drop column `auto_id`,
drop primary key;
Export the data back
Delete the original data:
truncate table perf_linux;
Insert the deduplicated data:
insert into perf_linux select * from tmp;
Delete the intermediate tables:
drop table tmp;
drop table perf_linux_t;
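The whole sequence above (intermediate table with an auto_id, keep MIN(auto_id) per group in a second table, copy back, drop the temporaries) can be rehearsed end to end. This is a sketch using Python's sqlite3 as a stand-in for MySQL; the sample rows and simplified columns are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE perf_linux (id TEXT, conf_id TEXT, insert_time TEXT)")
cur.executemany("INSERT INTO perf_linux VALUES (?, ?, ?)", [
    ("a1", "1", "15:09:14"), ("a2", "1", "15:09:14"),  # duplicate group
    ("b1", "2", "15:09:14"), ("c1", "3", "15:09:15"),
])

# Intermediate table with an explicit auto-increment key, as in the article.
cur.execute("""CREATE TABLE perf_linux_t (
    auto_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id TEXT, conf_id TEXT, insert_time TEXT)""")
cur.execute("""INSERT INTO perf_linux_t (id, conf_id, insert_time)
               SELECT id, conf_id, insert_time FROM perf_linux""")

# Save one row per (conf_id, insert_time) group into tmp.
cur.execute("CREATE TABLE tmp (id TEXT, conf_id TEXT, insert_time TEXT)")
cur.execute("""INSERT INTO tmp
               SELECT id, conf_id, insert_time FROM perf_linux_t
               WHERE auto_id IN (SELECT MIN(auto_id) FROM perf_linux_t
                                 GROUP BY insert_time, conf_id)""")

# Export the data back and drop the temporaries.
cur.execute("DELETE FROM perf_linux")  # SQLite's equivalent of TRUNCATE
cur.execute("INSERT INTO perf_linux SELECT * FROM tmp")
cur.execute("DROP TABLE tmp")
cur.execute("DROP TABLE perf_linux_t")
conn.commit()
print(cur.execute("SELECT count(*) FROM perf_linux").fetchone()[0])  # prints 3
```

The duplicate for device 1 is gone and one row remains per (device, time) pair, mirroring what the article's MySQL statements achieve.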
Summary
With this method, the table was reduced to 42387 rows and the redundant data was removed. The root cause in the program has still not been fully located, however, and further observation is needed to pin it down.
I hope this article helps you with your MySQL database work.