Delete MySQL duplicate data

Last Update:2014-12-16 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Original: Delete MySQL duplicate data

Delete MySQL duplicate data project background

In a recent Linux performance acquisition project, found that the thread of the program storage is slow, and then carefully locate, found in the database a lot of redundant data. Because in the acquisition, for the same device, the same point in time should have only one data, however, the database is stored in a number of data. For how to cause this result, for a while do not think clearly, but in order to solve the problem of slow storage, first to delete redundant data.

Problem description

The table structure of the database is simple, as follows:

+----------------+--------------+------+-----+---------+-------+| Field          | Type         | Null | Key | Default | Extra |+----------------+--------------+------+-----+---------+-------+| id             | varchar(255) | NO   | PRI | NULL    |       || conf_id        | varchar(255) | NO   | MUL | NULL    |       || insert_time    | datetime     | YES  |     | NULL    |       || cpu_usage      | float(11,2)  | YES  |     | NULL    |       || memory_usage   | float(11,2)  | YES  |     | NULL    |       || io_usage_write | float(11,2)  | YES  |     | NULL    |       || io_usage_read  | float(11,2)  | YES  |     | NULL    |       |+----------------+--------------+------+-----+---------+-------+

Querying all data volumes

select count(*) from perf_linux;

Output 427366

Query the amount of data for different devices at all points in time

select count(distinct conf_id, insert_time) from perf_linux ;

Output 42387

From the above data can be seen, data redundancy of about 10 times times.

Then take a look at the time group:

select id, conf_id ,insert_time from perf_linux order by insert_time, conf_id;

Output:

| 2a79f7cd-43a9-4c7b-adb2-316b6c04283e | 1 | 2014-12-09 15:09:14 | | 50D6F6C2-9C8B-45FD-98FD-2BE211221CFD | 1 | 2014-12-09 15:09:14 | | 740b52e1-e868-4074-ba36-74e2634401b3 | 1 | 2014-12-09 15:09:14 | | 8b0096a4-9e85-417b-a131-e3505ca79a9c | 1 | 2014-12-09 15:09:14 | | 90a9e882-5220-4508-a56f-8d4ab4a7929b | 1 | 2014-12-09 15:09:14 | | D17403ed-24a4-45e8-b51b-2a95118383d9 | 1 | 2014-12-09 15:09:14 | | 0C2DA917-579B-4080-857D-7159F38B44AC | 2 | 2014-12-09 15:09:14 | | 263083eb-8f63-4d2b-a03f-3320aa678735 | 2 | 2014-12-09 15:09:14 | | d6c57a38-080b-465a-a55a-beafd9daf32d | 2 | 2014-12-09 15:09:14 | | f672227b-1fb8-4b85-880d-2cc34b02880d | 2 | 2014-12-09 15:09:14 | | F80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2 | 2014-12-09 15:09:14 | | ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2 | 2014-12-09 15:09:14 | | 5c41e48a-abfc-4108-a00e-ca7def7d5a5a | 3 | 2014-12-09 15:09:14 | | 60B7AB9E-C91A-4020-A6D3-7BCEB1DC47C5 | 3 | 2014-12-09 on 15: 09:14 | | 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3 | 2014-12-09 15:09:14 | | D53a3df5-08c4-4604-8fac-cb51077935f6 | 3 | 2014-12-09 15:09:14 | | d9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3 | 2014-12-09 15:09:14 | | f56f82f6-32a7-47f7-ae07-b13168743884 | 3 | 2014-12-09 15:09:14 | | 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4 | 2014-12-09 15:09:14 | | 2A90AD9E-11A5-4707-95E8-78491DA658AD | 4 | 2014-12-09 15:09:14 | | 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4 | 2014-12-09 15:09:14 | | 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4 | 2014-12-09 15:09:14 | | 8404d281-f9e5-4153-a47e-128c05386758 | 4 | 2014-12-09 15:09:14 | | e042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4 | 2014-12-09 15:09:14 |+--------------------------------------+---------+---------------------+

From the visible, the same time point of the same device data redundancy, now we want to remove these redundant data.

Workaround

The idea is this: first of all, according to CONF_ID and time point to judge, group by query, each group to take one more. Grouping is very simple, but how to take a group? I used the form of an intermediate table.

Create an intermediate table and import the data into the intermediate table

create table perf_linux_t like perf_linux;

insert into perf_linux_t select * from perf_linux;

Add a field to the intermediate table, which is self-growing.

ALTER TABLE `perf_linux_t` ADD COLUMN `auto_id` INT NOT NULL AUTO_INCREMENT ,DROP PRIMARY KEY,ADD PRIMARY KEY (`auto_id`);

Delete useless data

Check it out first.

select min(auto_id) as auto_id from perf_linux_t group by insert_time ;

Delete the incorrect data

delete  from perf_linux_t where auto_id not in (select min(auto_id) as auto_id from perf_linux_t group by insert_time);

Wait, output error:

You can ' t specify target table ' perf_linux_t ' for update in FROM clause

Can not delete Ah, it can only build another intermediate table.

Re-build the intermediate table

create table tmp like perf_linux_t;

Instead of deleting non-conforming data, the data is stored in the new table.

insert into tmp select * from perf_linux_t where auto_id in (select min(auto_id) as auto_id from perf_linux_t group by insert_time,conf_id );

Remove the useless columns from this table

ALTER TABLE `tmp` DROP COLUMN `auto_id`,DROP PRIMARY KEY;

Data back

Delete the original data

truncate table perf_linux;

Inserting data

insert into perf_linux select * from tmp;

Delete an intermediate table

drop table tmp;

drop table perf_linux_t;

Summarize

By this method, the data is changed to 42,387 and the redundant data is deleted. But in fact the problem of the program is not fully positioned, but also need to observe to locate the problem.

And I think that this method should be compared to the method of soil, if you have a better way, I hope the generous enlighten.

Reference:

MySQL method for deleting duplicate record statements

Delete MySQL duplicate data

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Delete MySQL duplicate data

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Delete MySQL duplicate data

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support