Delete MySQL duplicate data


Project background

In a recent Linux performance-data collection project, I noticed that the program's storage thread was writing to the database slowly. After digging in, I found a large amount of redundant data in the database: during collection, each device should produce exactly one row per sampling time, yet the database held several rows for the same device and timestamp. I could not immediately work out what caused this, but to fix the slow writes, the first step was to delete the redundant data.

Problem description

The table structure of the database is simple, as follows:

+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id             | varchar(255) | NO   | PRI | NULL    |       |
| conf_id        | varchar(255) | NO   | MUL | NULL    |       |
| insert_time    | datetime     | YES  |     | NULL    |       |
| cpu_usage      | float(11,2)  | YES  |     | NULL    |       |
| memory_usage   | float(11,2)  | YES  |     | NULL    |       |
| io_usage_write | float(11,2)  | YES  |     | NULL    |       |
| io_usage_read  | float(11,2)  | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+

Count the total number of rows

select count(*) from perf_linux;

Output 427366

Count the number of distinct (device, timestamp) pairs

select count(distinct conf_id, insert_time) from perf_linux;

Output 42387

These numbers show that the data is redundant by roughly a factor of ten.

Then look at the rows grouped by time:

select id, conf_id ,insert_time from perf_linux order by insert_time, conf_id;

Output:

+--------------------------------------+---------+---------------------+
| id                                   | conf_id | insert_time         |
+--------------------------------------+---------+---------------------+
| 2a79f7cd-43a9-4c7b-adb2-316b6c04283e | 1       | 2014-12-09 15:09:14 |
| 50D6F6C2-9C8B-45FD-98FD-2BE211221CFD | 1       | 2014-12-09 15:09:14 |
| 740b52e1-e868-4074-ba36-74e2634401b3 | 1       | 2014-12-09 15:09:14 |
| 8b0096a4-9e85-417b-a131-e3505ca79a9c | 1       | 2014-12-09 15:09:14 |
| 90a9e882-5220-4508-a56f-8d4ab4a7929b | 1       | 2014-12-09 15:09:14 |
| D17403ed-24a4-45e8-b51b-2a95118383d9 | 1       | 2014-12-09 15:09:14 |
| 0C2DA917-579B-4080-857D-7159F38B44AC | 2       | 2014-12-09 15:09:14 |
| 263083eb-8f63-4d2b-a03f-3320aa678735 | 2       | 2014-12-09 15:09:14 |
| d6c57a38-080b-465a-a55a-beafd9daf32d | 2       | 2014-12-09 15:09:14 |
| f672227b-1fb8-4b85-880d-2cc34b02880d | 2       | 2014-12-09 15:09:14 |
| F80020fe-6cb5-48ec-beb0-4e8ebeb0ca57 | 2       | 2014-12-09 15:09:14 |
| ff633a35-824d-49ba-b78c-5bcc5df8d1cc | 2       | 2014-12-09 15:09:14 |
| 5c41e48a-abfc-4108-a00e-ca7def7d5a5a | 3       | 2014-12-09 15:09:14 |
| 60B7AB9E-C91A-4020-A6D3-7BCEB1DC47C5 | 3       | 2014-12-09 15:09:14 |
| 7b6cd2b8-ac6d-43eb-8858-e15885e676c8 | 3       | 2014-12-09 15:09:14 |
| D53a3df5-08c4-4604-8fac-cb51077935f6 | 3       | 2014-12-09 15:09:14 |
| d9e4ba14-f98d-42a8-b3bc-2879d58aa797 | 3       | 2014-12-09 15:09:14 |
| f56f82f6-32a7-47f7-ae07-b13168743884 | 3       | 2014-12-09 15:09:14 |
| 076c4c1b-0028-4a9c-a8c4-de655bd6ab6b | 4       | 2014-12-09 15:09:14 |
| 2A90AD9E-11A5-4707-95E8-78491DA658AD | 4       | 2014-12-09 15:09:14 |
| 3b17ad1d-e589-4b65-93a7-d61fc99b4071 | 4       | 2014-12-09 15:09:14 |
| 6988d6cf-44ef-47f7-808d-09791caf2d90 | 4       | 2014-12-09 15:09:14 |
| 8404d281-f9e5-4153-a47e-128c05386758 | 4       | 2014-12-09 15:09:14 |
| e042e310-7ff2-4e4d-8c98-71e3e4d57828 | 4       | 2014-12-09 15:09:14 |
+--------------------------------------+---------+---------------------+

As the output shows, each device has multiple rows for the same timestamp. Now we want to remove these redundant rows.

Solution

The idea is this: group the rows by conf_id and insert_time, and keep exactly one row from each group. Grouping is simple, but how do you keep just one row per group? I used an intermediate table.

Create an intermediate table and import the data into it
create table perf_linux_t like perf_linux;
insert into perf_linux_t select * from perf_linux;
Add an auto-increment column to the intermediate table and make it the primary key.
ALTER TABLE `perf_linux_t` ADD COLUMN `auto_id` INT NOT NULL AUTO_INCREMENT, DROP PRIMARY KEY, ADD PRIMARY KEY (`auto_id`);
Delete the redundant data

First, check which rows should be kept:

select min(auto_id) as auto_id from perf_linux_t group by conf_id, insert_time;

Then try to delete the redundant rows:

delete from perf_linux_t where auto_id not in (select min(auto_id) as auto_id from perf_linux_t group by conf_id, insert_time);

But this fails with an error:

You can't specify target table 'perf_linux_t' for update in FROM clause

MySQL refuses to delete from a table that the same statement also selects from, so I had to create another intermediate table.
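As an aside, this error can also be worked around without a second table by wrapping the subquery in a derived table, which forces MySQL to materialize it first. A sketch (not the approach used below):

```sql
-- Wrapping the subquery in a derived table (aliased t here) makes MySQL
-- materialize the list of surviving ids before the DELETE runs, so the
-- target table is no longer referenced directly in the FROM clause.
DELETE FROM perf_linux_t
WHERE auto_id NOT IN (
    SELECT auto_id FROM (
        SELECT MIN(auto_id) AS auto_id
        FROM perf_linux_t
        GROUP BY conf_id, insert_time
    ) AS t
);
```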

Create a second intermediate table
create table tmp like perf_linux_t;

Instead of deleting the redundant rows, insert the rows we want to keep into the new table:

insert into tmp select * from perf_linux_t where auto_id in (select min(auto_id) as auto_id from perf_linux_t group by insert_time, conf_id);

Remove the auxiliary auto_id column from this table:

ALTER TABLE `tmp` DROP COLUMN `auto_id`, DROP PRIMARY KEY;
Copy the data back

Delete the original data

truncate table perf_linux;

Insert the deduplicated data

insert into perf_linux select * from tmp;
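At this point it is worth confirming that the deduplication actually worked before dropping anything. This check is my addition, not part of the original procedure:

```sql
-- Both counts should now agree (42387 in this case):
-- exactly one row per device per timestamp.
SELECT COUNT(*) FROM perf_linux;
SELECT COUNT(DISTINCT conf_id, insert_time) FROM perf_linux;
```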

Drop the intermediate tables

drop table tmp;
drop table perf_linux_t;
Summary

With this method the table is reduced to 42,387 rows and the redundant data is gone. However, the root cause in the program has not been fully identified yet; further observation is needed to track it down.
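While the root cause is being tracked down, one way to keep the duplicates from coming back is to enforce the invariant at the database level. This is my suggestion, not part of the original fix:

```sql
-- With this unique key in place, a second insert for the same device and
-- timestamp fails with a duplicate-key error instead of silently creating
-- a redundant row (the writer can use INSERT IGNORE to skip such rows).
ALTER TABLE perf_linux ADD UNIQUE KEY uk_conf_time (conf_id, insert_time);
```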

I admit this method is rather crude; if you know a better way, please share it.
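For what it is worth, MySQL's multi-table DELETE syntax allows a shorter alternative that needs no intermediate tables at all. A sketch, assuming the duplicate rows are otherwise interchangeable so any one survivor per group is acceptable:

```sql
-- Self-join the table on (conf_id, insert_time); for each group of
-- duplicates, every row except the one with the smallest id matches the
-- a.id > b.id condition and is deleted. Multi-table DELETE is allowed to
-- reference the target table in the join, unlike a subquery.
DELETE a
FROM perf_linux AS a
JOIN perf_linux AS b
  ON a.conf_id = b.conf_id
 AND a.insert_time = b.insert_time
 AND a.id > b.id;
```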

