Summary and Efficiency Comparison of Methods for deleting duplicate records in MySQL tables

Source: Internet
Author: User

The following methods are used to test on my VM. The memory is 384 Mb, the swap partition is 1024 MB, and the test has 3.5 million data records. The requirement is as follows:

Createdatabase /*! 32312 if not exists */'test '/*! 40100 default Character Set utf8 */;

Use 'test ';

/* Table Structure for table 'test '*/

Droptableifexists 'test ';

Createtable 'test '(
'Id' int (11) notnull auto_increment,
'Name' char (20) defaultnull comment 'name ',
'Age' tinyint (4) defaultnull comment 'age ',
'Mate 'tinyint (4) default '1' comment' with or without a spouse (1-0-none )',
Primarykey ('id '),
Key 'idx _ name' ('name '),
Key 'idx _ age' ('age ')
) Engine = MyISAM auto_increment = 10 default charset = utf8;

/*! 40101 set SQL _mode = @ old_ SQL _mode */;
/*! 40014 set foreign_key_checks = @ old_foreign_key_checks */;

Existing records:

Insertinto 'test' ('id', 'name', 'age', 'mate ') values (2, 'aaaaa ),
(3, 'bbbbb ),
(4, 'cccc ),
(5, 'ddddd', 26, 0 ),
(6, 'eee', 24, 0 ),
(7, 'fffff', 18, 0 ),
(8, 'eeeee ),
(9, 'eeeee', 60, 1 );

To remove the record with duplicate names, use the following method:

1. Change the name field to a unique index:

Dropindex idx_name on test;
Altertable test adduniqueindex (name );

In this way, when the same record is added to the table, the error 1062 is returned.
However, there is a situation where the table already has n Repeated Records. At this time, we remember to add a unique index and then execute the above operations, the database will tell you that there are already repeated records and index creation fails. At this time, we can use the following operations:

Alter ignore table test addunique idx_name (name );

It will delete duplicate records (don't worry, it will keep one record) (but this command is available before mysql5.1.37 and won't work after 5.1.48), and then create a unique index, efficient and user-friendly.

2. method 1 of table reconstruction:

Create another table. To prevent the loss of the original table structure, create a temporary table ",

Use 'test ';

/* Table Structure for table 'test '*/

Droptableifexists 'uniq _ test ';

Createtable 'uniq _ test '(
'Id' int (11) notnull auto_increment,
'Name' char (20) defaultnull comment 'name ',
'Age' tinyint (4) defaultnull comment 'age ',
'Mate 'tinyint (4) default '1' comment' with or without a spouse (1-0-none )',
Primarykey ('id '),
Key 'idx _ name' ('name '),
Key 'idx _ age' ('age ')
) Engine = MyISAM auto_increment = 10 default charset = utf8;

Search for data from the test table and add it to uniq_test:

Insertinto uniq_test select * from test groupby name;
Drop table test;
Rename table uniq_test to test;

3. Delete duplicate records:

Create a table to store the ID of the record to be deleted:

Createtable 'tmp _ kids '(
'Id' int (11 ),
'Name' char ( 20)
) Engine = MyISAM;

If there are not many records to delete, you can create the table as a memory table:

Createtable 'tmp _ kids '(
'Id' int (11 ),
'Name' char ( 20)
) Engine = heap;

Delete duplicate records in the test table:

Insertinto tmp_ids selectmin (ID), name from test groupby name havingcount (*)> 1 order by NULL;
Delete A. * from test A, tmp_ids B where B. Name = A. Name and A. ID> B. ID;
Truncatetable tmp_ids;

4. inefficient methods

Delete test as a from test as,
(
Select *
From Test
Groupby name
Havingcount (1)> 1
Orderbynull
) As B
Where a. Name = B. Name and A. ID> B. ID;

Summary:

The first method has a history of 22 minutes and the system load is about 5 minutes;

The second method is very inefficient. It destroys the unknown index file and terminates execution.

The third method takes 17 minutes.

Insertinto tmp_ids selectmin (ID), name from test groupby name havingcount (*)> 1 orderbynull

It takes 15 minutes, the delete operation takes 2 minutes, and the system load is about 3 minutes.

Method 4: During the execution, all the index files of test are damaged, so the "power" is huge;

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.