MySQL: finding duplicate rows and keeping one record per group based on a condition

Source: Internet
Author: User

Summary: how to find duplicate rows in MySQL and keep one record from each set of duplicates according to a given condition:

 

 

In the first case, the record to keep is determined by a primary key or unique value (here, the id column).

 

--
-- Table structure for table `test`
--

CREATE TABLE IF NOT EXISTS `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(16) NOT NULL,
  `phone` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

--
-- Dumping data for table `test`
--

INSERT INTO `test` (`id`, `name`, `phone`) VALUES
(1, 'A', 1234),
(2, 'A', 3333),
(3, 'B', 555),
(4, 'B', 6773),
(5, 'A', 743),
(6, 'C', 95434);

Query:

SELECT * FROM `test` GROUP BY name;

Result:

id  name  phone
1   A     1234
3   B     555
6   C     95434
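To list only the names that really are duplicated, and how many rows each has, the grouping can be filtered with HAVING. The sketch below is a minimal illustration, not the author's code: it runs the query against an in-memory SQLite database standing in for MySQL, with the same six rows as the dump above, and the SQL string is also valid MySQL. It deliberately selects only aggregated columns, because MySQL does not define which row supplies non-aggregated columns such as phone in `SELECT * ... GROUP BY name` (and with ONLY_FULL_GROUP_BY, enabled by default since MySQL 5.7.5, such a query is rejected). The aliases copies, first_id and last_id are illustrative.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL `test` table (same six rows as the dump).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO test (id, name, phone) VALUES (?, ?, ?)",
    [(1, "A", 1234), (2, "A", 3333), (3, "B", 555),
     (4, "B", 6773), (5, "A", 743), (6, "C", 95434)],
)

# Only aggregates are selected, so every value in the result is well defined.
# HAVING filters after grouping: names that occur once (here "C") drop out.
FIND_DUPLICATE_NAMES = """
SELECT name, COUNT(*) AS copies, MIN(id) AS first_id, MAX(id) AS last_id
FROM test
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY name
"""
dupes = conn.execute(FIND_DUPLICATE_NAMES).fetchall()
print(dupes)  # [('A', 3, 1, 5), ('B', 2, 3, 4)]
```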

But what if, for each name, we want the row with the largest id?

SELECT MAX(id), id, name, phone FROM test GROUP BY name;

Result:

MAX(id)  id  name  phone
5        1   A     1234
4        3   B     555
6        6   C     95434

We can see that although the maximum id of each name is returned, the other columns still come from the first row of each name.
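MAX(id) is aggregated independently of the other columns, so it does not pull phone from the same row. A common portable way to fetch the whole row (often called the groupwise-maximum pattern) is to compute the largest id per name in a derived table and join back on it. A minimal sketch, using an in-memory SQLite database as a stand-in for MySQL with the same six rows; the alias max_id and the constant names are illustrative.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL `test` table (same six rows as the dump).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO test (id, name, phone) VALUES (?, ?, ?)",
    [(1, "A", 1234), (2, "A", 3333), (3, "B", 555),
     (4, "B", 6773), (5, "A", 743), (6, "C", 95434)],
)

# Step 1: the derived table g holds one (name, max_id) pair per name.
# Step 2: joining on name and id fetches every column of exactly that row.
LATEST_ROW_PER_NAME = """
SELECT t.id, t.name, t.phone
FROM test AS t
JOIN (SELECT name, MAX(id) AS max_id FROM test GROUP BY name) AS g
  ON t.name = g.name AND t.id = g.max_id
ORDER BY t.name
"""
rows = conn.execute(LATEST_ROW_PER_NAME).fetchall()
print(rows)  # [(5, 'A', 743), (4, 'B', 6773), (6, 'C', 95434)]
```

Because id is the primary key, the join yields exactly one row per name. On MySQL an index on (name, id) usually helps the derived table; EXPLAIN shows whether it is used.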

 

Using a subquery:

SELECT * FROM (SELECT * FROM test ORDER BY id DESC) t GROUP BY name;

Result:

id  name  phone
5   A     743
4   B     6773
6   C     95434

This is the expected result. When the grouping is executed, the records appear to follow the order set by the subquery's ORDER BY (how GROUP BY and ORDER BY interact here still needs to be confirmed). However, with a large number of rows this method effectively copies the entire table once, so it is slow: testing it on 31,144 rows took 58.437 s.
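The result above depends on GROUP BY taking the first row of each group in the derived table's order. MySQL does not guarantee that: rows in a derived table have no defined order, the optimizer may ignore the inner ORDER BY, and ONLY_FULL_GROUP_BY (the default since MySQL 5.7.5) rejects the query. On MySQL 8.0 and later, the ROW_NUMBER() window function expresses "row with the largest id per name" deterministically. A minimal sketch, run on an in-memory SQLite database as a stand-in for MySQL (window functions need SQLite 3.25 or newer); the alias rn is illustrative.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL `test` table (same six rows as the dump).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO test (id, name, phone) VALUES (?, ?, ?)",
    [(1, "A", 1234), (2, "A", 3333), (3, "B", 555),
     (4, "B", 6773), (5, "A", 743), (6, "C", 95434)],
)

# Number the rows within each name, largest id first, then keep row number 1.
# The ORDER BY inside OVER() is part of the window definition, so it is honored.
LATEST_BY_WINDOW = """
SELECT id, name, phone
FROM (
    SELECT id, name, phone,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
    FROM test
) AS ranked
WHERE rn = 1
ORDER BY name
"""
rows = conn.execute(LATEST_BY_WINDOW).fetchall()
print(rows)  # [(5, 'A', 743), (4, 'B', 6773), (6, 'C', 95434)]
```

Changing `ORDER BY id DESC` inside OVER() to another column (for example a timestamp) changes which row counts as the one to keep without touching the rest of the query.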

 

Using an IN subquery:

SELECT * FROM test t WHERE id IN (SELECT MAX(id) FROM test GROUP BY name);

Result:

id  name  phone
4   B     6773
5   A     743
6   C     95434

However, testing this IN subquery on the same 31,144 rows took more than 0.15 million seconds, far slower than the previous method. Therefore, it is best not to use this query when the data volume is large.
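Before MySQL 5.6, an IN (subquery) was commonly executed as a dependent subquery, re-evaluated for each outer row, which may account for timings like the one above. Another formulation of "largest id per name" is an anti-join: keep a row when no other row with the same name has a larger id. Whether it outperforms the IN or join forms depends on the MySQL version and the available indexes (an index on (name, id) usually matters most), so it is worth comparing with EXPLAIN on real data. A minimal sketch on an in-memory SQLite database standing in for MySQL; the alias newer is illustrative.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL `test` table (same six rows as the dump).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO test (id, name, phone) VALUES (?, ?, ?)",
    [(1, "A", 1234), (2, "A", 3333), (3, "B", 555),
     (4, "B", 6773), (5, "A", 743), (6, "C", 95434)],
)

# A row survives only if no row with the same name has a larger id.
LATEST_BY_NOT_EXISTS = """
SELECT t.id, t.name, t.phone
FROM test AS t
WHERE NOT EXISTS (
    SELECT 1 FROM test AS newer
    WHERE newer.name = t.name AND newer.id > t.id
)
ORDER BY t.name
"""
rows = conn.execute(LATEST_BY_NOT_EXISTS).fetchall()
print(rows)  # [(5, 'A', 743), (4, 'B', 6773), (6, 'C', 95434)]
```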

 

In the second case, the record to keep is chosen by a column whose values can repeat, rather than by a primary key or unique value.

 

--
-- Table structure for table `message`
--

CREATE TABLE `message` (
  `pid` int(11) NOT NULL,
  `nums` int(11) DEFAULT NULL,
  `source_type` char(255) DEFAULT NULL,
  `source_id` int(11) DEFAULT NULL,
  `source_name` char(255) DEFAULT NULL,
  `columnid` int(11) DEFAULT NULL,
  `category` char(255) DEFAULT NULL,
  `cp_id` char(255) DEFAULT NULL,
  `company` char(255) DEFAULT NULL,
  `online` char(255) DEFAULT NULL,
  `last_modify_time` char(255) DEFAULT NULL,
  `operator_system` char(255) DEFAULT NULL,
  `java_platform` char(255) DEFAULT NULL,
  `screen_length` int(11) DEFAULT NULL,
  `screen_width` int(11) DEFAULT NULL,
  PRIMARY KEY (`pid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

 

Rows are considered duplicates when they match on the following nine fields, and they are grouped and sorted in this field order:
online, source_type, columnid, source_name, category, operator_system, java_platform, screen_length, screen_width

Among duplicate rows, the field last_modify_time decides which record to keep: the earliest record is retained and the other duplicates are deleted.

 

SELECT * FROM (SELECT * FROM message ORDER BY last_modify_time) t GROUP BY online, source_type, columnid, source_name, category, operator_system, java_platform, screen_length, screen_width;

 

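As in the first case, no separate ORDER BY is needed to sort the output: the GROUP BY already returns the groups in sorted order (MySQL versions before 8.0 sort GROUP BY results implicitly; MySQL 8.0 no longer does).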
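The query above selects the rows to keep but does not delete anything, and it relies on the same unguaranteed GROUP BY behavior as the first case. The sketch below shows one way to do the deletion with ROW_NUMBER() (MySQL 8.0+ or SQLite 3.25+), under these assumptions: last_modify_time holds timestamps as 'YYYY-MM-DD HH:MM:SS' strings, so string order equals time order (the column is char(255)); ties on last_modify_time are broken by the smaller pid; and the table is a hypothetical reduced message table in an in-memory SQLite database with made-up sample rows. PARTITION BY, like GROUP BY, places NULLs in the same group, which a plain `=` comparison would not. On MySQL, referencing the DELETE target table in a subquery can trigger error 1093, so test the statement there first or rewrite it as a multi-table DELETE ... JOIN.

```python
import sqlite3

# The nine fields that define a duplicate, in the article's order.
KEY_FIELDS = [
    "online", "source_type", "columnid", "source_name", "category",
    "operator_system", "java_platform", "screen_length", "screen_width",
]

conn = sqlite3.connect(":memory:")
# Hypothetical reduced `message` table: pid, the nine key fields, the timestamp.
conn.execute("""
CREATE TABLE message (
    pid INTEGER PRIMARY KEY,
    online TEXT, source_type TEXT, columnid INTEGER, source_name TEXT,
    category TEXT, operator_system TEXT, java_platform TEXT,
    screen_length INTEGER, screen_width INTEGER,
    last_modify_time TEXT
)""")

# Made-up sample rows: pids 1-3 form one duplicate group, pid 4 differs in
# screen_width, and pids 5-6 form a second group that shares a NULL category.
base = ("yes", "game", 7, "src", "puzzle", "android", "midp2", 320, 240)
no_category = base[:4] + (None,) + base[5:]
sample = [
    (1, *base, "2011-05-02 10:00:00"),
    (2, *base, "2011-05-01 09:00:00"),
    (3, *base, "2011-05-03 08:00:00"),
    (4, *base[:8], 176, "2011-05-03 08:00:00"),
    (5, *no_category, "2011-05-04 12:00:00"),
    (6, *no_category, "2011-05-05 12:00:00"),
]
conn.executemany(f"INSERT INTO message VALUES ({', '.join('?' * 11)})", sample)

# Rank rows within each nine-field group, earliest first (pid breaks ties),
# then delete every row ranked after the first in its group.
partition = ", ".join(KEY_FIELDS)
DELETE_LATER_DUPLICATES = f"""
DELETE FROM message
WHERE pid IN (
    SELECT pid FROM (
        SELECT pid,
               ROW_NUMBER() OVER (
                   PARTITION BY {partition}
                   ORDER BY last_modify_time, pid
               ) AS rn
        FROM message
    ) AS ranked
    WHERE rn > 1
)
"""
conn.execute(DELETE_LATER_DUPLICATES)
conn.commit()
kept = [pid for (pid,) in conn.execute("SELECT pid FROM message ORDER BY pid")]
print(kept)  # [2, 4, 5]
```

Running the inner SELECT on its own first (with `WHERE rn > 1`) previews exactly which pids would be deleted.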
