Summary: how to find duplicate rows in MySQL and keep one record from each group of duplicates according to a given condition.
In the first case, the record to keep is chosen by a primary key or another unique value.
--
-- Table structure for `test`
--
CREATE TABLE IF NOT EXISTS `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(16) NOT NULL,
  `phone` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
--
-- Dumping data for table `test`
--
INSERT INTO `test` (`id`, `name`, `phone`) VALUES
(1, 'A', 1234),
(2, 'A', 3333),
(3, 'B', 555),
(4, 'B', 6773),
(5, 'A', 743),
(6, 'C', 95434);
Query:
SELECT * FROM `test` GROUP BY name;
Result:
id  name  phone
1   A     1234
3   B     555
6   C     95434
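The query above selects columns (id, phone) that are neither grouped nor aggregated. MySQL accepts this only when the ONLY_FULL_GROUP_BY SQL mode is off; that mode is enabled by default from MySQL 5.7.5, where the query is rejected with an error. Even when the query is accepted, which row supplies the ungrouped columns is not guaranteed. The following is a minimal sketch, assuming MySQL 5.7 or later, of how to check the mode and how to state explicitly that any value from the group is acceptable:

```sql
-- Show the SQL modes active in the current session;
-- look for ONLY_FULL_GROUP_BY in the result.
SELECT @@SESSION.sql_mode;

-- With ONLY_FULL_GROUP_BY enabled, ANY_VALUE() tells MySQL that an
-- arbitrary value from each group is acceptable for that column.
-- Note: id and phone are not guaranteed to come from the same row.
SELECT ANY_VALUE(id) AS id, name, ANY_VALUE(phone) AS phone
FROM test
GROUP BY name;
```

This only makes the nondeterminism explicit; it does not control which row is kept.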
But what should we do if we want the row with the largest id for each name?
SELECT MAX(id), id, name, phone FROM test GROUP BY name;
Result:
MAX(id)  id  name  phone
5        1   A     1234
4        3   B     555
6        6   C     95434
We can see that although the maximum id for each name is returned, the other columns still come from the first row of each name.
Using a subquery:
SELECT * FROM (SELECT * FROM test ORDER BY id DESC) t GROUP BY name;
Result:
id  name  phone
5   A     743
4   B     6773
6   C     95434
This is the expected result. The output also appears to be sorted by name once GROUP BY is applied (the exact relationship between GROUP BY and ORDER BY still needs to be confirmed). However, on a table with many rows this method is equivalent to copying the entire table once, so it is inefficient: in a test on 31144 rows it took 58.437 s.
Using a subquery with IN:
SELECT * FROM test t WHERE id IN (SELECT MAX(id) FROM test GROUP BY name);
Result:
id  name  phone
4   B     6773
5   A     743
6   C     95434
However, this kind of subquery uses IN, and in the test on 31144 rows it was even less efficient, with a reported time on the order of 0.15 million s. Therefore, when the data volume is large, it is best not to use this query.
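A commonly used alternative, which is not one of the queries timed above, is to join the table against a derived table of per-name maximum ids instead of using IN. The sketch below assumes the `test` table defined earlier; the aliases `latest` and `max_id` are illustrative names. No timing was measured for it here.

```sql
-- Sketch: keep the row with the largest id for each name.
-- The derived table is computed once, then joined on the primary key.
SELECT t.id, t.name, t.phone
FROM test AS t
JOIN (
    SELECT MAX(id) AS max_id   -- largest id within each name
    FROM test
    GROUP BY name
) AS latest ON t.id = latest.max_id
ORDER BY t.id;
```

On the sample data this selects the rows with id 4, 5, and 6. Because every selected column comes from the joined row itself, the query does not depend on how MySQL picks values for ungrouped columns.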
In the second case, the record to keep is chosen by a value that may itself be repeated rather than by a unique key.
--
-- Table structure for `message`
--
CREATE TABLE `message` (
  `pid` int(11) NOT NULL,
  `nums` int(11) DEFAULT NULL,
  `source_type` char(255) DEFAULT NULL,
  `source_id` int(11) DEFAULT NULL,
  `source_name` char(255) DEFAULT NULL,
  `columnid` int(11) DEFAULT NULL,
  `category` char(255) DEFAULT NULL,
  `cp_id` char(255) DEFAULT NULL,
  `company` char(255) DEFAULT NULL,
  `online` char(255) DEFAULT NULL,
  `last_modify_time` char(255) DEFAULT NULL,
  `operator_system` char(255) DEFAULT NULL,
  `java_platform` char(255) DEFAULT NULL,
  `screen_length` int(11) DEFAULT NULL,
  `screen_width` int(11) DEFAULT NULL,
  PRIMARY KEY (`pid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Two records are considered duplicates when the following nine fields all match; the data is grouped and sorted by these fields in this order:
online, source_type, columnid, source_name, category, operator_system, java_platform, screen_length, screen_width
Among duplicate records, the one to keep is determined by the field last_modify_time: the earliest record is retained and the other duplicates are deleted.
SELECT * FROM (SELECT * FROM message21 ORDER BY last_modify_time) t
GROUP BY online, source_type, columnid, source_name, category,
         operator_system, java_platform, screen_length, screen_width;
As in the first case, no separate sort on the grouping fields is needed: GROUP BY already returns the groups in sorted order.
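The query above only selects the records to keep; it does not remove the others. One possible way to carry out the deletion is a self-join DELETE, sketched below under several assumptions that do not come from the original: the table is named message21 as in the query above, last_modify_time is stored as a string that sorts chronologically (for example 'YYYY-MM-DD HH:MM:SS'), and ties on last_modify_time are broken by the smaller pid. The alias names m and older are illustrative. It is worth running the equivalent SELECT on a copy of the table first.

```sql
-- Sketch: delete every row for which an earlier duplicate exists,
-- so that only the earliest row of each group of duplicates remains.
-- <=> is MySQL's NULL-safe equality, so two NULLs count as a match
-- (all nine grouping columns are nullable).
DELETE m
FROM message21 AS m
JOIN message21 AS older
  ON  m.online          <=> older.online
  AND m.source_type     <=> older.source_type
  AND m.columnid        <=> older.columnid
  AND m.source_name     <=> older.source_name
  AND m.category        <=> older.category
  AND m.operator_system <=> older.operator_system
  AND m.java_platform   <=> older.java_platform
  AND m.screen_length   <=> older.screen_length
  AND m.screen_width    <=> older.screen_width
  -- "older" must be strictly earlier, or equally old with a smaller pid.
  AND (older.last_modify_time < m.last_modify_time
       OR (older.last_modify_time = m.last_modify_time
           AND older.pid < m.pid));
```

Rows whose last_modify_time is NULL are never matched by the time comparison, so this sketch leaves them in place; they would need to be handled separately.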