How to filter duplicate records in MySQL using DISTINCT

This article describes how to use DISTINCT in MySQL to filter out duplicate records.


Below we use a few simple query examples to show how MySQL implements DISTINCT.

1. First, let's take a look at how to complete the DISTINCT operation through a loose index scan:

The code is as follows:
sky@localhost : example 11:03:41> EXPLAIN SELECT DISTINCT group_id
    -> FROM group_message\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: NULL
          key: idx_gid_uid_gc
      key_len: 4
          ref: NULL
         rows: 10
        Extra: Using index for group-by
1 row in set (0.00 sec)

The Extra column of the execution plan clearly shows "Using index for group-by". What does that mean? Why does the plan say that a GROUP BY was performed through the index when the query contains no GROUP BY at all? The answer lies in how DISTINCT is implemented: MySQL also has to group the data and then return one row from each group to the client. The Extra information here tells us that MySQL completed the whole operation with a loose index scan. Of course, it would be easier to understand if the MySQL Query Optimizer reported "Using index for distinct" here instead.
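The idea of a loose index scan can be pictured with a small Python sketch. This is only a conceptual model, not MySQL's actual implementation: the index is represented as a sorted list of hypothetical (group_id, user_id) pairs, and the function name loose_index_scan is made up for illustration. The point is that each group is skipped with a single jump instead of being read entry by entry.

```python
import math
from bisect import bisect_right


def loose_index_scan(index_keys):
    """Return the distinct leading values of a sorted list of (group_id, user_id) pairs.

    Conceptual model only: after reading the first entry of a group, a binary
    search jumps straight past the remaining entries of that group.
    """
    distinct = []
    pos = 0
    while pos < len(index_keys):
        group_id = index_keys[pos][0]
        distinct.append(group_id)
        # (group_id, +inf) sorts after every real entry of this group, so
        # bisect_right lands on the first entry of the next group.
        pos = bisect_right(index_keys, (group_id, math.inf), lo=pos)
    return distinct


# Hypothetical index contents, already sorted the way a B-tree index would be.
sample_index = [(1, 10), (1, 11), (2, 10), (2, 12), (2, 12), (5, 3)]
print(loose_index_scan(sample_index))
```

Only one entry per group is actually inspected, which is why the estimated row count in such a plan can be much smaller than the table size.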

2. Next, let's look at an example of a tight index scan:

The code is as follows:
sky@localhost : example 11:03:53> EXPLAIN SELECT DISTINCT user_id
    -> FROM group_message
    -> WHERE group_id = 2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: ref
possible_keys: idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: const
         rows: 4
        Extra: Using where; Using index
1 row in set (0.00 sec)

The plan here is exactly the same as for a GROUP BY implemented through a tight index scan. While executing this query, MySQL has the storage engine scan all index keys with group_id = 2 to obtain every user_id, and then takes advantage of the index being sorted: it keeps one entry each time the user_id key value changes, so the whole DISTINCT operation is finished once all index keys with group_id = 2 have been scanned.
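The scan described above can be sketched in Python as follows. Again this is a conceptual model under stated assumptions, not MySQL code: the index is a sorted list of hypothetical (group_id, user_id) pairs and tight_index_scan_distinct is an illustrative name. Unlike the loose scan, every entry in the matching range is read, and a value is emitted only when it differs from the previous one.

```python
import math
from bisect import bisect_left, bisect_right


def tight_index_scan_distinct(index_keys, group_id):
    """Return distinct user_id values for one group_id from a sorted index.

    Conceptual model only: read every entry in the group_id range and emit a
    user_id whenever it changes. No sort or lookup table is needed because
    the entries are already ordered by user_id within the group.
    """
    lo = bisect_left(index_keys, (group_id, -math.inf))
    hi = bisect_right(index_keys, (group_id, math.inf))
    result = []
    previous = object()  # sentinel that never equals a real user_id
    for _, user_id in index_keys[lo:hi]:
        if user_id != previous:
            result.append(user_id)
            previous = user_id
    return result


# Hypothetical index contents, sorted by (group_id, user_id).
sample_index = [(1, 7), (2, 3), (2, 3), (2, 5), (2, 9), (2, 9), (3, 1)]
print(tight_index_scan_distinct(sample_index, 2))
```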

3. Now let's look at what happens when DISTINCT cannot be completed with the index alone:

The code is as follows:
sky@localhost : example 11:04:40> EXPLAIN SELECT DISTINCT user_id
    -> FROM group_message
    -> WHERE group_id > 1 AND group_id < 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: NULL
         rows: 32
        Extra: Using where; Using index; Using temporary
1 row in set (0.00 sec)

When MySQL cannot complete DISTINCT by relying on the index alone, it has to fall back on a temporary table. Note, however, one difference from how GROUP BY is handled with a temporary table: there is no filesort. In MySQL's grouping algorithm, grouping does not necessarily require sorting, as I mentioned in the GROUP BY optimization tips earlier. Here MySQL performs DISTINCT by grouping without sorting, so the filesort step is absent.
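Why no sort is needed can be illustrated with a small Python sketch. It is a rough analogy rather than MySQL's real temporary-table code: a dict keyed by user_id plays the role of a temporary table with a unique key, and the function name and sample data are hypothetical. Duplicates are dropped on insertion, so the result never has to be sorted.

```python
def distinct_via_temporary_table(index_keys, low, high):
    """Collect distinct user_id values for low < group_id < high.

    Rough analogy only: the dict stands in for a temporary table with a
    unique key on user_id. Inserting a duplicate is a no-op, so duplicates
    are eliminated without sorting anything.
    """
    temporary = {}
    for group_id, user_id in index_keys:
        if low < group_id < high:
            temporary.setdefault(user_id, None)
    return list(temporary)


# Hypothetical index contents, sorted by (group_id, user_id). Within the
# range the user_id values are not globally ordered, which is why the index
# order alone cannot be used to spot duplicates.
sample_index = [(1, 4), (2, 3), (2, 9), (3, 9), (5, 1), (5, 3), (10, 8)]
print(distinct_via_temporary_table(sample_index, 1, 10))
```

The values come back in first-seen order, not sorted order, which matches the absence of a filesort step in the plan.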

4. Finally, let's try DISTINCT together with GROUP BY:

The code is as follows:
sky@localhost : example 11:05:06> EXPLAIN SELECT DISTINCT MAX(user_id)
    -> FROM group_message
    -> WHERE group_id > 1 AND group_id < 10
    -> GROUP BY group_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: group_message
         type: range
possible_keys: idx_gid_uid_gc
          key: idx_gid_uid_gc
      key_len: 4
          ref: NULL
         rows: 32
        Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

This last example uses an aggregate function together with GROUP BY. Compared with the third example above, the plan contains an additional filesort. This is because of the MAX function: the maximum value of each group cannot be obtained through the index alone here, so MySQL has to sort.

When using MySQL, you sometimes need to query records that are unique on one field. MySQL provides the DISTINCT keyword to filter out redundant duplicate records and keep only one of each, but in practice it is mostly used to count the distinct values rather than to return the full rows for them. The reason is that DISTINCT can only deduplicate on its target fields; it cannot return other fields alongside them. This problem plagued me for a long time.


Let's take a look at the example below:

The code is as follows:

table
id  name
1   a
2   b
3   c
4   c
5   b

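The table structure is roughly like this. This is just a simple example; the actual situation is much more complicated. (The name table is only a placeholder; since TABLE is a reserved word in MySQL, a real table with that name would have to be quoted with backticks.)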

For example, to query all data with no duplicate names in a single statement, you have to use DISTINCT to remove the redundant duplicate records.

The code is as follows:
SELECT DISTINCT name FROM table

The result is:

name
a
b
c

It seems the goal has been achieved, but what if I also want the id values? Modify the query statement:

The code is as follows:
SELECT DISTINCT name, id FROM table

The result is:

name  id
a     1
b     2
c     3
c     4
b     5

Why does DISTINCT seem not to work? It does work, but it applies to both fields at once: a row is removed only when both its id and its name are the same as another row's. Then we can modify the query statement:

The code is as follows:
SELECT id, DISTINCT name FROM table


Unfortunately, this returns nothing but an error message: DISTINCT must be placed at the beginning of the select list. Is it possible to put DISTINCT in the WHERE condition instead? I tried, and that reports an error as well.
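The behavior seen so far can be reproduced in a small sandbox. The sketch below is an assumption-laden stand-in, not the original setup: it uses Python's sqlite3 module with an in-memory SQLite database instead of MySQL, and a hypothetical table named t instead of table. On these particular points (DISTINCT applies to the whole select list and must come directly after SELECT) SQLite behaves the same way.

```python
import sqlite3

# Hypothetical sandbox: SQLite stands in for MySQL, and the table is called t
# because "table" is a reserved word.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t (id, name) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "c"), (5, "b")],
)

# DISTINCT over one column: duplicate names collapse into one row each.
names_only = conn.execute("SELECT DISTINCT name FROM t ORDER BY name").fetchall()

# DISTINCT over two columns: every (name, id) pair is already unique,
# so no row is removed.
name_and_id = conn.execute("SELECT DISTINCT name, id FROM t ORDER BY id").fetchall()

# DISTINCT in the middle of the select list is rejected as a syntax error.
try:
    conn.execute("SELECT id, DISTINCT name FROM t")
    misplaced_distinct_accepted = True
except sqlite3.OperationalError:
    misplaced_distinct_accepted = False

print(names_only)
print(name_and_id)
print(misplaced_distinct_accepted)
```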

I tried for a long time without success. Finally I found a usage in the MySQL manual: combining GROUP_CONCAT(DISTINCT name) with GROUP BY name gives what I need. Let's try it right away.

It reported an error. How depressing: even the MySQL manual let me down. It gave me hope first, and then pushed me into disappointment.

I checked again. The GROUP_CONCAT function is only supported from version 4.1 onward, and I was on 4.0. There was no other way but to upgrade, and the upgrade succeeded.

In the end, the customer had to be asked to upgrade as well.

Then an idea suddenly struck me: if the GROUP_CONCAT function can be used, can other functions be used too?

I tried the COUNT function, and it worked. I could cry over how much time this took, when it turns out to be so simple.

Now here is the complete statement:

The code is as follows:

SELECT id, name, COUNT(DISTINCT name) FROM table GROUP BY name

Result:

id  name  COUNT(DISTINCT name)
1   a     1
2   b     1
3   c     1


The last column is redundant, so just ignore it. The goal is achieved.

Oh, by the way, GROUP BY must be placed before ORDER BY and LIMIT; otherwise an error is reported.
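One caveat about this statement: id is neither aggregated nor part of the GROUP BY, so which id comes back for each name is left to the server, and MySQL rejects such a query when the ONLY_FULL_GROUP_BY SQL mode is enabled. A possible alternative, shown here only as a sketch and not as the author's method, is to make the choice explicit with MIN(id). The example runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, and the table name t is hypothetical.

```python
import sqlite3

# Hypothetical sandbox: SQLite stands in for MySQL; t replaces the
# placeholder table name used in the article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t (id, name) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "c"), (5, "b")],
)

# One row per name, explicitly keeping the smallest id of each group.
one_row_per_name = conn.execute(
    "SELECT MIN(id) AS first_id, name FROM t GROUP BY name ORDER BY first_id"
).fetchall()

print(one_row_per_name)
```

Using MAX(id) instead would keep the most recent row of each group, assuming ids grow over time.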

Original

I reposted the article above from someone else after running into this problem in my own project. My SQL statement is written as follows:

The code is as follows:

SELECT attention_join.memberID, nickName, headpic, attention_join.time
FROM attention_join
JOIN member ON attention_join.memberID = member.memberID
JOIN member_meta ON member.memberID = member_meta.memberID
GROUP BY attention_join.memberID
ORDER BY attention_join.time DESC

It means "list the members of a group in descending order of the time they joined/followed the group". But the statement does not use the count() function mentioned above, which puzzles me. I have not studied MySQL in detail; its GROUP BY usage seems to be very different from that of SQL Server. I will look into it later, as I have no time right now.

Oh, by the way, my MySQL version is:

Server version: 5.1.54-1ubuntu4
Protocol version: 10
