How to filter duplicate records in mysql using distinct

Last Update:2018-07-12 Source: Internet

Author: User

Tags mysql manual

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

This article mainly describes how to use distinct in mysql to filter restart records. For more information, see.

Below we will use a few simple Query examples to demonstrate the implementation of DISTINCT.

1. First, let's take a look at how to complete the DISTINCT operation through a loose index scan:

The Code is as follows:
Sky @ localhost: example 11:03:41> explain select distinct group_id -> FROM group_messageG * ************************* 1. row ************************* Id: 1 SELECT_type: SIMPLE Table: group_message Type: range Possible_keys: NULL Key: idx_gid_uid_gc Key_len: 4 Ref: NULL Rows: 10 Extra: Using index for group- 1 row in set (0.00 sec)

We can clearly see that the Extra information in the execution plan is "Using index for group-by". What does this mean? When I did not perform the GROUP BY operation, the execution plan will tell me that the GROUP BY operation is performed through the index? In fact, this is related to the implementation principle of DISTINCT. During the implementation of DISTINCT, it is also necessary to group and then retrieve one from each group of data and return it to the client. The Extra information here tells us that MySQL has completed the entire operation by using loose index scanning. Of course, if MySQL Query Optimizer can replace the information here with "Using index for distinct", it will be easier to understand.

2. Let's take a look at the compact index scan example:

The Code is as follows:
Sky @ localhost: example 11:03:53> explain select distinct user_id -> FROM group_message -> WHERE group_id = 2G * ************************* 1. row ************************* Id: 1 SELECT_type: SIMPLE Table: group_message Type: ref Possible_keys: idx_gid_uid_gc Key: idx_gid_uid_gc Key_len: 4 Ref: const Rows: 4 Extra: Using WHERE; Using index 1 row in set (0.00 sec)

The display here is exactly the same as implementing group by through compact index scanning. In fact, during the implementation of this Query, MySQL will let the storage engine scan all the index keys of group_id = 2 to get all user_id, and then use the sorted feature of the index, each time you replace the index key value of user_id, you can complete the entire DISTINCT operation after scanning all the index keys of gruop_id = 2.

3. Let's look at how DISTINCT can be completed without using an index separately:

The Code is as follows:

Sky @ localhost: example 11:04:40> explain select distinct user_id
-> FROM group_message
-> WHERE group_id> 1 AND group_id <10G
* *************************** 1. row ***************************
Id: 1
SELECT_type: SIMPLE
Table: group_message
Type: range
Possible_keys: idx_gid_uid_gc
Key: idx_gid_uid_gc
Key_len: 4
Ref: NULL
Rows: 32
Extra: Using WHERE; Using index; Using temporary
1 row in set (0.00 sec)

When MySQL cannot perform the DISTINCT operation simply by relying on indexes, it has to use a temporary table for corresponding operations. However, we can see that when MySQL uses a temporary table to complete DISTINCT, there is a difference from processing group by, that is, filesort is missing. In fact, in the MySQL grouping algorithm, grouping operations are not necessarily performed BY sorting. I have mentioned this in the GROUP BY optimization tips above. In fact, MySQL implements the DISTINCT operation after grouping without sorting, so the filesort sorting operation is missing.

4. Try again with group:

The Code is as follows:

Sky @ localhost: example 11:05:06> explain select distinct max (user_id)
-> FROM group_message
-> WHERE group_id> 1 AND group_id <10
-> Group by group_idG
* *************************** 1. row ***************************
Id: 1
SELECT_type: SIMPLE
Table: group_message
Type: range
Possible_keys: idx_gid_uid_gc
Key: idx_gid_uid_gc
Key_len: 4
Ref: NULL
Rows: 32
Extra: Using WHERE; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)

Finally, let's take a look at the example of using an aggregate function with group by. Compared with the third example above, we can see that there are more filesort sorting operations, it is because we use the MAX function. To obtain the MAX value after the group, you cannot use the index to complete the operation. You can only sort the operation.

When using mysql, you sometimes need to query records with unique fields. Although mysql provides the keyword distinct to filter out redundant duplicate records, only one record is retained, but it is often used to return the number of records that do not repeat, instead of returning all values that do not record the record. The reason is that distinct can only return its target field, but cannot return other fields. This problem has plagued me for a long time.

Let's take a look at the example below:

The Code is as follows:
Table Id name 1 2 B 3 c 4 c 5 B

The library structure is like this. This is just a simple example, and the actual situation will be much more complicated.

For example, if you want to use a statement to query all data with no duplicate names, you must use distinct to remove redundant duplicate records.

The Code is as follows:
Select distinct name from table

The result is:

The Code is as follows:
Name A B C

It seems that the effect has been achieved, but what I want to get is the id value? Modify the query statement:

The Code is as follows:
Select distinct name, id from table

The result is:

The Code is as follows:
Id name 1 2 B 3 c 4 c 5 B

How does distinct not work? It works, but it also serves two fields, that is, it must have the same id and name to be excluded. Then we can modify the query statement:

The Code is as follows:
Select id, distinct name from table

Unfortunately, you cannot get anything except the error message. distinct must be placed at the beginning, so it is difficult to put distinct in the where condition? Yes. An error is reported as a result .......

I tried it for a long time, and I couldn't do it. I finally found a usage in the mysql manual. I realized what I needed with group_concat (distinct name) and group by name, try it now

Error ............ Depressed ....... I can't even go through the mysql manual. I gave me hope first, and then pushed me to disappointment ....

Check again. The group_concat function is supported by 4.1, dizzy. I have 4.0. No way. Upgrade. The upgrade is successful ......

Finally, the customer had to be asked to upgrade.

Suddenly, the ghost machine flashed. Since the group_concat function can be used, can other functions be used?

Use the count function to try it out. I am a success ....... It takes so much time to cry ........ It turns out to be so simple ......

Now release the complete statement:

The Code is as follows:
Select id, name, count (distinct name) from table group by name

Result:

The Code is as follows:
Id name count (distinct name) 1 a 1 2 B 1 3 c 1

The last item is redundant, so you don't have to worry about it. The goal is achieved .....

Oh, yes. by the way, group by must be placed before order by and limit. Otherwise, an error will be reported. I am busy ......

Original

I transferred this article from someone else and encountered this problem in my project. My SQL statements are written as follows:

The Code is as follows:

SELECT attention_join.memberID, nickName, headpic, attention_join.time

FROM attention_join

JOIN member ON attention_join.memberID = member. memberID

JOIN member_meta ON member. memberID = member_meta.memberID

Group by attention_join.memberID

Order by attention_join.time DESC

It means 'checking the members in the group in descending order of the time of joining/Following the group', but the count () keyword mentioned above is not used in the statement, this makes me puzzled. mysql has not learned in detail. Its group by keyword usage seems to be very different from that of sqlserver. It's time to wait. Check it out, no time now

Oh, by the way, my mysql version is:

Server version: 5.1.54-1 ubuntu4

Protocol Version: 10

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More