A series of thoughts on the DISTINCT question

Source: Internet
Author: User

Someone raised this question, and I have written up my findings here for your reference.

Suppose there is a table named Sample with two columns, Name and DepartmentId.

The data has the following feature: one DepartmentId may have multiple Names, and vice versa. In other words, the relationship between Name and DepartmentId is many-to-many.

Now we want to implement this query: first sort the rows by DepartmentId (step 1), then take the non-repeated values of the Name column (step 2) while keeping the relative order from step 1. In this example, the query should return three values in this order: A, C, B.
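For experimenting with the statements in this article, a minimal setup could look like the sketch below. The rows are hypothetical and the column types are assumptions; they were chosen only so that the relationship is many-to-many and the first appearances by DepartmentId come out as A, C, B.

```sql
-- Hypothetical sample data (assumed, not the author's actual table contents).
CREATE TABLE Sample (
    Name         varchar(10) NOT NULL,
    DepartmentId int         NOT NULL
);

-- One INSERT per row so the script also runs on SQL Server 2005.
INSERT INTO Sample (Name, DepartmentId) VALUES ('A', 1);
INSERT INTO Sample (Name, DepartmentId) VALUES ('C', 2);
INSERT INTO Sample (Name, DepartmentId) VALUES ('A', 3);
INSERT INTO Sample (Name, DepartmentId) VALUES ('B', 3);
INSERT INTO Sample (Name, DepartmentId) VALUES ('C', 4);
INSERT INTO Sample (Name, DepartmentId) VALUES ('B', 4);

-- Smallest DepartmentId per Name: A = 1, C = 2, B = 3.
```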

The first statement that comes to mind is the following:

select distinct name from Sample order by DepartmentId

Semantically, this is the natural way to write it. Unfortunately, this statement cannot be executed at all. SQL Server rejects it with error 145: "ORDER BY items must appear in the select list if SELECT DISTINCT is specified."

This error means that when DISTINCT is used, every column in the ORDER BY clause must also appear in the SELECT list. However, if DepartmentId is added to the SELECT list, DISTINCT applies to the combination of Name and DepartmentId, so repeated Names are no longer removed and the result is definitely incorrect:

select distinct name,DepartmentId from Sample order by DepartmentId

So, since DISTINCT and ORDER BY cannot be combined directly in this way, can we change the statement a little, for example as follows?

SELECT distinct a.Name FROM (select top 100 percent name from Sample order by DepartmentId) a

Compared with the previous statements, this one uses a subquery. Semantically it is still intuitive and clear: sort by DepartmentId first, then remove the repeated values. However, the returned result is not what we want.

Although the repeated values are indeed removed, the returned order is incorrect. We want to sort by DepartmentId first, then remove the repeated values, and retain the relative order produced by the sort.

Why do we get this result? Because DISTINCT performs its own sort here, and this behavior cannot be changed (this can be seen in the execution plan). More generally, an ORDER BY inside a subquery only serves the TOP clause and does not guarantee the order of the outer query's result, so the ORDER BY we wrote earlier is meaningless here. [In fact, if you look at similar queries generated by ORM tools such as the ADO.NET Entity Framework, they automatically discard the ORDER BY setting.]

So is it impossible to meet this requirement? The requirement is admittedly rare: in most cases it is reasonable to apply DISTINCT first and sort as the last operation.

I thought about it this way: since the behavior of DISTINCT is built in, can DISTINCT be bypassed altogether? In the end, I settled on this solution: give each occurrence of a Name a sequence number. For example, if there are two rows with the name A, the first A is numbered 1, the second is numbered 2, and so on. Then, in the query, I sort first and keep only the rows numbered 1. In this way, deduplication is achieved without DISTINCT.

SQL Server 2005 provides the ROW_NUMBER function. With this function, I implemented the following query:

select a.Name from (select top 100 percent Name,DepartmentId,ROW_NUMBER() over(partition by name order by departmentid) row from Sample order by DepartmentId) a where a.row=1 order by a.DepartmentId

With this query, I got the values back in the expected order, which I think meets the requirement described earlier.
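To see why filtering on the row number 1 works, it can help to look at the numbering step on its own. The sketch below assumes the same Sample table with Name and DepartmentId columns; the alias RowNo is a name chosen here for illustration.

```sql
-- Numbering step only: within each Name, rows are numbered from 1
-- in ascending DepartmentId order, so RowNo = 1 marks the row with
-- the smallest DepartmentId for that Name.
SELECT Name,
       DepartmentId,
       ROW_NUMBER() OVER (PARTITION BY Name ORDER BY DepartmentId) AS RowNo
FROM Sample
ORDER BY Name, DepartmentId;
```

Each Name has exactly one row with RowNo = 1, which is why keeping only those rows removes the duplicates, and sorting the survivors by DepartmentId restores the order from step 1.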

By comparison, this query is less efficient, which is foreseeable. However, if the requirement is a hard one, it is not surprising to sacrifice some performance. Of course, we can still look for better ways to write it. In any case, the built-in standard implementation is usually relatively fast.
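One possible alternative, offered as a sketch and not as the author's method, is to group by Name and order the groups by their smallest DepartmentId. It assumes the same Sample table and expresses the same idea as keeping the row numbered 1 for each Name.

```sql
-- One row per Name, ordered by the smallest DepartmentId in which
-- that Name appears. Names that share the same smallest DepartmentId
-- come back in an unspecified relative order.
SELECT Name
FROM Sample
GROUP BY Name
ORDER BY MIN(DepartmentId);
```

Whether this performs better than the ROW_NUMBER version depends on the data and the available indexes, so it is worth comparing the two execution plans before choosing.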

That concludes this series of thoughts on the DISTINCT question. I hope it helps with your learning.
