A series of thoughts on the DISTINCT question
Someone raised this question recently, so I have written up my notes for your reference.
Suppose there is a table named Sample, with a Name column and a DepartmentId column, like this:
The data has the following feature: one DepartmentId may have multiple Names, and one Name may appear under multiple DepartmentIds. In other words, the relationship between Name and DepartmentId is many-to-many.
Now we want a query that first sorts the rows by DepartmentId (step 1), then returns the distinct values of the Name column (step 2) while keeping the relative order produced by step 1. In this example, the query should return three values, in this order: A, C, B.
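To pin down the intended semantics, here is a minimal Python sketch of steps 1 and 2 done in memory. The rows in ROWS are hypothetical, since the original table is not reproduced here. They were chosen so that the answer is A, C, B. The function name is also only for illustration.

```python
# Hypothetical sample data: (Name, DepartmentId) pairs. The real table is not
# shown in the article, so these rows are invented to give the same answer.
ROWS = [("A", 1), ("C", 2), ("A", 2), ("B", 3), ("C", 3)]


def distinct_names_in_department_order(rows):
    """Step 1: sort by DepartmentId; step 2: keep each Name's first occurrence."""
    ordered = sorted(rows, key=lambda row: row[1])  # stable sort on DepartmentId
    seen = set()
    result = []
    for name, _department_id in ordered:
        if name not in seen:  # first time this Name appears after sorting
            seen.add(name)
            result.append(name)
    return result


print(distinct_names_in_department_order(ROWS))  # ['A', 'C', 'B']
```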
The first statement that comes to mind is the following:
select distinct name from Sample order by DepartmentId
Semantically, this is natural. Unfortunately, the statement cannot be executed at all. The error message is:
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
This error means that when DISTINCT is used, every column in the ORDER BY clause must also appear in the SELECT list. However, if we add DepartmentId to the SELECT list, as below, DISTINCT is applied to each (Name, DepartmentId) combination rather than to Name alone. Those combinations are already unique, so nothing is removed and the same Name can still appear several times. The result is clearly wrong.
select distinct name,DepartmentId from Sample order by DepartmentId
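The problem is easy to reproduce. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for SQL Server. This is an assumption: SQLite is more lenient and would even accept the first statement, so only the second one is shown. The rows are hypothetical. A tie-breaker on Name is added to ORDER BY only to make the output order deterministic.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for SQL Server in this sketch.
ROWS = [("A", 1), ("C", 2), ("A", 2), ("B", 3), ("C", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sample (Name TEXT, DepartmentId INTEGER)")
conn.executemany("INSERT INTO Sample (Name, DepartmentId) VALUES (?, ?)", ROWS)

# DISTINCT now applies to (Name, DepartmentId) pairs, not to Name alone,
# so every row survives and Names repeat.
pairs = conn.execute(
    "SELECT DISTINCT Name, DepartmentId FROM Sample ORDER BY DepartmentId, Name"
).fetchall()
conn.close()

names = [name for name, _ in pairs]
print(names)  # ['A', 'A', 'C', 'B', 'C']
```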
Since DISTINCT and ORDER BY cannot be combined this way, could we restructure the query? For example:
SELECT distinct a.Name FROM (select top 100 percent name from Sample order by DepartmentId) a
Compared with the previous statements, this one uses a subquery. (TOP 100 PERCENT is there only because SQL Server does not allow ORDER BY in a derived table unless TOP is also specified.) Semantically, it is still intuitive and clear: sort by DepartmentId first, then remove the duplicate values. However, the returned result is as follows:
Although the duplicate values are indeed removed, the order is wrong. We want to sort by DepartmentId first, then remove the duplicates while keeping the relative order produced by the sort.
Why does this happen? Because DISTINCT performs its own sort to find the duplicates (this can be seen in the execution plan), and we cannot change that behavior, so the ORDER BY in the subquery is effectively meaningless here. More generally, SQL Server does not guarantee that an ORDER BY inside a derived table determines the order of the outer result; only the outermost ORDER BY does. (In fact, if you look at similar queries generated by ORM tools such as the ADO.NET Entity Framework, you will see that they simply discard the ORDER BY.)
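A toy model helps show why the order gets lost. Assume, for illustration only, that DISTINCT works by sorting all values and dropping adjacent duplicates. The real engine may choose a different strategy, such as hashing, but neither strategy preserves the incoming row order. The data and the function name are hypothetical.

```python
# Hypothetical rows; sorting them by DepartmentId mimics what the subquery produces.
ROWS = [("A", 1), ("C", 2), ("A", 2), ("B", 3), ("C", 3)]


def sort_based_distinct(values):
    """Toy model of a sort-based DISTINCT: sort all values, then drop adjacent duplicates.

    Illustration only; a real engine may use hashing instead, and in either
    case the incoming row order is not preserved.
    """
    result = []
    for value in sorted(values):  # this sort discards the DepartmentId ordering
        if not result or result[-1] != value:
            result.append(value)
    return result


subquery_output = [name for name, _ in sorted(ROWS, key=lambda row: row[1])]
print(subquery_output)                       # ['A', 'C', 'A', 'B', 'C']
print(sort_based_distinct(subquery_output))  # ['A', 'B', 'C'], not ['A', 'C', 'B']
```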
So is this requirement impossible to meet? It is admittedly rare: in most cases, it is reasonable for DISTINCT to sort as the last operation.
I approached it like this. Since DISTINCT's behavior is built in, can we bypass it altogether? The solution I ended up with is to number the rows of each Name. For example, if there are two rows for A, the first A (in DepartmentId order) is numbered 1, the second is numbered 2, and so on. The query then sorts the rows and keeps only those numbered 1, which achieves deduplication without DISTINCT.
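The numbering idea can be sketched in plain Python before writing any SQL. This is a hypothetical in-memory emulation of what ROW_NUMBER() OVER (PARTITION BY Name ORDER BY DepartmentId) computes, using made-up rows and illustrative function names; it is not how SQL Server actually evaluates the query.

```python
from collections import defaultdict

# Hypothetical (Name, DepartmentId) rows.
ROWS = [("A", 1), ("C", 2), ("A", 2), ("B", 3), ("C", 3)]


def number_per_name(rows):
    """Number each Name's rows 1, 2, ... in DepartmentId order (like ROW_NUMBER)."""
    counters = defaultdict(int)  # running count per Name, i.e. per partition
    numbered = []
    for name, department_id in sorted(rows, key=lambda row: row[1]):
        counters[name] += 1
        numbered.append((name, department_id, counters[name]))
    return numbered


def first_of_each_name(rows):
    """Keep only the rows numbered 1, then order them by DepartmentId."""
    firsts = [row for row in number_per_name(rows) if row[2] == 1]
    return [name for name, _dept, _n in sorted(firsts, key=lambda row: row[1])]


print(number_per_name(ROWS))
# [('A', 1, 1), ('C', 2, 1), ('A', 2, 2), ('B', 3, 1), ('C', 3, 2)]
print(first_of_each_name(ROWS))  # ['A', 'C', 'B']
```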
SQL Server 2005 introduced the ROW_NUMBER function. With it, the idea above becomes the following query:
select a.Name from (select top 100 percent Name, DepartmentId, ROW_NUMBER() over(partition by Name order by DepartmentId) row from Sample order by DepartmentId) a where a.row=1 order by a.DepartmentId
This returns the following result, which I believe meets the requirement stated earlier.
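To try the ROW_NUMBER approach without a SQL Server instance, the sketch below runs an adapted version of the query against an in-memory SQLite database from Python. It makes several assumptions. It needs SQLite 3.25 or later, the first version with window functions. The rows are hypothetical, and the alias rn replaces row. It also drops TOP 100 PERCENT and the inner ORDER BY: SQLite does not support TOP, and the outer ORDER BY alone fixes the final order.

```python
import sqlite3

# Hypothetical rows; requires SQLite 3.25+ for window functions.
ROWS = [("A", 1), ("C", 2), ("A", 2), ("B", 3), ("C", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sample (Name TEXT, DepartmentId INTEGER)")
conn.executemany("INSERT INTO Sample (Name, DepartmentId) VALUES (?, ?)", ROWS)

# Number each Name's rows by DepartmentId, keep number 1,
# and let the outer ORDER BY fix the final order.
QUERY = """
SELECT a.Name
FROM (
    SELECT Name,
           DepartmentId,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY DepartmentId) AS rn
    FROM Sample
) AS a
WHERE a.rn = 1
ORDER BY a.DepartmentId
"""

names = [name for (name,) in conn.execute(QUERY)]
conn.close()
print(names)  # ['A', 'C', 'B']
```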
As expected, this query is less efficient than a plain DISTINCT (the execution plan offers some clues as to why). However, if the requirement is firm, sacrificing some performance is acceptable. There may also be better ways to write it; in any case, the built-in standard implementation is usually the faster one.
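One alternative that may be worth comparing does not come from the original discussion: group by Name, then order the groups by their smallest DepartmentId. After sorting, the first DepartmentId for each Name is exactly its minimum. The sketch again uses hypothetical rows in SQLite. Whether it is faster in SQL Server is not established here, so check the execution plan before relying on it. As with the ROW_NUMBER version, Names that share the same smallest DepartmentId come back in unspecified order unless a tie-breaker is added.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for SQL Server in this sketch.
ROWS = [("A", 1), ("C", 2), ("A", 2), ("B", 3), ("C", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sample (Name TEXT, DepartmentId INTEGER)")
conn.executemany("INSERT INTO Sample (Name, DepartmentId) VALUES (?, ?)", ROWS)

# GROUP BY removes duplicate Names; ordering each group by its smallest
# DepartmentId reproduces "first occurrence after sorting".
QUERY = """
SELECT Name
FROM Sample
GROUP BY Name
ORDER BY MIN(DepartmentId)
"""

names = [name for (name,) in conn.execute(QUERY)]
conn.close()
print(names)  # ['A', 'C', 'B']
```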
Those are my thoughts on the DISTINCT question; I hope they help.