In today's article, I want to talk about the very controversial and complex topic of SQL Server: The ambiguity of the ORDER by clause.
View and Order by
Let's start with a very simple SELECT statement.
1 -- A very simple SELECT statement 2 SELECT * from Person.person 3 ORDER by LastName 4 GO
from the code just listed you can see that we only want to return records from the Person.person table sorted by LastName column. Because we want to be able to reuse that SQL statement as simple as possible, we finally put it in the view, as follows:
1 -- This doesn ' t work 2 CREATE VIEW v_persons 3 as 4 SELECT * from Person.person 5 ORDER by LastName 6 GO
But you will see that SQL Server cannot create that view and only returns an error message:
This error message tells you that when you do not use the Top,offset or FOR XML expression, you are not allowed to use the ORDER BY clause in the view. Based on that error message, we can easily fix the problem by adding the top percent clause to the view.
1 --Let's make it work!2 CREATE VIEWv_persons3 as4 SELECT TOP - PERCENT * fromPerson.person5 ORDER byLastName6 GO
Now the view is created without any problems! We execute a SELECT statement on the view.
1 SELECT * from v_persons 2 GO
The SELECT statement itself can be executed, but when you look at the returned data, something crazy happens: The returned data is not sorted by LastName column--sql Server is sorted by the clustered key column on the businessentityid--table!
Is this a bug in SQL Server? No, it's not--it's "intentional"! Let's explain why. First you need to know that the ORDER BY clause has 2 different contexts in SQL (the programming language itself):
- You can define the sort returned to your client program by using the order by clause
- An ORDER by clause is used to define which rows are returned from the top expression
The most important thing you must know is that you define the so-called set (set), inline function, derived table, subquery, and universal table expression (common table expressions (CTE)) with the view. A collection is a mathematical concept in which a relational database (for example, SQL Server) is composed of set theory . The collection itself is not sorted. So using a view definition with an ORDER by combination is not allowed-as you have just seen. If you try to do this, SQL Server does not allow you to do so and gives you an error message.
Of course you can use ORDERby in combination with the TOP expression. But basically you're fooling SQL Server and yourself, because order by doesn't tell SQL Server how to sort the data back to the client program. Let's say you use TOP ten PERCENT. What is the top 10% of the table? You need to define the sort in a certain way.
And because we have to use the top percent and the order by combination, the query optimizer does not actually introduce sort operators in the execution plan. Top percent means everything, so as you can see in the execution plan the top operator does not need to sort the input.
In this example, our return rows are sorted by reading from the underlying data structure. This is done by the storage engine of SQL Server to determine the sort of returned rows. Here we read the rows from the clustered index. So the data we get is sorted by BusinessEntityID, which is the aggregation key value in the index column.
Now we modify the view definition to return 10% rows from the Person.person table value. We have also specified an ORDER BY clause.
1 --Alter The View2 ALTER VIEWv_persons3 as4 SELECT TOP Ten PERCENT * fromPerson.person5 ORDER byLastName6 GO
When you look at the result set now, you will see that the rows returned are sorted by the LastName column. Right now, because you see the sort operator in the execution plan (not appearing in SQL Server 2014), because the top operator can finally return the first 10% of the data that provides the input row.
Of course you can return 10% rows to your client program by using the ORDER BY clause in a different sort of view that you refer to.
1 SELECT * from v_persons 2 ORDER by FirstName 3 GO
Now when you look at the execution plan, you'll see 2 of them in the plan (only 1 in SQL Server 2014).
The 1th (right) sort operator pre-sorts the top operator (returns to the top 10%). The 2nd (left) sort operator is used to last define the sort, which is returned to the client program. When you force order by--by adding a view defined by top percent, you are basically fooling SQL Server ...
top without ORDER by
Another problem is that the top expression without an ORDER BY clause does not provide you with deterministic results. We can use specific examples to illustrate the problem. Suppose you have the following SELECT statement:
1 SELECT TOP 1 from Person.person 2 GO
This SQL statement uses the TOP 1 the expression returns the first row of the Person.person table-no ordering is defined with an ORDER BY clause. This sort is based on the index selected in the execution plan. In this example, SQL Server returns you "Abbas" as the result, as this is the 1th available record in the execution plan for the query optimizer to select a nonclustered index.
So the 1th record returned from this query depends on the index selected in the execution plan. If we now deactivate the nonclustered index.
1 -- Let's deactivate this index 2 ALTER INDEX [ix_person_lastname_firstname_middlename] on Person.person 3 DISABLE 4 GO
Then when you execute the SELECT statement again, SQL Server returns your Sánchez value, which means that only the 1th record of the clustered index is now selected in the execution plan. SQL Server returned a 1th row with a BusinessEntityID value of 1 from the clustered index.
So when you're dealing with non-deterministic records: Your results depend on the index chosen in the execution plan! You can easily achieve the clarity of the ordering of query results by adding an ORDER by clause. In this case, the order by clause makes the record OK for the top expression-so that you have the sort (TOP N sort) operator in the execution plan.
1 SELECT TOP 1 from Person.person 2 ORDER by LastName 3 GO
in an execution plan, it is not important for SQL Server to read rows from which index--thesort (Top N sort) operator physically pre-sorts rows in the execution plan and returns nth rows from it--quite simply, isn't it?
Summary
The ORDER by clause in SQL (the programming language itself) is not the simplest concept. As you learned in this article,ORDER by uses 2 different contexts, so you always have to consider which context you want to use. Never add TOP PERCENT to your view definition to fool SQL Server and yourself-it won't be sorted in the final recordset.
Thanks for your attention!
Ambiguity of order by in SQL Server