Ambiguity of order by in SQL Server

Source: Internet
Author: User
Tags sorts

In today's article, I want to talk about the very controversial and complex topic of SQL Server: The ambiguity of the ORDER by clause.

View and Order by

Let's start with a very simple SELECT statement.

1 -- A very simple SELECT statement 2 SELECT *  from Person.person 3 ORDER  by LastName 4 GO

from the code just listed you can see that we only want to return records from the Person.person table sorted by LastName column. Because we want to be able to reuse that SQL statement as simple as possible, we finally put it in the view, as follows:

1 -- This doesn ' t work 2 CREATE VIEW v_persons 3  as 4     SELECT *  from Person.person 5     ORDER  by LastName 6 GO

But you will see that SQL Server cannot create that view and only returns an error message:

This error message tells you that when you do not use the Top,offset or FOR XML expression, you are not allowed to use the ORDER BY clause in the view. Based on that error message, we can easily fix the problem by adding the top percent clause to the view.

1 --Let's make it work!2 CREATE VIEWv_persons3  as4     SELECT TOP  - PERCENT *  fromPerson.person5     ORDER  byLastName6 GO

Now the view is created without any problems! We execute a SELECT statement on the view.

1 SELECT *  from v_persons 2 GO

The SELECT statement itself can be executed, but when you look at the returned data, something crazy happens: The returned data is not sorted by LastName column--sql Server is sorted by the clustered key column on the businessentityid--table!

Is this a bug in SQL Server? No, it's not--it's "intentional"! Let's explain why. First you need to know that the ORDER BY clause has 2 different contexts in SQL (the programming language itself):

    1. You can define the sort returned to your client program by using the order by clause
    2. An ORDER by clause is used to define which rows are returned from the top expression

The most important thing you must know is that you define the so-called set (set), inline function, derived table, subquery, and universal table expression (common table expressions (CTE)) with the view. A collection is a mathematical concept in which a relational database (for example, SQL Server) is composed of set theory . The collection itself is not sorted. So using a view definition with an ORDER by combination is not allowed-as you have just seen. If you try to do this, SQL Server does not allow you to do so and gives you an error message.

Of course you can use ORDERby in combination with the TOP expression. But basically you're fooling SQL Server and yourself, because order by doesn't tell SQL Server how to sort the data back to the client program. Let's say you use TOP ten PERCENT. What is the top 10% of the table? You need to define the sort in a certain way.

And because we have to use the top percent and the order by combination, the query optimizer does not actually introduce sort operators in the execution plan. Top percent means everything, so as you can see in the execution plan the top operator does not need to sort the input.

In this example, our return rows are sorted by reading from the underlying data structure. This is done by the storage engine of SQL Server to determine the sort of returned rows. Here we read the rows from the clustered index. So the data we get is sorted by BusinessEntityID, which is the aggregation key value in the index column.

Now we modify the view definition to return 10% rows from the Person.person table value. We have also specified an ORDER BY clause.

1 --Alter The View2 ALTER VIEWv_persons3  as4     SELECT TOP Ten PERCENT *  fromPerson.person5     ORDER  byLastName6 GO

When you look at the result set now, you will see that the rows returned are sorted by the LastName column. Right now, because you see the sort operator in the execution plan (not appearing in SQL Server 2014), because the top operator can finally return the first 10% of the data that provides the input row.

Of course you can return 10% rows to your client program by using the ORDER BY clause in a different sort of view that you refer to.

1 SELECT *  from v_persons 2 ORDER  by FirstName 3 GO

Now when you look at the execution plan, you'll see 2 of them in the plan (only 1 in SQL Server 2014).

The 1th (right) sort operator pre-sorts the top operator (returns to the top 10%). The 2nd (left) sort operator is used to last define the sort, which is returned to the client program. When you force order by--by adding a view defined by top percent, you are basically fooling SQL Server ...

top without ORDER by

Another problem is that the top expression without an ORDER BY clause does not provide you with deterministic results. We can use specific examples to illustrate the problem. Suppose you have the following SELECT statement:

1 SELECT TOP 1  from Person.person 2 GO

This SQL statement uses the TOP 1 the expression returns the first row of the Person.person table-no ordering is defined with an ORDER BY clause. This sort is based on the index selected in the execution plan. In this example, SQL Server returns you "Abbas" as the result, as this is the 1th available record in the execution plan for the query optimizer to select a nonclustered index.

So the 1th record returned from this query depends on the index selected in the execution plan. If we now deactivate the nonclustered index.

1 -- Let's deactivate this index 2 ALTER INDEX [ix_person_lastname_firstname_middlename]  on Person.person 3 DISABLE 4 GO

Then when you execute the SELECT statement again, SQL Server returns your Sánchez value, which means that only the 1th record of the clustered index is now selected in the execution plan. SQL Server returned a 1th row with a BusinessEntityID value of 1 from the clustered index.

So when you're dealing with non-deterministic records: Your results depend on the index chosen in the execution plan! You can easily achieve the clarity of the ordering of query results by adding an ORDER by clause. In this case, the order by clause makes the record OK for the top expression-so that you have the sort (TOP N sort) operator in the execution plan.

1 SELECT TOP 1  from Person.person 2 ORDER  by LastName 3 GO

in an execution plan, it is not important for SQL Server to read rows from which index--thesort (Top N sort) operator physically pre-sorts rows in the execution plan and returns nth rows from it--quite simply, isn't it?

Summary

The ORDER by clause in SQL (the programming language itself) is not the simplest concept. As you learned in this article,ORDER by uses 2 different contexts, so you always have to consider which context you want to use. Never add TOP PERCENT to your view definition to fool SQL Server and yourself-it won't be sorted in the final recordset.

Thanks for your attention!

Ambiguity of order by in SQL Server

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.