SQL Server Tuning Series advanced (how the query optimizer runs)

Source: Internet
Author: User

Objective

In the previous articles, we introduced a series of basic descriptions of operators and the optimization methods and techniques for each operator. This includes a series of our common operators, including how to view execution plans, how several datasets are commonly used, how to join operators, and parallel operators. Interested children's shoes can be clicked to view.

This article describes how the query optimizer works in SQL Server, that is, how a good execution plan is formed, and how it is evaluated as an advanced chapter in the series.

Talk less, start the subject of this article.

Technical preparation

The database version is SQL SERVER2008R2 and is analyzed using a more concise case library (Northwind) from Microsoft.

Body Content

The preferred process to go through when we throw a T-SQL statement to SQL Server ready for execution is the compilation process, and if the statement was previously executed in SQL Server, it detects the existence of a compiled execution plan that has been cached for reuse.

However, the process of compiling the compilation requires a series of optimization processes, which are broadly divided into two phases:

1. First, SQL Server performs some simplification of the T-SQL statements we write, usually by querying itself to find interactivity and the order in which the operations are rescheduled.

In this process, SQL Server focuses on statement-writing adjustments, without too much consideration for cost or analysis of index availability, and the most important goal is to produce a valid query.

SQL Server then loads the metadata, including the index's statistics, into the second phase.

2, at this stage is SQL Server a complex optimization process, this phase of SQL Server will be based on the execution plan operators formed in the previous phase to evaluate and try, and even reorganize the execution plan, so the relative optimization process is a time-consuming process.

Use the following flowchart to understand the process:

This diagram looks a bit complicated, we have to analyze it in detail, is to divide this optimization phase into 3 sub-stages

<1> This phase only considers the serial plan, also says the single processor to run, if this phase finds a good serial plan, the optimizer will not enter the next stage. Therefore, for the case of low data volume, or simple execution of the statement, the basic use of the serial plan.

Of course, if the cost of this phase is relatively large, then it will go to the 2nd stage and then optimize it.

<2> This phase first optimizes the 1th phase of the serial plan and then, if the environment supports parallelization, Parallelize, compares, and then optimizes the cost if the lower is the output execution plan, and if the cost is relatively high, go to the 2nd stage, and then continue to optimize.

<3> actually arrives at this stage is the final phase of optimization, this phase will be the 2nd phase of the serial and parallel comparison results for the final step optimization, if the serial execution is good then further optimization, of course, if the parallel execution is good, then continue to parallel optimization.

In fact, the 3rd stage is the frustration of the query optimizer, when reaching the 3rd stage is a remediation phase, only the final optimization, the optimization is not good can only follow the execution plan.

So what are the principles of optimization at each stage in the above process:

The most important principle of these optimizer is: as far as possible to reduce the scanning range, whether it is a table or index, of course, the index is better than the table, the amount of the index is less the better, the ideal case is only one or a few.

Therefore, SQL Server also respects the above principles, has been around this principle to optimize.

I. Screening conditions analysis

The so-called filter condition is actually the condition behind the where statement in the T-SQL statement that we write, and we use the statements in this to minimize the scope of the data scan, which SQL Server optimizes by using these statements.

The general format is as follows:

Column operator <constant or variable>

Or

<constant or variable> operator column

Operator in this format include: =, >, <, = =, <=, between, like

For example: Name= ' Liudehua ', price>4000, 4000<price, name like ' liu% ', name= ' Liudehua ' and price >1000

The above statements are the most commonly used in the statements we write, and this method will be used by SQL Server to reduce the scan, and these columns are indexed, that will try to take the index to get the value, but SQL Server is not omnipotent, and some of the wording it is not recognized, It's also something we want to avoid by writing statements:

A, where name like '%liu ' can not be recognized by the SQL Server optimizer, so it can only be performed by a full table scan or an index scan.

B, name= ' Liudehua ' OR price >1000, this is also invalid, because it can not use two filter conditions to gradually reduce the scanning.

C, price+4>100 This same is not recognized

D, name not in (' Liudehua ', ' Zhourunfa '), and of course, similar: not, not as

As an example:

Select CustomerID from orderswhere CustomerID='vinet'SELECT CustomerID from Orderswhere UPPER (CustomerID)='vinet'

So the above-mentioned way to write the sentence should try to avoid, or take a flexible way to achieve.

Second, index optimization

After the determination of the filter range above, SQL Server immediately starts the selection of the index, the first thing to determine is whether the filter field has an index entry, that is, whether the index is overwritten.

Of course, if the query item is the best to cover the index, if it is not overwritten by the index, then in order to take full advantage of the characteristics of the index, the bookmark lookup (bookmark) section is introduced.

So, in view of this, when we create the index, the reference property value is the column of the filter condition.

About the choice of using index optimization:

CREATE INDEX employeesname on Employees (firstname,lastname) INCLUDE (HireDate) with (ONLINE= on) goselect Firstname,lastname,hiredate,employeeid from employeeswhere FirstName='Anne'

Of course not. As long as the query column has an index overlay to perform an index lookup, depending on how much content is scanned, the extent of the index is also dependent on how much content is fetched

To give an example:

CREATE INDEX nameindex   *'K%'*'Y%'GO

Exactly the same query statement to look at the execution plan:

Exactly the same query statement, the resulting query plan is completely different, one is the index scan, and the other is an efficient index lookup.

Here I only tell you: FirstName like ' K% ' has 1255 lines, whereas FirstName like ' Y% ' has only 37 lines, of which

In fact, the reason for this is that the statistical information is in mischief.

Therefore, a specific T-SQL statement does not necessarily generate a specific query plan, the same particular query plan is not necessarily the optimal way, the impact of its many factors: about the index, about the hardware, about the contents of the table, about the statistical information and many other factors.

About the statistical information this piece is the large content, we put in the later space introduction, has the interest can advance the attention.

If you have any questions, you can leave a message or private messages, and look forward to an in-depth study of your child's shoes with SQL Server. Learn together and progress together.

At the end of the article gives the previous several connections, the following content basically covers our daily write query operation decomposition, it seems necessary to sort out a directory ....

SQL Server Tuning Series Basics

SQL Server Tuning Series Basics (Summary of common operators)

SQL Server Tuning Series Basics (Union operator summary)

SQL Server Tuning Series basics (Parallel operations Summary)

SQL Server Tuning Series basics (Parallel operations Summary chapter II)

SQL Server Tuning Series Basics (Index operations Summary)

SQL Server Tuning Series Basics (subquery operations Summary)

If you read this blog, feel that you have something to gain, please do not skimp on your " recommendation ".

SQL Server Tuning Series advanced (how the query optimizer runs)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.