SQL Server Tuning series play Turn II (how to run with the aggregation joint hint (Hint) bootstrap statement)

Source: Internet
Author: User
Tags logical operators

Original: SQL Server Tuning series play Turn II (how to run with the aggregation joint hint (Hint) bootstrap statement)

Objective

In the previous article, we analyzed the usage of query hint as the first of the last play module of tuning series. Interested can click to view: SQL Server Tuning Series play turn (how to run with query hint (Hint) bootstrap statement)

This article continues to play the contents of the module, the same, or want to grasp the previous series of content, before entering the content of this module analysis.

Gossip, enter the content of this article.

Technical preparation

The database version is SQL Server2012, which is analyzed using Microsoft's previous case library (Northwind), and part of the content is also applied to Microsoft's other case library AdventureWorks.

Trust the Friends of SQL Server, both libraries are not too unfamiliar.

Misunderstanding correction

Before starting the topic of this article, correct some of the pitfalls of novice tuning the database. Also hope that in the daily application to solve problems, remember hearsay, conform, Grandpa Mao said: practice is the only standard to test the truth.

Two myths:

1. When a table scan is found in the query plan, as if the root cause was found, as many people have said this way is bad performance, and the way to get rid of it is the index, and think that the index will be much faster.

2, SQL Server statement optimization is to create an index, and the creation of an index is simpler, look for a query statement, look at the Where condition ... There are several filter criteria that can be created by creating several nonclustered indexes.

look at the first question: is the table scan really as bad as the legendary performance?

First, we know the superiority of comparison query statement performance, there are just a few key indicators: Run IO, run time, consumption: CPU, Memory, compile time and so on.

Look, the following statement, the exact same table structure, table data, the difference is that a table is a heap table, one is a table with a clustered index, we can open two reply to test comparison.

Heap Table Query:

SET STATISTICS IOon--* to neworders from Orders--* from Newordersset STATISTICS IO OFF

A test table query that has a clustered index:

SET STATISTICS IOon--* to newclorders from Orders-- Add clustered index create CLUSTERED index Cl_ OrderID on newclorders (OrderID desc) GO–* from newclordersset STATISTICS IO OFF

The reason for this comparison is that many people think that the way to optimize a database is by adding an index, and that finding (seek) is better than scanning (scan).

This makes sure that the heap table is scanned for tables (table Scan), while the latter is scanned by a clustered index (index Scan)

First look at the comparison between IO:

Let's take a look at the IO information for the heap table

Heap table performance: Logical read 20 times, pre-read 2 times, where the number of pre-read is actually an important indicator of performance, because it is read directly from the disk, so the performance is the worst, of course, SQL Server is used in parallel processing, and the first read number is cached in memory, to prevent the disk interaction again.

Then look at the IO information of the clustered index table

A table with a clustered index, with a logical IO of 23 times, and read-ahead soared to 22 times.

In comparison:

The same query statement, the heap table query logical reads 20 times, pre-read 2 times, no physical read ..., and the table with the clustered index logic read 23 times, read 22 times! There are also statistical inaccuracies that result in physical reads 1 times! .... So it's a lot better than a stack table scan.

Of course, in order to further analysis, in fact, the reason for the difference is simple: The table with the clustered index because of its stored structure (b-tree), so the logical IO is definitely 3 more, because from the index root node to the leaf node, that is, the need to go through three index pages to get to the data page.

The same reason why the gap in pre-reading is so large: getting data from a heap table is a contiguous page of data (literally reading 64 pages <512KB>) at a time, and all indexed tables can't! Indexing can only read one data page (8KB) sequentially, which is also the limitation of the index.

On the query plan logical reading, pre-reading, physical reading and other IO detailed logic information, you can refer to my previous article, analysis is very detailed: SQL Server Tuning Series Advanced (query statement running several indicator value monitoring)

Then we compare the execution time, and believe that this is also more concerned about:

See, get exactly the same amount of data, heap table execution time takes 157 milliseconds, and the analysis and compilation is not time-consuming, this is very simple, because it is a heap table, do not need to optimize and select based on statistics, and the table with the clustered index is different, T-SQL statements need to be optimized and compiled based on the statistics of the index, which takes up to 79 milliseconds, and then requires more read-ahead IO when executing, and additional physical IO if the optimizer is not optimized, so it takes 298 milliseconds ...

In my test table, there are only more than 800 rows of data that produce such a gap value. If the amount of data is much ... The performance is worrying ....

About CPU and memory values I can't ... Above we analyzed the addition of a clustered index, resulting in a series of query optimizer process ... And compiling is the need of CPU resources .....

From the above results, I would like to express:

First of all, in the query plan we've seen, don't look at the table scan and feel that the operator is slow or time-consuming. What's more, it feels like the problem is on top of it, and for a lot of people who are spurned as "the Evil Watch Scan" ....

Second, keep in mind that any of the operators you see in SQL Server are basically the best in the environment you've set up today .... Not even that operator good with that operator rotten one said ... such as paranoid think hash connection is faster than nested loop ... Index lookups are better than index scans ..... What we're going to do is to use the right scenario, the best way to comply with SQL Server performance.

Again, after the above analysis of the problem, do not fall into another extreme misunderstanding: Table scan is better than the clustered index scan! In subsequent articles I will show you the usefulness of a clustered index that is better than a table scan ... In the world of SQL Server, only the essence of your true touch will not be confused ... To see that everything so-called dogma tuning is not absolute!

The misunderstanding of the second problem, in fact, is a lot of people's misunderstanding, mistaken for the strength of the nonclustered index, mistakenly think that the column in addition to the index can be fully applied. This article will not correct, you can refer to my previous article, I believe that after reading the basic also understand the pros and cons of nonclustered indexes, connect: SQL Server Tuning series advanced (How to index tuning)

First, GROUP tips (Hints)

To continue the content of this article, the last one we analyzed the query of several important hints, this article we look at the group hint, group query is also we write T-SQL statements are often used, about the grouping operator also has two: Order group and Hash group. In fact, the ranking is always the most headache in the database operation. This operator is also a large consumption of memory, if the amount of data is large, the way SQL server processing is also through the hashing algorithm optimization.

Of course, about grouping query operator decomposition, see the article in the basic article: SQL Server Tuning Series Basics (Common operators summary)

Let's look at an example:

SELECT Customerid,max (OrderDate) from Ordersgroup by CustomerID

Above the query statement, we want to get the maximum order date for each order.

With the query plan we can extrapolate that there must be an index in the CustomerID column, so that SQL Serer can sort it directly, but even so the cost is soaring to 56% .... The maximum order date is then calculated by adding a stream aggregation.

Of course, this approach is also an optimal way for SQL Server to think, but if there is a lot of data, this approach will cause serious memory consumption.

So, we can use group hint to prompt and change it to hash group. The code is as follows:

SELECT Customerid,max (OrderDate) from Ordersgroup by CustomerIDOPTION (HASH GROUP)

Of course, this may not be the best way to demonstrate, but if the amount of underlying data increases, I also believe that SQL Server will automatically change to a hash match.

Second , combination tips (Hints)

In most cases, the T-SQL statements We write are not simple, there are a lot of nested queries, and if this query statement, our hint (Hints) may not be single.

Let's see how this can be guided in such a way. Let's look at a simple example:

SELECT o.orderidfrom Customers C JOIN Orders ojoin Employees EON o.employeeid=E.employeeidon c.customerid
    =o.customeridwhere c.city=n'London' and e.city=n'London ' OPTION (Force Order,hash JOIN)

Not only that, we can also write query plans manually for query statements.

That is, we write our own XML query plan, let T-SQL statements in accordance with our custom query plan to proceed, of course, this is a big move, we stay in the last use.

Reference documents

    • Microsoft Books Online logical operators and physical operator references
    • Refer to the book SQL. server.2005. Technical Insider "series

Conclusion

This article first to this bar, about the SQL Server Tuning tool hint the use of a lot of content, followed by the introduction, the interest of children's shoes can be noticed in advance.

If you have any questions, you can leave a message or private messages, and look forward to an in-depth study of your child's shoes with SQL Server. Learn together and progress together.

At the end of the article gives the previous several connections, the following content basically covers our daily write query operations decomposition and tuning content items, are original ...

The first basic module focuses on basic content mastery, a total of 7 articles completed, covering a series of basic algorithms, detailed analysis of how to view the execution plan, master Execution Plan optimization points, and a list of the daily we usually write T-SQL statements will apply the operator:

SQL Server Tuning Series Basics

SQL Server Tuning Series Basics (Summary of common operators)

SQL Server Tuning Series Basics (Union operator summary)

SQL Server Tuning Series basics (Parallel operations Summary)

SQL Server Tuning Series basics (Parallel operations Summary chapter II)

SQL Server Tuning Series Basics (Index operations Summary)

SQL Server Tuning Series Basics (subquery operations Summary)

The second advanced module focuses on SQL Server execution of T-SQL statements in the case of some insider parsing, a total of 5 articles, including: How to run the query optimizer, the operation of several optimization indicator value detection, statistical information, the use of the index and a series of content. This piece of content lets us understand how SQL Server optimizes and runs the T-SQL statements we write.

SQL Server Tuning Series advanced (how the query optimizer runs)

SQL Server Tuning Series Advanced (query statements run several indicator value monitoring)

SQL Server Tuning series advanced (in-depth profiling statistics)

SQL Server Tuning series advanced (How to index tuning)

SQL Server Tuning series advanced (how to maintain database indexes)

The third Topsy module focuses on specific issues followed by specific tips (Hints), based on the analysis of the first two modules.

SQL Server Tuning Series Play the chapter (how to run with query hint (Hint) bootstrap statement)

If you read this blog, feel that you have something to gain, please do not skimp on your " recommendation ".

SQL Server Tuning series play Turn II (how to run with the aggregation joint hint (Hint) bootstrap statement)

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.