We know that statistics directly determine the execution plan produced by the relational engine. This article demonstrates two examples:
1. The impact of statistics on the join method between tables
2. The impact of statistics on the data-access method for a single table
What follows is based on a training session I conducted. The test environment is SQL Server 2008 R2. If you are not yet familiar with what statistics are, read an introduction to statistics first.
First, let's look at the impact of statistics on the join method.
Create a test database with "auto create statistics" and "auto update statistics" both set to off, then run the following code to create the related tables and insert records:
Create Table Check2_t1 (ID Int, C1 Char(800))
Create Table Check2_t2 (ID Int, C1 Char(8000))

-- Insert data into the two tables
Begin Tran
Declare @I Int
Set @I = 1
While @I <= 100000
Begin
    Insert Into Check2_t1 Select @I, 'C1'
    Set @I = @I + 1
End
Commit Tran

Begin Tran
Declare @I Int
Set @I = 1
While @I <= 500000
Begin
    Insert Into Check2_t2 Select @I, 'C2'
    Set @I = @I + 1
End
Commit Tran
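For reference, the database-level settings mentioned above can be switched off explicitly. A sketch, assuming the test database is named TestDB (the original does not name it):

```sql
-- Assumed database name: TestDB (not specified in the original article)
ALTER DATABASE TestDB SET AUTO_CREATE_STATISTICS OFF
ALTER DATABASE TestDB SET AUTO_UPDATE_STATISTICS OFF
```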
Then execute the following two SQL statements. Statement ② uses a HASH join hint to simulate the plan that accurate statistics would produce; the statistics themselves are shown later.
Given the filter condition, both queries return an empty result set. Let's look at their execution plans.
① Select A.* From Check2_t1 A Inner Join Check2_t2 B On A.ID = B.ID Where A.C1 = 'C3'

② Select A.* From Check2_t1 A Inner Hash Join Check2_t2 B On A.ID = B.ID Where A.C1 = 'C3'
The execution plan is as follows:
Comparing the two execution plans, we can see that the join in ① uses a merge join and Check2_t2 is scanned in full. Now the point of designing Check2_t2 so that each row fills an 8 KB page should be clear: this table alone occupies roughly 500,000 pages, about 4 GB of disk space, so a full table scan is slow.
From the execution plan of ①, we know the optimization is to get rid of the merge join and replace it with a nested loop or hash join. So in statement ② we force a hash match through the join hint, and the query completes instantly, achieving our goal. We could equally change the HASH hint to a LOOP hint so the query uses nested loops; the effect is just as immediate. The query plan is as follows:
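The LOOP variant mentioned above is the same query as ② with only the hint changed:

```sql
-- Force a nested loop join instead of a hash join
Select A.* From Check2_t1 A Inner Loop Join Check2_t2 B On A.ID = B.ID Where A.C1 = 'C3'
```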
We have now reached the desired execution plan by adding a join hint, but that is not the recommended fix. Notice the yellow warning triangle on the table scan icon in the execution plan: hovering over it shows that the table is missing statistics. Here it reports that column C1 lacks statistics, which is what led SQL Server to a bad plan. So we manually create the statistics wx_c1 on column C1 of table Check2_t1.
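A minimal sketch of creating the wx_c1 statistics by hand. The FULLSCAN option is my addition here (the article does not say which sample rate was used) to get an accurate histogram on this test data; the resulting statistics can then be inspected with DBCC SHOW_STATISTICS:

```sql
-- Create column statistics on C1; FULLSCAN is an assumption, not from the article
CREATE STATISTICS wx_c1 ON Check2_t1 (C1) WITH FULLSCAN

-- Inspect the header, density vector, and histogram of the new statistics
DBCC SHOW_STATISTICS ('Check2_t1', wx_c1)
```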
After the statistics are created, execute statement ① again: the default plan now uses a hash match, and the query also completes instantly. The execution plan is as follows:
So far we have shown the impact of statistics on table joins in the execution plan. The next article will show the impact of statistics on single-table query plans.
Admittedly, I cannot explain exactly how the engine turns statistics into different query plans; that happens inside the SQL Server engine, and there is relatively little material on it. What matters is understanding how important statistics are. As Microsoft's official documentation notes, out-of-date statistics can seriously hurt system performance. It is best to leave the database defaults "auto create statistics" and "auto update statistics" enabled, unless you have a good reason to turn them off.
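To follow that advice, the two options can be switched back on at the database level, and statistics can also be refreshed manually when needed. Again assuming the test database is named TestDB:

```sql
-- Assumed database name: TestDB
ALTER DATABASE TestDB SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE TestDB SET AUTO_UPDATE_STATISTICS ON

-- Manually refresh all out-of-date statistics in the current database
EXEC sp_updatestats
```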