Impact of statistical information on the Execution Plan (I)

Source: Internet
Author: User

We know that statistics directly determine the execution plan produced by the relational engine.ArticleTwo examples are demonstrated.

1,Impact of statistical information on connection modes

2,Impact of statistical information on data acquisition methods in a single table

The following is a training I have conducted. The test environment is sqlserver 2008 R2. If you do not understand what the statistical information is, click here

First, let's take a look at the impact of statistics on connection methods,

Create a test database, set "auto create statistics" and "auto update statistics" to off, and then runCodeCreate related tables and write records

 Create   Table  Check2_t1 (ID  Int  , C1  Char ( 800  ))  Create  Table  Check2_t2 (ID  Int  , C1  Char ( 8000  ))  --  Insert data into two tables  Begin   Tran  Declare   @ I   Int  Set   @ I  = 1  While   @ I  <=  100000   Begin      Insert   Into Check2_t1 Select   @ I , '  C1  '          Set   @ I  = @ I  +  1   End  Commit   Tran  Begin   Tran  Declare   @ I   Int  Set   @ I  =  1  While   @ I <=  500000   Begin      Insert   Into Check2_t2 Select   @ I , '  C2  '          Set   @ I  =  @ I  +  1   End Commit   Tran 

Then execute the following two SQL statements. The SQL statement in ② uses the table prompt hash to simulate the statistical information. The statistical information will be displayed later.

The following two SQL statements show that the result set is 0 based on the query conditions. Let's look at their execution plan.

① Select A. *   From Check2_t1 Inner   Join Check2_t2 B On A. ID = B. ID Where A. C1 =  '  C3 '  ②  Select A. *   From Check2_t1 Inner Hash Join Check2_t2 B On A. ID = B. ID Where A. C1 =  '  C3  ' 

The execution plan is as follows:

Compared with the above two execution plans, we found that the two table connections in 1 use the "merge join", and check2_t2 is scanned by the entire table, now you should understand the meaning of simulating the check2_t2 table into a record with an 8 K page. In fact, this table alone occupies 50 million pages and approximately 4 GB of disk space, it is slow to scan tables.

After reading the execution plan of ①, we know that optimization is to remove this "merge connection" and change it to "nested loop" or "hash connection" to achieve our results, as a result, in SQL, we force the "hash matching" request through the connection prompt. We can see that SQL is instantaneous and achieves our goal, of course, we can also change the "hash" prompt to the "loop" prompt to let the query go through the "nested loop" method, and its effect is instantaneous. The query plan is as follows:

Now we can see that we have reached the desired execution plan by adding connections, but this is not our suggestion. We can see a yellow triangle exclamation mark in the table scan icon in the execution plan, move the mouse up and it prompts that the table lacks XX statistics. For example, the system prompts that the column C1 lacks statistics, which directly leads to a bad execution plan for sqlserver, we manually create the statistical information wx_c1 on the C1 column of the check2_t1 table,

After the creation is complete, execute the SQL statement ① again, and find that the default execution plan begins to use "hash matching". The query is also completed instantly. The execution plan is as follows:

So far, we have shown the impact of statistical information on table join in the execution plan. The next article will show the impact of statistical information on the single-Table query plan.

It should be noted that, in fact, I cannot clearly explain how parsing statistical information leads to different query plans. This is something inside the sqlserver engine and there are relatively few materials, however, you should understand the importance of statistical information. As Microsoft's official documentation says, expired statistical information may seriously affect system performance. It is best to keep the default database "auto create statistics and auto update statistics" open, unless you have enough reason not to use the auto create and auto update statistics function.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.