Impact of statistical information on execution plans (1) We know that statistical information directly determines the execution plans produced by the relational engine, this article demonstrates two examples: 1. The impact of statistical information on the connection method 2. The impact of statistical information on the way data is obtained from a single table. The following is a training I have conducted. content, the test environment is SQLServer2008.
Impact of statistical information on execution plans (1) We know that statistical information directly determines the execution plans produced by the relational engine, this article demonstrates two examples: 1. The impact of statistical information on the connection method 2. The impact of statistical information on the way data is obtained from a single table. The following is a training I have conducted. content, the test environment is SQLServer 2008
Impact of statistical information on the Execution Plan (I)
We know that statistical information directly determines the execution plan generated by the relational engine. This article demonstrates two examples.
1. Impact of statistical information on connection modes
2. Impact of statistical information on data acquisition methods in a single table
The following is a training I have conducted. The test environment is SQLServer 2008 R2. If you do not understand what the statistical information is, click here
First, let's take a look at the impact of statistics on connection methods,
First, create a Test DB, set "auto create statistics" and "auto update statistics" to OFF, and then run the code to create related tables and write records.
Create table CHECK2_T1 (id int, C1 CHAR (800) create table CHECK2_T2 (id int, C1 CHAR (8000) CHECK2_T1 CHECK2_T2
Then execute the following two SQL statements. The SQL statement in ② uses the table prompt HASH to simulate the statistical information. The statistical information will be displayed later.
The following two SQL statements show that the result set is 0 based on the query conditions. Let's look at their execution plan.
① CHECK2_T2 B ②
The execution plan is as follows:
Comparing the preceding two execution plans, we found that the two table connections in Table 1 use the "merge join", the US space, and CHECK2_T2 are scanned by the entire table, now you should understand the meaning of simulating the CHECK2_T2 table into a record with an 8 K page. In fact, this table alone occupies 50 million pages and approximately 4 GB of disk space, it is slow to scan tables.
After reading the execution plan of ①, we know that optimization is to remove this "merge connection" and change it to "nested loop" or "hash connection" to achieve our results, as a result, in SQL, we force the "hash matching" request through the connection prompt. We can see that SQL is instantaneous and achieves our goal, of course, we can also change the "HASH" prompt to the "LOOP" prompt to let the query go through the "nested LOOP" method, and its effect is instantaneous. The query plan is as follows:
Now we can see that the desired execution plan is achieved by adding a connection prompt, but this is not our suggestion. The Hong Kong virtual host, we can see that the table scan icon in the execution plan has a yellow triangle exclamation mark, the U.S. space, move the mouse up, it prompts us that the table lacks XX statistics, for example, on CHECK2_T1, the system prompts that the C1 column lacks statistics, which directly leads to a bad execution plan for sqlserver. We manually create the WX_C1 statistical information on the C1 column of the CHECK2_T1 table,
After the creation is complete, execute the SQL statement ① again, and find that the default execution plan begins to use "hash matching". The query is also completed instantly. The execution plan is as follows:
So far, we have shown the impact of statistical information on table join in the execution plan. The next article will show the impact of statistical information on the single-Table query plan.
It should be noted that, in fact, I cannot clearly explain how parsing statistical information leads to different query plans. This is something inside the sqlserver engine and there are relatively few materials, however, you should understand the importance of statistical information. As Microsoft's official documentation says, expired statistical information may seriously affect system performance. It is best to keep the default database "auto create statistics and auto update statistics" open, unless you have enough reason not to use the auto create and auto update statistics function.
Posted on