We know that statistics directly determine the execution plan produced by the relational engine. This article demonstrates two examples:
1. The impact of statistics on the join method between tables
2. The impact of statistics on the data-access method for a single table
What follows is based on a training session I conducted. The test environment is SQL Server 2008 R2. If you are not yet familiar with what statistics are, read an introduction to statistics first.
First, let's look at the impact of statistics on the join method.
Create a test database with "auto create statistics" and "auto update statistics" both set to off, then run the following code to create the related tables and insert records:
Create Table Check2_t1 (ID Int, C1 Char(800))
Create Table Check2_t2 (ID Int, C1 Char(8000))

-- Insert data into the two tables
Begin Tran
Declare @I Int
Set @I = 1
While @I <= 100000
Begin
    Insert Into Check2_t1 Select @I, 'C1'
    Set @I = @I + 1
End
Commit Tran

Begin Tran
Declare @I Int
Set @I = 1
While @I <= 500000
Begin
    Insert Into Check2_t2 Select @I, 'C2'
    Set @I = @I + 1
End
Commit Tran
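For reference, the database-level settings mentioned above can be switched off explicitly. A sketch, assuming the test database is named TestDB (the original does not name it):

```sql
-- Assumed database name: TestDB (not specified in the original article)
ALTER DATABASE TestDB SET AUTO_CREATE_STATISTICS OFF
ALTER DATABASE TestDB SET AUTO_UPDATE_STATISTICS OFF
```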
Then execute the following two SQL statements. Statement ② uses a HASH join hint to simulate the plan that accurate statistics would produce; the statistics themselves are shown later.
Given the filter condition, both queries return an empty result set. Let's look at their execution plans.
① Select A.* From Check2_t1 A Inner Join Check2_t2 B On A.ID = B.ID Where A.C1 = 'C3'

② Select A.* From Check2_t1 A Inner Hash Join Check2_t2 B On A.ID = B.ID Where A.C1 = 'C3'
The execution plan is as follows:
Comparing the two execution plans, we can see that the join in ① uses a merge join and Check2_t2 is scanned in full. Now the point of designing Check2_t2 so that each row fills an 8 KB page should be clear: this table alone occupies roughly 500,000 pages, about 4 GB of disk space, so a full table scan is slow.
From the execution plan of ①, we know the optimization is to get rid of the merge join and replace it with a nested loop or hash join. So in statement ② we force a hash match through the join hint, and the query completes instantly, achieving our goal. We could equally change the HASH hint to a LOOP hint so the query uses nested loops; the effect is just as immediate. The query plan is as follows:
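The LOOP variant mentioned above is the same query as ② with only the hint changed:

```sql
-- Force a nested loop join instead of a hash join
Select A.* From Check2_t1 A Inner Loop Join Check2_t2 B On A.ID = B.ID Where A.C1 = 'C3'
```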
We have now reached the desired execution plan by adding a join hint, but that is not the recommended fix. Notice the yellow warning triangle on the table scan icon in the execution plan: hovering over it shows that the table is missing statistics. Here it reports that column C1 lacks statistics, which is what led SQL Server to a bad plan. So we manually create the statistics wx_c1 on column C1 of table Check2_t1.
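A minimal sketch of creating the wx_c1 statistics by hand. The FULLSCAN option is my addition here (the article does not say which sample rate was used) to get an accurate histogram on this test data; the resulting statistics can then be inspected with DBCC SHOW_STATISTICS:

```sql
-- Create column statistics on C1; FULLSCAN is an assumption, not from the article
CREATE STATISTICS wx_c1 ON Check2_t1 (C1) WITH FULLSCAN

-- Inspect the header, density vector, and histogram of the new statistics
DBCC SHOW_STATISTICS ('Check2_t1', wx_c1)
```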
After the statistics are created, execute statement ① again: the default plan now uses a hash match, and the query also completes instantly. The execution plan is as follows:
So far we have shown the impact of statistics on table joins in the execution plan. The next article will show the impact of statistics on single-table query plans.
Admittedly, I cannot explain exactly how the engine turns statistics into different query plans; that happens inside the SQL Server engine, and there is relatively little material on it. What matters is understanding how important statistics are. As Microsoft's official documentation notes, out-of-date statistics can seriously hurt system performance. It is best to leave the database defaults "auto create statistics" and "auto update statistics" enabled, unless you have a good reason to turn them off.
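To follow that advice, the two options can be switched back on at the database level, and statistics can also be refreshed manually when needed. Again assuming the test database is named TestDB:

```sql
-- Assumed database name: TestDB
ALTER DATABASE TestDB SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE TestDB SET AUTO_UPDATE_STATISTICS ON

-- Manually refresh all out-of-date statistics in the current database
EXEC sp_updatestats
```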