Preface
In the previous article, we analyzed how the query optimizer works, including the detailed operation steps, filtering condition analysis, and index item optimization of the query optimizer.
In this article, we analyze several key indicator values during our operation.
These indicator values are used to analyze the statement running problems and analyze the optimization methods.
Through this article, we can learn several useful tools that are often used in optimization!
Start the topic of this article.
Technical preparation
The database version is SQL Server2008R2 and is analyzed using Microsoft's simpler case Library (Northwind.
Tip 1: IO statistics
With this IO statistics, we can analyze the number of data pages to be scanned by the current query statement. There are several important concepts here, which we will analyze in sequence.
The method is simple. You can do it with one line of code:
SET STATISTICS IO ON
Let's look at an example.
SET STATISTICS IO ON
GO
SELECT * FROM Person. Contact
Here we can see the number of operations performed by this statement on the data table, based on the data page scan items.
The so-called data page is the underlying data storage method of the database. SQL Server stores table row data in the form of data pages. Each data page is 8 kB,
8 K = 8192 bytes-96 bytes (page header)-36 bytes (row offset) = 8060 bytes
That is to say, the pure data content stored on a data page is 8060 bytes.
Let's explain several concepts of reading in sequence:
Logical read
The total number of pages to be accessed. That is to say, the total number of data pages to be read by a query statement.
The data page may come from memory or hard disk reading.
Physical read
This is the number of data pages read from the hard disk. We know that SQL Server stores the data pages read in the memory whenever possible to facilitate the next direct reading and improve the reading speed.
So here, we propose an indicator about the next access probability of data pages stored in memory: cache hit rate.
Cache hit rate = (logical read-physical Read)/logical read
The purpose of this indicator is to measure the effectiveness of the data pages cached in the memory. For example, if the data page in the cache and memory is used once, it should be removed from the memory in time. After all, it is very expensive for memory resources. It should be used to cache data pages with a high hit rate.
Pre-read
Pre-reading is actually the number of data pages in the memory that are read in advance when the SQL statement is optimized. This pre-read data page is evaluated in advance, that is, what the query optimizer described in the previous article will do.
Of course, these pre-read data pages are sometimes not all used, but they can basically cover the data pages used for queries.
It is worth noting that the pre-read data is read by another thread and the statement optimization thread runs in parallel without the same thread. This aims to quickly obtain data and improve the query speed.
We can analyze many problems from this indicator. For example:
Add a new test table. The script is as follows:
-- Execute the following script to generate a new table
SELECT *
INTO NewOrders
FROM Orders
GO
-- Add one column
Alter table NewOrders
ADD Full_Details CHAR (2000) not null default 'full details'
GO
Use the following script to check the table size.
EXEC sp_spaceused NewOrders, TRUE
GO
We can see that the total size of the data page of this table is 2216KB. We know that a page is 8 kB. We can infer that the data page of this table is:
2216 (total data page size)/8 (one data page size) = 277 pages
That is to say, this data table has 277 data pages.
Of course, we can also view the page's data pages through the following DMV view
SELECT *
From sys. dm_db_index_physical_stats
(DB_ID ('northwind '), object_id ('neworders'), NULL, NULL, 'detailed ')
After the above analysis,
We can speculate that when querying this table for full table scanning, the logical reading of the theoretical data page should be 277 times.
Verify with the following statement:
-- Clear cache data first, and use it with caution on the production machine
DBCC DROPCLEANBUFFERS
SET STATISTICS IO ON
SELECT * FROM NewOrders
I'm going...
Here, the logic is read as 1047 pages, which is inconsistent with the above inference 277 page... why is it !!!
This is because of the split of data pages to be analyzed. Because we add a new column of data in the newly created table: Full_Details, data column of type CHAR (2000), the original data page is not enough to fill the newly added column data, therefore, data page splitting is generated.
For information about data page split, refer to Song Xiong's article introduction http://www.cnblogs.com/CareySon/archive/2012/01/06/2313897.html
The article also provides a solution to this problem.
The following DMV view shows the number of split pages in the table.
SELECT *
From sys. dm_db_index_physical_stats
(DB_ID (N 'northwind '), object_id ('neworders'), NULL, NULL, 'detailed ')
We can see that the split page here is 770 pages, so we can deduce the number of logical reads.
277 (original data page) + 770 (Split page) = 1047 page
Therefore, we have analyzed the cause of the above problem.
This table also shows a page split problem that affects query performance.
There are many solutions. The simplest way is to re-build the clustered index.
Create clustered index orderID_C ON NewOrders (OrderID)
GO
Drop index NewOrders. orderID_C
GO
SET STATISTICS IO ON
SELECT * FROM NewOrders
GO
I/O statistical items can be used to analyze the fragmentation caused by the previous page split. More importantly, they can be used to compare the number of reads between different query statements, optimize the statement by reducing the number of reads.
With regard to the pre-read situation, we have analyzed above, its data through another thread in the T-SQL query statement optimization when the data is pre-loaded.
Therefore, this thread actually has a reference value when prereading data. Only the data read from this reference value can ensure that most of the data is useful, that is, to increase the cache hit rate mentioned above.
I have analyzed this reference value, but it is actually divided into two situations.
First, if the data table is a heap table, SQL Server can only scan the entire table. To avoid repeated reads and increase consumption, one pre-read operation does not read a data page,
It is a physical row with 64 consecutive pages.
Let's take a look at the official explanations of online books:
The pre-read mechanism allows the database engine to read a maximum of 64 consecutive pages (kb) from a single file ). This read is executed as a spread-aggregation read of the corresponding number of (possibly non-adjacent) buffers in the buffer cache. If any page in this range already exists in the buffer cache, when the read is complete, the corresponding page will be discarded. If the corresponding page already exists in the cache, you can also "crop" the page range from any end.
Therefore, if our table is not a consecutive page physically, the number of reads is hard to determine.
Let's look at a heap table example.
SET STATISTICS IO ON
-- Create a test table
SELECT * INTO NewOrders_TEST FROM NewOrders
SELECT * FROM NewOrders_TEST
Here, the number of pre-reads is 8, so I guess the underlying data page is definitely not continuous. Therefore, three more requests are generated.
We can query dbcc ind () to verify my inference.
Dbcc ind ('northwind ', 'neworders _ test', 1)
I paste the data into an Excel worksheet and create a line chart. The part that is painted out does not have a data page, so there is no read.
There is also a problem with the mark of the Read order, but the total number must be 8 times .....
I hope you can understand this method... I hope you can also understand it.
Secondly, if the table is not a heap table, that is, there are clustered index items, then SQL Server can easily find its pre-read reference: statistical information.
In addition, we know that the data is stored in the B-Tree number, and the data pages read exist with leaf nodes. Therefore, there is basically no continuous reading.
A leaf node is a data page, and a data page is a pre-read.
Let's look at an example:
We add the clustered index items to the above table, clear the cache again, and execute the query. The script is as follows:
Create clustered index NewOrders_TESTIndex ON NewOrders_TEST (OrderID)
GO
SELECT * FROM NewOrders_TEST
The clustered index is added here. The SQL Server seems to have seen the rescue star. The data can be pre-read based on the statistical information.
Therefore, if there is an error in the statistical information, it will lead to the random reading of the pre-read .... then the cache hit rate is seriously reduced ..... then the speed of swap-in memory is greatly increased .... CPU added ....
Now, let's continue with the article. The pre-read data row we mentioned above can be found in the following DMV.
SELECT *
From sys. dm_db_index_physical_stats
(DB_ID (N 'northwind '), object_id ('neworders _ test'), NULL, NULL, 'detailed ')
From the DMV view, we can see that the statistical information of this table is 277 data pages, so 277 pre-reads are formed.
However, the fact that this data table is 279 pages, that is to say, there is a problem with the statistical information, resulting in two data pages read less, in order to make up for this statistical error, two physical reads were made and retrieved from the hard disk again.
Tip 2: time statistics
About time statistics this is very simple, is to count the T-SQL execution statement execution time items, including CPU usage time, statement compilation time, statement execution time and so on.
It's easy to use. A line of code
SET STATISTICS TIME ON
This parameter can be used to analyze the preceding information. It is used to compare the execution time in query statement optimization. Our goal is to reduce the execution time.
For example, we can compare the preceding query statement by enabling time statistics to understand the importance of cache by comparing the time when the first run and later run (the data has been cached).
Execution time again
The article is a bit long... come here first.
The optimization content is too broad. We will introduce it in the future. If you are interested, please pay attention to it in advance.