Read Catalogue
- Objective
- Technical preparation
Objective
In the previous article, we analyzed how the query optimizer works, including: detailed operation steps of the query optimizer, analysis of filter conditions, optimization of index items, and other information.
In this article we analyze the detection of several key indicator values in the course of our operation.
These indicator values are analyzed to analyze the operation of the statement and to analyze how it is optimized.
Through this article we can learn the tuning of a few of the most frequently used tools!
Talk less, start the subject of this article.
Back to top technical preparation
The database version is SQL SERVER2008R2 and is analyzed using a more concise case library (Northwind) from Microsoft.
Weapon one, IO statistics
Through this IO statistic, we can analyze the number of data pages to be scanned by the current query statement. There are several important concepts that we analyze in turn.
The method is simple and one line of code is done:
?
Let's see an example.
?
123 |
SET STATISTICS IO ON GO SELECT * FROM Person.Contact |
Here you can see how many times this statement is used for the data table, based on the scanned item of the data page.
The so-called data page is the underlying data storage for the database, and SQL Server stores the table row data as a data page. Each data page is 8K,
8k=8192 bytes-96 bytes (page header)-36 bytes (row offset) = 8060 bytes
It also says that a data page stores 8060 bytes of pure data content.
Let's explain in turn that there are several read concepts:
Logical Read
Represents the total number of access pages required to process a query. That is, the total number of data pages that need to be read to complete a query statement.
The data page here is likely to come from memory, or it may be from a hard disk read.
Physical Reading
This means the number of pages read from the hard disk. We know that SQL Server will be reading the data page as much as possible in memory, to facilitate the next direct read, improve read speed.
So here's the probability of the next visit to the data page stored in memory, which presents a metric: Cache Hit ratio
Cache Hit Ratio = (logical read-physical read)/logical read
The proposed indicator is designed to measure the validity of cached data pages in memory. For example: if the cache and in-memory data pages are not used once, for this should be removed from memory in a timely manner, after all, for memory resources is very expensive. It should be used to cache data pages with high hit ratios.
Pre-read
Read-ahead is actually the number of pages of data that SQL statements pre-read into memory when optimized. This pre-read data page is evaluated in advance, which is what the query optimizer is going to do in the previous article.
Of course, these pre-read data pages are sometimes not all used, but it basically covers the data pages that the query uses.
It is to be suggested here that the read-ahead data is Cheng Fei by another thread, and the statement optimization line runs in parallel with the same thread, with the goal of quickly acquiring data and increasing the speed of query acquisition.
From this indicator we can analyze a lot of problems, to give an example:
Let's add a new test table with the following script
?
123456789 |
--perform the following script to a new table select * into neworders from orders go --add a new column alter table neworders add FULL_DETAILS  char not null default ' full Details ' go |
Then use the following script to look at the size of this table
?
12 |
EXEC sp_spaceused NewOrders, TRUE GO |
We can see the total size of this table data page is 2216KB, we know a page is 8KB, we can infer that the table data page has:
2216 (total data page size)/8 (one data page size) = 277 pages
This means that the data table has 277 data pages.
Of course, we can also view the page's data pages by using the DMV view below
?
123 |
SELECT * FROM SYS.dm_db_index_physical_stats (DB_ID( ‘Northwind‘ ),object_id( ‘NewOrders‘ ), NULL , NULL , ‘detailed‘ ) |
Through the above analysis,
We can speculate that when querying this table for a full table scan, the logical reading of the theoretical data page should be 277 times
Verify the following statement by following the
?
123456 |
--先清空缓存数据,生产机慎用 DBCC DROPCLEANBUFFERS SET STATISTICS IO ON SELECT * FROM NewOrders |
I'll go...
The logic read here is 1047 pages, and the inferred 277 page above does not match ... Wipe... God horse reason!!!
Here is the data page we are analyzing forwarded the record phenomenon. Because we are in the newly created table, the newly added column of data: Full_details, a data column of type char (2000), when the variable-length column in the data row grows so that the original page cannot hold the data row, the data is moved to the new page, and a pointer to the new page is left in the original location. This is called: Forwarded record
We can view the number of pages formed by the forwarded record of the table by the DMV view below.
?
123 |
SELECT * FROM SYS.dm_db_index_physical_stats (DB_ID(N ‘Northwind‘ ),object_id( ‘NewOrders‘ ), NULL , NULL , ‘detailed‘ ) |
Correct: The 770 data page for the forwarded record page, the concept of non-split page (thanks to the friends of the hospital wy123 pointed out).
See, the forwarded record page here is 770 pages, then we can extrapolate the number of logical readings we have.
277 (Original data page) +770 (forwarded record page) = 1047 page
So the above questions are analyzed for reasons.
We also showed a problem with the forwarded record page through this table: it affects query performance.
There are many ways to solve this, and the simplest way is to rebuild the clustered index.
?
1234567 |
CREATE CLUSTERED INDEX orderID_C ON NewOrders(OrderID) GO DROP INDEX NewOrders.orderID_C GO SET STATISTICS IO ON SELECT * FROM NewOrders GO |
With IO statistics, in addition to analyzing the fragments caused by the forwarded record page above, it is more important to compare the number of reads between different query statements and to optimize the statements by reducing the number of reads.
As for the pre-read scenario, we have previously analyzed that the data is preloaded by another thread when the T-SQL query statement is optimized.
So this thread actually has a reference value when reading the data, and the data can be read from this reference to ensure that most of the data is useful, that is, to increase the cache hit rate mentioned above.
About this reference value, I analyzed the next, in fact, is divided into two cases of analysis.
first , if the data table is a heap table, SQL Server gets it only through full table scan. This way, in order to avoid repeated reads, increase the consumption, so that the one-time pre-reading is not read a data page,
It's a physics 64 pages in a row.
Take a look at the official explanations for Books Online:
The read-ahead mechanism allows the database engine to read up to 64 contiguous pages (512KB) from a file. The read is performed as a scatter-gather read of the corresponding number of buffers in the buffer cache (which may be non-contiguous). If any page in this range already exists in the buffer cache, the corresponding page read will be discarded when the read is complete. If the page already exists in the cache, you can also "crop" the extent of the page from either end.
So, if our tables are not physically contiguous pages, then the number of reads is not good how to determine.
Let's look at an example of a heap table.
?
1234 |
SET STATISTICS IO ON --新建个测试表 SELECT * INTO NewOrders_TEST FROM NewOrders SELECT * FROM NewOrders_TEST |
The number of pre-reading here is 8 times, so I estimate that the underlying data page is definitely not sequential. So the result was 3 more times.
We can make a query under DBCC IND () to verify my inference.
?
1 |
DBCC IND( ‘Northwind‘ , ‘NewOrders_TEST‘ ,1) |
Data information is more, I paste it into Excel, and then make a line chart, where the part of the painting is actually no data page, so does not produce a read.
There is also a problem with the reading order, but the total number is definitely 8 times ....
Hope this way, you crossing can read ... I hope I can express my understanding.
second , if the table is not a heap table, that is, there is a clustered index entry, well, SQL Server is easy to find its pre-reading reference: statistical information.
Also, we know that the data is stored in b-tree numbers, and the data pages that are read are present with the leaf nodes. So basically there is no continuous reading of the gratitude.
A leaf node is a data page, and a data page is a pre-read.
Let's look at an example:
We add the above table to the clustered index entry, empty the cache again, execute the query, and the script is as follows
?
123 |
CREATE CLUSTERED INDEX NewOrders_TESTIndex ON NewOrders_TEST(OrderID) GO SELECT * FROM NewOrders_TEST |
The clustered index is added here, and SQL Server looks like a savior, and according to statistics, read-ahead data.
So if there is an error in the statistics, it causes read-ahead reading .... Then severely reduces the cache hit rate ... and then seriously increases the speed of swap-in in memory .... Added CPU ....
Well, let's continue the article, the pre-read data line we mentioned above can be found in the following DMV.
?
123 |
SELECT * FROM SYS.dm_db_index_physical_stats (DB_ID(N ‘Northwind‘ ),object_id( ‘NewOrders_TEST‘ ), NULL , NULL , ‘detailed‘ ) |
From this DMV view, you can see that this table statistic is 277 data pages, so 277 read-ahead is formed.
However, the fact that the data table is 279 pages, that is, the statistical information is problematic, resulting in a few read read 2 data pages, and in order to compensate for this statistical negligence, there are 2 physical reads, re-obtained from the hard disk.
Tool two, TIME statistics
The simple thing about time statistics is that the statistics T-SQL executes the statement execution time items, including CPU time, statement compile time, total statement execution time, and so on.
The use of the method is also very simple, a line of code
?
Through this parameter, we can analyze the above information, which is mainly used to compare the execution time in the query statement tuning, our goal is to reduce the execution time.
Example: We compare the above query statement to the time of the first run and later run (the data has been cached) to understand the importance of the cache by turning on time statistics.
Time to execute again
Cache tracking (added on December 25, 2014)
Of course, we can go a little further, if you want to see the details of this part of the data cache in memory, you can see the following DMV script
?
1234 |
SELECT * FROM sys.dm_os_buffer_descriptors WHERE DB_NAME(database_id)= ‘Northwind‘ AND page_type= ‘DATA_PAGE‘ ORDER BY page_id |
The DMV can also be used to analyze the size ratio of each library in memory, and the script is as follows:
?
1234567891011 |
--清除缓存
dbcc dropcleanbuffers
--查看缓存内容中在内存大小
SELECT COUNT(*)*
8
/
1024 as ‘Cached Size(MB)‘
,CASE database_id
WHEN
32767 THEN
‘ResourceDB‘
ELSE DB_NAME(database_id)
END AS
‘Database‘
FROM sys.dm_os_buffer_descriptors
GROUP BY DB_NAME(database_id),database_id
ORDER BY
‘Cached Size(MB)‘ DESC
|
After this query, this table has all been cached in memory, because the whole table is a total of 2MB size
SQL Server Tuning Series Advanced (query statements run several indicator value monitoring)