Objective
In the previous article, we analyzed how the query optimizer works, including: detailed operation steps of the query optimizer, analysis of filter conditions, optimization of index items, and other information.
In this article we analyze the detection of several key indicator values in the course of our operation.
These indicator values are analyzed to analyze the operation of the statement and to analyze how it is optimized.
Through this article we can learn the tuning of a few of the most frequently used tools!
Talk less, start the subject of this article.
Technical preparation
The database version is SQL SERVER2008R2 and is analyzed using a more concise case library (Northwind) from Microsoft.
Weapon one, IO statistics
Through this IO statistic, we can analyze the number of data pages to be scanned by the current query statement. There are several important concepts that we analyze in turn.
The method is simple and one line of code is done:
SET STATISTICS IO on
Let's see an example.
SET STATISTICS IO Ongoselect * from Person.Contact
Here you can see how many times this statement is used for the data table, based on the scanned item of the data page.
The so-called data page is the underlying data storage for the database, and SQL Server stores the table row data as a data page. Each data page is 8K,
8k=8192 bytes-96 bytes (page header)-36 bytes (row offset) = 8060 bytes
It also says that a data page stores 8060 bytes of pure data content.
Let's explain in turn that there are several read concepts:
Logical Read
Represents the total number of access pages required to process a query. That is, the total number of data pages that need to be read to complete a query statement.
The data page here is likely to come from memory, or it may be from a hard disk read.
Physical Reading
This means the number of pages read from the hard disk. We know that SQL Server will be reading the data page as much as possible in memory, to facilitate the next direct read, improve read speed.
So here's the probability of the next visit to the data page stored in memory, which presents a metric: Cache Hit ratio
Cache Hit Ratio = (logical read-physical read)/logical read
The proposed indicator is designed to measure the validity of cached data pages in memory. For example: if the cache and in-memory data pages are not used once, for this should be removed from memory in a timely manner, after all, for memory resources is very expensive. It should be used to cache data pages with high hit ratios.
Pre-read
Read-ahead is actually the number of pages of data that SQL statements pre-read into memory when optimized. This pre-read data page is evaluated in advance, which is what the query optimizer is going to do in the previous article.
Of course, these pre-read data pages are sometimes not all used, but it basically covers the data pages that the query uses.
It is to be suggested here that the read-ahead data is Cheng Fei by another thread, and the statement optimization line runs in parallel with the same thread, with the goal of quickly acquiring data and increasing the speed of query acquisition.
From this indicator we can analyze a lot of problems, to give an example:
Let's add a new test table with the following script
-- Execute the following script into a table select * into newordersfrom ordersgo-- New Add a row ALTER TABLE Newordersadd full_details CHAR (' Full details'GO
Then use the following script to look at the size of this table
EXEC sp_spaceused Neworders,truego
We can see the total size of this table data page is 2216KB, we know a page is 8KB, we can infer that the table data page has:
2216 (total data page size)/8 (one data page size) = 277 pages
This means that the data table has 277 data pages.
Of course, we can also view the page's data pages by using the DMV view below
SELECT * from sys.dm_db_index_physical_stats (db_id ('Northwind'), object_id ('neworders' ), Null,null,'detailed')
Through the above analysis,
We can speculate that when querying this table for a full table scan, the logical reading of the theoretical data page should be 277 times
Verify the following statement by following the
-- Empty the cache data first, the production machine uses DBCC DROPCLEANBUFFERSSET STATISTICS IO ONSELECT * from Neworders
I'll go...
The logic read here is 1047 pages, and the inferred 277 page above does not match ... Wipe... God horse reason!!!
Here is the data page we are analyzing forwarded the record phenomenon. Because we are in the newly created table, a new column of data is added later: full_details, a data column of type char (2000), when the variable-length column in the data row grows so that the original page cannot hold the data row, the data is moved to the new page. And leave a pointer to the new page in the original location, which is called: Forwarded record
We can view the number of pages formed by the forwarded record of the table by the DMV view below.
SELECT * from sys.dm_db_index_physical_stats (db_id (N'Northwind'), object_id ('neworders '), Null,null,'detailed')
Correct: The 770 data page for the forwarded record page, the concept of non-split page (thanks to the friends of the hospital wy123 pointed out).
See, the forwarded record page here is 770 pages, then we can extrapolate the number of logical readings we have.
277 (Original data page) +770 (forwarded record page) = 1047 page
So the above questions are analyzed for reasons.
We also showed a problem with the forwarded record page through this table: it affects query performance.
There are many ways to solve this, and the simplest way is to rebuild the clustered index.
CREATE CLUSTERED index Orderid_c on neworders (OrderID) godrop INDEX neworders.orderid_cgoset STATISTICS IO ONSELECT * F ROM Newordersgo
With IO statistics, in addition to analyzing the fragments caused by the forwarded record page above, it is more important to compare the number of reads between different query statements and to optimize the statements by reducing the number of reads.
As for the pre-read scenario, we have previously analyzed that the data is preloaded by another thread when the T-SQL query statement is optimized.
So this thread actually has a reference value when reading the data, and the data can be read from this reference to ensure that most of the data is useful, that is, to increase the cache hit rate mentioned above.
About this reference value, I analyzed the next, in fact, is divided into two cases of analysis.
first , if the data table is a heap table, SQL Server gets it only through full table scan. This way, in order to avoid repeated reads, increase the consumption, so that the one-time pre-reading is not read a data page,
It's a physics 64 pages in a row.
Take a look at the official explanations for Books Online:
The read-ahead mechanism allows the database engine to read up to 64 contiguous pages (512KB) from a file. The read is performed as a scatter-gather read of the corresponding number of buffers in the buffer cache (which may be non-contiguous). If any page in this range already exists in the buffer cache, the corresponding page read will be discarded when the read is complete. If the page already exists in the cache, you can also "crop" the extent of the page from either end.
So, if our tables are not physically contiguous pages, then the number of reads is not good how to determine.
Let's look at an example of a heap table.
SET STATISTICS IO on-- Create a new test table select * into neworders_test from Newordersselect * from Neworders_test
The number of pre-reading here is 8 times, so I estimate that the underlying data page is definitely not sequential. So the result was 3 more times.
We can make a query under DBCC IND () to verify my inference.
DBCC IND ('Northwind','neworders_test',1)
Data information is more, I paste it into Excel, and then make a line chart, where the part of the painting is actually no data page, so does not produce a read.
There is also a problem with the reading order, but the total number is definitely 8 times ....
Hope this way, you crossing can read ... I hope I can express my understanding.
second , if the table is not a heap table, that is, there is a clustered index entry, well, SQL Server is easy to find its pre-reading reference: statistical information.
Also, we know that the data is stored in b-tree numbers, and the data pages that are read are present with the leaf nodes. So basically there is no continuous reading of the gratitude.
A leaf node is a data page, and a data page is a pre-read.
Let's look at an example:
We add the above table to the clustered index entry, empty the cache again, execute the query, and the script is as follows
CREATE CLUSTERED INDEX neworders_testindex on Neworders_test (OrderID) goselect * from Neworders_test
The clustered index is added here, and SQL Server looks like a savior, and according to statistics, read-ahead data.
So if there are errors in the statistics, it causes the read-ahead of the random read .... The cache hit rate is then severely reduced ..... And then seriously increase the speed of in-memory swap-in .... Increased CPU ....
Well, let's continue the article, the pre-read data line we mentioned above can be found in the following DMV.
SELECT * from sys.dm_db_index_physical_stats (db_id (N'Northwind'), object_id ('neworders_ TEST'), Null,null,'detailed')
From this DMV view, you can see that this table statistic is 277 data pages, so 277 read-ahead is formed.
However, the fact that the data table is 279 pages, that is, the statistical information is problematic, resulting in a few read read 2 data pages, and in order to compensate for this statistical negligence, there are 2 physical reads, re-obtained from the hard disk.
Tool two, TIME statistics
The simple thing about time statistics is that the statistics T-SQL executes the statement execution time items, including CPU time, statement compile time, total statement execution time, and so on.
The use of the method is also very simple, a line of code
SET STATISTICS time on
Through this parameter, we can analyze the above information, which is mainly used to compare the execution time in the query statement tuning, our goal is to reduce the execution time.
Example: We compare the above query statement to the time of the first run and later run (the data has been cached) to understand the importance of the cache by turning on time statistics.
Time to execute again
Cache tracking (added on December 25, 2014)
Of course, we can go a little further, if you want to see the details of this part of the data cache in memory, you can see the following DMV script
SELECT * from sys.dm_os_buffer_descriptorswhere db_name (database_id) ='Northwind' andpage_type= 'data_page'
The DMV can also be used to analyze the size ratio of each library in memory, and the script is as follows:
--Clear Cache DBCC dropcleanbuffers-- View cache contents in memory size Select COUNT (*) *8/ 1024 as cached Size (MB) case database_id when 32767 then resourcedb ' ELSE db_name (database_id) END as '
After this query, this table has all been cached in memory, because the whole table is a total of 2MB size
I turn from: http://www.cnblogs.com/zhijianliutang/p/4179110.html
Fingertip flow
SQL Server Tuning Series Advanced (query statements run several indicator value monitoring) (RPM)