Background:
Driven by recent project needs and some long-standing doubts, I decided to study dynamic SQL systematically. Having been away from front-line development for a few years, I have not researched many techniques in depth; as a DBA, and as the sole author of the book "The Art of SQL Server Performance Optimization and Management", I pay more attention to performance on the premise of meeting functional requirements. But I believe this article is useful not only to DBAs; it also has reference value for database developers, designers, architects, and others.
Objective:
Consider whether you have encountered a feature like this: an application (whether B/S or C/S architecture) has one or more functions that let users choose different conditions to query by (for simplicity, only queries are considered here). In extreme cases, users can even select, through the interface, which columns to display, the column order, the sort order, or even combine the result sets of two queries. These features can be implemented in the front-end programming language, sometimes even better there, but a front-end implementation still needs support from the backend database, so this series focuses on the database level.
In the example above, that is all the user sees, but as a developer you need to consider more. Suppose we use stored procedures to provide the data behind this feature. Different query conditions may require joining different tables: for example, if the user filters on both date and item number, and those two conditions live in tables A and B, the query must join A and B; but when the user filters on item number and customer name, the data may be distributed across tables B and C, and the same stored procedure must then join B and C.
Because of the limitations of relational database theory, it is difficult to build this logic flexibly while keeping the code simple and efficient. Setting aside cursor-heavy approaches, which can implement almost anything but are usually extremely inefficient, dynamic SQL comes into our scope of consideration. We need tools that serve two purposes: getting the right data, and getting it with reasonable performance.
Brief introduction:
Examples like the above are everywhere in modern information systems. The core problem they raise is that in SQL Server (and presumably in every RDBMS) there is no single execution plan that supports all possible combinations of query conditions well, yet you still want SQL Server to optimize for whatever the user enters.
There are usually two ways to implement this requirement at the database level:
1. Write static SQL with a query hint such as OPTION (RECOMPILE), forcing SQL Server to compile the query on every execution.
2. Use dynamic SQL to build a string that covers the user's chosen query criteria, then execute that statement.
Both methods are common, and each has its pros and cons; neither can simply be called better, so this series will cover both.
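To make the two approaches concrete, here is a minimal sketch of each, assuming a hypothetical date-range search inside a procedure that takes @fromdate and @todate parameters (the table and column names come from the sample database; the search itself is illustrative):

```sql
-- Approach 1: static "catch-all" SQL, recompiled on every execution
-- so the plan fits the parameter values actually supplied.
SELECT SalesOrderID, OrderDate
FROM   Sales.SalesOrderHeader
WHERE  (@fromdate IS NULL OR OrderDate >= @fromdate)
  AND  (@todate   IS NULL OR OrderDate <= @todate)
OPTION (RECOMPILE);

-- Approach 2: dynamic SQL, appending only the conditions the user supplied,
-- then executing the string with sp_executesql.
DECLARE @sql nvarchar(MAX) =
    N'SELECT SalesOrderID, OrderDate
      FROM Sales.SalesOrderHeader
      WHERE 1 = 1';
IF @fromdate IS NOT NULL SET @sql += N' AND OrderDate >= @fromdate';
IF @todate   IS NOT NULL SET @sql += N' AND OrderDate <= @todate';
EXEC sp_executesql @sql,
     N'@fromdate datetime, @todate datetime',
     @fromdate, @todate;
```

Note the trade-off already visible here: approach 1 pays a compile on every call, while approach 2 produces a different statement text (and thus a different cached plan) per combination of supplied conditions.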
Note: This series uses SQL Server 2008 R2 and the Microsoft sample database AdventureWorks2008R2 for its demonstrations.
Basic knowledge:
To give the reader some preparation, this installment first introduces some basic knowledge:
- Execution plan
- Execution plan Cache
- Statistics
- Parameter sniffing (introduction)
Note: To control the length of these articles, this series does not delve deeply into the topics above; for more detail, see the relevant chapters of my book "The Art of SQL Server Performance Optimization and Management".
Execution Plan:
When a SQL statement (an ad-hoc statement/batch or a stored procedure) is submitted to SQL Server, SQL Server first parses its syntax; if parsing fails, all subsequent processing stops immediately and an error message is returned to the client.
Once the statement passes parsing, SQL Server checks, based on the statement's text, whether a usable execution plan already exists in the plan cache (part of the buffer pool in memory). If one exists, it is executed directly; if not, the statement enters the compile phase, and the execution plan produced by compilation is handed to the storage engine for execution and is cached. The approximate process is as follows:
The execution plan is at the heart of performance: the storage engine accesses and returns data to the client according to the plan. A good plan completes a data request quickly and efficiently, while an unreasonable plan can turn a simple request that should take seconds into one that takes hours; the execution plan is also a core tool for analyzing performance problems. In SQL Server Management Studio (called Query Analyzer in SQL Server 2000; abbreviated SSMS since SQL Server 2005), you can view plans with the shortcuts Ctrl+M (include actual execution plan) and Ctrl+L (display estimated execution plan), or through the graphical interface:
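Besides the SSMS shortcuts, the same plans can be requested from T-SQL with the standard SHOWPLAN and STATISTICS session settings; a small sketch (the sample query is illustrative):

```sql
-- Estimated plan: the statement is compiled but NOT executed.
-- SET SHOWPLAN_XML must be the only statement in its batch.
SET SHOWPLAN_XML ON;
GO
SELECT TOP (10) SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader;
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: the statement executes and the XML plan
-- is returned alongside the results.
SET STATISTICS XML ON;
SELECT TOP (10) SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader;
SET STATISTICS XML OFF;
```

Saving the returned XML with a .sqlplan extension lets SSMS open it later as a graphical plan.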
Execution Plan cache:
After the SQL Server optimizer compiles and optimizes a statement, it hands the plan to the storage engine and caches it in the plan cache in memory for subsequent reuse.
The most important purpose of the plan cache is to avoid unnecessary compilation and recompilation, which can create significant CPU pressure on heavily loaded systems. If a stored procedure is called hundreds of times per second, even a compilation that costs only 5 ms per call adds up quickly; reusing the cached plan removes that overhead, cutting the accumulated compile time from minutes down to milliseconds.
In addition, the plan cache is closely tied to the exact text of the statement (more on this later). For the following two statements, the execution plan can be reused:
SELECT id,a,b FROM TB WHERE id=1
SELECT id,a,b FROM TB WHERE id=10
However, the following two statements are treated as two different statements, so the second is compiled anew, which often leads to performance problems and wasted resources:
SELECT id,a,b
FROM TB WHERE id=1
SELECT id,a,b FROM TB WHERE id=10   --note the difference in whitespace
The usual solution to such problems is a sensible coding standard plus parameterized queries, which are described later.
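As a sketch of the parameterized approach, the TB example above can be issued through sp_executesql; because the statement text, including the parameter marker, is identical in both calls, one cached plan serves both:

```sql
-- First call: compiles and caches one plan for the parameterized text.
EXEC sp_executesql
     N'SELECT id, a, b FROM TB WHERE id = @id',
     N'@id int',
     @id = 1;

-- Second call: same statement text, so the cached plan is reused.
EXEC sp_executesql
     N'SELECT id, a, b FROM TB WHERE id = @id',
     N'@id int',
     @id = 10;
```

The same effect is obtained automatically when the logic lives in a stored procedure with an @id parameter.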
The plan cache also has a subtle side effect, previewed here. Suppose we have the following stored procedure:
CREATE PROC TEST (@ID INT)
AS
SELECT id, a, b FROM TB WHERE ID = @ID
GO
Assume also that the distribution of the ID column in table TB is extremely uneven: rows with id=1 make up 90% of the table, rows with id=10 only 0.01%, the table has over a million rows, and a suitable index exists. In this case it is reasonable for a query on id=1 to use an index scan, and for a query on id=10 to use an index seek. When the stored procedure is first run with @ID=1, an execution plan using the scan is generated and cached. But when it is next executed with @ID=10, the cached plan may be reused, so the scan is still used, and you may notice a marked drop in performance; the reverse case is analogous. This will be covered in detail in a later series of articles on parameter sniffing.
Finally, the plan cache lives in memory and is therefore volatile: restarting the service or server, as well as certain commands and events, can flush plans out of memory. If necessary, you can periodically collect plans into a dedicated monitoring database, or save individual execution plans as files.
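One common way to collect plans before they are flushed is to query the plan-cache dynamic management views; a sketch:

```sql
-- List the most-reused cached plans together with their statement text
-- and XML plan (the XML can be saved as a .sqlplan file).
SELECT TOP (20)
       cp.usecounts,        -- how many times the plan has been reused
       cp.objtype,          -- Proc, Prepared, Adhoc, ...
       st.text,             -- the statement text the plan belongs to
       qp.query_plan        -- the plan itself, as XML
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
ORDER BY cp.usecounts DESC;
```

Inserting this result into a monitoring table on a schedule gives a history of plans that survives restarts.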
Statistics:
Statistics are a kind of "table" describing a table's data; put simply, they are data about data. What we commonly call the data distribution is, for SQL Server, stored in statistics. Statistics can be created automatically or manually, and corresponding statistics are typically created when an index is created. Details are covered in chapter six of "The Art of SQL Server Performance Optimization and Management".
Statistics are also central to performance optimization. After parsing a statement, SQL Server chooses the indexes and join algorithms for the execution plan based on the contents of the statistics; outdated or inaccurate statistics will seriously harm plan generation and overall server performance.
Also, during optimization, if the statistics are accurate enough, the optimizer can determine whether data actually needs to be joined and so avoid unnecessary table joins. Later parts of this series will demonstrate this.
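For reference, statistics can be inspected and refreshed with standard commands; a sketch (the index name below is an assumption for illustration, not taken from the sample database):

```sql
-- Inspect the header, density vector, and histogram behind one
-- statistics object (here, the statistics of a hypothetical index).
DBCC SHOW_STATISTICS ('Sales.SalesOrderHeader', 'IX_SalesOrderHeader_OrderDate');

-- Refresh all statistics on the table, reading every row
-- instead of sampling.
UPDATE STATISTICS Sales.SalesOrderHeader WITH FULLSCAN;
```

The histogram returned by DBCC SHOW_STATISTICS is exactly the "data distribution" the optimizer consults when estimating row counts.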
Reminder: the statistics algorithm currently in use dates back at least to SQL Server 2000, but SQL Server 2014 introduced a major change (a new cardinality estimator) that is more reasonable, so if circumstances allow, we recommend using SQL Server 2014 or later.
Parameter sniffing:
Parameter sniffing is often portrayed as a performance killer, or used as a derogatory term, but whatever exists has a reason, and each situation should be analyzed on its merits. So what is parameter sniffing? The example in the "Execution plan cache" section above already illustrates it. Stored procedures and similar objects take parameters, but SQL Server does not know which values you will pass until execution time, so it generates the execution plan from estimates. In some cases, however, SQL Server also "sniffs" the actual parameter values passed in and judges whether the cached plan is unreasonable for the current statement and its arguments and therefore needs recompiling.
So this mechanism is in fact necessary, and in many situations it is useful. But if, for some reason, it takes effect when it should not, a perfectly reasonable execution plan may be discarded in favor of an unreasonable one, with the result that a query that has always run well suddenly becomes extremely slow.
In short, you need to understand the nature of parameter sniffing before drawing conclusions. A later article will be devoted to parameter sniffing, and a link will be added here once it is published.
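When sniffing does cause trouble, two commonly used statement-level hints are OPTION (RECOMPILE) and OPTIMIZE FOR; a sketch based on the earlier TEST procedure (TEST_stable is a hypothetical name for the rewritten version):

```sql
CREATE PROC TEST_stable (@ID int)
AS
SELECT id, a, b
FROM   TB
WHERE  id = @ID
-- Option A: recompile this statement on every call, so the plan
-- always fits the actual @ID value (at the cost of a compile per call):
OPTION (RECOMPILE);
GO

-- Option B (alternative to A): build one plan from average density,
-- ignoring whichever value the first caller happened to pass:
--   OPTION (OPTIMIZE FOR (@ID UNKNOWN));
```

Which option is appropriate depends on call frequency and how skewed the data is; both are revisited later in the series.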
Environment Preparation:
Software environment: SQL Server 2008 R2, preferably Enterprise Edition; any operating system.
Sample database: AdventureWorks2008R2, available at: http://msftdbprodsamples.codeplex.com/releases/view/59211
Besides the environment above, we also create a stored procedure template in the database. It is a template in the sense that subsequent demonstrations will add statements to it:
CREATE PROCEDURE sp_get_orders
    @salesorderid int             = NULL,
    @fromdate     datetime        = NULL,
    @todate       datetime        = NULL,
    @minprice     money           = NULL,
    @maxprice     money           = NULL,
    @custid       nchar(5)        = NULL,
    @custname     nvarchar(40)    = NULL,   -- length garbled in the source; 40 assumed
    @prodid       int             = NULL,
    @prodname     nvarchar(40)    = NULL,   -- length garbled in the source; 40 assumed
    @employeestr  varchar(MAX)    = NULL,
    @employeetbl  intlist_tbltype READONLY
AS
SELECT o.SalesOrderID, o.OrderDate, od.UnitPrice, od.OrderQty,
       c.CustomerID, per.FirstName AS CustomerName,
       p.ProductID, p.Name AS ProductName,
       per.BusinessEntityID AS EmployeeID
FROM   Sales.SalesOrderHeader o
INNER JOIN Sales.SalesOrderDetail od ON o.SalesOrderID = od.SalesOrderID
INNER JOIN Sales.Customer c          ON o.CustomerID   = c.CustomerID
INNER JOIN Person.Person per         ON c.PersonID     = per.BusinessEntityID
INNER JOIN Production.Product p      ON p.ProductID    = od.ProductID
WHERE ???
ORDER BY o.SalesOrderID
GO
Note the ??? in the WHERE clause above: subsequent examples will be demonstrated by adding or removing conditions there and creating the result as a new stored procedure under a different name.
The stored procedure's parameters, briefly:

Parameter       Meaning
-------------   ------------------------------------------
@salesorderid   Order ID
@fromdate       Start date
@todate         End date
@minprice       Lowest price
@maxprice       Highest price
@custid         Customer ID
@custname       Customer name
@prodid         Product ID
@prodname       Product name
@employeestr    Comma-separated string of employee IDs
@employeetbl    Table-valued parameter holding employee IDs
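The table type behind @employeetbl must exist before the procedure can be created; a minimal sketch, assuming the type holds a single int column named number (the column name is an assumption, not given in the source):

```sql
-- Table type used by the @employeetbl parameter.
CREATE TYPE intlist_tbltype AS TABLE (number int NOT NULL PRIMARY KEY);
GO

-- Passing a list of employee IDs to the procedure:
DECLARE @emps intlist_tbltype;
INSERT INTO @emps (number) VALUES (274), (275), (276);
EXEC sp_get_orders @employeetbl = @emps;
```

Table-valued parameters (available since SQL Server 2008) are the cleaner alternative to the comma-separated @employeestr string, which must be split inside the procedure.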
If the user supplies no query criteria at all, no filtering should occur; a bare EXEC sp_get_orders must therefore return all orders in the database.
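One classic static-SQL way to fill the ??? placeholder so that omitted parameters do not filter is the "catch-all" pattern; a simplified sketch covering only a few of the parameters (using the o/od aliases from the template above):

```sql
-- Each condition collapses to TRUE when its parameter
-- is left at the NULL default, so EXEC sp_get_orders
-- with no arguments returns every order.
WHERE (@salesorderid IS NULL OR o.SalesOrderID = @salesorderid)
  AND (@fromdate     IS NULL OR o.OrderDate   >= @fromdate)
  AND (@todate       IS NULL OR o.OrderDate   <= @todate)
  AND (@minprice     IS NULL OR od.UnitPrice  >= @minprice)
  AND (@maxprice     IS NULL OR od.UnitPrice  <= @maxprice)
```

As the earlier sections suggest, this pattern's weakness is that one cached plan must serve every combination of conditions, which is exactly why hints like OPTION (RECOMPILE) or a switch to dynamic SQL come into play.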
By rewriting this stored procedure, we can address requirements like the following in the business:
- The user can choose how to sort the results.
- Depending on the input parameters, the statement can access different tables or columns.
- The user can choose the comparison operator, such as @custname = 'xxxx' or @custname != 'xxxx'.
- Users can add or remove columns from the output results, or they can choose what to aggregate in the aggregate query.
- Other requirements you can think of, and even ones you cannot think of but the customer will.
Copyright notice: this is the author's original article; please do not reproduce it without permission.
T-SQL Dynamic Query (1)--Introduction