To optimize Microsoft SQL Server performance most effectively, you must identify the areas that will yield the largest performance gains over the widest variety of situations, and focus your analysis on those areas. Otherwise, you may expend significant time and effort on topics that do not yield sizable improvements.
Most of the following information does not address performance problems stemming from concurrent use by multiple users. That is a separate, complex topic, which is covered in the document "Maximizing Database Consistency and Concurrency" and in other Knowledge Base articles. The document can be found in Appendix E of the SQL Server version 4.2x "Programmer's Reference for C." It is not included in the version 6.0 documentation, but can be found under that title on the MSDN (Microsoft Developer Network) CD.
Rather than a theoretical discussion, this article focuses primarily on the real-world experience of the Microsoft SQL Server support team, accumulated over several years and of direct value in practical situations.
Experience shows that the greatest benefit in SQL Server performance can be gained from the general areas of logical database design, index design, query design, and application design. Conversely, the biggest performance problems are often caused by deficiencies in these same areas. If you are concerned with performance, you should concentrate on these areas first, since very large performance improvements can often be achieved with a relatively small investment of time.
While other system-level performance issues, such as memory, caching, and hardware, are certainly candidates for study, experience shows that the performance gains from these areas are usually incremental. SQL Server manages available hardware resources largely automatically, which greatly reduces the need for (and thus the benefit of) extensive hand tuning at the system level.
Microsoft SQL Server 6.0 offers new opportunities for platform-level performance improvement through features such as large memory support, symmetric multiprocessing (SMP), parallel data scanning, optimizer enhancements, and disk striping. These gains are substantial, but ultimately finite in scope. The fastest computer can still be brought to its knees by inefficient queries or a badly designed application. Therefore, even with the performance increases that SQL Server 6.0 allows, it remains vital to optimize the database, index, query, and application design.
Many performance problems cannot be resolved satisfactorily by focusing on the server side alone. The server is essentially a "puppet" of the client, because the client controls which queries are sent, and thereby when locks are acquired and released. Although some tuning is possible on the server side, the ability to resolve performance problems satisfactorily usually depends on recognizing the dominant role the client plays in the problem and on analyzing client application behavior. Based on experience, here are some suggestions that can produce noticeable performance improvements:
Use a normalized logical database design. Reasonable normalization of the logical database design yields the best performance. A large number of narrow tables is characteristic of a normalized database; a small number of wide tables is characteristic of a denormalized database. A highly normalized database is commonly associated with complex relational joins, which can hurt performance. However, as long as effective indexes are available, the SQL Server optimizer is very capable of selecting fast, efficient joins.
The benefits of normalization include:
- Accelerated sorting and index creation, because tables are narrower.
- A larger number of clustered indexes is possible, because there are more tables.
- Narrower and more compact indexes.
- Fewer indexes per table, which helps the performance of UPDATE operations.
- Fewer NULLs and less redundant data, making the database more compact.
- Reduced concurrency impact of DBCC diagnostics, because the necessary table locks affect less data.
Therefore, with SQL Server, reasonable normalization tends to help rather than hurt performance. As the degree of normalization increases, so do the number and complexity of the joins required to retrieve data. As a rough rule of thumb based on experience, Microsoft suggests carrying the normalization process forward unless it causes many queries to need joins of four or more tables.
If the logical design of the database is already fixed and total redesign is not feasible, it may be possible to selectively normalize a large table that analysis shows to be a performance bottleneck. If access to the database takes place through stored procedures, this schema change can occur without affecting applications. If not, it may be possible to hide the change by creating a view that looks like a single table.
Use an effective index design. Unlike many non-relational systems, relational indexes are not considered part of the logical database design. Indexes can be dropped, added, and changed without affecting the database schema or application design in any way other than performance. Effective index design is paramount to achieving good SQL Server performance. For these reasons, you should not hesitate to experiment with a variety of indexes.
In the majority of cases, the optimizer will reliably choose the most effective index. The overall strategy of index design should be to provide a good selection of candidate indexes to the optimizer and trust it to make the right decision. This reduces analysis time and gives good performance as circumstances change.
Here are some index design recommendations:
- Examine the WHERE clause of your SQL queries, because this is the primary focus of the optimizer.
Each column listed in the WHERE clause is a possible candidate for an index. If there are too many queries to examine, pick a representative set, or just the slow ones. If your development tool transparently generates SQL code, examination is more difficult. Many such tools allow the generated SQL syntax to be logged to a file or to the screen for debugging purposes; you may need to ask the tool vendor whether such a feature is available.
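As an illustration, consider the following hypothetical query; the CUST_ID column listed in its WHERE clause is a candidate for an index (the table, column, and index names here are invented for the example):
SELECT * FROM ORDERS WHERE CUST_ID = 1234
CREATE NONCLUSTERED INDEX ORDERS_CUSTID_IDX ON ORDERS (CUST_ID)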
- Use narrow indexes.
Narrow indexes are often more effective than wide, multicolumn compound indexes. A narrow index has more rows per page and fewer index levels, both of which help performance.
The optimizer can rapidly and effectively analyze hundreds, or even thousands, of index and join possibilities. Having a greater number of narrow indexes provides the optimizer with more possibilities to choose from, which usually helps performance. Conversely, having fewer wide, multicolumn indexes provides the optimizer with fewer possibilities to choose from, which can hurt performance.
It is often best not to adopt a strategy that emphasizes full coverage of queries. If all the columns in the SELECT clause are covered by a nonclustered index, the optimizer recognizes this and can provide very good performance. However, this usually results in indexes that are too wide, and it relies too heavily on the chance that the optimizer will use this strategy. Usually, use more numerous narrow indexes, which provide better performance over a wide range of queries.
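For illustration, here is a sketch of a covered query, reusing the PHONEBOOK table from the examples later in this section (the index name and the FIRSTNAME column are assumptions for the example). Because every column the query references is contained in the nonclustered index, the query can be satisfied from index pages alone:
CREATE NONCLUSTERED INDEX PHONEBOOK_NAME_IDX
ON PHONEBOOK (LASTNAME, FIRSTNAME)
SELECT LASTNAME, FIRSTNAME FROM PHONEBOOK
WHERE LASTNAME = 'SMITH'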
You should not have more indexes than are necessary to achieve adequate read performance, because of the overhead involved in updating those indexes. However, even most update-oriented operations require far more reading than writing. Therefore, do not hesitate to try a new index if you think it will help; you can always drop it later.
- Use clustered indexes.
Appropriate use of clustered indexes can greatly improve performance. Even UPDATE and DELETE operations are often accelerated by clustered indexes, because these operations require much reading. Only a single clustered index per table is allowed, so use it wisely. Queries that return many rows, or queries involving a range of values, are good candidates for speeding up with a clustered index.
Example:
SELECT * FROM PHONEBOOK
WHERE LASTNAME='SMITH'
-or-
SELECT * FROM MEMBERTABLE
WHERE MEMBER_NO > 5000
AND MEMBER_NO < 6000
Conversely, if these kinds of queries are common, the LASTNAME and MEMBER_NO columns above are poor candidates for a nonclustered index. Whenever possible, use nonclustered indexes on columns where few rows are returned.
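Assuming the tables above, clustered indexes to speed up these queries might be created as follows (the index names are invented for the example):
CREATE CLUSTERED INDEX PHONEBOOK_LNAME_IDX ON PHONEBOOK (LASTNAME)
CREATE CLUSTERED INDEX MEMBER_NO_IDX ON MEMBERTABLE (MEMBER_NO)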
- Examine column uniqueness.
This helps you decide whether a column is a candidate for a clustered index, a nonclustered index, or no index at all.
The following is a sample query that checks the uniqueness of a column:
SELECT COUNT(DISTINCT COLNAME) FROM TABLENAME
This returns the number of unique values in the column. Compare this to the total number of rows in the table. On a 10,000-row table, 5,000 unique values would make the column a good candidate for a nonclustered index. On the same table, 20 unique values would better suit a clustered index, and 3 unique values warrant no index at all. These are only examples, not hard-and-fast rules. Remember to place the indexes on the columns listed in the WHERE clauses of queries.
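As a convenience, both figures can be gathered with a single query, so the unique-value count and the total row count can be compared side by side (COLNAME and TABLENAME are placeholders, as above):
SELECT COUNT(DISTINCT COLNAME), COUNT(*) FROM TABLENAME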
- Examine the data distribution of indexed columns.
A long-running query is often caused by indexing a column with few unique values, or by performing a join on such a column. This is a fundamental problem with the data and the query itself, and generally cannot be resolved without identifying the situation. For example, a telephone book sorted alphabetically on last name will not help you quickly locate a person if everyone in the city is named Smith or Jones. In addition to the query above, which gives a single figure for column uniqueness, you can use a GROUP BY query to see the data distribution of the indexed key values. This provides a higher-resolution picture of the data, and a better perspective from which the optimizer views the data.
The following is an example query that examines the distribution of index key values, assuming the COL1 and COL2 columns make up the key:
SELECT COL1, COL2, COUNT(*)
FROM TABLENAME
GROUP BY COL1, COL2
This returns one row for each key value, with a count of the instances of each value. To reduce the number of rows returned, you can exclude some of them with a HAVING clause. For example, the clause
HAVING COUNT(*) > 1
will exclude all rows that have a unique key.
The number of rows returned by a query is also an important factor in index selection. The optimizer considers a nonclustered index to cost at least one page I/O per returned row. At this rate, it quickly becomes more efficient to scan the entire table. This is another reason to restrict the size of the result set, or to locate large results with a clustered index.
Do not equate index usage with good performance, or the reverse. If using an index always produced the best performance, the optimizer's job would be very simple: always use any available index. In reality, an incorrect choice of indexed retrieval can result in very poor performance. Therefore, the optimizer's task is to select indexed retrieval where it will help performance, and to avoid indexed retrieval where it will hurt performance.
Use effective query design. Some types of queries are inherently resource-intensive. This is related to fundamental database and indexing issues common to most relational database management systems (RDBMSs), not to SQL Server in particular. These queries are not inefficient, because the optimizer will implement them in the most efficient fashion possible. But they are resource-intensive, and the set-oriented nature of SQL can make them appear inefficient. No degree of optimizer intelligence can eliminate the inherent resource cost of these constructs; they are intrinsically costly when compared with simpler queries. Although SQL Server will use the most optimal access plan, this is fundamentally limited by what is inherently possible.
For example:
- Large result sets
- IN, NOT IN, and OR queries
- Highly nonunique WHERE clauses
- != (not equal) comparison operators
- Certain column functions, such as SUM
- Expressions or data conversions in the WHERE clause
- Local variables in the WHERE clause
- Complex views with GROUP BY
In many cases there are good reasons to use these query constructs. Their performance impact is reduced when the optimizer can restrict the result set before the resource-intensive portion of the query is executed. Here are some examples:
A resource-intensive query:
SELECT SUM(SALARY) FROM TABLE
A less resource-intensive query:
SELECT SUM(SALARY) FROM TABLE WHERE
ZIP='98052'
A resource-intensive query:
SELECT * FROM TABLE WHERE COL=@VAR
A less resource-intensive query:
SELECT * FROM TABLE
WHERE COL=@VAR AND ZIP='98052'
In the first example, the SUM operation cannot be accelerated with an index, because every row must be read and summed. Assuming there is an index on the ZIP column, the optimizer will likely use it to initially restrict the result set before applying the SUM. This can be much faster.
In the second example, the local variable is not resolved until run time. However, the optimizer cannot defer the choice of an access plan until run time; it must be chosen at compile time. Yet at compile time, when the access plan is built, the value of @VAR is unknown, so it cannot be used as a search argument for the index.
The technique shown above for improving performance involves restricting the result set with an AND clause. An alternative technique is to use a stored procedure and pass the value for @VAR as a parameter to the stored procedure.
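The following is a minimal sketch of this alternative, reusing the placeholder names from the examples above; the procedure name GETROWS and the parameter type are invented for the example:
CREATE PROCEDURE GETROWS @VAR INT
AS
SELECT * FROM TABLE
WHERE COL = @VAR AND ZIP = '98052'
The procedure is then run with, for example, EXECUTE GETROWS 5, so the parameter value is available when the procedure's plan is built.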
In some cases, using a group of simple queries with temporary tables to store intermediate results is more effective than a single complex query.
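The following sketch illustrates the approach with invented table and column names: the intermediate result is captured once in a temporary table, and the follow-up query works against the smaller set:
SELECT CUST_ID, TOTAL = SUM(AMOUNT)
INTO #CUSTTOTALS
FROM ORDERS
GROUP BY CUST_ID
SELECT * FROM #CUSTTOTALS WHERE TOTAL > 10000
DROP TABLE #CUSTTOTALS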
Large result sets are costly on most RDBMSs. You should avoid returning a large result set to the client for final data selection through browsing. It is much more efficient to restrict the size of the result set and let the database system perform the function for which it was intended. This also reduces network I/O and makes the application better suited to deployment across slow remote communication links. It also improves concurrency-related performance as the application scales up to more users.
Use an effective application design. The role that application design plays in SQL Server performance cannot be overstated. Rather than picturing the server in the dominant role, it is more accurate to view the client as the controlling entity and the server as a "puppet" of the client. SQL Server is totally under the command of the client as to the type of queries submitted, when they are submitted, and how the results are processed. This in turn has a major effect on the type and duration of locks, the amount of server I/O and CPU load, and hence on whether performance is good or bad.
For this reason, it is important to make the correct decisions during the application design phase. Even if you face a performance problem with a turnkey application, where changes to the client application seem impossible, the fundamental factors that affect performance remain unchanged: the client plays a dominant role, and many performance problems cannot be resolved without changing the client.
With a perfectly designed application, SQL Server may be capable of supporting thousands of concurrent users. With a poorly designed application, even the most powerful server platform can be brought to its knees by only a few users.
Use the following suggestions when designing client applications to achieve good SQL Server performance:
- Use small result sets. Retrieving needlessly large result sets (for example, thousands of rows) for browsing on the client adds CPU and network I/O load, makes the application less suited to remote use, and limits multiuser scalability. It is better to design the application to prompt the user for sufficient input so that the submitted queries generate modest result sets.
Application design techniques that facilitate this include limiting the use of wildcards when building queries, mandating certain input fields, and prohibiting ad hoc queries.
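Where it fits the application, the Transact-SQL SET ROWCOUNT statement can also cap the number of rows a query returns; this is a safeguard, not a substitute for selective queries (the query shown is a placeholder):
SET ROWCOUNT 100
SELECT * FROM PHONEBOOK WHERE LASTNAME LIKE 'SM%'
SET ROWCOUNT 0
Setting ROWCOUNT back to 0 removes the limit for subsequent statements.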
- Use dbcancel() correctly in DB-Library applications. All applications should allow cancellation of a query in progress. No application should force the user to reboot the computer in order to cancel a query. Not following this principle can lead to performance problems that cannot be resolved. When dbcancel() is used, proper care regarding the transaction level should be exercised. For additional information, see the following Microsoft Knowledge Base article:
117143: INF: When and How to Use dbcancel() or sqlcancel()
The same issues apply to ODBC applications if the ODBC sqlcancel() call is used.
- Always process all results to completion. Do not design an application, or use a turnkey application, that stops processing result rows without canceling the query. Doing so will usually lead to blocking and slow performance.
- Always implement a query timeout. Do not allow queries to run indefinitely. Set the query timeout with the appropriate DB-Library or ODBC call. In DB-Library, a query timeout is implemented with the dbsettime() call; in ODBC, with SQLSetStmtOption().
- Do not use an application development tool that does not allow explicit control over the SQL statements sent to the server. Do not use a tool that transparently generates SQL statements based on higher-level objects unless it provides crucial features such as query cancellation, query timeout, and full transactional control. It is often impossible to maintain good performance or to resolve a performance problem if the application generates all of its "transparent SQL" by itself, because this does not allow explicit control over transactional and locking issues, which are critical to performance.
- Do not intermix decision-support and online transaction processing (OLTP) queries.
- Do not design an application, or use a turnkey application, that forces the user to restart the client computer to cancel a query. This can cause orphaned connections, which lead to many performance problems that are difficult to resolve. For more information, see the following Microsoft Knowledge Base article:
137983: INF: How to Troubleshoot Orphaned Connections in SQL Server
When analyzing a performance problem, it is tempting to focus solely on system-level server tuning: memory size, file system type, the number and type of processors, and so forth. The experience of the Microsoft SQL Server support team shows that most performance problems cannot be resolved this way. They must be addressed by analyzing the application, the queries the application submits to the database, and how those queries interact with the database schema.
First, isolate the query or queries that are slow. Often, only a handful of SQL queries are slow even when the whole application seems slow. Performance problems usually cannot be solved without carefully decomposing the application and isolating the slow queries. If you use a development tool that transparently generates SQL, use any available diagnostic or debug mode of the tool to capture the generated SQL. In many cases trace features exist but are not openly exposed. Contact the application's technical support to determine whether a trace feature exists for monitoring the SQL statements that the application generates.
This is much easier with application development tools that use embedded SQL, because the SQL is visible.
If your development tool or end-user application does not provide a trace feature, there are several alternatives:
- Use the 4032 trace flag according to the instructions in the SQL Server 4.2x "Troubleshooting Guide" and the SQL Server 6.0 "Transact-SQL Reference." This captures the SQL statements sent to the server in the SQL error log. A sketch of enabling the flag appears after this list.
- Monitor the query by using a network analyzer such as Microsoft Network Monitor (which is part of the Systems Management Server).
- For ODBC applications, use the ODBC Administrator program to select Trace ODBC calls. For more information, see the ODBC documentation.
- Use third-party client tools that intercept SQL at the db-library or ODBC layer. Blue Lagoon Software's SQL Inspector is an example.
- Use the SQLEye analysis tool, which is provided as a sample on the Microsoft TechNet CD. Note: SQLEye is not covered by Microsoft technical support.
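As referenced above, the 4032 trace flag can typically be enabled with DBCC TRACEON. The combination shown here, with flag 3605 directing trace output to the error log, is an assumption to verify against the documentation cited above:
DBCC TRACEON(3605)
DBCC TRACEON(4032)
Alternatively, the flag can be set when the server starts by using the -T4032 startup option.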
After you isolate the slow query, do the following:
- Use a query tool such as ISQL to run the suspected slow query independently and verify that it is actually slow. It is best to run the query on the server computer itself, using ISQL and local named pipes, and redirect the output to a file. This helps eliminate complicating factors such as network and screen I/O, as well as the effects of application result buffering.
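For example, a hypothetical invocation (the server name and file names are placeholders; -E assumes a trusted connection, otherwise supply -U and -P):
isql -S MYSERVER -E -i slowquery.sql -o results.txt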
- Use SET STATISTICS IO ON to examine the I/O consumed by the query, and note the count of logical page I/Os. The optimizer's goal is to minimize I/O counts. Record the logical I/O count; this becomes a baseline against which to measure improvement. Concentrating on the STATISTICS IO output and experimenting with different query and index types is often more productive than using SET SHOWPLAN ON. Interpreting and effectively applying SHOWPLAN output requires some study, and the time may be better spent on empirical tests. If these simple recommendations do not resolve the performance problem, you can use SHOWPLAN to investigate the optimizer's behavior more thoroughly.
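A short sketch of this measurement step, with the query itself a placeholder:
SET STATISTICS IO ON
SELECT * FROM PHONEBOOK WHERE LASTNAME = 'SMITH'
SET STATISTICS IO OFF
The output reports figures such as the scan count, logical reads, and physical reads for each table touched; the logical reads figure is the baseline to record.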
- If the query involves a view or stored procedure, extract the query from the view or stored procedure and run it separately. This allows the access plan to change as you experiment with different indexes. It also helps localize the problem to the query itself, rather than to how the optimizer handles views or stored procedures. If the query itself turns out not to be the problem when run alone, even though it is slow as part of a view or stored procedure, running it independently helps establish that as well.
- Be aware of triggers on the tables involved in the query; these can produce significant I/O as they run. Remove the triggers involved in a slow query while testing. This helps determine whether the problem lies in the query itself or in a trigger or view, and thus shows where to direct your efforts.
- Examine the indexes of the tables used by the slow query. Use the techniques listed above to determine whether these indexes are reasonable, and make any necessary changes. As a first effort, try indexing each column in the WHERE clause. Often, performance problems are caused by WHERE clause columns that are not indexed, or that lack a useful index.
- Using the queries mentioned earlier, examine the data uniqueness and distribution of each column mentioned in the WHERE clause, and especially of each indexed column. In many cases, simple inspection of the query, the tables, the indexes, and the data shows the cause of the problem immediately. For example, performance problems are often caused by an index on a key with only three or four unique values, by a join on such a column, or by returning an excessive number of rows to the client.
- Based on these studies, make any needed changes to the application, the query, or the indexes. When the changes are complete, run the query again and observe the change in the I/O count.
- If the speed has improved, run the main application to see whether overall performance is better.
Examine the program for I/O-bound or CPU-bound behavior. It is often useful to determine whether a query is constrained by I/O or by CPU. This helps you keep working on the real bottleneck. For example, if a query is CPU bound, adding more memory to SQL Server will probably not improve its performance, because more memory improves only the cache hit ratio, which in this case is already high.
Here is how to examine a query for I/O-bound or CPU-bound behavior:
- Use Windows NT Performance Monitor to watch I/O and CPU activity. Monitor all instances of the "% Disk Time" counter of the LogicalDisk object, along with the "% Total Processor Time" counter of the System object. To see valid disk performance information, you must first turn on the Windows NT diskperf setting by issuing "diskperf -y" from a command prompt and then restarting the system. See the Windows NT documentation for more details.
- If, while the query runs, the CPU graph is consistently high (for example, above 70 percent) and the "% Disk Time" value is consistently low, this indicates a CPU-bound state.
- If, while the query runs, the CPU graph is consistently low (for example, below 50 percent) and "% Disk Time" is consistently high, this indicates an I/O-bound state.
- Compare the CPU graph with the STATISTICS IO information.
Conclusion: SQL Server is capable of very high performance on large databases. This is especially true of SQL Server 6.0. To achieve this performance potential, you must use effective database, index, query, and application design, because these areas offer the greatest potential for significant performance improvement. Make each query as efficient as possible, so that as the application scales up to more users, the aggregate multiuser load can be supported. We encourage you to use the guidelines in this article to study the behavior of your client applications and the queries they submit, and to experiment with a variety of indexes. A methodical approach to analyzing performance problems will often yield significant improvement for a relatively small investment of time.