Database Query Optimization (reprint)

Source: Internet
Author: User
Tags: server, memory

1. Use the SET NOCOUNT ON option:

By default, every time a SQL statement is executed, the server sends a message to the client reporting the number of rows affected by the statement. This information is rarely useful to the client. Turning this default off reduces network traffic between server and client, helping to improve the overall performance of your servers and applications. To turn this feature off at the stored procedure level, include the SET NOCOUNT ON statement at the beginning of each stored procedure.
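For example, a minimal sketch (the procedure, table, and column names here are hypothetical):

CREATE PROCEDURE dbo.GetCustomerOrders
    @CustomerNum INT
AS
BEGIN
    SET NOCOUNT ON;  -- suppress the "N rows affected" messages
    SELECT order_num, order_date
    FROM Orders
    WHERE customer_num = @CustomerNum;
END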

2. Use UNION and UNION ALL correctly:

Many people do not fully understand how UNION and UNION ALL work, and as a result a lot of SQL Server resources are wasted unnecessarily. Using UNION is equivalent to executing SELECT DISTINCT on the result set: UNION combines the two record sets, then searches for duplicate records and excludes them. If that is your goal, UNION is the right choice. But if the two record sets you are combining contain no duplicate records, UNION wastes resources looking for duplicates, even though you are sure none exist.

So if you know there are no duplicates in the record sets you want to combine, use UNION ALL instead of UNION. UNION ALL combines the record sets but does not search for duplicate records, reducing the use of SQL Server resources and improving performance.
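As a sketch (OrdersArchive is a hypothetical table assumed to share no rows with Orders):

-- UNION combines the record sets and then removes duplicates, which costs extra work:
SELECT customer_num FROM Orders
UNION
SELECT customer_num FROM OrdersArchive

-- UNION ALL simply concatenates the record sets, skipping the duplicate search:
SELECT customer_num FROM Orders
UNION ALL
SELECT customer_num FROM OrdersArchive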

3. Try not to use SELECT *:

In most cases, do not use * in place of the field list returned by a query. The advantage of * is brevity: even if the table structure or the view's columns change, the written query does not need to change, and all fields are returned. But when the database server parses a query and encounters *, it must parse the table's structure and expand * into the full list of field names, which adds parsing time.
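For example (the column names are illustrative):

-- Avoid: the server must expand * into the full field list
SELECT * FROM Orders

-- Prefer: name exactly the fields the application needs
SELECT order_num, customer_num, order_date FROM Orders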

4. Use SELECT DISTINCT with caution:

The DISTINCT clause serves only one purpose: excluding duplicate records from the record set. It works by fetching the result set first and then removing the duplicates, which increases the use of SQL Server resources. Of course, if deduplication is required, you have to do it.

But when you know that a SELECT statement will never return duplicate records, using the DISTINCT clause only wastes SQL Server resources unnecessarily.
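For example (assuming customer_num is the unique key of the customer table):

-- Wasteful: DISTINCT forces a deduplication step that can never remove anything
SELECT DISTINCT customer_num FROM customer

-- Equivalent and cheaper:
SELECT customer_num FROM customer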

5. Use fewer cursors:

Any kind of cursor can degrade SQL Server performance. Some situations cannot be avoided, but most can. So if your application currently uses T-SQL cursors, see whether the code can be rewritten to avoid them. If you need to perform an action row by row, consider one or more of these alternatives to a cursor:

    • Using temporary tables
    • Using a WHILE loop
    • Using derived tables
    • Using correlated subqueries
    • Using CASE statements
    • Using multiple queries

Each of the above can replace a cursor and execute faster. If you cannot avoid a cursor, at least look for ways to speed it up.
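As a sketch of the WHILE-loop alternative (the processed flag column is hypothetical):

-- Walk the rows by key instead of opening a cursor:
DECLARE @id INT;
SELECT @id = MIN(order_num) FROM Orders;
WHILE @id IS NOT NULL
BEGIN
    -- ... perform the per-row action here, e.g. mark the row as handled:
    UPDATE Orders SET processed = 1 WHERE order_num = @id;
    SELECT @id = MIN(order_num) FROM Orders WHERE order_num > @id;
END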

6. Choose the most efficient table name order:

The SQL Server parser processes the table names in the FROM clause from right to left, so the table written last in the FROM clause (the base table, or driving table) is processed first. When there are multiple tables in the FROM clause, the table with the fewest records should be chosen as the base table. When SQL Server processes multiple tables, it uses sort-merge to join them: it first scans the first table (the last one in the FROM clause) and sorts its records, then scans the second table (the second-to-last in the FROM clause), and finally merges all the records retrieved from the second table with the matching records of the first.

For example: table TAB1 has 16,384 records and table TAB2 has 5 records. Choosing TAB2 as the base table (best method):

SELECT COUNT(*) FROM TAB1 a, TAB2 b

Choosing TAB1 as the base table (poor method):

SELECT COUNT(*) FROM TAB2 a, TAB1 b

If three or more tables are joined in the query, choose the intersection table (the table referenced by the other tables) as the base table.

7. Use table aliases:

When you join multiple tables in a SQL statement, give each table an alias and prefix every column with its alias. This reduces parsing time and prevents syntax errors caused by ambiguous column names.
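For example:

-- Each column is prefixed with its table's alias, so nothing is ambiguous:
SELECT o.order_num, c.zipcode
FROM Orders o, customer c
WHERE o.customer_num = c.customer_num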

8. Make your WHERE conditions sargable (SARG):

Arge "SARG", which is derived from the first letter of "search Argument", refers to the comparison of columns and constants in the WHERE clause. If the WHERE clause is sargable (SARG), this means that it can use the index to accelerate the completion of the query. If the WHERE clause is not sarg, this means that the WHERE clause cannot take advantage of the index (or at least partially) and performs a full table or index scan, which can cause a performance degradation of the query.

Non-SARG search conditions in the WHERE clause, such as IS NULL, <>, !=, !>, !<, NOT, NOT EXISTS, NOT IN, NOT LIKE, and LIKE '%500', usually (but not always) prevent the query optimizer from using an index to perform the search. Expressions that apply a function to a column, that use the same column on both sides of the operator, or that compare a column to another column (rather than to a constant) are also not sargable.

Not every non-SARG WHERE clause is destined for a full table scan. If the WHERE clause includes two SARG conditions and one non-SARG condition, at least the SARG conditions can use an index (if one exists) to help access the data quickly.

In most cases, if the table has a covering index that includes all the columns used by the SELECT, JOIN, and WHERE clauses of the query, the covering index can replace a full table scan to return the data, even when the WHERE clause is not sargable. But remember that covering indexes have their own drawbacks: they tend to be wide indexes, which increases read disk I/O. In some cases, a non-SARG WHERE clause can be rewritten as a SARG clause. For example:

WHERE SUBSTRING(firstname, 1, 1) = 'm'

Can be written as:

WHERE firstname LIKE 'm%'

The two WHERE clauses return the same result, but the first is not sargable (because it uses a function) and will run slower, while the second is sargable and will run faster.

If you do not know whether a particular WHERE clause is sargable, check the query's execution plan in Query Analyzer. That way you can quickly see whether the query uses an index or a full table scan to return the data. With careful analysis, many non-SARG queries can be rewritten as SARG queries. The following points explain how to make WHERE conditions sargable.

8.1. Join order in the WHERE clause:

SQL Server parses the WHERE clause in bottom-up order. Following this principle, joins between tables should be written before other WHERE conditions, and the conditions that filter out the greatest number of records should be written at the end of the WHERE clause. For example:

(Inefficient)

SELECT *
FROM EMP e
WHERE SAL > 50000
AND JOB = 'MANAGER'
AND 25 < (SELECT COUNT(*) FROM EMP WHERE mgr = e.empno)

(Efficient)

SELECT *
FROM EMP e
WHERE 25 < (SELECT COUNT(*) FROM EMP WHERE mgr = e.empno)
AND SAL > 50000
AND JOB = 'MANAGER'

8.2. Avoid difficult regular expressions:

The MATCHES and LIKE keywords support wildcard matching, technically known as regular expressions. But this kind of matching is particularly time-consuming. For example:

SELECT * FROM customer WHERE zipcode LIKE '98___'

Even if an index exists on the zipcode field, a sequential scan is used in this case. If you change the statement to SELECT * FROM customer WHERE zipcode >= '98000', the query executes using the index, which obviously speeds it up greatly.

Also, avoid substrings that do not start at the first character. For example, the statement:

SELECT * FROM customer WHERE zipcode[2,3] > '80'

uses a non-starting substring in the WHERE clause, so the statement cannot use an index.

8.3. Avoid sequential access to large tables:

In nested queries, sequential access to a table can have a fatal effect on query efficiency. For example, with a sequential access strategy, a query nested three levels deep that reads 1,000 rows at each level reads 1,000,000,000 rows in total. The primary way to avoid this is to index the join columns. For example, given two tables, a Student table (student ID, name, age, ...) and a course selection table (student ID, course number, grade), if the two tables are to be joined, an index should be created on the join column, student ID.

You can also use unions to avoid sequential access. Although indexes may exist on all the queried columns, some forms of WHERE clause force the optimizer to use sequential access. The following query forces a sequential scan of the Orders table:

SELECT * FROM Orders WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008

Although indexes exist on customer_num and order_num, the optimizer scans the entire table using a sequential access path in the statement above. Because the statement retrieves a set of separate rows, it should be changed to the following:

SELECT * FROM Orders WHERE customer_num = 104 AND order_num > 1001

UNION

SELECT * FROM Orders WHERE order_num = 1008

This enables the query to be processed using the index path.

8.4. Using EXISTS and IN:

In many queries based on a base table, it is often necessary to join another table in order to satisfy some condition. In these cases, using EXISTS (or NOT EXISTS) usually improves query efficiency. In a subquery, NOT IN performs an internal sort and merge. In either case, NOT IN is the least efficient option, because it performs a full table traversal of the table in the subquery. To avoid it, change the query to an outer join or to NOT EXISTS.
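For example, a minimal sketch using the tables from the earlier examples:

-- Least efficient: NOT IN traverses the whole subquery table
SELECT * FROM customer c
WHERE c.customer_num NOT IN (SELECT customer_num FROM Orders)

-- Usually better: NOT EXISTS can stop probing as soon as a match is found
SELECT * FROM customer c
WHERE NOT EXISTS (SELECT 1 FROM Orders o WHERE o.customer_num = c.customer_num)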

8.5. Avoid using IS NULL and IS NOT NULL on indexed columns:

Avoid using nullable columns in an index; otherwise SQL Server may not be able to use the index. For a single-column index, a record whose column contains a null value does not appear in the index. For a composite index, the record does not appear in the index if every column is null; if at least one column is not null, the record appears in the index.

For example, if a unique index is created on columns A and B of a table, and the table already contains a record whose A and B values are (123, null), SQL Server will not accept the insertion of another record with the same A and B values (123, null).

If all the index columns are null, SQL Server considers the entire key value to be null, and since null cannot equal null, you can insert 1,000 records with the same key value, as long as they are all null! Because null values do not exist in the index column, a null comparison on an indexed column in the WHERE clause causes SQL Server to stop using the index. The following code is inefficient (the index is invalidated):

SELECT ... FROM DEPARTMENT WHERE dept_code IS NOT NULL

8.6. Avoid using calculations on indexed columns:

In the WHERE clause, if an indexed column is part of a function or expression, the optimizer uses a full table scan instead of the index. For example, the following statement is inefficient:

SELECT ... FROM DEPT WHERE SAL * 12 > 25000

And the following statement will be efficient:

SELECT ... FROM DEPT WHERE SAL > 25000/12

Note: do not apply functions such as TRIM, SUBSTRING, or CONVERT to indexed columns in queries.

8.7. Replace the HAVING clause with a WHERE clause:

Avoid the HAVING clause where possible: HAVING filters the result set only after all records have been retrieved, and this processing requires sorting, aggregation, and other operations. If the records can be restricted by a WHERE clause instead, that overhead is reduced.
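A sketch using the EMP table from the earlier examples:

-- Inefficient: every row is grouped first, then HAVING discards groups
SELECT JOB, AVG(SAL) FROM EMP GROUP BY JOB HAVING JOB = 'MANAGER'

-- Efficient: WHERE filters the rows before any grouping work is done
SELECT JOB, AVG(SAL) FROM EMP WHERE JOB = 'MANAGER' GROUP BY JOB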

9. Avoid or simplify sorting:

You should simplify or avoid repeatedly sorting large tables. The optimizer skips the sort step when an index can produce the output in the proper order automatically. The following factors prevent this:

    • One or more of the columns to be sorted is not included in the index;
    • The order of the columns in the GROUP BY or ORDER BY clause differs from the order of the columns in the index;
    • The sorted columns come from different tables.

To avoid unnecessary sorting, build indexes correctly and merge database tables where reasonable (even though this may sometimes affect table normalization, the efficiency gain is worth it). If sorting is unavoidable, try to simplify it, for example by narrowing the range of the sorted columns.
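For example, an index whose column order matches the ORDER BY lets the optimizer skip the sort step (the index and the order_date column here are hypothetical):

CREATE INDEX idx_orders_cust_date ON Orders (customer_num, order_date);

-- The optimizer can return rows in index order, with no separate sort:
SELECT customer_num, order_date FROM Orders ORDER BY customer_num, order_date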

10. Use of temporary tables:

Temporary tables have many special uses, such as replacing cursors, but they can still cause performance problems; eliminating them where possible makes SQL Server run faster. Given the same data and conditions, querying a temporary table is no faster than querying a permanent table. Sometimes, however, a temporary table is necessary, for example when you extract the subset of rows meeting certain conditions from a permanent table holding large amounts of data into a temporary table and then operate on that temporary table. Running operations such as statistics and loops directly against a permanent table holding large amounts of data can seriously hurt performance. So whether or not to use a temporary table must be decided case by case.
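A sketch of the pattern described above:

-- Extract the qualifying rows from the large permanent table once...
SELECT order_num, customer_num
INTO #recent_orders
FROM Orders
WHERE order_num > 1001;

-- ...then run the repeated statistics or loops against the small temporary table:
SELECT customer_num, COUNT(*) FROM #recent_orders GROUP BY customer_num;

DROP TABLE #recent_orders;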

11. Whether to use views:

The main purpose of a view is to handle security-related issues, not to serve as a lazy developer's store of frequently used queries. For example, if you need to let users access specific data in SQL Server, you might create a view for the user (or group) and grant access to the view instead of to the base tables. On the other hand, within an application there is usually no good reason to select data through a view; instead, bypass the view and get the data directly from the tables you need. The reason is that many views (though of course not all) return more data than the SELECT statement requires, adding unnecessary overhead.

For example, suppose a view returns 10 columns from two joined tables, and you select 7 columns from the view. What actually happens is that the view-based query runs first, returning the data, and then your query runs against that result. Since you need only 7 of the 10 columns the view returns, unnecessary data is returned, wasting SQL Server's resources.

For a long time people have debated whether querying through a view is as fast as querying directly. I did not dare to jump to conclusions, so I ran a number of experiments. The result: a view-based query performs no faster than a directly written query statement; for simple queries, it is at best the same.

Of course, the tests above did not create an index on the view. SQL Server 2000 can create indexes on views, and view indexes work in much the same way as table indexes. As with tables, a view can have one clustered index and multiple non-clustered indexes. Creating a view index can improve the view's performance.

If a view has no index, the result set it returns is not saved in the database. Sometimes we may want to create views that involve large numbers of records or perform complex calculations, such as grouped aggregation or multi-table joins. If SQL Server has to regenerate the result set every time such a view is referenced, the database overhead is very large.
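A minimal sketch of an indexed view (SQL Server requires WITH SCHEMABINDING, two-part table names, and COUNT_BIG(*) for a grouped view like this; the view name is hypothetical):

CREATE VIEW dbo.OrderTotals
WITH SCHEMABINDING
AS
SELECT customer_num, COUNT_BIG(*) AS order_count
FROM dbo.Orders
GROUP BY customer_num;
GO

-- The unique clustered index materializes and persists the view's result set:
CREATE UNIQUE CLUSTERED INDEX idx_OrderTotals ON dbo.OrderTotals (customer_num);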

12. Make the transaction as short as possible:

Keep T-SQL transactions as short as possible. This reduces the number of locks (of all types) held, which helps improve overall SQL Server performance. If feasible, you may also want to split a long transaction into several smaller ones.
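For example (the processed flag column is hypothetical):

-- Do lookups and calculations before the transaction starts, then keep only
-- the statements that must be atomic inside it, so locks are held briefly:
BEGIN TRANSACTION;
UPDATE Orders SET processed = 1 WHERE order_num = 1008;
COMMIT TRANSACTION;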

13. Use stored procedures instead of writing query statements directly:

Stored procedures provide a number of benefits for developers, including:

    • Reduced network traffic and response time, improving application performance. For example, sending a stored procedure call over the network instead of 500 lines of T-SQL is faster and uses fewer resources. Parsing SQL statements, estimating index utilization, binding variables, reading data blocks, and so on must otherwise be performed every time the SQL executes.
    • Stored procedure execution plans can be reused, residing in SQL Server's memory cache, which reduces server overhead.
    • Client execution requests are more efficient. For example, if an application needs to insert a large amount of binary data into an image column without a stored procedure, it must convert the binary data to a character string (doubling its size) and send it to SQL Server; when SQL Server receives it, it must convert the string back to binary format. That is a lot of wasted overhead. Stored procedures eliminate this problem, because the application can pass the binary data to SQL Server as a parameter, reducing overhead and improving performance.
    • Stored procedures help with code reuse. While this does not directly enhance application performance, it increases developer productivity by reducing the amount of code and the debugging time.
    • Stored procedures encapsulate logic. You can change a stored procedure's code without affecting the client (as long as you keep the parameters the same and do not remove any result set columns). This saves developer time.
    • Stored procedures provide better security for your data. If you use only stored procedures, you can revoke direct SELECT, INSERT, UPDATE, and DELETE permissions on the tables, forcing developers to access data through the stored procedures. This also saves the DBA time.

As a general rule, all T-SQL code should be called through stored procedures.
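To illustrate, with the hypothetical procedure sketched in section 1, the application sends a short call instead of the full query text:

EXEC dbo.GetCustomerOrders @CustomerNum = 104;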

13.1. Do not start stored procedure names with sp_:

Many people may be puzzled by this rule; I wondered about it too at first. If you create a stored procedure that does not run in the master database, do not use a name prefixed with sp_. That prefix is reserved for system stored procedures. Although using it does not prevent a user-defined stored procedure from running, it reduces execution efficiency somewhat. When SQL Server executes any stored procedure prefixed with sp_, it first tries to find it in the master database by default, even though it is not there, wasting lookup time. If SQL Server cannot find the procedure in master, it resolves the procedure's owner as dbo, and if the procedure is in the current database, it then executes. To avoid this unnecessary delay, do not name any of your stored procedures with the sp_ prefix.

13.2. Keep the owner of the stored procedure's objects the same:

For best performance, all objects referenced in the same stored procedure should have the same owner, preferably dbo. If instead objects share names but have different owners, SQL Server must perform name resolution. When this happens, SQL Server cannot use the in-memory execution plan for the stored procedure; it must recompile it, which affects performance. When calling a stored procedure from an application, it is likewise important to use the owner-qualified name, such as:

EXEC dbo.myprocedure

instead of:

EXEC myprocedure

This is done for two reasons, one of which relates to performance. First, using fully qualified names helps eliminate any confusion about which stored procedure you want to run, helping prevent bugs and other potential problems. More importantly, SQL Server can access the stored procedure's execution plan directly rather than having to search for it, speeding up execution. The gain per call is small, but if your server runs thousands of stored procedures per hour, these savings can add up to a significant amount of time.

14. Use constraints and triggers for integrity without redundancy:

Do not implement redundant integrity features in the database. For example, if you already use PRIMARY KEY and FOREIGN KEY constraints to enforce referential integrity, do not add triggers that implement the same functionality; that only adds unnecessary overhead. The same applies to using both constraints and defaults, or both constraints and rules, for the same job.
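For example, enforce referential integrity once, declaratively (assuming customer_num is the primary key of the customer table); a trigger performing the same check would only add overhead:

ALTER TABLE Orders ADD CONSTRAINT fk_orders_customer
    FOREIGN KEY (customer_num) REFERENCES customer (customer_num);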

15. Catching Exceptions in SQL:

This guideline should be seen less as an optimization than as a coding requirement. SQL Server 2005 introduced the paired BEGIN TRY ... END TRY and BEGIN CATCH ... END CATCH statements to catch exceptions that occur at run time. In Oracle, exceptions can be caught with a BEGIN ... EXCEPTION ... END block.

Wrapping a SQL code block in exception-catching statements has two advantages. First, exceptions can be handled inside the SQL statement itself: the error-handling block can return custom error messages, perform a rollback, and so on, which reduces the resource overhead of catching exceptions in the application. Second, it helps with deadlocks: SQL Server 2005 throws an exception when a deadlock occurs, and that exception can be caught and handled.
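A minimal sketch in SQL Server 2005 syntax (the UPDATE is a stand-in for any statement that might fail):

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE Orders SET processed = 1 WHERE order_num = 1008;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Handle the error inside T-SQL: roll back and return a custom message
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
END CATCH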

Finally, here is a list of index concepts that can help you design table structures and write SQL statements:

Classified by storage:

    • Clustered index: the logical order of the key values in the index determines the physical order of the corresponding rows in the table. A table can therefore contain only one clustered index, although that index may cover multiple columns (a composite index). Retrieval is more efficient than with an ordinary index, but inserts, updates, and deletes are affected more heavily.
    • Nonclustered index: unlike a clustered index, it does not affect the order in which data is stored in the table. Retrieval is less efficient than with a clustered index, but the impact on inserts, updates, and deletes is small.

Classified by maintenance and management:

    • Unique index: a unique index ensures that the indexed column contains no duplicate values. It can span multiple columns, in which case the index ensures that each combination of values in the indexed columns is unique.
    • Primary key index: defining a primary key for a table automatically creates a primary key index, a special type of unique index that requires every value of the primary key to be unique. A primary key index also allows fast access to the data when used in queries.
    • Ordinary index: an index defined with the KEY or INDEX keyword, whose only task is to speed up access to the data. Create such indexes only on the data columns that appear most often in query conditions or sort conditions, and whenever possible choose the most compact data columns (such as integer columns) for the index. Duplicate values are allowed.
    • Composite index: an index created on two or more columns.
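For reference, the corresponding T-SQL for each kind (index and column names are illustrative):

CREATE CLUSTERED INDEX ix_orders_num ON Orders (order_num);            -- clustered
CREATE NONCLUSTERED INDEX ix_orders_cust ON Orders (customer_num);    -- nonclustered
CREATE UNIQUE INDEX ix_customer_num ON customer (customer_num);       -- unique
CREATE INDEX ix_orders_cust_num ON Orders (customer_num, order_num);  -- composite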
