. NET Technical Questions Series (2)-sql Server database Optimization specification

Last Update:2018-01-15 Source: Internet

Author: User

Tags server memory

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1. Database Optimization Specification

A. Index

Each table requires a primary key, and it is not necessary to force a clustered index on the primary key.

Clustered index, the data stored in the table is stored in the order of the index, that is, the logical order determines the physical order of the corresponding row in the table, so the value of the field of the clustered index should be a value that does not change, and is in sequential growth, otherwise the impact on the new/modified/deleted data is relatively large.

Nonclustered indexes, generally consider using nonclustered indexes in the following scenarios: conditional fields using joins, fields using group by, exact match where condition fields, foreign key fields, and so on.

Indexes have a 900-byte size limit, so do not index on extra-long fields, and the total number of bytes in the indexed field does not exceed 900 bytes, otherwise the inserted data will be error-bound when it reaches 900 bytes.

Reasonably set up index, reasonable index, and reasonably establish index.

B. Using the Set NOCOUNT on option

By default, each time a SQL statement is executed, a message is sent from the server to the client to display the number of rows affected by the SQL statement. This information is rarely useful to clients, and even some clients use it as an error-handling message. By turning this default off, you can reduce network traffic on both the server and the client, helping to improve the overall performance of your servers and applications. To turn off this feature of the stored procedure level, include the "SET NOCOUNT on" statement at the beginning of each stored procedure. Similarly, in order to reduce network traffic on both the server side and the client, the Select and print statements in the stored procedure that are used during the debugging process should be removed from the production environment.

C. Proper use of union and UNION all

Many people do not fully understand how the Union and union all work, and as a result, a lot of unnecessary SQL Server resources are wasted. When you use union, it is equivalent to executing select DISTINCT on the result set. In other words, union will federate two similar recordsets, then search for duplicate records and exclude them. If this is your goal, then using Union is the right thing to do. But if you're using union union with two recordsets without repeating the records themselves, using union wastes resources because it looks for duplicate records, even if you're sure they don't exist.

So if you know that there are no duplicates in the recordset you want to federate, then you want to use union all instead of union. The Union all Union recordset, but does not search for duplicate records, thus reducing the use of SQL Server resources to improve performance.

D. Return only the required data, as far as possible without select *

In most cases, do not use * to replace the field list returned by the query, the advantage of * is that the code is small, even if the table structure or the View column changes, the written query SQL statement does not change, all the fields are returned. However, when the database server resolves, if it encounters a *, the structure of the table is analyzed, and then all the field names of the table are listed again, which increases the time of the analysis.

Another problem is that SELECT * may contain columns that are not needed and increase network traffic. If you use SELECT * In view creation, when you make changes to the table structure of the EOG table at a later time, unexpected results may be generated when the view is queried, unless the view is rebuilt or the metadata of the view is updated with Sp_refreshview.

E. Careful use of select DISTINCT

The DISTINCT clause is used only for certain functions, that is, when a duplicate record is excluded from the recordset. This is because the DISTINCT clause gets the result set first and then goes heavy, which increases the consumption of SQL Server resources. Of course, if you need to do it, you have to do it.

If you know that the SELECT statement will never return duplicate records, using the distinct statement is an unnecessary waste of SQL Server resources.

F. Using fewer cursors

Any kind of cursor can degrade SQL Server performance. Some situations cannot be avoided, but most can be avoided, so if your application is currently using TSQL cursors, see if the code can be overridden to avoid them. If you need a single line of execution, consider using a batch instead of using a cursor.

G. The specified alias before the field

When you concatenate multiple tables in an SQL statement, add the table name or alias to the front of each column, which reduces the time to parse and reduces the syntax errors caused by column ambiguity.

H.sarg your Where condition

Arge "SARG", which is derived from the first letter of "search Argument", refers to the comparison of columns and constants in the WHERE clause. If the WHERE clause is sargable (SARG), this means that it can use the index to accelerate the completion of the query. If the WHERE clause is not sarg, this means that the WHERE clause cannot take advantage of the index (or at least partially) and performs a full table or index scan, which can cause a performance degradation of the query.

Non-SARG search conditions in the WHERE clause such as "is NULL", "<>", "! =", "!>", "!<", "not", "not EXISTS", "Not in", "Not IKE" and "Like '%500 '" , usually (but not always) prevents the query optimizer from using the index to perform a search. An expression that includes a function on a column, an expression that uses the same column on either side, or a column (not a constant) is not sarg.

Not every non-SARG WHERE clause is destined for a full table scan. If the WHERE clause includes two sarg and a non-sarg clause, then at least SARG clauses can use the index (if present) to help quickly access the data.

In most cases, if the table has an overlay index that includes the columns used by all the Select, JOIN, WHERE clauses in the query, the overlay index can replace the full table scan to return the queried data, even if it has an sarg where clause. But remember that overwriting an index is especially a flaw in itself, so often producing a wide index increases read disk I/O. In some cases, the non-SARG WHERE clause can be rewritten as a SARG clause. For example:

where SUBSTRING (firstname,1,1) = ' m ' can be written as: where FirstName like ' m% '

The two WHERE clauses have the same result, but the first one is not SARG (because the function is used) will run slower, and the second is SARG and will run faster.

If you don't know if a particular where clause is SARG, check the query execution plan in Query Analyzer. In doing so, you can quickly know whether the query is using an index or a full table scan to return the data. Careful analysis, many non-sarg queries can be written as Sarg queries.

I. Avoiding or simplifying sorting

You should simplify or avoid repeating the ordering of large tables. The optimizer avoids sequencing steps when it is possible to automatically generate output in the appropriate order using the index. Here are some of the factors that affect:

One or several columns to be sorted are not included in the index;

The order of the columns in the group BY or ORDER BY clause is not the same as the order of the indexes;

The sorted column comes from a different table.

In order to avoid unnecessary sorting, it is necessary to construct the index correctly and merge the database tables reasonably (although sometimes it may affect the normalization of the table, but it is worthwhile to improve the efficiency). If sorting is unavoidable, you should try to simplify it, such as narrowing the range of sorted columns.

J. Attention to transactions and locks

The transaction is the database application and the important tool, it has the atomicity, the consistency, the isolation, the persistence four properties, many operations we all need to use the transaction to guarantee the data correctness. In the use of transactions we need to try to avoid deadlocks and minimize blocking. Special attention needs to be paid to the specific following:

1) The transaction operation process should be as small as possible, split the transaction to be split apart.

2) The transaction operation process should not have interaction, because when the interaction waits, the transaction does not end, possibly locking a lot of resources.

3) The transaction operation process to access the object in the same order, such as in a transaction in order to update a, b two tables, then in other transactions do not follow the order of B, a to update the two tables.

4) To improve the efficiency of each statement in a transaction, using indexes and other methods to improve the efficiency of each statement can effectively reduce the execution time of the whole transaction.

5) Try not to specify the lock type and index, SQL Server allows us to specify the type of lock and index used by the statement, but in general, the SQL Server optimizer chooses the lock type and index is optimal under the current data volume and query conditions, we have specified may only be better in the current situation, But the amount of data and the distribution of data will change in the future.

K. Using stored procedures instead of direct write query statements

Stored procedures provide a number of benefits for developers, including:

Reduce network traffic and response times to improve application performance. For example, sending a stored procedure call over the network instead of sending 500 rows of TSQL will be faster and use less resources. The parsing of SQL statements, estimating utilization of indexes, binding variables, reading blocks of data, and so on are performed every time SQL executes.

The stored procedure execution plan can be reused, residing in the SQL Server memory cache, reducing server overhead.

Client execution requests are more efficient. For example, if an application needs to insert a large number of binary values into an image data column without using a stored procedure, it must convert the binary to a string (the size will increase by one) and then send it to SQL Server. When SQL Server receives it, it must turn the string value back into binary format. A lot of wasted overhead. Stored procedures can eliminate this problem by passing the application to the SQL Server binary format as a parameter, thereby reducing overhead performance.

Stored procedures help provide code reuse. While these do not directly enhance the performance of the application, it increases the developer's efficiency by reducing the amount of code and reducing debugging time.

Stored procedures can encapsulate logic. You can change the stored procedure code without affecting the client (assuming you keep the parameters the same and not remove any result set columns). This saves the developer time.

Stored procedures provide better security for your data. If you use only stored procedures, you can remove the Select, INSERT, update, and delete permissions directly to the table to force developers to use stored procedures to access data. This will save the DBA time.

L. Do not use COUNT () sum in subqueries to perform an existence check

Do not use such a statement:

Select column_list from tablename where 0 < (SELECT count (*) from table2 where ...)

You should use such a statement instead:

Select column_list from tablename where EXISTS (SELECT * from Table2 where ...)

When you use COUNT (), SQL Server does not know that you are doing an existence check, it calculates all matching values, either performs a full table scan, or scans the smallest nonclustered index. When you use exists, SQL Server knows that you want to perform an existence check, and when it finds the first matching value, it returns true and stops the query.

M. Usage of temporary tables and table variables

In complex systems, temporary tables and table variables are difficult to avoid, and for the use of temporal tables and table variables, it is important to note:

1) If the statement is complex and has too many connections, consider stepping through the temporary table and table variables.

2) If you need to use the same portion of data from a large table multiple times, consider staging this part of the data with temporary tables and table variables.

3) If you need to synthesize data from multiple tables to form a result, consider using temporal tables and table variables to summarize the data of these tables in step-by-steps.

4) In other cases, you should control the use of temporary tables and table variables.

5) The choice of temporary tables and table variables, many of the statements are table variables in memory, fast, should be preferred table variables, but in the actual use of the choice is mainly to consider the amount of data to be placed on the temporary table, in the case of large amounts of data, the temporary table speed is faster instead.

6) About temporary table generation the choice of using SELECT INTO and create TABLE + INSERT into, we have tested, in general, select into will be much faster than the CREATE TABLE + INSERT into method, but select Into locks tempdb's system tables sysobjects, sysindexes, syscolumns, and is prone to blocking other processes in a multiuser concurrency environment, so my advice is to use CREATE TABLE + INSERT into in a concurrent system as much as possible. , while a single statement with a large amount of data is used, select INTO is used.

Note the collation, which is created with the CREATE table, and if you do not specify a collation for the field, the default collation for tempdb is selected instead of the collation of the current database. If the collation of the current database differs from the collation of tempdb, there may be a collation conflict error at the time of the connection. You can generally avoid these problems by specifying the collation of the field as Database_default when creating a temporary table in create table.

Ps:sql the difference between temporary tables, temporary variables, and with as keywords in the database to create a temporary table

1). With Temptablename as method (appears after 05):

With temptable as does not actually create a temporary table, just the subquery part (subquery factoring), which defines a SQL fragment that will be used by the entire SQL statement. Sometimes it is to make the SQL statement more readable, or it may be in different parts of union all as part of providing data. Especially useful for union all. Because each part of union all may be the same, but if each part goes through it, the cost is too high, so you can use the with as phrase, as long as you execute it again.

2). Temporary table method

A temporary table is similar to a permanent table, except that it is created in tempdb and only disappears after the end of a database connection or by a SQL command, otherwise it will persist (the temporary table is generally created and, if executed, not by the drop table operation, Can no longer be created the second time). Temporary tables generate the system logs of SQL Server when they are created, although they are in tempdb and are allocated in memory, they also support physical disks, but the user cannot see the files on the specified disk.

Temporary tables are both local and global, and the name of the local temporary table is prefixed with "#", which is visible only in the local current user connection and is deleted when the user disconnects from the instance. The names of global temporary tables are prefixed with "# #" and are visible to any user after they are created, and are deleted when all users referencing the table are disconnected.

3). Table Variable method

A table variable creates a syntax similar to a temporary table, and the difference is that it must be named when it is created. Table variables are one of the variables, the table variable is also divided into local and global two, the name of the surface variable is "@" prefix, only in the local current user connection can be accessed. The name of the global table variable is prefixed with "@@", which is generally the global variable of the system, as we commonly use, such as @ @Error represents the wrong number, @ @RowCount represents the number of rows affected.

The difference between a temporary table and a table variable:

1) The table variable is stored in memory, when the user accesses the table variable, SQL Server does not generate the log, and in the temporary table is generated log;

2) in a table variable, a nonclustered index is not allowed;

3) The table variable is not allowed to have default defaults, and does not allow constraints;

4) The statistical information on the temporary table is sound and reliable, but the statistics on the table variable are unreliable;

5) There is a locking mechanism in the temporal table, and there is no mechanism for locking in the table variable.

Selection of temporary tables and table variables:

1) The main thing to consider when using table variables is the pressure on the memory of the application, and if the code runs a lot of instances, pay special attention to memory variables ' memory consumption. We use table variables for smaller data or for recommended calculations. If the result of the data is large, in the code for the temporary calculation, when the selection is not a grouping of aggregations, you can consider using table variables.

2) generally for large data results, or because the statistical data to facilitate better optimization, we recommend the use of temporary tables, but also to create an index, because the temporary table is stored in tempdb, the general default allocation of less space, you need to tune tempdb to increase its storage space.

Welcome to the attention number, public number name: dotnet cultivation Treasure. Sweep the following QR code or collection below the QR code attention (Long press the following QR code image, and select the QR code in the identification diagram)

. NET Technical Questions Series (2)-sql Server database Optimization specification

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More