Toward DBA [mssql]: Improving database access performance from the perspective of SQL statements

Source: Internet
Author: User
Tags: mssql


Recently a very impressive DBA with more than ten years of experience joined the company; here we will call him Teacher Cai. With his consent, let me share the valuable knowledge he has brought us. Other DBAs are welcome to chime in with criticism.

Directory

1. What is an execution plan? What information does an execution plan depend on?
2. Unify the wording of SQL statements to reduce parsing overhead
3. Reduce the nesting of SQL statements
4. Stage intermediate results in temporary tables
5. SQL statements in OLTP systems must use bind variables
6. The bind variable peeking problem on skewed fields
7. Keep BEGIN TRAN as small as possible
8. Some SQL queries should have NOLOCK added
9. Tables that frequently page-split after adding NOLOCK easily produce skip reads or repeat reads
10. A table whose clustered index is not built on its sequential field is prone to page splits
11. Use composite indexes to improve query speed for multiple WHERE conditions
12. When using LIKE for fuzzy queries, be careful not to use a leading %
13. Three ways SQL Server joins tables
14. ROW_NUMBER causes a table scan; paging with a temporary table is better

What is an execution plan? What information does an execution plan depend on?

An execution plan is the query scheme that the database draws up based on the SQL statement and the statistics of the relevant tables; it is generated automatically by the query optimizer. For example, for an SQL statement that looks up one record in a 100,000-row table, the query optimizer may choose the "index seek" method. If the table is archived and only 5,000 records remain, the query optimizer will change the scheme to a "full table scan".

As you can see, the execution plan is not fixed; it is "personalized". Producing a correct execution plan depends on two points:
Does the SQL statement clearly tell the query optimizer what it wants to do?
Are the database statistics that the query optimizer gets up to date and correct?
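As a quick way to see which plan the optimizer picked, SQL Server can print the estimated plan as text, and statistics can be refreshed by hand when they look stale. A minimal sketch (OrderHeader and OrderID reuse names from examples later in this article):

-- Show the estimated execution plan instead of running the query:
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM OrderHeader WHERE OrderID = 1001;
GO
SET SHOWPLAN_TEXT OFF;
GO
-- Refresh statistics so the optimizer sees the table's current shape:
UPDATE STATISTICS OrderHeader;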

Unify the wording of SQL statements to reduce parsing overhead

A programmer considers the following two SQL statements the same; the database query optimizer may consider them different.

SELECT * FROM dual

Select * from dual

In fact, the two differ only in case, yet the query analyzer treats them as two different SQL statements that must be parsed twice, generating two execution plans. So, as a programmer, you should make sure the same query statement is written identically everywhere: not even one extra space!
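This can be observed directly in the plan cache. A sketch, assuming the two variants above were just executed (sys.dm_exec_cached_plans and sys.dm_exec_sql_text are standard DMVs in SQL Server 2005 and later):

-- Each textual variant shows up as its own cached plan:
SELECT cp.usecounts, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%dual%';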

Reduce the nesting of SQL statements

I often see SQL statements that run to two A4 pages when printed out. In general, such complex statements are usually problematic. When I took one of these page-long SQL statements to its original author for advice, he said that too much time had passed and he could not understand it right away. Imagine that: even the original author can be baffled by his own SQL.

Taking the result of a SELECT statement as a subset and then querying from that subset is a fairly common kind of layered nesting. In my experience, however, beyond three layers of nesting the query optimizer easily produces a wrong execution plan, because it gets dizzy. Something like this, a sort of AI, is after all worse at comprehension than a person; if people get dizzy reading it, I can guarantee the database will faint too.

In addition, execution plans can be reused, and the simpler an SQL statement is, the higher the chance it will be reused. A complex SQL statement must be re-parsed whenever even a single character changes, and then that pile of rubbish gets crammed into memory. It is easy to imagine how inefficient the database becomes.
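For illustration, here is a hypothetical three-layer nested query of the kind described above (the Orders table and its columns are invented for this sketch); the next section shows how a temporary table flattens it:

SELECT CustomerID, Total
FROM ( SELECT CustomerID, SUM(Amount) AS Total
       FROM ( SELECT CustomerID, Amount
              FROM Orders
              WHERE OrderDate > '2010-01-01' ) AS recent
       GROUP BY CustomerID ) AS totals
WHERE Total > 1000;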

Staging intermediate results using temporary tables

An important way to simplify SQL statements is to use temporary tables to stage intermediate results. But the benefits of temporary tables go well beyond that: the intermediate result is staged in tempdb, and subsequent queries run against it there, which avoids scanning the main table multiple times within the program and greatly reduces the "shared lock" versus "update lock" blocking during execution, cutting contention and improving concurrency.
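A sketch of the nested query from the previous section rewritten with temporary tables (same invented Orders table and columns):

-- Stage the filtered rows once, instead of re-deriving them per layer:
SELECT CustomerID, Amount
INTO #recent
FROM Orders
WHERE OrderDate > '2010-01-01';

SELECT CustomerID, SUM(Amount) AS Total
INTO #totals
FROM #recent
GROUP BY CustomerID;

SELECT CustomerID, Total FROM #totals WHERE Total > 1000;

DROP TABLE #recent;
DROP TABLE #totals;

Each step is simple enough for the optimizer to plan well, and the main table is scanned only once.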

SQL statements in OLTP systems must use bind variables

SELECT * FROM OrderHeader WHERE changetime > '2010-10-20 00:00:01'
SELECT * FROM OrderHeader WHERE changetime > '2010-09-22 00:00:01'

The query optimizer treats the above two statements as different SQL statements that must be parsed twice. With a bind variable:

SELECT * FROM OrderHeader WHERE changetime > @chgtime

The @chgtime variable can be passed any value, so a large number of similar queries can reuse one execution plan, which greatly reduces the database's burden of parsing SQL statements. Parse once, reuse many times: that is the principle for improving database efficiency.
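In SQL Server, ad hoc statements can be parameterized explicitly with sp_executesql. A minimal sketch using the article's own OrderHeader example:

DECLARE @chgtime DATETIME;
SET @chgtime = '2010-10-20 00:00:01';
-- The statement text stays identical across calls, so its plan is reused:
EXEC sp_executesql
    N'SELECT * FROM OrderHeader WHERE changetime > @chgtime',
    N'@chgtime DATETIME',
    @chgtime = @chgtime;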

The bind variable peeking problem on skewed fields

Everything has two sides. Bind variables suit the vast majority of OLTP processing, but there are exceptions: for example, when the field in the WHERE condition is a "skewed field".

"Tilt field" means that the vast majority of the values in the column are the same, such as a demographic survey, where the "ethnic" column, more than 90% are Han. So if an SQL statement queries the 30-year-old Han population, then the "nation" must be placed in the Where condition. There is a big problem with binding variable @nation at this time.

Imagine: if the first value passed for @nation is "Han", the whole execution plan will inevitably choose a table scan. When the second value passed is "Buyi", which supposedly accounts for only one in ten thousand of the population, an index seek should be used. But because the execution plan resolved for "Han" the first time is reused, the second query also uses a table scan. This problem is known as "bind variable peeking" (parameter sniffing, in SQL Server terms), and the recommendation is: do not use bind variables on skewed fields.
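One hedge available in SQL Server 2005 and later is to force the statement to be recompiled for each parameter value, trading parse cost for a value-appropriate plan. A sketch, where GetPopulation, Population, Nation, and Age are all hypothetical names:

CREATE PROCEDURE GetPopulation @nation NVARCHAR(20) AS
SELECT * FROM Population
WHERE Nation = @nation AND Age = 30
OPTION (RECOMPILE);  -- replan per call, so 'Han' and 'Buyi' each get a fitting plan
GO
EXEC GetPopulation N'Buyi';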

Keep BEGIN TRAN as small as possible

In SQL Server, a single SQL statement is a transaction by default and is committed automatically after the statement executes. This is in fact a minimized form of BEGIN TRAN: a BEGIN TRAN is implied at the start of every statement, with a COMMIT at the end.

In some cases we need to declare BEGIN TRAN explicitly, for example when an insert, delete, or update must modify several tables at once, requiring that either all of them are modified successfully or none are. BEGIN TRAN plays that role: several SQL statements are grouped together with a single COMMIT at the end. The benefit is data consistency, but nothing is perfect: the price of BEGIN TRAN is that all resources locked by its SQL statements cannot be released until the COMMIT.

It follows that if BEGIN TRAN wraps too many SQL statements, database performance suffers. Before such a large transaction commits, it is bound to block other statements and cause many blocks.

The principle for using BEGIN TRAN: on the premise that data consistency is guaranteed, the fewer SQL statements BEGIN TRAN wraps, the better! In some cases triggers can be used to synchronize data; BEGIN TRAN is not always necessary.
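A sketch of the grouped-statement pattern kept deliberately small (OrderHeader is from the article's earlier examples; OrderDetail, Status, and Shipped are hypothetical):

DECLARE @id INT;
SET @id = 1001;  -- hypothetical order number
BEGIN TRAN;
    UPDATE OrderHeader SET Status = 'Shipped' WHERE OrderID = @id;
    UPDATE OrderDetail SET Shipped = 1 WHERE OrderID = @id;
COMMIT TRAN;  -- locks on both tables are held only for these two statements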

Some SQL queries should have NOLOCK added

Adding NOLOCK to SQL statements is an important means of improving SQL Server's concurrency performance. This is not needed in Oracle, because Oracle's architecture is more reasonable: its undo tablespace keeps a "before image" of the data. If a modification has not yet been committed, a reader gets the copy from before the modification, taken from the undo tablespace. Reads and writes in Oracle can thus proceed without blocking each other, which is one of the things Oracle is widely praised for. In SQL Server, reads and writes block each other; to improve concurrency you can add NOLOCK to some queries, which allows reading alongside writing, at the cost of possibly reading uncommitted dirty data. There are three rules for using NOLOCK:

(1) If the query result will be used for an insert, delete, or update, NOLOCK must not be added!

(2) If the queried table frequently undergoes page splits, use NOLOCK with caution!

(3) A temporary table can likewise hold a "before image" of the data, playing a role similar to Oracle's undo tablespace. Where a temporary table can be used to improve concurrency performance, use it instead of NOLOCK.
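For reference, the hint syntax, sketched against the article's OrderHeader table:

SELECT * FROM OrderHeader WITH (NOLOCK)   -- read without taking shared locks
WHERE changetime > '2010-10-20 00:00:01'; -- may see uncommitted (dirty) rows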

Tables that frequently page-split after adding NOLOCK easily produce skip reads or repeat reads

With NOLOCK you can query while inserts, deletes, and updates are happening. But because those modifications run concurrently, in some cases, once a data page is full, a page split is unavoidable, and a NOLOCK query may be in progress at exactly that moment. For example, rows on page 100 that have already been read may be split off onto page 101; when the NOLOCK query reaches page 101 it reads the same data again, producing a "repeat read". Likewise, rows that have not yet been read may be moved by a split onto a page the scan has already passed, so the NOLOCK query misses those records, producing a "skip read".

The colleague mentioned above who hit errors after adding NOLOCK was presumably bitten by this: the NOLOCK query produced a repeat read, the same record was inserted into another table twice, and a primary key conflict naturally followed.
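Whether the workload actually page-splits heavily is worth checking before deciding. A sketch using the standard "Page Splits/sec" performance counter exposed through sys.dm_os_performance_counters (the value is server-wide and cumulative since startup, so sample it twice and take the difference):

SELECT cntr_value AS page_splits_since_startup
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Page Splits/sec%';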

A table whose clustered index is not built on its sequential field is prone to page splits

Take an order table with order number OrderID and customer number ContactID. On which field should the clustered index go? For this table, order numbers are added sequentially; if the clustered index is on OrderID, new rows are appended at the end, so frequent page splits are unlikely. However, since most queries are by customer number, it would seem to make sense to put the clustered index on ContactID instead. But for an order table, ContactID is not a sequential field.

For example "Zhang San" of "ContactID" is 001, then "Zhang San" order information must be placed on the first data page of this table, if today "Zhang San" a New Order, The order information can not be placed on the last page of the table, but the first page! What if the first page is full? I'm sorry, all the data in this table has to be moved backwards to make room for this record.

SQL Server's clustered index is different from an ordinary Oracle index; it is the equivalent of Oracle's index-organized table. The clustered index is the organizational form of the table itself, which is why it is so efficient. And precisely because of this, the position of an inserted record is not arbitrary but is dictated by the data page order; if the target data page has no free space, a page split results. So it is clear: if the clustered index is not built on the table's sequential field, the table is prone to page splits.

There was once a case where someone rebuilt a table's indexes and insert performance dropped dramatically. Presumably it went like this. The table's clustered index was not built on the table's sequential field, and the table was frequently archived, so its data sat in a sparse state. For example, Zhang San had placed many orders, but only 5 in the last three months; the archiving policy retains three months of data, so Zhang San's older orders had been archived, leaving free slots that could be reused when inserts occurred. In that state page splits did not occur, because free slots were available, but query performance was relatively low, because queries had to scan past empty space holding no data.

After the clustered index was rebuilt, the situation changed: rebuilding the clustered index repacks the table's data, the old vacancies disappear, the page fill rate becomes very high, and inserts constantly have to split pages, so performance drops sharply.

For a table whose clustered index is not built on a sequential field, should you set a lower page fill rate? Should you avoid rebuilding the clustered index? These are questions worth considering!
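A sketch of leaving headroom with a fill factor when a rebuild is unavoidable (IX_Orders_ContactID and Orders are hypothetical names):

ALTER INDEX IX_Orders_ContactID ON Orders
REBUILD WITH (FILLFACTOR = 70);  -- leave ~30% free per page for out-of-order inserts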

Using composite indexes to increase query speed for multiple where conditions

A composite index often has better selectivity than a single-column index. Moreover, it is an index set up specifically for a particular WHERE condition and is already sorted accordingly, so the query runs faster than with a single-column index. The leading field of a composite index must be a highly selective field. Say the condition involves three fields: date, sex, and age. Which should be the leading field? Obviously "date": it is the most selective of the three.

There is one exception: if the date is also the leading field of the clustered index, you need not build the composite index at all; going straight through the clustered index is more efficient.

Do not fold the clustered index into a "composite index". The simpler and more selective the clustered index, the better! A clustered index of two fields is tolerable; beyond two fields, consider adding an identity (auto-increment) field as the primary key instead. The clustered index does not have to be the primary key.
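A sketch of such an index with the selective date column leading (Census and its column names are hypothetical stand-ins for the example above):

CREATE INDEX IX_Census_Date_Sex_Age
ON Census (CensusDate, Sex, Age);  -- most selective column first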

When using LIKE for fuzzy queries, be careful not to use a leading %

Sometimes you need to do fuzzy queries, like:

Select * from contact where username like '%yue%'

Because of the "%" in front of "yue" in the keyword '%yue%', this query must do a full table scan. Unless it is truly necessary, do not put a % in front of the keyword.
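With only a trailing wildcard, an index on username can be seeked instead. A sketch against the article's contact table:

SELECT * FROM contact WHERE username LIKE 'yue%';  -- eligible for an index seek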

Three ways SQL Server joins tables

(1) Merge Join

(2) Nested Loop Join

(3) Hash Join

SQL Server 2000 has only one join method: the nested loop join. If A is the smaller result set, it serves as the outer table, and every row of A is matched against the rows of B, so the cost is on the order of A x B rows. If both result sets are large, the join result is very bad.

SQL Server 2005 added the merge join, whose cost for joining the result sets of tables A and B is on the order of A + B rows rather than A x B: one is addition, the other multiplication. It is visibly much more effective than the nested loop join.

If there is no index on the join field, SQL 2000's efficiency is quite low, whereas SQL 2005 provides the hash join, which amounts to temporarily building an index on the result sets of tables A and B. For this reason SQL 2005's efficiency is much higher than SQL 2000's; I think this is one important reason.

To summarize, pay attention to the following points when joining tables:

(1) Choose, as the join field, the field the clustered index is on.

(2) Consider the WHERE conditions carefully to minimize the result sets of tables A and B.

(3) If many join fields lack indexes and you are still using SQL 2000, upgrade.
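The optimizer normally picks the join strategy by itself, but for testing it can be forced with a join hint. A sketch using the hypothetical OrderHeader/OrderDetail pair from earlier:

SELECT a.OrderID
FROM OrderHeader AS a
INNER MERGE JOIN OrderDetail AS b  -- MERGE can be swapped for LOOP or HASH
    ON a.OrderID = b.OrderID;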

ROW_NUMBER can cause a table scan; other approaches, such as TOP plus staging in a temporary table, page large tables better

Test results for ROW_NUMBER paging:
Paging with ROW_NUMBER: CPU time = 317,265 ms, elapsed time = 423,090 ms
Paging with TOP + temp table: CPU time = 1,266 ms, elapsed time = 6,705 ms

ROW_NUMBER is implemented on top of ORDER BY, and the effect of the sort on the query is obvious.
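A sketch of the TOP + temp table approach (page size 50; OrderHeader/OrderID from earlier examples; @lastKey, the last key of the previous page, is a hypothetical input):

DECLARE @lastKey INT;
SET @lastKey = 100;  -- hypothetical: last OrderID of the previous page
-- Stage only the keys of the requested page:
SELECT TOP 50 OrderID
INTO #page
FROM OrderHeader
WHERE OrderID > @lastKey
ORDER BY OrderID;

-- Join back to fetch the full rows for just that page:
SELECT o.*
FROM OrderHeader AS o
JOIN #page AS p ON o.OrderID = p.OrderID
ORDER BY o.OrderID;

DROP TABLE #page;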

Other

How a condition is written can suppress index use

Select * FROM TableName where chgdate + 7 < sysdate

Select * FROM TableName where chgdate < sysdate - 7

The former suppresses use of the index on the chgdate column; the latter does not.
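The same idea in T-SQL, keeping the indexed column bare on one side of the comparison (TableName and chgdate from the example above):

SELECT * FROM TableName
WHERE chgdate < DATEADD(DAY, -7, GETDATE());  -- the index on chgdate stays usable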

That is all for this first installment; students who love to throw bricks are welcome.

