SQL query Statement optimization

Source: Internet
Author: User


Recently the company has come to a very tiger DBA 10几 years of experience here is called Cai Teacher. Let us share the precious wealth that Cai Lao has brought to us, with the consent of our Chua. Welcome other DBAs to shoot bricks


1. What is the execution plan? The execution plan is dependent on what information.
2, unified SQL statement to reduce parsing overhead
3. Reduce the nesting of SQL statements
4. Staging intermediate results using temporary table
5. The OLTP system SQL statement must take a binding variable
6. The spy problem of the bound variable of the tilted field
7. Begin Tran to be as small as possible.
8, some SQL query statements should be added Nolock
9, add nolock after the query often occur page splitting table, easy to produce skip read or repeat Read
10, the clustered index is not built on the table in the Order field, the table is prone to page splitting
11. Use composite index to improve query speed for multiple where conditions
13, use like for fuzzy query should be careful to try not to use the former%
14. Three ways to connect SQL Server tables
15, Row_number will cause the table scan, with temporary table paging better

What is an execution plan? The execution plan is dependent on what information.

An execution plan is a query scheme that the database makes based on the statistics of SQL statements and related tables, which are generated automatically by the query optimizer, such as an SQL statement that searches for 1 records from a 100,000-record table, and the query optimizer chooses the "Index lookup" method. If the table is archived and there are currently only 5,000 records left, the query optimizer will change the scenario with a "full table scan" approach.

As can be seen, the execution plan is not fixed, it is "personalized". It is important to produce a correct "execution plan" with two points:
Does the SQL statement clearly tell the query optimizer what it wants to do?
Is the database statistics that the query optimizer gets up-to-date and correct?

The syntax of unified SQL statements reduces parsing overhead

For the following two SQL statements, the programmer considers the same, and the database query optimizer might consider it different.


Select * from dual

In fact, the case is different, the Query Analyzer is considered to be two different SQL statements, must be parsed two times. Generate 2 execution plans. So as a programmer, you should ensure that the same query statements everywhere are consistent, one more space is not!

Reduce the nesting of SQL statements

I often see a SQL statement captured from a database that prints 2 sheets of A4 paper so long. In general, such complex statements are usually problematic. I took the 2-page long SQL statement to consult the original author, and he said that the time is too long, he can not understand the moment. It can be imagined that even the original author is likely to look at the confused SQL statement, the database will also look confused.

In general, the result of a SELECT statement as a subset, and then query from that subset, this layer of nested statements is still relatively common, but based on experience, more than 3 layers of nesting, the query optimizer can easily give the wrong execution plan. Because it was dizzy. Like this AI-like things, after all, than the person's resolution is worse, if people are looking dizzy, I can ensure that the database will faint.

In addition, execution plans can be reused, and the simpler SQL statements are more likely to be reused. Complex SQL statements have to be re-parsed as long as one character changes, and then the bulk of the garbage is stuck in memory. It is conceivable how inefficient the database will be.

Staging intermediate results using temporary tables

An important way to simplify SQL statements is to use temporary tables to stage intermediate results, but the benefits of temporary tables are much more than that, temporary tables are temporarily present in tempdb, and the subsequent queries are in the same place, which avoids multiple scans of the main table in the program and greatly reduces the "shared lock" blocking "update lock" in program execution. Reduces congestion and improves concurrency performance.

OLTP system SQL statements must take bound variables

SELECT * from OrderHeader where changetime > ' 2010-10-20 00:00:01 '
SELECT * from OrderHeader where changetime > ' 2010-09-22 00:00:01 '
The above two sentences, the query optimizer is considered to be different SQL statements, need to parse two times. If you take a binding variable
SELECT * from OrderHeader where changetime > @chgtime
@chgtime variables can pass in any value, so a large number of similar queries can reuse the execution plan, which can greatly reduce the burden on the database parsing SQL statements. One-time analysis, multiple reuse, is the principle of improving database efficiency.

Spy problems with bound variables for skewed fields

There are two sides to things, and binding variables are appropriate for most OLTP processing, but there are exceptions. For example, when the field in the Where condition is "italic field".

"Tilt field" means that the vast majority of the values in the column are the same, such as a demographic survey, where the "ethnic" column, more than 90% are Han. So if an SQL statement queries the 30-year-old Han population, then the "nation" must be placed in the Where condition. There is a big problem with binding variable @nation at this time.

Imagine if the first value passed by @nation is "Han", then the entire execution plan will inevitably choose a table scan. Then, the second value in the "Buyi", supposedly "Buyi" accounted for only one out of 10,000 of the percentage, should be indexed to find. However, due to the reuse of the first resolution of the "Han" of the implementation plan, the second time will also use the table scanning method. This problem is known as the "Bound variable Spy", and it is recommended that you do not use bound variables for italic fields.

Begin Tran to be as small as possible.

A SQL statement in SQL Server is a transaction by default, and is the default commit after the statement executes. In fact, this is a minimized form of the BEGIN Tran, which implies a BEGIN tran at the beginning of each sentence, and a commit at the end.
In some cases, we need to explicitly declare the BEGIN TRAN, such as "Insert, delete, change" operations need to modify several tables at the same time, require either a few tables are modified successfully, or are unsuccessful. Begin Tran can play this role by putting a number of SQL statements together and finally a commit. The advantage is that the data is consistent, but nothing is perfect. The cost of the Begin Tran is that all the resources locked by the SQL statements cannot be released until the commit is committed before committing.
It can be seen that if the BEGIN TRAN too many SQL statements, the performance of the database is poor. Before this large transaction commits, it is bound to block other statements, resulting in many blocks.
The principle of the BEGIN Tran is that the less SQL statements that begin TRAN nested, the better it is to ensure data consistency! In some cases, triggers can be used to synchronize data, not necessarily with begin TRAN.

Some SQL query statements should be added Nolock

Adding NOLOCK in SQL statements is an important way to improve concurrency performance in SQL Server, which is not required in Oracle, because the structure of Oracle is more reasonable, and the undo table space holds the "Data Front Shadow", which if not commit in the modification, So what you read is that it modifies the previous copy, which is placed in the undo table space. In this way, Oracle's read and write can be mutually exclusive, and this is where Oracle is widely praised. SQL Server reads and writes are blocked from each other, in order to improve concurrency performance, for some queries, you can add nolock, so that the reading can be allowed to write, but the disadvantage is that it is possible to read the dirty data uncommitted. There are 3 principles for using NOLOCK.

(1) The results of the query for "plug, delete, change" Can not add nolock!

(2) The query table belongs to the frequent occurrence of page splitting, with caution with NOLOCK!

(3) The use of temporary tables can be saved as "Data Shadow", to play similar to the undo table space of Oracle function,

can use temporary table to improve concurrency performance, do not use NOLOCK.

Add Nolock After query frequently occurs page splitting table, easy to produce skip read or repeat Read

Add Nolock can be in the "plug, delete, change" while the query, but because at the same time "plug, delete, change", in some cases, once the data page is full, then the page splitting is unavoidable, and at this time nolock query is happening, such as the 100th page has read the record, It is possible to divide the page into page 101th, which may cause the NOLOCK query to read the data repeatedly when it reads 101 pages, resulting in a "repeat read". Similarly, if the data on page 100 has not been read to 99 pages, then the NOLOCK query may have missed the record, resulting in a "skip read."

The above mentioned buddies, after the addition of Nolock some operation error, it is estimated that because the NOLOCK query produced repeated reading, 2 identical records to insert another table, of course, a primary key conflict occurs.

The clustered index is not built on the order field of the table, and the table is prone to page splitting

For example, the order table, with the order number OrderID, also has the customer number ContactID, then the clustered index should be added to which field? For this table, order numbers are added sequentially, and if you add a clustered index on OrderID, the new rows are added at the end, so it is not easy to generate page splits frequently. However, since most queries are based on the customer number, it makes sense to add the clustered index to the ContactID. ContactID is not a sequential field for order tables.

For example "Zhang San" of "ContactID" is 001, then "Zhang San" The order information must be placed on the first data page of this table, if today "Zhang San" a new order, then the order information can not be placed on the last page of the table, but the first page! What if the first page is full? I'm sorry, all the data in this table has to be moved backwards to make room for this record.

The index of SQL Server is different from the index of Oracle, and the clustered index of SQL Server actually sorts the table in the order of the clustered index fields, which is equivalent to the Index organization table of Oracle. The clustered index of SQL Server is a form of organization of the table itself, so its efficiency is very high. Because of this, the insertion of a record, its location is not random, but to be placed in the order of the data page, if the data page has no space, it caused the page splitting. So obviously, the clustered index is not built on the order field of the table, and the table is prone to page splitting.

Once there was a situation where a man's table rebuilt the index, the efficiency of the insertion dropped dramatically. Presumably this is the case. The table's clustered index may not be built on the table's Order field, which is often archived, so the table's data is in a sparse state. For example, Zhang San under 20 orders, and the last 3 months of orders only 5, archiving strategy is to retain 3 months of data, then Zhang San past 15 orders have been archived, leaving 15 empty slots, can be re-exploited when insert occurs. In this case, page splitting does not occur because there are slots available. However, query performance is relatively low because the query must scan for empty spaces that have no data.

After rebuilding the clustered index, the situation changed, because rebuilding the clustered index is to rearrange the data in the table again, the original vacancy is not, and the page filling rate is very high, the insertion data often have to break the page, so the performance is greatly reduced.

Do you want to give a lower page fill rate for a table with a clustered index that is not built on the sequential field? Do you want to avoid rebuilding the clustered index? is a question worth considering!

Using composite indexes to increase query speed for multiple where conditions

Composite indexes often have better selectivity than a single index. Moreover, it is an index that is specifically set up for a where condition, which is already sorted so that the query is faster than a single index. The boot field for a composite index must be in a "selective high" field. For example, there are 3 fields: date, gender, age. You see, which field should be used as the boot field? The "date" should obviously be used as the boot field. The date is the highest-selectivity field in 3 fields.

Here is an exception, if the date is also a clustered index of the boot field, you can not build composite index, directly walk the clustered index, the efficiency is relatively high.

Do not put the clustered index into a "composite index", the easier the clustered index, the better, the higher the better! The clustered index consists of 2 fields tolerable. But more than 2 fields, you should consider building 1 self-increment fields as the primary key, the clustered index can not be the master key.

When using like for fuzzy queries, be careful not to use the previous%

Sometimes you need to do some fuzzy queries like

Select * from contact where username like '%yue% '

Keyword%yue%, because Yue front used "%", so the query must go full table scan, unless necessary, otherwise do not add% before the keyword,

Three ways to connect to a SQL Server table

(1) Merge Join

(2) Nested Loop Join

(3) Hash Join

SQL Server 2000 has only one join mode--nested Loop join, if a result set is small, then the default as the appearance, a in each record to go to B scan again, actually swept the number of rows equal to a result set row x B result set row number. So if the two result sets are large, the result of the join IS bad.

SQL Server 2005 Added merge Join, if the Join field of table A and table B is exactly the field of the clustered index, then the order of the table is lined up, as long as the two sides of the row, the cost of this Join is equal to the result set row of a table plus the result set row number of B table, one is plus, One is multiply, and the merge join effect is much better than nested Loop join.

If there is no index on the connected field, then the efficiency of the SQL2000 is quite low, and SQL2005 provides a hash join, which is equivalent to a temporary index of the result set of A/b table, so SQL2005 efficiency is much higher than SQL2000, which I think is an important reason.

To summarize, pay attention to the following points when connecting tables:

(1) Connection field select the field where the clustered index is located

(2) Carefully consider the where condition to minimize the result set of a and B tables

(3) If many join fields are missing an index, and you are still using SQL2000, do the upgrade.

Row_number causes a table scan, which is better for paging with temporary tables

Row_number test results for pagination:
Paging using row_number: CPU time = 317265 milliseconds, elapsed time = 423090 milliseconds
Paging using temporary tables: CPU time = 1266 milliseconds, elapsed time = 6705 milliseconds

The Row_number implementation is based on order by, and the effect of sorting on the query is obvious.


The use of indexes is limited by the wording

Select * FROM TableName where chgdate +7 < sysdate

Select * FROM TableName where Chgdate < sysdate-7

This is the first to welcome students who love to shoot bricks

SQL query Statement optimization

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.