Optimizing SQL queries: How to write high-performance SQL statements

Last Update:2016-03-02 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

1, first to understand what is called the implementation plan?

An execution plan is a query scheme that the database makes based on the statistics of SQL statements and related tables, which are generated automatically by the query optimizer, such as an SQL statement that searches for 1 records from a 100,000-record table, and the query optimizer chooses the "Index lookup" method. If the table is archived and there are currently only 5,000 records left, the query optimizer will change the scenario with a "full table scan" approach.

As can be seen, the execution plan is not fixed, it is "personalized". It is important to produce a correct "execution plan" with two points:

(1) Does the SQL statement clearly tell the query optimizer what it wants to do?

(2) is the database statistics obtained by the query optimizer up-to-date and correct?

2. How to unify SQL statements

For the following two SQL statements, the programmer thinks it is the same, and the database query optimizer thinks it is different.

Select*from Dual Select*from Dual

In fact, the case is different, the Query Analyzer is considered to be two different SQL statements, must be parsed two times. Generate 2 execution plans. So as a programmer, you should ensure that the same query statements everywhere are consistent, one more space is not!

3. Do not write SQL statements too complex

I often see a SQL statement captured from a database that prints 2 sheets of A4 paper so long. In general, such complex statements are usually problematic. I took the 2-page long SQL statement to consult the original author, and he said that the time is too long, he can not understand the moment. It can be imagined that even the original author is likely to look at the confused SQL statement, the database will also look confused.

In general, the result of a SELECT statement as a subset, and then query from that subset, this layer of nested statements is still relatively common, but based on experience, more than 3 layers of nesting, the query optimizer can easily give the wrong execution plan. Because it was dizzy. Like this AI-like things, after all, than the person's resolution is worse, if people are looking dizzy, I can ensure that the database will faint.

In addition, execution plans can be reused, and the simpler SQL statements are more likely to be reused. Complex SQL statements have to be re-parsed as long as one character changes, and then the bulk of the garbage is stuck in memory. It is conceivable how inefficient the database will be.

4. Staging intermediate results using temporary table

An important way to simplify SQL statements is to use temporary tables to stage intermediate results, but the benefits of temporary tables are much more than that, temporary tables are temporarily present in tempdb, and the subsequent queries are in the same place, which avoids multiple scans of the main table in the program and greatly reduces the "shared lock" blocking "update lock" in program execution. Reduces congestion and improves concurrency performance.

5. The OLTP system SQL statement must take a binding variable

Select*from orderheader where Changetime > ' 2010-10-20 00:00:01 ' select*from orderheader where Changetime > ' 2010-09 -22 00:00:01 '

The above two sentences, the query optimizer is considered to be different SQL statements, need to parse two times. If you take a binding variable

Select*from OrderHeader where Changetime > @chgtime

@chgtime variables can pass in any value, so a large number of similar queries can reuse the execution plan, which can greatly reduce the burden on the database parsing SQL statements. One-time analysis, multiple reuse, is the principle of improving database efficiency.

6. Binding variable Spy

There are two sides to things, and binding variables are appropriate for most OLTP processing, but there are exceptions. For example, when the field in the Where condition is "italic field".

"Tilt field" means that the vast majority of the values in the column are the same, such as a demographic survey, where the "ethnic" column, more than 90% are Han. So if an SQL statement queries the 30-year-old Han population, then the "nation" must be placed in the Where condition. There is a big problem with binding variable @nation at this time.

Imagine if the first value passed by @nation is "Han", then the entire execution plan will inevitably choose a table scan. Then, the second value in the "Buyi", supposedly "Buyi" accounted for only one out of 10,000 of the percentage, should be indexed to find. However, due to the reuse of the first resolution of the "Han" of the implementation plan, the second time will also use the table scanning method. This problem is known as the "Bound variable Spy", and it is recommended that you do not use bound variables for italic fields.

7. Use the BEGIN TRAN only if necessary

A SQL statement in SQL Server is a transaction by default, and is the default commit after the statement executes. In fact, this is a minimized form of the BEGIN Tran, which implies a BEGIN tran at the beginning of each sentence, and a commit at the end.

In some cases, we need to explicitly declare the BEGIN TRAN, such as "Insert, delete, change" operations need to modify several tables at the same time, require either a few tables are modified successfully, or are unsuccessful. Begin Tran can play this role by putting a number of SQL statements together and finally a commit. The advantage is that the data is consistent, but nothing is perfect. The cost of the Begin Tran is that all the resources locked by the SQL statements cannot be released until the commit is committed before committing.

It can be seen that if the BEGIN TRAN too many SQL statements, the performance of the database is poor. Before this large transaction commits, it is bound to block other statements, resulting in many blocks.

The principle of the BEGIN Tran is that the less SQL statements that begin TRAN nested, the better it is to ensure data consistency! In some cases, triggers can be used to synchronize data, not necessarily with begin TRAN.

8, some SQL query statements should be added Nolock

Adding NOLOCK in SQL statements is an important way to improve concurrency performance in SQL Server, which is not required in Oracle, because the structure of Oracle is more reasonable, and the undo table space holds the "Data Front Shadow", which if not commit in the modification, So what you read is that it modifies the previous copy, which is placed in the undo table space. In this way, Oracle's read and write can be mutually exclusive, and this is where Oracle is widely praised. SQL Server reads and writes are blocked from each other, in order to improve concurrency performance, for some queries, you can add nolock, so that the reading can be allowed to write, but the disadvantage is that it is possible to read the dirty data uncommitted. There are 3 principles for using NOLOCK.

(1) The results of the query for "plug, delete, change" Can not add nolock!

(2) The query table belongs to the frequent occurrence of page splitting, with caution with NOLOCK!

(3) The use of temporary tables can be saved as "Data Shadow", to play similar to the undo table space of Oracle function,

can use temporary table to improve concurrency performance, do not use NOLOCK.

9, the clustered index is not built on the table in the Order field, the table is prone to page splitting

For example, the order table, with the order number OrderID, also has the customer number ContactID, then the clustered index should be added to which field? For this table, order numbers are added sequentially, and if you add a clustered index on OrderID, the new rows are added at the end, so it is not easy to generate page splits frequently. However, since most queries are based on the customer number, it makes sense to add the clustered index to the ContactID. ContactID is not a sequential field for order tables.

For example "Zhang San" of "ContactID" is 001, then "Zhang San" The order information must be placed on the first data page of this table, if today "Zhang San" a new order, then the order information can not be placed on the last page of the table, but the first page! What if the first page is full? I'm sorry, all the data in this table has to be moved backwards to make room for this record.

The index of SQL Server is different from the index of Oracle, and the clustered index of SQL Server actually sorts the table in the order of the clustered index fields, which is equivalent to the Index organization table of Oracle. The clustered index of SQL Server is a form of organization of the table itself, so its efficiency is very high. Because of this, the insertion of a record, its location is not random, but to be placed in the order of the data page, if the data page has no space, it caused the page splitting. So obviously, the clustered index is not built on the order field of the table, and the table is prone to page splitting.

Once there was a situation where a man's table rebuilt the index, the efficiency of the insertion dropped dramatically. Presumably this is the case. The table's clustered index may not be built on the table's Order field, which is often archived, so the table's data is in a sparse state. For example, Zhang San under 20 orders, and the last 3 months of orders only 5, archiving strategy is to retain 3 months of data, then Zhang San past 15 orders have been archived, leaving 15 empty slots, can be re-exploited when insert occurs. In this case, page splitting does not occur because there are slots available. However, query performance is relatively low because the query must scan for empty spaces that have no data.

After rebuilding the clustered index, the situation changed, because rebuilding the clustered index is to rearrange the data in the table again, the original vacancy is not, and the page filling rate is very high, the insertion data often have to break the page, so the performance is greatly reduced.

Do you want to give a lower page fill rate for a table with a clustered index that is not built on the sequential field? Do you want to avoid rebuilding the clustered index? is a question worth considering!

10, add nolock after the query often occur page splitting table, easy to produce skip read or repeat Read

Add Nolock can be in the "plug, delete, change" while the query, but because at the same time "plug, delete, change", in some cases, once the data page is full, then the page splitting is unavoidable, and at this time nolock query is happening, such as the 100th page has read the record, It is possible to divide the page into page 101th, which may cause the NOLOCK query to read the data repeatedly when it reads 101 pages, resulting in a "repeat read". Similarly, if the data on page 100 has not been read to 99 pages, then the NOLOCK query may have missed the record, resulting in a "skip read."

The above mentioned buddies, after the addition of Nolock some operation error, it is estimated that because the NOLOCK query produced repeated reading, 2 identical records to insert another table, of course, a primary key conflict occurs.

11, use like for fuzzy query should pay attention to

Sometimes you need to do some fuzzy queries like

Select*from contact where username like '%yue% '

Keyword%yue%, because Yue front used "%", so the query must go full table scan, unless necessary, otherwise do not add% before the keyword,

12. Effect of implicit conversion of data type on query efficiency

SQL server2000 database, our program in the submission of SQL statements, not using a strong type to submit the value of this field, the data type is automatically converted by SQL Server 2000, will cause the incoming parameters and primary key field type inconsistent, this time SQL Server 2000 A full table scan may be used. This problem is not found on Sql2005, but it should be noted.

13. Three ways to connect SQL Server tables

(1) Merge Join

(2) Nested Loop Join

(3) Hash Join

SQL Server 2000 has only one join mode--nested Loop join, if a result set is small, then the default as the appearance, a in each record to go to B scan again, actually swept the number of rows equal to a result set row x B result set row number. So if the two result sets are large, the result of the join IS bad.

SQL Server 2005 Added merge Join, if the Join field of table A and table B is exactly the field of the clustered index, then the order of the table is lined up, as long as the two sides of the row, the cost of this Join is equal to the result set row of a table plus the result set row number of B table, one is plus, One is multiply, and the merge join effect is much better than nested Loop join.

If there is no index on the connected field, then the efficiency of the SQL2000 is quite low, and SQL2005 provides a hash join, which is equivalent to a temporary index of the result set of A/b table, so SQL2005 efficiency is much higher than SQL2000, which I think is an important reason.

To summarize, pay attention to the following points when connecting tables:

(1) Connection field select the field where the clustered index is located

(2) Carefully consider the where condition to minimize the result set of a and B tables

(3) If many join fields are missing an index, and you are still using SQL Server 2000, upgrade quickly.

This article link: http://www.cnblogs.com/ATree/archive/2011/02/13/sql_optimize_1.html

Turn from: 51cto

Optimizing SQL queries: How to write high-performance SQL statements

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Optimizing SQL queries: How to write high-performance SQL statements

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Optimizing SQL queries: How to write high-performance SQL statements

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support