Optimizing SQL queries: How to write high-performance SQL statements


1. First, what is an execution plan?

An execution plan is the query scheme that the database devises based on the SQL statement and the statistics of the related tables, generated automatically by the query optimizer. For example, if an SQL statement is used to fetch 1 record from a table of 100,000 records, the query optimizer will choose an "index seek"; if the table has since been archived and only 5,000 records remain, the query optimizer will change the scheme to a "full table scan".

As you can see, the execution plan is not fixed; it is "personalized". Two points are important for producing a correct execution plan:

(1) Does the SQL statement clearly tell the query optimizer what it wants to do?

(2) Are the database statistics obtained by the query optimizer current and correct?

2. Write the same SQL statement uniformly

Select *from Dual

SELECT * FROM dual

Although only the case and spacing differ, the query analyzer treats these as two different SQL statements that must be parsed twice, generating two execution plans. So as a programmer, you should make sure that the same query statement is written identically everywhere, without even one extra space!

3. Do not write overly complex SQL statements

I often see SQL statements captured from the database that would print two A4 pages long. In general, such complex statements are usually problematic. I once took a two-page SQL statement to consult its original author, and he said that after so long he could no longer understand it himself. You can imagine: if even the original author may be confused looking at the SQL statement, the database will be confused too.

In general, it is common to use the result of a SELECT statement as a subset and then query from that subset, but in my experience, beyond three levels of nesting the query optimizer easily produces a wrong execution plan, because it gets dizzy. A quasi-artificial-intelligence system of this kind is, after all, weaker than human judgment; if people get dizzy looking at the statement, I can assure you the database will faint too.

In addition, execution plans can be reused, and the simpler an SQL statement is, the more likely its plan is to be reused. A complex SQL statement must be re-parsed whenever even one character changes, and the resulting pile of garbage plans gets stuck in memory. You can imagine how inefficient the database becomes.

4. Use temporary tables to stage intermediate results

An important way to simplify SQL statements is to use temporary tables to stage intermediate results. But the benefits of temporary tables go further: temporary tables live in tempdb, and subsequent queries run against them there, which avoids scanning the main table multiple times in the program and greatly reduces "shared lock" blocking of "update lock" during execution, reducing contention and improving concurrency.
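As a minimal sketch of this pattern (the OrderHeader table and its columns are illustrative, not from a real schema): stage the filtered rows once, then run follow-up queries against the temp table instead of rescanning the main table.

```sql
-- Stage the intermediate result once; #RecentOrders lives in tempdb.
SELECT OrderID, ContactID, ChangeTime
INTO #RecentOrders
FROM OrderHeader
WHERE ChangeTime > '2010-10-20';

-- Subsequent queries read the small temp table, not the main table,
-- so the main table is scanned only once and its locks are held briefly.
SELECT ContactID, COUNT(*) AS OrderCount
FROM #RecentOrders
GROUP BY ContactID;

DROP TABLE #RecentOrders;
```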

5. OLTP system SQL statements must use bind variables

SELECT * FROM OrderHeader WHERE changetime > '2010-10-20 00:00:01'

SELECT * FROM OrderHeader WHERE changetime > '2010-09-22 00:00:01'

The query optimizer treats the two statements above as different SQL statements, each of which must be parsed. If you use a bind variable instead:

SELECT * FROM OrderHeader WHERE changetime > @changetime

The @changetime variable can be passed any value, so a large number of similar queries can reuse one execution plan, greatly reducing the database's burden of parsing SQL statements. Parse once, reuse many times: that is the principle for improving database efficiency.
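In SQL Server this can be done with sp_executesql, which caches one plan for the parameterized statement text and reuses it for any value passed in (OrderHeader and changetime are the illustrative names from the example above):

```sql
DECLARE @changetime datetime;
SET @changetime = '2010-10-20 00:00:01';

-- One parameterized statement text means one cached plan, reused for
-- every value of @changetime.
EXEC sp_executesql
    N'SELECT * FROM OrderHeader WHERE changetime > @changetime',
    N'@changetime datetime',
    @changetime = @changetime;
```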

6. Bind variable peeking

Everything has two sides. Bind variables suit most OLTP processing, but there are exceptions, for example when a field in the WHERE condition is a "skewed field".

A "skewed field" is a column where most of the values are the same, such as the "nation" column in a population census table, where more than 90% of rows are Han. So if an SQL statement queries the 30-year-old Han population, the "nation" column will appear in the WHERE condition, and using a bind variable @nation causes a big problem.

Imagine: if the first value passed for @nation is "Han", the execution plan will inevitably choose a table scan. The second value is "Buyi"; since Buyi supposedly account for only about one in ten thousand, an index seek should be used. But because the plan parsed for "Han" the first time is reused, the second query also uses a table scan. This problem is known as "bind variable peeking", and it is recommended that you do not use bind variables for skewed fields.
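Besides avoiding bind variables on skewed fields, one mitigation worth knowing (not mentioned by the author; table and column names here are hypothetical) is OPTION (RECOMPILE), which asks SQL Server 2005 and later to build a fresh plan on each execution, so "Han" and "Buyi" each get an appropriate plan:

```sql
SELECT COUNT(*)
FROM Census
WHERE nation = @nation
  AND age = 30
OPTION (RECOMPILE);  -- new plan per execution, so no peeked-value reuse
```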

7. Use BEGIN TRAN only when necessary

In SQL Server, a single SQL statement is a transaction by default and is committed when the statement finishes executing. This is in fact a minimized form of BEGIN TRAN: an implicit BEGIN TRAN at the start of each statement and a COMMIT at the end.

In some cases we need to declare BEGIN TRAN explicitly, for example when an insert/delete/update operation must modify several tables at once and requires that either all the tables are modified successfully or none are. BEGIN TRAN plays this role: it wraps several SQL statements together with a final COMMIT. The benefit is data consistency, but nothing is perfect. The price of BEGIN TRAN is that none of the resources locked by its SQL statements can be released until the COMMIT.

It follows that if BEGIN TRAN wraps too many SQL statements, database performance suffers: before this large transaction commits, it is bound to block other statements and cause heavy blocking.

The usage principle for BEGIN TRAN is: wrap as few SQL statements as possible while still guaranteeing data consistency! In some cases triggers can be used to synchronize data, so BEGIN TRAN is not always necessary.
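A minimal sketch of this principle, with illustrative table names: only the statements that must succeed or fail together go inside the transaction, and the COMMIT follows immediately so locks are held briefly.

```sql
BEGIN TRAN;
    UPDATE OrderHeader
    SET Status = 'Shipped'
    WHERE OrderID = @OrderID;

    INSERT INTO OrderLog (OrderID, Action)
    VALUES (@OrderID, 'Shipped');
COMMIT TRAN;   -- locks on both tables are released only here
```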

8. Some SQL query statements should have NOLOCK added

Adding NOLOCK to SQL statements is an important means of improving concurrency in SQL Server. It is not required in Oracle, whose structure is more reasonable: the undo tablespace holds the "before image" of the data, so if a modification has not yet committed, what you read is the pre-modification copy kept in the undo tablespace. In this way, Oracle's reads and writes do not block each other, which is one of the things Oracle is widely praised for. In SQL Server, reads and writes block each other; to improve concurrency, you can add NOLOCK to some queries so that reads can proceed alongside writes, but the disadvantage is that you may read uncommitted dirty data. There are three principles for using NOLOCK:

(1) If the query result will be used for insert/delete/update, do not add NOLOCK.

(2) If the queried table frequently undergoes page splits, use NOLOCK with caution.

(3) A temporary table can preserve a "before image" of the data, playing a role similar to Oracle's undo tablespace. If a temporary table can improve concurrency, use it rather than NOLOCK.
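A hedged example of the intended use case (table name illustrative; the hint syntax is standard T-SQL): a read-only reporting query that tolerates dirty reads and therefore does not block concurrent writers.

```sql
SELECT OrderID, ContactID
FROM OrderHeader WITH (NOLOCK)   -- dirty reads possible; reporting only
WHERE ChangeTime > '2010-10-20';
```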

9. If the clustered index is not built on the table's sequential field, the table is prone to page splits

Take an order table with order number OrderID and customer number ContactID: which field should the clustered index be built on? For this table, order numbers are added sequentially; if the clustered index is on OrderID, new rows are appended at the end, so frequent page splits are unlikely. However, since most queries filter by customer number, it seems to make sense to put the clustered index on ContactID. But for an order table, ContactID is not a sequential field.

For example, if "Zhang San's" ContactID is 001, then Zhang San's order information must be placed on the first data page of the table. If Zhang San places a new order today, the order information cannot be placed at the end of the table; it must go on that first page! And what if the first page is full? Sorry, all the data after it in the table has to move back to make room for this record.

SQL Server's index differs from Oracle's: SQL Server's clustered index actually sorts the table in the order of the clustered index fields, equivalent to Oracle's index-organized table. The clustered index is an organizational form of the table itself, so it is very efficient. Precisely because of this, the location of an inserted record is not arbitrary: it must be placed in order on the proper data page, and if that page has no free space, a page split occurs. So obviously, if the clustered index is not built on the table's sequential field, the table is prone to page splits.

There was once a case where, after someone rebuilt a table's index, insertion efficiency dropped dramatically. Presumably the situation was this: the table's clustered index was not built on its sequential field, and the table was frequently archived, so its data was in a sparse state. For example, Zhang San has placed 20 orders, of which only 5 fall in the last three months; with an archiving strategy of retaining three months of data, Zhang San's 15 past orders have been archived, leaving 15 empty slots that can be reused when inserts occur. In this state page splits do not occur, because slots are available; however, query performance is relatively low, because queries must scan the empty slots that hold no data.

After the clustered index was rebuilt, the situation changed: rebuilding the clustered index rearranges the table's data, the original vacancies are gone, the page fill rate is very high, and inserts constantly have to split pages, so performance drops sharply.

For a table whose clustered index is not built on the sequential field, should you set a lower page fill rate? Should you avoid rebuilding the clustered index? These are questions worth considering!
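One of the options raised above, a lower page fill rate, can be sketched as follows (the index name is hypothetical). FILLFACTOR = 70 leaves 30% free space on each leaf page for future inserts:

```sql
ALTER INDEX IX_OrderHeader_ContactID ON OrderHeader
REBUILD WITH (FILLFACTOR = 70);  -- leave room so inserts split fewer pages
```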

10. Adding NOLOCK to queries on tables with frequent page splits easily produces skipped or repeated reads

With NOLOCK, queries can run at the same time as insert/delete/update operations. But because those operations run concurrently, in some cases, once a data page is full, a page split is unavoidable, and a NOLOCK query may be in progress at that moment. For example, records already read on page 100 may be split onto page 101; when the NOLOCK query then reads page 101, it reads the same data again, producing a "repeated read". Similarly, if data on page 100 that has not yet been read is moved to page 99, the NOLOCK query may miss that record, producing a "skipped read".

As for the colleagues mentioned above who got errors in some operations after adding NOLOCK, the likely cause is that a NOLOCK query produced a repeated read: two identical records were inserted into another table, and of course a primary key conflict occurred.

11. Be careful when using LIKE for fuzzy queries

Sometimes you need to do some fuzzy queries, such as:

SELECT * FROM contact WHERE username LIKE '%yue%'

Because the keyword %yue% has a % in front, this query is forced into a full table scan. Unless it is truly necessary, do not add % before the keyword.
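For comparison, a prefix match without the leading % can use an index on username, assuming such an index exists:

```sql
-- Sargable: the optimizer can seek an index on username (if one exists).
SELECT * FROM contact WHERE username LIKE 'yue%';

-- Not sargable: the leading % forces a full table scan.
SELECT * FROM contact WHERE username LIKE '%yue%';
```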

12. Effect of implicit conversion of data types on query efficiency

In a SQL Server 2000 database, if our program submits an SQL statement without using strongly typed parameters for a field's value, SQL Server 2000 automatically converts the data type, which can make the incoming parameter's type inconsistent with the primary key field's type; in that case SQL Server 2000 may use a full table scan. This problem has not been observed in SQL Server 2005, but it is still worth noting.
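A sketch of the pitfall, using a hypothetical schema where OrderCode is a varchar key column: passing an integer makes the server convert the column's values, which can defeat the index, while passing a matching varchar allows an index seek.

```sql
-- OrderCode is varchar. The integer literal forces an implicit
-- conversion on the column and may trigger a full table scan:
SELECT * FROM OrderHeader WHERE OrderCode = 12345;

-- Matching the column's type lets the optimizer use an index seek:
SELECT * FROM OrderHeader WHERE OrderCode = '12345';
```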

13. Three ways SQL Server joins tables

(1) Merge Join

(2) Nested Loop Join

(3) Hash Join

SQL Server 2000 has only one join method: the nested loop join. If result set A is smaller, it is used by default as the outer table; each record in A scans B once, so the number of rows actually scanned is A's row count x B's row count. If both result sets are large, the join performs badly.

SQL Server 2005 adds the merge join. If the join fields of table A and table B happen to be their clustered index fields, the tables are already in sorted order, and both sides only need to be walked once in order; the cost of this join is A's row count plus B's row count. One is addition, the other multiplication, so the merge join performs much better than the nested loop join.

If there is no index on the join fields, SQL Server 2000's efficiency is quite low, whereas SQL Server 2005 provides the hash join, which is equivalent to building a temporary index for the result sets of tables A and B. In this respect SQL Server 2005 is much more efficient than SQL Server 2000, and I think this is an important reason.

To summarize, pay attention to the following points when joining tables:

(1) Prefer join fields that are covered by the clustered index.

(2) Carefully consider the WHERE conditions to minimize the result sets of tables A and B.

(3) If many join fields are missing indexes and you are still using SQL Server 2000, please upgrade quickly.
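Putting points (1) and (2) together, a sketch with illustrative tables: both sides join on their clustered key, and the WHERE condition shrinks the result set before the join.

```sql
SELECT h.OrderID, d.ProductID, d.Qty
FROM OrderHeader AS h
JOIN OrderDetail AS d
    ON d.OrderID = h.OrderID       -- clustered key on both tables
WHERE h.ChangeTime > '2010-10-20'; -- filter first to keep result sets small
```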

Http://www.cnblogs.com/ATree/archive/2011/02/13/sql_optimize_1.html
