Contents
1. What is an execution plan? What does it depend on?
2. Unify SQL statement writing to reduce parsing overhead
3. Reduce SQL statement nesting
4. Use "temporary tables" to save intermediate results
5. SQL statements in OLTP systems should use bind variables
6. Bind variable peeking on skewed fields
7. Keep begin tran transactions as small as possible
8. Add nolock to some SQL query statements
9. With nolock, querying tables that frequently split pages can produce skipped or duplicate reads
10. A table whose clustered index is not on a sequential field is prone to page splits
11. Use composite indexes to speed up queries with multiple where conditions
12. When using like for fuzzy searches, avoid a leading %
13. SQL Server table join methods
14. Row_number can cause table scans; temporary-table paging is better
15. Others
What is an execution plan? What does it depend on?
An execution plan is the query strategy the database chooses for a SQL statement, based on statistics about the statement and the tables involved. It is generated automatically by the query optimizer. For example, for a SQL statement that looks up one record in a table of 100,000 rows, the optimizer will choose an index seek; if the table is later archived and only 5,000 records remain, the optimizer will change the plan to a full table scan.
Clearly, the execution plan is not fixed; it is "personalized". Two things matter for generating a correct execution plan:
Does the SQL statement clearly tell the query optimizer what it wants?
Are the database statistics available to the query optimizer up to date and correct?
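As an aside, you can check which plan the optimizer actually chose, and refresh the statistics it relies on. A minimal sketch in T-SQL (the orderheader table used in later examples is assumed; GO is the batch separator used by SSMS and sqlcmd):

-- Show the chosen plan (seek vs. scan) without executing the query:
SET SHOWPLAN_ALL ON;
GO
SELECT * FROM orderheader WHERE orderid = 12345;
GO
SET SHOWPLAN_ALL OFF;
GO
-- Refresh the statistics the optimizer bases its choice on:
UPDATE STATISTICS orderheader;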
Unify SQL statement writing to reduce parsing overhead
A programmer may consider the following two SQL statements identical, but the database query optimizer may treat them as different.
Select * from dual
Select * From dual
In fact, the query analyzer treats them as two different SQL statements: each must be parsed separately, and each gets its own execution plan. So, as a programmer, make sure the same query is written identically everywhere; not even a change of case or an extra space is allowed!
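You can verify this yourself. A sketch that inspects SQL Server's plan cache (the DMVs exist in 2005 and later) and will show the two near-identical statements cached as separate plans:

SELECT st.text, cp.usecounts
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%dual%';   -- one row per distinct spelling of the query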
Reduce SQL statement nesting
I often see SQL statements captured from the database that print out to two full A4 pages. Statements this complex are generally problematic. I once took such a two-page SQL statement to its original author, who said it had been written so long ago that he could no longer make sense of it. You can imagine: if even the original author is confused by the SQL statement, the database is likely to be confused as well.
Nesting, where the result of a select statement is used as a subset that is then queried again, is common and normal. In my experience, though, once nesting goes beyond three levels the query optimizer easily produces a wrong execution plan. It gets dizzy, so to speak. The optimizer, a piece of something like artificial intelligence, is after all weaker than human judgment: if a person gets dizzy reading the statement, I can assure you the database gets dizzy too.
In addition, execution plans can be reused, and the simpler a SQL statement is, the more likely its plan is to be reused. A complex SQL statement must be re-parsed whenever even one character changes, and every variant's plan is then stuffed into memory. You can imagine how inefficient the database becomes.
Use "temporary table" to save intermediate results
An important way to simplify SQL statements is to use temporary tables to hold intermediate results. But the benefits of temporary tables go well beyond that: once intermediate results are saved in a temporary table, subsequent queries run against tempdb, which avoids scanning the main table multiple times within the program, greatly reduces blocking from "update locks" during execution, and improves concurrency.
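A minimal sketch, reusing the orderheader table from the examples below (the column names are assumptions):

-- Materialize the intermediate result once...
SELECT orderid, contactid, changetime
INTO #recent_orders
FROM orderheader
WHERE changetime > '2017-10-20';

-- ...then follow-up queries hit tempdb instead of rescanning orderheader:
SELECT contactid, COUNT(*) AS order_count
FROM #recent_orders
GROUP BY contactid;

DROP TABLE #recent_orders;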
SQL statements in OLTP systems should use bind variables
Select * from orderheader where changetime> '2017-10-20 00:00:01'
Select * from orderheader where changetime> '2017-09-22 00:00:01'
The query optimizer treats the two statements above as different SQL statements, and each must be parsed. If you bind a variable instead:
Select * from orderheader where changetime > @chgtime
The @chgtime variable can take any value, so large numbers of similar queries reuse a single execution plan, greatly reducing the burden of SQL parsing on the database. Parse once, reuse many times: that is a basic principle for improving database efficiency.
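In T-SQL, one standard way to bind the variable is sp_executesql. A sketch:

DECLARE @chgtime datetime;
SET @chgtime = '2017-10-20 00:00:01';
EXEC sp_executesql
    N'SELECT * FROM orderheader WHERE changetime > @chgtime',
    N'@chgtime datetime',
    @chgtime = @chgtime;   -- any value passed here reuses the same cached plan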
Bind variable peeking on skewed fields
Everything has two sides. Binding variables suits most OLTP processing, but there are exceptions, for example, when the field in the where condition is a "skewed field".
A "skewed field" is a column in which the vast majority of values are identical. For example, in a census table, more than 90% of the values in the "nationality" column are Han. If a SQL statement queries the population of 30-year-old Han people, the "nationality" column must appear in the where condition, and binding a variable @nation here creates a big problem.
Imagine that the first value passed in for @nation is "Han": the execution plan will inevitably choose a table scan. Now suppose the second value passed in is "Buyi". Since the Buyi may account for perhaps only one row in a thousand, an index seek should be used, but because the plan compiled for the first value, "Han", is reused, the second query also runs as a table scan. This is the well-known "bind variable peeking" problem, and the recommendation is not to bind variables on skewed fields.
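A sketch of the problem (the population table and its columns are assumptions for illustration):

DECLARE @sql nvarchar(200);
SET @sql = N'SELECT * FROM population WHERE nation = @nation AND age = 30';
-- First call: the plan is compiled ("peeked") for Han, so a table scan is chosen.
EXEC sp_executesql @sql, N'@nation nvarchar(20)', @nation = N'Han';
-- Second call reuses the scan plan even though an index seek would be far better.
EXEC sp_executesql @sql, N'@nation nvarchar(20)', @nation = N'Buyi';
-- One well-known mitigation, not discussed above, is appending OPTION (RECOMPILE).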
Keep begin tran transactions as small as possible
By default, each SQL statement in SQL Server is its own transaction and is committed automatically when the statement completes. In effect this is a minimal begin tran: a begin tran is implied at the start of each statement and a commit at its end.
In some cases we need to declare begin tran explicitly. For example, an insert/update/delete operation may need to modify several tables at once, and the modifications must either all succeed or all fail. Begin tran serves exactly this purpose: it lets a group of SQL statements execute together and commit together. The benefit is guaranteed data consistency, but nothing is perfect: the price of begin tran is that no resource locked by its SQL statements can be released until the commit completes.
It follows that if a begin tran wraps too many SQL statements, database performance suffers badly: until the large transaction commits, other statements are blocked, producing heavy blocking.
The principle for using begin tran is: while data consistency is preserved, the fewer SQL statements inside the begin tran, the better! In some cases a trigger can synchronize the data, and begin tran is not needed at all.
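A sketch of keeping the transaction tight (the orderlog table and the status/action columns are assumptions for illustration):

DECLARE @orderid int;
SET @orderid = 12345;
BEGIN TRY
    BEGIN TRAN;   -- lock as little as possible, for as short a time as possible
    UPDATE orderheader SET status = 'shipped' WHERE orderid = @orderid;
    INSERT INTO orderlog (orderid, action) VALUES (@orderid, 'shipped');
    COMMIT TRAN;  -- locks held by the two statements are released here
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;  -- both changes succeed or fail together
END CATCH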
Add nolock to some SQL query statements
Adding nolock to SQL statements is an important means of improving SQL Server's concurrency. It is not needed in Oracle, whose architecture is arguably more reasonable: the undo tablespace keeps a "data shadow", so if data has been modified but not yet committed, readers see the pre-modification copy held in undo. Reads and writes in Oracle thus do not block each other, which is one of Oracle's widely praised strengths. In SQL Server, reads and writes block each other; to improve concurrency you can add nolock to some queries so that reads are allowed during writes, at the risk of reading uncommitted dirty data. There are three principles for using nolock:
(1) If the query result will be used for an insert, delete, or update, do not add nolock!
(2) If the queried table undergoes frequent page splits, use nolock with caution!
(3) A temporary table can hold a "data shadow", serving much the same function as Oracle's undo tablespace: where possible, use a temporary table to improve concurrency rather than nolock.
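For reference, the hint looks like this: a read-only reporting query where dirty reads are acceptable (table and columns reused from the earlier examples):

SELECT orderid, changetime
FROM orderheader WITH (NOLOCK)   -- readers no longer block, or wait on, writers
WHERE changetime > '2017-10-20';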
With nolock, querying tables that frequently split pages can produce skipped or duplicate reads
With nolock, queries can run while inserts, updates, and deletes are in progress. But because those modifications happen concurrently, a full data page will sometimes split while a nolock query is scanning. For example, records already read on page 100 may be split onto page 101, and when the nolock query reaches page 101 it reads those rows again: a "duplicate read". Likewise, rows on page 100 that have not yet been read may be moved onto page 99, and the nolock query then misses them entirely: a "skipped read".
This is why, as mentioned above, some operations report errors once nolock is added: a nolock query that produces duplicate reads may insert two identical records into another table, which of course triggers a primary key conflict.
A table whose clustered index is not on a sequential field is prone to page splits
For example, an order table has an order number, orderid, and a customer number, contactid. On which field should the clustered index be built? Order numbers are added sequentially, so with the clustered index on orderid, new rows are always appended at the end and page splits rarely occur. However, since most queries filter by customer number, putting the clustered index on contactid seems to make sense, yet for the order table contactid is not a sequential field.
For example, if "Michael"'s contactid is 001, Michael's order records must sit on the first data pages of the table. If Michael places a new order today, that record cannot go on the table's last page; it must go on the first pages! And if the first page is full? Sorry, records have to be shuffled back to make room at the spot where this row belongs.
SQL Server indexes differ from Oracle's: a SQL Server clustered index physically sorts the table's rows in clustered-key order, the equivalent of Oracle's index-organized table. The clustered index is thus a form of table organization and can be very efficient, but for the same reason each inserted record cannot go just anywhere; it must be placed in order on the proper data page, and if that page has no free space, a page split occurs. The conclusion is clear: a table whose clustered index is not built on a sequential field is prone to page splits.
I once saw a case in which a colleague's insert performance dropped sharply after a table's index was rebuilt. The likely story is this. The table's clustered index was not on a sequential field, and the table was archived regularly, so its data was stored sparsely. For example, Michael had placed 20 orders, only 5 of them in the last three months; with a three-month retention policy, 15 of Michael's old orders were archived, leaving 15 vacancies that could be reused as new inserts arrived. Because free space was available, no page splits occurred, but query performance stayed relatively low, because the query had to scan all that empty, data-free space.
After the clustered index was rebuilt, the situation changed: rebuilding a clustered index rewrites the table's data compactly, so the vacancies were gone and the page fill rate was very high, and inserts now caused frequent page splits, so performance dropped sharply.
When the clustered index is not on a sequential field, should you accept a relatively low page fill rate? Should you avoid rebuilding the clustered index? These are questions worth considering!
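For the first option, the fill rate can be set when rebuilding. A sketch (the index name is an assumption):

-- Leave 30% free space on each leaf page to absorb out-of-order inserts:
ALTER INDEX ix_orderheader_contactid ON orderheader
REBUILD WITH (FILLFACTOR = 70);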
Use composite indexes to speed up queries with multiple where conditions
A composite index generally has better selectivity than a single-column index. Moreover, it is an index tailored to a specific where condition and is already sorted, so it answers that query faster than a single-column index can. The leading field of a composite index must be a "highly selective" one. Suppose there are three fields: date, gender, and age. Which should lead? Clearly date: it is the most selective of the three.
There is one exception: if date is also the leading field of the clustered index, you can simply use the clustered index, without creating a composite index, at comparably high efficiency.
Do not turn the clustered index itself into a "composite index": the simpler the clustered index, the better! Two fields in a clustered index is acceptable; beyond that, consider creating an auto-increment field as the primary key instead. Note that the clustered index does not have to be on the primary key.
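A sketch with the three fields above (the table and column names are assumptions):

-- The most selective field leads the composite index:
CREATE INDEX ix_census_date_gender_age ON census (survey_date, gender, age);

-- A query with all three where conditions can now seek on this one index:
SELECT COUNT(*) FROM census
WHERE survey_date = '2017-10-20' AND gender = 'M' AND age = 30;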
When using like for fuzzy searches, avoid a leading %
Sometimes you need to run fuzzy queries, such as:
Select * from contact where username like '% yue %'
Because the pattern '%yue%' starts with "%", the query is forced into a full table scan; no index on username can be used. Unless truly necessary, do not put a % in front of the keyword.
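For contrast, a pattern with only a trailing % can still seek on an index over username, assuming such an index exists:

Select * from contact where username like 'yue%'   -- index seek is possible here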
SQL Server table join methods
(1) Merge Join
(2) Nested Loop Join
(3) Hash Join
SQL Server 2000 has only one join method: Nested Loop Join. If result set A is smaller, it is used as the outer table by default, and every record in A is looked up in B, so the number of rows actually scanned is on the order of rows(A) x rows(B). When both result sets are large, the join performs very badly.
SQL Server 2005 adds Merge Join. If the join fields of tables A and B happen to be the fields of their clustered indexes, the rows are already in order and the two sides only need to be merged together, so the cost of the join is on the order of rows(A) + rows(B). One is addition, the other multiplication; clearly Merge Join can be far better than Nested Loop Join.
If the joined fields have no usable index, SQL 2000 is quite inefficient, whereas SQL 2005 provides Hash Join, which in effect builds a temporary hash "index" over the result sets of A and B. SQL 2005 is therefore much more efficient than SQL 2000; I think this is an important reason to upgrade.
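The optimizer normally picks the join algorithm itself, but a query hint can force one so you can compare plans. A sketch joining the tables from earlier examples (the contactid join key on contact is an assumption):

SELECT h.orderid, c.username
FROM orderheader AS h
JOIN contact AS c ON c.contactid = h.contactid
OPTION (MERGE JOIN);   -- try LOOP JOIN and HASH JOIN too and compare the costs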
To sum up, pay attention to the following points when joining tables:
(1) Prefer join fields that carry the clustered index.
(2) Think carefully about the where conditions to keep the result sets of A and B as small as possible.
(3) If many join fields lack indexes and you are still on SQL 2000, please upgrade.
Row_number can cause table scans; temporary-table paging is better
Test results for ROW_NUMBER paging:
Paging with ROW_NUMBER: CPU time = 317,265 ms, elapsed time = 423,090 ms
Paging with a temporary table: CPU time = 1,266 ms, elapsed time = 6,705 ms
ROW_NUMBER is implemented on top of order by, and the sort has an obvious impact on the query.
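One common temporary-table paging pattern, as a sketch (orderheader columns assumed; fetching page 5 at 20 rows per page):

CREATE TABLE #paged (rn int IDENTITY(1, 1), orderid int);

-- With INSERT ... SELECT ... ORDER BY, identity values follow the sort order:
INSERT INTO #paged (orderid)
SELECT orderid FROM orderheader ORDER BY changetime;

SELECT h.*
FROM #paged AS p
JOIN orderheader AS h ON h.orderid = p.orderid
WHERE p.rn BETWEEN 81 AND 100;   -- rows 81-100 = page 5

DROP TABLE #paged;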
Others
For example, some ways of writing a condition prevent index use. In the pair below, the first form applies arithmetic to the chgdate column, so an index on chgdate cannot be used; the second keeps the column bare, so the index can:
Select * from tablename where chgdate + 7 <sysdate
Select * from tablename where chgdate <sysdate-7
This article is merely a brick tossed out to attract jade: a few modest pointers offered in the hope of drawing out better ideas.