Mass Data Query Code Optimization (Reprint)

Source: Internet
Author: User

Points to note:
1. Try to avoid testing a field for NULL in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan. For example:
SELECT ID FROM t WHERE num IS NULL
Instead, set a default value of 0 on num, make sure the num column never contains NULL, and then query like this (a sketch of the setup follows):
SELECT ID FROM t WHERE num = 0
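As a rough sketch only (assuming num is an int column on the table t used above, and that DF_t_num is just an illustrative constraint name), the backfill and default could look like this:
-- Backfill existing NULLs, then add a default so new rows never store NULL (illustrative names)
UPDATE t SET num = 0 WHERE num IS NULL
ALTER TABLE t ADD CONSTRAINT DF_t_num DEFAULT 0 FOR num
ALTER TABLE t ALTER COLUMN num int NOT NULL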


2. Try to avoid using the != or <> operator in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan. The optimizer cannot determine through the index how many rows will be hit, so it has to search every row of the table. A possible rewrite is sketched below.
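Where the excluded value splits the range cleanly, one hedged workaround (reusing the t/num example from above; whether it actually helps depends on the data distribution and the optimizer) is to replace the inequality with two range predicates:
-- Instead of: SELECT ID FROM t WHERE num <> 10
SELECT ID FROM t WHERE num < 10
UNION ALL
SELECT ID FROM t WHERE num > 10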


3. Try to avoid using OR to join conditions in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan. For example:
SELECT ID FROM t WHERE num = 10 OR num = 20
can be rewritten as:
SELECT ID FROM t WHERE num = 10
UNION ALL
SELECT ID FROM t WHERE num = 20


4. IN and NOT IN should also be used sparingly, because they can prevent the engine from using the index and force it to search the table directly. For example:
SELECT ID FROM t WHERE num IN (1, 2, 3)
For consecutive values, use BETWEEN instead of IN:
SELECT ID FROM t WHERE num BETWEEN 1 AND 3


5. Try to avoid searching indexed character columns with patterns that do not start with the leading characters; this also prevents the engine from using the index.
See the following examples:
SELECT * FROM T1 WHERE NAME LIKE '%L%'
SELECT * FROM T1 WHERE SUBSTRING(NAME, 2, 1) = 'L'
SELECT * FROM T1 WHERE NAME LIKE 'L%'
Even though the NAME column is indexed, the first two queries cannot use the index to speed up the operation; the engine has to process every row of the table one by one. The third query can use the index to speed up the operation.


6. Force the query optimizer to use an index when necessary. For example, using a local variable in the WHERE clause can also cause a full table scan, because SQL resolves local variables only at run time, while the optimizer cannot defer the choice of access plan to run time; it must choose at compile time. But if the access plan is built at compile time, the value of the variable is still unknown and therefore cannot be used as an input for index selection. The following statement will perform a full table scan:
SELECT ID FROM t WHERE num = @num
You can force the query to use the index instead:
SELECT ID FROM t WITH (INDEX(index_name)) WHERE num = @num
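A concrete form of the hint might look like the following sketch, where IX_t_num is a hypothetical index name (the WITH (INDEX(...)) hint is SQL Server syntax; force an index only when you know the forced plan is better):
DECLARE @num int
SET @num = 10
-- Force the optimizer to use the (hypothetical) index IX_t_num despite the local variable
SELECT ID FROM t WITH (INDEX(IX_t_num)) WHERE num = @num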


7. Try to avoid expression operations on the fields in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan. For example:
SELECT * FROM T1 WHERE F1/2 = 100
should read:
SELECT * FROM T1 WHERE F1 = 100*2
SELECT * FROM RECORD WHERE SUBSTRING(card_no, 1, 4) = '5378'
should read:
SELECT * FROM RECORD WHERE card_no LIKE '5378%'
SELECT member_number, first_name, last_name FROM members
WHERE DATEDIFF(yy, dateofbirth, GETDATE()) > 21
should read:
SELECT member_number, first_name, last_name FROM members
WHERE dateofbirth < DATEADD(yy, -21, GETDATE())
That is, any operation on a column causes a table scan, including database functions and calculation expressions; move the operation to the right side of the comparison whenever possible.


8. Try to avoid function calls on the fields in the WHERE clause; otherwise the engine may abandon the index and perform a full table scan. For example:
SELECT ID FROM t WHERE SUBSTRING(name, 1, 3) = 'abc'    -- IDs whose name starts with 'abc'
SELECT ID FROM t WHERE DATEDIFF(day, createdate, '2005-11-30') = 0    -- IDs created on '2005-11-30'
should read:
SELECT ID FROM t WHERE name LIKE 'abc%'
SELECT ID FROM t WHERE createdate >= '2005-11-30' AND createdate < '2005-12-01'


9. Do not perform functions, arithmetic operations, or other expression operations on the left side of the "=" in the WHERE clause; otherwise the system may not be able to use the index correctly.


10. When using an indexed field as a condition, if the index is a composite index, the first field of the index must appear in the condition for the system to use that index; otherwise the index will not be used. The field order should also match the index order as much as possible. See the sketch below.
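As an illustration only, suppose a composite index exists on (num, createdate) of the table t used earlier (a hypothetical index; adjust to your own schema):
-- Hypothetical composite index on t(num, createdate)
CREATE INDEX IX_t_num_createdate ON t (num, createdate)
-- Can use the index: the leading column num appears in the condition
SELECT ID FROM t WHERE num = 10 AND createdate >= '2005-11-30'
-- Cannot seek on the index: the leading column num is missing from the condition
SELECT ID FROM t WHERE createdate >= '2005-11-30'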


11. In many cases, using EXISTS is a good choice:
SELECT num FROM a WHERE num IN (SELECT num FROM b)
can be replaced with:
SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)

SELECT SUM(T1.C1) FROM T1 WHERE
(SELECT COUNT(*) FROM T2 WHERE T2.C2 = T1.C2) > 0
SELECT SUM(T1.C1) FROM T1 WHERE EXISTS
(SELECT * FROM T2 WHERE T2.C2 = T1.C2)
Both produce the same result, but the latter is clearly more efficient than the former, because it does not produce a large number of locked table scans or index scans.

If you only want to verify whether a record exists in a table, do not use COUNT(*); it is inefficient and wastes server resources. Use EXISTS instead. For example:
IF (SELECT COUNT(*) FROM table_name WHERE column_name = 'xxx') > 0
can be written as:
IF EXISTS (SELECT * FROM table_name WHERE column_name = 'xxx')

It is often necessary to write a T-SQL statement that compares a parent result set with a child result set to find records that are in the parent result set but not in the child, for example:
SELECT a.hdr_key FROM hdr_tbl a    -- "hdr_tbl a" means hdr_tbl uses the alias a
WHERE NOT EXISTS (SELECT * FROM dtl_tbl b WHERE a.hdr_key = b.hdr_key)
SELECT a.hdr_key FROM hdr_tbl a
LEFT JOIN dtl_tbl b ON a.hdr_key = b.hdr_key WHERE b.hdr_key IS NULL
SELECT hdr_key FROM hdr_tbl
WHERE hdr_key NOT IN (SELECT hdr_key FROM dtl_tbl)
All three forms return the same correct result, but their efficiency decreases in that order.


12. Use table variables instead of temporary tables where possible. However, if a table variable holds a large amount of data, be aware that its indexing options are very limited (only the primary key index). Both forms are sketched below.
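A minimal sketch of the two forms, reusing the t/num/name columns from the earlier examples (the working-set column names are illustrative assumptions):
-- Table variable: lighter weight, but only the PRIMARY KEY / UNIQUE constraints give it an index
DECLARE @work TABLE (id int PRIMARY KEY, name varchar(50))
INSERT INTO @work SELECT ID, name FROM t WHERE num = 10
-- Temporary table: supports additional indexes, often better for large intermediate results
CREATE TABLE #work (id int PRIMARY KEY, name varchar(50))
INSERT INTO #work SELECT ID, name FROM t WHERE num = 10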


13. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.


14. Temporary tables are not unusable; using them appropriately can make certain routines more efficient, for example when you need to repeatedly reference a data set from a large table or a frequently used table. For one-off operations, however, an export table is better.


15. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE to avoid generating a large amount of log and to improve speed; if the amount of data is small, use CREATE TABLE followed by INSERT to reduce pressure on the system tables. Both patterns are sketched below.
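The two patterns, sketched with hypothetical temporary table names (how much logging SELECT INTO avoids also depends on the database's recovery model):
-- Large volume at once: SELECT INTO creates and fills the temporary table in one step
SELECT ID, num INTO #tmp FROM t WHERE num > 0
-- Small volume: create first, then insert, to put less pressure on the system tables
CREATE TABLE #tmp2 (ID int, num int)
INSERT INTO #tmp2 (ID, num) SELECT ID, num FROM t WHERE num > 0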

16. If temporary tables are used, be sure to explicitly delete all of them at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids holding locks on the system tables for a long time, as in the short example below.
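For a hypothetical temporary table #tmp, the cleanup order described above looks like this:
-- Empty the temporary table first, then drop it, to shorten locking of the system tables
TRUNCATE TABLE #tmp
DROP TABLE #tmp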


17. Use SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger. A skeleton follows.
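A skeleton stored procedure following this convention (the procedure name usp_example is a placeholder; the body reuses the t/num example):
CREATE PROCEDURE usp_example
AS
BEGIN
    SET NOCOUNT ON        -- suppress the per-statement DONE_IN_PROC messages
    SELECT ID FROM t WHERE num = 10
    SET NOCOUNT OFF
END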


18. Try to avoid large transactions, in order to improve the concurrency of the system.


19. Try to avoid returning large amounts of data to the client; if the data volume is too large, consider whether the corresponding requirement is reasonable.


20. Avoid using incompatible data types. For example, float and int, char and varchar, binary and varbinary are incompatible. Incompatible data types may prevent the optimizer from performing optimizations it could otherwise perform. For example:
SELECT name FROM employee WHERE salary > 60000
In this statement, if the salary column is of type money, it is difficult for the optimizer to optimize the query because 60000 is an integer. The integer should be converted to the money type when the program is written, rather than waiting for the conversion at run time. A sketch of the constant-side conversion follows.
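A minimal sketch, assuming salary really is a money column as in the example above; the conversion is applied to the constant, not to the column:
-- Convert the integer constant to money so the comparison stays on compatible types
SELECT name FROM employee WHERE salary > CAST(60000 AS money)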


21. Make full use of the join conditions. In some cases there may be more than one join condition between two tables; writing the join condition in full in the WHERE clause can greatly improve query speed.
Example:
SELECT SUM(a.amount) FROM account a, card b WHERE a.card_no = b.card_no
SELECT SUM(a.amount) FROM account a, card b WHERE a.card_no = b.card_no AND a.account_no = b.account_no
The second statement will run much faster than the first.


22. Use views to speed up queries.
Sorting a subset of a table and creating a view can sometimes speed up queries. It helps avoid repeated sort operations and, in other ways, simplifies the optimizer's work. For example:
SELECT cust.name, rcvbles.balance, ... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
AND cust.postcode > '98000'
ORDER BY cust.name
If this query is to be executed more than once, all the customers with unpaid balances can be collected in a view and sorted by customer name:
CREATE VIEW dbo.v_cust_rcvbles
AS
SELECT cust.name, rcvbles.balance, ... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
ORDER BY cust.name
Then query the view as follows:
SELECT * FROM v_cust_rcvbles
WHERE postcode > '98000'
The view contains fewer rows than the base table, and its physical order is the required order, which reduces disk I/O, so the query workload can be greatly reduced.


23. Use DISTINCT where GROUP BY is not needed:
SELECT OrderID FROM Details WHERE UnitPrice > 10 GROUP BY OrderID
can be changed to:
SELECT DISTINCT OrderID FROM Details WHERE UnitPrice > 10



24. Use UNION ALL rather than UNION where possible.
UNION ALL does not perform the SELECT DISTINCT step, which saves a lot of unnecessary resources. An example follows.
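For example, reusing the t/num table from the earlier tips (correct only when duplicate rows are acceptable or cannot occur):
-- UNION removes duplicates, which costs an extra distinct/sort step:
SELECT ID FROM t WHERE num = 10
UNION
SELECT ID FROM t WHERE num = 20
-- UNION ALL returns the same rows without the duplicate-removal step:
SELECT ID FROM t WHERE num = 10
UNION ALL
SELECT ID FROM t WHERE num = 20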


25. Try not to use the SELECT INTO statement.
The SELECT INTO statement causes the table to be locked, preventing other users from accessing it. An alternative is sketched below.
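One common alternative, as a sketch only (new_t is a hypothetical target table assumed to exist already with matching columns), is to create the table ahead of time and fill it with INSERT ... SELECT:
-- Instead of: SELECT ID, num INTO new_t FROM t
INSERT INTO new_t (ID, num)
SELECT ID, num FROM t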
What we have covered above are some basic considerations for improving query speed, but in many cases it is still necessary to experiment with different statements to find the best solution. The most reliable approach is, of course, to test: see which of the SQL statements that implement the same function has the shortest execution time. If the amount of data in the database is too small for such a comparison to be meaningful, you can examine the execution plan instead: put the SQL statements that implement the same function into Query Analyzer, press Ctrl+L, and compare the indexes used, the number of table scans (the two factors with the greatest impact on performance), and the overall cost percentages.
