SQL Optimization: Statement Optimization

Source: Internet
Author: User
Tags: what, sql

Yesterday I (Qi) shared index optimization, one part of SQL optimization. Today I want to talk about how writing high-quality SQL during development also improves performance.


There are plenty of SQL optimization tutorials online, but they are messy. I have organized them a little to share with you; if you find errors or omissions, corrections and additions are welcome.

In short, SQL statement optimization means writing SQL statements efficiently, and its principle is the same as that of SQL index optimization:

An index reduces the number of rows the database has to scan when it executes a statement, so the aim is to avoid full table scans as far as possible.
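
As a rough illustration of this point, here is a minimal sketch that uses Python's built-in sqlite3 module. That choice is an assumption made so the example runs anywhere; the article itself is about MySQL and SQL Server, whose plan output looks different. The table t, the index name idx_t_num, and the helper query_plan are hypothetical names.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN description of a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is (id, parent, notused, detail); the detail column is the readable part.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t (num, name) VALUES (?, ?)",
    [(i % 100, "user%d" % i) for i in range(1000)],
)

sql = "SELECT id FROM t WHERE num = 10"
plan_before = query_plan(conn, sql)  # no index on num yet
conn.execute("CREATE INDEX idx_t_num ON t (num)")
plan_after = query_plan(conn, sql)   # same query once the index exists

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan is reported as a SCAN of the table and the second as a SEARCH that names idx_t_num; the exact wording depends on the SQLite version. In MySQL, EXPLAIN plays the same role.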


Optimization notes:
To optimize a query and avoid full table scans, first consider creating indexes on the columns used in WHERE and ORDER BY.

Avoid testing a field for NULL in the WHERE clause as far as possible; otherwise the engine may abandon the index and perform a full table scan.


Optimization scenarios:
The following kinds of SQL statements tend to trigger a full table scan.

1. SELECT statements with IS NULL in the query condition execute slowly


Workaround: NULL can cause a lot of trouble in SQL. Indexed columns should preferably be NOT NULL, and it is best not to leave columns empty in the database but to fill them with non-NULL values where possible. Columns such as remarks, descriptions, and comments can be allowed to be NULL; for other columns, it is best not to use NULL.

IS NULL can use an index, so an index lookup is possible for an IS NULL query, but its efficiency is not guaranteed, so it is best avoided.
IS NOT NULL generally does not use the index. On tables with large data volumes, avoid IS NULL queries.

Do not assume that NULL takes no space. For a fixed-length type such as char(100), the space is fixed when the field is created: whether or not a value is inserted (NULL included), it occupies 100 characters. For a variable-length field such as varchar, NULL takes no space.

You can set a default value of 0 on the field so that the column never contains NULL, and then query for the rows where the field equals 0.
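
The default-value idea can be sketched as follows. This is only an illustration, again using Python's sqlite3 module as a stand-in for the database; the table layout and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: num can never be NULL and falls back to 0 when omitted.
conn.execute(
    "CREATE TABLE t ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " num INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO t (name) VALUES ('no number given')")    # num becomes 0
conn.execute("INSERT INTO t (name, num) VALUES ('has number', 7)")

# Instead of "WHERE num IS NULL", look for the default value.
unset_rows = conn.execute("SELECT name FROM t WHERE num = 0").fetchall()

# An explicit NULL is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO t (name, num) VALUES ('bad', NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(unset_rows, null_rejected)
```

This only works when 0 is not itself a meaningful value for the column; otherwise a different sentinel is needed.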

2. Avoid the != or <> operator in the WHERE clause as far as possible; otherwise the engine may abandon the index and perform a full table scan.

Cause: in SQL, the not-equal operator limits index use and can cause a full table scan even when the compared field is indexed.
Workaround: rewrite the not-equal condition as two range conditions joined with OR, which can use the index and avoid the full table scan. For example, change column <> 'aaa' to column < 'aaa' OR column > 'aaa'.


3. Avoid joining conditions with OR in the WHERE clause as far as possible. If one field is indexed and the other is not, the engine abandons the index and performs a full table scan, for example:

SELECT id FROM t WHERE num = 10 OR name = 'admin'
You can write the query like this instead:

SELECT id FROM t WHERE num = 10
UNION ALL
SELECT id FROM t WHERE name = 'admin'

About UNION
The UNION operator combines the result sets of two or more SELECT statements.
Each SELECT statement inside the UNION must have the same number of columns, the columns must have similar data types, and the columns must appear in the same order in every SELECT statement.
Note: by default, UNION returns only distinct rows. To keep duplicates, use UNION ALL. Because UNION ALL keeps duplicates, a row that satisfies both conditions in the rewritten query above is returned twice, whereas the OR version returns it once.
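
The difference between OR, UNION ALL, and UNION is easy to check on a tiny table. The sketch below uses Python's sqlite3 module and four made-up rows (both assumptions for illustration); row 1 matches both conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t (id, num, name) VALUES (?, ?, ?)",
    [(1, 10, "admin"), (2, 10, "guest"), (3, 5, "admin"), (4, 5, "guest")],
)


def ids(sql):
    """Run a query and return the selected ids as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


with_or = ids("SELECT id FROM t WHERE num = 10 OR name = 'admin'")
with_union_all = ids(
    "SELECT id FROM t WHERE num = 10 "
    "UNION ALL SELECT id FROM t WHERE name = 'admin'"
)
with_union = ids(
    "SELECT id FROM t WHERE num = 10 "
    "UNION SELECT id FROM t WHERE name = 'admin'"
)

print(with_or)         # ids matched by either condition, each once
print(with_union_all)  # id 1 appears twice because it matches both branches
print(with_union)      # duplicates removed again
```

So the UNION ALL rewrite is a drop-in replacement for OR only when no row can match both branches, or when duplicates do not matter; UNION removes them at the cost of extra work.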
When the conditions are joined with OR, the index is not used.
The code is as follows:
mysql> EXPLAIN SELECT id FROM t1 WHERE name=1 OR age=2\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: in_name
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
     filtered: 55.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)
When the queries are joined with UNION ALL, the index is used at query time.
The code is as follows:
mysql> DESC SELECT * FROM t1 WHERE name=1 UNION ALL SELECT * FROM t1 WHERE age=2\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY    (first statement: the index is used)
        table: t1
   partitions: NULL
         type: ref
possible_keys: in_name
          key: in_name
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 2
  select_type: UNION    (second statement: no index is defined on age, so none is used)
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10000
     filtered: 10.00
        Extra: Using where
2 rows in set, 1 warning (0.00 sec)


4. Use IN and NOT IN with caution as well; otherwise they can lead to a full table scan, for example:

SELECT id FROM t WHERE num IN (1, 2, 3)
For consecutive values, use BETWEEN instead of IN:

SELECT id FROM t WHERE num BETWEEN 1 AND 3

5. Fuzzy queries. The following query will also cause a full table scan, because the pattern starts with a wildcard:

SELECT id FROM t WHERE name LIKE '%abc%'
To improve efficiency, consider a full-text index.

6. Using a parameter (local variable) in the WHERE clause can also cause a full table scan.

SQL resolves local variables only at run time, but the optimizer cannot defer the choice of an access plan to run time; it must choose the plan at compile time. When the access plan is built at compile time, the value of the variable is still unknown, so it cannot be used as input for index selection. The following statement may perform a full table scan:

SELECT id FROM t WHERE num = @num
You can force the query to use the index instead (SQL Server syntax, where index_name stands for the name of the index):

SELECT id FROM t WITH (INDEX(index_name)) WHERE num = @num
Avoid performing expression operations on fields in the WHERE clause as far as possible; doing so causes the engine to abandon the index and perform a full table scan. For example:

SELECT id FROM t WHERE num / 2 = 100
should be written as:

SELECT id FROM t WHERE num = 100 * 2
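
The effect of moving the arithmetic off the column can be seen in a query plan. The following is a minimal sketch with Python's sqlite3 module, used here as an assumption in place of the databases the article discusses; the table t, the index idx_t_num, and the helper query_plan are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN description of a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, num INTEGER)")
conn.executemany("INSERT INTO t (num) VALUES (?)", [(i,) for i in range(1000)])
conn.execute("CREATE INDEX idx_t_num ON t (num)")

# Arithmetic on the column: every row has to be visited and evaluated.
plan_expression = query_plan(conn, "SELECT id FROM t WHERE num / 2 = 100")
# Arithmetic moved to the constant side: the index can be searched by key.
plan_plain = query_plan(conn, "SELECT id FROM t WHERE num = 100 * 2")

print(plan_expression)
print(plan_plain)
```

One caveat: with integer division (as in SQLite, and in SQL Server for integer columns) num / 2 = 100 also matches num = 201, so the two predicates are not strictly equivalent; check the intended semantics before rewriting.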
7. Avoid applying functions to fields in the WHERE clause as far as possible; doing so causes the engine to abandon the index and perform a full table scan. For example:

SELECT id FROM t WHERE SUBSTRING(name, 1, 3) = 'abc'  -- ids whose name starts with 'abc'
SELECT id FROM t WHERE DATEDIFF(day, createdate, '2005-11-30') = 0  -- ids created on 2005-11-30
should be written as:

SELECT id FROM t WHERE name LIKE 'abc%'
SELECT id FROM t WHERE createdate >= '2005-11-30' AND createdate < '2005-12-01'
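
The date rewrite can be sketched the same way. The example below assumes Python's sqlite3 module, timestamps stored as 'YYYY-MM-DD HH:MM:SS' text, and SQLite's date() function standing in for DATEDIFF; the table, index, and helper names are hypothetical.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN description of a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, createdate TEXT)")
conn.executemany(
    "INSERT INTO t (id, createdate) VALUES (?, ?)",
    [
        (1, "2005-11-29 23:59:59"),
        (2, "2005-11-30 00:00:00"),
        (3, "2005-11-30 12:34:56"),
        (4, "2005-12-01 00:00:00"),
    ],
)
conn.execute("CREATE INDEX idx_t_createdate ON t (createdate)")

# Function wrapped around the column.
by_function = "SELECT id FROM t WHERE date(createdate) = '2005-11-30'"
# Half-open range on the bare column.
by_range = (
    "SELECT id FROM t "
    "WHERE createdate >= '2005-11-30' AND createdate < '2005-12-01'"
)

plan_function = query_plan(conn, by_function)
plan_range = query_plan(conn, by_range)
rows_function = sorted(row[0] for row in conn.execute(by_function))
rows_range = sorted(row[0] for row in conn.execute(by_range))

print(plan_function, rows_function)
print(plan_range, rows_range)
```

Both queries return the same rows here, but only the range form is planned as an index search. The half-open range (>= start, < next day) also avoids having to guess the last representable time of the day.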
8. Do not apply functions, arithmetic, or other expressions to the left-hand side of "=" in the WHERE clause; otherwise the system may not be able to use the index correctly.

9. When an indexed field is used as a condition and the index is a composite index, the condition must include the first field of the index for the system to use it; otherwise the index will not be used. The order of the fields in the condition should also match the index order as far as possible.
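
The leading-column rule for composite indexes can be checked with a small experiment. This sketch assumes Python's sqlite3 module and a hypothetical table t with a composite index idx_t_a_b on (a, b); other databases follow the same principle, though some have extra strategies such as skip scans.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN description of a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.executemany(
    "INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
    [(i % 10, i % 7, "row%d" % i) for i in range(500)],
)
conn.execute("CREATE INDEX idx_t_a_b ON t (a, b)")

# Condition on the first index column only.
plan_a = query_plan(conn, "SELECT c FROM t WHERE a = 1")
# Conditions on both index columns.
plan_a_b = query_plan(conn, "SELECT c FROM t WHERE a = 1 AND b = 2")
# Condition on the second column only: the leading column is missing.
plan_b = query_plan(conn, "SELECT c FROM t WHERE b = 2")

print(plan_a)
print(plan_a_b)
print(plan_b)
```

The first two plans search idx_t_a_b, while the query that filters only on b falls back to a scan.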

10. Do not write meaningless queries. For example, to generate an empty table structure:

SELECT col1, col2 INTO #t FROM t WHERE 1 = 0
This kind of code returns no result set but still consumes system resources. It should be changed to:

CREATE TABLE #t (...)

10. In an UPDATE statement, if only one or two fields change, do not update all fields; otherwise frequent calls incur a significant performance cost and generate a large volume of log records.

11. When joining several tables with large data volumes (a few hundred rows already counts as large here), page first and then join; otherwise logical reads will be very high and performance poor.

12. SELECT COUNT(*) FROM table: a count with no conditions generally causes a full table scan and usually has no business meaning, so it should be eliminated.

13. More indexes are not always better. An index improves the efficiency of the corresponding SELECT, but it also reduces the efficiency of INSERT and UPDATE, because those statements may have to rebuild the index. How to build indexes therefore needs careful, case-by-case consideration. A table should not have more than 6 indexes; if it has more, consider whether the rarely used ones are really necessary.

14. Avoid updating clustered index columns as far as possible. The order of the clustered index columns is the physical storage order of the table's records, so once a column value changes, the order of the whole table's records has to be adjusted, which consumes considerable resources. If your application needs to update clustered index columns frequently, reconsider whether the index should be clustered.

15. Use numeric fields where possible. A field that contains only numeric information should not be designed as a character type, which reduces query and join performance and increases storage overhead. The engine compares a string one character at a time when processing queries and joins, whereas a numeric type needs only a single comparison.

16. Use varchar/nvarchar instead of char/nchar where possible. First, variable-length fields take less storage space. Second, searching within a smaller field is clearly more efficient.

17. Do not use SELECT * FROM t anywhere. Replace "*" with a specific list of fields, and do not return any fields that are not used.

18. Try to use table variables instead of temporary tables. If a table variable holds a large amount of data, be aware that its indexing is very limited (only the primary key index).

19. Avoid creating and dropping temporary tables frequently, to reduce the consumption of system table resources. Temporary tables are not off limits: used appropriately, they can make certain routines more efficient, for example when you need to reference a dataset in a large table or a commonly used table repeatedly. For one-off operations, however, a derived table is the better choice.

20. When creating a temporary table, if a large amount of data is inserted at once, use SELECT INTO instead of CREATE TABLE to avoid generating a large volume of log records and to improve speed. If the amount of data is small, use CREATE TABLE first and then INSERT, to ease the load on the system tables.

21. If temporary tables are used, be sure to delete all of them explicitly at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids locking the system tables for a long time.

22. Avoid cursors as far as possible, because they are inefficient. If a cursor operates on more than 10,000 rows, consider rewriting the code.

23. Before using a cursor-based or temporary-table-based method, first look for a set-based solution to the problem; the set-based approach is generally more efficient.
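
To make the contrast between row-by-row and set-based processing concrete, here is a small sketch. It assumes Python's sqlite3 module and a hypothetical orders table with a made-up discount rule; the Python loop plays the role a cursor would play inside a stored procedure.

```python
import sqlite3


def make_db():
    """Create a small in-memory orders table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " amount INTEGER,"
        " discount INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO orders (amount) VALUES (?)", [(50,), (150,), (300,)]
    )
    return conn


def discount_row_by_row(conn):
    """Cursor-style processing: fetch every row, then issue one UPDATE per row."""
    rows = conn.execute("SELECT id, amount FROM orders").fetchall()
    for order_id, amount in rows:
        if amount >= 100:
            conn.execute(
                "UPDATE orders SET discount = amount / 10 WHERE id = ?",
                (order_id,),
            )


def discount_set_based(conn):
    """Set-based processing: one statement describes the whole change."""
    conn.execute("UPDATE orders SET discount = amount / 10 WHERE amount >= 100")


def discounts(conn):
    return conn.execute("SELECT id, discount FROM orders ORDER BY id").fetchall()


a = make_db()
discount_row_by_row(a)
b = make_db()
discount_set_based(b)
print(discounts(a) == discounts(b))
```

Both functions produce the same table contents. The set-based version sends a single statement and leaves the choice of access path to the database, which is why it usually scales better.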

24. As with temporary tables, cursors are not off limits. Using FAST_FORWARD cursors on small datasets is often better than other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that include "totals" in the result set are usually faster than doing the same with a cursor. If development time permits, try both the cursor-based and the set-based approach and see which works better.

25. SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. This way no DONE_IN_PROC message has to be sent to the client after each statement of the stored procedure or trigger.

26. Avoid large transactions as far as possible, to improve the system's concurrency.

27. Avoid returning large amounts of data to the client as far as possible. If the amount of data is too large, consider whether the underlying requirement is reasonable.

Real case analysis: splitting large DELETE or INSERT statements and committing them in batches

If you need to run a large DELETE or INSERT on a live website, be very careful that the operation does not bring the whole site to a halt. Both operations lock the table, and once the table is locked, no other operation can get in.

Apache runs many child processes or threads, which is why it works quite efficiently. But our servers cannot afford too many child processes, threads, and database connections, because they take up a great deal of server resources, especially memory.

If you lock a table for a while, say 30 seconds, then on a high-traffic site the processes/threads, database connections, and open files that pile up during those 30 seconds may not only crash your web service but may also bring the whole server down.

So if you have a large operation, be sure to split it up. Limiting each statement with LIMIT (MySQL), rownum (Oracle), or TOP (SQL Server) is a good way to do so. Here is a MySQL example, written in PHP:


// Assumes $mysqli is an open mysqli connection (hypothetical variable name).
while (true) {
    // Delete only 1000 rows at a time.
    $mysqli->query("DELETE FROM logs WHERE log_date <= '2012-11-01' LIMIT 1000");

    if ($mysqli->affected_rows <= 0) {
        // Nothing left to delete (or the query failed): exit the loop.
        break;
    }

    // Pause briefly after each batch so other processes/threads can access the table.
    usleep(50000);
}
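
The same batching pattern can be sketched in Python. This is an illustrative version only: it assumes Python's sqlite3 module, a logs table with an integer primary key id, and a hypothetical helper named delete_in_batches. Because SQLite builds do not always support DELETE ... LIMIT, each batch is selected through a subquery instead.

```python
import sqlite3
import time


def delete_in_batches(conn, cutoff, batch_size=1000, pause_seconds=0.05):
    """Delete rows with log_date <= cutoff in small batches; return the total deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM logs WHERE id IN ("
            " SELECT id FROM logs WHERE log_date <= ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit each batch so locks are released between batches
        if cur.rowcount == 0:
            break  # nothing left to delete
        total += cur.rowcount
        time.sleep(pause_seconds)  # give other connections a chance to work
    return total
```

A call such as delete_in_batches(conn, "2012-11-01") then removes the old rows 1000 at a time. The batch size and the pause are tuning knobs: smaller batches hold locks for less time but take longer overall.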

Reference Link: http://database.51cto.com/art/201407/445934.htm

I have not fully organized this article yet, but I hope it helps.

Good night.
