SQL optimization: optimizing queries over millions of rows


Reposted from http://www.cnblogs.com/qqzy168/p/3233817.html

1. Use indexes judiciously

An index is an important data structure in a database; its fundamental purpose is to improve query efficiency. Most database products today use the ISAM index structure first proposed by IBM. Indexes should be used judiciously, following these principles:

Create indexes on columns that are frequently joined but not declared as foreign keys; columns that are seldom joined can be left for the optimizer to handle.

Create indexes on columns that are frequently sorted or grouped (that is, used in GROUP BY or ORDER BY operations).

Create indexes on columns that frequently appear in conditional expressions and have many distinct values; do not index columns with few distinct values. For example, the "Gender" column of an employee table has only two values, "male" and "female", so there is no need to index it. An index there would not improve query efficiency and could seriously slow down updates.

If multiple columns are to be sorted together, you can create a composite (compound) index on those columns; see the sketch after this list.

Use system tools. For example, the Informix database provides a tbcheck tool that can check suspicious indexes. On some database servers, indexes can become invalid or slow to read because of frequent operations; if a query that uses an index slows down inexplicably, try using tbcheck to check the integrity of the index and repair it if necessary. In addition, when a database table is heavily updated, deleting and rebuilding its indexes can improve query speed.
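
A minimal sketch of these principles (the table and column names are hypothetical):

-- Index a join column that is not declared as a foreign key
CREATE INDEX idx_orders_customer ON orders (customer_num);

-- Composite index covering a frequent multi-column sort
CREATE INDEX idx_emp_dept_name ON employee (dept_id, last_name);

-- A low-selectivity column such as gender is left unindexed: an index
-- there rarely helps queries and slows down updates.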

2. Avoid or simplify sorting

You should simplify or avoid repeated sorting of large tables. The optimizer skips the sort step when it can use an index to produce output in the required order automatically. The following factors prevent this:

One or several columns to be sorted are not included in the index;

The order of the columns in the GROUP BY or ORDER BY clause is not the same as the order of the index columns;

The sorted column comes from a different table.

To avoid unnecessary sorting, build indexes correctly and merge database tables where reasonable (although this may sometimes affect table normalization, the efficiency gain is worth it). If sorting is unavoidable, try to simplify it, for example by narrowing the range of the sorted columns.
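
For example, assuming a hypothetical orders table that is frequently listed by customer and date, an index whose column order matches the ORDER BY lets the optimizer skip the sort step entirely:

CREATE INDEX idx_orders_cust_date ON orders (customer_num, order_date);

-- The index delivers rows already in the requested order,
-- so no separate sort step is required.
SELECT customer_num, order_date, order_num
FROM orders
ORDER BY customer_num, order_date;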

3. Eliminate sequential access to large tables

In nested queries, sequential access to a table can be fatal for query efficiency. For example, with a sequential access strategy, a query nested three levels deep that scans 1,000 rows at each level reads one billion rows in total. The primary way to avoid this is to index the join columns.

You can also use UNION to avoid sequential access (rewriting OR as UNION). Even when all the columns being tested are indexed, some forms of WHERE clause force the optimizer to use sequential access. The following query forces a sequential scan of the orders table:

SELECT * FROM orders WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008

Although indexes exist on customer_num and order_num, the optimizer scans the whole table along a sequential access path for the statement above. Because this statement retrieves a disjoint set of rows, it should be rewritten as:

SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001

UNION

SELECT * FROM orders WHERE order_num = 1008

This enables the query to be processed using the index path.

4. Avoid correlated subqueries

When a column appears both in the outer query and in the subquery of the WHERE clause, the subquery is likely to be re-executed every time the column value in the outer query changes. The more deeply queries are nested, the lower the efficiency, so subqueries should be avoided where possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
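
As an illustration (the tables and the first_order_date column are hypothetical), a correlated subquery that re-executes for every outer row can often be rewritten as a join:

-- Correlated: the subquery runs once per row of orders
SELECT o.order_num
FROM orders o
WHERE o.ship_date > (SELECT c.first_order_date
                     FROM customer c
                     WHERE c.customer_num = o.customer_num);

-- Join form: evaluated once as a set operation
SELECT o.order_num
FROM orders o, customer c
WHERE c.customer_num = o.customer_num
AND o.ship_date > c.first_order_date;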

5. Avoid difficult regular expressions

The MATCHES and LIKE keywords support wildcard matching, which is technically a form of regular expression, but this kind of matching is particularly time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE "98___"

Even if an index exists on the zipcode field, a sequential scan is used in this case. If the statement is changed to SELECT * FROM customer WHERE zipcode > "98000", the index is used when the query executes, which obviously speeds it up greatly.

Also, avoid non-leading substrings. For example, the statement SELECT * FROM customer WHERE zipcode[2,3] > "80" uses a non-leading substring in the WHERE clause, so it will not use the index.

6. Use temporary tables to accelerate queries

Sorting a subset of a table and creating a temporary table can sometimes speed up queries. It helps avoid multiple sort operations and, in other ways, simplifies the optimizer's work. For example:

SELECT cust.name, rcvbles.balance, ...other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
AND cust.postcode > "98000"
ORDER BY cust.name

If this query is to be executed more than once, all the customers with outstanding balances can be placed in a temporary file, sorted by customer name:

SELECT cust.name, rcvbles.balance, ...other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
ORDER BY cust.name
INTO TEMP cust_with_balance

Then query the temporary table as follows:

SELECT * FROM cust_with_balance
WHERE postcode > "98000"

The temporary table has fewer rows than the main table, and its physical order is the required order, which reduces disk I/O; the query workload is therefore significantly reduced.

Note: once the temporary table is created, changes to the main table are not reflected in it. When data in the main table is modified frequently, take care not to lose updates.

 

Summary

"20% of the code takes 80% of the time" is a famous law in programming, and it holds for database applications as well. Optimization should capture the key issues; for database applications, the key is the efficiency of SQL execution. The focus of query optimization is to make the database server read less data from disk, and to read pages sequentially rather than non-sequentially.

Part II (How to make the engine fully use the index)

30 tips for optimizing queries over millions of rows of data

1. To optimize queries, avoid full table scans as far as possible; first consider building indexes on the columns involved in WHERE and ORDER BY.

2. Try to avoid testing fields for NULL in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan, such as:

SELECT id FROM t WHERE num IS NULL

You can set a default value of 0 on num, make sure the num column of the table contains no NULL values, and then query:

SELECT id FROM t WHERE num = 0

3. Try to avoid using the != or <> operator in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan.
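
When the excluded value splits an indexed range, one possible rewrite (a sketch; not guaranteed to be faster on every engine) is two range predicates combined with UNION ALL:

SELECT id FROM t WHERE num < 10
UNION ALL
SELECT id FROM t WHERE num > 10

-- Equivalent to SELECT id FROM t WHERE num <> 10 for non-NULL num,
-- but each branch can seek on an index on num.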

4. Try to avoid using OR to join conditions in the WHERE clause; use UNION instead, otherwise the engine may abandon the index and perform a full table scan, such as:

SELECT id FROM t WHERE num = 10 OR num = 20

It can be rewritten like this:

SELECT id FROM t WHERE num = 10
UNION ALL
SELECT id FROM t WHERE num = 20

5. IN and NOT IN should also be used with caution; otherwise they can result in full table scans, such as:

SELECT id FROM t WHERE num IN (1, 2, 3)

For consecutive values, use BETWEEN instead of IN:

SELECT id FROM t WHERE num BETWEEN 1 AND 3

6. The following fuzzy query will also result in a full table scan:

SELECT id FROM t WHERE name LIKE '%abc%'

To be more efficient, consider full-text indexing.
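
In SQL Server, for example, a full-text index might be set up as in the following sketch (the catalog name and the unique key index PK_t are assumptions):

-- Requires a unique index (here PK_t on t.id) to act as the full-text key
CREATE FULLTEXT CATALOG ft_catalog AS DEFAULT;
CREATE FULLTEXT INDEX ON t(name) KEY INDEX PK_t;

-- CONTAINS can use the full-text index, unlike LIKE '%abc%'
SELECT id FROM t WHERE CONTAINS(name, 'abc');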

7. Using a parameter in the WHERE clause also causes a full table scan.

Because SQL resolves local variables only at run time, the optimizer cannot defer the choice of access plan until run time; it must choose at compile time. But if the access plan is built at compile time, the value of the variable is still unknown and cannot be used as an input for index selection. The following statement will perform a full table scan:

SELECT id FROM t WHERE num = @num

You can force the query to use the index instead:

SELECT id FROM t WITH (INDEX(index_name)) WHERE num = @num

8. Try to avoid expression operations on fields in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan. For example:

SELECT id FROM t WHERE num / 2 = 100

should read:

SELECT id FROM t WHERE num = 100 * 2

9. Try to avoid applying functions (built-in functions) to fields in the WHERE clause; otherwise the engine will abandon the index and perform a full table scan. For example:

SELECT id FROM t WHERE SUBSTRING(name, 1, 3) = 'abc'   -- ids whose name starts with 'abc'

SELECT id FROM t WHERE DATEDIFF(day, createdate, '2005-11-30') = 0   -- ids created on '2005-11-30'

should read:

SELECT id FROM t WHERE name LIKE 'abc%'

SELECT id FROM t WHERE createdate >= '2005-11-30' AND createdate < '2005-12-01'

10. Do not apply functions, arithmetic operations, or other expressions to the left side of "=" in the WHERE clause; otherwise the system may not be able to use the index correctly.

11. When using an indexed field as a condition, if the index is a composite index, the first field of the index must appear in the condition for the system to use the index; otherwise the index will not be used. The order of the condition fields should also match the index order as far as possible.
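
For example, with a hypothetical composite index on (col1, col2):

CREATE INDEX idx_t_col1_col2 ON t (col1, col2);

-- Can use the index: the condition leads with the first indexed column
SELECT id FROM t WHERE col1 = 1 AND col2 = 2
SELECT id FROM t WHERE col1 = 1

-- Cannot seek on the index: the leading column is missing
SELECT id FROM t WHERE col2 = 2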

12. Do not write meaningless queries; for example, to generate an empty table structure:

SELECT col1, col2 INTO #t FROM t WHERE 1 = 0

This kind of code returns no result set but still consumes system resources; it should be changed to:

CREATE TABLE #t (...)

13. Using EXISTS instead of IN is often a good choice (for the reasons, see my blog):

SELECT num FROM a WHERE num IN (SELECT num FROM b)

Replace with the following statement:

SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE num = a.num)

14. Not all indexes are effective for every query. SQL optimizes queries based on the data in the table; when an index column contains a large amount of duplicated data, the query may not use the index at all (unless it is a bitmap index). For example, if a table's sex field is roughly half "male" and half "female", an index on sex does nothing for query efficiency.

15. More indexes are not always better. An index can improve the efficiency of the corresponding SELECT, but it reduces the efficiency of INSERT and UPDATE, since the indexes may have to be rebuilt; so how to build indexes needs careful, case-by-case consideration. A table should preferably have no more than six indexes; if it has more, consider whether indexes on rarely used columns are really necessary.

16. Avoid updating clustered index columns as much as possible, because the order of the clustered index columns is the physical storage order of the table's records; once a column value changes, reordering the table's records consumes considerable resources. If an application needs to update clustered index columns frequently, reconsider whether the index should be clustered at all.

17. Use numeric fields where possible. A field that contains only numeric information should not be designed as a character type; that reduces the performance of queries and joins and increases storage overhead. The engine compares a string character by character when processing queries and joins, whereas a single comparison suffices for a numeric type.

18. Use varchar/nvarchar instead of char/nchar where possible (in Oracle, varchar2). First, variable-length fields occupy less storage space; second, searches within a smaller field are clearly more efficient for queries.

19. Do not use SELECT * FROM t anywhere; replace "*" with a specific field list, and do not return any fields you do not need.

20. Try to use table variables instead of temporary tables (typically in stored procedures). Note that if a table variable contains a large amount of data, its indexes are very limited (only the primary key index).
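
A minimal T-SQL sketch contrasting the two (names are hypothetical):

-- Table variable: procedure-scoped; indexes are limited to the
-- PRIMARY KEY/UNIQUE constraints declared inline
DECLARE @small TABLE (id INT PRIMARY KEY, name VARCHAR(50));
INSERT INTO @small (id, name) SELECT id, name FROM t WHERE num = 10;

-- Temporary table: supports extra indexes and statistics,
-- better suited to larger intermediate results
CREATE TABLE #big (id INT, name VARCHAR(50));
CREATE INDEX idx_big_id ON #big (id);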

21. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

22. Temporary tables are not unusable; using them appropriately can make certain routines more efficient, for example when a data set from a large table or a commonly used table must be referenced repeatedly. For one-off operations, however, an export table is better.

23. When creating a temporary table, if a large amount of data is inserted at once, use SELECT ... INTO instead of CREATE TABLE followed by INSERT, to avoid generating a large volume of log and to speed things up; if the amount of data is small, CREATE TABLE first and then INSERT, to ease the load on the system tables.
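
For example (a sketch; table and column names are assumptions):

-- Large data set: SELECT INTO creates and fills #t in one
-- minimally logged step
SELECT id, name INTO #t FROM t WHERE num > 0;

-- Small data set: create the table first, then insert
CREATE TABLE #t2 (id INT, name VARCHAR(50));
INSERT INTO #t2 (id, name) SELECT id, name FROM t WHERE num = 10;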

24. If temporary tables are used, be sure to delete all of them explicitly at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE. This avoids long locks on the system tables.
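
For example, at the end of a stored procedure:

TRUNCATE TABLE #t;   -- release the data pages first
DROP TABLE #t;       -- then drop the table, keeping system-table locks short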

25. Avoid using cursors as much as possible, because cursors are inefficient; if a cursor operates on more than 10,000 rows of data, consider rewriting it.

26. Before resorting to a cursor-based or temporary-table method, look first for a set-based solution to the problem; set-based approaches are usually more efficient.

27. As with temporary tables, cursors are not unusable. Using FAST_FORWARD cursors on small data sets is often preferable to other row-by-row processing methods, especially when several tables must be referenced to obtain the required data. Routines that include "totals" in the result set are usually faster than computing them with a cursor. If development time permits, try both the cursor-based and the set-based approach and see which works better.

28. Add SET NOCOUNT ON at the beginning of all stored procedures and triggers, and SET NOCOUNT OFF at the end. There is no need to send a DONE_IN_PROC message to the client after each statement of a stored procedure or trigger.
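
A typical skeleton (the procedure name is hypothetical):

CREATE PROCEDURE dbo.usp_example
AS
BEGIN
    SET NOCOUNT ON;    -- suppress per-statement DONE_IN_PROC messages

    -- ... procedure body ...

    SET NOCOUNT OFF;
END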

29. Try to avoid large transactions, to improve the system's concurrency.

30. Try to avoid returning large amounts of data to the client; if the data volume is too large, consider whether the underlying requirement is reasonable.

