Many programmers think that query optimization is the task of the DBMS (Database Tutorial management System), which is not related to the SQL statements written by programmers, which is wrong. A good query plan can often improve the performance of the program by dozens of times times. A query plan is a collection of SQL statements that a user submits, and a query plan is a collection of statements that are produced after an optimized process. The process of the DBMS processing the query plan is as follows: After the lexical and grammatical checking of the query sentence, the statement is submitted to the query optimizer of the DBMS, after the optimizer finishes the optimization of algebraic optimization and access path, the statement is processed by the precompiled module and the query plan is generated. It is then submitted to the system for execution at the right time, and the execution results are returned to the user at the end. In the high version of the actual database products, such as Oracle, Sybase, and so on, the cost-based optimization method is used to estimate the cost of different query plans based on the information obtained from the System Dictionary table, and then choose a better plan. Although the current database products in query optimization has been doing better, but the user submitted by the SQL statements is the basis for system optimization, it is difficult to imagine a bad query plan after the system optimization will become efficient, so the pros and cons of the written statement is critical. The following highlights the solution for improving the query plan.
Solve the problem
The following is an example of a relational database system, Informix, that improves the user query plan.
1. Rational use of indexes
Index is an important data structure in database, and its basic aim is to improve the efficiency of query. Most of the database products now use IBM's first proposed ISAM index structure. The use of indexes is just right, with the following principles:
Indexes are established on columns that are frequently connected but not specified as foreign keys, while fields that are not frequently connected are automatically generated by the optimizer.
Index on a column that is frequently sorted or grouped (that is, a group by or order by operation).
A search is established on columns with more values that are often used in conditional expressions, and no index is established on columns with fewer values. For example, there are only two different values for "male" and "female" on the "Sex" column of the employee table, so there is no need to index. If indexing does not improve query efficiency, it can significantly reduce the speed of updates.
If there are multiple columns to be sorted, you can set up a composite index on those columns (compound index).
Use System Tools. If the Informix database has a Tbcheck tool, it can be checked on suspicious indexes. On some database servers , the index may fail or the read efficiency is reduced because of frequent manipulation, and if a query using an index slows down, try using the Tbcheck tool to check the integrity of the index and fix it if necessary. In addition, when a database table updates a large amount of data, deleting and rebuilding the index can increase the query speed.
2. Avoid or simplify sorting
You should simplify or avoid repeating sorting of large tables. The optimizer avoids sorting steps when it is possible to use indexes to automatically produce output in the appropriate order. Here are some of the factors that affect:
The index does not include one or several columns to be sorted, and the order of the columns in the group BY or ORDER BY clause is not the same as the index;
The sorted columns come from different tables.
In order to avoid unnecessary sorting, it is necessary to build the index correctly and consolidate the database table reasonably (although it may sometimes affect the normalization of the table, but it is worthwhile relative to the increase in efficiency). If sorting is unavoidable, try simplifying it, such as narrowing the range of sorted columns.
3. Eliminates sequential access to large table row data
In nested queries, sequential access to tables can have a fatal effect on query efficiency. For example, the use of sequential access strategy, a nested 3-level query, if each layer query 1000 rows, then the query will query 1 billion rows of data. The primary way to avoid this is to index the connected columns. For example, two tables: Student form (school number, name, age ...). ) and the selected timetable (school number, course number, grade). If two tables are to be connected, the index should be indexed on the connection field "School Number".
You can also use a set of collections to avoid sequential access. Although there are indexes on all of the check columns, some forms of the WHERE clause force the optimizer to use sequential access. The following query forces a sequential operation on the Orders table:
SELECT * FROM Orders WHERE (customer_num=104 and order_num>1001) OR order_num=1008
Although indexes are built on Customer_num and Order_num, the optimizer uses sequential access paths to scan the entire table in the above statement. Because this statement retrieves a collection of detached rows, it should be changed to the following statement:
SELECT * FROM Orders WHERE customer_num=104 and order_num>1001
UNION
SELECT * FROM Orders WHERE order_num=1008
This allows the query to be processed using the index path.
4. Avoid correlated subqueries
A column's label appears in both the main query and the query in the WHERE clause, it is likely that the subquery must requery once the column value in the main query changes. The more nested the query, the lower the efficiency, so the subquery should be avoided as much as possible. If the subquery is unavoidable, filter out as many rows as possible in the subquery.
5. Regular expressions to avoid difficulties
Matches and like keywords support wildcard matching, technically called regular expressions. But this kind of match is especially time-consuming. For example: SELECT * from the customer WHERE zipcode like "98_ _ _"
Even if an index is established on the ZipCode field, the sequential scan is used in this case. If you change the statement to select * from Customer WHERE zipcode > "98000", the index is used to query when executing the query, which obviously increases the speed significantly.
Also, avoid substrings that do not start. For example, a SELECT * from Customer WHERE zipcode[2,3] > "80" takes a non-start substring in the WHERE clause, so the statement does not use an index.
6. Using temporary tables to speed up queries
Sorting a subset of a table and creating a temporary table can sometimes speed up queries. Helps avoid multiple sort operations and, in other ways, simplifies the work of the optimizer. For example:
SELECT cust.name,rcvbles.balance,......other Columns
From Cust,rcvbles
WHERE cust.customer_id = rcvlbes.customer_id
and rcvblls.balance>0
and cust.postcode> "98000"
ORDER BY Cust.name
If the query is to be executed multiple times and more than once, all unpaid customers can be found in a temporary file and sorted by the customer's name:
SELECT cust.name,rcvbles.balance,......other Columns
From Cust,rcvbles
WHERE cust.customer_id = rcvlbes.customer_id
and rcvblls.balance>0
ORDER BY Cust.name
Into TEMP cust_with_balance
Then query in the Temp table in the following way:
SELECT * from Cust_with_balance
WHERE postcode> "98000"
There are fewer rows in the temporary table than in the primary table, and the physical order is the required order, reducing disk I/O, so the query workload can be drastically reduced.
Note: Temporary table creation does not reflect changes to the primary table. When data is frequently modified in the primary table, be careful not to lose data.