The database system is the core of a management information system (MIS), and online transaction processing (OLTP) and online analytical processing (OLAP) built on databases are among the most important computer applications in banks, enterprises, and government departments. In most real systems, queries account for the largest share of database operations, and the SELECT statement that implements them is the most expensive kind of SQL statement. For example, once enough data has accumulated, such as a bank's account table growing to millions or even tens of millions of records, a full table scan can take 10 minutes or even hours. A better strategy than a full table scan can often cut the query time to a few minutes, which shows why query optimization matters.
In implementing application projects, I have found that many programmers who build database applications with front-end development tools (such as PowerBuilder and Delphi) pay attention only to an attractive user interface and ignore the efficiency of their query statements. The resulting application systems are inefficient and waste a great deal of resources. Designing efficient, well-reasoned query statements is therefore very important. Drawing on application examples and database theory, this article describes how query optimization techniques are applied in a real system.
Analyzing the problem
Many programmers think that query optimization is the job of the DBMS (database management system) and has nothing to do with the SQL statements they write. This is wrong. A good query plan can often make a program dozens of times faster. Here, the user's query plan can be understood as the set of SQL statements the user submits, while the execution plan is the set of statements produced after optimization. The DBMS processes a query as follows: after lexical and syntax checking, the statement is passed to the DBMS query optimizer; once the optimizer has completed algebraic optimization and access-path optimization, the precompiler module processes the statement and generates the execution plan. The plan is then submitted to the system for execution at the appropriate time, and the results are finally returned to the user. Recent versions of commercial database products such as Oracle and Sybase use cost-based optimization: the optimizer estimates the cost of different plans from information stored in the system dictionary tables and then chooses the better plan. Although today's database products optimize queries fairly well, the SQL statements the user submits are the starting point for that optimization, and it is hard to imagine a badly written query becoming efficient after the system optimizes it. How well users write their statements is therefore critical. This article does not discuss the optimizer inside the DBMS; it focuses on ways to improve the user's query plan.
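The execution plan the optimizer produces can usually be inspected, which is a practical way to check whether a rewrite helps. As a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for Informix, where SET EXPLAIN ON serves a similar purpose, the code below asks the optimizer for its plan for the same statement before and after an index is created. The orders table, its columns, and the data are hypothetical.

```python
import sqlite3

def plan(conn, sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_num INTEGER PRIMARY KEY, "
             "customer_num INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1000 + i, 100 + i % 10, i * 1.5) for i in range(1, 200)])

query = "SELECT * FROM orders WHERE customer_num = 104"

before = plan(conn, query)  # no usable index yet: expect a full table scan
conn.execute("CREATE INDEX idx_orders_cust ON orders(customer_num)")
after = plan(conn, query)   # same SQL text, but the optimizer can now search the index

print(before)
print(after)
```

The exact wording of the plan differs between SQLite versions and between products; what matters is whether the plan says SCAN (read every row) or SEARCH ... USING INDEX.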
Solving the problem
The following sections use Informix, a relational database system, as an example of how to improve the user's query plan.
1. Use indexes appropriately
An index is an important data structure in a database, and its basic purpose is to make queries more efficient. Most database products today use the ISAM index structure first proposed by IBM. Indexes should be used judiciously, following these principles:
Create indexes on columns that are frequently used in joins but are not declared as foreign keys; for columns that are rarely joined, the optimizer generates indexes automatically.
Create indexes on columns that are frequently sorted or grouped (that is, used in GROUP BY or ORDER BY operations).
Create indexes on columns that have many distinct values and often appear in conditional expressions, and do not index columns with few distinct values. For example, the "Sex" column of an employee table has only two values, "male" and "female", so there is no need to index it. An index that does not improve query efficiency still slows down updates significantly.
If several columns are sorted together, create a composite (compound) index on those columns.
Use system tools. Informix, for example, provides the tbcheck utility for checking suspect indexes. On some database servers, frequent modification can leave an index broken or make it slower to read; if a query that uses an index slows down, try using tbcheck to check the integrity of the index and repair it if necessary. In addition, after a large volume of updates to a table, dropping and rebuilding its indexes can improve query speed.
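The principle about distinct values (index columns with many values, skip columns such as "Sex") can be checked with a quick query before deciding on an index. A minimal sketch, assuming SQLite through Python's sqlite3 module in place of Informix: it computes each column's selectivity, the ratio of distinct values to rows. The employee table, its data, and the 0.1 cut-off are hypothetical; real optimizers rely on their own statistics rather than a fixed threshold.

```python
import sqlite3

def selectivity(conn, table, column):
    """Distinct values / rows: near 1.0 is a good index candidate, near 0 is not."""
    # Identifiers are interpolated into the SQL, so pass only trusted names.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}").fetchone()
    return distinct / total if total else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER, name TEXT, sex TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(i, f"emp{i}", "male" if i % 2 else "female") for i in range(1, 1001)])

THRESHOLD = 0.1  # hypothetical rule of thumb, not a value from any DBMS
for col in ("emp_no", "sex"):
    s = selectivity(conn, "employee", col)
    print(col, round(s, 3), "index" if s > THRESHOLD else "skip")
```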
2. Avoid or simplify sorting
Repeated sorting of large tables should be simplified or avoided. When the optimizer can use an index to produce output in the required order automatically, it skips the sort step. The following factors prevent this:
The index does not include one or more of the columns to be sorted;
The order of the columns in the GROUP BY or ORDER BY clause differs from the order of the columns in the index;
The sorted columns come from different tables.
To avoid unnecessary sorting, build the right indexes and merge database tables where reasonable (this can sometimes compromise the normalization of the tables, but the gain in efficiency makes it worthwhile). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns being sorted.
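The second factor in the list above can be observed directly. The sketch below, again using SQLite through Python's sqlite3 module as a stand-in for Informix and a hypothetical emp table, asks the optimizer for its plan when the ORDER BY matches the column order of a composite index and when the same columns are listed in the opposite order. SQLite reports an explicit sort as "USE TEMP B-TREE FOR ORDER BY"; other products describe sorts differently.

```python
import sqlite3

def plan(conn, sql):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)")
conn.executemany("INSERT INTO emp (last_name, first_name) VALUES (?, ?)",
                 [(f"L{i % 50}", f"F{i % 7}") for i in range(500)])
# Composite index whose column order matches the intended ORDER BY.
conn.execute("CREATE INDEX emp_name ON emp(last_name, first_name)")

same_order = plan(conn, "SELECT last_name, first_name FROM emp "
                        "ORDER BY last_name, first_name")
swapped = plan(conn, "SELECT last_name, first_name FROM emp "
                     "ORDER BY first_name, last_name")

# Does either plan contain an explicit sort step?
print(any("TEMP B-TREE" in d for d in same_order))  # index already supplies the order
print(any("TEMP B-TREE" in d for d in swapped))     # column order differs: a sort is needed
```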
3. Eliminate sequential access to rows of large tables
In nested queries, sequential access to tables can be fatal to query efficiency. For example, with a sequential access strategy, a query nested three levels deep that reads 1,000 rows at each level will examine 1 billion rows. The main way to avoid this is to index the join columns. For example, consider two tables: student (student number, name, age, ...) and course selection (student number, course number, grade). If the two tables are to be joined, an index should be created on the join column "student number".
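The cost argument can be made concrete with a small model. The Python sketch below compares a nested-loop join that reads every course-selection row for every student with a join that first builds an index on the join column (student number). The index here is a Python dict, that is, a hash index rather than the ISAM or B-tree structure a DBMS would use, and the table contents are hypothetical; the point is only how the number of row visits grows.

```python
from collections import defaultdict

def nested_loop_join(students, enrollments):
    """Sequential access: every student row is compared with every enrollment row."""
    comparisons, result = 0, []
    for s_no, name in students:
        for e_no, course, grade in enrollments:
            comparisons += 1
            if s_no == e_no:
                result.append((name, course, grade))
    return result, comparisons

def indexed_join(students, enrollments):
    """Build an index on the join column once, then probe it once per student."""
    index = defaultdict(list)
    for e_no, course, grade in enrollments:
        index[e_no].append((course, grade))
    probes, result = 0, []
    for s_no, name in students:
        probes += 1
        for course, grade in index.get(s_no, []):
            result.append((name, course, grade))
    return result, probes

students = [(i, f"student{i}") for i in range(200)]
enrollments = [(i % 200, f"C{i % 20}", 60 + i % 40) for i in range(600)]

r1, cmp_nested = nested_loop_join(students, enrollments)
r2, cmp_indexed = indexed_join(students, enrollments)
print(cmp_nested, cmp_indexed)

# Three nested levels of 1,000 rows each, all read sequentially:
three_level_rows = 1000 ** 3
print(three_level_rows)
```

Both joins return the same rows; the nested loop does 200 × 600 comparisons, while the indexed version does one probe per student plus one pass to build the index.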
You can also use a union of sets to avoid sequential access. Even when all the columns being checked are indexed, some forms of WHERE clause force the optimizer to use sequential access. The following query forces a sequential scan of the Orders table:
SELECT * FROM Orders WHERE (customer_num=104 and order_num>1001) OR order_num=1008
Although indexes exist on customer_num and order_num, the optimizer uses a sequential access path and scans the entire table for the statement above. Because the statement retrieves rows from two separate sets, it should be rewritten as follows:
SELECT * FROM Orders WHERE customer_num=104 and order_num>1001
UNION
SELECT * FROM Orders WHERE order_num=1008
This allows the query to be processed using the index path.
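Before relying on such a rewrite, it is worth confirming that both forms return the same rows. This sketch runs both statements on sample data using SQLite through Python's sqlite3 module (not Informix) and a hypothetical orders table. It must be UNION, not UNION ALL: a row that satisfies both branches (for example, customer 104 with order 1008) would otherwise be returned twice. Some optimizers, SQLite among them, can already use several indexes for an OR condition, so the speed gain from the rewrite depends on the product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER)")
conn.execute("CREATE INDEX idx_orders_cust ON orders(customer_num)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1000 + i, 100 + i % 8) for i in range(1, 60)])

or_query = """SELECT * FROM orders
              WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008"""
union_query = """SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001
                 UNION
                 SELECT * FROM orders WHERE order_num = 1008"""

# Sort both results so the comparison does not depend on the access path chosen.
or_rows = sorted(conn.execute(or_query).fetchall())
union_rows = sorted(conn.execute(union_query).fetchall())
print(or_rows == union_rows, len(union_rows))
```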
4. Avoid correlated subqueries
If a column label appears both in the main query and in a subquery in the WHERE clause, the subquery will probably have to be re-executed every time the column value in the main query changes. The more deeply queries are nested, the lower the efficiency, so subqueries should be avoided wherever possible. If a subquery cannot be avoided, filter out as many rows as possible inside it.
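One common way to remove the correlation is to rewrite the subquery so that it no longer refers to the outer query; it can then be evaluated once, with its filter applied inside it. The sketch below, using SQLite through Python's sqlite3 module and hypothetical customer and orders tables, checks that an EXISTS subquery correlated on customer_num and an uncorrelated IN subquery return the same customers. Whether the rewrite is faster depends on the optimizer, since some products transform such subqueries on their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_num INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(n, f"cust{n}") for n in range(101, 111)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1000 + i, 101 + i % 10, float(i * 7 % 300)) for i in range(1, 80)])

# Correlated: the subquery refers to c.customer_num, so logically it is
# re-evaluated for every row of the outer query.
correlated = """SELECT c.customer_num, c.name FROM customer c
                WHERE EXISTS (SELECT 1 FROM orders o
                              WHERE o.customer_num = c.customer_num AND o.amount > 250)
                ORDER BY c.customer_num"""

# Uncorrelated: the subquery does not reference the outer query, so it can be
# evaluated once; the amount filter keeps its result small.
uncorrelated = """SELECT customer_num, name FROM customer
                  WHERE customer_num IN (SELECT customer_num FROM orders WHERE amount > 250)
                  ORDER BY customer_num"""

rows_a = conn.execute(correlated).fetchall()
rows_b = conn.execute(uncorrelated).fetchall()
print(rows_a == rows_b, [r[0] for r in rows_a])
```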
5. Avoid difficult regular expressions
The MATCHES and LIKE keywords support wildcard matching, technically called regular expressions, but this kind of matching is especially time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE "98___"
Even if an index exists on the zipcode column, a sequential scan is used in this case. If the statement is changed to SELECT * FROM customer WHERE zipcode >= "98000" AND zipcode < "99000" (equivalent, assuming every zip code has exactly five digits), the index is used when the query executes, which clearly speeds it up considerably.
Also avoid non-leading substrings. For example, the statement SELECT * FROM customer WHERE zipcode[2,3] > "80" uses a substring in the WHERE clause that does not start at the first character, so it cannot use the index.
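The range rewrite is valid only when it selects exactly the rows the pattern does. A minimal check, assuming SQLite through Python's sqlite3 module and a hypothetical customer table whose zip codes are all five-character strings. SQLite, like standard SQL, uses single quotes for string literals, whereas the Informix examples above use double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_num INTEGER PRIMARY KEY, zipcode TEXT)")
conn.execute("CREATE INDEX idx_cust_zip ON customer(zipcode)")
zips = ["97001", "98000", "98052", "98999", "99001", "98123", "10001"]
conn.executemany("INSERT INTO customer (zipcode) VALUES (?)", [(z,) for z in zips])

# Wildcard form: "98" followed by exactly three characters.
like_rows = conn.execute(
    "SELECT zipcode FROM customer WHERE zipcode LIKE '98___' ORDER BY zipcode").fetchall()

# Range form: equivalent only because every zipcode here has exactly five digits.
range_rows = conn.execute(
    "SELECT zipcode FROM customer WHERE zipcode >= '98000' AND zipcode < '99000' "
    "ORDER BY zipcode").fetchall()

print(like_rows == range_rows, like_rows)
```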
6. Use temporary tables to speed up queries
Sorting a subset of a table into a temporary table can sometimes speed up queries. It helps avoid repeated sort operations and also simplifies the optimizer's work in other ways.
For example, if a query over customers with an unpaid balance is executed many times, you can find all unpaid customers once, store them in a temporary table, and sort that table by customer name.
You can then query the temporary table as follows:
SELECT * FROM Cust_with_balance
WHERE postcode > "98000"
The temporary table has fewer rows than the main table, and its physical order is already the required order, which reduces disk I/O, so the query workload can be cut drastically.
Note: once created, the temporary table does not reflect later changes to the main table. When data in the main table is modified frequently, take care not to lose data.
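A minimal sketch of the temporary-table technique, assuming SQLite through Python's sqlite3 module instead of Informix. The customer and receivable tables and their columns are hypothetical; only the name cust_with_balance and the postcode filter come from the example above. The last lines illustrate the note: an update made to a base table after the temporary table is built is not reflected in it. Whether CREATE ... AS SELECT ... ORDER BY stores rows in that physical order is engine-specific, so the follow-up query keeps an explicit ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, postcode TEXT);
CREATE TABLE receivable (customer_id INTEGER, balance REAL);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    (1, "Baker", "98101"), (2, "Adams", "97001"), (3, "Chen", "98052"), (4, "Diaz", "98300")])
conn.executemany("INSERT INTO receivable VALUES (?, ?)", [
    (1, 120.0), (2, 0.0), (3, 45.5), (4, 0.0)])

# Run the join, filter, and sort once and keep the (smaller) result.
conn.execute("""
CREATE TEMP TABLE cust_with_balance AS
SELECT c.customer_id, c.name, c.postcode, r.balance
FROM customer c JOIN receivable r ON c.customer_id = r.customer_id
WHERE r.balance > 0
ORDER BY c.name
""")

# Later queries read the temporary table instead of repeating the join.
rows = conn.execute(
    "SELECT name, balance FROM cust_with_balance WHERE postcode > '98000' ORDER BY name"
).fetchall()
print(rows)

# Caveat from the note: customer 4 now owes money, but the temporary table is stale.
conn.execute("UPDATE receivable SET balance = 10.0 WHERE customer_id = 4")
stale = conn.execute("SELECT COUNT(*) FROM cust_with_balance").fetchone()[0]
print(stale)
```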
7. Use sorting to replace non-sequential access
Non-sequential disk access is the slowest operation, because the disk access arm has to move back and forth. SQL statements hide this, which makes it easy to write queries that must read large numbers of non-sequential pages when writing an application.
In some cases, using the database's sorting capability in place of non-sequential access can improve the query.
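The effect can be illustrated with a deliberately simplified disk model that counts only how far the arm moves between page numbers, ignoring rotation, caching, and read-ahead. The page numbers are hypothetical. Reading the requested pages in sorted order (and reading a repeated page only once) turns scattered seeks into a single sweep across the disk.

```python
def seek_distance(page_ids, start=0):
    """Total distance the disk arm travels when pages are read in the given order."""
    distance, position = 0, start
    for page in page_ids:
        distance += abs(page - position)
        position = page
    return distance

# Hypothetical page numbers of the rows a query needs, in the order they are requested.
requested = [812, 17, 530, 44, 990, 301, 17, 655]

unsorted_cost = seek_distance(requested)
# Sort the requests first and drop duplicate pages: one pass from low to high.
sorted_cost = seek_distance(sorted(set(requested)))
print(unsorted_cost, sorted_cost)
```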