MySQL (17) ----------- Query Optimization for massive databases, mysql -----------

Source: Internet
Author: User
Tags: informix


Many programmers think that query optimization is the job of the DBMS (database management system) and has little to do with the SQL statements they write. This is wrong. A good query plan can often improve program performance by dozens of times. Here, the query plan written by the user is the set of SQL statements the user submits, while the query plan produced by the DBMS is the set of statements generated after optimization. The DBMS processes a query as follows: after lexical and syntax checking, the statement is passed to the query optimizer; once the optimizer has completed algebraic optimization and access path optimization, the precompilation module processes the statement and generates a query plan, which is submitted to the system for execution at an appropriate time; finally, the result is returned to the user. Later versions of commercial database products such as Oracle and Sybase use cost-based optimization, which estimates the cost of different query plans from information in the system dictionary tables and then selects a better plan. Although database products keep getting better at query optimization, the SQL statements submitted by the user remain the basis for the system's optimization. It is hard to imagine a badly written query becoming efficient after the system optimizes it, so the quality of the statements the user writes is crucial. The following describes how to improve the query plan.

Solve the problem

The following uses Informix as an example to describe how to improve the user query plan.

1. Use indexes reasonably

An index is an important data structure in a database, and its fundamental purpose is to improve query efficiency. Currently, most database products adopt the ISAM index structure first proposed by IBM. Indexes should be used appropriately, according to the following principles:

● Create indexes on columns that are frequently used in joins but are not declared as foreign keys; for columns that are joined only occasionally, the optimizer can generate an index automatically.

● Index the columns that are frequently sorted or grouped (that is, columns used in GROUP BY or ORDER BY operations).

● Create indexes on columns that have many distinct values and are frequently used in conditional expressions. Do not create indexes on columns with few distinct values. For example, the "gender" column of an employee table has only two values, "male" and "female", so there is no need to index it. Such an index does not improve query efficiency, but it greatly reduces update speed.

● If several columns are to be sorted together, you can create a compound index on those columns.

● Use system tools. For example, the Informix database has a tbcheck tool that can check suspicious indexes. On some database servers, an index may become invalid or slower to read because of frequent operations. If an index-based query slows down for no obvious reason, use tbcheck to check the integrity of the index and repair it if necessary. In addition, after a large amount of data in a table has been updated, dropping and rebuilding the index can increase query speed.
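The rule about distinct values can be checked with a quick measurement. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Informix (the optimizers differ, so only the idea carries over). The employee table, the badge_no column, and the index name are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Informix; the table, column,
# and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, badge_no TEXT, gender TEXT)"
)
conn.executemany(
    "INSERT INTO employee (badge_no, gender) VALUES (?, ?)",
    [(f"B{i:04d}", "male" if i % 2 else "female") for i in range(1000)],
)


def distinct_ratio(column):
    """Fraction of rows that carry a distinct value in the given column."""
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM employee"
    ).fetchone()
    return distinct / total


def plan(sql):
    """Return SQLite's query plan for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


gender_ratio = distinct_ratio("gender")   # two values spread over 1000 rows
badge_ratio = distinct_ratio("badge_no")  # every row has its own value

# Index only the selective column, then check that a lookup uses the index.
conn.execute("CREATE INDEX idx_employee_badge ON employee (badge_no)")
lookup_plan = plan("SELECT * FROM employee WHERE badge_no = 'B0500'")

print(gender_ratio, badge_ratio)
print(lookup_plan)
```

A ratio close to 1 suggests a column worth indexing, while a ratio close to 0 (as for gender) suggests the index would mostly add update cost.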

2. Avoid or simplify sorting

Repeated sorting of large tables should be simplified or avoided. When an index can be used to produce output in the required order automatically, the optimizer skips the sorting step. The following factors prevent this:

● The index does not contain one or more of the columns to be sorted.

● The order of the columns in the GROUP BY or ORDER BY clause differs from the order of the columns in the index.

● The sort columns come from different tables.

To avoid unnecessary sorting, build indexes correctly and merge database tables where reasonable (although this may sometimes affect table normalization, the gain in efficiency is worth it). If sorting is unavoidable, try to simplify it, for example by narrowing the range of columns being sorted.
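The effect of index column order on sorting can be observed directly. This sketch again assumes SQLite as a stand-in for Informix. In SQLite's EXPLAIN QUERY PLAN output, a step containing "TEMP B-TREE" means a separate sort is performed. The table layout and the ship_weight column are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Informix; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER, "
    "order_date TEXT, ship_weight REAL)"
)
conn.execute("CREATE INDEX idx_orders_cust_date ON orders (customer_num, order_date)")


def needs_sort(sql):
    """True if SQLite plans an explicit sorting step for this query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return any("TEMP B-TREE" in row[3] for row in rows)


# Sort columns match the index columns, in the same order: no sort step.
same_order = needs_sort(
    "SELECT customer_num, order_date FROM orders ORDER BY customer_num, order_date"
)
# Same columns, but in a different order than the index: sort step required.
swapped_order = needs_sort(
    "SELECT customer_num, order_date FROM orders ORDER BY order_date, customer_num"
)
# Sort column is not in the index at all: sort step required.
unindexed = needs_sort("SELECT * FROM orders ORDER BY ship_weight")

print(same_order, swapped_order, unindexed)
```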

3. Eliminate sequential access to rows of large tables

In nested queries, sequential access to a table can have a fatal impact on query efficiency. For example, with a sequential access strategy, a query nested three levels deep that scans 1000 rows at each level examines 1 billion rows (1000 × 1000 × 1000). The primary way to avoid this is to index the join columns. For example, take two tables: a student table (student ID, name, age, ......) and a course selection table (student ID, course number, score). If the two tables are to be joined, create an index on the join column "student ID".
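The student and course selection example can be sketched as follows, assuming SQLite as a stand-in for Informix. The English table and column names (student, enrollment, student_id) and the index name are hypothetical. SQLite's automatic transient indexes are switched off here only so that the unindexed baseline really shows the nested sequential scan described above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Informix; names are hypothetical.
conn = sqlite3.connect(":memory:")
# Disable SQLite's automatic indexes so the baseline is a plain nested scan.
conn.execute("PRAGMA automatic_index = OFF")
conn.execute("CREATE TABLE student (student_id INTEGER, name TEXT, age INTEGER)")
conn.execute(
    "CREATE TABLE enrollment (student_id INTEGER, course_num INTEGER, score INTEGER)"
)


def plan(sql):
    """Return the steps of SQLite's query plan as a list of strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


join_sql = (
    "SELECT s.name, e.course_num, e.score "
    "FROM student s JOIN enrollment e ON e.student_id = s.student_id"
)

plan_before = plan(join_sql)  # both tables are read sequentially
conn.execute("CREATE INDEX idx_enrollment_student ON enrollment (student_id)")
plan_after = plan(join_sql)   # the inner table is now reached through the index

# Row visits for three nested levels of 1000 sequentially scanned rows each.
rows_visited = 1000 ** 3

print(plan_before)
print(plan_after)
print(rows_visited)
```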

UNION can also be used to avoid sequential access. Even when all the columns being tested are indexed, some forms of WHERE clause force the optimizer to use sequential access. The following query forces a sequential scan of the orders table:

SELECT * FROM orders WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008

Although indexes exist on customer_num and order_num, the optimizer still uses a sequential access path to scan the entire table for this statement. Because the statement retrieves two separate sets of rows, it should be rewritten as:

SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001
UNION
SELECT * FROM orders WHERE order_num = 1008

In this way, the index path can be used to process the query.
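Before swapping one form for the other, it is worth confirming that both return the same rows. The sketch below does this on a small, made-up orders table, assuming SQLite as a stand-in for Informix. Note that UNION removes duplicate rows, so the two forms agree here because every row is unique (order_num is a key).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Informix; the sample data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_num)")
# Orders 1000..1019; every fifth order belongs to customer 104.
conn.executemany(
    "INSERT INTO orders (order_num, customer_num) VALUES (?, ?)",
    [(n, 104 if n % 5 == 0 else 101) for n in range(1000, 1020)],
)

or_query = (
    "SELECT * FROM orders "
    "WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008"
)
union_query = (
    "SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001 "
    "UNION "
    "SELECT * FROM orders WHERE order_num = 1008"
)

# Sort both result sets so the comparison ignores row order.
or_rows = sorted(conn.execute(or_query).fetchall())
union_rows = sorted(conn.execute(union_query).fetchall())

print(or_rows)
print(union_rows)
```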

4. Avoid correlated subqueries

When a column appears both in the main query and in the WHERE clause of a subquery, the subquery most likely has to be re-executed each time the column value in the main query changes. The more deeply queries are nested, the lower the efficiency, so avoid subqueries as much as possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
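One common way to remove a correlated subquery is to compute the per-group value once in a derived table and join against it. This is a minimal sketch of that rewrite, assuming SQLite as a stand-in for Informix. The orders table with an amount column and the sample rows are hypothetical, and the rewrite is one possible approach rather than the only one.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Informix; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_num INTEGER PRIMARY KEY, customer_num INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 101, 10.0), (2, 101, 30.0),               # average 20
        (3, 102, 5.0), (4, 102, 5.0),                 # average 5
        (5, 103, 7.0), (6, 103, 9.0), (7, 103, 20.0), # average 12
    ],
)

# Correlated form: the inner query depends on o.customer_num of the outer row.
correlated = (
    "SELECT o.order_num FROM orders o "
    "WHERE o.amount > (SELECT AVG(i.amount) FROM orders i "
    "                  WHERE i.customer_num = o.customer_num)"
)
# Join form: averages are computed once per customer, then joined.
joined = (
    "SELECT o.order_num FROM orders o "
    "JOIN (SELECT customer_num, AVG(amount) AS avg_amount "
    "      FROM orders GROUP BY customer_num) a "
    "  ON a.customer_num = o.customer_num "
    "WHERE o.amount > a.avg_amount"
)

correlated_rows = sorted(conn.execute(correlated).fetchall())
joined_rows = sorted(conn.execute(joined).fetchall())

print(correlated_rows)
print(joined_rows)
```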

5. Avoid difficult regular expressions

The MATCHES and LIKE keywords support wildcard matching, referred to here as regular expressions. This kind of matching is especially time-consuming. For example:

SELECT * FROM customer WHERE zipcode LIKE "98___"

Even if an index exists on the zipcode column, a sequential scan is used in this case. If you change the statement to a range comparison such as SELECT * FROM customer WHERE zipcode >= "98000" AND zipcode < "99000", the query can be executed using the index, which is noticeably faster. (The simpler condition zipcode > "98000" also uses the index, but it is not equivalent to the pattern: it excludes 98000 and includes 99000 and above.)

In addition, avoid non-initial substrings. For example, the WHERE clause in SELECT * FROM customer WHERE zipcode[2,3] > "80" uses a substring that does not start at the first character, so this statement does not use the index.
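The zipcode example can be tried out as below, assuming SQLite as a stand-in for Informix. The bounded range (>= "98000" and < "99000") is an assumption about which range matches the same five-character values as the pattern. How a given engine plans a LIKE pattern varies, so only the range plan is inspected. The sample rows and the index name are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Informix; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_num INTEGER PRIMARY KEY, lname TEXT, zipcode TEXT)"
)
conn.execute("CREATE INDEX idx_customer_zip ON customer (zipcode)")
conn.executemany(
    "INSERT INTO customer (lname, zipcode) VALUES (?, ?)",
    [("A", "97999"), ("B", "98000"), ("C", "98001"),
     ("D", "98105"), ("E", "98999"), ("F", "99000")],
)

like_sql = "SELECT zipcode FROM customer WHERE zipcode LIKE '98___'"
range_sql = (
    "SELECT * FROM customer WHERE zipcode >= '98000' AND zipcode < '99000'"
)

# Both predicates should select the same zip codes.
like_zips = sorted(row[0] for row in conn.execute(like_sql))
range_zips = sorted(row[2] for row in conn.execute(range_sql))

# Inspect how SQLite plans the range form.
range_plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + range_sql)
)

print(like_zips)
print(range_zips)
print(range_plan)
```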

6. Use temporary tables to accelerate queries

Sorting a subset of a table into a temporary table can sometimes accelerate queries. It helps avoid repeated sorting operations and also simplifies the optimizer's work in other respects. For example:

SELECT cust.name, rcvbles.balance, ...... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
AND cust.postcode > "98000"
ORDER BY cust.name

If this query is to be executed more than once, you can collect all customers with unpaid balances in a temporary table, sorted by customer name:

SELECT cust.name, rcvbles.balance, ...... other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
ORDER BY cust.name
INTO TEMP cust_with_balance

Then query the temporary table as follows:

SELECT * FROM cust_with_balance
WHERE postcode > "98000"

The temporary table has fewer rows than the main table, and its physical order is the required order, which reduces disk I/O, so the query workload can be greatly reduced.

Note: after the temporary table is created, later modifications to the main table are not reflected in it. When the data in the main table is modified frequently, take care not to miss data by relying on an out-of-date temporary table.
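The staleness caveat is easy to reproduce. The sketch below assumes SQLite as a stand-in for Informix, so it uses CREATE TEMP TABLE ... AS SELECT where the text uses INTO TEMP. The sample rows and the build_temp helper are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Informix; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (customer_id INTEGER PRIMARY KEY, name TEXT, postcode TEXT)")
conn.execute("CREATE TABLE rcvbles (customer_id INTEGER, balance REAL)")
conn.executemany(
    "INSERT INTO cust VALUES (?, ?, ?)",
    [(1, "Baker", "98105"), (2, "Adams", "98052"),
     (3, "Clark", "97201"), (4, "Davis", "98199")],
)
conn.executemany(
    "INSERT INTO rcvbles VALUES (?, ?)",
    [(1, 50.0), (2, 0.0), (3, 20.0), (4, 75.0)],
)


def build_temp():
    """(Re)build the temporary table of customers with an unpaid balance."""
    conn.execute("DROP TABLE IF EXISTS cust_with_balance")
    conn.execute(
        "CREATE TEMP TABLE cust_with_balance AS "
        "SELECT cust.name, cust.postcode, rcvbles.balance "
        "FROM cust, rcvbles "
        "WHERE cust.customer_id = rcvbles.customer_id AND rcvbles.balance > 0 "
        "ORDER BY cust.name"
    )


def names_in_area():
    """Query the temporary table instead of repeating the join."""
    rows = conn.execute(
        "SELECT name FROM cust_with_balance WHERE postcode > '98000'"
    ).fetchall()
    return sorted(row[0] for row in rows)


build_temp()
initial = names_in_area()

# Davis pays in full; the temporary table does not see this change.
conn.execute("UPDATE rcvbles SET balance = 0 WHERE customer_id = 4")
stale = names_in_area()

build_temp()  # rebuilding picks up the change
fresh = names_in_area()

print(initial, stale, fresh)
```

Rebuilding the temporary table on a schedule, or right before a batch of reads, is one way to keep the trade-off under control.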

From: http://www.111cn.net/database/110/5d6bc167016ed54ef3de149d89fee7c4.htm

Copyright Disclaimer: you are welcome to reprint it. I hope to add the original article address at the same time. Thank you for your cooperation and learning!
