Querying and optimizing massive data in a MySQL database

Source: Internet
Author: User
Tags: informix, mysql, database

Many programmers believe that query optimization is the job of the DBMS (database management system) and has nothing to do with the SQL statements they write. This is wrong. A good query plan can often improve a program's performance by dozens of times. A query is the set of SQL statements submitted by the user; a query plan is the set of statements produced after optimization. The DBMS processes a query as follows: after lexical and syntactic checking, the statement is passed to the DBMS's query optimizer; once the optimizer has carried out algebraic optimization and access-path optimization, the precompiled module processes the statement and generates a query plan, which is submitted to the system for execution at the appropriate time, and the results are finally returned to the user.

In the higher versions of real database products, such as Oracle and Sybase, cost-based optimization is used: the cost of different query plans is estimated from information in the system dictionary tables, and a better plan is chosen. Although current database products already do query optimization quite well, the SQL statements submitted by the user are the basis of that optimization; it is hard to imagine a poorly written query becoming efficient after system optimization, so how the statements are written is critical. The following highlights solutions for improving the query plan.

Solving the problem

The following uses the relational database system Informix as an example to illustrate how to improve a user's query plan.

1. Rational use of indexes

An index is an important data structure in a database; its basic purpose is to improve query efficiency. Most database products use the ISAM index structure first proposed by IBM. Indexes should be used judiciously, following these principles:

Build indexes on columns that are frequently used in joins but are not specified as foreign keys; for fields that are not frequently joined, the optimizer generates indexes automatically.

Build indexes on columns that are frequently sorted or grouped (that is, used in GROUP BY or ORDER BY operations).

Build indexes on columns with many distinct values that are frequently used in conditional expressions; do not build indexes on columns with few distinct values. For example, the "sex" column of an employee table has only two values, "male" and "female", so there is no point in indexing it. An index here would not improve query efficiency, and it would significantly slow down updates.

If multiple columns are to be sorted together, a composite (compound) index can be built on those columns.

Use system tools. For example, the Informix database has a tbcheck tool that can check suspicious indexes. On some database servers, an index may become corrupted or lose read efficiency because of frequent manipulation; if a query that uses an index slows down noticeably, try using the tbcheck tool to check the integrity of the index and repair it if necessary. In addition, after a database table has had a large volume of updates, deleting and rebuilding its indexes can increase query speed, as sketched below.
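As an illustration of the indexing principles above, here is a minimal sketch; the table, column, and index names (orders, customer_num, order_date, idx_orders_cust_date) are hypothetical, and the statements use standard CREATE INDEX syntax accepted by both Informix and MySQL:

-- composite index supporting joins on customer_num and sorts on order_date
CREATE INDEX idx_orders_cust_date ON orders (customer_num, order_date);

-- after a large volume of updates, dropping and rebuilding the index can restore query speed
DROP INDEX idx_orders_cust_date;   -- Informix form; MySQL uses: DROP INDEX idx_orders_cust_date ON orders;
CREATE INDEX idx_orders_cust_date ON orders (customer_num, order_date);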

2. Avoid or simplify sorting

You should simplify or avoid repeated sorting of large tables. The optimizer can skip the sorting step when it is able to use an index to produce output in the required order automatically. The following factors get in the way of this:

The index does not include one or more of the columns to be sorted; the order of the columns in the GROUP BY or ORDER BY clause differs from the index order;

The sorted columns come from different tables.

To avoid unnecessary sorting, it is necessary to build indexes correctly and to merge database tables sensibly (although this may sometimes affect table normalization, it is worthwhile relative to the gain in efficiency). If sorting is unavoidable, try to simplify it, for example by narrowing the range of the sorted columns. A sketch follows.
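A minimal sketch of this point, assuming a hypothetical cust table with name and postcode columns: when the index column order matches the ORDER BY clause, the optimizer can read the rows in index order and skip the sorting step.

CREATE INDEX idx_cust_name ON cust (name);

-- the rows can come back in name order via the index, so no separate sort is needed
SELECT name, postcode
FROM cust
ORDER BY name;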

3. Eliminate sequential access to large table rows

In nested queries, sequential access to tables can have a fatal effect on query efficiency. For example, with a sequential access strategy, a query nested three levels deep, where each level examines 1,000 rows, will examine one billion rows of data in total. The primary way to avoid this is to index the join columns. For example, consider two tables: a student table (student ID, name, age, ...) and a course selection table (student ID, course number, grade). If the two tables are to be joined, an index should be built on the join field "student ID", as sketched below.
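A minimal sketch of the same idea, using hypothetical English names (student, course_selection, student_id) for the tables and join column described above:

-- index on the join column of the larger table
CREATE INDEX idx_sel_student ON course_selection (student_id);

-- with the index, the inner table is probed by key instead of being scanned
-- sequentially for every row of the outer table
SELECT student.name, course_selection.grade
FROM student, course_selection
WHERE student.student_id = course_selection.student_id;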

Unions can also be used to avoid sequential access. Although indexes exist on all of the queried columns, some forms of the WHERE clause force the optimizer to use sequential access. The following query forces a sequential scan of the Orders table:

SELECT * FROM Orders WHERE (customer_num=104 and order_num>1001) OR order_num=1008

Although indexes are built on customer_num and order_num, the optimizer still uses a sequential access path to scan the entire table for the statement above. Because this statement retrieves a disjoint set of rows, it should be rewritten as:

SELECT * FROM Orders WHERE customer_num=104 and order_num>1001

UNION

SELECT * FROM Orders WHERE order_num=1008

This allows the query to be processed using the index path.

4. Avoid correlated subqueries

If a column's label appears both in the main query and in the WHERE clause of a subquery, it is likely that the subquery must be re-executed every time the column's value in the main query changes. The deeper the nesting, the lower the efficiency, so subqueries should be avoided as much as possible. If a subquery is unavoidable, filter out as many rows as possible inside the subquery.
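A minimal sketch of such a rewrite, reusing hypothetical cust and orders tables in the spirit of the other examples in this article; the correlated subquery on top can often be replaced by the join below it:

-- correlated subquery: re-evaluated for every row of cust
SELECT customer_id, name
FROM cust
WHERE EXISTS
    (SELECT 1 FROM orders
     WHERE orders.customer_num = cust.customer_id
       AND orders.order_num > 1001);

-- join form returning the same customers, evaluated once;
-- DISTINCT removes duplicates when a customer has several matching orders
SELECT DISTINCT cust.customer_id, cust.name
FROM cust, orders
WHERE orders.customer_num = cust.customer_id
  AND orders.order_num > 1001;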

5. Avoid difficult regular expressions

The MATCHES and LIKE keywords support wildcard matching, which is technically called a regular expression. But this kind of matching is particularly time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE "98___"

Even if an index is built on the zipcode field, a sequential scan is used in this case. If the statement is changed to SELECT * FROM customer WHERE zipcode > "98000", the index is used when the query is executed, which obviously increases the speed significantly.

Also avoid substrings that do not start at the beginning of the field. For example, SELECT * FROM customer WHERE zipcode[2,3] > "80" uses a non-leading substring in the WHERE clause, so the statement cannot use an index. A sketch of how to check index usage follows.
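For reference, since the title of this article concerns MySQL: whether a predicate allows an index to be used can be checked with MySQL's EXPLAIN statement. A minimal sketch, assuming a hypothetical customer table with an indexed zipcode column; a pattern with a fixed leading prefix can use the index, while a leading wildcard cannot:

-- fixed leading prefix: the index on zipcode can be used
EXPLAIN SELECT * FROM customer WHERE zipcode LIKE '98%';

-- leading wildcard: forces a sequential scan of the whole table
EXPLAIN SELECT * FROM customer WHERE zipcode LIKE '%98';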

6. Using temporary tables to speed up queries

Sorting a subset of a table and creating a temporary table can sometimes speed up queries. It helps avoid multiple sort operations and otherwise simplifies the optimizer's work. For example:

SELECT cust.name, rcvbles.balance, ...other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
AND cust.postcode > "98000"
ORDER BY cust.name

If this query is to be executed more than once, all unpaid customers can instead be found once and placed in a temporary table, sorted by customer name:

SELECT cust.name, rcvbles.balance, ...other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
ORDER BY cust.name
INTO TEMP cust_with_balance

Then query the temporary table as follows:

SELECT * FROM cust_with_balance
WHERE postcode > "98000"

The temporary table has fewer rows than the main table, and its physical order is the required order, which reduces disk I/O, so the query workload can be drastically reduced.

Note: after a temporary table is created, it does not reflect subsequent changes to the main table. When data in the main table is modified frequently, be careful not to lose data.
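Since the title of this article concerns MySQL, here is a roughly equivalent sketch in MySQL syntax, reusing the hypothetical cust, rcvbles, and cust_with_balance names from above; CREATE TEMPORARY TABLE ... AS SELECT plays the role of Informix's INTO TEMP:

CREATE TEMPORARY TABLE cust_with_balance AS
SELECT cust.name, cust.postcode, rcvbles.balance
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
  AND rcvbles.balance > 0
ORDER BY cust.name;

SELECT * FROM cust_with_balance WHERE postcode > '98000';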