Mysql query on the sixth day 1, mysql query on the sixth day

Source: Internet
Author: User

Mysql query on the sixth day 1, mysql query on the sixth day

The general query process is from the client to the server, and there is a network between them. Parse on the server, generate the execution plan, and execute. And return it to the client. The execution will contain a large number of calls to the storage engine to retrieve data, as well as retrieval processing, such as sorting.
In general, the time is generally spent on network, CPU computing, statistical information generation, execution plan, lock wait, and I/O operations when the memory is insufficient.

Let's talk about two SQL statements used to view performance indicators.

SELECT @ profiling; SET profiling = 1; select count (*) from test; show profiles; // you can view the execution time of a statement, which gives you the most intuitive view of the SQL speed.

The second statement is:

explain select * from biz_pay_task where jd_order_id=42596246804;

The result is as follows:
+ -- + ----- + --- + -- + --- +
| Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ -- + ----- + --- + -- + --- +
| 1 | SIMPLE | biz_pay_task | ref | I _jd_order_id | 8 | const | 1 |
+ -- + ----- + --- + -- + --- +
The options are select_type, key, rows, and type.

  • Which index does the key use?
  • Possible number of queries required by rows mysql
  • Type corresponds to the type used by the query. For example, All indicates full table scan, ref indicates index scan, and range scan and unique index scan are also performed. It is best to reach the ref level.

Generally, you can consider the following aspects:

Whether unnecessary data is requested from the database
  • Instead of using LIMIT, I tried to deal with a large amount of data, only using the first few lines
  • When multiple tables are joined, all columns are returned, which has many duplicate columns.
  • By default, select * is used to return only required columns.
Scanned additional records?

This is not a good decision. Generally, the number of scans can be reduced after appropriate indexes are added. However, grouping statistics SQL statements cannot be indexed. Generally, we can perform the following optimization:

  • Overwrite Indexes
  • Table Structure of the table library, using a separate summary table
  • Rewrite complex queries so that the Mysql optimizer can execute the query in a better way.
Refactored query method: one complex query or multiple simple queries

This amplification of conflicts involves multiple simple SQL statements, which are then computed in the Code, or all calculations are completed using the stored procedure.
In the past, we thought that the database query computation was fast, and the communication overhead with the client was very high. However, the availability and reusability of a complex query may become increasingly important.

Split Query

Each SQL statement has the same functions, but only completes a small part.
The most typical method is paging, that is, using the limit keyword, which can be checked by page or deleted by page.
Especially Delete, because it will occupy transaction logs and locks, it is more necessary to use paging. We can use the following pseudo code to indicate paging deletion:

rows_affected = 0;do{    rows_affected = do_query(DELETE FROM test LIMIT 10000)} while rows_affected > 0;
Decomposition association query

Splits a join statement into multiple queries, for example:

Select * from Student s JOIN grade g on s. gradeId = g. id where s. name = "James"; // It can be rewritten to: SELECT * FROM Student s where s. name = "Zhang San"; SELECT * FROM Grade g where g. id in (# result #);

This looks exactly the same and increases the number of connections. But it can bring the following benefits:

  • Higher cache efficiency. If the first query has been executed, you can skip it.
  • Reduce lock Competition
  • Association at the application layer allows you to split databases for better scalability.
  • Using IN () instead of association query is more efficient than association query.
  • You can reuse the first query result at the application layer, such as cache.
Query execution Basics

  • The client sends a request to the server.
  • The server first queries the cache and returns the result if hit occurs.
  • If parsing and preprocessing are not hit, the optimizer will generate the execution plan.
  • Call APIs to perform queries according to the execution plan
  • Return the result to the client
Communication between the Mysql client and the server

The half-duplex communication method determines that the traffic cannot be restricted. After a request is sent, the request can only wait for the result.
The following parameters can be used to set the size of the received package, which is too small and may cause request failure.

Show VARIABLES like '% max_allowed_packet %'; // my. cnfmax_allowed_packet = 20 M // command set global max_allowed_packet = 2*1024*1024*10

Generally, the mysql client package obtains all data returned by SQL from the mysql server, caches the data, and then operates on the cached data. The problem is that memory overflow may occur if the result set is too high.
JDBC can use the following method to disable this return method:

Stmt = (com. mysql. jdbc. statement) con. createStatement (); stmt. setFetchSize (1); // stmt is returned when the stream is opened. enableStreamingResults (); // methods similar to the mysql mechanism include setLocalInfileInputStream, which can be quickly inserted together with LOAD DATA LOCAL INFILE

Connection status

SHOW FULL PROCESSLIST;

+ --- + -- + -------- + ------- + --- + -- + --- + -------- +
| Id | User | Host | db | Command | Time | State | Info |
+ --- + -- + -------- + ------- + --- + -- + --- + -------- +
| 1897957 | root | 192.168.147.34: 60520 | biz | Sleep | 172 | NULL |
The State shows the status of the connection thread.
Sleep is a thread waiting for the client to send a new request.
Query is being queried, and so on.

Query Cache

It is implemented through case-sensitive Hash Lookup. Therefore, it cannot be hit once the SQL statement is changed.

Lexical Analysis and preprocessing

Analysis name resolution tree, in which a syntax error is returned directly

Query Optimizer

Generate an execution plan.
Each SQL statement can have multiple execution plans. mysql uses the pre-judgment method to estimate the minimum cost plan. The following statement shows the mysql estimation result:

SELECT SQL_NO_CACHE COUNT(*) FROM biz_pay_task;SHOW STATUS LIKE 'Last_query_cost';

The returned result is the number of pages that mysql considers to be randomly searched to complete the task.

Mysql usually has the following optimization methods:

  • Redefinition of the sequence of joined tables
  • Equivalent Conversion rules: remove the constant, merge AND compare, for example, 1 = 1 AND a> 5 will be converted to a> 5
  • Optimize COUNT (), MIN (), MAX ()
    MIN () and MAX () correspond to the beginning and end of B-Tree indexes respectively, which is basically equivalent to the efficiency of constant reference.
EXPLAIN select MAX (jd_order_id) FROM biz_pay_task; // result: Extra: Select tables optimized away

Indicates that this optimization is enabled.
COUNT (), which must be supported by the storage engine. For example, some storage engines can directly return this variable without removing the need for de-counting.

  • Predict and convert to a constant expression
// Find the extension field of an Order: The third-party order number. Explain select B. id, o. third_order_id FROM biz_pay_task B inner join order_snap o ON o. virtual_order_id = B. id WHERE B. id = 1;

Result:
+ -- + ----- + --- + -- + ----- +
| Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ -- + ----- + --- + -- + ----- +
| 1 | SIMPLE | B | const | PRIMARY | 8 | const | 1 | Using index |
| 1 | SIMPLE | o | ALL | NULL | 61 | Using where |
+ -- + ----- + --- + -- + ----- +
It can be seen that the query is converted to two queries. The first query is the primary key, and the second query is a where query. The second query is replaced by o. virtual_order_id = 1. Use a constant value.

In addition, the MIN () mentioned above should also belong to this situation.

  • Covering Index Scanning
  • Early query termination
    For example, if it is Limit, it will not be searched down at the specified position.
    For example, if a non-existent data exists in the next ten days, the data will be directly returned from the index and will not be queried.
Explain select B. id FROM biz_pay_task B WHERE B. id =-1; // Extra is: Impossible WHERE noticed after reading const tables

For example, not exist, left join, and lift join:
Query orders with gifts
SELECT B. order_id FROM order LEFT JOIN order_sku o ON order. order_id = o. order_id WHERE o. skuName IS NULL;
This query finds that the first skuName is NULL and enters the next order, instead of scanning all.

In fact, it is a bit similar to the return, break, and continue Syntax of Java.

  • IN mysql sorts its content and uses the binary search method, which is better than other databases. Other databases are basically equivalent to multiple OR.
Join Query

In Mysql, the associated query operation is very simple, that is, nested loop. That is, first traverse the qualified items on the left, and then query the content in the right table based on the condition of each left table.

This method is also used for including subqueries.

Mysql converts an SQL statement into an execution tree during execution, which is a left-side depth-first number, for example:

Associated Query Optimization
Mainly for inline operations. Because the order of inline tables is not important in many cases, mysql may change the traversal order to traverse tables with few data records first.
Example:

Explain select v. id, o. third_order_id FROM virtual_order v inner join order_snap o ON v. id = o. virtual_order_id; // This SQL statement has the same effect as order_snap o INNER JOIN virtual_order v. During execution, you can see:

+ -- + ----- + --- + ----------- + -- + ----- +
| Id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ -- + ----- + --- + ----------- + -- + ----- +
| 1 | easy | o | ALL | NULL | 344 |
| 1 | SIMPLE | v | eq_ref | PRIMARY | 8 | virtual_biz.o.virtual_order_id | 1 | Using index |
+ -- + ----- + --- + ----------- + -- + ----- +
We can see the execution of the o table first, because the mysql optimization point is to select the driver table, it will simply select the driver table with less data.

However, if many tables are associated, the combination may be many, so it may be converted into another method for selection. We do not recommend that you associate multiple tables.

In addition, we can specify the connection sequence and select the driver table. Use the straight join keyword. In this way, we can ensure that the driver table is what we want. For example, we should try to make the sorting behavior in the driver table, which will make the query faster.

Sorting Optimization

Sort by index and use quick sorting. If the memory is not enough, data is segmented first, and then each part is sorted separately. Finally, merge.

There are also two sorting algorithms
If max_length_for_sort_data is not exceeded, a single transmission is used. Otherwise, the data is transmitted twice.
Only available in a new version during single transmission. All columns are loaded for sorting, which reduces I/O and increases memory usage.
Two transfers, loading, sorting, and sorting columns for the first time before obtaining other data. This reduces the occupied space, but adds a lot of random I/O.

When the associated Query Needs to be sorted, if it is in the driver table, it will be sorted first.
If not, the join result is calculated first, then put into the temporary table, and then sorted.

Query execution engine

The generated execution plan is a data structure.
The execution process calls the storage engine many times through APIS.

Return the result to the client

When the result set is returned, the system determines whether the data can be cached. If yes, the data is cached first.
The results are returned incrementally. Therefore, you can set whether to receive the results incrementally when calling the API.

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.