Command methods for accelerating MySQL SQL statement optimization

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Introduction

Optimizing SQL is a common task of DBA. How to optimize a statement efficiently and quickly is a common problem for every DBA. In my daily optimization work, I found that many operations are essential in the optimization process. However, repetitive execution of these steps will consume a lot of DBA effort. As a result, I developed my own small tools to improve the efficiency of optimization.

What language should I choose to develop tools?

For a DBA, it is necessary to master a language to work with him. Compared with the simplicity of shell and the elegance of perl, Python is a rigorous high-level language. It has many advantages, such as easy-to-use, simple syntax, rich extension, and cross-platform. Many people refer to it as a "glue" language. Through a large number of class libraries and modules, they can quickly build the tools they need.

As a result, this little tool became the first job I learned Python, and I called it the "MySQL statement optimization auxiliary tool ". Since then, I have fallen in love with Python and developed many database-related gadgets. I will have the opportunity to introduce them to you later.

I. Optimization Methods and steps

Before using the tool, we will first describe the common methods, methods, and precautions for MySQL statement optimization. This is what we need to know in daily manual optimization.

1. Execution plan-EXPLAIN command

Execution plan is the main entry point for statement optimization. You can understand the statement execution process by interpreting the execution plan. In terms of execution plan generation, MySQL is obviously different from Oracle. It does not cache the execution plan and runs "hard parsing" every time ". The EXPLAIN command is used to view the execution plan.

Basic usage

EXPLAIN QUERY

When the keyword "EXPLAIN" is used before a Select statement, MySQL explains how to run the Select statement, and displays the table connection and connection sequence.

EXPLAIN EXTENDED QUERY

When the EXTENDED keyword is used, EXPLAIN generates additional information, which can be viewed using show warnings. This information shows the optimizer limits the tables and column names in the SELECT statement, what the SELECT statement looks like after rewriting and executing the optimization rule, and may include other annotations of the optimization process. MySQL and later versions can be used. In MySQL, an additional filtering column (filtered) is added ).

EXPLAIN PARTITIONS QUERY

Displays the data shards to be accessed. It can only be used in MySQL5.1 and later versions.

Explain format = JSON (new in 5.6)

The execution plan is displayed in another format. You can see information such as the association mode between tables.

Output field

The following describes the meaning of the fields output by EXPLAIN, and learns how to determine an execution plan.

The serial number queried in the execution plan selected by MySQL. If there are no subqueries in the statement, there is only one SELECT in the output, so that each row will display a 1 in this column. If subqueries, set operations, and temporary tables are used in a statement, the ID column is very complex. In the preceding example, the WHERE clause uses a subquery, and the row id = 2 indicates an associated subquery.

Select_type

The query type used by the statement. Is it a simple SELECT or a complex SELECT (if it is the latter, it shows which complex type it belongs ). The following tag types are commonly used.

DEPENDENT SUBQUERY

The first SELECT in the subquery layer depends on the result set of the external query.
DEPENDENT UNION

The UNION in the subquery, which is all the SELECT statements after the second SELECT in the UNION statement, also depends on the result set of the external query.
PRIMARY

The outermost layer of the subquery. Note that it is not a primary key query.
SIMPLE

Queries other than subqueries or UNION.
SUBQUERY

The first SELECT statement of the subquery's inner-layer query. The results do not depend on the external query result set.
UNCACHEABLE SUBQUERY

Subqueries that cannot be cached in the result set.
UNION

All the SELECT statements after the second SELECT statement in the UNION statement, and the first SELECT statement is PRIMARY.
UNION RESULT

UNION. Obtain the SELECT statement of the result from the temporary UNION table.
DERIVED

Derivative table query (subquery in the FROM clause ). MySQL recursively executes these subqueries and places the results in the temporary table. Internally, the server will reference it as a "derived table", because the temporary table is derived from the subquery.

Table

The name of the table in the database accessed in this step or an individual table specified by the SQL statement. This value may be the table name, table alias, or an identifier of a temporary table generated for the query, such as a derived table, subquery, or set.

Type

Table access method. The following lists various types of table connections, from the best to the worst.

System

The system table has only one row of records. This is a special case of the const table connection type.
Const

Read constants. A maximum of one matching record can be found. Since there is only one record, the field value recorded by this row in the optimizer can be treated as a constant value. Const is used to compare a fixed value with a primary key or UNIQUE index.
Eq_ref

There can be at most one matching result, which is generally accessed through the primary key or unique key index. A row of records is read from the table and associated with the records read from the previous table. Different from the const type, this is the best connection type. It is used for connecting all parts of the index and the index is of the primary key or UNIQUE type. Eq_ref can be used to retrieve fields when "=" is compared. The comparative values can be fixed values or expressions, and the fields in the table can be used in the expression. They are ready before reading the table.
Ref

The query that drives table index reference in the JOIN statement. All records that match the search value in this table will be taken out and used together with the records obtained from the previous table. Ref is the leftmost prefix used by the Connection program to use the KEY, or the KEY is not a primary key or a UNIQUE index (in other words, the connection program cannot retrieve only one record based on the KEY value. When only a few matching records are queried based on the key value, this is a good connection type. Ref can also be used to compare fields by using the "=" operator.
Ref_or_null

The only difference from ref is that a query with a null value is added in addition to the query referenced by the index. This connection type is similar to ref. The difference is that MySQL will search for records containing NULL values during retrieval. This connection type is optimized from MySQL 4.1.1 and is often used for subqueries.
Index_merge

Two (or more) indexes are used in the query, and then the index results are merged (merge), and the table data is read. This connection type means that the Index Merge optimization method is used.
Unique_subquery

The combination of returned fields in a subquery is a primary key or a unique constraint.
Index_subquery

The returned result Field combination in a subquery is an index (or an index combination), but it is not a primary key or a unique index. The connection type is similar to unique_subquery. It uses a subquery to replace IN, but it is used when the subquery does not have a unique index.
Range

Index range scan. Only records within a given range are retrieved and an index is used to obtain a record.
Index

Full Index scan. The connection type is the same as that of ALL. The difference is that it only scans the index tree. It is usually faster than ALL because the index file is usually smaller than the data file. MySQL uses this connection type when the queried field knowledge is a separate part of the index.
Fulltext

Full-text index scanning.
All

Full table scan.

Possible_keys

This field indicates which index MySQL may use when searching table records. If no index is available, null is displayed.

The index selected by the query optimizer from possible_keys. The key field shows the indexes actually used by MySQL. When no index is used, the value of this field is NULL.

Key_len

The length of the selected index key. The key_len field shows the index length used by MySQL. If the value of the key field is NULL, the index length is NULL.

The list is filtered by a constant or a field in a table. The ref field shows which fields or constants are used to work with keys to query records from the table.

Rows

This field shows the number of records in the result set estimated by the query optimizer through the statistical information collected by the system.

Extra

This field displays the additional MySQL information in the query.

Filtered

This column type is newly added in MySQL5.1 and will only appear when you use explain extended. It displays a pessimistic estimate of the percentage of records that meet a certain condition (WHERE clause or join condition) in the table.

SQL rewriting

In addition to the execution plan, the EXPLAIN statement can also display SQL rewriting. SQL rewriting means that MySQL will rewrite the SQL statement based on some principles before optimizing the SQL statement to facilitate the later Optimizer to optimize and generate a better execution plan. This function is used in combination with explain extended + show warnings. The following is an example.

IN the preceding example, we can see that the IN subquery IN the original statement is rewritten as the join method between tables.

2. Statistics

Viewing statistics is also an essential step in optimization statements. Statistics help you quickly understand the storage features of objects. The following describes two types of statistical information: tables and indexes.

TABLE Statistics-SHOW TABLE STATUS

Name: table Name
Engine: Storage Engine type of a table (ISAM, MyISAM, or InnoDB)
Row_format: row storage format (Fixed-Fixed, Dynamic-Dynamic or Compressed-Compressed)
Rows: number of Rows. In some storage engines, such as MyISAM and ISAM, they store precise records. However, in other storage engines, it may only be an approximate value.
Avg_row_length: Average length of a row.
Data_length: the length of the data file.
Max_data_length: the maximum length of the data file.
Index_length: the length of the index file.
Data_free: allocated but no bytes are used.
Auto_increment: next autoincrement (automatically add 1) value.
Create_time: the time when the table is created.
Update_time: The last update time of the data file.
Check_time: The last time the table was run. Update after executing the mysqlcheck command, which is only valid for MyISAM.
Create_options: Additional options left for create table.
Comment: Comment used when creating a table (or why MySQL cannot access some table information ).
Version: the Version number of the '. Frm' file of the data table.
Collation: the character set of the table and the corrected character set.
Checksum: real-time verification value (if any ).

3. INDEX statistics-SHOW INDEX

Table: Table name.
Non_unique: 0. If the index cannot contain duplicates.
Key_name: Index Name
Seq_in_index: column sequence number in the index, starting from 1.
Column_name: column name.
Collation: How columns are sorted in the index. In MySQL, this value can be A (ascending) or NULL (unordered ).
Cardinality: the number of unique values in the index.
Sub_part: number of index characters if only some columns are indexed. When the entire field is indexed, its value is NULL.
Packed: indicates how the key value is compressed, and NULL indicates no compression.
Null: when the record containing NULL in a field is YES, its value is; otherwise, it is ".
Index_type: the index algorithm used (including BTREE, FULLTEXT, HASH, and RTREE ).
Comment: remarks.
System parameters: System parameters also affect the statement execution efficiency. To view system parameters, run the show variables command.

Parameter description

There are many system parameters, which are described below.

Sort_buffer_size

The size of the sorting area. Its size directly affects the algorithm used for sorting. If the sorting in the system is large, the memory is sufficient, and the concurrency is not large, you can add this parameter as appropriate. This parameter is for a single Thead.

Join_buffer_size

The size of the memory area used by the Join operation. Only Join Buffer is used when Join is ALL, index, range, or index_merge. If there are many join statements, you can increase join_buffer_size as appropriate. Note that this value is for a single Thread. Each Thread creates an independent Buffer instead of the Buffer shared by the entire system. Do not set it too large to cause insufficient system memory.

Tmp_table_size

If the temporary table in the memory exceeds this value, MySQL automatically converts it to the MyISAM table on the hard disk. If you execute many advanced group by queries with a large amount of memory, you can increase the value of tmp_table_size.

Read_buffer_size

The buffer size that can be used by the read query operation. This parameter is for a single Thead.

4. Optimizer switch

In MySQL, some parameters can be used to control the optimizer behavior.

Parameter description

Optimizer_search_depth

This parameter controls the limit of the optimizer when the execution plan is exhausted. If the query is in the "statistics" status for a long time, you can reduce this parameter.

Optimizer_prune_level

It is enabled by default, which enables the optimizer to decide whether to skip certain execution plans based on the number of rows to be scanned.

Optimizer_switch

This variable contains some flags for enabling/disabling Optimizer features.

Example-intervene in optimizer behavior (ICP feature)

By default, the ICP feature is enabled. Check the optimizer behavior.

The secondary index-based filtering query uses the ICP feature, which is visible from the "Using index condition" in Extra. What if the optimizer is used to intervene in the optimizer behavior?

As shown in Extra, the ICP feature is disabled.

5. System STATUS)

MySQL also has some built-in states. These state variables can also reflect some situations of statement execution to facilitate problem locating. For manual execution, run the show status command before and after the statement to view the STATUS changes. Of course, because there are many status variables, the comparison is not very convenient. The gadgets I will introduce later can solve this problem.

Status variable

There are many status variables. Here are a few.

Sort_merge_passes

The number of merge executed by the sorting algorithm. If the value of this variable is large, you should consider increasing the value of the sort_buffer_size system variable.

Sort_range

The number of sorting tasks in the specified range.

Sort_rows

Number of sorted rows.

Sort_scan

The number of sorting completed by scanning the table.

Handler_read_first

The number of times the first index is read. The number of times the index header is read. If this value is high, it indicates that there are many full index scans.

Handler_read_key

The number of requests that read a row based on the key. If the index is high, the query and table index are correct.

Handler_read_next

Number of requests read from the next row in the key-ordered order. This value increases if you use range constraints or if you perform an index scan to query index columns.

Handler_read_prev

The number of requests read from the previous row in a key-ordered manner.

Handler_read_rnd

The number of requests reading a row at a fixed position. If you execute a large number of queries and want to sort the results, this value is high. A large number of queries or connections that require MySQL to scan the entire table are not correctly used.

Handler_read_rnd_next

The number of requests read from the next row in the data file. If a large number of tables are being scanned, the value is high. It usually indicates that the table index is incorrect or the index is not used for the written query.

6. SQL performance analyzer (Query Profiler)

MySQL Query Profiler is a very convenient Query diagnostic and analysis tool that can be used to obtain the consumption of multiple resources in a Query throughout the execution process, such as CPU, IO, IPC, and SWAP, as well as page faults and context switche. At the same time, you can obtain the position of the functions called by MySQL in the source file during the Query execution.

Usage

Enable

Mysql> select @ profiling;

Mysql> set profiling = 1;

By default, the value of profiling is 0, indicating that MySQL SQL Profiler is OFF. After enabling SQL performance analyzer, the value of profiling is 1.

Execute SQL statements

Mysql> select count (*) from t1;

Obtain summary information

Run the "show profile" command to obtain the profile information of multiple queries saved in the current system.

Mysql> show profiles;

+ ---- + -------- +

| Query_ID | Duration | Query |

+ ---- + -------- +

| 1 | 0.00039300 | select count (*) from t1 |

+ ---- + -------- +

Obtain detailed profile information for a single Query

After obtaining the summary information, you can obtain the detailed profile information during the execution of a Query based on the Query_ID of the summary information.

Mysql> show profile for query 1;

Mysql> show profile cpu, block io for query 1;

II. Tool description

I have discussed a variety of methods before, which are helpful for optimizing SQL statements. The following small tool can automatically call the command to push the above content to the DBA at one time, greatly accelerating the optimization process.

1. Prerequisites

Module-MySQLDB
Module-sqlparse
Python version = 2.7.3 (2.6.x should be okay, and 3.x is not tested)

2. Call method

Python mysql_tuning.py-p tuning_ SQL .ini-s 'Select xxx'

Parameter description

-P: specifies the configuration file name.

-S specifies the SQL statement

3. Configuration file

There are two pieces of information: [database] description of database connection information, [option] running configuration information.

[Database]

Server_ip = 127.0.0.1

Db_user = testuser

Db_pwd = testpwd

Db_name = test

[Option]

Sys_parm = ON // whether to display system parameters

SQL _plan = ON // whether to display the execution plan

Obj_stat = ON // whether to display statistics of related objects (tables and indexes)

Ses_status = ON // whether to display the status information before and after running (the SQL statement is executed after activation)

SQL _profile = ON // whether to display PROFILE trace information (the SQL statement is executed after activation)

4. Output description

Title

Contains the address information and data version information of the running database.

Original SQL

The user executes the entered SQL, which is mainly used for subsequent comparison of SQL rewriting. The statement is formatted.

System parameters

The script selection displays some SQL performance-related parameters. This part is written to the code, and the script needs to be modified if it needs to be extended.

Optimizer switch

The following are some parameters related to the optimizer. By adjusting these parameters, you can manually intervene in the optimizer behavior.

Execution plan

Is the output result of calling explain extended. If the result is too long, a serial problem may occur (not resolved at the moment ).

SQL statement rewritten by the Optimizer

Here, you can determine whether the Optimizer has optimized the SQL statements (such as the processing of subqueries ).

Statistics

The statistical information of all involved tables and indexes in the SQL statement is displayed here.

Running status information

Compare the show status before and after execution at the session level, and display the changed part. Note that the SELECT method is used to collect status data, which may cause errors of some indicators (such as Com_select ).

PROFILE details

Detailed information obtained by calling show profile.

PROFILE Summary

According to the resource consumption of PROFILE, the consumption comparison of different stages (top n) is displayed, and the bottleneck is displayed intuitively.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More