Introduction
Optimizing SQL is a common task of DBA. How to optimize a statement efficiently and quickly is a common problem for every DBA. In my daily optimization work, I found that many operations are essential in the optimization process. However, repetitive execution of these steps will consume a lot of DBA effort. As a result, I developed my own small tools to improve the efficiency of optimization.
What language should I choose to develop tools?
For a DBA, it is necessary to master a language to work with him. Compared with the simplicity of shell and the elegance of perl, Python is a rigorous high-level language. It has many advantages, such as easy-to-use, simple syntax, rich extension, and cross-platform. Many people refer to it as a "glue" language. Through a large number of class libraries and modules, they can quickly build the tools they need.
As a result, this little tool became the first job I learned Python, and I called it the "MySQL statement optimization auxiliary tool ". Since then, I have fallen in love with Python and developed many database-related gadgets. I will have the opportunity to introduce them to you later.
I. Optimization Methods and steps
Before using the tool, we will first describe the common methods, methods, and precautions for MySQL statement optimization. This is what we need to know in daily manual optimization.
1. Execution plan-EXPLAIN command
Execution plan is the main entry point for statement optimization. You can understand the statement execution process by interpreting the execution plan. In terms of execution plan generation, MySQL is obviously different from Oracle. It does not cache the execution plan and runs "hard parsing" every time ". The EXPLAIN command is used to view the execution plan.
Basic usage
When the keyword "EXPLAIN" is used before a Select statement, MySQL explains how to run the Select statement, and displays the table connection and connection sequence.
When the EXTENDED keyword is used, EXPLAIN generates additional information, which can be viewed using show warnings. This information shows the optimizer limits the tables and column names in the SELECT statement, what the SELECT statement looks like after rewriting and executing the optimization rule, and may include other annotations of the optimization process. MySQL and later versions can be used. In MySQL, an additional filtering column (filtered) is added ).
Displays the data shards to be accessed. It can only be used in MySQL5.1 and later versions.
The execution plan is displayed in another format. You can see information such as the association mode between tables.
Output field
The following describes the meaning of the fields output by EXPLAIN, and learns how to determine an execution plan.
The serial number queried in the execution plan selected by MySQL. If there are no subqueries in the statement, there is only one SELECT in the output, so that each row will display a 1 in this column. If subqueries, set operations, and temporary tables are used in a statement, the ID column is very complex. In the preceding example, the WHERE clause uses a subquery, and the row id = 2 indicates an associated subquery.
The query type used by the statement. Is it a simple SELECT or a complex SELECT (if it is the latter, it shows which complex type it belongs ). The following tag types are commonly used.
-
DEPENDENT SUBQUERY
The first SELECT in the subquery layer depends on the result set of the external query.
-
DEPENDENT UNION
The UNION in the subquery, which is all the SELECT statements after the second SELECT in the UNION statement, also depends on the result set of the external query.
-
PRIMARY
The outermost layer of the subquery. Note that it is not a primary key query.
-
SIMPLE
Queries other than subqueries or UNION.
-
SUBQUERY
The first SELECT statement of the subquery's inner-layer query. The results do not depend on the external query result set.
-
UNCACHEABLE SUBQUERY
Subqueries that cannot be cached in the result set.
-
UNION
All the SELECT statements after the second SELECT statement in the UNION statement, and the first SELECT statement is PRIMARY.
-
UNION RESULT
UNION. Obtain the SELECT statement of the result from the temporary UNION table.
-
DERIVED
Derivative table query (subquery in the FROM clause ). MySQL recursively executes these subqueries and places the results in the temporary table. Internally, the server will reference it as a "derived table", because the temporary table is derived from the subquery.
The name of the table in the database accessed in this step or an individual table specified by the SQL statement. This value may be the table name, table alias, or an identifier of a temporary table generated for the query, such as a derived table, subquery, or set.
Table access method. The following lists various types of table connections, from the best to the worst.
-
System
The system table has only one row of records. This is a special case of the const table connection type.
-
Const
Read constants. A maximum of one matching record can be found. Since there is only one record, the field value recorded by this row in the optimizer can be treated as a constant value. Const is used to compare a fixed value with a primary key or UNIQUE index.
-
Eq_ref
There can be at most one matching result, which is generally accessed through the primary key or unique key index. A row of records is read from the table and associated with the records read from the previous table. Different from the const type, this is the best connection type. It is used for connecting all parts of the index and the index is of the primary key or UNIQUE type. Eq_ref can be used to retrieve fields when "=" is compared. The comparative values can be fixed values or expressions, and the fields in the table can be used in the expression. They are ready before reading the table.
-
Ref
The query that drives table index reference in the JOIN statement. All records that match the search value in this table will be taken out and used together with the records obtained from the previous table. Ref is the leftmost prefix used by the Connection program to use the KEY, or the KEY is not a primary key or a UNIQUE index (in other words, the connection program cannot retrieve only one record based on the KEY value. When only a few matching records are queried based on the key value, this is a good connection type. Ref can also be used to compare fields by using the "=" operator.
-
Ref_or_null
The only difference from ref is that a query with a null value is added in addition to the query referenced by the index. This connection type is similar to ref. The difference is that MySQL will search for records containing NULL values during retrieval. This connection type is optimized from MySQL 4.1.1 and is often used for subqueries.
-
Index_merge
Two (or more) indexes are used in the query, and then the index results are merged (merge), and the table data is read. This connection type means that the Index Merge optimization method is used.
-
Unique_subquery
The combination of returned fields in a subquery is a primary key or a unique constraint.
-
Index_subquery
The returned result Field combination in a subquery is an index (or an index combination), but it is not a primary key or a unique index. The connection type is similar to unique_subquery. It uses a subquery to replace IN, but it is used when the subquery does not have a unique index.
-
Range
Index range scan. Only records within a given range are retrieved and an index is used to obtain a record.
-
Index
Full Index scan. The connection type is the same as that of ALL. The difference is that it only scans the index tree. It is usually faster than ALL because the index file is usually smaller than the data file. MySQL uses this connection type when the queried field knowledge is a separate part of the index.
-
Fulltext
Full-text index scanning.
-
All
Full table scan.
This field indicates which index MySQL may use when searching table records. If no index is available, null is displayed.
The index selected by the query optimizer from possible_keys. The key field shows the indexes actually used by MySQL. When no index is used, the value of this field is NULL.
The length of the selected index key. The key_len field shows the index length used by MySQL. If the value of the key field is NULL, the index length is NULL.
The list is filtered by a constant or a field in a table. The ref field shows which fields or constants are used to work with keys to query records from the table.
This field shows the number of records in the result set estimated by the query optimizer through the statistical information collected by the system.
This field displays the additional MySQL information in the query.
This column type is newly added in MySQL5.1 and will only appear when you use explain extended. It displays a pessimistic estimate of the percentage of records that meet a certain condition (WHERE clause or join condition) in the table.
SQL rewriting
In addition to the execution plan, the EXPLAIN statement can also display SQL rewriting. SQL rewriting means that MySQL will rewrite the SQL statement based on some principles before optimizing the SQL statement to facilitate the later Optimizer to optimize and generate a better execution plan. This function is used in combination with explain extended + show warnings. The following is an example.
IN the preceding example, we can see that the IN subquery IN the original statement is rewritten as the join method between tables.
2. Statistics
Viewing statistics is also an essential step in optimization statements. Statistics help you quickly understand the storage features of objects. The following describes two types of statistical information: tables and indexes.
TABLE Statistics-SHOW TABLE STATUS
-
Name: table Name
-
Engine: Storage Engine type of a table (ISAM, MyISAM, or InnoDB)
-
Row_format: row storage format (Fixed-Fixed, Dynamic-Dynamic or Compressed-Compressed)
-
Rows: number of Rows. In some storage engines, such as MyISAM and ISAM, they store precise records. However, in other storage engines, it may only be an approximate value.
-
Avg_row_length: Average length of a row.
-
Data_length: the length of the data file.
-
Max_data_length: the maximum length of the data file.
-
Index_length: the length of the index file.
-
Data_free: allocated but no bytes are used.
-
Auto_increment: next autoincrement (automatically add 1) value.
-
Create_time: the time when the table is created.
-
Update_time: The last update time of the data file.
-
Check_time: The last time the table was run. Update after executing the mysqlcheck command, which is only valid for MyISAM.
-
Create_options: Additional options left for create table.
-
Comment: Comment used when creating a table (or why MySQL cannot access some table information ).
-
Version: the Version number of the '. Frm' file of the data table.
-
Collation: the character set of the table and the corrected character set.
-
Checksum: real-time verification value (if any ).
3. INDEX statistics-SHOW INDEX
-
Table: Table name.
-
Non_unique: 0. If the index cannot contain duplicates.
-
Key_name: Index Name
-
Seq_in_index: column sequence number in the index, starting from 1.
-
Column_name: column name.
-
Collation: How columns are sorted in the index. In MySQL, this value can be A (ascending) or NULL (unordered ).
-
Cardinality: the number of unique values in the index.
-
Sub_part: number of index characters if only some columns are indexed. When the entire field is indexed, its value is NULL.
-
Packed: indicates how the key value is compressed, and NULL indicates no compression.
-
Null: when the record containing NULL in a field is YES, its value is; otherwise, it is ".
-
Index_type: the index algorithm used (including BTREE, FULLTEXT, HASH, and RTREE ).
-
Comment: remarks.
-
System parameters: System parameters also affect the statement execution efficiency. To view system parameters, run the show variables command.
Parameter description
There are many system parameters, which are described below.
The size of the sorting area. Its size directly affects the algorithm used for sorting. If the sorting in the system is large, the memory is sufficient, and the concurrency is not large, you can add this parameter as appropriate. This parameter is for a single Thead.
The size of the memory area used by the Join operation. Only Join Buffer is used when Join is ALL, index, range, or index_merge. If there are many join statements, you can increase join_buffer_size as appropriate. Note that this value is for a single Thread. Each Thread creates an independent Buffer instead of the Buffer shared by the entire system. Do not set it too large to cause insufficient system memory.
If the temporary table in the memory exceeds this value, MySQL automatically converts it to the MyISAM table on the hard disk. If you execute many advanced group by queries with a large amount of memory, you can increase the value of tmp_table_size.
The buffer size that can be used by the read query operation. This parameter is for a single Thead.
4. Optimizer switch
In MySQL, some parameters can be used to control the optimizer behavior.
Parameter description
This parameter controls the limit of the optimizer when the execution plan is exhausted. If the query is in the "statistics" status for a long time, you can reduce this parameter.
It is enabled by default, which enables the optimizer to decide whether to skip certain execution plans based on the number of rows to be scanned.
This variable contains some flags for enabling/disabling Optimizer features.
Example-intervene in optimizer behavior (ICP feature)
By default, the ICP feature is enabled. Check the optimizer behavior.
The secondary index-based filtering query uses the ICP feature, which is visible from the "Using index condition" in Extra. What if the optimizer is used to intervene in the optimizer behavior?
As shown in Extra, the ICP feature is disabled.
5. System STATUS)
MySQL also has some built-in states. These state variables can also reflect some situations of statement execution to facilitate problem locating. For manual execution, run the show status command before and after the statement to view the STATUS changes. Of course, because there are many status variables, the comparison is not very convenient. The gadgets I will introduce later can solve this problem.
Status variable
There are many status variables. Here are a few.
The number of merge executed by the sorting algorithm. If the value of this variable is large, you should consider increasing the value of the sort_buffer_size system variable.
The number of sorting tasks in the specified range.
Number of sorted rows.
The number of sorting completed by scanning the table.
The number of times the first index is read. The number of times the index header is read. If this value is high, it indicates that there are many full index scans.
The number of requests that read a row based on the key. If the index is high, the query and table index are correct.
Number of requests read from the next row in the key-ordered order. This value increases if you use range constraints or if you perform an index scan to query index columns.
The number of requests read from the previous row in a key-ordered manner.
The number of requests reading a row at a fixed position. If you execute a large number of queries and want to sort the results, this value is high. A large number of queries or connections that require MySQL to scan the entire table are not correctly used.
The number of requests read from the next row in the data file. If a large number of tables are being scanned, the value is high. It usually indicates that the table index is incorrect or the index is not used for the written query.
6. SQL performance analyzer (Query Profiler)
MySQL Query Profiler is a very convenient Query diagnostic and analysis tool that can be used to obtain the consumption of multiple resources in a Query throughout the execution process, such as CPU, IO, IPC, and SWAP, as well as page faults and context switche. At the same time, you can obtain the position of the functions called by MySQL in the source file during the Query execution.
Usage
Mysql> select @ profiling;
Mysql> set profiling = 1;
By default, the value of profiling is 0, indicating that MySQL SQL Profiler is OFF. After enabling SQL performance analyzer, the value of profiling is 1.
Mysql> select count (*) from t1;
Run the "show profile" command to obtain the profile information of multiple queries saved in the current system.
Mysql> show profiles;
+ ---- + -------- +
| Query_ID | Duration | Query |
+ ---- + -------- +
| 1 | 0.00039300 | select count (*) from t1 |
+ ---- + -------- +
After obtaining the summary information, you can obtain the detailed profile information during the execution of a Query based on the Query_ID of the summary information.
Mysql> show profile for query 1;
Mysql> show profile cpu, block io for query 1;
II. Tool description
I have discussed a variety of methods before, which are helpful for optimizing SQL statements. The following small tool can automatically call the command to push the above content to the DBA at one time, greatly accelerating the optimization process.
1. Prerequisites
2. Call method
Python mysql_tuning.py-p tuning_ SQL .ini-s 'Select xxx'
Parameter description
-P: specifies the configuration file name.
-S specifies the SQL statement
3. Configuration file
There are two pieces of information: [database] description of database connection information, [option] running configuration information.
[Database]
Server_ip = 127.0.0.1
Db_user = testuser
Db_pwd = testpwd
Db_name = test
[Option]
Sys_parm = ON // whether to display system parameters
SQL _plan = ON // whether to display the execution plan
Obj_stat = ON // whether to display statistics of related objects (tables and indexes)
Ses_status = ON // whether to display the status information before and after running (the SQL statement is executed after activation)
SQL _profile = ON // whether to display PROFILE trace information (the SQL statement is executed after activation)
4. Output description
Title
Contains the address information and data version information of the running database.
Original SQL
The user executes the entered SQL, which is mainly used for subsequent comparison of SQL rewriting. The statement is formatted.
System parameters
The script selection displays some SQL performance-related parameters. This part is written to the code, and the script needs to be modified if it needs to be extended.
Optimizer switch
The following are some parameters related to the optimizer. By adjusting these parameters, you can manually intervene in the optimizer behavior.
Execution plan
Is the output result of calling explain extended. If the result is too long, a serial problem may occur (not resolved at the moment ).
SQL statement rewritten by the Optimizer
Here, you can determine whether the Optimizer has optimized the SQL statements (such as the processing of subqueries ).
Statistics
The statistical information of all involved tables and indexes in the SQL statement is displayed here.
Running status information
Compare the show status before and after execution at the session level, and display the changed part. Note that the SELECT method is used to collect status data, which may cause errors of some indicators (such as Com_select ).
PROFILE details
Detailed information obtained by calling show profile.
PROFILE Summary
According to the resource consumption of PROFILE, the consumption comparison of different stages (top n) is displayed, and the bottleneck is displayed intuitively.