MySQL Query performance optimization
Optimizing MySQL query performance involves several aspects: schema design, sensible indexing, and well-written queries. Schema design covers how tables relate to one another, which data types the columns use, and so on, and it has to be worked out for each specific scenario. Below we describe how to improve MySQL query performance from the perspective of database indexes and query design.
Database index
An index is a data structure that the storage engine uses to locate records quickly. Indexes can be classified in several ways: by storage method, into clustered and non-clustered indexes; by uniqueness of the data, into unique and non-unique indexes; by number of columns, into single-column and multi-column indexes; and so on. There are also several index types: B-tree index, hash index, spatial index (R-tree), full-text index, and so on.
There are a few things to consider when querying with a B-tree index. We illustrate them with table A, which is defined as follows:
CREATE TABLE A (id int auto_increment PRIMARY KEY, name varchar(10), age tinyint, sex enum('male', 'female'), birth datetime, KEY (name, age, sex));
id is the primary key, and a multi-column index is built on the name, age and sex columns. A B-tree index can serve the following kinds of queries:
Full-value match: the query matches all columns in the index, such as where name = 'Jone' and age = 13 and sex = 'male';
Match leftmost prefix: the query matches only the leading column(s) of the index, such as where name = 'Jone', which uses only the first column of the index;
Match column prefix: the query matches the beginning of an indexed column's value, such as where name like 'J%', which finds people whose names begin with J;
Match a range of values: the query matches one column exactly and another by range, such as where name = 'Jone' and age between 10 and 30;
Index-only query: if all the columns in the SELECT list are in the index, the data rows do not need to be accessed, which speeds up the query.
B-tree indexes also have limitations:
The index cannot be used if the lookup does not start from its leftmost column. For example, looking only for people aged 15 in table A cannot use the index;
Columns in the index cannot be skipped. For example, when looking for males named Jone in table A (name and sex are given but age is not), the index can be used only for the name column, not for the sex column;
If the query has a range condition on an indexed column, the conditions on the index columns to its right cannot use the index.
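The usable and unusable patterns above can be checked with EXPLAIN. The statements below are a minimal sketch against table A as defined earlier, assuming the enum values are 'male' and 'female'. The actual plans depend on the MySQL version and on the data in the table.

```sql
-- Patterns that can use KEY (name, age, sex):
EXPLAIN SELECT * FROM A WHERE name = 'Jone' AND age = 13 AND sex = 'male';  -- full-value match
EXPLAIN SELECT * FROM A WHERE name = 'Jone';                                -- leftmost prefix
EXPLAIN SELECT * FROM A WHERE name LIKE 'J%';                               -- column prefix
EXPLAIN SELECT * FROM A WHERE name = 'Jone' AND age BETWEEN 10 AND 30;      -- exact match + range

-- Patterns limited by the leftmost-prefix rule:
EXPLAIN SELECT * FROM A WHERE age = 15;                                     -- leftmost column missing
EXPLAIN SELECT * FROM A WHERE name = 'Jone' AND sex = 'male';               -- age skipped: only name is used for the lookup
EXPLAIN SELECT * FROM A WHERE name = 'Jone' AND age > 10 AND sex = 'male';  -- range on age: sex is not used for the lookup
```

Comparing the key and key_len columns of each EXPLAIN result is one way to see how many index columns were actually used.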
Differences between a hash index and a B-tree index:
A hash index stores only hash values (computed from the indexed columns) and row pointers, whereas a B-tree index stores the column values. A hash index therefore cannot be used to avoid reading the data rows;
Hash index data is not stored in index-value order, so it cannot be used for sorting;
A hash index does not support lookups on only part of the index columns, because the hash value is computed from all the columns in the index;
A hash index supports only equality comparisons, including =, IN() and <=>. Range queries are not supported.
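As an illustration of these differences, here is a small sketch using the MEMORY storage engine, which supports explicit hash indexes. The table A_hash and the index name idx_name are hypothetical and exist only for this example.

```sql
-- Hypothetical table with an explicit hash index (MEMORY engine).
CREATE TABLE A_hash (
    id   INT NOT NULL,
    name VARCHAR(10),
    age  TINYINT,
    KEY idx_name USING HASH (name)
) ENGINE = MEMORY;

-- Equality comparisons can use the hash index.
EXPLAIN SELECT id FROM A_hash WHERE name = 'Jone';
EXPLAIN SELECT id FROM A_hash WHERE name IN ('Jone', 'Mary');

-- Prefix/range conditions and sorting cannot be served by the hash index.
EXPLAIN SELECT id FROM A_hash WHERE name LIKE 'J%';
EXPLAIN SELECT id FROM A_hash ORDER BY name;
```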
Besides letting the server quickly navigate to the desired position in a table, an index has the following advantages:
A B-tree index stores data in column-value order, so it can serve ORDER BY and GROUP BY operations and avoid sorting and temporary tables;
A B-tree index stores the values of the indexed columns, so when the selected columns are all in the index, the data rows do not need to be accessed;
Indexes can greatly reduce the amount of data the server has to scan.
Creating and using indexes correctly is the foundation of high-performance queries. The index types and their pros and cons were described above. There are many strategies for choosing and using indexes efficiently; some are optimizations for special cases, and others target specific behaviors.
Stand-alone columns: the indexed column must not be part of an expression or an argument to a function. For example, SELECT * FROM A WHERE id + 1 = 5; cannot use the primary key index.
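A small sketch of the rewrite, using table A from above: moving the arithmetic to the constant side leaves the column on its own, so the primary key can be used.

```sql
-- The column is wrapped in an expression: the primary key index cannot be used for the lookup.
EXPLAIN SELECT * FROM A WHERE id + 1 = 5;

-- Equivalent condition with the column isolated: a primary key lookup is possible.
EXPLAIN SELECT * FROM A WHERE id = 4;
```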
Prefix index and index selectivity: sometimes a long string column needs to be indexed, and the index would take up a lot of space. You can usually index only the first few characters to save index space and improve index efficiency, but this also lowers the selectivity of the index. Index selectivity = number of distinct index values / total number of rows in the table. The higher the selectivity of the index, the higher the query efficiency.
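One way to choose a prefix length is to compare the selectivity of several prefixes with the selectivity of the full column. The sketch below uses the name column of table A purely for illustration (at 10 characters it is already short); the prefix lengths and the index name idx_name_prefix are assumptions.

```sql
-- Compare the selectivity of the full column with that of a few prefixes.
SELECT COUNT(DISTINCT name) / COUNT(*)          AS full_selectivity,
       COUNT(DISTINCT LEFT(name, 2)) / COUNT(*) AS prefix2_selectivity,
       COUNT(DISTINCT LEFT(name, 4)) / COUNT(*) AS prefix4_selectivity
FROM A;

-- Once a prefix length comes close to the full-column selectivity, index only that prefix.
ALTER TABLE A ADD KEY idx_name_prefix (name(4));
```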
Multi-column indexes: first of all, creating one index on multiple columns is not equivalent to creating a separate index on each of those columns. For a given table in a query, MySQL generally uses a single index (since version 5.0 the index merge strategy can combine several indexes in some cases). If you have three single-column indexes, MySQL tries to pick the most restrictive one, but even the most restrictive single-column index is usually far less restrictive than a multi-column index on those three columns. For example, to find the rows in table A whose id is 3 or whose name begins with a given letter, the query can be written with a single OR condition or as a UNION of two single-condition queries; the second form reduces the number of rows scanned.
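The two statements being compared are not preserved in the text, so the following is only a reconstruction under assumptions: the letter J is borrowed from the earlier LIKE example, and the second form uses UNION ALL with an extra condition to avoid returning the same row twice.

```sql
-- Form 1: a single OR condition across two different indexes.
EXPLAIN SELECT * FROM A WHERE id = 3 OR name LIKE 'J%';

-- Form 2: one query per index, combined with UNION ALL.
-- The "id <> 3" condition keeps the row with id = 3 from appearing twice.
EXPLAIN
SELECT * FROM A WHERE id = 3
UNION ALL
SELECT * FROM A WHERE name LIKE 'J%' AND id <> 3;
```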
The order of the columns in a multi-column index also matters. When designing the column order, consider how the index can best serve sorting and grouping (for B-tree indexes). In a multi-column B-tree index, the column order means the index is sorted by the leftmost column first, then by the second column, and so on. A rule of thumb for choosing the column order is to put the most selective column first. Of course, if sorting on the table has to be taken into account, the order needs to be decided case by case.
Clustered index: not a separate index type but a way of storing data. The details depend on the implementation; InnoDB's clustered index actually stores the B-tree index and the data rows in the same structure, and a table can have only one clustered index. The advantages (items 1-3) and disadvantages (items 4-7) of a clustered index are as follows:
1. Related data is kept together. For example, in a mailbox application you can cluster the data by user ID, so that all of a user's messages can be fetched by reading only a few data pages from disk. Without a clustered index, each message might cost one disk I/O;
2. Data access is faster. A clustered index keeps the index and the data in the same B-tree, so fetching data through a clustered index is usually faster than looking it up through a non-clustered index;
3. Queries that use a covering index scan can use the primary key values stored in the leaf nodes directly;
4. Insertion speed depends heavily on insertion order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table;
5. Updating the clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location;
6. Inserting new rows, or moving rows whose primary key was updated, may cause page splits. When a row must be inserted into a page that is already full, the storage engine splits the page into two to make room for it. Page splits cause the table to take up more disk space;
7. A clustered index can make full table scans slower, especially when rows are sparse or are stored non-contiguously because of page splits.
The chart above (borrowed from another source) shows the insertion time and index size for rows inserted into InnoDB tables. The only difference between the userinfo table and the userinfo_uuid table is that userinfo uses id as the primary key while userinfo_uuid uses a UUID as the primary key. In both tests (1 million and 3 million rows) the rows were inserted in order of the id column. When 3 million rows are inserted, the rows of userinfo_uuid do not arrive in primary key (UUID) order, which causes a large number of page splits, so the inserts take longer and the index takes up more space.
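The table definitions behind that comparison are not given, so the following is only a hypothetical sketch of what the two tables might look like. The name column, the AUTO_INCREMENT attribute and the column sizes are assumptions.

```sql
-- Hypothetical: primary key values grow with insertion order, so new rows are appended.
CREATE TABLE userinfo (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(64) NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Hypothetical: primary key values are UUID strings, which do not arrive in key order.
CREATE TABLE userinfo_uuid (
    uuid CHAR(36) NOT NULL,
    name VARCHAR(64) NOT NULL,
    PRIMARY KEY (uuid)
) ENGINE = InnoDB;

INSERT INTO userinfo (name) VALUES ('user');
INSERT INTO userinfo_uuid (uuid, name) VALUES (UUID(), 'user');
```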
Covering index: people usually create indexes based on the WHERE condition, but that is only one aspect of index optimization. A well-designed index should take the whole query into account. MySQL can read column values directly from an index, in which case the data rows do not need to be read. If an index contains (covers) the values of all the columns a query needs, we call it a covering index. When a query is covered by an index, the Extra column of EXPLAIN shows "Using index".
Of course, covering-index queries have many pitfalls that can prevent the optimization from taking effect. Before executing a query, the MySQL query optimizer determines whether an index covers both the columns in the WHERE condition and the columns in the SELECT list. If it does not, the data rows still have to be read.
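A brief sketch with table A and its index on (name, age, sex): whether a query is covered can be checked by looking for "Using index" in the Extra column of each EXPLAIN result.

```sql
-- All selected and filtered columns are in the index: a covering-index query is possible.
EXPLAIN SELECT name, age, sex FROM A WHERE name = 'Jone';

-- InnoDB secondary indexes also carry the primary key, so id can be covered as well.
EXPLAIN SELECT id, name FROM A WHERE name = 'Jone';

-- birth is not in the index, so the data rows still have to be read.
EXPLAIN SELECT name, birth FROM A WHERE name = 'Jone';
```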
Because InnoDB stores the primary key value in every non-clustered (secondary) index, we can first use a covering index to get the primary key values of the rows that match the condition, and then fetch the full rows by primary key. This is called a deferred join.
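A minimal sketch of a deferred join on table A, assuming the enum value 'male'. The inner query touches only the index on (name, age, sex), which in InnoDB also contains id, and the outer query fetches full rows only for the ids that matched.

```sql
SELECT A.*
FROM A
JOIN (
    -- Covered by the secondary index: filters on name and sex without reading data rows.
    SELECT id
    FROM A
    WHERE name = 'Jone' AND sex = 'male'
) AS matched ON matched.id = A.id;
```

Whether this beats the plain single-table query depends on how many index entries are filtered out before the row lookups, so it is worth measuring on real data.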
Sorting with an index scan: if the type column of EXPLAIN is index, MySQL is scanning an index to do the sorting. Scanning the index itself is fast, but if the index does not cover all the columns the query needs, MySQL has to look up the corresponding row for each index entry it reads. This is basically random I/O, so reading rows in index order is usually slower than a sequential full table scan, especially for I/O-bound workloads. Therefore, where possible, design an index so that it serves both the sorting and the row lookup. MySQL can use an index to sort the results only if the column order of the index is exactly the same as that of the ORDER BY clause and all columns are sorted in the same direction. If the query joins more than one table, the index can be used for sorting only if all the columns referenced by the ORDER BY clause come from the first table.
Comparing a query on table A sorted by the primary key id with one sorted by name, the query sorted by id uses the index for sorting, while the query sorted by name uses filesort.
In general, when writing queries you should choose suitable indexes to avoid row-by-row lookups, rely on the native order of the data as much as possible to avoid extra sorting, and use covering-index queries whenever possible. Analyze queries by response time: find the queries that take the longest or put the most load on the server, then examine their schema, SQL and index structure to determine whether the query scans too many rows, performs a lot of extra sorting or uses temporary tables, accesses data through random I/O, or does too many lookups back to the table for columns that are not in the index.
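The exact statements behind that comparison are not preserved, so the pair below is an assumed reconstruction on table A. The Extra column of each EXPLAIN result shows whether "Using filesort" appears.

```sql
-- Sorted by the primary key: rows can be read in clustered-index order.
EXPLAIN SELECT * FROM A ORDER BY id;

-- Sorted by name: the (name, age, sex) index does not cover SELECT *,
-- so the optimizer may prefer a full scan followed by a filesort.
EXPLAIN SELECT * FROM A ORDER BY name;
```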
Query design
When a query is inefficient, the first thing to consider is whether the query itself is designed sensibly. Below we first look at some query optimization techniques, then introduce some of the mechanisms inside the MySQL optimizer and show how MySQL executes a query, and finally look at optimizations for particular kinds of queries that help MySQL execute them more efficiently.
The most basic reason for poor query performance is that too much data is accessed, so most poorly performing queries can be optimized by reducing the amount of data they access. Usually the problem is that too many rows are accessed, but sometimes too many columns are accessed. If you need only the first few rows of the result set, the simplest fix is to add LIMIT to the end of the query. Avoid SELECT * in multi-table join queries, because it returns every column of every table, and not all of those columns may be needed. Besides checking whether the query requests data it does not need, check whether MySQL is scanning extra records, which can be measured by comparing the number of rows scanned with the number of rows returned. If a query has to scan a large amount of data but returns only a few rows, you can usually:
Use a covering index scan: put all the required columns in the index so that the storage engine can return the result without going back to the table for the corresponding rows;
Change the schema;
Rewrite the complex query so that the MySQL optimizer can execute it in a better way.
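As a small illustration of fetching less data, the sketch below contrasts a broad query on table A with a narrower one. The LIMIT value is arbitrary, and the Handler_read% status counters are only one rough way to compare how many rows each statement touched.

```sql
-- Broad: every column of every matching row.
SELECT * FROM A WHERE name LIKE 'J%';

-- Narrow: only the needed columns and only the first 10 rows.
SELECT id, name FROM A WHERE name LIKE 'J%' ORDER BY name LIMIT 10;

-- Rough check of how many rows the session has read so far.
SHOW SESSION STATUS LIKE 'Handler_read%';
```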
An important question when designing queries is whether a complex query should be split into several simple ones. Traditional practice emphasizes doing as much work as possible in the database layer, on the assumption that network communication, query parsing and optimization are all very expensive. This assumption does not hold for MySQL: by design, connecting to and disconnecting from MySQL is very lightweight, and MySQL is efficient at returning small query results.
Decomposing join queries: many high-performance applications decompose joins, that is, they run a single-table query against each table and then join the results in the application.
For example, to query all the scores of the students in Computer Class 1, the join can be broken into three single-table steps: look up the ID of the class, look up the IDs of the students in that class, and look up the scores of those students.
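The three steps can be sketched as follows. Only tb_student is named in the text; the tables tb_class and tb_score and all column names here are assumptions made for the example.

```sql
-- Original form: one three-table join.
SELECT sc.*
FROM tb_class AS c
JOIN tb_student AS s ON s.class_id = c.id
JOIN tb_score AS sc ON sc.student_id = s.id
WHERE c.name = 'Computer Class 1';

-- Decomposed form: three single-table queries joined in the application.
SELECT id FROM tb_class WHERE name = 'Computer Class 1';   -- suppose this returns id 1
SELECT id FROM tb_student WHERE class_id = 1;               -- suppose this returns ids 1 and 5
SELECT * FROM tb_score WHERE student_id IN (1, 5);
```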
So what are the benefits of this decomposition? First, caching becomes more efficient. Many applications can conveniently cache the result objects of single-table queries. If it is already cached that Computer Class 1 has ID 1, and that the students of class 1 in the tb_student table are students 1 and 5, the application only needs to fetch the scores of students 1 and 5 from the score table. Second, once the query is decomposed, executing the individual queries can reduce lock contention. Third, the queries themselves can become more efficient: using IN() in place of the join lets MySQL look rows up in ID order, which may be more efficient than a random join. Finally, decomposition can reduce redundant record access: when the join is done in the application layer, each record needs to be fetched only once, whereas a join in the database may need to access some of the data repeatedly.
To make MySQL run queries with high performance, the best approach is to understand how MySQL optimizes and executes them. When a client sends a request to MySQL, the processing goes as follows:
The server receives the client's request and first checks the query cache. On a hit it immediately returns the cached data; otherwise it goes on to the next stage;
The server parses and preprocesses the SQL, and the optimizer then generates the corresponding execution plan;
MySQL executes the query according to the execution plan generated by the optimizer, calling the storage engine API;
The result is returned to the client.
The first step is MySQL client/server communication. The communication protocol is "half-duplex", meaning that at any moment only one side is sending data. At any time a MySQL connection has a state that describes what MySQL is currently doing, which can be viewed with the SHOW FULL PROCESSLIST command. The states include Sleep, Query, Locked, Analyzing and statistics, Copying to tmp table, Sorting result, and Sending data.
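For reference, the connection states can be inspected like this. The second statement is an optional variant that filters out idle connections through information_schema; that view is deprecated in recent MySQL versions, so treat it as a sketch.

```sql
-- One row per connection; the State column shows what each thread is doing.
SHOW FULL PROCESSLIST;

-- Variant: only connections that are not idle.
SELECT id, user, command, state, time, info
FROM information_schema.processlist
WHERE command <> 'Sleep';
```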
The second step is the query cache lookup. Before parsing a query, if the query cache is enabled, MySQL first checks whether the query hits data in the query cache. This check is a case-sensitive hash lookup. On a hit, MySQL checks the user's privileges and then returns the result without parsing the SQL statement. On a miss, the SQL statement goes on to be parsed. (The query cache was removed in MySQL 8.0, so this step applies only to earlier versions.)
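On versions that still have the query cache (MySQL 5.7 and earlier), its configuration and hit counters can be inspected as sketched below. The two SELECT statements differ only in letter case, which is enough to make them distinct cache entries under the case-sensitive lookup.

```sql
-- Is the query cache enabled, and how large is it?
SHOW VARIABLES LIKE 'query_cache%';

-- Hit and insert counters (Qcache_hits, Qcache_inserts, ...).
SHOW STATUS LIKE 'Qcache%';

-- These two statements are treated as different queries by the cache.
SELECT name FROM A WHERE id = 1;
select name from A where id = 1;
```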
The third step is query optimization. It includes parsing the SQL, preprocessing it, and optimizing the execution plan; an error at any of these stages terminates the query. First, MySQL parses the SQL statement by its keywords and generates a corresponding "parse tree". The query optimizer is responsible for turning the parse tree into an execution plan, and its job is to find the best execution plan for the query. MySQL uses a cost-based optimizer: it tries to predict the cost of running the query under each candidate plan and picks the one with the lowest cost (the estimated cost of the last query can be seen with SHOW STATUS LIKE 'Last_query_cost'). The query optimizer is a very complex component that uses many optimization strategies to generate an optimal execution plan. The strategies fall into two groups: static optimization and dynamic optimization. Static optimizations can be done by analyzing the parse tree directly. For example, the optimizer can turn a WHERE condition into an equivalent form with simple algebraic transformations. Static optimizations do not depend on particular values, such as the constants in the WHERE condition. They remain valid once done, even if the query is executed again with different parameters, and can be thought of as "compile-time optimizations". Dynamic optimizations depend on context, such as the values in the WHERE condition or the number of rows matching an index entry, and can be thought of as "run-time optimizations". The following are some of the optimizations MySQL can perform:
- Reordering joined tables: tables are not always joined in the order specified in the query;
- Converting an outer join into an inner join: an OUTER JOIN does not always have to be executed as an outer join. Factors such as the WHERE condition and the schema may make an outer join equivalent to an inner join;
- Applying equivalence transformations: MySQL uses equivalence transformations to normalize expressions. For example, (a < b and b = c) and a = 10 is rewritten as a = 10 and b > 10 and b = c;
- Optimizing COUNT(), MIN() and MAX();
- Covering index scan: when the columns in an index include all the columns the query needs, MySQL uses the index to return the data without reading the corresponding rows;
- Subquery optimization: converting subqueries into a more efficient form so that the data is not accessed by multiple queries more than once;
- Early termination: when LIMIT is used, MySQL can stop the query as soon as it finds that the query's needs are met;
- IN() list comparison: in MySQL, IN() is not equivalent to multiple OR clauses, because MySQL first sorts the values in the IN() list and then uses binary search to decide whether a value is in the list. Each check is O(log n) in the size of the list, whereas the equivalent OR conditions are O(n).
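A few of these optimizer behaviors can be observed from a client session. The statements below are a sketch on table A; the exact EXPLAIN output and cost figures vary by version and data.

```sql
-- Estimated cost of the most recently compiled query in this session.
SELECT COUNT(*) FROM A WHERE name = 'Jone';
SHOW STATUS LIKE 'Last_query_cost';

-- MIN()/MAX() on an indexed column can often be resolved from the index alone.
EXPLAIN SELECT MIN(id), MAX(id) FROM A;

-- IN() with a list of constants versus the equivalent chain of ORs.
EXPLAIN SELECT * FROM A WHERE id IN (1, 5, 9);
EXPLAIN SELECT * FROM A WHERE id = 1 OR id = 5 OR id = 9;

-- LIMIT allows the server to stop as soon as enough rows have been found.
EXPLAIN SELECT * FROM A WHERE name = 'Jone' LIMIT 1;
```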
When MySQL needs to sort the selected data and cannot use an index to do so, it sorts in memory if the data set is small and on disk if it is large. MySQL calls both cases a filesort. If the amount of data to sort is smaller than the "sort buffer", MySQL does a quicksort in memory. If memory is not enough, MySQL first divides the data into blocks, quicksorts each block separately, writes the sorted blocks to disk, and then merges the sorted blocks. For join queries, MySQL handles the filesort in one of two ways. If all the columns in the ORDER BY clause come from the first joined table, MySQL sorts while processing the first table, and the Extra field of the EXPLAIN result shows "Using filesort". In all other cases, MySQL first puts the join result into a temporary table and does the filesort after all the joins are finished; the Extra field of the EXPLAIN result then shows "Using temporary; Using filesort". If the query has a LIMIT, the LIMIT is also applied after the sort, so even if only a few rows are returned, the temporary table and the amount of data to sort can still be very large (MySQL 5.6 improved the handling of LIMIT here).
The fourth step is the query execution engine. MySQL executes the query step by step according to the instructions in the execution plan. Many of these operations are carried out by calling interfaces implemented by the storage engine, known as the "handler API". MySQL creates a handler instance for each table during the optimization phase, and the optimizer uses the interface of these instances to obtain information about the tables.
The final step is returning the query result to the client. MySQL returns the result set incrementally. As soon as the server finishes processing the last joined table and starts producing the first results, it can begin returning them to the client step by step. This has two advantages: the server does not need to hold too many results in memory, and the client gets the first results as early as possible. Each row in the result set is sent in a packet that conforms to the MySQL client/server protocol and is then transmitted over TCP.
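A small sketch for observing a filesort on table A. The birth column has no index, so ordering by it is expected to need a sort; the size of the sort buffer determines whether that sort fits in memory.

```sql
-- Size of the per-session sort buffer, in bytes.
SHOW VARIABLES LIKE 'sort_buffer_size';

-- No index on birth: look for "Using filesort" in the Extra column.
EXPLAIN SELECT * FROM A ORDER BY birth;

-- With LIMIT, the rows are still sorted before the limit is applied.
EXPLAIN SELECT * FROM A ORDER BY birth LIMIT 10;

-- Sort_merge_passes grows when a sort had to spill into on-disk merge passes.
SHOW SESSION STATUS LIKE 'Sort%';
```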
Optimize COUNT() queries. If a column is specified, COUNT() counts the rows in which that column is not NULL; COUNT(*) counts the total number of rows.
Optimize join queries. Make sure the columns in the ON or USING clause are indexed. Make sure that GROUP BY and ORDER BY expressions involve columns from only one table, so that MySQL can use an index to optimize the process.
Optimize GROUP BY and DISTINCT. MySQL optimizes these two kinds of queries in the same way, usually by using the ordering of an index. If no index can be used, GROUP BY falls back on one of two strategies: a temporary table or a filesort.
Optimize LIMIT paging, for example by using a deferred join.
Optimize UNION queries. MySQL executes a UNION by creating and filling a temporary table, so you often need to manually "push down" the WHERE, LIMIT and ORDER BY clauses into the individual subqueries of the UNION. Unless you really need the server to remove duplicate rows, always use UNION ALL. Without the ALL keyword, MySQL adds the DISTINCT option to the temporary table, which makes it check every row for uniqueness, and that is very expensive.
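Three of these points can be sketched on table A. The offset, the limits and the enum values 'male' and 'female' are illustrative assumptions.

```sql
-- COUNT(*) counts rows; COUNT(birth) counts only rows where birth is not NULL.
SELECT COUNT(*) AS total_rows, COUNT(birth) AS rows_with_birth FROM A;

-- LIMIT paging with a deferred join: page through the covering index first,
-- then fetch full rows only for the 10 ids on the requested page.
SELECT A.*
FROM A
JOIN (SELECT id FROM A ORDER BY name LIMIT 100000, 10) AS page ON page.id = A.id
ORDER BY A.name;

-- UNION ALL with ORDER BY and LIMIT pushed down into each subquery,
-- so the temporary table holds at most 40 rows before the final sort.
(SELECT name, age FROM A WHERE sex = 'male' ORDER BY name LIMIT 20)
UNION ALL
(SELECT name, age FROM A WHERE sex = 'female' ORDER BY name LIMIT 20)
ORDER BY name
LIMIT 20;
```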
Taking all of the above together, building a high-performance application means considering the schema, the indexes, the queries and query optimization as a whole. Understand how queries are executed and where the time goes, and then improve the time-consuming queries.