MySQL optimization (ii): SQL optimization

Source: Internet
Author: User

I. SQL Optimization

1. General Steps for Optimizing SQL

1.1 Viewing SQL Execution Frequency

SHOW STATUS LIKE 'Com_%';

Com_xxx: the number of times each xxx statement has been executed. For example, Com_select is the number of SELECT statements executed, incremented by 1 per query. Com_insert, Com_update, and Com_delete count the other statement types in the same way (a multi-row INSERT is counted only once).

The following counters apply only to the InnoDB storage engine, and they are accumulated slightly differently: they count rows rather than statements.

Innodb_rows_read: number of rows read from InnoDB tables, for example by SELECT queries.

Innodb_rows_inserted / Innodb_rows_updated / Innodb_rows_deleted: number of rows inserted, updated, or deleted.

These counters show whether the current database workload consists mainly of queries (reads) or of data writes.

For transactional applications, Com_commit and Com_rollback show how many transactions were committed and rolled back. A database with very frequent rollbacks may indicate a problem in the way the application is written.
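The counters above can be read with statements like the following. This is a sketch: SHOW STATUS without a modifier reports SESSION values, so GLOBAL is used here to see server-wide totals accumulated since startup.

```sql
-- Server-wide statement counters since startup
SHOW GLOBAL STATUS LIKE 'Com_%';

-- InnoDB row counters: read vs. write mix
SHOW GLOBAL STATUS LIKE 'Innodb_rows_%';

-- Commit/rollback counts for transactional applications
SHOW GLOBAL STATUS WHERE Variable_name IN ('Com_commit', 'Com_rollback');
```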

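Basic server information:

Connections: number of attempts to connect to the MySQL server.

Uptime: how long the server has been running, in seconds.

Slow_queries: number of slow queries, that is, queries that took longer than long_query_time seconds.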
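A minimal sketch for reading these three counters in one statement, assuming server-wide values are wanted (hence GLOBAL), together with the current slow-query threshold:

```sql
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Connections', 'Uptime', 'Slow_queries');

-- Threshold, in seconds, used to decide what counts as a slow query
SHOW VARIABLES LIKE 'long_query_time';
```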

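1.2 Locating Inefficient SQL Statements

- Use the slow query log to locate slow SQL. When mysqld is started with the --log-slow-queries[=file_name] option, it writes every SQL statement that takes longer than long_query_time seconds to execute to a log file. (This option was removed in MySQL 5.6; use the slow_query_log and slow_query_log_file settings instead.)

- Because a statement is written to the slow query log only after it finishes, use SHOW FULL PROCESSLIST to see the MySQL threads currently running, including their state and whether tables are locked. This shows SQL execution in real time and helps optimize some table-locking operations.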
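A minimal sketch of both approaches on MySQL 5.6 or later. The log file path and the 1-second threshold are illustrative assumptions; note that a long_query_time value set with SET GLOBAL applies only to sessions opened afterwards.

```sql
-- Enable the slow query log at runtime (the path is a hypothetical example)
SET GLOBAL slow_query_log_file = '/var/lib/mysql/slow.log';
SET GLOBAL long_query_time = 1;   -- seconds
SET GLOBAL slow_query_log = 'ON';

-- Inspect the statements that are running right now
SHOW FULL PROCESSLIST;
```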

1.3 Analyzing Slow SQL with EXPLAIN

Syntax: EXPLAIN <SQL statement>
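For illustration, a join could be analyzed as below. This assumes the Sakila sample database, whose table names (customer, payment, rental, film) match those used in this article; the email literal is hypothetical.

```sql
EXPLAIN
SELECT SUM(b.amount)
FROM customer a
JOIN payment b ON a.customer_id = b.customer_id
WHERE a.email = 'JANE.BENNETT@sakilacustomer.org';
```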

Main output columns:

- select_type: the type of SELECT. Common values are SIMPLE (a simple query with no joins or subqueries), PRIMARY (the main, or outermost, query), UNION (the second or later SELECT in a UNION), SUBQUERY (the first SELECT in a subquery), and so on.

- table: the table that the output row refers to.

- type: how MySQL finds the required rows in the table, also called the access type.

Common values, in order from worst to best performance: ALL, index, range, ref, eq_ref, const/system, NULL.

type=ALL: full table scan; MySQL traverses the whole table to find matching rows.

type=index: full index scan; MySQL traverses the entire index.

type=range: index range scan, common with <, <=, >, >=, and BETWEEN.

type=ref: a scan using a non-unique index, or a prefix of a unique index, that returns the rows matching a single value.

type=eq_ref: similar to ref, except that the index is unique, so for each index key value exactly one row in the table matches. Put simply, a multi-table join uses a PRIMARY KEY or UNIQUE index as the join condition.

type=const/system: a single table has at most one matching row, so the query is very fast; this typically happens for lookups on a PRIMARY KEY or UNIQUE index. For example, access through the unique index uk_email has type const, and retrieval from a table we built with only one row has type system.

type=NULL: MySQL gets the result without accessing the table or an index at all.

The type column has other values as well, such as ref_or_null (like ref, but the condition also searches for NULL), index_merge (index merge optimization), unique_subquery (IN followed by a subquery on a primary key column), and index_subquery (like unique_subquery, but the subquery after IN queries a non-unique indexed column).

- possible_keys: the indexes that might be used for the query.

- key: the index actually used.

- key_len: the number of bytes of the index key that are used.

- rows: the estimated number of rows to scan.

- Extra: additional information about the execution that does not fit in the other columns but matters for the execution plan.

Using where: the server filters the rows it reads with the WHERE condition. When an index is also used, this usually means the index speeds up access, but the rows found through it still have to be checked against the rest of the condition, often by reading them back from the table.
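The sketch below pairs each common access type with a query that typically produces it on the Sakila sample database (assumed, since the article's table names match it). It is illustrative only: the plan chosen depends on the data and the MySQL version, and the unique index uk_email is assumed to be created by the reader, since Sakila does not define one.

```sql
EXPLAIN SELECT * FROM film;                                    -- ALL: full table scan
EXPLAIN SELECT title FROM film;                                -- index: full scan of idx_title
EXPLAIN SELECT * FROM payment WHERE customer_id BETWEEN 300 AND 350;  -- range
EXPLAIN SELECT * FROM payment WHERE customer_id = 350;         -- ref: non-unique index
-- eq_ref: each row of one table matches one row of the other by primary key
EXPLAIN SELECT * FROM film a JOIN film_text b ON a.film_id = b.film_id;

-- const: lookup through a unique index (assumes email values are unique)
ALTER TABLE customer ADD UNIQUE INDEX uk_email (email);
EXPLAIN SELECT * FROM customer WHERE email = 'JANE.BENNETT@sakilacustomer.org';

EXPLAIN SELECT 1 FROM DUAL WHERE 1;                            -- NULL: no table accessed
```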

1.4 Analyzing SQL with SHOW PROFILE

Check whether the current MySQL server supports profiling.

Profiling is off by default; enable it for the current session with the SET statement: SET profiling = 1;
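A sketch of the check and the switch. have_profiling reports YES when the server supports profiling; note that the profiling variables and SHOW PROFILE are deprecated in newer MySQL versions in favor of the Performance Schema.

```sql
SELECT @@have_profiling;   -- YES if profiling is supported
SELECT @@profiling;        -- 0 = off (the default)
SET profiling = 1;         -- enable profiling for the current session
```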

Usage:

- Run the statement to be analyzed, for example a COUNT(*) statistics query.

- Run SHOW PROFILES to find the query ID of that statement.

- Run SHOW PROFILE FOR QUERY <query ID> to see the state of the thread and the time spent in each state while the statement ran.

The Sending data state means that the MySQL thread has started reading rows and returning results to the client, not merely sending results to the client. Because the thread often has to do many disk reads in this state, Sending data is often the longest state of the whole query.

- To see the details aggregated per state and sorted by total time, query INFORMATION_SCHEMA.PROFILING (here for query ID 3):

    SELECT STATE,
           SUM(DURATION) AS TR,
           ROUND(100 * SUM(DURATION) /
                 (SELECT SUM(DURATION)
                    FROM INFORMATION_SCHEMA.PROFILING
                   WHERE QUERY_ID = 3), 2) AS PR,
           COUNT(*) AS Calls,
           SUM(DURATION) / COUNT(*) AS "R/Call"
      FROM INFORMATION_SCHEMA.PROFILING
     WHERE QUERY_ID = 3
     GROUP BY STATE
     ORDER BY TR DESC;

You can go further and view detail types such as ALL, CPU, BLOCK IO, CONTEXT SWITCHES, and PAGE FAULTS to see which resources MySQL spends its time on; for example, you can choose to view only the CPU time.
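A sketch of a complete profiling session, assuming the statement being profiled is a COUNT(*) on the Sakila payment table and that SHOW PROFILES lists it with Query_ID 3 (the ID in your session may differ):

```sql
SET profiling = 1;
SELECT COUNT(*) FROM payment;

SHOW PROFILES;                       -- find the Query_ID of the statement
SHOW PROFILE FOR QUERY 3;            -- time spent in each thread state
SHOW PROFILE CPU FOR QUERY 3;        -- adds CPU_user / CPU_system columns
SHOW PROFILE BLOCK IO, CONTEXT SWITCHES, PAGE FAULTS FOR QUERY 3;
SHOW PROFILE ALL FOR QUERY 3;        -- every detail type
```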

In this example, the time in the Sending data state is spent mainly on the CPU.

Tip: COUNT(*) without a WHERE clause runs more slowly on InnoDB than on MyISAM. InnoDB goes through the Sending data state and actually reads the data, whereas MyISAM stores the exact row count of the table and ends the query right away without accessing the data at all.

2. Index Issues

Indexing is one of the most common and important means of database optimization, and indexes solve most SQL performance problems.

2.1 Index Types by Storage Engine

- B-tree index: the most common index type; most storage engines support B-tree indexes.

- Hash index: supported only by the MEMORY engine.

- R-tree index (spatial index): a special index type of MyISAM, used mainly for geospatial data types (InnoDB also supports spatial indexes from MySQL 5.7).

- Full-text index: a special index type of MyISAM for full-text search; InnoDB supports it starting with MySQL 5.6.

MySQL does not support function-based indexes (functional key parts arrived only in MySQL 8.0.13), but it can index just the leading part of a column. For example, the title column can be indexed on only its first 10 characters; such a prefix index cannot be used for ORDER BY or GROUP BY operations. Example of creating a prefix index: CREATE INDEX idx_title ON film (title(10));

The most commonly used indexes are B-tree and hash indexes, and hash indexes are supported only by the MEMORY/HEAP engine. For key-value lookups, a hash index is faster than a B-tree, but it cannot be used for range queries: the MEMORY/HEAP engine uses a hash index only for equality comparisons (= or <=>).
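A minimal sketch of a hash index on a MEMORY table; the table, column, and index names are hypothetical. A few rows are inserted first because MySQL treats an empty or single-row MEMORY table as a special case in EXPLAIN.

```sql
CREATE TABLE session_cache (
  session_key VARCHAR(64) NOT NULL,
  payload     VARCHAR(255),
  INDEX idx_session_key USING HASH (session_key)
) ENGINE = MEMORY;

INSERT INTO session_cache VALUES ('abc', 'x'), ('abd', 'y'), ('xyz', 'z');

-- Equality lookup: the hash index can be used
EXPLAIN SELECT payload FROM session_cache WHERE session_key = 'abc';

-- Range condition: the hash index cannot be used
EXPLAIN SELECT payload FROM session_cache WHERE session_key > 'abc';
```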

2.2 How MySQL Uses Indexes

Create a composite index: ALTER TABLE rental ADD INDEX idx_rental_date (rental_date, inventory_id, customer_id);

2.2.1 Typical Scenarios Where MySQL Can Use an Index

- Full-value match: the query specifies a value for every column in the index, that is, it has an equality condition on each index column.

For example, the idx_rental_date index created above contains rental_date, inventory_id, and customer_id; if the WHERE clause has equality conditions on all three, it is a full-value match.

In the EXPLAIN output, key is idx_rental_date, which shows that the optimizer scans using the index idx_rental_date.
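A sketch of a full-value match, assuming the idx_rental_date index created above; the literal values are illustrative. Note that the stock Sakila rental table may already have a unique key on the same three columns, which the optimizer could prefer.

```sql
EXPLAIN
SELECT * FROM rental
WHERE rental_date  = '2005-05-25 17:22:10'
  AND inventory_id = 373
  AND customer_id  = 343;
```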

- Range match on index values: the query performs a range lookup on an indexed column.

type is range, showing that the optimizer chose a range scan, and key is idx_fk_customer_id, showing that the optimizer chose the index idx_fk_customer_id to speed up access.
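A sketch of a range lookup on the indexed customer_id column of rental, assuming Sakila's idx_fk_customer_id; the bounds are illustrative.

```sql
EXPLAIN
SELECT * FROM rental
WHERE customer_id >= 373 AND customer_id < 400;
```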

- Leftmost-prefix match: a composite index is searched starting from its leftmost column, and a lookup cannot skip the first column and start from the second. For example, a composite index on (c1, c2, c3) can be used for equality queries on c1, c1+c2, or c1+c2+c3, but not for equality queries on c2 alone or on c2+c3.

Add an index: ALTER TABLE payment ADD INDEX idx_payment_date (payment_date, amount, last_update); here the first column is payment_date.

If the query condition includes payment_date, the first column of the index, the composite index idx_payment_date can be used for filtering.

If the condition uses only the second column, amount, without the first column, the index is not used, and EXPLAIN shows key as NULL.
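A sketch contrasting the two cases, assuming idx_payment_date from above; the date and amount literals are illustrative.

```sql
-- Can use idx_payment_date: the leftmost column payment_date is in the condition
EXPLAIN SELECT * FROM payment
WHERE payment_date = '2006-02-14 15:16:03'
  AND last_update  = '2006-02-15 22:12:32';

-- Cannot use idx_payment_date: the condition skips the leftmost column
EXPLAIN SELECT * FROM payment
WHERE amount = 3.98
  AND last_update = '2006-02-15 22:12:32';
```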

- Index-only query (covering index): a query is more efficient when all the columns it needs are in the index.

For example, if the query selects only last_update, which is one of the columns of idx_payment_date, MySQL can get the required data directly from the index without going back to the table. Extra then shows Using index, which indicates a covering index scan.
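A sketch of a covering-index query, assuming idx_payment_date from above; the literals are illustrative.

```sql
-- Every column referenced is in idx_payment_date, so Extra can show "Using index"
EXPLAIN SELECT last_update FROM payment
WHERE payment_date = '2006-02-14 15:16:03'
  AND amount = 3.98;
```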

- Column-prefix match: only the first column of the index is used, and only the leading part of that column's value is matched.

For example, suppose we want to find the films whose title starts with AFRICAN.

First create the index: CREATE INDEX idx_title_desc_part ON film_text (title(10), description(20));

EXPLAIN shows that idx_title_desc_part is used, and Extra shows Using where, because the rows found through the prefix index must still be read from the table and checked against the full condition.
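A sketch of the prefix match, assuming the idx_title_desc_part index above; because the index stores only the first 10 characters of title, each candidate row must still be checked.

```sql
EXPLAIN SELECT title FROM film_text WHERE title LIKE 'AFRICAN%';
```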

- Exact match on one part of the index and range match on another part.

For example, a specific rental date combined with a range of customer IDs.

type is range, showing that the optimizer chose a range scan, and key is idx_rental_date, showing that the optimizer chose the index idx_rental_date to speed up the query. Because the selected column is also in the index, Extra shows Using index.
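A sketch of this case, assuming idx_rental_date from above; the date and ID bounds are illustrative.

```sql
-- Equality on rental_date, range on customer_id; inventory_id is stored in the
-- index, so the query can be answered from the index alone
EXPLAIN SELECT inventory_id FROM rental
WHERE rental_date = '2006-02-14 15:16:03'
  AND customer_id >= 300 AND customer_id <= 400;
```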

- A condition of the form column_name IS NULL can also use an index on that column.
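A sketch assuming Sakila, where payment.rental_id is nullable and indexed:

```sql
EXPLAIN SELECT * FROM payment WHERE rental_id IS NULL;
```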

2.2.2 Typical Scenarios Where an Index Exists but Cannot Be Used

- A LIKE query whose pattern starts with % cannot use a B-tree index.
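A sketch assuming Sakila's actor table, which has an index on last_name; the plans in the comments are what is typically expected, not guaranteed.

```sql
-- Leading wildcard: the last_name index cannot be used, typically type ALL
EXPLAIN SELECT * FROM actor WHERE last_name LIKE '%NI%';

-- Trailing wildcard only: the index can be used for a range scan
EXPLAIN SELECT * FROM actor WHERE last_name LIKE 'NI%';
```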

Because of the B-tree index structure, a query whose pattern starts with % cannot use the index; full-text (FULLTEXT) indexes are generally used to solve this kind of problem. Alternatively, on an InnoDB table, scan a secondary index first to get the list of primary key IDs that satisfy the condition, then retrieve the records from the table by primary key.
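A sketch of the secondary-index workaround, assuming Sakila's InnoDB actor table with an index on last_name. An InnoDB secondary index also stores the primary key (actor_id), and scanning the smaller index is cheaper than scanning the table.

```sql
-- Step 1 (derived table): scan the last_name index to collect matching actor_id values
-- Step 2 (join): fetch the full rows by primary key
SELECT b.*
FROM (SELECT actor_id FROM actor WHERE last_name LIKE '%NI%') a
JOIN actor b ON a.actor_id = b.actor_id;
```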

- Implicit data type conversion prevents index use. If a column is a string type, the constant in the WHERE condition must be enclosed in quotation marks.
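A sketch assuming Sakila's actor table, where last_name is a VARCHAR column with an index:

```sql
-- Comparing a string column with a number converts every row's value,
-- so the index on last_name cannot be used
EXPLAIN SELECT * FROM actor WHERE last_name = 1;

-- Quoting the constant keeps it a string-to-string comparison, so the index can be used
EXPLAIN SELECT * FROM actor WHERE last_name = '1';
```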

- For a composite index, the query condition must include the leftmost column of the index; otherwise the composite index is not used (the leftmost-prefix rule).

- The optimizer chooses an execution plan by estimated cost: if a full table scan is estimated to be cheaper than using the index, MySQL does a full table scan. In that case, use a more selective filter condition.

- For conditions joined by OR, if the column before OR is indexed but the column after it is not, no index is used.
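A sketch assuming Sakila's payment table, where customer_id is indexed and amount has no index of its own (it is only the second column of idx_payment_date, if you created it):

```sql
-- One side of the OR cannot use an index, so the whole query falls back to a full scan
EXPLAIN SELECT * FROM payment WHERE customer_id = 203 OR amount = 3.96;
```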

2.3 Viewing Index Usage

If indexes are working, the value of Handler_read_key is high. It is the number of times a row was read based on an index value; a low value suggests that indexes are rarely used and bring little performance benefit.

A high value of Handler_read_rnd_next means that queries run inefficiently and indexes should be added to fix this. The value is the number of requests to read the next row in the data file. A large value indicates that many table scans are taking place, which usually means the tables are not indexed properly or the queries are not written to take advantage of the indexes.
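Both counters belong to the Handler_read% family and can be viewed together:

```sql
SHOW GLOBAL STATUS LIKE 'Handler_read%';
```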

3. Common SQL Optimizations

3.1 Bulk Loading Data (LOAD DATA)

- MyISAM

- When importing data into a non-empty MyISAM table, disable updates of its non-unique indexes during the import and re-enable them afterwards to improve import speed.

Steps: ALTER TABLE tab_name DISABLE KEYS; import the data; ALTER TABLE tab_name ENABLE KEYS;

When importing data into an empty MyISAM table, MySQL by default imports the data first and builds the indexes afterwards, so this setting is unnecessary.

- InnoDB

- Because InnoDB tables are stored in primary key order, sorting the data to be imported by primary key effectively improves import efficiency.

- Turn off uniqueness checks with SET unique_checks = 0 before the import and turn them back on (SET unique_checks = 1) when it is complete.

- If the application uses autocommit mode, run SET autocommit = 0 before the import and restore it (SET autocommit = 1) when the import is complete.
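A sketch of both import procedures using LOAD DATA INFILE. The table names film_test and film_test2 and the file paths are hypothetical, and the server's secure_file_priv setting must allow reading from that location.

```sql
-- MyISAM: suspend non-unique index maintenance while loading a non-empty table
ALTER TABLE film_test DISABLE KEYS;
LOAD DATA INFILE '/tmp/film_test.txt' INTO TABLE film_test
  FIELDS TERMINATED BY ',';
ALTER TABLE film_test ENABLE KEYS;

-- InnoDB: file pre-sorted by primary key; skip unique checks and commit once
SET unique_checks = 0;
SET autocommit = 0;
LOAD DATA INFILE '/tmp/film_test_sorted.txt' INTO TABLE film_test2
  FIELDS TERMINATED BY ',';
COMMIT;
SET autocommit = 1;
SET unique_checks = 1;
```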

3.2 Optimizing INSERT Statements

- When the same client inserts many rows, use INSERT statements with multiple VALUES lists where possible, for example: INSERT INTO tab_name VALUES (...), (...), (...);

- When different clients insert many rows, INSERT DELAYED can be used: the statement returns immediately, and the rows are queued in memory and written to the table later. (INSERT DELAYED is deprecated as of MySQL 5.6 and ignored from 5.7 on.) INSERT LOW_PRIORITY, by contrast, delays the insert until no other clients are reading from the table.

- Place index files and data files on separate disks.

- For bulk inserts into MyISAM tables, increase the value of bulk_insert_buffer_size.

- When loading a table from a text file, use LOAD DATA INFILE, which is usually about 20 times faster than using INSERT statements.
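A sketch of the multi-row INSERT and the buffer setting; the table t_log is hypothetical and the buffer size is only an example value.

```sql
CREATE TABLE t_log (id INT PRIMARY KEY, msg VARCHAR(100));

-- One statement, many rows: fewer round trips and less per-statement overhead
INSERT INTO t_log (id, msg) VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- MyISAM bulk inserts: enlarge the per-thread bulk insert cache for this session
SET SESSION bulk_insert_buffer_size = 64 * 1024 * 1024;
```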

3.3 Optimizing ORDER BY Statements

3.3.1 How MySQL Sorts

- Returning ordered data directly by scanning an ordered index.

For example, the customer table has an index idx_fk_store_id on the column store_id.

When a query sorts by store_id with ORDER BY and selects only columns available in the index, Extra shows Using index: no extra sort is needed, so the operation is efficient.

- Sorting with filesort. Every sort whose result is not returned directly by an index is called a filesort. Whether a filesort uses disk files or temporary tables depends on the MySQL server's sort parameter settings and the size of the data to be sorted.

Filesort sorts the fetched data in the in-memory sort area, whose size is set by the system variable sort_buffer_size. If the data does not fit in memory, it is split into blocks on disk, each block is sorted, and the blocks are then merged. The sort_buffer_size sort area is exclusive to each thread, so several may exist at the same time.

For example, sorting all customer records by store_id requires a full table scan, and filesort is used.
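A sketch contrasting the two sort modes on Sakila's customer table. In InnoDB, a secondary index also stores the primary key, so customer_id is available from idx_fk_store_id; the Extra values in the comments are what is typically expected.

```sql
-- Ordered index scan: only store_id and customer_id are needed, Extra "Using index"
EXPLAIN SELECT customer_id FROM customer ORDER BY store_id;

-- All columns are needed: full table scan plus "Using filesort"
EXPLAIN SELECT * FROM customer ORDER BY store_id;
```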

General optimization approach: reduce extra sorting by returning ordered data directly through an index. Make the WHERE condition and ORDER BY use the same index, make the ORDER BY column order match the index column order, and make all ORDER BY columns either ascending or descending; otherwise a filesort will appear.
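A sketch of making WHERE and ORDER BY share one index on Sakila's customer table; the composite index idx_storeid_email is hypothetical.

```sql
ALTER TABLE customer ADD INDEX idx_storeid_email (store_id, email);

-- Equality on store_id, then ORDER BY email: the index returns rows already sorted
EXPLAIN SELECT store_id, email, customer_id
FROM customer WHERE store_id = 1 ORDER BY email DESC;

-- Range on store_id: rows span several store_id values, so email order is lost
-- and a filesort is expected
EXPLAIN SELECT store_id, email, customer_id
FROM customer WHERE store_id >= 1 AND store_id <= 3 ORDER BY email DESC;
```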

- Cases where the index cannot be used for sorting:

- ORDER BY mixes ASC and DESC: SELECT * FROM tab_name ORDER BY key_part1 DESC, key_part2 ASC;

- The key used in the WHERE clause differs from the one used in ORDER BY: SELECT * FROM tab_name WHERE key2 = constant ORDER BY key1;

- ORDER BY uses different keys: SELECT * FROM tab_name ORDER BY key1, key2;

3.3.2 Optimizing Filesort

Filesort has two sorting algorithms:

- Two-pass algorithm: first fetch the sort columns and row pointers of the rows that satisfy the condition and sort them in the sort buffer. If the sort buffer is not large enough, the sorted results are stored in a temporary table. After sorting finishes, the records are read back from the table using the row pointers. The data is accessed twice: the first time to get the sort columns and row pointers, the second time to get the records by row pointer. The second read can cause a large amount of random I/O; the advantage is low memory overhead during sorting.

- Single-pass algorithm: fetch all the columns of the rows that satisfy the condition at once, sort them in the sort buffer, and output the result set directly. Memory overhead during sorting is larger, but sorting is more efficient than with the two-pass algorithm.

MySQL chooses the algorithm by comparing the system variable max_length_for_sort_data with the total size of the columns retrieved by the query: if max_length_for_sort_data is larger, it uses the second (single-pass) algorithm; otherwise, it uses the first (two-pass) algorithm. (As of MySQL 8.0.20, max_length_for_sort_data is deprecated and has no effect.)

Appropriately increasing max_length_for_sort_data lets MySQL choose the more efficient filesort algorithm. However, setting it too large can lead to poor CPU utilization and high disk I/O, so aim for a balance between CPU and I/O.

Appropriately increase the sort area sort_buffer_size so that sorting happens in memory as much as possible rather than in temporary tables written to files. Size the sort area according to the number of active connections and the server's memory: because each thread gets its own sort buffer, setting it too large can make the server swap heavily. Also, select only the columns you need instead of SELECT *.
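A sketch of checking for on-disk sorts and raising the sort settings for a single session only; the values are examples, not recommendations.

```sql
-- A steadily growing Sort_merge_passes suggests sorts are spilling to disk
SHOW GLOBAL STATUS LIKE 'Sort_merge_passes';

-- Raise the buffers only for the session that runs the large sort
SET SESSION sort_buffer_size = 4 * 1024 * 1024;
SET SESSION max_length_for_sort_data = 4096;   -- no effect as of MySQL 8.0.20
```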

3.3.3 Optimizing GROUP BY

By default, MySQL sorts by all GROUP BY columns (in MySQL 5.7 and earlier; MySQL 8.0 no longer sorts GROUP BY results implicitly). If a query includes GROUP BY but you want to avoid the cost of sorting the result, specify ORDER BY NULL to suppress the sort:

SELECT xxx FROM xxx GROUP BY xxx ORDER BY NULL;
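A sketch on Sakila's payment table, grouping by an expression that no index can satisfy; the Extra values in the comments describe typical MySQL 5.7 behavior.

```sql
-- Daily totals: Extra typically shows "Using temporary; Using filesort"
EXPLAIN SELECT DATE(payment_date), SUM(amount)
FROM payment GROUP BY DATE(payment_date);

-- ORDER BY NULL suppresses the implicit sort: typically only "Using temporary"
EXPLAIN SELECT DATE(payment_date), SUM(amount)
FROM payment GROUP BY DATE(payment_date) ORDER BY NULL;
```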

3.3.4 Optimizing Nested Queries

Subqueries let a single SQL statement do work that logically requires several steps, and they can also avoid transactions or table locks. However, in many cases a subquery can be replaced by a more efficient join (JOIN).
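A sketch on Sakila: both queries find customers with no payments. They are equivalent only because payment.customer_id is declared NOT NULL; with a nullable column, NOT IN and the LEFT JOIN form can return different results.

```sql
-- Subquery form
SELECT * FROM customer
WHERE customer_id NOT IN (SELECT customer_id FROM payment);

-- Join form, often cheaper in older MySQL versions
SELECT a.*
FROM customer a
LEFT JOIN payment b ON a.customer_id = b.customer_id
WHERE b.customer_id IS NULL;
```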

3.3.5 Optimizing OR Conditions

For a query containing OR to use indexes, every column in the conditions joined by OR must be indexed.
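A sketch on Sakila's payment table, where both customer_id and rental_id are indexed, so the optimizer can combine the two indexes (typically type index_merge with a union of both):

```sql
EXPLAIN SELECT * FROM payment WHERE customer_id = 203 OR rental_id = 1000;
```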

3.3.6 Optimizing Paging Queries

For ordinary paging queries, creating a covering index improves performance considerably. A common problem is LIMIT 1000, 20: MySQL sorts the first 1020 rows, returns only rows 1001 to 1020, and discards the first 1000, so the cost of querying and sorting is very high.

- First optimization idea: do the sorting and paging on the index, then join back to the original table on the primary key to fetch the other columns the query needs.

For example, to take one page of the film table sorted by title, compare the direct query with a version that pages on the index and then reads the table by primary key.
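A sketch of both versions on Sakila's film table, which has an index on title; the page position (LIMIT 50, 5) is illustrative. The final ORDER BY is needed because a join does not guarantee the row order of the derived table.

```sql
-- Direct query: reads and sorts full rows, then discards the first 50
SELECT film_id, description FROM film ORDER BY title LIMIT 50, 5;

-- Page on the title index first, then fetch the other columns by primary key
SELECT a.film_id, a.description
FROM film a
JOIN (SELECT film_id FROM film ORDER BY title LIMIT 50, 5) b
  ON a.film_id = b.film_id
ORDER BY a.title;
```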

- Second optimization idea: turn the LIMIT query into a positional query.

To fetch page 100, record the ID of the last row on page 99 (in descending or ascending order), then add a WHERE condition that the ID is less than (descending) or greater than (ascending) that value, and use LIMIT n directly, where n is the number of rows per page.

For example, to query page 100 with 10 rows per page, first find the ID of the last row on page 99, then fetch the 10 rows whose IDs are smaller than that value, which is page 100.
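A sketch on Sakila's payment table, sorted by rental_id in descending order. It assumes the sort column has no duplicate values among the rows being paged; otherwise rows can be skipped or repeated. In an application, the last ID of the previous page is normally remembered rather than queried.

```sql
-- Direct query: page 100 (rows 991 to 1000)
SELECT * FROM payment ORDER BY rental_id DESC LIMIT 990, 10;

-- Step 1: rental_id of the last row on page 99 (row 990)
SELECT rental_id INTO @last_rental_id
FROM payment ORDER BY rental_id DESC LIMIT 989, 1;

-- Step 2: the next 10 rows after that position
SELECT * FROM payment
WHERE rental_id < @last_rental_id
ORDER BY rental_id DESC LIMIT 10;
```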

3.3.7 Using SQL Hints

- USE INDEX

Suggests which index MySQL should use, so that MySQL no longer considers the other available indexes.

For example: SELECT COUNT(1) FROM tab_name USE INDEX (index_name) WHERE xxx; the query then considers index_name and ignores the other indexes.

- IGNORE INDEX

Tells MySQL to ignore one or more indexes.

For example: SELECT COUNT(1) FROM tab_name IGNORE INDEX (index_name); the query then ignores the index_name index.

- FORCE INDEX

Forces MySQL to use a particular index. Typical use: for a condition such as id > 1, almost every row in the table satisfies it, so MySQL chooses a full table scan and USE INDEX does not help; FORCE INDEX makes MySQL use the index anyway.

For example: SELECT * FROM tab_name FORCE INDEX (index_name) WHERE id > 1;
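A sketch of the three hints on Sakila's rental table, where inventory_id has the index idx_fk_inventory_id; the plans in the comments are typical, not guaranteed.

```sql
-- Most rows satisfy inventory_id > 1, so a full table scan is chosen
EXPLAIN SELECT * FROM rental WHERE inventory_id > 1;

-- USE INDEX is only a suggestion: the optimizer may still prefer the table scan
EXPLAIN SELECT * FROM rental USE INDEX (idx_fk_inventory_id) WHERE inventory_id > 1;

-- FORCE INDEX treats a table scan as very expensive, so the index is used
EXPLAIN SELECT * FROM rental FORCE INDEX (idx_fk_inventory_id) WHERE inventory_id > 1;

-- IGNORE INDEX removes an index from consideration
EXPLAIN SELECT COUNT(1) FROM rental IGNORE INDEX (idx_fk_inventory_id);
```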

4. Common SQL Techniques

4.1 Using Regular Expressions

- ^ matches at the beginning of the string.

For example, it can check whether a string starts with a given character.

- $ matches at the end of the string.

- . matches any single character, including newline characters.

- [...] matches any one of the characters inside the brackets.

- [^...] matches any character that is not inside the brackets.
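A sketch of each pattern element with MySQL's REGEXP operator, which returns 1 on a match and 0 otherwise:

```sql
SELECT 'abcdefg' REGEXP '^a',     -- 1: starts with a
       'abcdefg' REGEXP '^b',     -- 0: does not start with b
       'abcdefg' REGEXP 'g$',     -- 1: ends with g
       'abcdefg' REGEXP 'a.c',    -- 1: any single character between a and c
       'abcdefg' REGEXP '[xyz]',  -- 0: none of x, y, z occurs
       'efg'     REGEXP '[^xyz]'; -- 1: e is not x, y, or z
```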

A practical example: find customers whose email address is at 163.com, including addresses mistyped as 163,com.

With LIKE, the query is: SELECT first_name, email FROM customer WHERE email LIKE '%@163.com' OR email LIKE '%@163,com';
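The same search written with a single regular expression; inside brackets, the dot is a literal character:

```sql
SELECT first_name, email
FROM customer
WHERE email REGEXP '@163[,.]com$';
```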

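4.2 Using RAND() to Extract Random Rows

To extract n random rows: SELECT * FROM tab_name ORDER BY RAND() LIMIT n; note that this computes a random value for every row and sorts the whole table, so it is expensive on large tables.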

4.3 GROUP BY with ROLLUP

WITH ROLLUP retrieves additional aggregate rows for the groups.

For example, consider totaling payment amounts per day and per staff member, once without WITH ROLLUP and once with it.
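A sketch on Sakila's payment table. The second query returns the same rows as the first plus a subtotal row for each day (staff_id shown as NULL) and a grand-total row (both grouping columns NULL).

```sql
-- Daily totals per staff member
SELECT DATE(payment_date) AS pay_day, staff_id, SUM(amount) AS total
FROM payment
GROUP BY DATE(payment_date), staff_id;

-- Same grouping WITH ROLLUP: adds per-day subtotals and a grand total
SELECT DATE(payment_date) AS pay_day, staff_id, SUM(amount) AS total
FROM payment
GROUP BY DATE(payment_date), staff_id WITH ROLLUP;
```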

WITH ROLLUP reflects an OLAP approach: it returns the aggregate values for any single grouping as well as for combinations of groupings. In the previous example, WITH ROLLUP also returns the total amount for each day and the grand total. Note: before MySQL 8.0.12, ROLLUP cannot be used together with ORDER BY; LIMIT is applied after ROLLUP.

4.4 Case Sensitivity of Database and Table Names

Windows, macOS, and Unix differ in whether database and table names are case-sensitive, so it is best to adopt one consistent case convention for database and table names and use it in all queries.
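In MySQL this behavior is controlled by the lower_case_table_names system variable (a detail not covered above), whose current value can be checked as follows:

```sql
-- 0: names stored as written, comparisons case-sensitive (Unix default)
-- 1: names stored in lowercase, comparisons case-insensitive (Windows default)
-- 2: names stored as written, compared in lowercase (macOS default)
SHOW VARIABLES LIKE 'lower_case_table_names';
```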
