SQL Performance Optimization

Last Update:2016-08-18 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Index, Index!!! Jianjian index for frequently queried words!!

However, you cannot build indexes too much. Changes to table records, such as insert and delete, can cause indexes to reflow, increasing the burden on the database.

Optimize your goals 1. Reduce the number of IO

Io is always the most vulnerable to the database, which is determined by the responsibility of the database, most of the database operations more than 90% of the time is occupied by IO operations, reduce the number of IO is the first priority in SQL optimization, of course, is the most obvious optimization means. 2. Reduce CPU calculation

In addition to the IO bottleneck, the optimization of the CPU is what needs to be considered in SQL optimization. Order BY, group By,distinct ... Are all CPU-intensive (these operations are basically CPU-processed in-memory data comparison operations). When our IO optimization is done in a certain phase, reducing CPU computing becomes an important objective optimization method of SQL Optimization. Change the SQL execution plan

Once we have identified the optimization goals, we need to identify ways to achieve our goals. For SQL statements, there is only one way to achieve the above 2 goals, that is to change the SQL execution plan, let him try to "less detours", as far as possible through a variety of "shortcuts" to find the data we need to achieve "reduce the number of IO" and "reduce CPU computing" target analysis of complex SQL statements Explain

Show tables or show tables from database_name; Displays the names of all tables in the current database

show databases; Displays the names of all databases in MySQL

Show columns from table_name to database_name; or MySQL show columns from Database_name.table_name; Display column names in a table

Show grants for [email protected]; Displays the permissions of a user, displaying results similar to the grant command

Show index from TABLE_NAME; Index of the display table

Show status; Displays information about some system-specific resources, such as the number of threads that are running

Show variables; Display the name and value of a system variable
Show Processlist; Displays all the processes that are running in the system, that is, the query that is currently executing.

Show table status; Displays information about each table in the database that is currently being used or specified. Information includes the table type and the last updated time of the table

Show privileges; Show the different permissions supported by the server

Show CREATE DATABASE database_name; Displays whether the CREATE DATABASE statement can create the specified databases

Show CREATE TABLE table_name; Displays whether the CREATE DATABASE statement can create the specified databases

Show Engies; Displays the storage engine and the default engine that are available after installation.

Show InnoDB status; Displays the status of the InnoDB storage engine

Show logs; Displaying logs for the BDB storage engine

Show warnings; Displays errors, warnings, and notifications resulting from the last statement executed

Show errors; Show only errors resulting from the last execution statement about enum

dispute.

For fields with limited and fixed values, it is recommended that you use enum rather than varchar. But!! Other databases may not be supported, causing problems that are difficult to migrate. Turn on cache queries

For the exact same SQL, use the existing execution plan to skip parsing and generating the execution plan.

Scenario: There is a table that changes infrequently, and the server receives a large number of identical queries for that table. Query caching is not appropriate for tables that are frequently updated

Mysql determines whether to hit the cache is very simple, first will be cached results in the reference table, and then use the query statement, database name, the version of the client protocol and other factors to calculate a hash value, the hash value is associated with the results in the reference table. If the hash value calculated based on some related criteria can be associated with the data in the reference table when the query is executed, the query hit

The query must be exactly the same (byte-wise) to be considered the same. In addition, the same query string may be considered different for other reasons. Queries that use different databases, different protocol versions, or different default character sets are considered to be different queries and are cached separately.

The following SQL query cache is considered different:

Scenarios where caching mechanisms fail

If a query statement contains some uncertainties, such as containing the function current (), the query is not cached, and the uncertainty mainly includes the following conditions

· Some functions that reference the indeterminate return value

· Reference a custom Function (UDFs).

· Reference a custom variable.

· Refer to the table in the MySQL system database.

· Any one of the following methods:

SELECT ... In SHARE MODE

SELECT ... For UPDATE

SELECT ... Into OUTFILE ...

SELECT ... Into DumpFile ...

SELECT * FROM ... WHERE Autoincrement_col is NULL

· Use the temporary table.

· Do not use any tables.

· The user has column-level permissions for a table. Additional consumption

If query caching is used, additional resource consumption is generated during read and write operations, and consumption is mainly reflected in the following areas

· The query will check if the cache is hit, this is relatively small consumption

· If the query cache is not hit, MySQL will determine whether the query can be cached, and the system does not have a corresponding cache, the results will be written to the query cache

· If a table is changed, all buffered queries that use that table are no longer valid and are removed from the buffer. This includes queries that map to a changed table using the merge table. A table can be changed by many types of statements, such as INSERT, UPDATE, DELETE, TRUNCATE, ALTER table, drop table, or drop DATABASE.

For InnoDB, some features of things also limit the use of query caching. When the B table is modified in thing a, the modification of table B before the thing is submitted is not visible to other things. In order to ensure the correctness of the cached results, InnoDB takes steps to make all queries involving the B table non-cacheable before thing a commits. If a thing runs for a long time, it can seriously affect the hit rate of the query cache

Do not set the query cache space too large.

Because the query cache is protected by a global lock operation, if the query cache configuration memory is relatively large and contains a large number of query results, when the query cache fails, the long time to hold this global lock. Because the query cache's hit-detection operations and cache-fail detection also rely on this global lock, it can cause the system to zombie in situations where static tables are faster fixed-length type and variable-length type

CHAR (m) defines the length of the column as fixed, and the M value can be between 0~255, and when the char value is saved, a space is padded on their right to reach the specified length. When a char value is retrieved, the trailing space is removed. No case conversions are made during the storage or retrieval process. Char is convenient to store the fixed-length data, the index on the Char field is more efficient, such as the definition of char (10), then regardless of whether you store data reached 10 bytes, take up 10 bytes of space, insufficient to automatically fill with spaces.

varchar (m) defines the length of the column as a variable length string, and the M value can be between 0~65535, (the maximum effective length of varchar is determined by the maximum row size and the character set used. The overall maximum length is 65,532 bytes). The varchar value is saved with only the required number of characters, plus one byte to record the length (two bytes if the column declaration is longer than 255). The varchar value is not populated when it is saved. The trailing spaces on the value save and retrieve are still preserved, conforming to standard SQL. VARCHAR stores variable-length data, but the storage efficiency is no higher than char.

If the possible value of a field is not fixed length, we only know that it cannot exceed 10 characters, it is the most advantageous to define it as VARCHAR (10). The actual length of the varchar type is +1 of the actual length of its value. Space considerations, with varchar appropriate, from the efficiency of the appropriate, with Char, the key is to find a trade-off point according to the actual situation. varchar and text, BLOB types

The Varchar,blob and text types are variable-length types, and for their storage requirements depend on the actual length of the column values (denoted by L in the preceding table), rather than depending on the maximum possible size of the type.

The blob and text types require either three-or 4-byte to record the length of the column values, depending on the maximum possible length of the type. varchar requires a defined size, with a maximum limit of 65535 bytes, and the text is not required. If you assign a value that exceeds the maximum length of a column type to a BLOB or text column, the value is truncated to fit.

A blob is a large binary object that can hold a variable amount of data. 4 blob types Tinyblob, blobs, Mediumblob, and Longblob differ only in the maximum length that they can hold values.

BLOBs can store pictures, text does not, text can only store plain text files.

The only difference between blob and text types is that the sorting and comparison of BLOB values is case-sensitive, and the text value is case insensitive. In other words, a text is a case-insensitive blob.

Efficiency is basically char>varchar>text, but if you are using a InnoDB engine, we recommend using varchar instead of char

Char and varchar can have default values, text cannot specify default values for static tables and dynamic tables

Static table field length fixed, auto-fill, read and write fast, easy to cache and repair, but compared to the hard disk, dynamic table is the field length is not fixed, save hard disk, but more complex, easy to produce debris, slow, the problem is not easy to rebuild. When only one piece of data is needed, use limit 1

A row in the table record should try not to exceed one IO unit

Differentiate in and exist

SELECT * FROM table a where ID in (select id from table B)
This sentence is equivalent to
SELECT * FROM Table A where exists (SELECT * from table B where table b.id= table a.ID)
For each piece of data in table A, the existence of the SELECT * from table B where table b.id= table a.id is executed, and if Table B has the same ID as the current row in table A, the exists is true, the row is displayed, otherwise it is not displayed

Distinguishing in and exists is mainly caused by the change of the driving sequence (which is the key to the performance change), if it is exists, then the other layer table is the driver table, first is accessed, if it is in, then the subquery is executed first .

So in the case of large appearance and small inner table, the exists is suitable for small appearance and large -scale case complex multi-table as little as possible join

The advantage of MySQL is simplicity, but it's also a disadvantage in some ways. The MySQL optimizer is efficient, but because of its limited amount of statistical information, the optimizer is more likely to deviate from the work process. For a complex multi-table Join, on the one hand due to its optimizer constraints, and also in the Join this aspect of the effort is not enough, so performance from the Oracle and other relational database predecessors still have a certain distance. But if it is a simple single-table query, this gap will be very small even in some scenarios to better than these database predecessors. Try to use join instead of subquery

While Join performance is poor, there is a significant performance advantage over MySQL subqueries.

MySQL needs to establish a temporary table for the query results of the inner query statement. The outer query statement then queries the records in the staging table. After the query is complete, MySQL needs to pin these temporary tables. So you can use connection queries in MySQL instead of subqueries. Connection queries do not need to establish temporary tables, which are faster than subqueries. Try to be less ordered

Sorting operations consume more CPU resources, so reducing the ordering can significantly affect SQL response time in scenarios where the cache hit ratio is high enough for the IO capability.

For MySQL, there are several ways to reduce sorting, such as:

The above myths mentioned in the way of sorting by using the index to optimize

Reduce the number of record entries in the sort

No need to sort the data avoid select *

Most relational databases are stored as rows (row), and data access operations are based on a fixed-size IO unit (called block or page), typically 4kb,8kb ... Most of the time, multiple rows are stored in each IO unit, and each row is all the fields that store the row (except for special types of fields such as lobs).

So, whether we are taking a field or multiple fields, the amount of data that the database needs to access in the table is actually the same.

There are exceptions, and that is our query can be done in the index, that is, when only a A, a, a two field, do not need to return to the table, and C This field is not used in the index, the need to return to the table to obtain its data. In such cases, the IO volume of the two will be significantly different. Try to be less or

When there are multiple conditions in the WHERE clause to "or" coexist, the MySQL optimizer does not have a good solution to its execution plan optimization problem, coupled with MySQL-specific SQL and Storage layered architecture, resulting in poor performance, often using union ALL or U Nion (when necessary) in lieu of "or" will have a better effect. Try to use UNION all instead of union

The difference between Union and union all is that the former needs to merge two (or more) result sets and then perform a unique filtering operation, which involves sorting, adding a lot of CPU operations, and increasing resource consumption and latency. So when we can confirm that it is not possible to duplicate a result set or do not care about repeating the result set, try to use union all instead of union. Filter as early as possible

This principle can also be used in SQL authoring to optimize some of the Join's SQL. For example, when we are querying multiple tables for paging data, we'd better be able to filter the good data on a single table, and then join with another table with the result set of the page, so as much as possible to reduce unnecessary IO operations, greatly saving the time spent in IO operations. Avoid type conversions

The "type conversion" here refers to the type conversion that occurs when the type of the column field in the WHERE clause is inconsistent with the passed parameter type:

The conversion of a human being on a column_name directly results in MySQL (which in fact has the same problem with other databases) The index cannot be used, and if it is not to be converted, it should be converted on the incoming parameters and converted by the database itself.

If our incoming data types and field types are inconsistent, and we do not have any type conversion processing, MySQL may either make a type conversion operation on our data or leave it to the storage engine to process it, which will cause the execution plan problem if the index is not available. Prioritize high-concurrency SQL rather than low-frequency execution some "big" sql

For the destructive, high concurrency SQL will always be larger than the low frequency, because the high concurrency of SQL once there is a problem, not even give us any respite to the system will be compressed. And for some, although the need to consume a lot of IO and slow response to SQL, because of the low frequency, even if encountered, the most is to let the whole system to respond slowly, but at least for a while, let us have the opportunity to buffer. Optimize from a global perspective, rather than one-sided adjustment

In particular, it is not forgotten how, pound foolish, to optimize the SQL execution plan by adjusting the index. Explain every SQL that runs in the database whenever possible

Knowing the SQL execution plan can determine if there is an optimization to determine if there is an execution plan problem. After a period of optimization of the SQL running in the database, it is obvious that SQL may have been scarce, and most of them need to be explored, when a large number of explain operations are needed to collect the execution plan and determine whether optimization is needed. Try to avoid null values for fields in the WHERE clause

Causes the engine to discard the index and then perform a full table scan.

Try not to leave null values for the database, and use NOT NULL to populate the database as much as possible. You can set a real content representation for each null-type field and a null counterpart. Avoid using! = in the Where, the < operator

Otherwise the engine discards the use of the index and makes a full table scan. Common query Word Jianjian index avoid using or in where

In and not keywords are used with caution, easy to cause full table sweep face

It is also easy to use the between wildcard to query for consecutive values to avoid using local variables in a WHERE clause

SQL resolves local variables only at run time. The optimizer must access the execution plan at compile time without knowing the value of the variable, so it cannot be an input to the index.

Avoid expression operations on fields in the WHERE clause

Causes the engine to discard the use index

Avoid functional manipulation of fields in the WHERE clause

Do not perform functions, arithmetic operations, or other expression operations on the left side of the WHERE clause ' = '

The system may not use the index correctly to avoid update all fields

Only the required fields are update. Frequent calls can cause significant performance costs, along with a large number of logs. Not as many indexes as possible.

The number of indexes on a table should not be more than 6. Use numeric fields instead of character types as much as possible

Because each character of the string is compared one at a time when the query and the connection are processed, it is sufficient for the numeric type to be compared only once. Use Varchar/nvarchar instead of Char/nchar as much as possible

Variable-length fields have a small storage space, and in a relatively small field search is more efficient for queries ...? Avoid frequent creation and deletion of temporary tables, reducing system table resource consumption select into and CREATE table

When creating a new temporary table, if you insert a large amount of data at one time, use SELECT INTO instead of CREATE table to avoid causing a lot of log to increase speed.

If the amount of data is small, in order to mitigate the resources of the system table, create table first, then insert.

Splitting large Delete and INSERT statements

Because these two operations lock the table, for high-traffic sites, the number of accesses accumulated in the lock table time, the database connection, the number of open files, and so on, may not only cause the Web service to crash, but also cause the entire server to hang up immediately.

So, be sure to split, use the limit condition to sleep for a period of time, batch processing.

Resources

http://blog.csdn.net/eric_sunah/article/details/17510939

http://ju.outofmemory.cn/entry/89569

SQL Performance Optimization

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

SQL Performance Optimization

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

SQL Performance Optimization

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support