Sword refers to architect series-mysql tuning

Source: Internet
Author: User
Tags joins mysql query query meaning install perl

This paper introduces the tuning method of MySQL, including slow log query analysis and explain query analysis SQL execution plan

1. mysql optimization

1, slow log query analysis

You first need to make some settings for the slow log, as follows:

SHOW VARIABLES like ' slow_query_log ';        --Check if slow query set GLOBAL slow_query_log_file= '/var/lib/mysql/mysql-slow.log ' is turned on;  --Set the location of the slow query log set GLOBAL Log_queries_not_using_indexes=on; --whether to log queries that do not use indexes set GLOBAL long_query_time=1;                --Set the SQL statement to record more than a long time set GLOBAL slow_query_log=on;                --Set whether the slow query log is turned on

Then I new T_report_app, t_application and T_developer table, T_report_app, there are more than 20,000 data, the following query, the total time of 1.019sec, more than 1 seconds will be recorded in the slow log.

SELECT * from T_report_app  R, t_application app,t_developer dev WHERE r.application_id = app.id and app.developer_id = Dev.id

To view the Mysql-slow.log output log information, if there are many historical information, you can first make the following command empty:

echo "" > Mysql-slow.log

After re-running, query, the main information is as follows:

# [email protected]: Root[root] @  [192.168.0.190]  Id:             --Execution of user root and host host# query_time:22.834043  lock_time:0.000458 rows_sent:20724  rows_examined:277  SET timestamp=1469083853; SELECT * from T_report_app  R, t_application app,t_developer Dev  --Execution of relevant content where r.application_id = App.id and app. developer_id = dev.id;

We can also take advantage of MySQL's own slow query analysis tool mysqldumpslow , which can be mysqldumpslow -h used to see how to use it.

Mysqldumpslow-t 3 Mysql-slow.log | More

Executing the statement above will output the most time-consuming 3 SQL statements. The output format is as follows:

Count:1 (number of executions) time=0.00s (0s) (execution time) lock=0.00s (0s) (lockout time) rows=2.0 (2), root[root]@[192.168.0.190] (Perform the following on the IP client as root) SELECT query_id, SUM (DURATION) as Sum_duration, SEQ from INFORMATION_SCHEMA. PROFILING GROUP by query_id LIMIT N, n (executed SQL content)

You can also use pt-query-digest for analysis, but first install the tool, as follows:

Yum Install Perl-dbd-mysqlperl makefile.plmakemake testmake Install

After the installation is complete, you can view the analytic SQL statement using the following command.

CD  /var/lib/mysql         --Switch to the folder where the slow log is stored echo "" >mysql-slow.log    --empty previous record pt-query-digest Mysql-slow.log | More

The output is as follows:

# 250ms User Time, 60ms system time, 25.30M RSS, 219.89M vsz# current Date:thu Jul 17:12:52 2016# hostname:localhost. localdomain# files:mysql-slow.log# overall:2 Total, 2 unique, 0 QPS, 0x concurrency ______________________# time range: All events occurred @ 2016-07-21 17:12:47# Attribute Total min max avg 95% StdDev median# = =      ======== ======= ======= ======= ======= ======= ======= =======# Exec time 1s 1ms 1s 611ms      1s 862ms 611ms# Lock time 12ms 357us 12ms 6ms 12ms 8ms 6ms# Rows sent 1016 508 695.79 508# Rows examine 384 107 277 192 277 120.21 192# Query Size 317 142 175 158.50 175 23.33 158.50# profile# Rank Query ID Response time C  Alls r/call v/m item# = ================== ============= ===== ====== ===== ===============# 1 0x67E16746140E091A 1.2207 99.9% 1 1.220.00 SELECT t_report_app# MISC 0xMISC 0.0012 0.1% 1 0.0012 0.0 <1 items># Query 1:0 QPS, 0x concurrency, ID 0x67e16746140e091a at byte 0 ________# this item was included in the report because it matches--limit.# S cores:v/m = 0.00# Time Range:all events occurred at 2016-07-21 17:12:47# Attribute pct total min max av       G 95% StdDev median# ============ = = ======= ======= ======= ======= ======= ======= =======# Count 50    # Exec time 1s 1s 1s 1s 1s 0 1s# Lock time 12ms 12ms 12ms     12ms 12ms 0 12ms# rows sent 98 0 1000# rows Examine 72     277 277 277 277 277 0 277# Query size 44 142 142 142 142 142 0 142# string:# Hosts 192.168.0.190# Users root# query_time distribution# 1us# 10us# 100us# 1ms# 10m s# 100ms# 1s ################################################################# 10s+# tables# SHOW TABLE STATUS like ' T_report_app ' \G# S How to CREATE TABLE ' T_report_app ' \g# EXPLAIN/*!50100 partitions*/select * from T_report_app R, T_application app,t_develop Er dev WHERE r.application_id = app.id and app.developer_id = dev.id LIMIT 0, 1000\g

After adding the normal index for developer_id and application_id, empty the Mysql-slow.log and rerun SQL with the following results:

# 210ms User Time, 130ms system time, 25.30M RSS, 219.89M vsz# current Date:thu Jul 17:43:08 2016# hostname:localhost  . localdomain# files:mysql-slow.log# overall:2 Total, 2 unique, 0 QPS, 0x concurrency ______________________# time range: All events occurred @ 2016-07-21 17:43:01# Attribute Total min max avg 95% StdDev median# = = =    ========= ======= ======= ======= ======= ======= ======= =======# Exec time 36ms 1ms 35ms 18ms      35MS 24ms 18ms# Lock time 305us 125us 180us 152us 180us 38us 152us# Rows sent 1016  508 695.79 508# Rows examine 1.17k 1.08k 599 1.08k 719.83 599# Query Size 318 142 176 159 176 24.04 159# profile# Rank Query ID Response time   Calls r/call v/m item# = ================== ============= ===== ====== ===== ===============# 1 0x67E16746140E091A 0.0350 97.2% 1 0.00.00 SELECT t_report_app# MISC 0xMISC 0.0010 2.8% 1 0.0010 0.0 <1 items># Query 1:0 QPS, 0 x concurrency, ID 0x67e16746140e091a at byte 0 ________# this item was included in the report because it matches--limit.# scores:v/m = 0.00# Time Range:all events occurred at 2016-07-21 17:43:01# Attribute pct Total min max a       VG 95% stddev median# ============ = = ======= ======= ======= ======= ======= ======= =======# Count 50   # Exec Time 35ms 35ms 35ms 35ms 35ms 0 35ms# Lock time + 180us 180us 180us    180US 180us 0 180us# rows sent 98 0 1000# rows Examine 92      1.08k 1.08k 1.08k 1.08k 1.08k 0 1.08k# Query size 44 142 142 142 142 142 0 142# string:# Hosts 192.168.0.190# Users root# query_time distribution# 1us# 10us# 100us# 1ms# 10 Ms ################################################################# 100ms# 1s# 10s+# tables# SHOW TABLE STATUS like ' T_report_app ' \G# SHOW CREATE TABLE ' T_report_app ' \g# EXPLAIN/*!50100 partitions*/select * from T_report_app R, T_application App,t_develo Per dev WHERE r.application_id = app.id and app.developer_id = dev.id LIMIT 0, 1000\g

2, explain query and analysis of SQL execution plan

Use the explain query to parse the execution plan for SQL, as follows:

EXPLAIN SELECT * from T_report_app  R, t_application app,t_developer dev WHERE r.application_id = app.id and App.develo per_id = Dev.id

The following results are output:

Added the normal index for developer_id and application_id, with the following results:

The explanations are as follows:

Column Name Describe
Table Shows which table the query is about
Type A very important column that shows what type of connection is used. The best to worst connection types are const, EQ_REG, ref, range, index, and all
Possible_keys Displays the indexes that may be applied in this table. If empty, there is no possible index to apply
Key The index that is actually used. If NULL, the index is not used
Key_len The length of the index to use. The shorter the length the better, without loss of accuracy
Ref Shows which column of the index is being used
Rows The number of rows that MySQL considers must be checked to return the request
Extra When the value of this column is using Filesort (additional sorting is required) or using temporary (requires staging table for processing), the query needs to be optimized

The Using Filesort:mysql requires additional steps to discover how to sort the rows that are returned. It sorts all rows based on the connection type and the row pointers for all rows that store the sort key values and matching criteria.

The Using Temporary:mysql needs to create a temporary table to store the results, which typically occurs on an order by on a different set of columns, rather than on a group by

About the meaning of the Type field:

Create a new table with the table named Tb_c, where the ID is a clustered index, and B and C establish a federated index.

(1) All: Full table scan

Only full table scan when index is not used

(2) Const: read constant, and at most only one record match, because is a constant, so actually only need to read once;

Select B,c from Tb_c where b=2

There is also a const special case where the table has and only one row satisfies the condition, as follows:

SELECT * FROM (select B,c from Tb_c where b=2) a

(3) Eq_ref: There will be a maximum of one matching result, usually through the primary key or unique key index to access;

(4) Index: Full index scan;

The join type is the same as all except that only the index tree is scanned. This is usually faster than all, because the index file is usually smaller than the data file. When a query uses only columns that are part of a single index, MySQL can use the join type


(5) Rang: Index range scan. This connection type uses the index to return rows in a range, such as what happens when you use > or < to find something

SELECT * FROM Tb_c where b>2

  

(6) The reference query of the driver table index in the Ref:join statement; This connection type occurs only if the query uses a key that is not a unique or primary key or is part of those types (for example, the leftmost prefix is used). For each row union of the previous table, all records are read from the table. This type is heavily dependent on how many records are matched against the index-the less the better

Each property can be referenced by:

(1) http://blog.csdn.net/xifeijian/article/details/19773795

(2) http://www.cnitblog.com/aliyiyi08/archive/2016/04/21/48878.html

2, mysql commonly used SQL optimization

1. Return only the data you need

The return data to the client needs at least the database extract data, network transmission data, the client receives the data as well as the client processing data and so on. If you return unwanted data, you increase the invalid labor of the server, the network, and the client, and the harm is obvious, avoiding the need to be aware of such events:

A, in the horizontal view, do not write the SELECT * statement, but choose the field you need

B, Portrait, write the WHERE clause, do not write a SQL statement without where

C, for aggregate queries, you can further qualify the returned rows with the HAVING clause

2, try to do less repetitive work

This point focuses on the client program, which needs to be noted as follows:

A, control the execution of the same statement multiple times, especially the number of basic data execution is a lot of programmers very little attention

B, reduce the number of data conversion, may require data conversion is a design problem, but the reduction is the programmer can do

C, to eliminate unnecessary sub-queries and connection tables, sub-query in the execution plan is generally interpreted as an external connection, redundant connection table brings additional overhead??

D, update operation do not split into (Delete+insert) the form of operation, although the function is the same, but the performance difference is very large.

3, the use of sub-query

Subqueries can make our programming flexible and can be used to implement some special functions. But in performance, often an inappropriate subquery usage can create a performance bottleneck.

If a subquery's condition uses a field of its outer table, this seed query is called a correlated subquery. Correlated subqueries can be introduced with in, not in, EXISTS, not EXISTS.

A, not in, not EXISTS related subqueries can use a LEFT join instead of a notation

Select Pub_name from publishers where pub_id not in (select pub_id from titles WHERE TYPE = ' bubiness ') SELECT a.pub_name FR OM Publishers a left JOIN the titles B on b.type = "Business" and a.pub_id = b.pub_id WHERE b.pub_id is NULL

Select Titile from the titles where not EXISTS (SELECT title_id from sales WHERE title_id = titles.title_id) Select title from Titles left joins sales on sales.title_id = titles.title_id WHERE sales.title_id is NULL

B, if guaranteed subqueries are not duplicated, in, exists related subqueries can be replaced with inner JOIN. If there is repetition, then the inner connection will go heavy, so it is not equivalent

Select Pub_name from publishers where pub_id in  (select pub_id from titles WHERE TYPE = ' bubiness ') select DISTINCT a.pu B_name from Publishers a INNER joins titles B on b.type = ' business ' and a.pub_id = b.pub_id

The correlation subquery of C, in is replaced by exists

Select Pub_name from Publishers where pub_id in (select pub_id from titles WHERE TYPE = "Business") SELECT Pub_name from Pu Blishers where EXISTS (SELECT 1 from the titles WHERE TYPE = ' business ' and pub_id = publishers.pub_id)

As the statement above is completely correct, note that red is 1, not L (l)

D, do not use COUNT (*) subquery to determine whether there is a record, preferably with a left join or exists

Select ID from the jobs  where (SELECT COUNT (*) from the employee where employee.id = jobs.id) <>0select ID from Jobs  where EXISTS (SELECT 1 from Employee WHERE employee. ID = jobs.id)

4. Use the index as much as possible

Indexing can improve the efficiency of the corresponding select, but it also reduces the efficiency of insert and update, because it is possible to rebuild the index at INSERT or update, so it is important to consider how to build the index, depending on the situation. The number of indexes on a table should not be more than 6, if too many you should consider whether some of the indexes that are not commonly used are necessary.

In order for the optimizer to use the index efficiently, it should be noted when writing statements:

A, try to use the ontology of the field index

Select ID from t where num/2=num1--NUM has index select ID from t where num=num1*2--NUM1 has index

B. Do not format the indexed fields

You should try to avoid the expression of a field in the Where clause, and try to avoid functional manipulation of the field in the WHERE clause

If an index is used, the query criteria should use the indexed field ontology instead of using other functions on this field, otherwise the index will be invalidated. For example, there is a field a in the datasheet that represents a datetime, a type that has a date, an index for that field, and when the query is based on the time range, the following query condition appears:
SELECT * WHERE Date (date_col) between ' 2009-2-1 ' and ' 2009-3-1 '  # It can be written in MySQL.

The index is invalidated because the query condition is not the ontology of field a, but instead uses the date function. The workaround is to change the query criteria to:

SELECT * where Date_col between ' 2009-2-1 00:00:00 ' and ' 2009-3-1 23:59:59 '

When multiple fields are added to a composite index, the index function is lost when a single field is used as an index?? Pending discussion

An index is added to the ID in the Customer_copy table, but the following statement causes a full table scan of the SELECT * from customer_copy where id = ' 1 ' and cname= ' BB ' so the columns involved in where and order by should be considered And is forced to be used.

C, do not use functions for indexed fields

where left (NAME, 3) = ' abc ' or where SUBSTRING (name,1, 3) = ' abc ' WHERE NAME like ' abc% '

The MySQL date is converted to a string:

SELECT * from Time_test WHERE date_format (date_col, '%y-%m-%d ') > ' 2010-08-01 ' and Date_format (Date_col, '%y-%m-%d ') & lt; ' 2020-08-03 '

5, pay attention to the wording of the connection conditions

The connection condition of multi-table connection is of great significance to the selection of indexes, so we need special attention when we write the connection condition.

A, multi-table connection, the connection conditions must be written in full, rather repeat, do not leave gaps.

B, connection conditions use clustered index as far as possible

C, pay attention to the difference between the on part condition and the where part condition??? What is the difference when two SQL statements are executed?

6. Other areas needing attention

Experience has shown that the sooner the problem is discovered, the lower the cost, and many performance problems can be found in the coding phase, in order to identify performance issues early, it is important to note:

A, programmers pay attention to the amount of data in each table.

B, the coding process and the unit test process as far as possible with a large amount of data database testing, it is best to use actual data testing.

C, each SQL statement as simple as possible

D, do not update data for tables that have triggers frequently

E, note the limitations of database functions and their performance

7, for the WHERE clause and contains some of the following keywords, the specific use of the execution plan to view:

When the WHERE clause appears, especially if the WHERE clause contains is NULL,! =, <>, or, in, not in, try to avoid using the! = or <> operator in the WHERE clause,

MySQL uses indexes only for the following operators:<, <=, =, >, >=, Between,in, and sometimes like.

A, "should try to avoid the NULL value of the field in the Where clause to judge, otherwise it will cause the engine to abandon the use of the index for a full table scan, such as:

Select ID from t where NUM was nullselect ID from T where num=0 "

You can set the default value of 0 on NUM, make sure that the NUM column in the table does not have a null value, and then query:

Personal opinion: After testing, is null can also be found with index seek, 0 and Null are different concepts, the above statement of two query meaning and the number of records is different.

2, "should try to avoid using the! = or <> (for the non-equal) operator in the WHERE clause, otherwise the engine discards the use of the index for a full table scan. ”

Personal opinion: After testing the,<> can also be found with index seek.

3. mysql Big Data volume paging optimization

(1) The number of efficient statistical rows:

Select COUNT (*) from news;    Generally this performance will be better some select count (id) from news;   This will perform a full-table scan, and see if the ID is empty is not counted in the

COUNT (*) is faster than count (ID) because if the column name is written in the count () bracket, count counts the number of times the column has a value.

If the count () is written in parentheses, the wildcard does not match all the columns, but rather a direct count of the number of rows, and when a count (*) without a WHERE clause is executed, the MySQL query is very fast because MySQL already knows the number of records in the table beforehand. However, if you execute a count (*) with a WHERE clause, MySQL also performs a full table scan, so we should try to index the columns of the WHERE clause as much as possible.

Queries that use count are difficult to optimize because there are many situations where full table scanning is required, and the only way to optimize is to index the table of the WHERE clause, and more often we should tweak the application to avoid using count, for example, big data paging using the estimated paging method, and so on.

(2) Inquiry record

First, you need to create a new table pagination, with an ID and page field, and a type of integer. The synchronization ID value and the ID in the T_report_app table are as follows:

INSERT into pagination (ID) SELECT ID from T_report_app

Then insert the page value using the following statement.

SET @p:= 0; UPDATE pagination SET Page=ceil ((@p:= @p + 1)/() ORDER by ID DESC;

We paged by 10 records per page, in descending order of ID. If you insert or delete records, you need to synchronize the pagination table records.

This way, if we query the records on page 10th, we can directly know the ID of the 10 data in the T_report_app record, as follows:

  

You can also use another method, as follows:

Select Idfrom (   Select ID, ((@cnt: = @cnt + 1) + $perpage-1)% $perpage Temp--numbers the records that are queried to easily filter the ID of the first record on each page   FRO M T_report_app    JOIN (SELECT @cnt: = 0) T      --Reset @cnt parameter to 0   --WHERE ID < $last _id   ORDER by ID DESC   LIMIT $perpage * $buttons   --page has 10 paging buttons, 10 records per page, you need to filter out 80 records) Cwhere temp = 0;                --Filter The ID of the first record in each page, total 10

Calculates the ID of an offset for each paging button. The above results can be stored in the table, the aspects of paging query.

INSERT into Pagination2 (id,page) (SELECT id, @cntb: = @cntb + 1 as page from (   select ID, (@cnt: = @cnt + 1) + 10-1) % cnt from   T_report_app    join (select @cnt: = 0) T     -ORDER by ID DESC   ) C  Join (select @cntb: = 0) DW Here cnt = 0 ORDER by ID DESC  )

View the data in the Pagination2 as follows:

The 10th page of Max (ID) is 593620, which is consistent with the previous table.

Sword refers to architect series-mysql tuning

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.