MySQL Query Optimization


1. Introduction

A good web application needs excellent access performance. The MySQL database is an integral part of most web applications and a major factor in their performance, so improving MySQL performance is vital.

MySQL performance work can be divided into three areas: hardware, network, and software. Hardware and network depend on the company's budget, so we will not discuss them here. The software side covers many topics; here we focus on query optimization.

I recently read some books on query optimization, along with articles by earlier practitioners on the Internet.

The following is a summary of query optimization:

2. Intercepting SQL Statements

1. General query log

2. Slow query log

3. Binary log

4. Process list

SHOW FULL PROCESSLIST;

...
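As a sketch, the slow query log (item 2) can be enabled at runtime like this; the variable names are standard MySQL, but the threshold values are only illustrative:

```sql
-- Enable the slow query log without restarting the server
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;   -- log statements slower than 1 second (illustrative)
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Verify the current settings
SHOW VARIABLES LIKE 'slow_query%';
```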

3. Basic Query Optimization Analysis Commands

1. EXPLAIN [PARTITIONS | EXTENDED] query;

2. SHOW CREATE TABLE tab;

3. SHOW INDEX FROM tab;

4. SHOW TABLE STATUS LIKE 'tab';

5. SHOW [GLOBAL | SESSION] STATUS LIKE '';

6. SHOW VARIABLES;

....
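As a minimal sketch of how these commands fit together — the table `tab` and its columns are hypothetical:

```sql
-- Hypothetical table for the examples that follow
CREATE TABLE tab (
  id   INT PRIMARY KEY,
  name VARCHAR(50),
  age  INT,
  INDEX idx_name (name)
);

-- See how a query would execute: check the key, rows, and Extra columns
EXPLAIN SELECT * FROM tab WHERE name = 'abc';

-- Inspect the table definition and its indexes
SHOW CREATE TABLE tab;
SHOW INDEX FROM tab;
```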

PS: The above is just groundwork; the real substance follows.

4. Query Optimization

1. Avoid full-table scans where possible: add indexes to the relevant fields and make queries use them.

2. Delete unnecessary or duplicate indexes.

3. Rewrite queries into equivalent forms (predicates, subqueries, and joins).

4. Delete statements with unnecessary or duplicated content, and streamline what remains.

5. Consolidate repeated statements.

6. Cache query results.
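Point 1 can be sketched as follows, assuming a hypothetical `tab` table with an unindexed `age` column:

```sql
-- Before: no index on age, so EXPLAIN shows type = ALL (full-table scan)
EXPLAIN SELECT * FROM tab WHERE age = 25;

-- Add an index on the filtered column
ALTER TABLE tab ADD INDEX idx_age (age);

-- After: EXPLAIN should show type = ref with key = idx_age
EXPLAIN SELECT * FROM tab WHERE age = 25;
```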

5. Index Optimization

5.1 Index advantages:

1. Maintain data integrity.

2. Improve data query performance.

3. Speed up table join operations (JOIN).

4. Sort query results: without an index, MySQL falls back to its internal filesort algorithm, which is slow.

5. Simplify data aggregation operations.

5.2 Index disadvantages

1. Indexes occupy additional storage space.

2. Inserts, updates, and deletes are slowed down, because the affected indexes must be updated along with the data.

3. With many indexes on a table, the optimizer needs more time to choose among them.

5.3 Index selection

1. Index when the data volume is large.

2. Do not index columns whose values are highly duplicated (low cardinality).

3. If a query retrieves more than roughly 20% of the rows, the optimizer uses a full-table scan instead of the index.

5.4 Index details

Data query:

InnoDB and MyISAM in MySQL both use B-Tree indexes.

B-Tree index types include: PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT.

Cases where a B-Tree index may not be used (that is, the index is skipped when the field is compared with the following operators):

>, <, >=, <=, !=, <>, LIKE '%...'

[Here we introduce covering indexes]

In my own words: a covering index is not a separate physical index type like a primary key or unique index; it is a way of applying an index in certain scenarios. [Another way to put it: the queried columns are all index columns, so the query is satisfied from the index.] It can break through the restrictions above: queries using those operators can still be answered from the index.

Because the queried columns are index columns, MySQL does not need to read the rows, only the index entries. [Analogy: when you look for specific content in a book and it already appears in the table of contents, you do not need to leaf through the pages; you find it directly in the contents.]

How is a covering index activated? What is the specific scenario?

The indexed field appears in the SELECT list.

Composite indexes have further special cases. For example, with a three-column composite index, it suffices for the leftmost column of the index to appear in SELECT, WHERE, GROUP BY, or ORDER BY.

How to check:

If EXPLAIN shows "Using index" in the Extra column, the statement uses a covering index.

Conclusion:

Do not query with SELECT * FROM. Write out the required fields and add the corresponding indexes to improve query performance.

Test results for the operators above:

1. With SELECT * and a WHERE on the primary key, all of them can use the index [except LIKE] (query via the primary key); with an ordinary index, not all of them can.

2. With SELECT field_a FROM tab WHERE field_a <operator> ..., the queries can still use the index [covering index].
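A sketch of test 2 as a covering-index query, assuming a table `tab` with an index `idx_name` on `name`:

```sql
-- Only the indexed column is selected, so the query can be answered
-- from the index alone, even with a leading-wildcard LIKE.
EXPLAIN SELECT name FROM tab WHERE name LIKE '%abc';
-- Expect Extra to contain "Using index" (covering index); selecting *
-- with the same predicate would instead force a full-table scan.
```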

Other index optimization methods:

1. Use indexed columns as join conditions.

2. Use composite indexes.

3. Index merge: for OR conditions, consider combining the fields involved into a composite index.

4. Index the fields used in WHERE and GROUP BY.
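The leftmost-prefix behavior behind point 2 can be sketched like this; the table and index names are hypothetical:

```sql
-- Composite index on (a, b, c): usable prefixes are (a), (a, b), (a, b, c)
ALTER TABLE tab2 ADD INDEX idx_abc (a, b, c);

EXPLAIN SELECT * FROM tab2 WHERE a = 1 AND b = 2;  -- can use idx_abc
EXPLAIN SELECT * FROM tab2 WHERE b = 2 AND c = 3;  -- cannot use idx_abc:
                                                   -- leftmost column a is missing
```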

6. Subquery Optimization

A non-correlated subquery in the FROM clause can be pulled up into the parent query; in a multi-table join, the optimizer then chooses the join order by cost.

For other subqueries, the optimizer may use nested execution: the subquery is executed once for each row of the parent query, so it runs many times. This execution method is very inefficient.

Advantages of converting a subquery into a join:

1. The subquery does not need to be executed many times.

2. The optimizer can choose among different join methods and join orders based on its statistics.

3. The subquery's join condition becomes a filter condition of the parent query, improving efficiency.

Optimization:

Merge subqueries: if multiple subqueries exist, merge them as much as possible.

Expand subqueries, that is, pull them up into multi-table joins (equivalence must be preserved at every step).

Note:

Only simple subqueries can be expanded. A subquery containing aggregate functions, GROUP BY, or DISTINCT cannot be pulled up.

SELECT * FROM t1, (SELECT * FROM tab WHERE id > 10) AS t2 WHERE t1.age > 10 AND t2.age < 25;

SELECT * FROM t1, tab AS t2 WHERE t1.age > 10 AND t2.age < 25 AND t2.id > 10;

Procedure:

1. Merge the FROM clauses and adjust the corresponding table references.

2. Merge the WHERE clauses, connecting them with AND.

3. Rewrite the corresponding predicates (for example, IN becomes =).
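Applying the three steps to an IN subquery gives a rewrite like the following sketch (hypothetical tables; the forms are equivalent only when t2.id is unique, otherwise duplicates must be removed):

```sql
-- Nested form: the subquery may be evaluated once per outer row
SELECT * FROM t1 WHERE t1.id IN (SELECT id FROM t2 WHERE t2.age < 25);

-- Pulled-up join form: FROM clauses merged, conditions ANDed, IN turned into =
SELECT t1.* FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.age < 25;
```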

7. Equivalent Predicate Rewriting:

1. Rewrite BETWEEN ... AND as >= and <=. Measured on 100,000 rows: 1.45 s before the rewrite, 0.06 s after.

2. Rewrite IN as multiple OR conditions. When the field is indexed, both forms can use the index, and OR is slightly more efficient than IN.

3. Rewrite name LIKE 'abc%' as name >= 'abc' AND name < 'abd'.

Note: in a million-row test, LIKE was faster than the range form while name had no index; after indexing the field, the range form was slightly faster, with little difference, because both forms use the index.

....
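The three rewrites above, sketched side by side on a hypothetical table:

```sql
-- 1. BETWEEN rewritten as explicit range predicates
SELECT * FROM tab WHERE age BETWEEN 18 AND 30;
SELECT * FROM tab WHERE age >= 18 AND age <= 30;

-- 2. IN rewritten as multiple OR conditions
SELECT * FROM tab WHERE name IN ('aa', 'bb');
SELECT * FROM tab WHERE name = 'aa' OR name = 'bb';

-- 3. Prefix LIKE rewritten as a closed range
SELECT * FROM tab WHERE name LIKE 'abc%';
SELECT * FROM tab WHERE name >= 'abc' AND name < 'abd';
```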

8. Condition Simplification and Optimization

1. Merge WHERE, HAVING (when no GROUP BY or aggregate function is present), and JOIN ... ON conditions as much as possible.

2. Remove unnecessary parentheses to reduce the OR/AND tree depth during parsing and cut CPU consumption.

3. Constant propagation: a = b AND b = 2 becomes a = 2 AND b = 2. Avoid variable comparisons such as a = b or a = @var.

4. Eliminate useless SQL conditions.

5. Avoid expression calculation on the right side of the WHERE equals sign, and never apply expressions or functions to the field itself in the WHERE clause.

6. Constant transformation and inequality transformation. For example, on millions of rows, rewriting a > b AND b > 10 as a > b AND a > 10 AND b > 10 is a significant optimization.
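Point 5 can be illustrated with a sketch; the orders table and created_at column are hypothetical:

```sql
-- Bad: the function wraps the column, so an index on created_at is not used
SELECT * FROM orders WHERE DATE(created_at) = '2024-01-01';

-- Better: leave the column bare and move the computation to the constants
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-01-02';
```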

9. Outer Join Optimization

Convert outer joins into inner joins.

Advantages:

1. Processing an outer join takes more optimizer and execution effort than an inner join.

2. After the outer join is eliminated, the optimizer has more choices for the multi-table join order.

3. The table with the strictest filter condition can be made the outer table (the first table in the join order, i.e. the outermost loop of the nested-loop join),

which reduces unnecessary I/O overhead and accelerates execution.

Regarding ON a.id = b.id versus WHERE a.id = b.id: ON conditions are applied while joining the tables, whereas WHERE conditions filter the joined rows afterwards.

Note: the precondition is that the WHERE clause rejects the NULL-padded rows (that is, its conditions exclude rows that are NULL on the inner side), which makes the outer join equivalent to an inner join.

Optimization principles:

Streamline the query: eliminate joins by equivalent transformation and remove redundant table references.

For example, if the primary/unique key is used as the join condition and the intermediate table's columns are used only in equality conditions, the join to the intermediate table can be removed.
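The outer-to-inner conversion can be sketched as follows (hypothetical tables): because b.age < 25 is never true for NULL-padded rows, both statements return the same result:

```sql
-- The WHERE condition rejects rows where b's columns are NULL ...
SELECT * FROM a LEFT JOIN b ON a.id = b.id WHERE b.age < 25;

-- ... so the outer join can be rewritten as an inner join, giving the
-- optimizer freedom to choose the join order
SELECT * FROM a JOIN b ON a.id = b.id WHERE b.age < 25;
```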

10. Other Query Optimizations

1. The following cause the optimizer to abandon index lookups and fall back to a full-table scan.

1.1 Using the != or <> operator in the WHERE clause. Note: the index can still be used on a primary key, but not on non-primary-key columns.

1.2 Avoid OR.

According to tests, OR prevents index use in most cases, though the index is used in some. Analyze the specific situation.

A typical optimization:

SELECT * FROM tab WHERE name = 'aa' OR name = 'bb';

=>

SELECT * FROM tab WHERE name = 'aa'

UNION ALL

SELECT * FROM tab WHERE name = 'bb';

Tests:

1. On 100,000 rows with no index, the OR form was about twice as fast as the UNION ALL form.

2. On 300,000 rows with the field indexed, the UNION ALL form was slightly faster than OR.

1.3 Avoid NOT IN.

NOT IN generally cannot use an index; on a primary key field it can.

1.4 Avoid NULL judgments (IS NULL / IS NOT NULL) in the WHERE clause.

1.5 LIKE cannot start with a wildcard, e.g. LIKE '%.com'.

Solutions:

1. If a leading % is required and the data is short (e.g. a URL), store the string reversed in the database and query the reversed column: LIKE REVERSE('%.com').

2. Use a covering index.

 

1.6 When using an indexed field as a condition with a composite index, the condition must include the leftmost column of the index.

2. Replace EXISTS with IN:

SELECT num FROM a WHERE num IN (SELECT num FROM b);

SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num);

On 1 million rows, the measured times were 5.9417 s versus 6.65 s, with no other optimization; only EXISTS was replaced with IN.

3. If a field is defined as a string, omitting the quotation marks in the query prevents index use and causes a full-table scan.

[The following items are excerpted from this blog: http://www.cnblogs.com/lingiu/p/3414134.html — they have not been verified by testing here.]

4. Try to use table variables instead of temporary tables.

5. Avoid frequent creation and deletion of temporary tables to reduce the consumption of system table resources.

6. If temporary tables are used, explicitly delete them at the end of the stored procedure: TRUNCATE TABLE first, then DROP TABLE, to avoid long locks on system tables.

7. Avoid cursors whenever possible, because cursor efficiency is poor. If a cursor operation touches more than 10,000 rows of data, consider rewriting it.

8. For very large data volumes, consider whether the corresponding requirement itself is reasonable.

9. Avoid large transaction operations as much as possible to improve the system concurrency capability.

.....
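The reversed-column workaround for leading-wildcard LIKE (item 1.5 above) can be sketched as follows; storing a reversed copy of the URL in an extra column is an assumption of this sketch, not something MySQL does automatically:

```sql
-- Keep a reversed, indexed copy of the url column
ALTER TABLE sites ADD COLUMN url_rev VARCHAR(255);
UPDATE sites SET url_rev = REVERSE(url);
ALTER TABLE sites ADD INDEX idx_url_rev (url_rev);

-- LIKE '%.com' becomes a prefix match ('moc.%') on the reversed column,
-- which can use idx_url_rev as an index range scan
SELECT * FROM sites WHERE url_rev LIKE REVERSE('%.com');
```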

11. Blog Summary

After several days of reading and testing, I have learned that MySQL query optimization is not achieved by mechanically applying a rule or formula; experiment is the only real test. My conclusion is that query optimization has general directions, but there is no universal optimization formula — every SQL statement differs in syntax, result requirements, and field environment. Real skill takes careful study of SQL query optimization and years of experience; I am still a beginner.

My advice: do not just read about SQL query optimization. Think about the code, and test against your actual fields and data volumes.

 

The above is my summary, and there are surely shortcomings. After all, I am still a beginner and not heading in the DBA direction. If you find any errors or omissions, please point them out.

 

Author: Yi ye Fengfeng

Disclaimer: when reprinting, please include a clear link to the original article on the article page.
