MySQL index and query optimization summary

Source: Internet
Author: User
Tags mysql query mysql index

A simple comparison test

In this case, the C2c_zwdb.t_file_count table has only an auto-increment id; its ffilename field is not indexed. The execution plan of the SQL is as follows:

In the figure, type=ALL, key=NULL, rows=33777. The SQL does not use an index and performs a very inefficient full table scan. Combined with join queries and other conditions, the database consumes a large amount of memory, which affects the execution of the front-end program.

At this point, add an index to the ffilename field:

ALTER TABLE C2c_zwdb.t_file_count ADD INDEX index_title (ffilename);

Execute the above query statement again; the contrast is obvious:

In the figure, type=ref, key=index_title, rows=1. The SQL uses the index index_title and performs a constant-level lookup that scans only one row through the index.

Compared with the unindexed case, the improvement in query efficiency after adding the index is obvious.
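
The before-and-after check can be reproduced with EXPLAIN. This is only a minimal sketch: the query itself is not shown in the text, so the SELECT and the file name 'example.txt' are assumptions for illustration.

```sql
-- Before the index exists: expect type=ALL and key=NULL (full table scan).
EXPLAIN SELECT * FROM C2c_zwdb.t_file_count
WHERE ffilename = 'example.txt';  -- hypothetical value

-- Add the index on ffilename, as in the text.
ALTER TABLE C2c_zwdb.t_file_count ADD INDEX index_title (ffilename);

-- After the index exists: expect type=ref and key=index_title.
EXPLAIN SELECT * FROM C2c_zwdb.t_file_count
WHERE ffilename = 'example.txt';
```

The columns to compare in the two plans are type, key, and rows.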

MySQL Index

The comparison test above shows that indexes are the key to fast lookups. Well-designed indexes are essential for MySQL to run efficiently. With a small amount of data, the lack of a suitable index has little impact, but as the amount of data grows, performance drops sharply. If an index covers multiple columns (a composite index), the order of the columns matters, because MySQL can only perform efficient lookups on a leftmost prefix of the index.

Here are a few common types of MySQL indexes.

Indexes are divided into single-column indexes and composite indexes. A single-column index contains only one column; a table can have several single-column indexes, but together they do not form a composite index. A composite index is a single index that contains multiple columns.

1. MySQL index types

(1) Primary key index (PRIMARY KEY)

It is a special unique index and is not allowed to have null values. The primary key index is typically created at the same time as the table.

Of course, you can also use the ALTER command. Remember: A table can have only one primary key.
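
Both ways of defining a primary key might look like the following sketch. The table and column names (t_user, t_user_log, and so on) are hypothetical.

```sql
-- Hypothetical table: the primary key is declared when the table is created.
CREATE TABLE t_user (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(64)  NOT NULL,
    PRIMARY KEY (id)
);

-- Hypothetical table created without a primary key...
CREATE TABLE t_user_log (
    log_id  BIGINT NOT NULL,
    message VARCHAR(255)
);

-- ...which is then added with the ALTER command.
ALTER TABLE t_user_log ADD PRIMARY KEY (log_id);
```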

(2) Unique index (UNIQUE)

The values in a unique index column must be unique, but NULL values are allowed. For a composite index, the combination of column values must be unique. You can specify it when creating the table or add it later by altering the table structure, for example:

ALTER TABLE table_name ADD UNIQUE ( column )

(3) Normal index (INDEX)

This is the most basic index and has no restrictions. You can specify it when creating the table or add it later by altering the table structure, for example:

ALTER TABLE table_name ADD INDEX index_name ( column )

(4) Composite index

A composite index is an index that contains multiple columns. You can specify it when creating the table or add it later by altering the table structure, for example:

ALTER TABLE table_name ADD INDEX index_name ( column1 , column2 , column3 )
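
The ALTER statements above have equivalents at table creation time. The following sketch declares a unique, a normal, and a composite index in one CREATE TABLE; the table t_student and all of its columns and index names are hypothetical.

```sql
-- Hypothetical table declaring several index types at creation time.
CREATE TABLE t_student (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    student_no VARCHAR(20)  NOT NULL,
    class_id   INT UNSIGNED NOT NULL,
    name       VARCHAR(64)  NOT NULL,
    age        TINYINT UNSIGNED,
    sex        CHAR(1),
    PRIMARY KEY (id),
    UNIQUE KEY uk_student_no (student_no),   -- unique index
    KEY idx_class_id (class_id),             -- normal index
    KEY idx_name_age_sex (name, age, sex)    -- composite index
);

-- List the indexes that now exist on the table.
SHOW INDEX FROM t_student;
```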

(5) Full-text index (FULLTEXT)

Full-text indexing (also known as full-text search) is a key technology used by search engines today. It applies techniques such as word segmentation to analyze the frequency and importance of keywords in text, and then filters out the desired search results according to its ranking rules.

You can specify it when creating the table or add it later by altering the table structure, for example:

ALTER TABLE table_name ADD FULLTEXT ( column )
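
A full-text index is queried with MATCH ... AGAINST rather than LIKE. The sketch below assumes a hypothetical table t_article and a hypothetical search phrase; InnoDB supports FULLTEXT indexes from MySQL 5.6 onward.

```sql
-- Hypothetical table with a full-text index on the body column.
CREATE TABLE t_article (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(200) NOT NULL,
    body  TEXT,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_body (body)
) ENGINE = InnoDB;

-- Search the indexed column for the given words.
SELECT id, title
FROM t_article
WHERE MATCH (body) AGAINST ('index optimization' IN NATURAL LANGUAGE MODE);
```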

2. Index structure and principle

MySQL commonly uses B+ trees for its indexes. The implementation differs between clustered and nonclustered indexes, but that is not discussed in this article.

B+ tree introduction

The B+ tree picture below can be seen in many places; it is used here as well because it explains the index lookup process very well.

The figure shows a B+ tree. The light blue blocks are called disk blocks. Each disk block contains several data items (shown in dark blue) and pointers (shown in yellow). For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3: P1 points to the disk block with values less than 17, P2 to the disk block with values between 17 and 35, and P3 to the disk block with values greater than 35.

Real data exists only in the leaf nodes, i.e. 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data, only data items that guide the direction of the search; for example, 17 and 35 do not actually exist in the data table.

Lookup process

To find data item 29 in the figure, disk block 1 is first loaded from disk into memory, which is the first I/O. A binary search in memory determines that 29 lies between 17 and 35, which selects the P2 pointer of disk block 1. The in-memory time is very short compared with disk I/O and can be ignored. Disk block 3 is then loaded into memory through the disk address in the P2 pointer of disk block 1, which is the second I/O. Since 29 lies between 26 and 30, the P2 pointer of disk block 3 is selected, and disk block 8 is loaded into memory through that pointer, which is the third I/O. A binary search in memory then finds 29 and the query ends, after a total of three I/Os. In practice, a 3-level B+ tree can represent millions of rows. Finding a row among millions with only three I/Os is a huge performance gain; without an index, each data item could cost one I/O, for a total of millions of I/Os, which is obviously very expensive.

Properties

(1) The index field should be as small as possible.

As the B+ tree lookup process above shows, and because the actual data is stored in the leaf nodes, the number of I/Os depends on the height of the B+ tree.

Assuming the table holds N rows and each disk block holds m data items, the tree height is h = log_(m+1) N. When the amount of data N is fixed, the larger m is, the smaller h is.

m = size of the disk block / size of a data item. The size of a disk block is the size of one data page and is fixed, so the less space each data item occupies, the larger m is and the lower the tree. This is why each data item, i.e. the index field, should be as small as possible: for example, an int takes 4 bytes, half of the 8 bytes a bigint takes.
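
As a worked example of the height formula, assume (purely for illustration) N = 1,000,000 rows and m = 100 data items per disk block, and round the result up to whole levels:

```latex
h = \left\lceil \log_{m+1} N \right\rceil
  = \left\lceil \frac{\ln 10^{6}}{\ln 101} \right\rceil
  = \left\lceil \frac{13.816}{4.615} \right\rceil
  = \lceil 2.99 \rceil
  = 3
```

If the data items were twice as large, so that m dropped to 50, the same calculation gives ln(10^6) / ln(51) = 13.816 / 3.932 = 3.51, i.e. 4 levels and one more I/O per lookup.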

(2) The leftmost matching property of the index.

When the data items of a B+ tree are composite, such as (name, age, sex), the B+ tree builds the search tree by comparing fields from left to right. For example, when retrieving (Zhang San, 20, F), the B+ tree first compares name to determine the next search direction; if name is the same, it compares age and then sex, and finally reaches the data. But when the search key is (20, F), with no name, the B+ tree does not know which node to check next, because name is the first comparison factor when the tree is built; it must search by name first to know where to go. When retrieving (Zhang San, F), the B+ tree can use name to determine the search direction, but the next field, age, is missing, so it can only find all the data whose name is Zhang San and then filter for rows whose sex is F. This is a very important property: the leftmost matching property of the index.

Several principles of index building

(1) Leftmost prefix matching principle

For a multi-column index, matching always starts from the first field of the index and proceeds to the right; no field in the middle can be skipped. For example, with a multi-column index (name, age, sex), MySQL matches the name field first, then age, then sex. MySQL keeps matching to the right until it encounters a range condition (>, <, BETWEEN, LIKE), after which the remaining fields are no longer used for matching.

Generally, when you create a multi-column index, put the column used most frequently in WHERE clauses on the far left.
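
The rule can be tried out on a small table. This is a sketch under assumptions: the table t_person, its extra column city, and the literal values are hypothetical, and the plan the optimizer actually picks depends on the MySQL version and the data.

```sql
-- Hypothetical table with a composite index on (name, age, sex).
CREATE TABLE t_person (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(64) NOT NULL,
    age  TINYINT UNSIGNED NOT NULL,
    sex  CHAR(1) NOT NULL,
    city VARCHAR(64),
    PRIMARY KEY (id),
    KEY idx_name_age_sex (name, age, sex)
);

-- All three index columns are matched, starting from the leftmost.
EXPLAIN SELECT * FROM t_person
WHERE name = 'Zhang San' AND age = 20 AND sex = 'F';

-- Only name is used for the lookup; age is skipped, so sex is filtered afterwards.
EXPLAIN SELECT * FROM t_person
WHERE name = 'Zhang San' AND sex = 'F';

-- The leftmost column name is missing, so the index cannot guide the lookup.
EXPLAIN SELECT * FROM t_person
WHERE age = 20 AND sex = 'F';

-- name and age are matched; the range on age stops the matching before sex.
EXPLAIN SELECT * FROM t_person
WHERE name = 'Zhang San' AND age > 18 AND sex = 'F';
```

In the EXPLAIN output, the key_len column indicates how many bytes, and therefore how many columns, of the index were used.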

The following example contrasts a query that violates the leftmost prefix matching principle with one that conforms to it.

Example: table C2c_db.t_credit_detail has the index (flistid, fbank_listid)

SQL statement that does not conform to the leftmost prefix matching principle:

SELECT * FROM t_credit_detail WHERE fbank_listid = '201108010000199'\G

The SQL directly uses the second index field, fbank_listid, and skips the first index field, flistid, which does not conform to the leftmost prefix matching principle. Use the EXPLAIN command to view the execution plan of the SQL statement, as follows:

As you can see, the SQL does not use an index and performs an inefficient full table scan.

SQL statement that conforms to the leftmost prefix matching principle:

SELECT * FROM t_credit_detail WHERE flistid = '2000000608201108010831508721' AND fbank_listid = '201108010000199'\G

The SQL first uses the first field of the index, flistid, and then the second field, fbank_listid, without skipping anything in between, which conforms to the leftmost prefix matching principle. Use the EXPLAIN command to view the execution plan of the SQL statement, as follows:

As you can see, the SQL uses the index and scans only one row.

By contrast, the SQL statement that conforms to the leftmost prefix matching principle is significantly more efficient than the one that does not: it goes from a full table scan to a constant-level lookup.

(2) Try to choose columns with high selectivity as index columns.

For example, we would choose a number column with nearly unique values (such as a student number) for the index, not a column with only a few distinct values (such as gender).
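
One common way to quantify how well a column distinguishes rows is the ratio of distinct values to total rows. The query below is a sketch; the table t_student and the columns student_no and sex are hypothetical names.

```sql
-- Ratio of distinct values to total rows, per column.
-- A value close to 1 means almost every row is distinct (a good index candidate);
-- a value close to 0 means many rows share the same value.
SELECT
    COUNT(DISTINCT student_no) / COUNT(*) AS student_no_selectivity,
    COUNT(DISTINCT sex)        / COUNT(*) AS sex_selectivity
FROM t_student;
```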

(3) = and IN conditions can be written in any order

For example, for a = 1 AND b = 2 AND c = 3, the conditions can be written in any order with an (a, b, c) index; the MySQL query optimizer rearranges them into a form the index can recognize.

(4) Index columns must not take part in calculations; keep the columns "clean"

For example: flistid + 1 > '2000000608201108010831508721'. The reason is simple: the index stores the original column values, so if the index column takes part in a calculation, every retrieval must first compute the expression on the indexed values and only then compare, and the cost is obviously too high.
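
A typical fix is to move the calculation to the constant side of the comparison so the column stays bare. The sketch below uses a hypothetical table t_order with an indexed created_at column; the dates are illustrative.

```sql
-- Hypothetical table with an index on created_at.
CREATE TABLE t_order (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    amount     DECIMAL(10, 2) NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY idx_created_at (created_at)
);

-- The function wraps the indexed column, so idx_created_at cannot guide the lookup.
EXPLAIN SELECT id, amount FROM t_order
WHERE DATE(created_at) = '2024-01-15';

-- Equivalent range on the bare column: idx_created_at can be used.
EXPLAIN SELECT id, amount FROM t_order
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-16';
```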

(5) Extend existing indexes where possible instead of creating new ones.

For example, if the table already has an index on a and you now need an index on (a, b), you only need to modify the original index.
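
In MySQL, modifying an index means dropping it and adding the extended one, which can be done in a single statement. The table name t and the index names idx_a and idx_a_b below are hypothetical.

```sql
-- Assumes a hypothetical table t that already has an index idx_a on column a.
-- Replace it with (a, b) instead of keeping two overlapping indexes.
ALTER TABLE t
    DROP INDEX idx_a,
    ADD INDEX idx_a_b (a, b);
```

Queries that filter only on a can still use idx_a_b, because a is its leftmost prefix.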

Drawbacks of indexes

Although indexes can improve query efficiency, they have their own shortcomings.

Additional overhead of indexes:

(1) Space: an index occupies storage space;

(2) Time: looking up the index takes time;

(3) Maintenance: an index must be maintained whenever data changes.

Cases where indexes are not recommended:

(1) Tables with a small amount of data

(2) When storage space is tight

Summary of common optimizations

There are many ways to optimize statements and many points to watch. For the usual situations, they can be summed up as follows:

1. Cases where an index exists but is not used (not recommended)

(1) When the LIKE pattern starts with a wildcard character

Try to avoid LIKE patterns that start with a wildcard; otherwise the database engine gives up the index and performs a full table scan.

An SQL statement whose pattern starts with a wildcard, for example: SELECT * FROM t_credit_detail WHERE flistid LIKE '%0'\G

This is a full table scan that does not use the index, and is not recommended.

An SQL statement whose pattern does not start with a wildcard, for example: SELECT * FROM t_credit_detail WHERE flistid LIKE '2%'\G

Obviously, this uses the index and is a range lookup, which is much more efficient than the statement whose pattern starts with a wildcard.

(2) When the WHERE condition does not conform to the leftmost prefix principle

An example was given in the section on the leftmost prefix matching principle.

(3) Using the != or <> operator

Try to avoid the != or <> operator; otherwise the database engine may give up the index and perform a full table scan. Using > or < is more efficient.

SELECT * FROM t_credit_detail WHERE flistid != '2000000608201108010831508721'\G

(4) Index column takes part in a calculation

Try to avoid applying expressions to fields in the WHERE clause, which causes the engine to give up the index and perform a full table scan.

SELECT * FROM t_credit_detail WHERE flistid + 1 > '2000000608201108010831508722'\G

(5) NULL checks on a field

Try to avoid NULL checks on a field in the WHERE clause, which can cause the engine to give up the index and perform a full table scan, for example:
Inefficient: SELECT * FROM t_credit_detail WHERE flistid IS NULL;

You can set a default value of 0 on flistid, make sure the flistid column contains no NULL values, and then query:
Efficient: SELECT * FROM t_credit_detail WHERE flistid = 0;

(6) Using OR to connect conditions

Try to avoid using OR to connect conditions in the WHERE clause, which can cause the engine to give up the index and perform a full table scan, for example:
Inefficient: SELECT * FROM t_credit_detail WHERE flistid = '2000000608201108010831508721' OR flistid = '10000200001';

You can use the following query instead of the OR query above:
Efficient: SELECT * FROM t_credit_detail WHERE flistid = '2000000608201108010831508721' UNION ALL SELECT * FROM t_credit_detail WHERE flistid = '10000200001';

2. Avoid SELECT *

During parsing, '*' is expanded into all the column names by querying the data dictionary, which takes extra time.

So, develop the good habit of selecting only the columns you need.
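
Applied to the t_credit_detail table used earlier, this might look like the sketch below; the choice of columns is an assumption for illustration.

```sql
-- Name only the columns that are needed instead of using *.
-- Both columns here are part of the (flistid, fbank_listid) index.
EXPLAIN SELECT flistid, fbank_listid
FROM t_credit_detail
WHERE flistid = '2000000608201108010831508721';
```

When every selected column is contained in the index being used, the Extra column of EXPLAIN can show "Using index", meaning MySQL answered the query from the index without reading the table rows.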

3. ORDER BY statement optimization

Any non-indexed item in the ORDER BY statement, or a computed expression, will slow down the query.

Methods:

(1) Rewrite the ORDER BY statement to use an index;

(2) Create another index on the columns used;

(3) Absolutely avoid expressions in the ORDER BY clause.
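
The ORDER BY methods listed here can be sketched as follows. The table t_payment, its columns, and the value 42 are hypothetical.

```sql
-- Hypothetical table with a composite index that matches both the filter and the sort.
CREATE TABLE t_payment (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id    INT UNSIGNED NOT NULL,
    amount     DECIMAL(10, 2) NOT NULL,
    created_at DATETIME NOT NULL,
    PRIMARY KEY (id),
    KEY idx_user_created (user_id, created_at)
);

-- Rows for one user_id are already ordered by created_at inside the index.
EXPLAIN SELECT id, amount, created_at FROM t_payment
WHERE user_id = 42
ORDER BY created_at;

-- Sorting by an expression cannot use the index order.
EXPLAIN SELECT id, amount, created_at FROM t_payment
WHERE user_id = 42
ORDER BY amount * 2;
```

When MySQL has to sort the rows itself, the Extra column of EXPLAIN typically shows "Using filesort".
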

4. GROUP BY statement optimization

Increase the efficiency of the GROUP BY statement by filtering out unwanted records before the GROUP BY.

Inefficient:

SELECT JOB, AVG(SAL)
FROM EMP
GROUP BY JOB
HAVING JOB = 'PRESIDENT'
OR JOB = 'MANAGER'

Efficient:

SELECT JOB, AVG(SAL)
FROM EMP
WHERE JOB = 'PRESIDENT'
OR JOB = 'MANAGER'
GROUP BY JOB

5. Replace IN with EXISTS

In many cases, replacing IN with EXISTS is a good choice:
SELECT num FROM a WHERE num IN (SELECT num FROM b)
Replace with the following statement:
SELECT num FROM a WHERE EXISTS (SELECT 1 FROM b WHERE b.num = a.num)

6. Use VARCHAR/NVARCHAR instead of CHAR/NCHAR

Use VARCHAR/NVARCHAR instead of CHAR/NCHAR wherever possible. First, variable-length fields take less storage space; second, queries search more efficiently within a relatively small field.

7. Use DISTINCT instead of GROUP BY where possible

SELECT OrderID FROM Details WHERE UnitPrice > 10 GROUP BY OrderID

Can be changed to:

SELECT DISTINCT OrderID FROM Details WHERE UnitPrice > 10

8. Use UNION ALL instead of UNION where possible

UNION ALL does not perform the SELECT DISTINCT step that UNION does, which saves a lot of unnecessary resources.
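
The difference between the two is easy to see with constant rows, as in this minimal sketch:

```sql
-- UNION removes duplicate rows: the result has one row.
SELECT 1 AS n UNION SELECT 1;

-- UNION ALL keeps every row: the result has two rows.
SELECT 1 AS n UNION ALL SELECT 1;
```

Only use UNION ALL when duplicates are acceptable or cannot occur, since the two forms can return different results.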

9. Use columns of the same type for JOINs and index them

If your application has many JOIN queries, make sure the fields used for the JOIN are indexed in both tables. This way, MySQL internally activates its mechanism for optimizing JOIN statements.

Also, the fields used for the JOIN should be of the same type. For example, if you JOIN a DECIMAL field with an INT field, MySQL cannot use their indexes. For string types, the character sets must also be the same (the character sets of the two tables may differ).
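
The following sketch shows two tables whose JOIN columns share the same type and character set, plus a query to check this for existing tables. The tables t_customer and t_invoice and all of their columns are hypothetical.

```sql
-- Hypothetical tables: customer_no has the same type and character set in both.
CREATE TABLE t_customer (
    customer_no VARCHAR(32) NOT NULL,
    name        VARCHAR(64) NOT NULL,
    PRIMARY KEY (customer_no)
) DEFAULT CHARSET = utf8mb4;

CREATE TABLE t_invoice (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_no VARCHAR(32) NOT NULL,
    total       DECIMAL(10, 2) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_customer_no (customer_no)
) DEFAULT CHARSET = utf8mb4;

-- Compare the type and character set of the two JOIN columns.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('t_customer', 't_invoice')
  AND COLUMN_NAME = 'customer_no';

-- The JOIN itself, on the indexed columns.
SELECT c.name, i.total
FROM t_customer AS c
JOIN t_invoice  AS i ON i.customer_no = c.customer_no;
```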

Reprint: https://www.qcloud.com/community/article/382852?fromSource=gwzcw.107784.107784.107784
