Analysis of a mysql slow query accident

Last Update:2016-03-29 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Analysis of a mysql slow query accident
Years ago, the project team received a public account. After the launch, the SQL statements used to query sessions in the cid column are dozens of times slower! I have been thinking about this problem for a long time and it has been a knot in my mind since its appearance. The cid column is indexed. It is okay to update a common cid column. Why is there a problem? How does the same prefix affect the index?
Analysis Process1. query the cid under explain. The cid will useMid-qqwanggou001 inserts data with a prefix

Explain
Select *

From analysis_sessions

Where cid = "mid-qqwanggou001-b99359d9054171901c0"

The analysis results are as follows:

From the explain analysis, we can see that this query uses an index, but innodb considers that there are 1.65 million rows of data to be filtered by the mysql server (that is, the where condition is used for filtering ). If these huge data is in the memory, it will not take much time to traverse it. However, it is very likely that the data is stored on the disk. When so much data is read from the disk and then loaded into the memory, a large amount of disk IO will inevitably be very time-consuming.

2. Analyze common cid queries

Explain, cid = "sid-a2f9047ddf528d837e5f60843c83aae9" using data ". This data does not contain a public prefix.

Explain

Select *

From analysis_sessions

Where cid = "sid-a2f9047ddf528d837e5f60843c83aae9"

The analysis results are as follows:

The storage engine returns only one row of data to the mysql server for the same column and the same index. That is to say, innodb only needs to read the leaf node of a secondary index. The I/O workload is much lower than that of the preceding SQL statement.

Preliminary analysis conclusion: For cid queries with long prefixes, the innodb Storage engine returns millions of data to the mysql upper-end server. This is just a phenomenon. I still want to ask, the same table, the same column, the same index structure (B + tree index), the same query, only different data, why are there big differences in results?

Last-step analysis

I had been entangled in this problem for a long time. When I walked the night before, I accidentally thought of the key_len column of the explain result. I never looked at this column and thought it was useless, but 27 is out of line with the definition of 50 varchar In the cid column. 27 is obviously less than 50. First of all, you can be sure that this index uses a prefix index. To put it bluntly, it intercepts the first part of the string as the index data. The gbk encoding used in the analysis_session table. That is to say, the index requires two bytes to represent a varchar. Key_len

27 = 2*12 + 2 + 1

The 27-bit index only contains the first 12 characters. 2 In the middle stores the length, and the next byte stores Null information, because this column allows Null.

The final conclusion: the problem has been quite clear. The cid prefix is 17 characters and is 12 characters larger than the prefix index. That is to say, All the headers that store cid data (million-level), B + tree leaf nodes, and only one B + tree non-leaf node, point to this. Therefore, when you query cid-related data, all CIDS will be returned to the mysql server for where filtering, which is terrible in terms of efficiency. The index is still used, otherwise it will cause full table scan. However, there is a problem with this data design. The search efficiency of the B + tree is O (LogN), but the data is immediately changed to O (N), which is equivalent to a local full table scan.

Reasonable speculation: as long as there is a new cid, the cid query will only become slower!

Extended, better code practice:

During indexing of varchar, blob, text, and other side-length data, the database automatically creates a prefix index, so the B + tree does not index the entire field. Many people like to use the prefix as a string sign. This time, you should pay attention to it. After the prefix is saved to mysql, the retrieval efficiency is reduced. The longer the prefix, the lower the efficiency of B + tree queries.

The following code suggestions are provided:

1. Use the prefix as the suffix and change startWith to endWith.

2. Do not try suffix fuzzy search, like "%. com". This approach is even worse, and the index cannot be used at all, so full table scan is performed.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Analysis of a mysql slow query accident

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Analysis of a mysql slow query accident

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support