Analysis of a mysql slow query accident

Source: Internet
Author: User

Analysis of a mysql slow query accident
Years ago, the project team received a public account. After the launch, the SQL statements used to query sessions in the cid column are dozens of times slower! I have been thinking about this problem for a long time and it has been a knot in my mind since its appearance. The cid column is indexed. It is okay to update a common cid column. Why is there a problem? How does the same prefix affect the index?
Analysis Process1. query the cid under explain. The cid will useMid-qqwanggou001 inserts data with a prefix

Explain
Select *
From analysis_sessions
Where cid = "mid-qqwanggou001-b99359d9054171901c0"

The analysis results are as follows:

From the explain analysis, we can see that this query uses an index, but innodb considers that there are 1.65 million rows of data to be filtered by the mysql server (that is, the where condition is used for filtering ). If these huge data is in the memory, it will not take much time to traverse it. However, it is very likely that the data is stored on the disk. When so much data is read from the disk and then loaded into the memory, a large amount of disk IO will inevitably be very time-consuming.
2. Analyze common cid queries

Explain, cid = "sid-a2f9047ddf528d837e5f60843c83aae9" using data ". This data does not contain a public prefix.

Explain
Select *
From analysis_sessions
Where cid = "sid-a2f9047ddf528d837e5f60843c83aae9"

 

The analysis results are as follows:

The storage engine returns only one row of data to the mysql server for the same column and the same index. That is to say, innodb only needs to read the leaf node of a secondary index. The I/O workload is much lower than that of the preceding SQL statement.
Preliminary analysis conclusion: For cid queries with long prefixes, the innodb Storage engine returns millions of data to the mysql upper-end server. This is just a phenomenon. I still want to ask, the same table, the same column, the same index structure (B + tree index), the same query, only different data, why are there big differences in results?
Last-step analysis
I had been entangled in this problem for a long time. When I walked the night before, I accidentally thought of the key_len column of the explain result. I never looked at this column and thought it was useless, but 27 is out of line with the definition of 50 varchar In the cid column. 27 is obviously less than 50. First of all, you can be sure that this index uses a prefix index. To put it bluntly, it intercepts the first part of the string as the index data. The gbk encoding used in the analysis_session table. That is to say, the index requires two bytes to represent a varchar. Key_len
27 = 2*12 + 2 + 1
The 27-bit index only contains the first 12 characters. 2 In the middle stores the length, and the next byte stores Null information, because this column allows Null.
The final conclusion: the problem has been quite clear. The cid prefix is 17 characters and is 12 characters larger than the prefix index. That is to say, All the headers that store cid data (million-level), B + tree leaf nodes, and only one B + tree non-leaf node, point to this. Therefore, when you query cid-related data, all CIDS will be returned to the mysql server for where filtering, which is terrible in terms of efficiency. The index is still used, otherwise it will cause full table scan. However, there is a problem with this data design. The search efficiency of the B + tree is O (LogN), but the data is immediately changed to O (N), which is equivalent to a local full table scan.
Reasonable speculation: as long as there is a new cid, the cid query will only become slower!
Extended, better code practice:
During indexing of varchar, blob, text, and other side-length data, the database automatically creates a prefix index, so the B + tree does not index the entire field. Many people like to use the prefix as a string sign. This time, you should pay attention to it. After the prefix is saved to mysql, the retrieval efficiency is reduced. The longer the prefix, the lower the efficiency of B + tree queries.
The following code suggestions are provided:
1. Use the prefix as the suffix and change startWith to endWith.

2. Do not try suffix fuzzy search, like "%. com". This approach is even worse, and the index cannot be used at all, so full table scan is performed.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.