The common prefix of strings affects the backtracking analysis of MySQL B + Tree query

Source: Internet
Author: User

    The project group received the public number years ago.

After the

is online, the SQL associated with the query session with the CID column is dozens of times times slower! Thinking about this question for a long time. It has been a knot in my heart since it appeared. CID This column is indexed, the normal CID column update is no problem, why only have problems? How does the same prefix affect the index?
    parsing procedure     1.explain for CID queries. The CID will be mid-qqwanggou001 insert data as prefix Span style= "font-family:"; font-size:19px; Line-height:27px ">

Explain select *from Analysis_sessionswhere cid = "mid-qqwanggou001-b99359d9054171901c0"


Analysis results such as the following:

Watermark/2/text/ahr0cdovl2jsb2cuy3nkbi5uzxqv/font/5a6l5l2t/fontsize/400/fill/i0jbqkfcma==/dissolve/70/gravity /center "width=" 867 "height="/>

From the explain analysis can be seen. The query uses an index, but innodb that there are 1.65 million rows of data that need to be filtered for mysqlserver (that is, by the Where condition).

Assuming that these huge amounts of data are in memory, traversing through it won't take much time. But most likely, the data is on disk. So much data reads from the disk and then loads the memory. A lot of disk IO must be time-consuming.

Compared to the electronic movement of memory. The physical movement of the disk manipulator is several orders of magnitude slower.


2. Analysis of General CID query

Data for explain. cid = "sid-a2f9047ddf528d837e5f60843c83aae9". This data is not prefixed with a public prefix.

   
Explain select *from Analysis_sessionswhere cid = "Sid-a2f9047ddf528d837e5f60843c83aae9"

Analysis results such as the following:


The same column, the same index. This time the storage engine returned only one row of data to MySQLServer. This means that InnoDB only needs to read a leaf node with a level two index.

Compared to the above SQL Io, the pressure is obviously much smaller.


Preliminary Analysis Conclusion: CID query with long prefix. The InnoDB storage engine returns millions data to the top of MySQL server.

This is just a phenomenon, I still want to ask, the same table, the same column, the same index structure (b + Tree index). The same query, just different data. Why is there a difference between the results?


one step closer analysis
This problem has been tangled for a long time, until the night before the walk. Inadvertently will think of the explain results of key_len this column. I never look at this column and feel useless. But 27 is out of tune with the definition of CID, 50 varchar. 27 is significantly less than 50, first of all to be sure, this cable refers to the prefix index, in plain, truncated the previous part of the string as the index data. GBK encoding for analysis_session tables. In other words, the index requires 2 bytes to represent a varchar. Explain Key_len
.
2 * + 2 + 1
A
27-bit index that indexes only the first 12 characters . the middle of the 2 storage length. The following byte stores the null information, since this column is null-approved.

Finally, the question is clear: The CID prefix is 17 characters, which is greater than the 12 characters of the prefix index, that is to say. Store all CID data (millions) B + Tree leaf nodes will only have a B + tree non-leaf node pointer pointing here . So. When you look up CID-related data, all CID will be returned to MySQLServer for where filtering, which, in efficiency, is very scary.

The index is really still being used. Otherwise, it will cause a full table scan. But there is a problem with this data design. B + Tree Search efficiency is O (LOGN), but in the event of this data, immediately become O (N), equivalent to a partial full table scan.

so reasonable to guess. A query that only has new cid,cid will only become slower.

Extended, better code practice:
Varchar,blob, text when the Long data is indexed. The database actively builds the prefix index itself, so the B + tree does not index the entire part of the field. A lot of classmates like to use the prefix as a string of flags, this time to pay attention to, have a precedent. After the prefix is saved to MySQL, the retrieval efficiency is reduced and the prefix is longer. The lower the efficiency of the B + tree query.
Here is a suggestion for the code:
1. Prefix as suffix, startwith change to Endwith

2. Do not try to suffix fuzzy search, like "%.com", this way is worse, completely unable to index, so the full table scan.




The common prefix of strings affects the backtracking analysis of MySQL B + Tree query

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.