A Preliminary Look at MySQL Prefix Indexes and Index Selectivity

Source: Internet
Author: User

Sometimes you need to index long character columns, such as BLOB, TEXT, or long VARCHAR columns. Indexing the whole value makes the index large and slows queries down. In such cases you can use a prefix index, which indexes only the first few characters of each value. This saves a great deal of index space and so improves index efficiency, but it also reduces the index's selectivity.

Index selectivity is the ratio of the number of distinct indexed values (the cardinality) to the total number of rows in the table (#T). It ranges from 1/#T to 1.
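The definition can be sketched in a few lines of Python (used here instead of SQL so the example runs without a MySQL server). The function name `selectivity` and the sample values are hypothetical and are not taken from the sakila database.

```python
def selectivity(values):
    """Return distinct values / total rows, like COUNT(DISTINCT col) / COUNT(*)."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    return len(set(values)) / len(values)


# Hypothetical sample column (illustration only).
cities = ["London", "London", "Paris", "Santiago", "San Jose", "San Juan"]

full = selectivity(cities)             # 5 distinct values over 6 rows
unique = selectivity(["a", "b", "c"])  # every value distinct: upper bound of 1
constant = selectivity(["x"] * 4)      # one distinct value: lower bound of 1/#T
```

The two extra calls show the bounds of the range: a column with all-distinct values reaches 1, and a column holding a single repeated value falls to 1/#T.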

The higher an index's selectivity, the more efficient the query, because a more selective index lets MySQL filter out more rows when it looks for matches. A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance. Example:

-- Create the demo table
CREATE TABLE sakila.city_demo (
    city VARCHAR(50) NOT NULL
);

-- Copy the city names from sakila.city
INSERT INTO sakila.city_demo (city) SELECT city FROM sakila.city;

-- Assign each row a random city name taken from the city table
UPDATE sakila.city_demo SET city = (SELECT city FROM sakila.city ORDER BY RAND() LIMIT 1);

Now we have a test data set. Let's find the most frequently occurring cities in the city_demo table:

SELECT
    COUNT(*) AS cnt,
    city
FROM
    sakila.city_demo
GROUP BY
    city
ORDER BY
    cnt DESC
LIMIT 10;

The result set is as follows:

Those are the 10 most frequently occurring cities. Next, let's find the most frequently occurring city prefixes, starting with three-letter prefixes:

SELECT
    COUNT(*) AS cnt,
    LEFT(city, 3) AS pref
FROM
    sakila.city_demo
GROUP BY
    pref
ORDER BY
    cnt DESC
LIMIT 10;

The result set is as follows:

These results show that each prefix occurs more often than any single city, so there are far fewer unique prefixes than unique cities. The idea is to keep increasing the prefix length until the prefix's selectivity is close to that of the full column. In this case, a length of 7 turns out to be the most suitable.

Another way to find a suitable prefix length is to compute the selectivity of the full column and then choose a prefix whose selectivity is close to it. The following query computes the full column's selectivity:

SELECT
    COUNT(DISTINCT city) / COUNT(*)
FROM
    sakila.city_demo;

Result set:

Generally, a prefix is good enough if its selectivity comes close to that of the full column, which is 0.031 in this example. You can also evaluate several prefix lengths in a single query, which is very useful for large tables:

SELECT
    COUNT(DISTINCT LEFT(city, 3)) / COUNT(*) AS sel3,
    COUNT(DISTINCT LEFT(city, 4)) / COUNT(*) AS sel4,
    COUNT(DISTINCT LEFT(city, 5)) / COUNT(*) AS sel5,
    COUNT(DISTINCT LEFT(city, 6)) / COUNT(*) AS sel6,
    COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS sel7
FROM
    sakila.city_demo;

The result set is as follows:

The query shows that once the prefix length reaches 7, increasing it further improves selectivity only slightly. Because a longer prefix makes the index larger and queries slower, a length of 7 is a reasonable choice.
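The selection rule can be sketched in Python as follows. This is a minimal illustration, not the article's method verbatim: the helper names, the relative `tolerance` threshold, and the sample city list are all assumptions made for the example.

```python
def prefix_selectivity(values, length):
    """Selectivity of the first `length` characters of each value."""
    return len({v[:length] for v in values}) / len(values)


def shortest_good_prefix(values, tolerance=0.05):
    """Smallest prefix length whose selectivity is within `tolerance`
    (relative) of the full column's selectivity."""
    full = len(set(values)) / len(values)
    max_length = max(len(v) for v in values)
    for length in range(1, max_length + 1):
        if prefix_selectivity(values, length) >= full * (1 - tolerance):
            return length
    return max_length


# Hypothetical sample data (not the sakila data set).
cities = ["San Jose", "San Juan", "Santiago", "Santa Fe",
          "London", "London", "Londrina", "Paris"]

for n in (3, 4, 5, 6):
    print(n, prefix_selectivity(cities, n))
print("chosen length:", shortest_good_prefix(cities))
```

Loosening `tolerance` trades selectivity for a shorter, smaller index; on real data the threshold is a judgment call rather than a fixed rule.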

Looking at average selectivity alone is not enough, however, because there are exceptions. Judging by the averages above, you might think a prefix length of 4 or 5 is sufficient, but unevenly distributed data can be a trap. To check, count how often the most common prefixes of length 4 occur and compare the result with the counts for the most common full city names.

With a 4-character prefix, the most common prefixes occur more often than the most common cities. In other words, the selectivity of these values is lower than the average selectivity.
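A rough Python sketch of this worst-case check is shown below. The helper names and the sample data are hypothetical; the idea is only to compare the count of the most frequent prefix with the count of the most frequent full value.

```python
from collections import Counter


def most_common_count(values, length=None):
    """Occurrences of the most frequent value, or of the most frequent
    prefix when `length` is given."""
    keys = values if length is None else [v[:length] for v in values]
    return Counter(keys).most_common(1)[0][1]


def worst_case_ratio(values, length):
    """How many times more rows the worst prefix matches than the worst full value."""
    return most_common_count(values, length) / most_common_count(values)


# Hypothetical sample data (not the sakila data set).
cities = ["San Jose", "San Juan", "Santiago", "Santa Fe",
          "London", "London", "Londrina", "Paris"]

ratio4 = worst_case_ratio(cities, 4)  # "Lond" matches 3 rows, "London" only 2
ratio6 = worst_case_ratio(cities, 6)  # worst prefix is no worse than worst city
```

A ratio well above 1 for a candidate length suggests that lookups on the most common prefixes will scan many more rows than the average selectivity implies.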

The following describes how to create a prefix index:

ALTER TABLE sakila.city_demo ADD KEY (city(7));

Advantages:

Prefix indexes make the index smaller and faster.

Disadvantages:

MySQL cannot use prefix indexes for ORDER BY or GROUP BY operations, and it cannot use them as covering indexes.
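A toy Python model hints at why. It is only an analogy under simplifying assumptions (Python's code-point ordering rather than a MySQL collation, and a plain list instead of a B-tree), and the sample values are made up: entries ordered by a 7-character prefix are not fully ordered, and the stored prefixes cannot reproduce the original values.

```python
PREFIX = 7  # same length as the city(7) index in the article

# Hypothetical sample values that share a long common prefix.
rows = ["Santiago de Cuba", "Santiago", "Santa Fe", "Santiago de Chile"]

# A prefix index can order entries only by their first PREFIX characters,
# so values sharing a prefix stay in arbitrary (here: insertion) order.
by_prefix = sorted(rows, key=lambda v: v[:PREFIX])

# ORDER BY city needs the order of the complete values.
by_full = sorted(rows)

# A covering index must return the column value from the index alone,
# but the truncated entries no longer equal the full values.
index_entries = [v[:PREFIX] for v in rows]
covers_column = index_entries == rows
```

Here `by_prefix` and `by_full` differ and `covers_column` is false, which matches the two limitations listed above.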

Common application scenarios:

Prefix indexes are well suited to columns that store long hexadecimal unique identifiers.
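The following Python sketch illustrates why such columns suit prefix indexes. The identifiers are generated here with MD5 purely as a stand-in for long hexadecimal IDs, and the prefix lengths 8 and 2 are assumptions chosen for the demonstration, not recommendations from the article.

```python
import hashlib


def prefix_selectivity(values, length):
    """Selectivity of the first `length` characters of each value."""
    return len({v[:length] for v in values}) / len(values)


# 10,000 hypothetical 32-character hexadecimal identifiers.
ids = [hashlib.md5(str(i).encode()).hexdigest() for i in range(10_000)]

sel8 = prefix_selectivity(ids, 8)  # a short prefix is already nearly unique
sel2 = prefix_selectivity(ids, 2)  # too short: at most 256 distinct prefixes
print(f"sel8={sel8:.4f} sel2={sel2:.4f}")
```

Because hash-like identifiers are spread evenly, a short prefix keeps almost all of the selectivity while indexing only a quarter of each 32-character value.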
