Sometimes you need to index columns that hold very long character strings, which makes your indexes large and slow. One strategy is to simulate a hash index, as the previous section described. But sometimes that approach is not good enough. What can you do?
You can often save space and still get good performance by indexing only the first few characters of the column instead of the whole value. This makes the index smaller, but it also makes it less selective. Index selectivity is the ratio of the number of distinct indexed values to the total number of rows in the table (#T), and it ranges from 1/#T to 1. A highly selective index is good because it lets MySQL filter out more rows when it looks for matches. A unique index has a selectivity of 1, which is as good as it gets.
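As a rough illustration of that definition, selectivity can be measured directly with a query. This sketch assumes the sakila sample database is installed and measures the city column of its city table; the alias selectivity is just an illustrative name.

```sql
-- Selectivity = number of distinct indexed values / number of rows (#T).
-- A result close to 1 means almost every row has its own value.
SELECT COUNT(DISTINCT city) / COUNT(*) AS selectivity
FROM sakila.city;
```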
A prefix of the column is often selective enough to give good performance. If you are indexing BLOB or TEXT columns, or very long VARCHAR columns, you must define prefix indexes, because MySQL does not allow you to index their full length.
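A minimal sketch of what such a definition looks like follows. The table prefix_demo, its columns, and the index name are hypothetical, and the prefix length of 32 is an arbitrary value chosen only for illustration. It assumes a default database has already been selected.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE prefix_demo (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body TEXT NOT NULL,
  -- Index only the first 32 characters of body (32 is an arbitrary choice).
  KEY idx_body_prefix (body(32))
);

-- Leaving out the length on a TEXT or BLOB column is rejected by MySQL:
-- ALTER TABLE prefix_demo ADD KEY idx_body_full (body);
```

For non-binary string columns such as TEXT and VARCHAR, the number in parentheses is a length in characters.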
The trick is to choose a prefix that is long enough to give good selectivity, but short enough to save space. The prefix should be long enough to make the index nearly as useful as it would be if you had indexed the whole column.
To determine a good prefix length, find the most frequent values and compare that list to a list of the most frequent prefixes. The following statements build a test table from the city table in the sakila sample database:
CREATE TABLE sakila.city_demo(city VARCHAR(50) NOT NULL);
INSERT INTO sakila.city_demo(city) SELECT city FROM sakila.city;
-- Repeat the next statement five times:
INSERT INTO sakila.city_demo(city) SELECT city FROM sakila.city_demo;
-- Now randomize the distribution (inefficiently but conveniently):
UPDATE sakila.city_demo
SET city = (SELECT city FROM sakila.city ORDER BY RAND( ) LIMIT 1);
Now we have a sample dataset. The values are not realistically distributed, and because we used RAND() your results will vary, but that does not matter for this exercise. First, we find the most frequently occurring cities:
mysql> SELECT COUNT(*) AS cnt, city
-> FROM sakila.city_demo GROUP BY city ORDER BY cnt DESC LIMIT 10;
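In this run, each of the most frequent cities occurred between 45 and 65 times. Next, we find the most frequently occurring city name prefixes, beginning with three-letter prefixes:
mysql> SELECT COUNT(*) AS cnt, LEFT(city, 3) AS pref
-> FROM sakila.city_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;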
+-----+------+
| cnt | pref |
+-----+------+
| 483 | San  |
| 195 | Cha  |
| 177 | Tan  |
| 167 | Sou  |
| 163 | al-  |
| 163 | Sal  |
| 146 | Shi  |
| 136 | Hal  |
| 130 | Val  |
| 129 | Bat  |
+-----+------+
Each prefix occurs many more times than any full city name, so there are far fewer unique prefixes than unique full-length city names. The idea is to increase the prefix length until the prefix becomes nearly as selective as the full length of the column. A little experimentation shows that a prefix length of 7 works well for this data:
mysql> SELECT COUNT(*) AS cnt, LEFT(city, 7) AS pref
-> FROM sakila.city_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;
+-----+---------+
| cnt | pref    |
+-----+---------+
|  70 | Santiag |
|  68 | San Fel |
|  65 | London  |
|  61 | Valle d |
|  49 | Hiroshi |
|  48 | Teboksa |
|  48 | Pak Kre |
|  48 | Yaound  |
|  47 | Tel Avi |
|  47 | Shimoga |
+-----+---------+
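One way to check a prefix length numerically, rather than by comparing the top-ten lists by eye, is to compute the selectivity of several prefix lengths next to that of the full column. The following is only a sketch: the aliases sel_full and sel3 through sel7 are illustrative names, and because the table was filled with RAND() the numbers will differ from run to run.

```sql
-- Compare the selectivity of several prefix lengths with the full column.
SELECT COUNT(DISTINCT city) / COUNT(*)          AS sel_full,
       COUNT(DISTINCT LEFT(city, 3)) / COUNT(*) AS sel3,
       COUNT(DISTINCT LEFT(city, 4)) / COUNT(*) AS sel4,
       COUNT(DISTINCT LEFT(city, 5)) / COUNT(*) AS sel5,
       COUNT(DISTINCT LEFT(city, 6)) / COUNT(*) AS sel6,
       COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS sel7
FROM sakila.city_demo;

-- Once a length looks good enough, create the prefix index on that many characters.
ALTER TABLE sakila.city_demo ADD KEY (city(7));
```

If a shorter prefix already comes close to sel_full, it is usually the better choice, since it keeps the index smaller.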