Schema optimization and indexing

Last Update:2017-02-27 Source: Internet

Author: User


Sometimes you need to index columns that hold very long character strings. Indexing the full value makes the index large and slow. One strategy is to simulate a hash index, as described in the previous part. Sometimes that approach is not good enough. What can you do then?

You can often save space and still get good performance by indexing only the first few characters of the column instead of the whole value. This makes the index smaller, but it also reduces selectivity. Index selectivity is the ratio of the number of distinct indexed values to the total number of rows in the table (#T), and it ranges from 1/#T to 1. Higher selectivity is better, because it lets MySQL filter out more rows when it looks for matches. A unique index has a selectivity of 1, which is as good as it gets.
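As a minimal sketch of the definition above, selectivity can be measured directly with a query. The table name `t` and column name `col` are hypothetical placeholders, not names from this article.

```sql
-- Hypothetical table `t` with a NOT NULL column `col` (illustrative names).
-- Selectivity = distinct indexed values / total rows, so it lies
-- between 1/#T (every row holds the same value) and 1 (all values unique).
SELECT COUNT(DISTINCT col) / COUNT(*) AS selectivity
FROM t;
```

Note that COUNT(DISTINCT col) ignores NULL values, so the sketch assumes the column does not contain any.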

A prefix of the column is often selective enough to give good performance. If you index a BLOB or TEXT column, or a very long VARCHAR column, you must define a prefix index, because MySQL does not allow you to index the full length of such columns.
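The sketch below shows what a prefix index on a TEXT column might look like. The table, the column names, and the prefix length of 20 are all assumptions chosen for illustration only.

```sql
-- Hypothetical table; names and the prefix length of 20 are illustrative.
CREATE TABLE article_demo (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body TEXT NOT NULL,
    -- KEY (body) without a length is rejected for BLOB/TEXT columns,
    -- so the key part names a prefix length in parentheses.
    KEY idx_body_prefix (body(20))
);
```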

The trick is to choose a prefix that is long enough to give good selectivity but short enough to save space. The prefix should be long enough that the prefix index is nearly as useful as an index on the full column.

To find a good prefix length, find the most frequent values and compare that list to a list of the most frequent prefixes. Look at the following example:

CREATE TABLE sakila.city_demo(city VARCHAR(50) NOT NULL);
INSERT INTO sakila.city_demo(city) SELECT city FROM sakila.city;
-- Repeat the next statement five times:
INSERT INTO sakila.city_demo(city) SELECT city FROM sakila.city_demo;
-- Now randomize the distribution (inefficiently but conveniently):
UPDATE sakila.city_demo
   SET city = (SELECT city FROM sakila.city ORDER BY RAND() LIMIT 1);

Now we have a sample dataset. The results are not realistically distributed, and because we used RAND() your results will vary, but that does not matter for this exercise. First, we find the most frequently occurring cities:

mysql> SELECT COUNT(*) AS cnt, city
    -> FROM sakila.city_demo GROUP BY city ORDER BY cnt DESC LIMIT 10;

Each of these cities occurs roughly 45 to 65 times. Next we find the most frequently occurring city name prefixes, starting with three-letter prefixes:
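The statement that produced the three-letter prefix counts is not shown here. Assuming it mirrors the seven-character query used later in this article, with only the prefix length changed, it would look roughly like this:

```sql
-- Assumed form of the missing query: same shape as the LEFT(city, 7)
-- query in this article, with the prefix length set to 3.
SELECT COUNT(*) AS cnt, LEFT(city, 3) AS pref
FROM sakila.city_demo
GROUP BY pref
ORDER BY cnt DESC
LIMIT 10;
```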

+-----+------+
| cnt | pref |
+-----+------+
| 483 | San  |
| 195 | Cha  |
| 177 | Tan  |
| 167 | Sou  |
| 163 | al-  |
| 163 | Sal  |
| 146 | Shi  |
| 136 | Hal  |
| 130 | Val  |
| 129 | Bat  |
+-----+------+

Each prefix occurs far more often than any full city name, so there are far fewer unique prefixes than unique full-length city names. The idea is to increase the prefix length until the prefix is nearly as selective as the full column. A little experimentation shows that a prefix length of 7 works well:

mysql> SELECT COUNT(*) AS cnt, LEFT(city, 7) AS pref
    -> FROM sakila.city_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;
+-----+---------+
| cnt | pref    |
+-----+---------+
|  70 | Santiag |
|  68 | San Fel |
|  65 | London  |
|  61 | Valle d |
|  49 | Hiroshi |
|  48 | Teboksa |
|  48 | Pak Kre |
|  48 | Yaound  |
|  47 | Tel Avi |
|  47 | Shimoga |
+-----+---------+
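Another way to compare prefix lengths, sketched here as one possible approach rather than the article's own method, is to compute the selectivity of the full column next to the selectivity of several prefixes, and then index the chosen prefix. The alias names (sel_full, sel3, sel5, sel7) are illustrative, and the figures will differ on every run because the data was randomized with RAND().

```sql
-- Compare full-column selectivity with the selectivity of several prefixes.
-- A prefix length is a reasonable candidate once its value approaches sel_full.
SELECT COUNT(DISTINCT city) / COUNT(*)          AS sel_full,
       COUNT(DISTINCT LEFT(city, 3)) / COUNT(*) AS sel3,
       COUNT(DISTINCT LEFT(city, 5)) / COUNT(*) AS sel5,
       COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS sel7
FROM sakila.city_demo;

-- Index only the first 7 characters of city, the length chosen above.
ALTER TABLE sakila.city_demo ADD KEY (city(7));
```

An average figure like this can hide a skewed distribution, so it is worth reading it together with the list of most frequent prefixes shown above.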

This article is an English version of an article originally published in Chinese on aliyun.com.
