When I read an article about database indexing today, the article mentions the prefix index, which is really a new word for people who have been working on database applications for so many years. So I intend to study it.
The prefix index seems to be a concept in MySQL that is not presented in SQL Server and Oracle. So I installed a MySQL to do the experiment and figure out the prefix index.
The prefix index, in other words, is to index the first few characters of the text (specifically, when several characters are indexed) so that the index is smaller, so the query is faster. A bit equivalent to using the left function for fields in Oracle, the function index is established, except that the MySQL prefix index is internally automatically matched at query time and does not need to use the left function.
Other articles mention:
The MySQL prefix index can effectively reduce the size of the index file and increase the speed of the index. But the prefix index also has its drawbacks: MySQL cannot use prefix indexes in ORDER by or GROUP by, nor can they be used as an overlay index (covering index).
The syntax for establishing a prefix index is:
ALTERTABLEtable_nameADDKEY(column_name(Prefix_length));
The key parameter here is prefix_length, which needs to get the appropriate index selectivity (indexselectivity) based on the contents of the actual table. Index selectivity is the ratio of the number of repetitions to the total number.
Select1.0*Count(distinctCOLUMN_NAME)/Count(*)
fromtable_name
For example, we now have an employee table with a FirstName field, which is varchar (50), and we query the index selectivity for that field:
Select1.0*Count(distinctFirstName)/Count(*)
fromEmployee
Get the result 0.7500, and then we want to build a prefix index on FirstName, and we want the selectivity of the prefix index to be as close as possible to the selectivity of indexing the entire field. Let's take a look at 3 characters, how to:
Select1.0*Count(distinct Left(FirstName,3))/Count(*)
fromEmployee
The result is 0.58784, as if the gap is a bit big, let's try another 4 characters:
Select1.0*Count(distinct Left(FirstName,4))/Count(*)
fromEmployee
Get 0.68919, has been promoted a lot, and then try 5 characters, the result is 0.72297, this result and 0.75 is very close, so we think the prefix length 5 is a suitable value. So we can build a prefix index for FirstName:
AlterTableTest. EmployeeAddKey(FirstName (5))
The query statement does not need to change after the prefix index is established, and if we want to query all the employee FirstName for Devin, then SQL still writes:
Select*
fromEmployee E
whereE.firstname='Devin';
Here's a summary of what happens when you use the prefix index:
- String columns (Varchar,char,text, etc.) that require a full field match or a previous match. That is, ' xxx ' or like ' xxx% '
- The string itself may be longer, and the first few characters start to be different. For example, we have a Chinese name prefix index does not make any sense, because Chinese names are very short, and the recipient address using prefix index is not very practical, because on the one hand, the address is generally the beginning of xx province, that is, the first few characters are similar, and the pickup address is generally like '% xxx% ', will not use a previous match. On the contrary, the names of foreigners can be indexed by prefix because their characters are longer and the first few characters are more selective. The same email is also a field that can be indexed using a prefix.
- The index selectivity of the first half of the characters is already close to the index selectivity of the full field. If the length of the whole field is 20, the index selectivity is 0.9, and we prefix the first 10 characters with a selectivity of only 0.5, then we need to continue to increase the length of the prefix character, but this time the advantage of the prefix index is not obvious, there is not much need to build a prefix index.
Prefix index, a solution for optimizing index size