Using examples to understand the concept of indexing is a simple way to do it. So we look at an example of an index.
If we were going to make an online dating site. The user's data will have many columns, such as country, state/region, city, sex, age, eye color, and so on. This site must support a combination of these attributes to query user data. It is also necessary to enable users to sort and limit results through the user's recent online time, other member ratings, and so on. How can we design an index for such a complex requirement?
Oddly enough, the first thing to decide is whether we must use index sorting or whether a file sort is acceptable. Indexed sorting limits how indexes and statement creation are made. For example, the case where we cannot use an index is where the age of the condition is between 18-25, regardless of whether the statement uses an index to sort.
If MySQL uses an indexed range query, it cannot be sorted using another index. If this is a more common query, we know that a lot of queries are used to sort files.
Supports multiple types of filtering
Now we need to know the number of unique values for the column and the columns that often appear after the Where condition. It is highly selective to create indexes on columns with many unique values. Because MySQL will efficiently filter out unwanted values.
Country is not sure if it is selective, but it may appear in most query statements. The sex column is not selective, but it basically appears in each query statement. Based on the above ideas, we can use a prefix (sex,country) to create an index of a series of different combinations.
In traditional ways, adding indexes to a field with low selectivity has little effect. But why do we put a low selectivity column at the beginning of each index? The wrong idea?
We have two reasons to use this. The first reason is that every query will use sex. We can even design to allow users to search through only one sex. Importantly, adding this column will not have any drawbacks, as we still have some tricks in it.
The trick is that even if a query does not have some constraints on sex, we can also guarantee the use of the index through and sex in (' m ', ' f '). Doing so does not filter any rows. So functionally, as with statements that do not contain sex queries. However, we need to include this column because doing so will allow MySQL to use an index with a larger prefix. This technique is more appropriate for this requirement, but if the value is too different, doing so will not work well because the in () list is too large.
This example reveals a general guideline: Consider the entire table comprehensively. When you design an index, don't just think about the type of index you need for the existing statement, but also consider the optimization statement. If you need an index, but you think that some of the statements may be affected by bad, ask yourself if you can change these statements. You should optimize the statements and indexes to find a compromise solution; you don't have to design a perfect index schema.
The next thing we want to consider is the combination of other where conditions and consider which combinations will slow down without an index. Indexing on (sex,country,age) is an obvious choice, and we also need to create indexes on (sex,country,region,age) and (Sex,country,region,city,age).
This will have a lot of indexes. If we want to reuse the index and not generate too many index combinations. We can use the technique of in () and discard (Sex,country,age) and (sex,country,region,age) index. If they are not in the search form, you can specify a list of all countries or all regions of the country to ensure that the constraints are equal to the index prefix.
These indexes can meet the needs of most common queries, but how do we design some of the other less common search terms? Like Has_pictures, Eye_color, hair_color and education? If these columns are low selectivity and are not used frequently, we can not index them and let MySQL scan some extra rows. We can also do this by adding them to the age and using the In () method we used before.
Now you might find that when we create an index, we always put the age column at the end. What's the difference with this column, and why put it in the end? We want to make MySQL as much as possible with all the columns on the index, because only the leftmost prefix can be used until the first condition is found to be a range query. The other columns we have mentioned can be used in the same condition in the where, but the age condition is a range (for example, between 18-25).
We can also switch to the in () list. such as age in (18, 19, 20, 21, 22, 23, 24, 25), but this method is not suitable for this type of query. The general guideline we illustrate is to place the columns of the scope criteria at the end of the index, so the optimizer uses the index whenever possible.
We've already said that you can add more columns to the index and use the in () list to overwrite those columns that are not part of the Where condition, but you may do too much to cause the problem to occur. With too many values in in, the combination of these lists becomes huge, and the optimizer evaluates this, and it slows down the query. Consider the following statement:
WHERE eye_color IN('brown','blue','hazel')
AND hair_color IN('black','red','blonde','brown')
AND sex IN('M','F')
The optimizer converts these into 3*4*2 combinations, and the where condition must be validated for each. 24 combinations are not very extreme numbers. But be careful if that number becomes thousands. The old MySQL version has a lot of problems when it comes to handling a lot of in combinations: The query optimizer uses a lot of beforehand and consumes a lot of memory. If the combination becomes too large, the new version of MySQL will stop evaluating these combinations, but will still limit MySQL's use of indexes.