MySQL Index Case Study

Source: Internet
Author: User
Tags: mysql, index

The best way to understand indexes is to work through an example, so here is an indexing case study.

Suppose we need to design an online dating site. The user information table has many columns, including country, region, city, gender, eye color, and so on. The site must support searching for users by various combinations of these attributes. It must also sort and limit the results by criteria such as the user's last online time and the ratings given by other members. How do we design indexes to meet such complex requirements?

Surprisingly, the first thing to decide is whether to sort using an index or to retrieve the data first and sort it afterwards. Sorting with an index severely restricts how indexes and queries can be designed. For example, if we want to use an index to sort users by the ratings other members have given them, then a WHERE condition such as age BETWEEN 18 AND 25 cannot use an index. If MySQL uses an index for a range condition, it cannot also use another index (or the later columns of the same index) for sorting. If this is a very common WHERE condition, we should assume that many queries will need a separate sort operation (a filesort).
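The conflict between a range filter and an index-based sort can be observed with EXPLAIN. The following is a minimal sketch: the table definition, the column types, and the index names idx_age and idx_rating are assumptions made for illustration and are not part of the original case.

```sql
-- Hypothetical table for illustration; names and types are assumptions.
CREATE TABLE profiles (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  sex     CHAR(1) NOT NULL,
  country VARCHAR(32) NOT NULL,
  age     TINYINT UNSIGNED NOT NULL,
  rating  SMALLINT UNSIGNED NOT NULL,
  KEY idx_age (age),
  KEY idx_rating (rating)
);

-- A range condition on age combined with a sort on rating.
-- Whichever index MySQL picks, one of the two jobs is left undone:
--   * with idx_age, the range is resolved by the index, but the rows
--     still have to be sorted (typically "Using filesort" in Extra);
--   * with idx_rating, the rows come back in order, but the age
--     condition has to be checked row by row.
EXPLAIN
SELECT id
FROM profiles
WHERE age BETWEEN 18 AND 25
ORDER BY rating;
```

The plan actually chosen depends on the MySQL version and on table statistics, so the EXPLAIN output should be checked against real data.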

Support multiple filter conditions

Now we need to look at which columns have many distinct values and which columns appear most often in WHERE clauses. An index on a column with many distinct values is more selective. This is generally a good thing, because it lets MySQL filter out unwanted rows more efficiently.

The country column is usually not very selective, but many queries will probably use it. The sex column is certainly not selective, but it is also used in many queries. Taking this frequency of use into account, we recommend creating indexes for the different column combinations with (sex,country) as the prefix.

But according to traditional wisdom, isn't it pointless to index a column with low selectivity? So why would we put a column with such low selectivity at the front of the index? Have we lost our minds?

Of course not. There are two reasons for doing this. First, as mentioned earlier, almost every query will use the sex column. We might even design the site so that users can search for only one sex at a time. More importantly, adding this column to the index has little downside, because even a query that does not filter on sex can still use the index with the following trick.

The trick is this: if a query does not restrict sex, we can add AND sex IN ('f', 'm') to the WHERE clause so that MySQL can still use the index. This condition does not filter out any rows, so the query returns the same results as it would without it. However, the condition must be present for MySQL to match the leftmost prefix of the index. This trick is very effective in this kind of scenario, but if the column has too many distinct values, the IN() list becomes too long and the trick no longer works well.
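As a rough sketch of this trick, assume a hypothetical profiles table with id, sex, country, and age columns; the index name and the literal values below are placeholders chosen for illustration.

```sql
-- Assumes a hypothetical profiles table with id, sex, country, age columns.
ALTER TABLE profiles ADD INDEX idx_sex_country_age (sex, country, age);

-- The search form did not restrict sex. Listing every possible value
-- keeps sex in the WHERE clause, so the leftmost prefix of the index
-- (sex, country, age) is matched and all three columns can be used.
SELECT id
FROM profiles
WHERE sex IN ('f', 'm')
  AND country = 'US'
  AND age BETWEEN 18 AND 25;

-- Without the sex condition, the query skips the first index column
-- and cannot use idx_sex_country_age for the country and age lookups.
SELECT id
FROM profiles
WHERE country = 'US'
  AND age BETWEEN 18 AND 25;
```

Both queries return the same rows as long as 'f' and 'm' are the only values stored in sex; that is an assumption of this sketch.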

This case illustrates a general principle: keep all options on the table. When designing indexes, do not think only about which indexes the existing queries need; consider optimizing the queries as well. If you find that some queries need a new index, but that index would hurt the efficiency of other queries, ask yourself whether the original queries can be changed instead. Queries and indexes should be optimized together to find the best compromise, rather than designing the perfect index in isolation.

Next, we need to consider the other common WHERE combinations and work out which of them would be slow without a suitable index. An index on (sex,country,age) is an obvious choice, and we will probably also need the composite indexes (sex,country,region,age) and (sex,country,region,city,age).

This requires a lot of indexes. If we want to reuse indexes as much as possible instead of creating a large number of combinations, we can use the IN() trick mentioned earlier to avoid needing both the (sex,country,age) and (sex,country,region,age) indexes. When the search form does not specify one of these fields, we supply a list of all countries, or a list of all regions of the country, so that the index prefix still has equality constraints. (A combined list of all countries, all regions, and all sexes would be a very large condition.)

These indexes will satisfy most of the common queries, but how should we design indexes for less common search conditions, such as has_pictures, eye_color, hair_color, and education? If these columns are not very selective and are not used often, we can simply ignore them and let MySQL scan some extra rows. Alternatively, we can add these columns to the index in front of the age column and use the IN() technique mentioned earlier to handle queries that do not specify them.

As you may have noticed, we always put the age column at the end of the index. What is special about the age column, and why put it last? We want MySQL to use as many columns of the index as possible, and a query can use only the leftmost prefix of the index, up to and including the first column with a range condition. The columns mentioned so far can all use equality conditions in the WHERE clause, but the age column is almost always used in a range query (for example, people between 18 and 25 years old).

We could also use IN() instead of a range condition, for example by changing the age condition to IN(18, 19, 20, 21, 22, 23, 24, 25), but not every range query can be converted this way. The general principle is to place the column that needs a range condition at the end of the index as far as possible, so that the optimizer can use as many index columns as possible.
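The effect of keeping the range column last can be sketched as follows. The index name, the region value 'CA', and the id column are assumptions used only for illustration.

```sql
-- Assumes a hypothetical profiles table with id, sex, country, region, age.
ALTER TABLE profiles
  ADD INDEX idx_sex_country_region_age (sex, country, region, age);

-- Equality on the three leading columns, range on the last one:
-- every column of the index can take part in the lookup.
SELECT id
FROM profiles
WHERE sex = 'f'
  AND country = 'US'
  AND region = 'CA'
  AND age BETWEEN 18 AND 25;

-- The same age range written as a list of equalities. This form is
-- only practical when the range covers a small set of discrete values.
SELECT id
FROM profiles
WHERE sex = 'f'
  AND country = 'US'
  AND region = 'CA'
  AND age IN (18, 19, 20, 21, 22, 23, 24, 25);
```

If age were placed before region in the index, the range condition on age would stop the prefix match, and the region condition could no longer narrow the index lookup.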

As mentioned earlier, we can keep adding columns to the index and use IN() lists to cover the columns that a query's WHERE clause does not restrict. But this technique should not be overused, or it will cause trouble. Each additional IN() condition multiplies the number of combinations the optimizer has to evaluate, and this can eventually degrade query performance significantly.

Consider the following WHERE clause:

WHERE eye_color IN ('brown', 'blue', 'hazel')
AND hair_color IN ('black', 'red', 'blonde', 'brown')
AND sex IN ('f', 'm')

The optimizer will turn this into 3*4*2 = 24 combinations, and the execution plan has to check the WHERE clause against all 24 of them. For MySQL, 24 combinations is not excessive, but if the number reaches the thousands, you need to be very careful. Older versions of MySQL had many problems when the number of IN() combinations was too large: query optimization could take a long time and consume a lot of memory. Newer versions of MySQL stop evaluating plans once the number of combinations exceeds a certain limit, but this can prevent MySQL from using the index well.

Avoid multiple range conditions

Let's say we have a last_online column and want to show the users who have been online during the past week, using the following query:

WHERE eye_color IN ('brown', 'blue', 'hazel')
AND hair_color IN ('black', 'red', 'blonde', 'brown')
AND sex IN ('f', 'm')
AND last_online > DATE_SUB(NOW(), INTERVAL 7 DAY)
AND age BETWEEN 18 AND 25

There is a problem with this query: it has two range conditions, one on last_online and one on age. MySQL can use the index for the last_online condition or for the age condition, but not for both at the same time.

If the condition contains only last_online and not age, we might consider adding the last_online column to the end of the index. But what should we do if we cannot convert the age condition into an IN() list and the application really needs a range query on both last_online and age? Unfortunately, there is no direct way to solve this. However, we can convert one of the range conditions into a simple equality comparison. To do this, we add a precomputed active column that is maintained by a scheduled job. Each time a user logs in, the value is set to 1, and if a user has not logged in for 7 consecutive days, the value is set back to 0.

This method allows MySQL to use an index such as (active,sex,country,age). The active column is not completely accurate, but for this kind of query the accuracy requirements are not that high. If you need accurate data, you can keep the last_online condition in the WHERE clause without adding the column to the index. This is similar to the earlier technique of speeding up URL lookups by computing a hash value of the URL. The last_online condition will not use any index, but because it does not filter out many rows, adding the column to the index would not help much anyway. In other words, the lack of a suitable index has no significant impact on the query.
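One way the active column could be maintained is sketched below. The column type, the index name, the user id 42, and the assumption that last_online is a DATETIME column are all illustrative; how the scheduled job is actually run (cron, the MySQL event scheduler, or application code) is left open.

```sql
-- Assumes a hypothetical profiles table with id, sex, country, age,
-- and a DATETIME last_online column.
ALTER TABLE profiles
  ADD COLUMN active TINYINT UNSIGNED NOT NULL DEFAULT 0,
  ADD INDEX idx_active_sex_country_age (active, sex, country, age);

-- Run by the application whenever a user logs in (42 is a placeholder id).
UPDATE profiles
SET active = 1, last_online = NOW()
WHERE id = 42;

-- Run periodically by a scheduled job: mark users who have not
-- logged in for 7 days as inactive.
UPDATE profiles
SET active = 0
WHERE active = 1
  AND last_online < DATE_SUB(NOW(), INTERVAL 7 DAY);

-- The search now has a single range condition (age), placed last.
SELECT id
FROM profiles
WHERE active = 1
  AND sex IN ('f', 'm')
  AND country = 'US'
  AND age BETWEEN 18 AND 25;
```

Between two runs of the job the flag can be slightly stale, which is the accuracy trade-off described above.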

By now the pattern should be clear: if the user wants to see both active and inactive users, we can use an IN() list in the query. We have added a lot of these lists, but the alternative is to create a separate index for each different combination of columns. At a minimum, we would need the following indexes: (active,sex,country,age), (active,country,age), (sex,country,age), and (country,age). These indexes might be more efficient for each specific query, but given the cost of maintaining them and the extra space they require, this is not a good strategy overall.

In this case, the behavior of the optimizer is a very important factor in the indexing strategy. If a future version of MySQL implements a true loose index scan, it might be able to use multiple range conditions on a single index, and we would no longer need IN() lists for this type of query.

Optimize sorting

The last topic in this case study is sorting. A filesort is fast for small result sets, but what happens if a query matches millions of rows? For example, how do we sort if the WHERE clause restricts only the sex column?

For such low-selectivity cases, we can add special indexes just for sorting. For example, we can create a (sex,rating) index for the following query:

SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 10;

This query uses both ORDER BY and LIMIT, and it can be very slow without the index.
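A sketch of the sorting index follows. The index name and the selected columns are assumptions, and the profiles table is assumed to have id, sex, and rating columns.

```sql
-- Assumes a hypothetical profiles table with id, sex, rating columns.
ALTER TABLE profiles ADD INDEX idx_sex_rating (sex, rating);

-- Within the index, the entries for sex = 'M' are already ordered by
-- rating, so MySQL can read the first 10 entries and stop instead of
-- sorting every matching row. If the index is used this way, the
-- Extra column of EXPLAIN should not report "Using filesort".
EXPLAIN
SELECT id, rating
FROM profiles
WHERE sex = 'M'
ORDER BY rating
LIMIT 10;
```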

Even with the index, the query can be very slow if the user interface is paginated and the user requests a page far from the beginning.

The following query jumps to a late page by combining ORDER BY with LIMIT and a large offset:

SELECT <cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 100000, 10;

No matter how it is indexed, this query is a serious problem, because as the offset increases, MySQL spends most of its time scanning rows that it will then discard. Denormalization, precomputation, and caching are probably the only strategies that work for such queries. A better approach is to limit the number of pages a user can view. This rarely hurts the user experience, because users seldom care about the 10,000th page of search results.

Another good strategy for optimizing such queries is to use a deferred join: use a covering index to retrieve only the primary key columns of the required rows, then join back to the original table on those primary keys to fetch the remaining columns. This reduces the number of rows MySQL has to scan only to discard. The following query shows how to use the (sex,rating) index efficiently for sorting and paging:

SELECT <cols> FROM profiles INNER JOIN (SELECT <primary key cols> FROM profiles WHERE sex='M' ORDER BY rating LIMIT 10000, 10) AS x USING(<primary key cols>);
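A concrete version of the deferred join might look like the sketch below. It assumes a hypothetical profiles table whose primary key is a single id column, an InnoDB table with an index on (sex, rating), and an illustrative choice of output columns.

```sql
-- Assumes a hypothetical InnoDB profiles table with primary key id
-- and a secondary index on (sex, rating). InnoDB secondary indexes
-- store the primary key, so the subquery can be answered from the
-- index alone without touching the full rows.
SELECT p.id, p.sex, p.rating, p.country
FROM profiles AS p
INNER JOIN (
    SELECT id
    FROM profiles
    WHERE sex = 'M'
    ORDER BY rating
    LIMIT 10000, 10          -- skip 10000 index entries, keep 10 ids
) AS x USING (id)
ORDER BY p.rating;           -- join output order is not guaranteed
```

The offset is still scanned, but only within the narrow index entries; the full rows are fetched for just the 10 ids that survive the LIMIT.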

  

  
