Reasonably configure MySQL cache to increase cache hit ratio

Source: Internet
Author: User

As we all know, when the system reads the data, it is hundreds of times times faster to read from memory than from the hard disk. Therefore, most applications now use the cache (a storage area in memory) to maximize the operational efficiency of the system. MySQL database is no exception. Here, the author will combine their work experience, to discuss with you, MySQL Database Cache management skills: How to properly configure the MySQL database cache, improve cache hit ratio.

When will the application get data from the cache?

When the database reads data from the server, it can fetch data from the hard disk's data file, or it can read from the database cache. Now what the database administrator needs to figure out is, in what case, the system reads the data from the cache instead of reading it from the hard disk's data file?

Simply put, the data cache is a piece of memory that stores the user's SQL text and the associated query results. Typically, the next time the user queries, if the SQL text used is the same, and since the last query, the relevant record has not been updated, the database will be directly in the cache content. From this principle, you can see that if you want to use the data in the cache directly, at least the following conditions are met.

The first is that the SQL text used is the same. After the current two times the user has used the same SQL statement (assuming that other conditions are not considered), the server reads the result from the cache without having to parse and execute the SQL statement. It is important to note that the SQL text here must be exactly the same once. If the query is two times, different query criteria are used. The WHERE condition statement is not entered as the first query. Later, the data volume was found to be too large, using the Where condition to filter the query results. At this point, even if the final query results are the same, the system still fetches the data from the data file, not from the data cache. Again, the name of the field used after select must be the same. If there is a different field name or the number of fields used by the previous two queries, the system will consider it to be a different SQL statement and re-parse and query.

Second, from the point of view of data caching, capitalization is not sensitive. If you have two queries before and after, the field names used may differ only by case. If the first use is size, the second is lowercase, and the system thinks it is still the same SQL statement. or keyword capitalization, and so it's not sensitive.

Three is to meet two times between queries, data records including the table structure have not been changed. If the label of the record changes, such as adding a field, and so on, all buffered data systems using the table are automatically emptied. Note here that the change referred to here is a generalized change, including any data in the table or changes to the result. To give a simple example, the first time you query, users need to query the shipment data for 2010 years. After the query, a user inserted a January 2011 shipping information in this table. Then there are users who need to check shipping information for 2010 years. The SQL statement used is exactly the same as the first query. In this case, does the database system use the data in the cache? The answer is No. Because when an intermediate user inserts a record, all cache records associated with the table are automatically emptied. The cache information for this table is not already in the cache when the second query is made. At this point, you need to re-parse and query.

Four is to note the effect of the default character set on cache hit ratios. Typically, if the default character set that is used between the client and the server is different, the system still considers the query to be different, even if the query statement is the same, the record between the two queries and the table structure are not changed. This requires special attention, which is more easily overlooked.

Second, improve the cache hit ratio recommendations.

As can be seen from the above condition analysis, the use of data in the cache has more stringent conditions. In fact, these conditions are reasonable. The main purpose is to ensure the consistency of data. After a deep understanding of these conditions, what the database administrator needs to consider now is how to increase the hit rate of this cache? The author has the following several suggestions.

One is that the same character set is used by the client and server side when configuring. If the client (or third-party tool) differs from the character set used on the server side, the caching feature is not used under any circumstances. Especially in China, the Chinese character set is needed. In particular, it is important to note that the client default character set is the same as the default character set on the server side. Note that here is the same, not compatible. Sometimes even with a different character set, the client can still display normally. This is mainly because some character sets are not the same, but are mutually compatible. In cache management, the need for the same, optical compatibility is not.

The second is on the client, to solidify the query statement. If there are financial personnel and procurement personnel at the same time from the system to query the November shipment data. Obviously they have different job responsibilities, and the content of the required fields is different. At this point in the client, you can allow the user to set the form format that they want. However, I recommend that the SQL statements used in the background should be the same. Here the data will go through three channels: background database, client, user. When I am conscious, the interaction between the background database and the client takes the same SQL statement. Then, when the client interacts with the user, the data is displayed to the user based on user-defined formats, including the arrangement before and after the fields, and the differences that do not include the query condition statements. The query efficiency of the application system can be improved by using the same SQL statement (except that the user has different requirements for the display format).

The third is to improve the in-memory cache configuration to improve the hit ratio. Typically, when the server starts, the operating system negotiates the size of the cache space with the database software. When the cache is running low, the oldest cache record in the cache is overwritten by the most recent message. It can be seen that increasing the cache space increases the hit rate. This is like shooting, the target is more, the chance of hitting a lot higher. However, the more concurrent the user, the less obvious the effect of this setting will be.

Four is to increase the cache hit rate by partitioning the table. In the above condition analysis, you can see that the system will empty the cached records as long as a record is inserted into the queried table. Now take the example of checking shipping records. The shipping records are updated daily, and users will often need to check the previous year's shipping records at the beginning of the day. As the data in this table is updated every hour, the information in the cache is constantly being kept in context. The cache hit rate is obviously not very high at this point. In view of this situation, the author suggests that a partitioned table can be used. If you can use the system settings, the 2010 shipping records are stored separately in a shipping partition table. That is, a separate partition table is used for each year. This 2011-year record will not affect the 2010 partition table. At this point, if the user repeatedly query the shipment information for 2010, as long as they use the same SQL statement (no different query criteria), then you can enjoy the benefits of caching mechanism, improve the application system query effect.

Three, the impact of multiple applications on the cache.

Typically, the cache for the MySQL database is automatically allocated based on the size of the server's memory. If there is only one MySQL application on a single server, it is best. However, in the actual work, in order to reduce the cost of information-based investment, often in the same server to decorate a number of information applications. Because other informational applications also need to use memory space as a cache, the cache space in the MySQL database may be smaller. If this is the case, the database administrator needs to negotiate with the system engineer to manually set different cache space for different applications depending on the performance requirements. In this case, you can avoid the same server on the different information application of the cache conflict.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.