MySQL limit vs. paging optimization

Source: Internet
Author: User

When paging is required in the system, we usually use the limit plus offset method, plus the appropriate ORDER BY clause. If there is a corresponding index, usually the efficiency is good, otherwise, MySQL will need to do a lot of file sorting operations.

A very common and troublesome problem is that when the offset is very large, such as may be the limit 10000,20 query, when MySQL needs to query 10,020 records only return the last 20, the previous 10,000 records will be discarded, this is very expensive. If all pages are accessed in the same frequency, then the average query requires access to half of the table's data. To optimize this query, either limit the number of pages in the page or optimize the performance of large offsets.

One of the simplest ways to optimize this type of paging query is to use the index overlay scan as much as possible instead of querying all of the columns. Then do an association operation as needed and return the required columns. For a large offset, this efficiency can be greatly improved. Consider the following query:

SELECT film_id, description from Sakila.film ORDER by title LIMIT 50, 5;

If the table is very large, then this query is best rewritten as follows:

SELECT film.film_id, film.description from Sakila.film
INNER JOIN (
SELECT film_id from Sakila.film ORDER by title LIMIT 50,5
) as Lim USING (film_id);

This "delayed association" will greatly improve query efficiency, which allows MySQL to scan as few pages as possible, get the records that need to be accessed, and then follow the associated columns back to the original table to query all the columns needed. This technique can also be used to optimize the limit clause in an associated query.

Sometimes it is possible to convert the limit query to a known location query, allowing MySQL to get the corresponding results through range scanning. For example, if you have an index on a location column and the boundary value is calculated beforehand, the query above can be rewritten as:

SELECT film_id, description from Sakila.film
WHERE position between and the ORDER by position;

The problem of ranking data is similar, but it is often mixed with group by. In this case, it is usually necessary to pre-calculate and store the rank information.

The problem with limit and offset is actually offset, which causes MySQL to scan a large number of unwanted rows and discard them. If you can use bookmarks to record where you last fetch data, you can avoid using offset the next time you start scanning directly from where the bookmark is recorded. For example, if you need to flip through a rental record, you can follow the latest rental record backwards, which is possible because the primary key for the rental record is monotonically growing. First, use the following query to get a set of results:

SELECT * from Sakila.rental ORDER by rental_id DESC LIMIT 20;

Assuming that the above query returns a lease record with a primary key of 16049 to 16030, then the next page of the query can start at the point of 16030:

SELECT * from Sakila.rental WHERE rental_id < 16030
ORDER by rental_id DESC LIMIT 20;

The advantage of this technique is that it will perform well regardless of the page-turn.

Other optimizations include using a pre-computed summary table, or correlating to a redundant table that contains only primary key columns and columns of data that need to be sorted.

When paging, another common technique is to add the sql_calc_fount_rows hint (hint) to the limit statement so that you can get the number of rows that are satisfied after the limit is removed, so it can be used as the total number of pages. It seems that MySQL has made some very "advanced" optimizations, such as predicting the total number of rows by some means. But in fact, MySQL knows the number of rows only after scanning all the rows that meet the criteria, so with this hint, MySQL scans all rows that meet the criteria, regardless of whether it is needed, and then discards the rows that are not needed, instead of terminating the scan after the number of rows for limit is satisfied. So the cost of this hint can be very high.

A better design is to replace the specific pages with the "next page" button, assuming that each page shows 20 records, then each time we query is the limit to return 21 records and only 20, if the 21st exists, then we will display the "next page" button, otherwise it means no more data, There is no need to display the "next page" button.

Another approach is to get and cache more data first, for example, Cache 1000, and then each time the paging is fetched from this cache. Doing so allows the application to take different policies based on the size of the result set, and if the result set is less than 1000, all the paging links can be displayed on the page because the data is in the cache, so there is no problem with performance. If the result set is greater than 1000, you can design an additional button on the page that has more than 1000 results found. Both of these strategies are much more efficient than discarding the unwanted data every time a full result set is generated.

Sometimes you might also consider using the values of rows in the results of explain as an approximation of the total number of result sets (in fact, Google's total number of search results is an approximation). When precise results are required, and then count (*) is used alone to meet the requirements, it is often much faster to use an index overlay scan than sql_calc_found_rows.

MySQL limit vs. paging optimization

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.