MySQL LIMIT and paging optimization (continued) _ MySQL

Source: Internet
Author: User
MySQL LIMIT and paging optimization (continued) bitsCN.com
When paging is required in the system, we usually use the LIMIT plus offset method and add the appropriate order by clause. If there is a corresponding index, the efficiency is usually good. otherwise, MySQL needs to perform a lot of file sorting operations.

A very common and troublesome problem is that when the offset is very large, for example, a query like LIMIT and 20, MySQL needs to query 10020 records and then only returns the last 20 records, the first 10000 records will be discarded at a very high cost. If all pages are accessed at the same frequency, such a query needs to access the data of half a table on average. To optimize such queries, either limit the number of pages on the page or optimize the performance of large offsets.

One of the simplest ways to optimize such paging queries is to use indexes to overwrite the scans as much as possible, rather than Querying all columns. Then, perform an association operation as needed to return the required columns. When the offset is large, the efficiency will be greatly improved. Consider the following query:

SELECT film_id, description FROM sakila. film order by title LIMIT 50, 5;

If the table is very large, it is best to rewrite the query as follows:

SELECT film. film_id, film. description FROM sakila. film
Inner join (
SELECT film_id FROM sakila. film order by title LIMIT 50, 5
) AS lim USING (film_id );

The "delayed Association" here will greatly improve the query efficiency. it allows MySQL to scan as few pages as possible, obtain the records to be accessed, and then return the associated columns to all the columns required for the original table query. This technique can also be used to optimize the LIMIT clause in Association queries.

Sometimes, you can convert a LIMIT query to a query at a known location, so that MySQL can obtain the corresponding results through range scanning. For example, if an index exists in a column and the boundary value is calculated in advance, the preceding query can be rewritten:

SELECT film_id, description FROM sakila. film
WHERE position BETWEEN 50 AND 54 order by position;

The problem of ranking data is similar, but it is often used together with group. In this case, you usually need to calculate and store the ranking information in advance.

The LIMIT and OFFSET problems are actually OFFSET problems, which will cause MySQL to scan a large number of unwanted rows and then discard them. If you can use bookmarks to record the location where the data was last retrieved, you can start scanning directly from the location of the bookmarked record next time to avoid using OFFSET. For example, if you need to flip pages based on the lease record, you can trace back based on the latest lease record. this is feasible because the primary key of the lease record increases monotonically. First, use the following query to obtain a set of results:

SELECT * FROM sakila. Marshal order by marshal_id desc limit 20;

Assume that the query above returns a lease record with a primary key of 16049 to 16030, then the next query can start from the point 16030:

SELECT * FROM sakila. Marshal WHERE marshal_id <16030
Order by limit al_id desc limit 20;

The advantage of this technology is that its performance will be good no matter the pages are displayed.

Other optimization methods include using a pre-calculated summary table or associating it with a redundant table. a redundant table only contains the primary key column and the data column to be sorted.

When paging, another common technique is to add the SQL _CALC_FOUNT_ROWS prompt (hint) to the LIMIT statement, so that you can obtain the number of rows that meet the condition after removing the LIMIT, so it can be used as the total number of pages. It seems that MySQL has made some very "advanced" optimizations, such as using some method to predict the total number of rows. But in fact, MySQL only knows the number of rows after scanning all rows that meet the conditions. Therefore, after this prompt is added, MySQL will scan all rows that meet the conditions whether or not required, then, discard unnecessary rows instead of terminating the scan after the number of LIMIT rows is satisfied. Therefore, the cost of this prompt may be very high.

A better design is to replace the page number with the "next page" button. assume that 20 records are displayed on each page, then, when we query each time, LIMIT returns 21 records and only shows 20 records. if 21st records exist, the "next page" button is displayed, otherwise, it means that there is no more data and there is no need to display the "next page" button.

Another approach is to first obtain and cache a large amount of data, for example, cache 1000 records, and then retrieve them from the cache each time the page is clicked. In this way, the application can adopt different policies based on the size of the result set. if the result set is less than 1000, all the paging links can be displayed on the page because the data is cached, therefore, the performance will not be faulty. If the result set is larger than 1000, you can design an additional button on the page, such as "more than 1000 results found. These two policies are much more efficient than generating all result sets and then throwing away unwanted data.

You can also consider using the value of the rows column in The EXPLAIN results as an approximate value of the total number of result sets (in fact, the total number of Google search results is also an approximate value ). When accurate results are required, COUNT (*) is used separately to meet the requirements. If index overwrite scanning can be used, it is usually much faster than SQL _CALC_FOUND_ROWS.

BitsCN.com

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.