background:
One of their own site, because the single-table data record up to 1 million, resulting in slow data access, the background of Google Analytics often report timeouts, especially pages with large page is slow.
test Environment:
First let's familiarize ourselves with the basic SQL statements to see the basic information we will be testing the table
Use Infomation_schema
SELECT * from TABLES WHERE table_schema = ' dbname ' and table_name = ' product '
Query Result:
From here we can see the basic information of the table:
Number of table rows: 866633
Average data length per row: 5133 bytes
Single Table size: 4448700632 bytes
The units of the row and table size are bytes, which we can calculate to know
Average line length: approximately 5k
Single table total size: 4.1g
The fields in the table have varchar, datetime, text, and so on, and the ID field is the primary key.
Test Experiment 1. Using limit start directly, count paging statement, is also used in my program:
SELECT * from product limit start, count
When the start page is small, the query has no performance problems, we look at 10, 100, 1000, 10000 to start the paging execution time (20 per page), as follows:
SELECT * FROM Product limit 10, 20 0.016 seconds
SELECT * FROM product limit 100, 20 0.016 seconds
SELECT * FROM product limit 1000, 20 0.047 seconds
SELECT * FROM Product limit 10000, 20 0.094 seconds
We have seen that as the start record increases, the time also increases, which means that the paging statement limit is very much related to the starting page number, then we change the start record to 40w (that is, the average of records) SELECT * From product limit 400000, 20 3.229 seconds
And look at the time we took the last page to record.
SELECT * FROM Product limit 866613, 20 37.44 seconds
No wonder the search engine crawled our page often reported timeouts, such as the page of the largest page pages obviously this time
The room is unbearable.
There are two things we can sum up:
1) The query time of the limit statement is proportional to the position of the starting record
2) MySQL limit statement is very convenient, but for many records of the table is not suitable for direct use.
2. Performance optimization approach to limit paging problem
Speed up paged queries with table overlay indexes
As we all know, if you only include that index column (overwriting index) in the statement that uses the index query, then this will query quickly.
Because the index is used to find the optimization algorithm, and the data on the query index, no need to find the relevant data address, which saves a lot of time. In addition, MySQL also has the relevant index cache, in high concurrency when the use of caching is better.
In our case, we know that the ID field is the primary key and naturally contains the default primary key index. Now let's look at how the query using the overlay index works:
This time we are querying the last page of data (using the overwrite index, which contains only the ID column), as follows:
Select ID from product limit 866613, 20 0.2 seconds
Increases the speed of about 100 times relative to 37.44 seconds of querying all columns
So if we also want to query all the columns, there are two ways, one is the form of id>=, the other is to use join, to see the actual situation:
SELECT * FROM Product WHERE ID > = (select ID from Product limit 866613, 1) limit 20
Query time is 0.2 seconds, is a qualitative leap ah, haha
Another way of writing
SELECT * FROM Product a JOIN (select id from product limit 866613, c) b on a.id = b.ID
Query time is also very short, praise!
In fact, both use a principle, so the effect is similar
MySQL single table million data record paging performance optimization, reproduced