Query optimization of MySQL massive database and the scheme of paging algorithm

Last Update:2017-01-13 Source: Internet

Author: User

Tags constant

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

SELECT * FROM table1 where name= ' Zhangsan ' and TID > 10000
and implementation:
SELECT * FROM table1 where TID > 10000 and name= ' Zhangsan '
Some people do not know the execution efficiency of the above two statements is the same, because if it is simple to read from the statement, the two statements are indeed different, if the TID is an aggregation index, then the following sentence only from the table after 10,000 records to find the line , and the previous sentence to look up from the whole table to see a few name= ' Zhangsan ', and then based on the constraints of the conditions tid>10000 to provide query results.
In fact, such fears are unnecessary. There is a query analysis optimizer in SQL Server that calculates the search criteria in the WHERE clause and determines which index reduces the search space for the table scan, that is, it enables automatic optimization.
Although the query optimizer can query optimization automatically based on the WHERE clause, it is still necessary to understand the query optimizer'sworkPrinciple, if not, sometimes the query optimizer will not do a quick query according to your intent.
During the query analysis phase, the query optimizer looks at each stage of the query and determines whether it is useful to limit the amount of data that needs to be scanned. If a phase can be used as a scan parameter (SARG), it is called an optimization and can be used to quickly obtain the required data.
Sarg definition: An operation that restricts the search because it usually refers to a specific match, a match that is worth the range, or a connection of more than two conditions. The form is as follows:
Column name operator < constant or variable >
Or
< constant or variable > operator column name
The column name can appear on one side of the operator, and the constant or variable appears on the other side of the operator. Such as:
Name= ' John '
Price >5000
5000< Price
Name= ' John ' and Price >5000
If an expression does not satisfy the Sarg form, it cannot limit the scope of the search, that is, SQL Server must determine for each row whether it satisfies all the conditions in the WHERE clause. So an index is useless for expressions that do not satisfy the Sarg form.
After introducing Sarg, we came toSummaryThe experience of using SARG as well as the different conclusions encountered in practice and some of the data:

1. Whether the like statement belongs to Sarg depends on the type of wildcard character used

such as: Name like ' Zhang% ', this belongs to Sarg

And: Name like '% Zhang ', does not belong to Sarg.

The reason is that the wildcard% is not available for indexing when the string is opened.

2, or will cause full table scan

Name= ' John ' and price >5000 symbol Sarg, and: Name= ' John ' or price >5000 does not conform to SARG. Using or can cause a full table scan.

3, non-operator, function caused by not satisfied with the Sarg form of statements

Statements that do not satisfy the Sarg form are typically those that include non-operator statements, such as not,!=, <>,!<,!>, not EXISTS, no in, not, and so on, as well as functions. Here are a few examples that do not satisfy the Sarg form:

ABS (Price) <5000

Name like '% three '

Some expressions, such as:

Where Price *2>5000

SQL Server also thinks that Sarg,sql server converts this into:

Where Price &GT;2500/2

This is not recommended, however, because sometimes SQL Server does not guarantee that the conversion is completely equivalent to the original expression.

4, in the effect is quite with or

Statement:

Select * FROM table1 where tid in (2,3)

And

Select * FROM table1 where tid=2 or tid=3

Is the same, it will cause a full table scan, and its index will be invalidated if there is an index on the TID.

5, as little as possible with not

6, exists and in the implementation of the same efficiency is the same

Many of the data show that exists is more efficient than in, and should be used not to exists as much as possible instead of in. But in fact, I experimented with it and found that neither the front band nor not, the execution efficiency of the two is the same. Because of the subquery involved, we experimented with the pubs database with SQL Server. We can open the statistics I/O State of SQL Server before running.

(1) Select Title,price from titles where title_id into (select title_id from sales where qty>30)

The result of the sentence is:

Table ' Sales '. Scan Count 18, logical read 56 times, physics read 0 times, pre-read 0 times.

Table ' titles '. Scan count 1, logical read 2 times, physics read 0 times, pre-read 0 times.

(2) Select Title,price from titles where exists (select * from sales where sales.title_id=titles.title_id and qty>30)

The results of the second sentence are:

Table ' Sales '. Scan Count 18, logical read 56 times, physics read 0 times, pre-read 0 times.

Table ' titles '. Scan count 1, logical read 2 times, physics read 0 times, pre-read 0 times.

From this we can see that the execution efficiency of using exists and in is the same.

7, with the function charindex () and the preceding plus wildcard character% of like execution efficiency

Earlier, we talked about the fact that if you add a wildcard character to the like, it will cause a full table scan, so its execution efficiency is low. But some of the information said that the use of function charindex () to replace like speed will have a big upgrade, after my trial, found that this explanation is also wrong:

Select Gid,title,fariqi,reader from Tgongwen where CHARINDEX (' Criminal Investigation Detachment ', reader) >0 and fariqi> ' 2004-5-5 '

Spents: 7 Seconds, in addition: Scan count 4, logical read 7,155 times, physics read 0 times, pre-read 0 times.

Select Gid,title,fariqi,reader from Tgongwen where reader like '% ' + ' criminal Investigation detachment ' + '% ' and fariqi> ' 2004-5-5 '

Spents: 7 Seconds, in addition: Scan count 4, logical read 7,155 times, physics read 0 times, pre-read 0 times.

8, Union is not absolutely more efficient than the execution of or

We've talked about the use of or in a WHERE clause that causes a full table scan, and, generally, the information I've seen is a recommendation to use Union instead of or. As it turns out, this is true for most of these claims.

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi= ' 2004-9-16 ' or gid>9990000

Spents: 68 seconds. Scan count 1, logical read 404,008 times, physics read 283 times, pre-read 392,163 times.

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi= ' 2004-9-16 '

Union

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where gid>9990000

Spents: 9 seconds. Scan Count 8, logical read 67,489 times, physics read 216 times, pre-read 7,499 times.

It seems that using Union is much more efficient than using or in general.

But after the experiment, the author found that if or both of the query column is the same, then the Union and the execution speed with or is much worse, although here is the Union Scan index, and or scan is the whole table.

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi= ' 2004-9-16 ' or fariqi= ' 2004-2-5 '

Spents: 6423 Ms. Scan count 2, logical read 14,726 times, physics read 1 times, pre-read 7,176 times.

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi= ' 2004-9-16 '

Union

Select Gid,fariqi,neibuyonghu,reader,title from Tgongwen where fariqi= ' 2004-2-5 '

Spents: 11640 Ms. Scan Count 8, logical read 14,806 times, Physics read 108 times, pre-read 1144 times.

9, the field extraction according to "How much, how much" principle, avoid "select *"

Let's do a test:

Select top 10000 gid,fariqi,reader,title to Tgongwen ORDER by gid Desc

Spents: 4673 ms

Select top 10000 gid,fariqi,title to Tgongwen ORDER by gid Desc

Spents: 1376 ms

Select top 10000 Gid,fariqi to Tgongwen ORDER by gid Desc

Spents: 80 ms

From this, we have to extract a field each less, the data extraction speed will have a corresponding upgrade. The speed of ascension depends on the size of the field you have discarded.

10, COUNT (*) is not slower than count (field)

Some sources say that it is obviously less efficient than the list of names in the world to use the * to count all the columns. Such a statement is in fact unfounded. We look at:

Select COUNT (*) from Tgongwen

Spents: 1500 ms

Select COUNT (GID) from Tgongwen

Spents: 1483 ms

Select COUNT (Fariqi) from Tgongwen

Spents: 3140 ms

Select COUNT (title) from Tgongwen

Spents: 52050 ms

As you can see from the above, if you use COUNT (*) and the speed of count (primary key) are equivalent, and count (*) is faster than any other field except the primary key, and the longer the field, the slower the rollup. I think that if you use COUNT (*), SQL Server may automatically find the smallest field to summarize. Of course, if you write the count (primary key) directly, it will come more directly.

11. Ordered by clustered indexed columns the most efficient

Let's see: (GID is a primary key, Fariqi is an aggregated indexed column)

Select top 10000 gid,fariqi,reader,title from Tgongwen

Spents: 196 Ms. Scan count 1, logical read 289 times, physics read 1 times, pre-read 1527 times.

Select top 10000 gid,fariqi,reader,title to Tgongwen ORDER by GID ASC

Spents: 4720 Ms. Scan count 1, logical read 41,956 times, physics read 0 times, pre-read 1287 times.

Select top 10000 gid,fariqi,reader,title to Tgongwen ORDER by gid Desc

Spents: 4736 Ms. Scan count 1, logical read 55,350 times, physics read 10 times, pre-read 775 times.

Select top 10000 gid,fariqi,reader,title to Tgongwen ORDER by Fariqi ASC

Spents: 173 Ms. Scan count 1, logical read 290 times, physics read 0 times, pre-read 0 times.

Select top 10000 gid,fariqi,reader,title from Tgongwen ORDER BY Fariqi Desc

Spents: 156 Ms. Scan count 1, logical read 289 times, physics read 0 times, pre-read 0 times.

As we can see from the above, the speed of the unordered and the number of logical reads are equivalent to the speed of the "Order by clustered index column", but these are much faster than the "ORDER by nonclustered index column" query speed.

At the same time, according to a certain field to sort, whether the positive sequence or reverse, the speed is basically equivalent.

12, high efficiency of top

In fact, when querying and extracting very large volumes of data sets, the impactDatabaseThe biggest factor in response time is not the data lookup, but the physical i/0 operation. Such as:

Select Top * FROM (

Select top 10000 gid,fariqi,title from Tgongwen

where neibuyonghu= ' office '

ORDER BY gid Desc) as a

ORDER BY GID ASC

This statement, theoretically, the execution time of the entire statement should be longer than the execution time of the clause, but the opposite is true. Because a clause returns 10,000 records after execution, and the entire statement returns only 10 statements, the most important factor affecting database response time is physical I/O operations. Restricting physical I/O operations is the most effectiveMethodOne is to use the top keyword. The top keyword is a system-optimized word used in SQL Server to extract the first or previous percentages of data. The application of the author in practice, found that top is really good and efficient. But this word is not in another large database Oracle, which cannot be said to be a pity, although it can be solved in Oracle in other ways, such as: RowNumber. We'll use the top keyword in a future discussion of "Implementing TENS data paging display stored procedures."

So far, we've discussed how to quickly query out the data methods you need from a large-capacity database. Of course, we introduce these methods are "soft" method, in practice, we also have to consider a variety of "hard" factors, such as: Network performance, server performance, operating system performance, even network adapters, switches and so on.

Three, realize the small data quantity and the massive Data General page display stored procedure

To create a Web application, paging browsing is essential. This problem is a very common problem in database processing. The classic data paging method is: ADO records set pagination, that is, the use of ADO with the paging function (using cursors) to achieve pagination. However, this paging method applies only to situations where the data is small, because the cursor itself has a disadvantage: the cursor is stored in memory and is very time-consuming. Once the cursor is established, the associated record is locked until the cursor is canceled. Cursors provide a step-by-step scan of a particular set, typically using cursors to traverse data line by row, and to perform different operations depending on how the data is removed. For multiple tables and large tables, the cursor (large data set) loop is easy to get the program into a long wait or even crash.

More importantly, for a very large data model, it is a waste of resources to use the traditional method of loading the entire data source every time when paging is retrieved. The popular method of paging is generally to retrieve the chunk of the page size, rather than retrieving all the data, and then stepping through the current row.

The earliest way to implement this method of extracting data based on page size and page number is probably "Russian stored procedures." This stored procedure uses cursors, and because of the limitations of the cursor, this method is not universally accepted.

Later, someone on the Web modified this stored procedure, the following stored procedure is a combination of our office automation instance of the paging stored procedures:

Create procedure Pagination1

(@pagesize int,--page size, such as 20 records per page storage

@pageindex INT--current page number

)

As

SET NOCOUNT ON

Begin

Declare @indextable table (ID int identity (1,1), nid int)--Define table variables

DECLARE @PageLowerBound INT--Defines Deder for this page

DECLARE @PageUpperBound INT--Defines the top code for this page

Set @PageLowerBound = (@pageindex-1) * @pagesize

Set @PageUpperBound = @PageLowerBound + @pagesize

SET ROWCOUNT @PageUpperBound

Insert into @indextable (NID) Select GID from Tgongwen where Fariqi >dateadd (Day,-365,getdate ()) Order BY Fariqi Desc

Select O.gid,o.mid,o.title,o.fadanwei,o.fariqi from Tgongwen O, @indextable t where O.gid=t.nid

and t.id> @PageLowerBound and t.id<= @PageUpperBound ORDER by t.id

End

SET NOCOUNT OFF

The above stored procedures use the latest SQL ServerTechnology

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More