SQL Server Index Structure and usage (III)

Source: Internet
Author: User

SQL Server Index Structure and usage (III)

Author: freedk

I. An in-depth understanding of the index structure
Ii. Improve SQL statements

General paging display and storage process for small data volumes and massive data

Creating a Web application requires paging. This problem is very common in database processing. The typical data paging method is the ADO record set paging method, that is, using the paging function provided by ADO (using the cursor) to implement paging. However, this paging method is only applicable to small data volumes, because the cursor itself has a disadvantage: the cursor is stored in the memory and is very memory-intensive. When the game tag is set up, the related records are locked until the cursor is canceled. A cursor provides a method to scan data row by row in a specific set. Generally, a cursor is used to traverse data row by row and perform different operations based on different data conditions. For the multi-table and Big Table-defined cursors (large data sets) loop, it is easy for the program to enter a long wait or even crash.
More importantly, for a very large data model, it is a waste of resources to load the entire data source every time during paging retrieval. Currently, the popular paging method is to retrieve data in the block area of the page size, instead of all the data, and then execute the current row in one step.
The earliest method to extract data based on the page size and page number is probably the "Russian stored procedure ". This stored procedure uses a cursor. Due to the limitations of the cursor, this method has not been widely recognized.
Later, someone modified the stored procedure on the Internet. The following stored procedure is a paging stored procedure written in conjunction with our office automation instance:

CREATE procedure pagination1 (@ pagesize int, -- page size, such as storing 20 records per page @ pageindex int -- current page number) asset nocount onbegindeclare @ indextable table (id int identity ), nid int) -- Define table variable declare @ PageLowerBound int -- Define the bottom code of this page declare @ PageUpperBound int -- Define the top code set @ PageLowerBound = (@ pageindex-1) of this page) * @ pagesizeset @ PageUpperBound = @ PageLowerBound + @ pagesizeset rowcount @ PageUpperBoundinsert into @ indextable (nid) select gid from TGongwen where fariqi> dateadd (day,-365, getdate ()) order by fariqi descselect O. gid, O. mid, O. title, O. fadanwei, O. fariqi from TGongwen O, @ indextable t where O. gid = t. nid and t. id> @ PageLowerBound and t. id <= @ PageUpperBound order by t. idendset nocount off

The above stored procedures use the latest SQL SERVER technology-Table variables. It should be said that this stored procedure is also a very good paging stored procedure. Of course, in this process, you can also write the TABLE variables as temporary tables: create table # Temp. But it is obvious that in SQL SERVER, using temporary tables does not use Quick table variables. So when I first started using this stored procedure, I felt very good and the speed was better than that of the original ADO. But later, I found a better method than this method.
The author once saw a short article on the Internet, "how to retrieve records from the data table from the nth to the nth". The full text is as follows:

Retrieve the n to m records FROM the publish table: select top m-n + 1 * FROM publish WHERE (id not in (select top n-1 id FROM publish )) keywords with id publish table

When I saw this article at the time, I was truly inspired by the spirit and thought that the idea was very good. Later, I was working on an office automation system (ASP. NET + C # + SQL server), suddenly remembered this article, I think if you modify this statement, this may be a very good paging storage process. So I searched for this article on the Internet. I did not expect this article to be found, but I found a paging storage process written according to this statement, this storage process is also a popular paging storage process. I regret that I didn't rush to transform this text into a storage process:

Create procedure pagination2 (@ SQL nVARCHAR (4000), -- SQL statement without sorting statement @ Page int, -- Page number @ RecsPerPage int, -- number of records per page @ id varchar (255), -- Non-repeated idnumber @ Sort VARCHAR (255) -- sorting field and rule) ASDECLARE @ Str nVARCHAR (4000) SET @ Str = ''select top'' + CAST (@ RecsPerPage as varchar (20 )) + ''' * FROM (''+ @ SQL +'') T WHERE T. ''+ @ ID +'' not in (select top ''+ CAST (@ RecsPerPage * (@ Page-1) as varchar (20 )) + ''' + @ ID + ''' FROM (''+ @ SQL +'') T9 ORDER BY ''+ @ Sort + '') order by ''+ @ SortPRINT @ StrEXEC sp_ExecuteSql @ StrGO

In fact, the preceding statement can be simplified:

Select top page size * FROM Table1 WHERE (id not in (select top page size * Page id FROM Table order by id) ORDER BY ID

However, this stored procedure has a fatal drawback, that it contains not in words. Although I can transform it:

Select top page size * FROM Table1 WHERE not exists (select * from (select top (page size * page size) * from table1 order by id) B where B. id =. id) order by id

That is to say, not exists is used to replace not in, but we have already discussed that there is no difference in the execution efficiency between the two. IN this case, the combination of TOP and not in is faster than using a cursor.
Although using not exists does not save the efficiency of the last stored procedure, using TOP keywords in SQL SERVER is a wise choice. Because the ultimate goal of paging optimization is to avoid generating too many record sets, we have mentioned the TOP advantage in the previous sections. Using TOP, we can control the data volume.
IN paging algorithms, there are two key factors that affect the query speed: TOP and not in. TOP can increase our query speed, while not in will slow down our query speed. Therefore, to increase the speed of our entire paging algorithm, we need to completely transform not in, replace it with other methods.
We know that we can use max (field) or min (field) to extract the maximum or minimum values of almost any field, so if this field is not repeated, then, we can use the max or min of these non-repeated fields as the watershed to make them a reference object for separating each page in the paging algorithm. Here, we can use the operator ">" or "<" to accomplish this mission, so that the query statement conforms to the SARG format. For example:

Select top 10 * from table1 where id>200

The following paging solution is available:

Select top page size * from table1 where id> (select max (id) from (select top (page number-1) * page size) id from table1 order by id) as T) order by id

When selecting a column with no repeated values and easy to tell the size, we usually select a primary key. The following table lists the tables in the office automation system with 10 million data .) For sorting columns and extracting gid, fariqi, and title fields, take pages 1st, 10, 100, 500, 1000, 10 thousand, 0.1 million, and 0.25 million as examples, test the execution speed of the preceding three paging schemes: (unit: milliseconds)

Page number Solution 1 Solution 2 Solution 3
1 60 30 76
10 46 16 63
100 1076 720 130
500 540 12943 83
1000 17110 470 250
10000 24796 4500 140
100000 38326 42283 1553
250000 28140 128720 2330
500000 121686 127846 7168

From the table above, we can see that the three stored procedures can be trusted when executing paging commands below 100 pages, and the speed is good. However, in the first solution, after more than 1000 pages are executed, the speed will decrease. The second solution is that the speed starts to decrease after more than 10 thousand pages are executed. However, the third solution has never been greatly downgraded, And the stamina is still very strong.
After determining the third paging scheme, we can write a stored procedure accordingly. As you know, the stored procedure of SQL server is compiled in advance, and its execution efficiency is higher than that of SQL statements sent through WEB pages. The following stored procedure not only contains the paging scheme, but also determines whether to make statistics on the total number of data based on the parameters sent from the page.

-- Get the data of the specified page:

Create procedure pagination3 @ tblName varchar (255), -- table name @ strGetFields varchar (1000) = ''', -- columns to be returned @ fldName varchar (255) = ''', -- Name of the sorted field @ PageSize int = 10, -- page size @ PageIndex int = 1, -- page number @ doCount bit = 0, -- total number of returned records, if the value is not 0, @ OrderType bit = 0 is returned. -- set the sorting type. If the value is not 0, @ strWhere varchar (1500) = ''' is returned in descending order. (Note: do not add where) ASdeclare @ strSQL varchar (5000) -- subject sentence declare @ strTmp varchar (110) -- temporary variable declar E @ strOrder varchar (400) -- sorting type if @ doCount! = 0 beginif @ strWhere! = ''' Set @ strSQL = "select count (*) as Total from [" + @ tblName + "] where" + @ strWhereelseset @ strSQL = "select count (*) as Total from ["+ @ tblName +"] "end

-- The above Code indicates that if @ doCount is not passed over 0, the total number of statistics will be executed. All the following code is 0 @ doCount:

elsebeginif @OrderType != 0beginset @strTmp = "<(select min"set @strOrder = " order by [" + @fldName +"] desc"

-- If @ OrderType is not 0, execute the descending order. This sentence is very important!

endelsebeginset @strTmp = ">(select max"set @strOrder = " order by [" + @fldName +"] asc"endif @PageIndex = 1beginif @strWhere != '''' set @strSQL = "select top " + str(@PageSize) +" "+@strGetFields+ "        from [" + @tblName + "] where " + @strWhere + " " + @strOrderelseset @strSQL = "select top " + str(@PageSize) +" "+@strGetFields+ "         from ["+ @tblName + "] "+ @strOrder

-- Execute the above Code on the first page, which will speed up the execution.

endelsebegin

-- The following code gives @ strSQL the SQL code to be actually executed
 

set @strSQL = "select top " + str(@PageSize) +" "+@strGetFields+ " from ["+ @tblName + "] where [" + @fldName + "]" + @strTmp + "(["+ @fldName + "])       from (select top " + str((@PageIndex-1)*@PageSize) + " ["+ @fldName + "]       from [" + @tblName + "]" + @strOrder + ") as tblTmp)"+ @strOrderif @strWhere != ''''set @strSQL = "select top " + str(@PageSize) +" "+@strGetFields+ " from ["+ @tblName + "] where [" + @fldName + "]" + @strTmp + "(["+ @fldName + "]) from (select top " + str((@PageIndex-1)*@PageSize) + " ["+ @fldName + "] from [" + @tblName + "] where " + @strWhere + " "+ @strOrder + ") as tblTmp) and " + @strWhere + " " + @strOrderend end exec (@strSQL)GO

The above stored procedure is a general stored procedure, and its annotations have been written in it. In the case of large data volumes, especially when querying the last few pages, the query time generally does not exceed 9 seconds. Other stored procedures may cause timeout in practice, therefore, this stored procedure is very suitable for queries of large-capacity databases. I hope that through the analysis of the above stored procedures, we can provide some inspiration and improve the efficiency of our work. At the same time, I hope our peers can propose better real-time data paging algorithms.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.