Select returns the order of records

Source: Internet
Author: User
Tags format contains execution filegroup insert sql sort sybase
Select
Select returns the order of records

Zhongxing Communication Chongqing Institute Rempo Wu Yuhong

Keywords: SELECT, order, optimize, backup, scan, index

Article Summary:

When we execute the SELECT statement, the order of the records returned by the Select has a significant effect on how we are programmed, and there is a significant relationship between database record backup cleanup and SQL performance optimization. It is therefore necessary to specify the order in which the select Returns records. This paper discusses the order of Oracle/sybase/sql server return records according to the database classification, discusses the characteristics of three kinds of databases from the principle, and probes into the effects of these differences on data query and record backup.

Abbreviations:

Iam:index Allocation Map

Pfs:page Free Spaces
1. Introduction
When we execute the SELECT statement, the order of the records returned by the Select has a significant effect on how we are programmed, and there is a significant relationship between database record backup cleanup and SQL performance optimization. It is therefore necessary to specify the order in which the select Returns records.

The order of the Select Return records has a lot to do with the database type, so the following are discussed separately by database type. This paper mainly discusses the sequence of Oracle/sybase/sql server Return records, discusses the characteristics of three kinds of databases from the principle, and emphatically probes into the effects of these differences on data query and record backup.
2. Oracle
The following assumes that database query optimization is based on rule, and that ORACLE takes two ways to access records in a table:

A. Full table scan (Scan)

A full table scan is a sequential access to each record in the table. Oracle optimizes full table scans in a way that reads multiple data blocks (database block) at a time.

B. Access to tables via ROWID

You can use ROWID access mode to improve the efficiency of the Access table, ROWID contains the physical location information recorded in the table. Oracle uses indexes (index) to achieve the link between the data and the physical location (ROWID) where the data resides. Often indexes provide a quick way to access rowid, so queries based on indexed columns can improve performance. This is usually performed by index scan. (Index Scan)
2.1 Full Table Scan
If the SELECT statement cannot use an index, Oracle reads the data block as a whole table scan, and for the returned result set, Oracle returns the record in rowID order of magnitude. So the select * from MyTable is the same as the SELECT * FROM MyTable ORDER by rowID effect

The ROWID pseudo column can be obtained from the Select rowID from table, and the data type is ROWID type. The Rowid Extended format (Extended Rowid) is returned using the query statement. The rowid of the extended format is made up of 18 characters. These 18 characters can be divided into 4 groups according to the OOOOOO.FFF.BBBBBB.SSS format. Represents the data object number, the data file number (datafile numbers), the data block number, or the block ID of the fragment in the record or record.

It is important to note that the larger the rowid that is not inserted into the record, the greater the likelihood that the recorded rowid will be inserted later. Here are two arguments to prove:

1. After inserting the record block may be large, or small

According to our experiment, suppose that there are now three records in the table assuming that the file number is the same, by block number, the line number is arranged as follows:

108 0

108 1

108 2

After you delete the middle record, you get

108 0

108 2

Add one more record and you'll probably get

108 0

108 1 <---Newly added records

108 2

may also be

108 0

108 2

108 3 <---Newly Added records

Both scenarios are likely to occur, depending on the allocation algorithm within the Oracle block. A more in-depth analysis of this situation can be found in document 2.

2. The number of blocks inserted after the record may be large and may be small

The block number of the inserted record is not linearly incremented, but is controlled by Freelist. The theory and algorithm of freelist can be found in document 1.



Therefore, for a full table scan you can draw the following conclusions:

1. The select * FROM MyTable in Oracle does not guarantee that the order of records returned is in the order of insertion, but in rowid order.

The order of ROWID is consistent with the "physical order" of the record row store. In the absence of an index, select is a full table scan, by "physical order", at which point the Select return record is the fastest by "physical order."

2. The rowid of records that have been inserted will not change.

If the full table scanning mode, the direct use of rownum as a selection criteria, according to the conclusion 1, the records may be different. If the SQL has a time condition or other condition as a filter for SQL statements (which drains the currently inserted value), then the rownum is used as the selection criteria, and the records and records are returned in the same order.

Conclusion 2 features can be used in the purge-backup mechanism of some log tables. For some log tables to improve insert performance, there may be no indexes, and the log tables are purged and backed up during the stored procedure. Use INSERT into Select to select Some records into the backup table, and then delete the records in the log table with the DELETE statement. Through RowNum to control the operation of the number of lines, to avoid the rollback section problem, through the time conditions to implement the conclusion 2, to ensure consistent record.
2.2 Scan by index
For a range of indexes, the index leaf nodes are usually sorted and there are pointers between the leaf nodes, which are easy to scan in Oracle, which is represented by the index leaves. Because the select scans the table by index at this time, the returned records are arranged in the indexed order.

With the above characteristics, the following methods can be applied for index scanning:

1. The index lets you sort the returned records in advance.

Using an index in Oracle enables you to sort the returned records without having to use the order by. For different sorting methods can be done with different indexes, through the hint/*+*/instructions can control the index to work in different scanning mode, so as to achieve different results. such as/*+index (table index_name)/or/*+index_desc (table index_name) */indicates that the records returned are sorted in ascending order by field or descending by field, by scanning in ascending order or descending by index.

For example, for table T (a int,b int) indexed on a index_a,b with index b

The records obtained by the SELECT * from t



A

B

19

43

21st

1

3

10

5

8

11

2

Select/*+index (t index_a) */* from T where a>0 or

SELECT * FROM T where a>0 order by a

A

B

3

10

5

8

11

2

19

43

21st

1

From the execution plan, it is accessed by index scan and by index ROWID.

Select/*+index_desc (t index_b) */* from T where b>0 or

SELECT * FROM T where b>0 order by B

A

B

21st

1

11

2

5

8

3

10

19

43

From the execution plan, it is accessed by index scan and by index ROWID.



2. The record implementation can be returned in the order of insertion by using fields such as time, serial number, etc. as index fields

Also use the above features to illustrate the backup problem in 2.1. When the log table is indexed, select the Index field that qualifies the scan range so that the record inserted after the result set, such as time or serial number, ensures that the records of the insert and delete operations are exactly the same when the number of rows is controlled by rownum. At the same time, index based scans ensure SQL performance.
3.sybase
Regardless of whether your SELECT statement uses an index after the where, Sybase can adjust the use of the index based on the cost. Even if there is no where statement, it is possible to use an index, even if there is a where statement. Of course, the index is definitely not used if the table itself does not create any indexes.
3.1 Tables with no indexes
Tables that are not indexed are called heap tables. The heap table has a corresponding record in the sysindexes table, its indid=0. The first field represents the front page of the heap table, and root represents the end of the heap table. All data pages in the heap table Form a two-way list of Sysindex.first <-> sysindex.root.

For inserted records, all data inserted into the heap table is added to the tail of the table. Sybase uses the indid (=0) and root values of the Sysindex table to find the last data page of the table. If there is room on the page, insert a new record row at the end of the data. If there is no available space on the last page, if there is space available on the next page of the expansion unit, this is used; If the last page is already the last page of the expansion unit, start using a new expansion unit, which will always link to the end of the list for the newly added page. The value of the Sysindex.root is also updated.

For record deletion, when a record is deleted, the record in the page immediately following the deleted record moves to the front of the page, and all unused space is left adjacent to the bottom of the page. When all the rows in a page are deleted, the page is detached from the data chain of the heap table.

For updates, the heap table is based on the following guidelines:

· If the length of the row does not change, it is updated directly on the original line, and there is no movement of the data within the page.

· If the length of the row changes, and the page has enough free space. The row is still in the same position on the page, but the other rows move up or down to keep the page running continuously.

· If the page cannot hold rows. In the allpages-locked heap table, the rows are deleted, and the new row is inserted into the last page. In the Data-only-lockedthe heap table, the row is inserted into another page, and the pointer is pointed to the page in its original position, so that the ID position of the row is unchanged.

For scans, read the data page by Sysindex.first <-> sysindex.root linked list.

For the heap table, according to the above inserts, deletes, updates, the scanning characteristic, may obtain the following conclusion:

1. For a heap table without any indexes, if you ensure that update is not used, or that update does not produce inserts, you can safely use Select to complete the natural sort, where records are returned in the order in which they were inserted.


3.2 Indexed Tables
For Sybase execution plans without indexes, the order of the Select Return records is the same as the order in which the heap table scans are returned.

For Sybase execution plan-indexed tables, select returns records in the order of indexed fields. Sybase organizes the index into a B-tree. Each page in the index contains a top page, followed by the index row at the top of the page. Each index row contains a key value and a pointer to a lower-level page or data row. Each page of the index is called an index node. The top node of the B-tree is called the root node. The underlying node of the index is called a leaf node. Page links in each level of the index are in a two-way linked list.

For indexed tables, the following conclusions are obtained:

1. Controls the return order by controlling the index to control the query.

If we can specify the use of an index through (index index_name) to sort by index index_name. You can also use (index 0) to instruct that indexes are not used, so that the order of records returned is in the heap table.

2. How to not force the specified index, regardless of whether or not the index is used in your SELECT statement, Sybase may adjust the use of the index based on the cost. Because Sybase's cost-execution plan adjusts the use of indexes, it is not possible to use nonclustered indexes to perform a natural sort of return records, as Oracle does, and it is best to add order by to ensure that the order is accurate.

3. If the field you want to sort is a clustered index, you can safely use the index to complete the sort. At this point, Sybase returns records in the order of the clustered index fields, regardless of the execution plan. For a clustered index table, when inserting data, it causes the movement of some records within the page (the value of a large record), which ensures that the physical order of the data is consistent with the clustered index order by mobile Sybase.
4.Ms SQL Server
Regardless of whether your SELECT statement uses an index after the where, SQL Server may adjust the use of the index based on the cost. Even if there is no where statement, it is possible to use an index, even if there is a where statement. Of course, the index is definitely not used if the table itself does not create any indexes.
4.1 Tables with no indexes
Tables that are not indexed are called heap tables or heaps. A heap uses the IAM to manage extents, and multiple IAM forms an IAM chain. The heap has a row within the sysindexes, its indid = 0. sysindexes. The FirstIAM column points to the IAM homepage of the IAM page chain, and the IAM page chain manages the space allocated to the heap. SQL Server 2000 uses IAM pages to browse through the heap. The data pages and rows within a heap are not in any particular order, and are not linked together. The only logical connection between data pages is a connection that is recorded within an IAM page.

For an insert operation, when SQL Server 2000 needs to insert a new row and the current page does not have free space, it uses the IAM and PFS pages to find a page with enough space to hold the row. SQL Server uses an IAM page to find extents assigned to an object. For each of the extents, SQL Server searches the PFS pages to see if a page has enough space to hold the line.

SQL Server assigns a new extents to an object only if it cannot quickly find a page within an existing extent that has enough space to hold the row being inserted. SQL Server allocates extents from available extents within the filegroup using a proportional allocation algorithm. If a filegroup has two files, one of which has twice times more free space than the other, then each page is assigned two pages from the former. This means that each file within a filegroup should have an approximate percentage of space usage.

For a delete operation, the page in the heap table does not move within the page, even if the record is deleted.

SQL Server can be used in a variety of ways for data updates. Updates may occur on the spot, either in a way that is first deleted and then inserted, or by a query processor or storage engine. However, in the heap table, always in the field update mode, for the updated content of the original page can not accommodate the situation, SQL Server 2000 using the steering pointer processing, to ensure that the record is updated after the location of the unchanged.

By scanning an IAM page, a heap can be scanned or serially read to find the extents of the page that holds the heap. Because the IAM represents them in the order in which they exist in the data file, this means that the serial heap scans are all along each file.

According to the principle of inserting, updating, deleting and scanning of the heap table, the following conclusions can be obtained:

1. Setting the scan order using an IAM page means that the rows in the heap are generally not returned in the order in which they were inserted.

2. For records that already exist, the location of the record (database number, file number, page number, line number) will not change.

Conclusion 2 can be applied to the backup-purge mechanism. If the log table is a heap table without indexes, you can exclude the currently inserted record by using fields such as time, serial number, and so on, so that the result set and order returned by the Select and delete two operations are exactly the same, and then the number of records per operation is controlled by the SET rowcount. Enables the backup-purge operation to proceed safely.
4.2 Indexed tables
For a table with no indexes on the SQL Server execution plan, the order of the Select Returns records in the same order as the heap table scan returns.

For SQL Server Execution plan-indexed tables, select returns records in the order of indexed fields. SQL Server organizes the index into a B-tree. Each page in the index contains a top page, followed by the index row at the top of the page. Each index row contains a key value and a pointer to a lower-level page or data row. Each page of the index is called an index node. The top node of the B-tree is called the root node. The underlying node of the index is called a leaf node. Page links in each level of the index are in a two-way linked list.

For indexed tables, the following conclusions are obtained:

1. You can control the return order by controlling the index to control the query method.

If we can specify the use of an index through with (index (index_name)), sort by index index_name.

2. How to not force the specified index, regardless of whether or not the index is used in your SELECT statement, SQL Server may adjust the use of the index based on the cost, even if there is no where statement that might use the index, even if there is a where. Regardless of whether your DELETE statement uses an index after the where, SQL Server may adjust the use of the index based on the cost, even if there is no where statement that might use the index, even if there is a where statement. A select and delete execution plan with the same where statement is likely to be different.

Therefore, the record order of the Select and delete is likely to be inconsistent, and if you want to select the first N records, the resulting recordset will not match the contents, although the number of bars is the same. Although we can force a select to use the index through with (index (index_name)), delete does not enforce the specified index because delete involves deleting the index itself.

In this case, do not use SET ROWCOUNT to control the number of bars if the database is performing well enough to back up the data. However, if you do need to control the number of deleted bars at a time, you can control a smaller range directly in the where condition, such as when the time span is controlled to the hour, and the day's data is backed up by a 24-hour cycle.

Or use DTS for backup.

3. If the field you want to sort is a clustered index, you can safely use the index to complete the sort. At this point, SQL Server returns records in the order of the clustered index fields, regardless of the execution plan.



References and materials:

1. "Oracle freelist and HWM principles and related performance optimization", Rempo

2. The research on the storage and reorganization of data in block, http://www.itpub.net

3. How to extract records in physical order? Http://www.itpub.net

4. How do I find the last line of a table? Physical insertion Order, http://www.itpub.net

5. Oracle 9i for Windows nt/2000 Data System training course, Tsinghua University Press

6. Microsoft SQL Server 2000 Technical Insider, Peking University Press

7. "Heaps of data:tables without clustered Indexes"

This part of the article in my blog site http://blog.csdn.net/youbo2004 can be found.


Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.