Oracle performance analysis 4: Full scan of data access methods

Last Update:2014-09-04 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

During SQL statement execution, the Oracle optimizer determines the table access mode based on the statistical information. Generally, there are two basic data access methods: 1) Full scan. During full scan (full table scan or quick full index scan), multiple blocks are read into one IO operation. 2) index scanning. Index scanning first scans the leaf blocks of the index to obtain the Specific Row id (rowid). Then, these row IDs are used to access the parent table to obtain the actual row data, and the access is completed by reading a single block. The full scan mode is described here, and index scanning will be introduced later.

Full scan

When a full scan is performed on a table, all data blocks in the table are taken out and processed to filter data that meets the conditions. Note that Oracle must read the data in the entire data block to the memory before obtaining Qualified Data. Therefore, the Oracle optimizer requires two pieces of information: the number of blocks and the amount of discarded data in each block. The optimizer will determine whether to use full scan based on the two information. First, let's see how the number of retrieved blocks affects the optimizer selection.

Obtain the number of blocks

In general, if you want to retrieve most of the data blocks of a table, you should use full scan. However, it is difficult to evaluate the number of data blocks in the table to be retrieved by the query. Therefore, there are many empirical rules in the use of full scan ": when your query retrieves x % of the data rows in the table, you should select full scan. These rules have some principles, but they are not accurate, because when the retrieved data rows are large, the naturally retrieved data blocks will also be large. In this case, full scanning is not a problem, however, although the retrieved data rows are small, the retrieved data blocks may be large. In fact, full scanning should be used, but these "empirical rules" will not take effect. Let's take a look at the specific example below.
Create a table T1:

create table t1 as select trunc((rownum - 1) / 100) id, rownum value  from dba_source where rownum <= 10000

Then create an index for T1:

create index idx_t1_id on t1(id)

Then collect statistics for T1:

BEGIN  dbms_stats.gather_table_stats(user,                                't1',                                method_opt => 'FOR ALL COLUMNS SIZE 1',                                cascade    => TRUE);END;

Then we execute the query:

select * from t1 where id = 0

The query execution plan is as follows:

SELECT STATEMENT, GOAL = ALL_ROWS TABLE ACCESS BY INDEX ROWID  INDEX RANGE SCAN

The execution plan uses the index range scan. Since the data with a qualified id of 0 has only 100 rows of data in the table, the entire table has 10 thousand rows of data, the queried data only accounts for 1% of the total data, so we think this is a reasonable execution plan.
Next, let's look at the example below to create a table T2:

create table t2 as select mod(rownum,100) id, rownum value  from dba_source where rownum <= 10000

Create an index for T2:

create index idx_t2_id on t2(id)

Then collect statistics for T2:

BEGIN  dbms_stats.gather_table_stats(user,                                't2',                                method_opt => 'FOR ALL COLUMNS SIZE 1',                                cascade    => TRUE);END;

Then execute the query:

select * from t2 where id = 0

The query execution plan is as follows:

SELECT STATEMENT, GOAL = ALL_ROWS TABLE ACCESS FULL

We can see that the execution plan of the table is changed to a full table scan. We can easily find that the query result is still 100 pieces of data, accounting for 1% of the total data volume of table T2, if we further compare the data of T2 and T1, we will find that the id fields of the two tables are exactly the same. Why does T1 select index scan while T2 chooses full table scan?
To understand the reason, we need to analyze the distribution of data on data blocks. In table T1, the distribution of id fields is as follows:

0 0...0 0 1 1...1 1 2 2...2 2......88 88...88 88......99 99...99 99

The id field distribution in Table T2 is as follows:

0 1 2 3 ... 98 99 0 1 2 3 ... 98 99 ...... 0 1 2 3 ... 98 99

From this we can see that the data with id 0 in Table T1 is concentrated on several data blocks, while the data with id 0 in Table T2 is distributed in many different blocks, as a result, the T1 query only needs to read a few blocks to obtain the results. Therefore, index range scanning is used, while the T2 query needs to read most of the blocks, therefore, the optimizer selects full table scan.

Discard

Note that the efficiency of Full scan depends not only on the number of data blocks read, but also on the number of rows in the final result set. From the above example, we can see that when a data block is read, the query will discard the data that does not meet the filter conditions. This discard process consumes resources. Because this operation is in the memory, it consumes CPU resources. The larger the discarded data volume, the more CPU resources it consumes.
Therefore, the larger the number of data blocks read, the larger the discarded data volume, the higher the cost of full scanning (cost.
It is hard to imagine that when the table's data volume increases, the number of discarded rows increases, and the cost of full scanning increases, the Optimizer may eventually abandon full scanning and choose index scanning.

Multi-block read mode

Multi-scan uses multi-block reading, that is, a single IO call will read multiple blocks. The number of read blocks is variable, but there is an upper limit, you can specify this parameter using the db_file_multiblock_read_count parameter. You can view this parameter using the following SQL statement:

select * from v$parameter where name = 'db_file_multiblock_read_count'

The following describes the number of blocks read by Oracle in several cases:

1) Oracle has to read data blocks that exceed a certain boundary range.
In this case, Oracle will read the data blocks that are within the boundary range in one call, and then initiate another call to read the remaining blocks.
2) The block already exists in the memory.
First, read the blocks already in the memory, and then initiate a call to read the remaining blocks, which means that multiple reads may only read one block at a time. For example, if the maximum number of data blocks read is 16, and the number of data blocks read this time is 1-16 and the number of even data blocks is already in the memory, in this example, there will be eight single-block read calls to read odd-number blocks.
3) The read size of multiple data blocks exceeds the operating system limit.
It depends on your operating system, so it is variable.

High Water Level line

The so-called high water level line is the last data block in the table that has data written. It should be noted that even if almost all data rows are deleted and some blocks have actually become completely empty, the high water level line remains unchanged. As shown in the following example, after a table is created and inserted with data:

As the subsequent data changes (delete and modify), the data changes in the table as follows:

Although there is no data in many storage areas, the high level line remains unchanged.
So what is the impact of the high water level line on the Full scan?
During the full scan, Oracle will always read the data blocks in the table's upper and lower level lines. Even if they are empty, this means that many data blocks that do not actually need to be read are also read.

The following example shows how to use the previous table T2.

1) use the following statement to determine the number of data blocks contained in the table:

Select blocks from user_segments where segment_name = 't2' result: 24

2) determine the number of data blocks in the table that contain data;

Select count (distinct (dbms_rowid.rowid_block_number (rowid) block_ct from t2 result: 17

3) execute the following query and view trace information (for details about trace information acquisition, see Oracle Performance Analysis 1)

Alter system flush buffer_cache; -- clear cache select * from t2 where id = 0

The trace information is:

call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse        1      0.00       0.00          0          0          0           0Execute      1      0.00       0.00          0          0          0           0Fetch        2      0.00       0.06         18         20          0         100------- ------  -------- ---------- ---------- ---------- ----------  ----------total        4      0.00       0.07         18         20          0         100Misses in library cache during parse: 1Optimizer mode: ALL_ROWSParsing user id: 5  Rows     Row Source Operation-------  ---------------------------------------------------    100  TABLE ACCESS FULL T2 (cr=20 pr=18 pw=0 time=35975 us)

100 rows of data are queried, and the number of physical data blocks (disk) is 18, including the reading of a data block in the header (only 17 data blocks contain data ). The execution plan uses a full table scan.

4) delete data

delete from  T2

5) obtain the number of data blocks in the table again.

Select blocks from user_segments where segment_name = 't2' result: 24

6) obtain the number of data blocks that contain data

Select count (distinct (dbms_rowid.rowid_block_number (rowid) block_ct from t2 result: 0

7) execute the query and view the trace information

Alter system flush buffer_cache; -- clear cache select * from t2 where id = 0

The trace information is:

call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse        1      0.00       0.00          0          0          0           0Execute      1      0.00       0.00          0          0          0           0Fetch        1      0.00       0.21         18         20          0           0------- ------  -------- ---------- ---------- ---------- ----------  ----------total        3      0.00       0.22         18         20          0           0Misses in library cache during parse: 1Optimizer mode: ALL_ROWSParsing user id: 5  Rows     Row Source Operation-------  ---------------------------------------------------      0  TABLE ACCESS FULL T2 (cr=20 pr=18 pw=0 time=214806 us)

We can see that the number of rows queried is 0, but even 18 data blocks are read physically, and the Execution Plan uses full scanning.

Modify high watermark line

We have learned about the performance problems caused by the high water level line for full scanning. The following describes several methods to reduce the high water level line.

Use truncate

truncate table_name

When deleting data, try to use the truncate operation to reduce the high water level line.

Move operation

alter table table_name move

Note that the move operation requires additional tablespace storage, which locks the table, so that other concurrent users execute DML statements on the table and wait. The move operation affects the index of the table. Therefore, the index must be rebuilt.

Shrink operation

Shrink space operations do not require any extra space, but the speed is much slower than moving. The shrink command can be divided into the following two types:

1) only compress the space without adjusting the waterline. It can be executed when the business is busy.

alter table table_name shrink space compact

Compact operations use a series of insert and delete operations to arrange data in front of segments as much as possible. In this process, the RX lock must be applied to the table, that is, the row to be moved must be locked. However, because rowid changes are involved, enable row movement is required.

2) adjusting the waterline will produce locks and can be executed when there are few businesses. oracle will remember the operations in step 1 and only adjust the waterline.

alter table big_table shrink space

Use a new table

Copy the data to the temporary table t, drop the original table, and rename the temporary table t as the original table.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Oracle performance analysis 4: Full scan of data access methods

Contact Us

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support