Index optimization min and max Aggregate functions

Source: Internet
Author: User
The customer mentioned that the system was slow recently, and sampled an awr report, and found that the following SQL statement has been executed for a long time SELECTMAX (dmsample0 _. ORD) ASx0_0_FROMHF_DM_SAMPLEdmsample0_WHERE (dmsample0 _. project_id1_00000001) select * fromtable (dbms_xplan.display_cursor

The customer mentioned that the system was slow recently and sampled an awr report and found that the following SQL statement has been executed for a long time: SELECT MAX (dmsample0 _. ORD) AS x0_0 _ FROM HF_DM_SAMPLE dmsample0 _ WHERE (dmsample0 _. PROJECT_ID = '000000') select * from table (dbms_xplan.display_cursor

The customer mentioned that the system was relatively slow recently. He sampled an awr report and found that the following SQL statement was executed for a long time.
Select max (dmsample0 _. ORD) AS x0_0 _
FROM HF_DM_SAMPLE dmsample0 _
WHERE (dmsample0 _. PROJECT_ID = '201312 ')

select * from table(dbms_xplan.display_cursor('3js6cmjycbndh',null));

SQL_ID 3js6cmjycbndh, child number 0
-------------------------------------
select max(dmsample0_.ORD) as x0_0_ from HF_DM_SAMPLE dmsample0_ where
(dmsample0_.PROJECT_ID='000000000001' )

Plan hash value: 1698372702

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 5595 (100)| |
| 1 | SORT AGGREGATE | | 1 | 18 | | |
|* 2 | TABLE ACCESS FULL| HF_DM_SAMPLE | 139K| 2456K| 5595 (1)| 00:01:08 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("DMSAMPLE0_"."PROJECT_ID"='000000000001')

SQL> select num_buckets,num_distinct,num_nulls,histogram from dba_tab_columns where table_name='HF_DM_SAMPLE' and (column_name='PROJECT_ID' or column_name='ORD');

NUM_BUCKETS NUM_DISTINCT NUM_NULLS HISTOGRAM
----------- ------------ ---------- ---------------
2 2 0 FREQUENCY
1 137792 0 NONE

It seems that project_id is poorly filtered. It is normal to select the full table here, but we need to note that only two columns of PROJECT_ID and ORD are required for query here.

Because the index is ordered, if the PROJECT_ID and ORD joint indexes are created, the scanning process is from the root node to the branch block node and then to the leaf block node, however, when a leaf block node arrives, only the first or last leaf block of the corresponding leaf block node corresponding to project_id = '20160301' is scanned, in addition, only the first or last row of data is scanned ., This is our common index range scan (min/max). What is different from our usual optimization is that this is a composite index.

create index ind_multi on HF_DM_SAMPLE(PROJECT_ID,ORD);

SELECT MAX (dmsample0_.ORD) AS x0_0_
FROM HF_DM_SAMPLE dmsample0_
WHERE (dmsample0_.PROJECT_ID = '000000000001')

select * from table(dbms_xplan.display_cursor(null,null));

SQL_ID 1vjvuktzmxwm3, child number 0
-------------------------------------
SELECT MAX (dmsample0_.ORD) AS x0_0_ FROM HF_DM_SAMPLE dmsample0_ WHERE
(dmsample0_.PROJECT_ID = '000000000001')

Plan hash value: 4232005098

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | SORT AGGREGATE | | 1 | 18 | | |
| 2 | FIRST ROW | | 1 | 18 | 3 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN (MIN/MAX)| IND_MULTI | 1 | 18 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("DMSAMPLE0_"."PROJECT_ID"='000000000001')

Next, let's look at another method to create an index:
Drop index ind_multi;
Create index ind_multi on HF_DM_SAMPLE (ORD, PROJECT_ID );

Here we create an index for the leading column indexed by ORD, and then look at the execution plan:
SQL _ID fbmhd0u1j3t3u, child number 0
-------------------------------------
Select max (dmsample0 _. ORD) as x0_0 _ from HF_DM_SAMPLE dmsample0 _ where
(Dmsample0 _. PROJECT_ID = '201312 ')

Plan hash value: 1607964330

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 18 | | |
| 2 | FIRST ROW | | 1 | 18 | 2 (0)| 00:00:01 |
|* 3 | INDEX FULL SCAN (MIN/MAX)| IND_MULTI | 1 | 18 | 2 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("DMSAMPLE0_"."PROJECT_ID"='000000000001')

Here, from the previous index range scan (MIN/MAX) to index full scan (MIN/MAX), index full scan (MIN/MAX) is also a full index scan, what is different from index range scan (min/max) Is that index full scan (min/max) directly scans root, branches, and then to the leftmost or last leaf block node, of course, the project_id will be filtered during the scanning process. If the query does not meet the requirement, Continue scanning the leaf block node from the forward side of the right until the qualified leaf block is found, index range scan (min/max) will first filter through the index's leading column, and then scan the first or last leaf block of the corresponding leaf block node.

Initially, the consumption of these two indexes is not much different, but with the increase of data, there are still some differences between the two index scans. When there are many different PROJECT_ID values, the I/O cost of the index with project_id as the leading column will not change much, but the I/O cost of the leading column with ORD as the index will increase.

Another thing to note is that too many indexes will lead to more updates. We need to make further choices based on our needs, for the current SQL statement, because the project_id field has only two different values, and the Business query also uses ORD to filter more values, we recommend that you use ORD as the leading column to make the index more suitable.

Original article address: index optimization min and max Aggregate functions. Thank you for sharing them with me.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.