Analysis of Oracle B-tree index search principles

Source: Internet
Author: User

Indexes, like tables, also belong to segments. It stores user data and occupies disk space like a table. However, the data storage form in the index is very different from that in the table. When you understand the index, you can imagine a book in which the content of the book is equivalent to the data in the table, and the directory in front of the book is equivalent to the index of the table. At the same time, the disk space occupied by indexes is usually much smaller than that of tables, and its main function is to speed up data search.

 

However, as an optional data structure, you can choose to create an index for a table or not. This is because once an index is created, it means that when Oracle performs DML (including insert, update, and delete) on the table, it must handle additional work (that is, maintenance of the index structure) and storage overhead. Therefore, when creating an index, you need to consider the query performance improvement caused by index creation, and whether it is worthwhile compared with the additional overhead.

 

The B-tree index is a typical tree structure and is always balanced. That is to say, any path from the root node to the leaf node is of an equal distance. Its main components are:
Leaf node: contains entries that direct to the data rows in the table.
Branch node: contains entries pointing to other branch nodes or leaf nodes in the index.
Root Node: a B-tree index has only one root node. It is actually the branch node at the top of the tree.

 

For branch node blocks (including root node blocks), the index entries contained in the blocks are arranged in order (reverse and reverse order can be specified ). Each index entry (or each record) has two fields. The first field indicates the minimum key value contained in the index block linked to the current branch node block (based on the replication principle of the minimum key value of B + tree); The second field is four bytes, the address of the linked index block. This address points to the following index block. The number of records that can be stored in a branch node block is determined by the size of the data block and the length of the index key value.

 

For leaf node blocks, the index entries in the same order as the branch nodes are arranged in ascending order, you can also specify the order in descending order when creating an index ). Each index entry (or each record) also has two fields. The first field indicates the key value of the index, which is a value for a single column index, and multiple values are combined for multiple column indexes. The second field indicates the rowid of the record row corresponding to the key value. This rowid is the physical address of the record row in the table. In a leaf node, each index entry occupies one row of space in the data block. Each row uses two to three bytes as the row header, which stores information such as the tag and lock type. At the same time, in the first field that represents the index's key value, each index column has one byte representing the Data Length, followed by the specific value of the column.

 

The index structure of the branch node and the index information of the leaf node are dumped respectively.

 

Create Test Data

sys@ORCL> select * from v$version where rownum=1;BANNER----------------------------------------------------------------Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prodsys@ORCL> drop table tt purge;drop table tt purge           *ERROR at line 1:ORA-00942: table or view does not existsys@ORCL> create table tt as select * from dba_objects;Table created.sys@ORCL> select count(*) from tt;  COUNT(*)----------     50356sys@ORCL> insert into tt select * from tt;50356 rows created.sys@ORCL> commit;Commit complete.sys@ORCL> select count(*) from tt;  COUNT(*)----------    100712sys@ORCL> create index btree_tt on tt(object_name);Index created.

 

View the blevel and height of an index (blevel: depth of a node. Root is located at Layer 0th, and so on. Height = blevel + 1)

sys@ORCL> select index_name,blevel from dba_indexes where index_name='BTREE_TT';INDEX_NAME                         BLEVEL------------------------------ ----------BTREE_TT                                2sys@ORCL> analyze index btree_tt validate structure;Index analyzed.sys@ORCL> select name,height from index_stats where name='BTREE_TT';NAME                               HEIGHT------------------------------ ----------BTREE_TT                                3

 

Get the object number of btree_tt and dump the index structure.

sys@ORCL> select object_id from dba_objects where owner='SYS' and object_name='BTREE_TT'; OBJECT_ID----------     52614sys@ORCL> oradebug setmypidStatement processed.sys@ORCL> alter session set events 'immediate trace name treedump level 52614';Session altered.sys@ORCL> oradebug tracefile_name/u01/app/oracle/admin/orcl/udump/orcl_ora_5234.trc

 

View the treedump TRC File

----- begin tree dumpbranch: 0x40efaa 4255658 (0: nrow: 2, level: 2)   branch: 0x40f603 4257283 (-1: nrow: 247, level: 1)      leaf: 0x40efab 4255659 (-1: nrow: 182 rrow: 182)      leaf: 0x40efac 4255660 (0: nrow: 182 rrow: 182)      leaf: 0x40efad 4255661 (1: nrow: 186 rrow: 186)      leaf: 0x40efae 4255662 (2: nrow: 189 rrow: 189)      leaf: 0x40efaf 4255663 (3: nrow: 186 rrow: 186)      leaf: 0x40efb0 4255664 (4: nrow: 190 rrow: 190)      leaf: 0x40efb1 4255665 (5: nrow: 185 rrow: 185)      leaf: 0x40efb2 4255666 (6: nrow: 179 rrow: 179)      leaf: 0x40efb3 4255667 (7: nrow: 187 rrow: 187)      leaf: 0x40efb4 4255668 (8: nrow: 181 rrow: 181)      ............................................      ............................................   branch: 0x40f6fb 4257531 (0: nrow: 248, level: 1)      leaf: 0x40f602 4257282 (-1: nrow: 228 rrow: 228)      leaf: 0x40f604 4257284 (0: nrow: 226 rrow: 226)      leaf: 0x40f605 4257285 (1: nrow: 224 rrow: 224)      leaf: 0x40f606 4257286 (2: nrow: 223 rrow: 223)      leaf: 0x40f607 4257287 (3: nrow: 217 rrow: 217)      leaf: 0x40f608 4257288 (4: nrow: 253 rrow: 253)      leaf: 0x40f609 4257289 (5: nrow: 232 rrow: 232)      ............................................      ............................................      leaf: 0x40f6f8 4257528 (244: nrow: 191 rrow: 191)      leaf: 0x40f6f9 4257529 (245: nrow: 181 rrow: 181)      leaf: 0x40f6fa 4257530 (246: nrow: 99 rrow: 99)----- end tree dump

 

Explain the TRC File
The first column of each row indicates the node type. branch indicates the branch node (including the root node), and leaf indicates the leaf node.
The second column indicates the Node Address in hexadecimal notation.
The third column indicates the Node Address, in decimal format.
The fourth column indicates the position relative to the previous node: the root node starts from 0, and the other branch nodes and leaf nodes start from 1.
The fifth column indicates: (nrow) Number of index entries contained in the current node (including Delete entries)
Column 6: (level) level of the branch node. In the Oracle index, the level number is reversed. That is to say, if an index has N layers, the level Number of the root node is N, the level of the branch node next to the root node is N-1
The seventh column indicates: (rrow) the number of valid index entries, because if an index entry is deleted, it is not immediately cleared from the index block. Therefore, nrow minus rrow indicates the number of deleted index entries.

 

The preceding method dumps the entire index in a tree. At the same time, we can dump an index node to see what is stored in it.
The index block content of the root node is dumped below.

The TRC file shows that the root node branch: 0x40efaa 4255658 (0: nrow: 2, level: 2)

sys@ORCL> select dbms_utility.data_block_address_file(4255658 ) fno,  2              dbms_utility.data_block_address_block(4255658 ) bno  3         from dual;       FNO        BNO---------- ----------         1      61354sys@ORCL> alter system dump datafile 1 block 61354;System altered.sys@ORCL> oradebug tracefile_name/u01/app/oracle/admin/orcl/udump/orcl_ora_5234.trc

View the TRC content of the Root Node

header address 230057028=0xdb66444kdxcolev 2KDXCOLEV Flags = - - -kdxcolok 0kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Ykdxconco 2kdxcosdc 0kdxconro 1kdxcofbo 30=0x1ekdxcofeo 8026=0x1f5akdxcoavs 7996kdxbrlmc 4257283=0x40f603kdxbrsno 0kdxbrbksz 8056 kdxbr2urrc 0row#0[8026] dba: 4257531=0x40f6fbcol 0; len 18; (18):  41 4c 4c 5f 43 4f 4c 5f 50 52 49 56 53 5f 4d 41 44 45col 1; len 6; (6):  00 40 ee c5 00 2c----- end of branch block dump -----

In this example, the level of the root node is 2, and the leaf is 0.
Kdxcolok indicates whether there are DML active transactions on the index
Kdxconco indicates the number of columns in the index entry.
Kdxcosdc indicates the number of index structure changes. When you modify an index key value, add 1 to the value.
Kdxconro indicates the number of index entries in the current index node.
Kdxcofbo indicates the number of bytes from which the current index node starts to record
Kdxcofeo indicates the maximum byte at which the available space of the current index node is located.
Kdxcoavs indicates the total available space of the current index node. That is, the value of kdxcofeo-kdxcofbo.
Kdxbrlmc indicates the position of the branch node.
Kdxbrsno indicates the last modified Index Entry number. The value 0 indicates that the index is created.
Kdxbrbksz indicates the available data block size. From here we can know that even if pctfree is 0, we cannot use up 8 K data blocks completely.

row#0[8026] dba: 4257531=0x40f6fbcol 0; len 18; (18):  41 4c 4c 5f 43 4f 4c 5f 50 52 49 56 53 5f 4d 41 44 45col 1; len 6; (6):  00 40 ee c5 00 2c

This part is the index entries recorded in the root node, with a total of 1 row (in the definition of the B + tree, if you follow the minimum key code duplication principle, then, each non-leaf node in the tree contains M sub-trees, which must have an M key code ). Each index entry points to a branch node. COL 1 indicates the address of the linked branch node. If no other branch nodes exist under the root node, col 1 is a term; col 0 indicates the minimum key value linked to the branch node. Note that COL 0; Len 18; (18): -- column row number, starting from 0, followed by the column length and column value, this value is called the separator key, and this Separator
Key can be used to differentiate the actual index values. Therefore, we also know that branch block does not store the complete index values, as long as it can be distinguished. That is to say, Oracle only records the index key value prefix in the branch block, instead of all values, because it can save space and store more index entries. At the same time, we can also understand why the like '% XXX' method does not take the btree index for queries, because branch block
The prefix is stored.

 

The contents of the leaf node blocks are dumped below

Select any leaf: 0x40f6fa 4257530 (246: nrow: 99 rrow: 99)

sys@ORCL> select dbms_utility.data_block_address_file(4257530) fno,  2              dbms_utility.data_block_address_block(4257530) bno  3         from dual;       FNO        BNO---------- ----------         1      63226sys@ORCL> oradebug setmypidStatement processed.sys@ORCL> alter system dump datafile 1 block 63226;sys@ORCL> oradebug tracefile_name/u01/app/oracle/admin/orcl/udump/orcl_ora_6177.trc

The content of the leaf node is extracted as follows:

Block header dump:  0x0040f6fa Object id on Block? Y seg/obj: 0xcd86  csc: 0x00.a3506  itc: 2  flg: -  typ: 2 - INDEX     fsl: 0  fnx: 0x0 ver: 0x01  Itl           Xid                  Uba         Flag  Lck        Scn/Fsc0x01   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.000000000x02   0xffff.000.00000000  0x00000000.0000.00  C---    0  scn 0x0000.000a3506 Leaf block dump===============header address 221234268=0xd2fc45ckdxcolev 0KDXCOLEV Flags = - - -kdxcolok 0kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Ykdxconco 2kdxcosdc 0kdxconro 99kdxcofbo 234=0xeakdxcofeo 4692=0x1254kdxcoavs 4458kdxlespl 0kdxlende 0kdxlenxt 0=0x0kdxleprv 4257529=0x40f6f9kdxledsz 0kdxlebksz 8032row#0[7992] flag: ------, lock: 0, len=40col 0; len 30; (30):  73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 53 77 69 74 63 68 53 74 61 74 65 6d 65 6e 74col 1; len 6; (6):  00 40 f3 25 00 0arow#1[7953] flag: ------, lock: 0, len=39col 0; len 29; (29):  73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 54 68 69 73 45 78 70 72 65 73 73 69 6f 6ecol 1; len 6; (6):  00 40 f0 74 00 31row#2[7914] flag: ------, lock: 0, len=39col 0; len 29; (29):  73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 54 68 69 73 45 78 70 72 65 73 73 69 6f 6ecol 1; len 6; (6):  00 40 f0 74 00 32........................................................row#97[4727] flag: ------, lock: 0, len=35col 0; len 25; (25):  79 43 62 43 72 53 75 62 53 61 6d 70 6c 69 6e 67 54 79 70 65 31 37 30 5f 54col 1; len 6; (6):  00 40 f1 f1 00 0crow#98[4692] flag: ------, lock: 0, len=35col 0; len 25; (25):  79 43 62 43 72 53 75 62 53 61 6d 70 6c 69 6e 67 54 79 70 65 31 37 30 5f 54col 1; len 6; (6):  00 40 f4 a2 00 10----- end of leaf block dump -----

 

Different values from the branch node are parsed as follows:
Kdxlespl indicates the number of uncommitted transactions when the leaf node is split
Kdxlende indicates the number of deleted index entries.
Kdxlenxt indicates the address of the next leaf node of the current leaf node.
Kdxlprv indicates the address of the previous leaf node of the current leaf node.
Kdxledsz indicates the space to be deleted.

 

The next part of the dump file is the index entry part. Lock: 0 indicates that the lock information in ITL 0 indicates that the lock is not locked; Len: indicates the length of the index value; flag indicates that the tag, such as the delete tag. Col indicates the column number. Starting from 0, then the key value of the index and the last three parts of the rowid (relative file number, block number, and row number) are: Col 0 is the key value, COL 1 is the rowid.
That is to say, the leaf node stores the complete index key value and the rowid of the related index key value (this rowid removes the data object number part ), at the same time, the leaf node also stores two pointers (DBA) pointing to the previous leaf node and the next leaf node respectively. in this way, the leaf node is a two-way linked list structure. We can see that the system structure of the B-tree index is described as a tree-like three-dimensional structure. However, the arrangement corresponding to the data file is of course a plane form, that is, like the following. Therefore, when Oracle needs to access an index block, it is bound to move on this structure.
/Root/branch/leaf /... /Leaves/branches/leaves /... /Leaves/branches/leaves /... /Leaf/branch /.....
When Oracle needs to obtain an index block, it first starts from the root node and knows the branch node of the next layer based on the key value to be searched, then, access the branch nodes at the next layer, and then access the branch nodes at the next layer based on the key value. In this way, access the leaf nodes at the bottom layer. It can be seen that the physical I/O block is sequential and sequential. In the process of obtaining the final physical block, we cannot read multiple blocks at the same time, because we do not know which block to access when the current block is not obtained. Therefore, when accessing the data block on the index, it will correspond to the DB file sequential read wait event. The root cause is that we jumped from one index block to another in order, to find the final index block.

 

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.