Analysis of oracle B-tree index search principles

Source: Internet
Author: User


Analysis of oracle B-tree index search principles the same as the index and table, also belongs to a segment. It stores user data and occupies disk space like a table. However, the data storage form in the index is very different from that in the table. When you understand the index, you can imagine a book in which the content of the book is equivalent to the data in the table, and the directory in front of the book is equivalent to the index of the table. At the same time, the disk space occupied by indexes is usually much smaller than that of tables, and its main function is to speed up data search. However, as an optional data structure, you can choose to create an index for a table or not. This is because once an index is created, it means that when oracle performs DML (including INSERT, UPDATE, and DELETE) on the table, it must handle additional work (that is, maintenance of the index structure) and storage overhead. Therefore, when creating an index, you need to consider the query performance improvement caused by index creation, and whether it is worthwhile compared with the additional overhead. The www.2cto.com B-tree index is a typical tree structure and is always balanced. That is to say, any path from the Root node to the Leaf node is of an equal distance. Its main components are: Leaf node (Leaf node), which contains entries pointing directly to the data rows in the table. Branch node: contains entries pointing to other Branch nodes or leaf nodes in the index. Root node: a B-tree index has only one root node. It is actually the Branch node at the top of the tree.

For branch node blocks (including root node blocks), the index entries contained in the blocks are arranged in order (reverse and reverse order can be specified ). Each index entry (or each record) has two fields. The first field indicates the minimum key value contained in the index block linked to the current branch node block (based on the replication principle of the minimum key value of B + tree); The second field is four bytes, the address of the linked index block. This address points to the following index block. The number of records that can be stored in a branch node block is determined by the size of the data block and the length of the index key value. For leaf node blocks, the index entries in the same order as the branch nodes are arranged in ascending order, you can also specify the order in descending order when creating an index ). Each index entry (or each record) also has two fields. The first field indicates the key value of the index, which is a value for a single column index, and multiple values are combined for multiple column indexes. The second field indicates the ROWID of the record row corresponding to the key value. This ROWID is the physical address of the record row in the table. In a leaf node, each index entry occupies one row of space in the data block. Each row uses two to three bytes as the row header, which stores information such as the tag and lock type. At the same time, in the first field that represents the index's key value, each index column has one byte representing the Data Length, followed by the specific value of the column. Under www.2cto.com, the index structure of the branch node and the index information of the leaf node are dumped to create the test data [SQL] sys @ ORCL> select * from v $ version where rownum = 1; BANNER ------------------------------------------------------------ Oracle Database 10g Enterprise Edition Release 10.2.0.1.0-Prod sys @ ORCL> drop table tt purge; drop table tt purge * ERROR at line 1: ORA-00942: table or view does not exist sys @ ORCL> create table tt as select * from dba_o Bjects; Table created. sys @ ORCL> select count (*) from tt; COUNT (*) ---------- 50356 sys @ ORCL> insert into tt select * from tt; 50356 rows created. www.2cto.com sys @ ORCL> commit; Commit complete. sys @ ORCL> select count (*) from tt; COUNT (*) ---------- 100712 sys @ ORCL> create index btree_tt on tt (object_name); Index created. view the Blevel and height of an index (blevel: depth of a node. Root is located at Layer 0th, and so on. Height = blevel + 1) [SQL] sys @ ORCL> select index_name, blevel from dba_indexes where index_name = 'btree _ tt '; INDEX_NAME BLEVEL ------------------------------ ---------- BTREE_TT 2 sys @ ORCL> analyze index btree_tt validate structure; Index analyzed. www.2cto.com sys @ ORCL> select name, height from index_stats where name = 'btree _ tt'; name height ---------------------------- ---------- BTREE_TT 3 get btree_tt Dump [SQL] sys @ ORCL> select object_id from dba_objects where owner = 'sys 'and object_name = 'btree _ tt '; OBJECT_ID ---------- 52614 sys @ ORCL> oradebug setmypid Statement processed. sys @ ORCL> alter session set events 'immediate trace name treedump level 52614 '; www.2cto.com Session altered. sys @ ORCL> oradebug tracefile_name/u01/app/oracle/admin/orcl/udump/orcl_ora_5234.trc view the treedump trc file [SQL] ----- Begin tree dump branch: 0x40efaa 4255658 (0: nrow: 2, level: 2) branch: 0x40f603 4257283 (-1: nrow: 247, level: 1) leaf: 0x40efab 4255659 (-1: nrow: 182 rrow: 182) leaf: 0x40efac 4255660 (0: nrow: 182 rrow: 182) leaf: 0x40efad 4255661 (1: nrow: 186 rrow: 186) leaf: 0x40efae 4255662 (2: nrow: 189 rrow: 189) leaf: 0x40efaf 4255663 (3: nrow: 186 rrow: 186) leaf: 0x40efb0 4255664 (4: nrow: 190 Rrow: 190) leaf: 0x40efb1 4255665 (5: nrow: 185 rrow: 185) leaf: 0x40efb2 4255666 (6: nrow: 179 rrow: 179) leaf: 0x40efb3 4255667 (7: nrow: 187 rrow: 187) leaf: 0x40efb4 4255668 (8: nrow: 181 rrow: 181 )...................................... ........................................ .......... branch: 0x40f6fb 4257531 (0: nrow: 248, level: 1) leaf: 0x40f602 4257282 (-1: nrow: 228 rrow: 228) leaf: 0x 40f604 4257284 (0: nrow: 226 rrow: 226) leaf: 0x40f605 4257285 (1: nrow: 224 rrow: 224) leaf: 0x40f606 4257286 (2: nrow: 223 rrow: 223) leaf: 0x40f607 4257287 (3: nrow: 217 rrow: 217) leaf: 0x40f608 4257288 (4: nrow: 253 rrow: 253) leaf: 0x40f609 4257289 (5: nrow: 232 rrow: 232 )...................................... ........................................ .......... leaf: 0x40f6f84257528 (244: n Row: 191 rrow: 191) leaf: 0x40f6f9 4257529 (245: nrow: 181 rrow: 181) leaf: 0x40f6fa 4257530 (246: nrow: 99 rrow: 99) ----- end tree dump www.2cto.com: the first column of each row in the trc file indicates the node type, branch indicates the branch node (including the root node), and leaf indicates the second column of the leaf node: node Address, which is in hexadecimal notation and in the third column: node address. In decimal notation, the fourth column indicates the position of the root node relative to the previous node. The value starts from 0, other branch nodes and leaf nodes are counted from 1 as the Fifth Column: (nrow) Number of index entries contained in the current node (including delete entries) Column 6: (level) level of the branch node, in the oracle index, the level number is reversed. That is to say, if an index has N layers, the level Number of the root node is N, and the layer of the branch node at the next layer of the root node is Level 7 indicates the number of valid index entries in the N-1, because if an index entry is deleted, it is not immediately cleared out of the index block. Therefore, nrow minus rrow indicates the number of deleted index entries. In this way, the entire index is dumped in a tree. At the same time, we can dump an index node to see what is stored in it. The index block content of the root node is dumped below. The trc file shows that the root node branch: 0x40efaa 4255658 (0: nrow: 2, level: 2) [SQL] sys @ ORCL> select dbms_utility.data_block_address_file (4255658) fno, 2 dbms_utility.data_block_address_block (4255658) bno 3 from dual; fno bno ---------- 1 61354 sys @ ORCL> alter system dump datafile 1 block 61354; www.2cto.com System altered. sys @ ORCL> oradebug tracefile_name/u01/app/oracle/admin/orcl/udump/orcl_ora_5234. Trc view the trc content of the root node [SQL] header address 230057028 = 0xdb66444 kdxcolev2 kdxcolevflags =---kdxcolok 0 kdxcoopc 0x80: opcode = 0: iot flags = --- is converted = Y kdxconco 2 kdxcosdc 0 kdxconro 1 kdxcofbo 30 = 0x1e running 8026 = 0x1f5a kdxcoavs 7996 running 4257283 = running kdxbrsno 0 kdxbrbksz 8056 running 0 row #0 [8026] dba: 4257531 = 0x40f6fb col 0; len 18; (18): 41 4c 4c 5f 43 4f 4c 5f 50 52 49 56 53 5f 4d 41 44 45 col 1; len 6; (6): 00 40 ee c5 00 2c www.2cto.com ----- end of branch block dump ----- kdxcolev: index level number, in this example, the level of the root node is 2, and the leaf value is 0 kdxcolok. Indicates whether there is a DML active transaction kdxconco on the index. kdxcosdc indicates the number of columns in the index entry: the number of index structures that change. When you modify an index key value, add 1 kdxconro to indicate the number of index entries in the current index node. kdxcofbo indicates: the current index node records kdxcofeo starting from the number of bytes. The value of kdxcoavs indicates the maximum available space of the current index node. That is, kdxcofeo-kdxcofbo value kdxbrlmc indicates that the position of the branch node kdxbrsno indicates that the last modified Index Entry number is 0, indicating that the new index kdxbrbksz indicates: the available data block size. From here we can know that even if pctfree is 0, for 8 K data blocks, we cannot completely use [SQL] row #0 [8026] dba: 4257531 = 0x40f6fb col 0; len 18; (18): 41 4c 4c 5f 43 4f 4c 5f 50 52 49 56 53 5f 4d 41 44 45 col 1; len 6; (6): 00 40 ee c5 00 2c this part of content is the index entries recorded in the root node, a total of 1 line (in the definition of B + tree, if you follow the minimum key code rewrite principle, then, each non-leaf node in the tree contains m sub-trees, which must have an m key code ). Each index entry points to a branch node. col 1 indicates the address of the linked branch node. If no other branch nodes exist under the root node, col 1 is a TERM; col 0 indicates the minimum key value linked to the branch node. Note that col 0; len 18; (18): -- column row number, starting from 0, followed by the column length and column value, this value is called the separator key. This separator key can be used to distinguish the actual index value. Therefore, we also know that the branch block does not store the complete index value, as long as it can be distinguished. That is to say, Oracle only records the index key value prefix in the Branch block, instead of all values, because it can save space and store more index entries. At the same time, we can understand why the like '% XXX' method does not use the Btree index for queries, because the Branch block stores the prefix. The contents of the dump leaf node block under www.2cto.com randomly select a leaf: 0x40f6fa 4257530 (246: nrow: 99 rrow: 99) [SQL] sys @ ORCL> select dbms_utility.data_block_address_file (4257530, 2 dbms_utility.data_block_address_block (4257530) bno 3 from dual; fno bno ---------- 1 63226 sys @ ORCL> oradebug setmypid Statement processed. sys @ ORCL> alter system dump datafile 1 block 63226; sys @ ORCL> oradebug tracefile_name/u01/app/ Part of the oracle/admin/orcl/udump/orcl_ora_6177.trc leaf node is extracted as follows: [SQL] Block header dump: 0x0040f6fa Object id on Block? Y seg/obj: 0x509 csc: 0x00. a3506 itc: 2 flg:-typ: 2-INDEX fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0000. 000.00000000 0x00000000. 0000.00 ---- 0 fsc 0x0000.00000000 0x02 0xffff. 000.00000000 0x00000000. 0000.00 C --- 0 scn 0x0000. 000a3506 www.2cto.com Leaf block dump ================= header address 221234268 = 0xd2fc45c kdxcolev0 kdxcolevflags =---kdxcolok 0 kdxcoopc 0x80: Opcode = 0: iot flags = --- is converted = Y kdxconco 2 kdxcosdc 0 kdxconro 99 kdxcofbo 234 = 0xea kdxcofeo 4692 = 0x1254 kdxcoavs 4458 running 0 kdxlende 0 kdxlenxt 0 = 0x0 running 4257529 = running kdxledsz 0 kdxlebksz 8032 row #0 [7992] flag: ------, lock: 0, len = 40 col 0; len 30; (30 ): 73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 53 77 69 74 63 68 53 74 61 74 65 6d 65 6e 74 col 1; len 6; (6 ): 00 40 f3 25 0 0 0a row #1 [7953] flag: ------, lock: 0, len = 39 col 0; len 29; (29 ): 73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 54 68 69 73 45 78 70 72 65 73 73 69 6f 6e col 1; len 6; (6 ): 00 40 f0 74 00 31 row #2 [7914] flag: ------, lock: 0, len = 39 col 0; len 29; (29 ): 73 75 6e 2f 74 6f 6f 6c 73 2f 74 72 65 65 2f 54 68 69 69 73 45 78 70 72 65 73 73 69 6f 6e www.2cto.com col 1; len 6; (6 ): 00 40 f0 74 00 32 ........ ........................................ ........ Row #97 [4727] flag: ------, lock: 0, len = 35 col 0; len 25; (25 ): 79 43 62 43 72 53 75 62 61 6d 70 6c 69 6e 67 54 79 70 65 31 37 30 5f 54 col 1; len 6; (6 ): 00 40 f1 f1 00 0c row #98 [4692] flag: ------, lock: 0, len = 35 col 0; len 25; (25 ): 79 43 62 43 72 53 75 62 61 6d 70 6c 69 6e 67 54 79 70 65 31 37 30 5f 54 col 1; len 6; (6 ): 00 40 f4 a2 00 1 The value of 0 ----- end of leaf block dump ----- is parsed as follows: kdxlespl indicates the number of uncommitted transactions kdxlende indicates when the leaf node is split: kdxlenxt indicates the number of deleted index entries. kdxlprv indicates the address of the next leaf node of the current leaf node. kdxledsz indicates the address of the previous leaf node of the current leaf node: the following part of the deleted space dump file is the index entry part. Lock: 0 indicates that the lock information in ITL 0 indicates that the lock is not locked; len: indicates the length of the index value; flag indicates that the tag, such as the delete tag. Col indicates the column number. Starting from 0, then the key value of the index and the last three parts of the rowid (relative file number, block number, and row number) are: col 0 is the key value, col 1 is the rowid. That is to say, the Leaf node stores the complete index key value and the rowid of the related index key value (this rowid removes the data object number part ), at the same time, the leaf node also stores two pointers (DBA) pointing to the previous leaf node and the next leaf node respectively. in this way, the leaf node is a two-way linked list structure. We can see that the system structure of the B-tree index is described as a tree-like three-dimensional structure. However, the arrangement corresponding to the data file is of course a plane form, that is, like the following. Therefore, when oracle needs to access an index block, it is bound to move on this structure. Www.2cto.com
/Root/branch/leaf /... /Leaves/branches/leaves /... /Leaves/branches/leaves /... /Leaf/branch /..... when oracle needs to obtain an index block, it first starts from the root node and knows the branch node of the next layer based on the key value to be searched, then, access the branch nodes at the next layer, and then access the branch nodes at the next layer based on the key value. In this way, access the leaf nodes at the bottom layer. It can be seen that the physical I/O block is sequential and sequential. In the process of obtaining the final physical block, we cannot read multiple blocks at the same time, because we do not know which block to access when the current block is not obtained. Therefore, when accessing the data block on the index, it will correspond to the db file sequential read wait event. The root cause is that we jumped from one index block to another in order, to find the final index block. Author: linwaterbin

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.