"Oracle" 7. Talking about Oracle's index by B-tree algorithm

Last Update:2016-05-03 Source: Internet

Author: User


Index

1. B-tree index

The B-tree index is the index we use most often in daily work; when people say "index" on the job, they mean a B-tree index by default.

The idea is simple and easy to understand. A book's table of contents is the most fitting analogy: the structure of a B-tree index resembles a library catalog.

2. Structure of the B-tree index

The top level of the index is the root, which contains entries that point to the next level of the index. The next level consists of branch blocks, which in turn point to blocks at the level below. The lowest level consists of leaf nodes, which contain the index entries that point to table rows. The leaf blocks are doubly linked, which makes it possible to scan the index in ascending or descending key order.

3. Format of index leaf entries

An index entry contains the following components:

Entry header: stores the number of columns and locking information

Key column length/value pairs: each pair gives the size of a column in the key, followed by the value of that column (the number of such pairs is at most the number of columns in the index).

4. Characteristics of Index leaf entries

In the B-tree index of a non-partitioned table:

When multiple rows have the same key value, the key value is repeated in the index unless the index is compressed.

When all key columns of a row are NULL, the row has no corresponding index entry. As a result, a WHERE clause that tests those key columns for NULL cannot be answered from this index and generally leads to a full table scan.

5. Performance of DML operations on indexes

When a DML operation is performed on a table, the Oracle server maintains all indexes.

The following shows the effect of performing DML commands on indexes:

Performing an insert operation causes the index entry to be inserted in the corresponding block.

Deleting a row results in a logical deletion of the index entry. The space occupied by the deleted row is not available for subsequent new leaf entries.

Updating the key columns results in a logical deletion of the old index entry followed by the insertion of a new one. The PCTFREE setting has no effect on the index except when the index is created. A new entry can be added to an index block even if the block has less free space than PCTFREE specifies.

Definition

A B-tree is a specialized multiway tree designed especially for use on disk. Each node of a B-tree can contain a large number of keys, and can therefore also have a large number of subtrees. A B-tree is designed to branch out in this large number of directions and to hold many keys in each node so that the height of the tree stays relatively small. This means that only a small number of nodes must be read from disk to retrieve an item. The goal is fast access to the data, and with disk drives that means reading a very small number of records. Note that a large node size (with many keys in the node) also fits the fact that a disk drive can usually read a fair amount of data at once.

Definition of a B-tree by order

A B-tree is also called a balanced multiway search tree. A B-tree of order m has the properties listed below. (Note: do not simply assume that a B-tree of order m is an m-ary tree. Quadtrees, octrees, KD-trees, and spatial partitioning trees such as the VP-tree, R-tree, R*-tree, R+-tree, X-tree, M-tree, segment tree, Hilbert R-tree, and Priority R-tree all exist, but they are entirely different from the B-tree.)

1. Each node in the tree has at most m children (m >= 2);

2. Except for the root node and the leaf nodes, every node has at least ceil(m/2) children (where ceil(x) rounds x up to the nearest integer);

3. If the root node is not a leaf node, it has at least 2 children (special case: a root node with no children, that is, the root node is a leaf node and the whole tree consists of the root alone);

4. All leaf nodes appear on the same level, and the leaf nodes contain no key information (they can be seen as external nodes, or as the nodes where a failed search ends; in fact these nodes do not exist, and the pointers to them are null). (Reader feedback from @rengat: this is a mistake; a leaf node merely has no children and no pointers to children, and these nodes do exist and do hold elements. Reply from @Researcher July: the real question is what you count as a leaf node; in a red-black tree, for example, every null pointer is treated as a leaf node that is simply not drawn.)

5. Each non-terminal node contains n keys, stored as (n, P0, K1, P1, K2, P2, ..., Kn, Pn), where:
a) Ki (i = 1...n) is a key, and the keys are sorted in ascending order, K(i-1) < Ki.
b) Pi is a pointer to the root of a subtree, and every key in the subtree pointed to by P(i-1) is less than Ki but greater than K(i-1).
c) The number of keys n must satisfy ceil(m/2) - 1 <= n <= m - 1.

Put another way, a B-tree of order m satisfies the following conditions:

1. Each node has at most m subtrees;

2. Apart from the root node, every branch node has at least ceil(m/2) subtrees;

3. The root node has at least two subtrees (unless the B-tree contains only one node);

4. All leaf nodes are on the same level. The leaf nodes of a B-tree can be regarded as external nodes that contain no information;

5. A non-leaf node with j children has exactly j-1 keys, and the keys are arranged in ascending order.

B-tree defined by degree

To elaborate on the 5 points above: the number of keys that each node of a B-tree can contain (such as D H and Q T X earlier) has an upper bound and a lower bound. The lower bound can be expressed with a value called the minimum degree of the B-tree, written m (m >= 2). (The Chinese edition of Introduction to Algorithms translates the term as "degree"; the minimum degree is the minimum number of children of an internal node.)

Each non-root internal node has at least m children, and each non-root node must contain at least m-1 keys; if the tree is non-empty, the root node contains at least one key;

Each node can contain at most 2m-1 keys, so an internal node can have at most 2m children. A node with exactly 2m-1 keys is said to be full (the B*-tree, a common variant of the B-tree discussed later, requires each internal node to be at least 2/3 full rather than at least half full as the B-tree does here);

The B-tree is simplest when the minimum degree is m = 2 (t = 2 in the notation of Introduction to Algorithms; 2 is the smallest value allowed, and m may be any value >= 2). Each internal node can then have 2, 3, or 4 children, which gives a 2-3-4 tree. (Many people mistakenly conclude from this that a B-tree is a binary search tree, but the two are different: a B-tree is a balanced multiway search tree whose nodes can hold several keys.) In practice, a much larger value of t is usually used.
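The bounds implied by the minimum degree can be written down directly. The following is a minimal sketch in C, not taken from the original article: the struct `DegreeBounds` and the function `degree_bounds` are hypothetical names, and `t` stands for the minimum degree that the text above calls m.

```c
#include <assert.h>

/* Key and child limits implied by a minimum degree t (t >= 2).
   The names here are illustrative, not taken from any library. */
typedef struct {
    int min_keys;      /* every non-root node holds at least t - 1 keys */
    int max_keys;      /* a full node holds 2t - 1 keys */
    int min_children;  /* a non-root internal node has at least t children */
    int max_children;  /* an internal node has at most 2t children */
} DegreeBounds;

DegreeBounds degree_bounds(int t)
{
    assert(t >= 2);    /* the definition requires a minimum degree of at least 2 */
    DegreeBounds b;
    b.min_keys = t - 1;
    b.max_keys = 2 * t - 1;
    b.min_children = t;
    b.max_children = 2 * t;
    return b;
}
```

For t = 2 this gives 1 to 3 keys and 2 to 4 children per internal node, which is the 2-3-4 tree described above.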

Each node in a B-tree can contain a large number of keys and branches, as circumstances allow (it cannot, of course, exceed the size of a disk block, which depends on the disk drive and is typically around 1 KB to 4 KB). This reduces the depth of the tree, which means that finding an element requires reading only a few nodes from disk into memory before the data is reached.

A multiway tree of order m is an ordered tree in which each node has at most m children. For each node, if k is the actual number of children, then the node has k-1 keys. If the keys and subtrees are arranged in the fashion of a search tree, the tree is called a multiway search tree of order m. Consider, for example, a multiway search tree of order 4, drawn so that the first row of each node shows the keys and the second row shows the pointers to the child nodes. Of course, in any useful application there will be a record of data associated with each key, so the first row of each node might be an array of records, where each record contains a key and its associated data. Another approach is for the first row of each node to hold an array of records, where each record contains a key and the record number of the associated data record, which is found in another file. This last method is often used when the data records are large. The example here uses the first method.

What does it mean to say that the keys and subtrees are "arranged in the fashion of a search tree"? Suppose we define a node as follows:

typedef struct
   {
   int Count;         // number of keys stored in the current node
   ItemType key[3];   // array to hold the 3 keys
   long branch[4];    // array of fake pointers (record numbers)
   } NodeType;

Then a multiway search tree of order 4 must meet the following conditions on the ordering of its keys:

The keys in each node are sorted in ascending order.

The following conditions hold at every given node (call it Node):

The subtree starting at record Node.branch[0] contains only keys that are less than Node.key[0].

The subtree starting at record Node.branch[1] contains only keys that are greater than Node.key[0] and at the same time less than Node.key[1].

The subtree starting at record Node.branch[2] contains only keys that are greater than Node.key[1] and at the same time less than Node.key[2].

The subtree starting at record Node.branch[3] contains only keys that are greater than Node.key[2].

Note that if the node holds fewer than the full number of keys, these 4 conditions are truncated so that they speak of the appropriate number of keys and branches.
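These ordering conditions are what make a lookup possible: within one node, a search either finds the key or learns which branch to follow. The sketch below is an illustration only. It reuses the NodeType layout defined earlier, assumes that ItemType is a single character, and uses a hypothetical function name, `search_node`.

```c
#include <assert.h>

typedef char ItemType;   /* assumption: each key is a single character */

typedef struct {
    int Count;           /* number of keys stored in the current node */
    ItemType key[3];     /* array to hold the 3 keys */
    long branch[4];      /* array of fake pointers (record numbers) */
} NodeType;

/* Returns 1 and stores the key's position in *pos if target is in the node.
   Otherwise returns 0 and stores in *pos the index of the branch to follow. */
int search_node(const NodeType *node, ItemType target, int *pos)
{
    assert(node->Count >= 0 && node->Count <= 3);
    int i = 0;
    /* keys are in ascending order, so skip every key smaller than the target */
    while (i < node->Count && node->key[i] < target)
        i++;
    *pos = i;
    return i < node->Count && node->key[i] == target;
}
```

Because the loop stops at Count rather than at the array size, the sketch also covers nodes that hold fewer than the full number of keys.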

This generalizes in the obvious way to multiway search trees of other orders.
A B-tree of order m is a multiway search tree of order m such that:

1. All leaf nodes are on the bottom level;

2. All internal nodes (except perhaps the root node) have at least ceil(m/2) (non-empty) child nodes;

3. The root node can have as few as 2 child nodes if it is an internal node, and obviously has no child nodes if it is a leaf (that is, if the whole tree consists only of the root node);

4. Each leaf node (other than the root node, if it is a leaf) must contain at least ceil(m/2)-1 keys.

Note that ceil(x) is the ceiling function. Its value is the smallest integer greater than or equal to x. Thus ceil(3) = 3, ceil(3.35) = 4, ceil(1.98) = 2, ceil(5.01) = 6, ceil(7) = 7, and so on.
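For integer orders, ceil(m/2) can be computed without floating point. The following is a minimal sketch with hypothetical helper names (`ceil_div`, `min_children`, `min_keys`, `max_keys`) that turns the conditions above into arithmetic.

```c
#include <assert.h>

/* Integer ceiling of a / b for positive integers, without floating point. */
int ceil_div(int a, int b)
{
    assert(a > 0 && b > 0);
    return (a + b - 1) / b;
}

/* Minimum number of children of a non-root internal node, order m. */
int min_children(int m) { return ceil_div(m, 2); }

/* Minimum number of keys of a non-root node: ceil(m / 2) - 1. */
int min_keys(int m) { return ceil_div(m, 2) - 1; }

/* Maximum number of keys of any node: m - 1. */
int max_keys(int m) { return m - 1; }
```

For order 5 these helpers give at least 3 children, at least 2 keys, and at most 4 keys per node.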

A B-tree is fairly well balanced by virtue of the fact that all leaf nodes must be at the bottom of the tree. Condition 2 keeps the tree fairly bushy by insisting that each node have at least half the maximum number of child nodes. This causes the tree to "fan out" so that the path from the root to a leaf is short, even in a tree that contains a large amount of data.

Operations

Consider an example B-tree of order 5. This means that all internal nodes (other than the root node) have at least ceil(5/2) = ceil(2.5) = 3 child nodes (and therefore at least 2 keys). Of course, the maximum number of child nodes a node can have is 5 (so the maximum number of keys is 4). According to condition 4, each leaf node must contain at least 2 keys. In practice, the order of a B-tree is far greater than 5.

Q: How would you search such a tree for S? For J? How would you do a sort of "in-order" traversal, that is, a traversal that produces the letters in ascending order? (Such a traversal is done only rarely, because it requires a lot of disk activity and is therefore slow.)
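One way to approach the traversal question is to visit each child subtree just before the key that follows it, and the last child after the final key. The sketch below is an in-memory illustration under stated assumptions: the `Node` struct with real pointers and the function `in_order` are hypothetical, whereas an on-disk tree would read each child by record number, which is where the disk activity comes from.

```c
#include <assert.h>
#include <stddef.h>

#define ORDER 5

typedef struct Node {
    int count;                     /* number of keys in use */
    char key[ORDER - 1];           /* keys in ascending order */
    struct Node *child[ORDER];     /* all NULL in a leaf */
} Node;

/* Appends the keys of the subtree rooted at node to out in ascending order.
   len is the number of characters already in out; the new length is returned. */
size_t in_order(const Node *node, char *out, size_t len)
{
    if (node == NULL)
        return len;
    assert(node->count >= 0 && node->count <= ORDER - 1);
    for (int i = 0; i < node->count; i++) {
        len = in_order(node->child[i], out, len);   /* keys smaller than key[i] */
        out[len++] = node->key[i];
    }
    /* the last child holds the keys greater than every key in this node */
    return in_order(node->child[node->count], out, len);
}
```

The caller is assumed to provide an output buffer large enough for every key in the tree.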

Inserting a new item

According to Kruse (see the references below), the insertion algorithm proceeds as follows. When inserting an item, first search for it in the B-tree. If the item is not already in the B-tree, this unsuccessful search ends at a leaf node. If there is still room in the leaf, insert the new item there. Note that this may require moving some existing keys one position to the right to make room for the new item. If instead the leaf node is full, so that there is no room for the new item, the node must be "split", with about half of the keys going into a new node to the right of this one. The median (middle) key is moved up into the parent node. (Of course, if the parent has no room, it may have to be split as well.) Note that when adding to an internal node, not only may some keys have to move one position to the right, but the associated pointers must move right as well. If the root node is ever split, the median key moves up into a new root node, causing the tree to grow in height by one.
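The split step can be sketched for a leaf of an order-5 tree as follows. This is an illustration, not the article's own code: `split_leaf` and `MAX_KEYS` are hypothetical names, keys are single characters, and only the leaf case is shown (an internal node would also have to divide its child pointers).

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 4   /* order 5: at most 4 keys per node */

/* Splits a full leaf that must absorb one more key.
   full holds MAX_KEYS keys in ascending order; new_key is not among them.
   On return, left and right each hold 2 keys (NUL-terminated) and
   *median is the key that moves up into the parent. */
void split_leaf(const char full[MAX_KEYS], char new_key,
                char left[3], char *median, char right[3])
{
    char all[MAX_KEYS + 1];
    int i = 0, j = 0;

    assert(left != NULL && median != NULL && right != NULL);

    /* copy the keys smaller than new_key, then new_key, then the rest */
    while (i < MAX_KEYS && full[i] < new_key)
        all[j++] = full[i++];
    all[j++] = new_key;
    while (i < MAX_KEYS)
        all[j++] = full[i++];

    memcpy(left, all, 2);
    left[2] = '\0';
    *median = all[2];
    memcpy(right, all + 3, 2);
    right[2] = '\0';
}
```

Splitting the leaf A C G N on the arrival of H, for example, leaves A C on the left, H N on the right, and sends G up to the parent.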

Let's work through an example similar to the one given by Kruse. Insert the following letters into an initially empty B-tree of order 5: C N G A H E K Q M F W L T Z D P R X Y S. Order 5 means that a node can have at most 5 children and 4 keys. All nodes other than the root must have at least 2 keys. The first 4 letters are inserted into the same node, which then contains A C G N.

When we try to insert H, we find that there is no room in this node, so we split it into 2 nodes, moving the median item G up into a new root node. Note that in practice we just leave A and C in the current node and place H and N into a new node to the right of the old one.

Inserting E, K, and Q proceeds without requiring any splits, giving the leaves A C E and H K N Q.

Inserting M requires a split. M happens to be the median key, so it is the one moved up into the parent node.

The letters F, W, L, and T are then added without needing any split.

When Z is added, the rightmost leaf node must be split. The median item T is moved up into the parent node. Note that by moving up the median key, the tree is kept fairly balanced, with 2 keys in each of the resulting nodes.

The insertion of D causes the leftmost leaf to be split. D happens to be the median key, so it is the one moved up into the parent node. The letters P, R, X, and Y are then added without any splitting.

Finally, when S is inserted, the node containing N, P, Q, and R splits, sending the median Q up to the parent node. However, the parent node is full, so it splits as well, sending the median M up to form a new root node. Note how the 3 pointers from the old parent node stay in the revised node that contains D and G.

Deleting an item

In the B-tree as we left it at the end of the last section, delete H. Of course, we first search to find H. Since H is in a leaf and the leaf has more than the minimum number of keys, this is easy. We move K over to where H had been and L over to where K had been.

Next, delete T. Since T is not in a leaf, we find its successor (the next item in ascending order), which happens to be W, and move W up to replace T. That way, what we really have to do is delete W from the leaf, which we already know how to do, since this leaf has extra keys. In all cases we reduce deletion to a deletion in a leaf by using this method.

Next, delete R. Although R is in a leaf, this leaf does not have an extra key; the deletion leaves a node with only one key, which is not acceptable for a B-tree of order 5. If the sibling node to the immediate left or right has an extra key, we can borrow a key from the parent and move a key up from that sibling. In our specific case, the sibling to the right has an extra key. So W, the successor of S (the last key in the node where the deletion occurred), is moved down from the parent, and X is moved up. (Of course, S is moved over so that W can be inserted in its proper place.)
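The borrowing step for a leaf can be sketched as a rotation through the parent's separator key. This is an illustration under stated assumptions: the function `borrow_from_right` is hypothetical, each leaf's keys are kept as a short NUL-terminated string, and the left buffer is assumed to have room for one more key.

```c
#include <assert.h>
#include <string.h>

/* left has too few keys, right has a spare one, and *sep is the parent
   key that separates the two leaves. */
void borrow_from_right(char *left, char *sep, char *right)
{
    size_t n = strlen(left);
    size_t r = strlen(right);
    assert(r >= 1);                 /* the sibling must have a key to give */

    left[n] = *sep;                 /* separator moves down to the end of left */
    left[n + 1] = '\0';
    *sep = right[0];                /* smallest key of right moves up */
    memmove(right, right + 1, r);   /* shift the remaining keys and the NUL */
}
```

With the leaf S, the separator W, and the right sibling X Y Z from the walkthrough above, the result is the leaf S W, the separator X, and the sibling Y Z.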

Finally, let's delete E. This one causes a lot of problems. Although E is in a leaf, the leaf has no extra keys, and neither do the siblings to its immediate left or right. In such a case the leaf has to be combined with one of these two siblings. This includes moving down the parent's key that was between these two leaves. In our example, let's combine the leaf containing F with the leaf containing A C. We also move down D.

Of course, you immediately see that the parent node now contains only one key, G, which is not acceptable. If this problem node had a sibling to its immediate left or right with a spare key, we would again "borrow" a key. Suppose for the moment that the right sibling (the node with Q X) had one more key somewhere to the right of Q. We would then move M down to the node with too few keys and move Q up to where M had been. However, the old left subtree of Q would then have to become the right subtree of M; in other words, the N P node would be attached via the pointer field to the right of M's new position. Since in our example we have no way to borrow a key from a sibling, we must once again combine with the sibling and move M down from the parent. In this case, the height of the tree shrinks by one.
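Combining two nodes amounts to concatenating the left node, the parent's separator key, and the right node. The sketch below is hypothetical (`merge_nodes` is an illustrative name) and handles keys only; for internal nodes the child pointers would have to be concatenated in the same way.

```c
#include <assert.h>
#include <string.h>

/* Appends the parent's separator key and all keys of right to left.
   Keys are kept as short NUL-terminated strings for illustration, and
   left must have room for the combined keys plus the terminator. */
void merge_nodes(char *left, char sep, const char *right)
{
    size_t n = strlen(left);
    assert(n + 1 + strlen(right) <= 4);   /* order 5: at most 4 keys per node */
    left[n] = sep;                        /* separator comes down from the parent */
    strcpy(left + n + 1, right);          /* then the sibling's keys follow */
}
```

Merging A C, D, and F gives the leaf A C D F; merging G, M, and Q X gives G M Q X, which becomes the new root in the example above.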

Another example

Here is a different B-tree of order 5. Let's try to delete C from it.

We begin by finding the immediate successor, which is D, and move D up to replace C. However, this leaves a node with too few keys.

Since neither the sibling to the left nor the sibling to the right of the node containing E has an extra key, we must combine the node with one of these two siblings. Let's consolidate with the A B node.

But now the node containing F does not have enough keys. However, its sibling has an extra key. Therefore, we borrow M from the sibling, move it up to the parent, and bring J down to join F. Note that the K L node is reattached to the right of J.

Another example of B-tree

The following is a fully coded example. There are two programs: Btmake creates a B-tree-based table, and Btread lets the user look up items in the table (reading from the B-tree). In this example, each key is a word and the associated data is the definition of the word. The coding details are fairly complex.

Itemtype.h

Table.h (set up an abstract base class table)

Btree.h (derived B-Tree table Class)

Btree.cpp

Btree.txt (text file used for data)

Btmake.cpp (Creating a B-tree table)

Btread.cpp (reading data from the B-Tree table)

In btree.h you will find that we are building a B-tree of order 12. Each node of the B-tree is a record that contains an array of keys (actually keys and their associated data), an array of child node pointers (record numbers), and a count of how many keys are actually stored in the node. The count is needed because a node may not be completely filled with keys.

In the same file you will find the declaration of the Bttableclass class. Note that four data fields are listed: the record number of the root node, a count of how many nodes are in the B-tree, the number of bytes per node (needed whenever we do node input/output), and a field that holds the entire current node we are working with. This last field gives us a convenient place to put the data for the node we are working on at any given point. Keep in mind that objects of this class also inherit three data fields from the abstract base class: the file stream for the table, the number of items in the table (each consisting of a word and its definition), and an indicator of whether the table was opened in read or write mode. Many of these items, and how they are laid out in an associated data file, are shown in a separate drawing. If you prefer, use the enlarged version of the drawing.

Note how the definition of debug can be commented out in btree.h. If you want to run the debugging code that is already included, make sure the definition is not commented out. Further down in the same file, look at the functions dumps, check and checksubtree. In Btmake.cpp, note how the load function checks whether debug is defined and, if so, executes some debugging code. btree.cpp also contains places where debugging code is compiled conditionally, depending on whether debug is defined. There, the debugging code prints a letter at key points in the program to indicate which operation is being performed, and at the end of the program it prints a dump of the entire B-tree. The letters and their meanings are as follows:

R - performing a read from the file

W - performing a write to the file

P - performing a push-down

I - performing an insert

S - performing a split

No attempt is made here to explain all the details of this example program, as it is fairly long and complex. A detailed explanation of the B-tree's search function is available to illustrate some of the program's operations. Readers who wish to can examine the functions more closely. Most of the functions are fairly simple, with the exception of the push-down function, which is quite complicated.

Variations

According to Shaffer (see references), B-trees are often used to provide fast access to data in large commercial databases. In fact, he describes them as the standard file organization for applications that require insertion, deletion, and key range searches. The variant called the B+ tree is the usual one. Another variant is the B* tree, which is very similar to the B+ tree but tries to keep the nodes at least about two-thirds full.

In a B+ tree, data records are stored only in the leaves. Internal nodes store only keys. These keys are used to direct a search to the proper leaf. If a target key is less than a key in an internal node, the pointer just to the left of that key is followed. If a target key is greater than or equal to the key in the internal node, the pointer just to its right is followed. The leaves are also linked together, so that all the keys in the B+ tree can be traversed in ascending order simply by going through the nodes on the bottom level of the tree by way of this linked list.
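The routing rule for internal nodes (strictly less goes left, greater than or equal goes right) can be sketched as follows. The function name `bplus_child_index` and the use of int keys are assumptions made for illustration.

```c
#include <assert.h>

/* Returns the index of the child pointer to follow in a B+ tree internal
   node whose count keys are sorted in ascending order. Child i lies to the
   left of keys[i]; child count lies to the right of the last key. */
int bplus_child_index(const int *keys, int count, int target)
{
    assert(count >= 0);
    int i = 0;
    /* a target equal to keys[i] goes right, so keep advancing on <= */
    while (i < count && keys[i] <= target)
        i++;
    return i;
}
```

The `<=` in the loop is the design point: a target equal to an internal key is sent to the right, because in a B+ tree the record itself lives in a leaf rather than in the internal node.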

When a B+ tree is implemented on disk, the leaves are likely to contain (key, pointer) pairs, where the pointer field points to the record of data associated with the key. This allows the data file to exist separately from the B+ tree, which functions as an "index" that gives an ordering to the data in the data file. This is how B+ trees are used in databases. Of course, the pointer is a record number, the typical fake pointer we use when we create dynamic data structures on disk. Note that this B+ tree index scheme allows one data file to have several such indexes, each giving an ordering by a different key field.

As an example, consider a B+ tree of order 200, whose leaves can each contain up to 199 keys (about 200). Let's assume that the root node has at least 100 child nodes (although we know it is allowed to have as few as 2). A 2-level B+ tree that meets these assumptions can store about 10,000 records, since there are at least 100 leaf nodes, each containing at least 99 keys (about 100). A 3-level B+ tree of this type can store about 1 million keys, and a 4-level B+ tree can store about 100 million keys. To improve data access speed, the root node is usually kept in main memory. It may even be possible to hold the child nodes of the root in main memory. Thus, a key can be found among millions of keys with only 2 or 3 disk reads. If, as is common, the associated data records are stored in a separate file, one additional read is needed to fetch the data associated with the key. Also note that if the root node has fewer than the 100 child nodes we assumed, these capacity estimates shrink accordingly.
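The capacity estimates above are repeated multiplication. The following rough sketch assumes that every internal node has the same number of children and every leaf holds the same number of keys; the function name `bplus_capacity` is hypothetical.

```c
#include <assert.h>

/* Rough capacity of a B+ tree with the given number of levels, assuming
   every internal node has `fanout` children and every leaf holds
   `keys_per_leaf` keys. */
long long bplus_capacity(int levels, long long fanout, long long keys_per_leaf)
{
    assert(levels >= 1);
    long long leaves = 1;
    for (int level = 1; level < levels; level++)
        leaves *= fanout;           /* each extra level multiplies the leaf count */
    return leaves * keys_per_leaf;
}
```

With a fanout of 100 and about 100 keys per leaf, 2, 3, and 4 levels give 10,000, 1,000,000, and 100,000,000 keys respectively, matching the rounded figures in the text.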

Closing thoughts

The B-tree algorithm is very practical, but this is my first attempt at translating material on it. The subject felt quite deep and some passages may be unclear, so discussion is welcome.

Oracle indexes make our operations simple and fast; I believe we have all experienced this firsthand.

Thinking deeply about algorithms makes people smarter. Sharing this with you!

Reference:

Cis-btree

B-Tree Index Learning Summary

Elementary introduction to algorithm and data structure (10): Balance Tree B Tree

Introduction to the structure of B-tree indexes and bitmap indexes

CSDN Blog B-tree Index

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
