Everyone Is a DBA (VII): B-Tree and B+ Tree

A B-tree is a balanced search tree designed for secondary storage devices such as disks. It supports search, sequential access, insertion, and deletion in O(log n) time. Because B-trees and their variants are good at reducing the number of disk I/O operations, they are widely used in file systems and databases.

    • Node relationships in B-trees
    • Definition of B-tree
    • Operation of the B-tree
    • Variants of the B-tree
    • Advantages of the B+ tree
    • B+ tree C# code implementation

In 1972, Rudolf Bayer and Ed McCreight, then working at Boeing Research Labs, invented the B-tree. They never explained what the "B" stands for, so people have guessed "Balanced", "Broad", "Bushy", "Boeing", "Bayer", and so on; "Balanced" is the most common reading, since the tree is balanced. Asked about the origin of the "B" at the CPM 2013 conference, Ed McCreight answered: "The more you think about what the B in B-trees means, the better you understand B-trees."

The figure shows a B-tree whose keys are English letters; the lightly shaded nodes are the ones examined when searching for the letter R.

Node relationships in B-trees

The nodes of a B-tree are divided into internal nodes and leaf nodes; the internal nodes are the non-leaf nodes.

An internal node of a B-tree can have more than 2 child nodes, so the range of allowed child counts, an upper bound and a lower bound, is fixed at design time. Inserting data into a node or deleting data from it changes its number of child nodes, and to stay within the predetermined range, internal nodes may be merged (joined) or split. Because the number of child nodes can vary within a range, a B-tree does not need to rebalance as often as other self-balancing search trees. On the other hand, nodes are not always full, so some space is wasted.

Each internal node of a B-tree contains a number of keys, and these keys also act as separators that divide its subtrees. For example, an internal node with 3 child nodes must have exactly 2 keys, a1 and a2: all values in the leftmost subtree are less than a1, all values in the middle subtree are between a1 and a2, and all values in the rightmost subtree are greater than a2.
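
As a minimal sketch of how the keys of an internal node divide its subtrees, the following C# snippet picks the child to descend into for a given lookup value. The names `ChildSelection` and `ChildIndex` are illustrative only, and integer keys are assumed for simplicity.

```csharp
using System;

static class ChildSelection
{
    // Returns the index of the child subtree that may contain 'key', given the
    // sorted separator keys of an internal node. Child i holds the values that
    // lie between keys[i - 1] and keys[i]. A key equal to a separator is stored
    // in the node itself; callers are assumed to check for that case first.
    public static int ChildIndex(int[] keys, int key)
    {
        int i = 0;
        while (i < keys.Length && key > keys[i])
        {
            i++;
        }
        return i;
    }

    static void Main()
    {
        int[] keys = { 10, 20 };                  // a1 = 10, a2 = 20, so 3 children
        Console.WriteLine(ChildIndex(keys, 5));   // 0: left of a1
        Console.WriteLine(ChildIndex(keys, 15));  // 1: between a1 and a2
        Console.WriteLine(ChildIndex(keys, 25));  // 2: right of a2
    }
}
```

A linear scan is used here for clarity; with many keys per node, a binary search over the keys would be the usual choice.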

Typically, the number of keys is chosen to vary between d and 2d, where d is the minimum number of keys a node may contain. It follows that d + 1 is the minimum number of child nodes of an internal node, that is, the minimum degree of the tree. The factor of 2 guarantees that nodes can always be split or merged.

If an internal node has 2d keys, adding one more key gives it 2d + 1 keys, which exceeds the upper bound. The node is then split into two nodes of d keys each, and the remaining key is promoted to the parent node.

Similarly, if an internal node and its neighbor each have d keys, deleting a key from the node leaves it with d − 1 keys, which is below the lower bound, so it is merged with its neighbor. The merged node contains the node's d − 1 keys, the neighbor's d keys, and 1 key taken from their common parent, for a total of (d − 1) + d + 1 = 2d keys.
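
The key counts in the split and merge cases above can be checked with a small sketch. It works on plain sorted lists of integer keys and ignores child pointers; `SplitMergeDemo`, `Split`, and `Merge` are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;

static class SplitMergeDemo
{
    // Splits an overfull, sorted list of 2d + 1 keys into a left node (d keys),
    // one key promoted to the parent, and a right node (d keys).
    public static (List<int> Left, int Promoted, List<int> Right) Split(List<int> keys)
    {
        int d = keys.Count / 2;
        return (keys.GetRange(0, d), keys[d], keys.GetRange(d + 1, keys.Count - d - 1));
    }

    // Merges an underfull node (d - 1 keys) with its neighbor (d keys) and the
    // separator key taken from their parent, giving one node with 2d keys.
    public static List<int> Merge(List<int> left, int separator, List<int> right)
    {
        var merged = new List<int>(left);
        merged.Add(separator);
        merged.AddRange(right);
        return merged;
    }

    static void Main()
    {
        // d = 2: a node holding 2d + 1 = 5 keys must be split.
        var (left, promoted, right) = Split(new List<int> { 1, 2, 3, 4, 5 });
        Console.WriteLine($"{string.Join(",", left)} | {promoted} | {string.Join(",", right)}");
        // prints "1,2 | 3 | 4,5"

        // d = 2: a node with d - 1 = 1 key, a neighbor with d = 2 keys, separator 3.
        List<int> merged = Merge(new List<int> { 1 }, 3, new List<int> { 4, 5 });
        Console.WriteLine(string.Join(",", merged)); // prints "1,3,4,5"
    }
}
```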

Depth describes the number of levels in the tree. A B-tree stays balanced by requiring all leaf nodes to be at the same depth. The depth grows slowly as keys are added.

Definition of B-tree

Some terms in the definition of a B-tree are used inconsistently in the literature and often cause confusion, the order in particular. Donald Knuth (1998) defines the order as the maximum number of child nodes a node can have.

Using this definition of order, a B-tree of order m must satisfy the following conditions:

    1. Every node has at most m child nodes.
    2. Every non-leaf node other than the root has at least ⌈m/2⌉ child nodes.
    3. If the root node is not a leaf, it has at least 2 child nodes.
    4. A non-leaf node with k child nodes contains k − 1 keys.
    5. All leaf nodes are on the same level.
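
To make the five conditions concrete, here is a minimal sketch of a checker for an in-memory tree. The `Node` class and the `BTreeCheck.IsValid` method are assumptions made for this example, not part of any library, and the check covers only the structural conditions listed above (it does not verify key ordering).

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public List<int> Keys = new List<int>();
    public List<Node> Children = new List<Node>();
    public bool IsLeaf => Children.Count == 0;
}

static class BTreeCheck
{
    // Returns true if the tree rooted at 'root' meets the five conditions
    // for a B-tree of order m.
    public static bool IsValid(Node root, int m)
    {
        int leafDepth = -1; // depth of the first leaf seen; all others must match
        return Check(root, m, true, 0, ref leafDepth);
    }

    static bool Check(Node node, int m, bool isRoot, int depth, ref int leafDepth)
    {
        if (node.IsLeaf)
        {
            if (leafDepth == -1) leafDepth = depth;
            return depth == leafDepth;                     // condition 5
        }
        int k = node.Children.Count;
        if (k > m) return false;                           // condition 1
        if (!isRoot && k < (m + 1) / 2) return false;      // condition 2: ceil(m / 2)
        if (isRoot && k < 2) return false;                 // condition 3
        if (node.Keys.Count != k - 1) return false;        // condition 4
        foreach (Node child in node.Children)
        {
            if (!Check(child, m, false, depth + 1, ref leafDepth)) return false;
        }
        return true;
    }

    static void Main()
    {
        var good = new Node
        {
            Keys = { 10 },
            Children = { new Node { Keys = { 5 } }, new Node { Keys = { 15 } } },
        };
        Console.WriteLine(IsValid(good, 3)); // True

        // Leaves at different depths violate condition 5.
        var uneven = new Node
        {
            Keys = { 10 },
            Children =
            {
                new Node { Keys = { 5 } },
                new Node
                {
                    Keys = { 20 },
                    Children = { new Node { Keys = { 15 } }, new Node { Keys = { 25 } } },
                },
            },
        };
        Console.WriteLine(IsValid(uneven, 3)); // False
    }
}
```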

The following is a B-tree of height 3.

The number of disk accesses required for most operations on the B-tree is proportional to the height of the B-tree.

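Let h be the height of the B-tree, counted in levels, and let n > 0 be the number of keys in the whole tree. Let m be the maximum number of child nodes an internal node can have, so each node holds at most m − 1 keys; a tree whose nodes are all completely full then has n = m^h − 1 keys. Let d be the minimum number of child nodes a non-root internal node can have (the minimum degree), so d = ⌈m/2⌉. Note that d counts child nodes here, whereas the d used earlier for the range d to 2d counts keys.

In the best case, when every node is full, the height of the B-tree is:

    h_min = ⌈log_m(n + 1)⌉

In the worst case, when every node holds as few keys as the definition allows, a tree of height h contains at least 2d^(h−1) − 1 keys, so the height of the B-tree is at most:

    h_max = ⌊log_d((n + 1)/2)⌋ + 1

This bound is often quoted as ⌊log_d((n + 1)/2)⌋, which counts the root as height 0.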

Choosing a suitable minimum degree d greatly reduces the height of the tree, and with it the number of disk accesses needed to find any key. The height of a B-tree grows as O(log n), just like that of a self-balancing binary tree, but the base of the logarithm for a B-tree can be many times larger. For most tree operations, a B-tree therefore examines fewer nodes than a self-balancing binary tree, by a factor of roughly lg d. Because examining a node in the tree usually requires one disk access, the number of disk accesses drops substantially.
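
The effect of the branching factor on height can be illustrated with a small calculation. The sketch below assumes that height is counted in levels (a lone root has height 1), that a completely full tree of height h holds m^h − 1 keys, and that the sparsest valid tree of height h holds 2d^(h−1) − 1 keys with d = ⌈m/2⌉. The class and method names, and the sample values n = 50,000,000 and m = 100, are chosen only for illustration.

```csharp
using System;

static class BTreeHeight
{
    // Smallest number of levels h for which a completely full tree,
    // holding m^h - 1 keys, has room for n keys.
    public static int MinHeight(long n, int m)
    {
        if (m < 2) throw new ArgumentException("m must be at least 2", nameof(m));
        int h = 0;
        long capacity = 0;                 // m^h - 1 for the current h
        while (capacity < n)
        {
            capacity = (capacity + 1) * m - 1;
            h++;
        }
        return h;
    }

    // Largest number of levels h for which the sparsest valid tree,
    // holding 2 * d^(h - 1) - 1 keys, needs no more than n keys (n >= 1).
    public static int MaxHeight(long n, int m)
    {
        if (m < 3) throw new ArgumentException("m must be at least 3", nameof(m));
        long d = (m + 1) / 2;              // ceil(m / 2): minimum number of children
        int h = 1;
        long minKeys = 1;                  // 2 * d^(h - 1) - 1 for the current h
        while ((minKeys + 1) * d - 1 <= n)
        {
            minKeys = (minKeys + 1) * d - 1;
            h++;
        }
        return h;
    }

    static void Main()
    {
        long n = 50_000_000;
        Console.WriteLine(MinHeight(n, 100)); // 4
        Console.WriteLine(MaxHeight(n, 100)); // 5
        Console.WriteLine(MinHeight(n, 2));   // 26: a perfectly balanced binary tree
    }
}
```

Integer loops are used instead of floating-point logarithms so that exact powers of m or d are not misrounded.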

Operation of the B-tree

Query operations

Searching for a key in a B-tree is similar to searching in a binary search tree. The search starts at the root node and descends recursively from top to bottom. At each level, the search key is compared with the keys in the current node to decide which subtree to descend into.
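
The top-down search can be sketched as follows for an in-memory tree with integer keys. The `Node` class and `BTreeSearch.Contains` are illustrative names assumed for this example; a disk-based implementation would read each visited node from storage instead of following object references.

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public List<int> Keys = new List<int>();
    public List<Node> Children = new List<Node>();
    public bool IsLeaf => Children.Count == 0;
}

static class BTreeSearch
{
    // Returns true if 'key' is stored in the subtree rooted at 'node'.
    public static bool Contains(Node node, int key)
    {
        // Find the first key in this node that is not smaller than the search key.
        int i = 0;
        while (i < node.Keys.Count && key > node.Keys[i]) i++;

        if (i < node.Keys.Count && node.Keys[i] == key) return true; // found here
        if (node.IsLeaf) return false;                               // nowhere left to look
        return Contains(node.Children[i], key);                      // descend one level
    }

    static void Main()
    {
        var root = new Node
        {
            Keys = { 10, 20 },
            Children =
            {
                new Node { Keys = { 3, 7 } },
                new Node { Keys = { 12, 17 } },
                new Node { Keys = { 25, 30 } },
            },
        };
        Console.WriteLine(Contains(root, 17)); // True
        Console.WriteLine(Contains(root, 20)); // True
        Console.WriteLine(Contains(root, 8));  // False
    }
}
```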

For searching in binary trees, see the following articles:

    • "Binary search Tree"
    • "Self-balancing binary search tree"
    • "Balancing The Search Tree" (2-3-4 tree)

Insert operation

To insert a new key, first locate the leaf node in which the key belongs:

    • If the number of keys in the leaf node is below the upper bound, insert the new key directly, keeping the keys in the node in order.
    • Otherwise the node is full and is split into 2 nodes:
      1. Choose the median of the node's keys and the new key as the separator;
      2. Keys less than the median go into the new left node, and keys greater than the median go into the new right node;
      3. Insert the median into the parent node. This may fill the parent node, which is then split in the same way. If there is no parent node, because the node being split is the root, a new root is created and the tree grows in height.
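
The insertion steps above can be sketched as a small in-memory implementation. This is one possible arrangement, not the only one: the key is first placed in the node, and the node is split afterwards if it exceeds the upper bound. `BTree`, `InsertInto`, `InsertDemo`, and `maxKeys` are names invented for this example, keys are plain integers, and duplicate keys are not treated specially.

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public List<int> Keys = new List<int>();
    public List<Node> Children = new List<Node>();
    public bool IsLeaf => Children.Count == 0;
}

class BTree
{
    readonly int maxKeys;                 // upper bound on the keys in one node
    public Node Root = new Node();

    public BTree(int maxKeys) { this.maxKeys = maxKeys; }

    public void Insert(int key)
    {
        var split = InsertInto(Root, key);
        if (split != null)
        {
            // The root itself was split: create a new root, so the tree grows in height.
            var newRoot = new Node();
            newRoot.Keys.Add(split.Value.Median);
            newRoot.Children.Add(Root);
            newRoot.Children.Add(split.Value.Right);
            Root = newRoot;
        }
    }

    // Inserts 'key' below 'node'. If the node overflows, it keeps the smaller keys
    // and the median plus the new right node are returned to the caller.
    (int Median, Node Right)? InsertInto(Node node, int key)
    {
        int i = 0;
        while (i < node.Keys.Count && key > node.Keys[i]) i++;

        if (node.IsLeaf)
        {
            node.Keys.Insert(i, key);     // keep the keys in order
        }
        else
        {
            var childSplit = InsertInto(node.Children[i], key);
            if (childSplit != null)
            {
                // The child was split: take its median as a new separator.
                node.Keys.Insert(i, childSplit.Value.Median);
                node.Children.Insert(i + 1, childSplit.Value.Right);
            }
        }
        if (node.Keys.Count <= maxKeys) return null;

        // Overflow: split around the median key.
        int mid = node.Keys.Count / 2;
        int median = node.Keys[mid];
        var right = new Node();
        right.Keys.AddRange(node.Keys.GetRange(mid + 1, node.Keys.Count - mid - 1));
        node.Keys.RemoveRange(mid, node.Keys.Count - mid);
        if (!node.IsLeaf)
        {
            int moved = node.Children.Count - mid - 1;
            right.Children.AddRange(node.Children.GetRange(mid + 1, moved));
            node.Children.RemoveRange(mid + 1, moved);
        }
        return (median, right);
    }
}

static class InsertDemo
{
    // Renders a node as "[keys]" followed by its children in parentheses.
    public static string Describe(Node n)
    {
        string keys = "[" + string.Join(",", n.Keys) + "]";
        if (n.IsLeaf) return keys;
        return keys + "(" + string.Join(" ", n.Children.ConvertAll(c => Describe(c))) + ")";
    }

    static void Main()
    {
        var tree = new BTree(2);          // at most 2 keys per node
        for (int key = 1; key <= 7; key++) tree.Insert(key);
        Console.WriteLine(Describe(tree.Root)); // prints "[4]([2]([1] [3]) [6]([5] [7]))"
    }
}
```

Inserting 7 triggers a split of the leaf and then of the root, which is the case where the tree gains a level.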

Delete operation

Deletion in a B-tree can follow several strategies. The common locate-and-delete strategy is described here: locate the key, remove it, and then restructure the tree to restore balance, where balance means that the properties of a B-tree still hold.

    1. Search for the key to be deleted.
    2. If the key is in a leaf node, delete it directly.
    3. If the key is in an internal node, it is acting as a separator between two child nodes, so a replacement key is needed to keep separating them after the deletion. The replacement can be the largest key in the left subtree or the smallest key in the right subtree. Remove that key from its node and put it in the place of the deleted key.
    4. If a node is left with fewer than the minimum number of keys after the removal, the tree must be rebalanced; rebalancing operations include rotation and merging (join).
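
The four steps can be sketched as follows for an in-memory tree with integer keys. This is an illustrative reading of the strategy, assuming the replacement key is always taken from the left subtree and that rotation is tried before merging; `BTreeDelete`, `Remove`, `Rebalance`, and `minKeys` are names made up for the example.

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public List<int> Keys = new List<int>();
    public List<Node> Children = new List<Node>();
    public bool IsLeaf => Children.Count == 0;
}

static class BTreeDelete
{
    // Deletes 'key' and returns the root, which changes if the old root becomes empty.
    public static Node Delete(Node root, int key, int minKeys)
    {
        Remove(root, key, minKeys);
        return (root.Keys.Count == 0 && !root.IsLeaf) ? root.Children[0] : root;
    }

    static void Remove(Node node, int key, int minKeys)
    {
        int i = 0;
        while (i < node.Keys.Count && key > node.Keys[i]) i++;        // step 1: locate
        bool found = i < node.Keys.Count && node.Keys[i] == key;
        if (node.IsLeaf)
        {
            if (found) node.Keys.RemoveAt(i);                         // step 2
            return;
        }
        if (found)
        {
            // Step 3: replace the key with the largest key of the left subtree,
            // then delete that replacement key from the subtree instead.
            Node cur = node.Children[i];
            while (!cur.IsLeaf) cur = cur.Children[cur.Children.Count - 1];
            key = cur.Keys[cur.Keys.Count - 1];
            node.Keys[i] = key;
        }
        Remove(node.Children[i], key, minKeys);
        if (node.Children[i].Keys.Count < minKeys) Rebalance(node, i, minKeys); // step 4
    }

    static void Rebalance(Node parent, int i, int minKeys)
    {
        Node child = parent.Children[i];
        Node left = i > 0 ? parent.Children[i - 1] : null;
        Node right = i < parent.Children.Count - 1 ? parent.Children[i + 1] : null;
        if (left != null && left.Keys.Count > minKeys)
        {
            // Rotate right: the separator moves down, the left sibling's largest key moves up.
            child.Keys.Insert(0, parent.Keys[i - 1]);
            parent.Keys[i - 1] = left.Keys[left.Keys.Count - 1];
            left.Keys.RemoveAt(left.Keys.Count - 1);
            if (!left.IsLeaf)
            {
                child.Children.Insert(0, left.Children[left.Children.Count - 1]);
                left.Children.RemoveAt(left.Children.Count - 1);
            }
        }
        else if (right != null && right.Keys.Count > minKeys)
        {
            // Rotate left: the separator moves down, the right sibling's smallest key moves up.
            child.Keys.Add(parent.Keys[i]);
            parent.Keys[i] = right.Keys[0];
            right.Keys.RemoveAt(0);
            if (!right.IsLeaf)
            {
                child.Children.Add(right.Children[0]);
                right.Children.RemoveAt(0);
            }
        }
        else
        {
            // Merge (join) the child with a sibling, pulling the separator down.
            int li = left != null ? i - 1 : i;
            Node a = parent.Children[li], b = parent.Children[li + 1];
            a.Keys.Add(parent.Keys[li]);
            a.Keys.AddRange(b.Keys);
            a.Children.AddRange(b.Children);
            parent.Keys.RemoveAt(li);
            parent.Children.RemoveAt(li + 1);
        }
    }

    // Renders a node as "[keys]" followed by its children in parentheses.
    public static string Describe(Node n)
    {
        string keys = "[" + string.Join(",", n.Keys) + "]";
        return n.IsLeaf ? keys : keys + "(" + string.Join(" ", n.Children.ConvertAll(c => Describe(c))) + ")";
    }

    static void Main()
    {
        // Order-3 tree: every node holds 1 or 2 keys (minKeys = 1).
        var root = new Node { Keys = { 3, 6 }, Children = {
            new Node { Keys = { 1, 2 } }, new Node { Keys = { 4, 5 } }, new Node { Keys = { 7, 8 } } } };
        foreach (int key in new[] { 4, 5, 3, 6, 7, 8 })
        {
            root = Delete(root, key, 1);
            Console.WriteLine($"delete {key}: {Describe(root)}");
        }
    }
}
```

In this run, deleting 5 and 3 each trigger a rotation, deleting 6 triggers a merge, deleting 7 replaces an internal key with its predecessor, and deleting 8 merges the last two leaves so that the tree shrinks to the single node [1,2].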

Variants of the B-tree

In practice, the term "B-tree" also covers several variants of the B-tree. They share a similar structure, but each has its own characteristics and advantages:

    • B-tree: a key stored in an internal node is not stored again in a leaf node. Internal nodes store not only keys but also the satellite data associated with each key, or a pointer to that data. A B-tree keeps its internal nodes at least 1/2 full.
    • B+ tree: a variant of the B-tree in which the keys stored in internal nodes also appear in the leaf nodes, while internal nodes store no satellite data or pointers to it. Leaf nodes store the keys together with the associated satellite data or pointers. In addition, each leaf node has a pointer to the next leaf node in key order, which speeds up sequential reads.
    • B* tree: another variant of the B-tree, which requires internal nodes other than the root to be at least 2/3 full instead of 1/2. To maintain this, a node that fills up is not split immediately; it first shares keys with an adjacent node, and only when both nodes are full are the 2 nodes split into 3.

Advantages of the B+ tree

The B+ tree is a variant of the B-tree in which the keys stored in internal nodes also appear in the leaf nodes, while internal nodes store no satellite data or pointers to it. Leaf nodes store the keys together with the associated satellite data or pointers. All satellite data therefore lives in the leaf nodes, and internal nodes hold only keys and child pointers, which maximizes the branching factor of the internal nodes.

In addition, each leaf node has a pointer to the next leaf node in key order, which speeds up sequential reads.
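
As a rough illustration of why the leaf links help sequential reads, the sketch below runs a range query by walking the leaf chain, without going back to the internal nodes. The `Leaf` class and `LeafChainScan.Range` are hypothetical names; in a full B+ tree the starting leaf would be found by descending from the root, whereas here the chain is built by hand.

```csharp
using System;
using System.Collections.Generic;

// Leaf node of a B+ tree: sorted keys, their values, and a link to the next leaf.
class Leaf
{
    public List<int> Keys = new List<int>();
    public List<string> Values = new List<string>();
    public Leaf Next;
}

static class LeafChainScan
{
    // Returns the values whose keys lie in [low, high], scanning from 'start'
    // and following the Next pointers from leaf to leaf.
    public static List<string> Range(Leaf start, int low, int high)
    {
        var result = new List<string>();
        for (Leaf leaf = start; leaf != null; leaf = leaf.Next)
        {
            for (int i = 0; i < leaf.Keys.Count; i++)
            {
                if (leaf.Keys[i] > high) return result;   // keys are sorted: stop early
                if (leaf.Keys[i] >= low) result.Add(leaf.Values[i]);
            }
        }
        return result;
    }

    static void Main()
    {
        var c = new Leaf { Keys = { 5, 6 }, Values = { "e", "f" } };
        var b = new Leaf { Keys = { 3, 4 }, Values = { "c", "d" }, Next = c };
        var a = new Leaf { Keys = { 1, 2 }, Values = { "a", "b" }, Next = b };
        Console.WriteLine(string.Join(",", Range(a, 2, 5))); // prints "b,c,d,e"
    }
}
```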

Many common file systems and databases are implemented with B+ trees, for example:

    • File systems: NTFS, ReiserFS, NSS, XFS, JFS, ReFS, BFS, ext4;
    • Relational databases: DB2, Informix, SQL Server, Oracle, Sybase ASE, SQLite;
    • NoSQL databases: CouchDB, Tokyo Cabinet.

The advantages of the B+ tree are:

    • Because internal nodes do not store the satellite data associated with keys, the space saved lets each internal node hold more keys. This also means that each page read from disk yields more key information.
    • The leaf nodes form a linked chain, so a full scan of the tree is just a linear pass over all the leaf nodes.

B+ tree C# code implementation

The BPlusTreePractice project on GitHub implements a simple B+ tree in C#. It supports inserting key-value pairs, searching for keys, deleting keys, and storing the tree in a disk block file, but it does not implement a doubly linked list of leaf nodes or scanning along the leaf chain.

The following is sample test code.

    using System;
    using System.IO;

    namespace BPlusTreePractice
    {
      class Program
      {
        static void Main(string[] args)
        {
          // Specify the location of the disk file
          string treeFileName =
            @"E:\BPlusTree_" + DateTime.Now.ToString(@"yyyyMMddHHmmssffffff") + ".data";
          Stream treeFileStream =
            new FileStream(treeFileName, FileMode.CreateNew, FileAccess.ReadWrite);

          // Initialize the B+ tree: fixed-length strings as keys, mapped to long values
          int keyLength = 64; // placeholder: the original value is garbled in the source
          int nodeCapacity = 2;
          BPlusTree tree =
            BPlusTree.InitializeInStream(treeFileStream, 0L, keyLength, nodeCapacity);

          // Insert 8 key-value pairs with keys 0 to 7
          for (int i = 0; i < 8; i++)
          {
            // The key is a string and the value is a long
            // (the multiplier is a placeholder: the original is garbled in the source)
            tree.Set(i.ToString(), (long)(i * 1000));
          }

          // Print the B+ tree to the console
          Console.WriteLine(tree.ToText());

          // Look up specific keys
          Console.WriteLine();
          Console.WriteLine(string.Format("Tree's first key is {0}.", tree.FirstKey()));
          Console.WriteLine(string.Format("Check key {0} exists {1}.",
            "3", tree.ContainsKey("3")));
          Console.WriteLine(string.Format("{0}'s next key is {1}.", "6", tree.NextKey("6")));
          Console.WriteLine(string.Format("Get key {0} with value {1}.", "2", tree.Get("2")));
          Console.WriteLine(string.Format("Index key {0} with value {1}.", "4", tree["4"]));
          Console.WriteLine();

          // Delete a key-value pair
          tree.RemoveKey("6");
          Console.WriteLine(tree.ToText());

          Console.ReadKey();
        }
      }
    }

This article, "B-Tree and B+ Tree", was written by Dennis Gao and first published on his personal technical blog on Cnblogs (博客园). Reproduction in any form without the author's consent is prohibited, and any reposting or plagiarism, whether by automated crawlers or by hand, is unacceptable.
