A Brief Look at Algorithms and Data Structures (10): Balanced Search Trees, B-Trees

Source: Internet
Author: User
Tags: mysql index

Earlier articles explained the 2-3 tree, a balanced search tree, and its implementation as a red-black tree. In a 2-3 tree a node holds at most 2 keys, while the red-black tree uses node coloring to represent a node with two keys.

Wikipedia defines the B-tree as follows: "In computer science, a B-tree is a tree data structure that stores data in sorted order and allows lookups, sequential reads, insertions, and deletions in O(log n) time. In short, a B-tree generalizes the binary search tree so that a node can have more than 2 child nodes. Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. The B-tree algorithm reduces the number of intermediate steps needed to locate a record, which speeds up access. It is widely used in databases and file systems."


A B-tree can be seen as an extension of the 2-3 search tree: it allows each node to have up to M-1 child nodes. A B-tree of order M has the following properties:

    • The root node has at least two child nodes
    • Each node has at most M-1 keys, sorted in ascending order
    • The keys in the child subtree that lies between two adjacent keys of a node fall between the values of those two keys
    • Every other node has at least M/2 child nodes

[Figure: a B-tree of order M = 4]

You can see that the B-tree is an extension of the 2-3 tree: it allows a node to hold more than 2 elements.
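To make the ordering property concrete, here is a minimal Python sketch of searching a B-tree. The `Node` class, the `search` function, and the hand-built tree are illustrative assumptions, not code from the original article. The sketch also assumes the common layout in which a node with k keys has k+1 children.

```python
from bisect import bisect_left

class Node:
    """One B-tree node: sorted keys plus child links (empty for a leaf)."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def search(node, key):
    """Return True if key is stored anywhere in the subtree rooted at node."""
    i = bisect_left(node.keys, key)           # binary search inside the node
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                      # reached a leaf without a match
        return False
    return search(node.children[i], key)       # descend into the i-th subtree

# Hypothetical tree in which every node holds at most 3 keys.
root = Node([10, 20], [Node([3, 5, 7]), Node([12, 15]), Node([25, 30, 40])])

print(search(root, 15), search(root, 11))
```

Each step picks one child by comparing against the keys of a single node, so the number of nodes visited equals the height of the tree.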

Insertion and rebalancing in a B-tree work much as they do in a 2-3 tree, so they are not covered here. The following keys are inserted into a B-tree in order:

6 10 4 14 5 11 15 3 2 12 1 7 8 8 6 3 6 21 5 15 15 6 32 23 45 65 7 8 6 5 4

[Animation: inserting the keys above into a B-tree]
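As a rough illustration of the insertion process, the sketch below inserts the key sequence above into a small B-tree. It is a simplified sketch under stated assumptions: at most 3 keys per node (`MAX_KEYS`, a name chosen here), a node with k keys has k+1 children, and full nodes are split on the way down. The animation may use a different splitting strategy, so the intermediate tree shapes need not match.

```python
from bisect import bisect_right, insort

MAX_KEYS = 3  # assumption: at most 3 keys per node

class Node:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf

def split_child(parent, i):
    """Split the full child parent.children[i] around its middle key."""
    full = parent.children[i]
    mid = MAX_KEYS // 2
    right = Node(leaf=full.leaf)
    middle_key = full.keys[mid]
    right.keys, full.keys = full.keys[mid + 1:], full.keys[:mid]
    if not full.leaf:
        right.children, full.children = full.children[mid + 1:], full.children[:mid + 1]
    parent.keys.insert(i, middle_key)      # middle key moves up
    parent.children.insert(i + 1, right)   # new sibling goes to its right

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if len(root.keys) == MAX_KEYS:         # grow the tree at the root
        new_root = Node(leaf=False)
        new_root.children.append(root)
        split_child(new_root, 0)
        root = new_root
    node = root
    while not node.leaf:
        i = bisect_right(node.keys, key)
        if len(node.children[i].keys) == MAX_KEYS:
            split_child(node, i)           # never descend into a full node
            if key > node.keys[i]:
                i += 1
        node = node.children[i]
    insort(node.keys, key)                 # leaf has room by construction
    return root

def inorder(node):
    """Yield all keys in ascending order."""
    if node.leaf:
        yield from node.keys
        return
    for child, key in zip(node.children, node.keys):
        yield from inorder(child)
        yield key
    yield from inorder(node.children[-1])

KEYS = [6, 10, 4, 14, 5, 11, 15, 3, 2, 12, 1, 7, 8, 8, 6, 3, 6, 21, 5, 15,
        15, 6, 32, 23, 45, 65, 7, 8, 6, 5, 4]
root = Node()
for k in KEYS:
    root = insert(root, k)
print(list(inorder(root)))
```

Duplicate keys are simply stored again in this sketch. A real index would typically either reject them or attach several records to one key.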

The B+ tree is a variant of the B-tree. It differs from the B-tree in that:

    • A node with k child nodes has exactly k keys;
    • Non-leaf nodes serve only as an index; the information related to the records is stored in the leaf nodes.
    • All leaf nodes of the tree form an ordered linked list, so all records can be traversed in key order.

[Figure: an example B+ tree]

[Animation: insertion into a B+ tree]

The difference between the B-tree and the B+ tree is that the non-leaf nodes of a B+ tree contain only navigation information and no actual values, and all leaf nodes are connected in a linked list, which makes range lookups and traversal easy.

The advantages of B+ trees are:

    • Because the internal nodes of a B+ tree hold no record data, more keys fit into a memory page. The data is stored more compactly and has better spatial locality, so accessing the data associated with the leaves also gets a better cache hit ratio.
    • The leaf nodes of a B+ tree are linked together, so traversing the whole tree takes only a linear pass over the leaf nodes. Because the data is sorted and linked, range lookups are also convenient. A B-tree, by contrast, requires a recursive traversal of every level, and adjacent elements may not be contiguous in memory, so its cache hit ratio is not as good as that of a B+ tree.
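The range lookup over linked leaves can be sketched as follows. This is a hand-built illustration, not a full B+ tree: the `Leaf` and `Internal` classes, `find_leaf`, and `range_scan` are names assumed here, and the sketch uses the common convention in which an internal node with k separator keys has k+1 children.

```python
from bisect import bisect_right

class Leaf:
    """Leaf node: holds the actual key/value pairs and a link to the next leaf."""
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None

class Internal:
    """Internal node: keys are used only for navigation, no values here."""
    def __init__(self, keys, children):
        self.keys, self.children = keys, children

def find_leaf(node, key):
    """Walk down from the root to the leaf that would contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Yield (key, value) pairs with lo <= key <= hi by following leaf links."""
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return
            if k >= lo:
                yield k, v
        leaf = leaf.next                   # no need to go back up the tree

# Hypothetical three-leaf tree.
a, b, c = Leaf([1, 4], ["a", "b"]), Leaf([7, 9], ["c", "d"]), Leaf([12, 15], ["e", "f"])
a.next, b.next = b, c
root = Internal([7, 12], [a, b, c])

print(list(range_scan(root, 4, 12)))
```

Only one root-to-leaf descent is needed. After that the scan moves sideways through the leaf list, which is what makes range queries cheap in a B+ tree.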

The advantage of the B-tree, however, is that because each of its nodes contains both keys and values, frequently accessed elements may sit closer to the root and can therefore be reached more quickly.

[Figure: differences between a B-tree and a B+ tree]


The analysis of B-trees and B+ trees is similar to the analysis of the 2-3 tree explained previously.

For a B-tree of order M with N elements, a search or an insertion requires between log_{M-1}(N) and log_{M/2}(N) comparisons of nodes. This is easy to prove: in a B-tree of order M, the number of child nodes per node is between M/2 and M-1, so the height of the tree is between log_{M-1}(N) and log_{M/2}(N).

This is very efficient. For N = 62 * 1,000,000,000 elements and order M = 1024, log_{M/2}(N) <= 4. That is, among 62 billion elements, a tree of order 1024 needs at most 4 node visits to reach the right node, after which a binary search inside the node finds the value you are looking for.
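The arithmetic above can be checked with a few lines of Python. The variable names are chosen here for illustration.

```python
import math

N = 62 * 1_000_000_000   # number of elements from the example
M = 1024                 # order of the B-tree

lower = math.log(N, M - 1)    # height if every node is completely full
upper = math.log(N, M // 2)   # height if every node is only half full

print(f"between {lower:.2f} and {upper:.2f} node visits")
```

Both bounds fall between 3 and 4, which is where the "at most 4" figure comes from.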


B-trees and B+ trees are widely used in file storage systems and database systems. Before explaining these applications, let us first look at common storage structures.

The main memory of a computer is basically random-access memory (RAM), which comes in two kinds: static random-access memory (SRAM) and dynamic random-access memory (DRAM). SRAM is faster than DRAM but much more expensive, so it is generally used for CPU caches, while DRAM is usually used for main memory. The structure and storage principles of these memories are fairly complex, but essentially they hold information with electrical signals and involve no mechanical operation, so access is very fast. For the details of how they are accessed, see CSAPP. They are also volatile: if power is lost, the information held in DRAM and SRAM is lost.

What we use more for storage is the disk. A disk can hold a large amount of data, from gigabytes to terabytes, but reading it is slow because mechanical operation is involved, and access times are on the order of milliseconds. Reading from DRAM is about 100,000 times faster than reading from disk, and reading from SRAM is about 1,000,000 times faster. Let us look at the structure of a disk.

A disk consists of platters. Each platter has two sides, also called surfaces, which are coated with magnetic material. A rotating spindle in the center of the platter spins it at a fixed rate, typically 5400 revolutions per minute (RPM) or 7200 RPM. A disk contains one or more such platters, sealed inside a container. On a typical disk surface, the surface is made up of a set of concentric circles called tracks, and each track is divided into a set of sectors. Each sector holds an equal number of data bits, usually 512 bytes. Sectors are separated by gaps, which do not store data.

The above is the physical structure of the disk. Now let us look at disk read and write operations.

A disk reads and writes the bits stored on a magnetic surface with a read/write head, which is attached to the end of an actuator arm. By moving the arm back and forth along its radial axis, the drive can position the head over any track. This is called a seek. Once the head is over the track, the platter rotates and each bit on the track passes under the head, which can either sense the value of the bit or modify it. Disk access time is divided into seek time, rotational latency, and transfer time.
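The three components of access time can be put into numbers with a short Python sketch. The 7200 RPM figure comes from the text. The average seek time of 9 ms and the 400 sectors per track are assumptions chosen only for illustration, as is the function name.

```python
def access_time_ms(rpm, avg_seek_ms, sectors_per_track):
    """Estimate the average time to read one sector, in milliseconds."""
    rotation_ms = 60_000 / rpm                   # time for one full revolution
    rotational_latency = rotation_ms / 2         # on average, wait half a turn
    transfer = rotation_ms / sectors_per_track   # one sector passes under the head
    return avg_seek_ms + rotational_latency + transfer

# Assumed drive parameters: 9 ms average seek, 400 sectors per track.
total = access_time_ms(rpm=7200, avg_seek_ms=9.0, sectors_per_track=400)
print(f"{total:.2f} ms")
```

Under these assumptions the seek and the rotational latency dominate, and the transfer of the sector itself is almost free. This is why reading more data sequentially costs little extra once the head is in position.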

Because of the characteristics of the storage medium, a disk is much slower than main memory, and mechanical movement adds further cost. To improve efficiency, disk I/O should therefore be kept to a minimum. For this reason the disk is usually not read strictly on demand. Instead it reads ahead every time: even if only one byte is needed, the disk starts at that position and sequentially reads a certain length of data into memory. The rationale is the well-known principle of locality in computer science:

When a piece of data is used, the data near it is usually used soon afterwards.

The data a program needs while it runs is usually fairly concentrated.

Because sequential disk reads are very efficient (no seek time and very little rotational latency), read-ahead improves I/O efficiency for programs with locality.

The read-ahead length is generally an integer multiple of the page size. A page is the logical block in which the computer manages memory: hardware and operating systems typically divide main memory and disk storage into contiguous blocks of equal size, each called a page (in many operating systems the page size is usually 4 KB), and main memory and disk exchange data in units of pages. When the data a program wants to read is not in main memory, a page fault is triggered. The system sends a read request to the disk, the disk finds the starting position of the data and sequentially reads one or more pages into memory, the page fault handler returns, and the program continues to run.
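The effect of page-granular reads can be illustrated with a small Python sketch. The function name and the 4 KB page size are assumptions for illustration; real read-ahead policies may fetch additional pages beyond the ones computed here.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (4 KB)

def pages_for_read(offset, length, page_size=PAGE_SIZE):
    """Return the page numbers that must be loaded to serve a byte range."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return range(first, last + 1)

# Reading a single byte still loads the whole page that contains it.
one_byte = list(pages_for_read(offset=5000, length=1))
# A 200-byte read that crosses a page boundary loads two pages.
crossing = list(pages_for_read(offset=4000, length=200))
print(one_byte, crossing)
```

A data structure whose nodes line up with page boundaries avoids the second case, where a small read touches two pages.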

The designers of file systems and database systems exploit disk read-ahead by setting the size of a node equal to one page, so that each node can be loaded completely with a single I/O. To achieve this, practical B-tree implementations use the following technique:

Each time a new node is created, the space of one page (512 or 1024) is requested directly, so that a node is physically stored within one page. Since storage allocation is page-aligned, reading a node takes only one I/O. For example, if the order M of the B-tree is set to 1024, then, as in the earlier example, at most 4 lookups are needed to locate a storage position among 62 billion elements.
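One way to picture a node that occupies exactly one page is to serialize it into a fixed-size byte buffer. The sketch below is a hypothetical layout, not the format of any real file system or database: it assumes a 4096-byte page, a 2-byte key count, 8-byte integer keys, and 8-byte child page numbers, with k keys and k+1 child pointers per node.

```python
import struct

PAGE_SIZE = 4096                 # assumed page size in bytes
HEADER = struct.Struct("<H")     # assumed header: 2-byte key count
KEY_SIZE = 8                     # assumed 8-byte signed integer keys
PTR_SIZE = 8                     # assumed 8-byte child page numbers

def max_keys(page_size=PAGE_SIZE):
    """Largest k such that k keys and k+1 child pointers fit in one page."""
    return (page_size - HEADER.size - PTR_SIZE) // (KEY_SIZE + PTR_SIZE)

def pack_node(keys, children):
    """Serialize a node into exactly one zero-padded page."""
    body = (HEADER.pack(len(keys))
            + struct.pack(f"<{len(keys)}q", *keys)
            + struct.pack(f"<{len(children)}Q", *children))
    if len(body) > PAGE_SIZE:
        raise ValueError("node does not fit in one page")
    return body.ljust(PAGE_SIZE, b"\x00")

def unpack_node(page):
    """Read the keys and child page numbers back from a page."""
    (n,) = HEADER.unpack_from(page, 0)
    keys = list(struct.unpack_from(f"<{n}q", page, HEADER.size))
    children = list(struct.unpack_from(f"<{n + 1}Q", page, HEADER.size + n * KEY_SIZE))
    return keys, children

page = pack_node([10, 20], [1, 2, 3])
print(len(page), max_keys(), unpack_node(page))
```

With these assumed sizes a 4 KB page holds a few hundred keys, so the fan-out of the tree follows directly from the page size and the key size.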

In a B+ tree, moreover, the internal nodes store only the keys used for navigation and no actual values. There are therefore relatively few internal nodes, and they can all be read into main memory. The external (leaf) nodes store the keys and values in sorted order and have good spatial locality. B-trees and B+ trees are thus well suited as data structures for file systems.

[Figure: a B-tree used for content storage]

In addition, B/B+ trees are often used as database indexes. I recommend reading Zhang Yang's article "The Data Structures and Algorithm Principles Behind MySQL Indexes", which describes in detail how MySQL indexes use B+ trees.


The previous two articles introduced the 2-3 tree, a balanced search tree, and then the red-black tree. This article introduced the B/B+ tree, which is commonly used in file systems and database systems. By increasing the number of elements stored in each node, it allows contiguous data to be located and accessed quickly, which effectively reduces search time, improves the spatial locality of storage, and thus reduces I/O operations. It is widely used in file systems and databases, for example:

    • Windows: HPFS file system
    • Mac: HFS and HFS+ file systems
    • Databases: Oracle, MySQL, SQL Server, etc.

I hope this article helps you understand the B/B+ tree.
