Introduction
A B-tree is a balanced, multi-way search tree designed for reading data quickly from hard disks by reducing the number of I/O operations.
Most database and file indexes are now stored using B-trees or their variants.
Contents
1: Why the B-tree is highly efficient
2: B-tree structure
3: B-tree disadvantages
One: Why the B-tree is highly efficient
When a large amount of data is stored, it cannot all be loaded into memory at once, so exchanges between memory and external storage cannot be avoided. The fewer of these exchanges there are, the better the performance.
Let's take a look at the following picture:
This is a typical B-tree in which each node holds 1000 keys and therefore has 1001 children. With only 3 levels, the bottom level alone can store 1,002,001,000 keys, so the whole tree holds over one billion.
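The key count quoted above can be checked with a little arithmetic. The sketch below assumes a completely full tree in which every node holds exactly 1000 keys and therefore has 1001 children; `keys_per_level` is a hypothetical helper name used only for illustration.

```python
def keys_per_level(keys_per_node, levels):
    """Number of keys on each level of a completely full tree.

    Assumption: every node holds keys_per_node keys and so has
    keys_per_node + 1 children.
    """
    children = keys_per_node + 1
    # Level 0 is the root (1 node), level d has children**d nodes.
    return [keys_per_node * children ** depth for depth in range(levels)]


per_level = keys_per_level(1000, 3)
print(per_level)       # [1000, 1001000, 1002001000]
print(sum(per_level))  # 1003003000
```

Under these assumptions the third level alone accounts for 1002001000 keys, and the three levels together hold slightly more than one billion.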
Suppose we want to look up the last key:
1: Load the root node from the hard disk and search it. One I/O.
2: Load the second-level node that the pointer in the root node leads to. One I/O.
3: Repeat step 2 for the third level. One I/O.
Only 3 I/O operations were needed to find the data we wanted. This is why the B-tree is so efficient.
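To put the three I/O operations in perspective, the rough sketch below counts how many levels a completely full tree needs in order to hold one billion keys, first with 1000 keys per node and then with a single key per node (an ordinary binary tree). It assumes one disk read per level visited; `levels_needed` is a hypothetical helper, not part of any library.

```python
def levels_needed(total_keys, keys_per_node):
    """Smallest number of levels a completely full tree needs for total_keys.

    Assumption: each node holds keys_per_node keys and has
    keys_per_node + 1 children, and each level costs one disk read.
    """
    children = keys_per_node + 1
    levels = 0
    capacity = 0
    nodes_on_level = 1
    while capacity < total_keys:
        capacity += nodes_on_level * keys_per_node
        nodes_on_level *= children
        levels += 1
    return levels


one_billion = 1_000_000_000
print(levels_needed(one_billion, 1000))  # 3
print(levels_needed(one_billion, 1))     # 30
```

With these assumptions a lookup touches 3 nodes in the wide tree but up to 30 in the binary tree, which is the difference in I/O count the example is meant to show.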
Note: On the hard disk, a B-tree node corresponds to a page within a cylinder, or to a disk block. If the index is kept in memory, it only needs to be loaded from disk once.
The basis of the B-tree's efficiency: the data is kept sorted.
Two: B-tree structure
Let's start with a classic picture:
This diagram shows the structure of a B-tree stored on a hard disk.
Note: The values 17 and 35 in the root node are called keys. In practice, more complex data is often attached to each key.
Now for a rather dry definition:
1: Each non-root node has at least t-1 keys, and each non-root internal node has at least t children. t is called the minimum degree of the tree, with t >= 2.
2: Each node has at most 2t-1 keys, and each internal node has at most 2t children.
3: All leaf nodes have the same depth, which is the height h of the tree.
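The three rules above can be expressed as a small checker. This is only a sketch: the `Node` class and the `check` function are hypothetical names, the keys 17 and 35 in the root come from the picture, and the leaf keys are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty for a leaf


def check(node, t, is_root=True):
    """Return the height of the subtree, or raise ValueError if a rule is broken."""
    n = len(node.keys)
    if n > 2 * t - 1:
        raise ValueError("rule 2 broken: more than 2t-1 keys")
    if not is_root and n < t - 1:
        raise ValueError("rule 1 broken: fewer than t-1 keys")
    if not node.children:
        return 0  # a leaf has height 0
    # An internal node with n keys has n + 1 children, so the key bounds
    # above also bound the child count between t and 2t.
    if len(node.children) != n + 1:
        raise ValueError("an internal node with n keys needs n+1 children")
    heights = {check(child, t, is_root=False) for child in node.children}
    if len(heights) != 1:
        raise ValueError("rule 3 broken: leaves at different depths")
    return heights.pop() + 1


# Root keys 17 and 35 as in the picture; leaf keys are illustrative.
root = Node([17, 35], [Node([8, 12]), Node([26, 30]), Node([65, 87])])
print(check(root, t=2))  # 1
```

With t = 2 every node may hold between 1 and 3 keys, so the example tree passes and its height is 1.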
Implementation
Search: Start at the root node. If the key is found, return it. Otherwise recurse into the appropriate child node. If a leaf node is reached and the key is still not there, report that it was not found.
Insert: Start at the root node. If a node is full, split it, and keep splitting recursively wherever a full node is met. If the node is not full, insert directly.
Delete: Find the node that holds the key and delete the key. If the node no longer satisfies the definition afterwards, merge nodes.
Update: Find the node that holds the key and update its data.
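Search and insert can be sketched as follows. This is a minimal in-memory sketch in the style of the textbook top-down algorithm, not a disk-backed implementation: it assumes distinct keys, splits full nodes on the way down, and leaves out delete. The class and method names are illustrative.

```python
from bisect import bisect_left


class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


class BTree:
    def __init__(self, t=2):
        self.t = t  # minimum degree, t >= 2
        self.root = Node()

    def search(self, key, node=None):
        """Return (node, index) if key is present, else None."""
        if node is None:
            node = self.root
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if node.leaf:
            return None
        return self.search(key, node.children[i])

    def _split_child(self, parent, i):
        """Split the full child parent.children[i] around its median key."""
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)

    def insert(self, key):
        if len(self.root.keys) == 2 * self.t - 1:
            # A full root is split first; this is how the tree grows taller.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        node.keys.insert(bisect_left(node.keys, key), key)


tree = BTree(t=2)
for k in [17, 35, 8, 12, 26, 30, 65, 87, 1, 50]:
    tree.insert(k)
print(tree.search(26) is not None)  # True
print(tree.search(99))              # None
```

Because `search` returns the node and the position of the key, an update can be built on top of it by changing whatever data is attached at that position.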
Three: B-tree disadvantages
As shown above, looking up a single key is very fast. For a range query, however, the B-tree has to be searched from the root node again each time.
Therefore, practical applications often store data in a variant of the B-tree, the B+ tree, in which only the leaf nodes store data and each leaf node points to the next one.
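The benefit of the linked leaves can be shown with a small sketch of a range scan. Only the leaf level is modelled here; `Leaf`, `link`, and `range_scan` are hypothetical names, and the keys and values are made up. In a real tree the starting leaf would be found by one descent from the root, a step this sketch skips by starting at the first leaf.

```python
class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted keys stored in this leaf
        self.values = values  # data attached to each key
        self.next = None      # pointer to the next leaf


def link(leaves):
    """Chain the leaves left to right and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(leaf, low, high):
    """Yield (key, value) pairs with low <= key <= high by following next pointers."""
    while leaf is not None:
        for key, value in zip(leaf.keys, leaf.values):
            if key > high:
                return  # keys are sorted, so nothing further can match
            if key >= low:
                yield key, value
        leaf = leaf.next


first = link([
    Leaf([1, 8, 12], ["a", "b", "c"]),
    Leaf([17, 26, 30], ["d", "e", "f"]),
    Leaf([35, 50, 65], ["g", "h", "i"]),
])
print(list(range_scan(first, 10, 35)))
# [(12, 'c'), (17, 'd'), (26, 'e'), (30, 'f'), (35, 'g')]
```

The scan never goes back to the upper levels: once it is on the right leaf, it only follows `next` pointers until it passes the upper bound.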
Main references
1: Introduction to Algorithms
Algorithms and data structures (1): B-tree