Lightning MDB Source Code Analysis Series (3)

Source: Internet
Author: User

The first two articles in this series described the system architecture and the memory mapping on which the system is built. This article covers the core of LMDB in detail: B+tree operations on external memory. It describes how LMDB uses the B+tree, starting from the basic principles and moving on to the in-memory mode of operation, the external-memory mode of operation, and the related functions in LMDB.

Introduction

Dynamic search trees mainly include the binary search tree, the balanced binary search tree, the red-black tree, and the B-tree family (B-tree, B+-tree, B*-tree). The first three are typical binary search tree structures. Their search time complexity, O(log2 N), depends on the depth of the tree, so reducing the depth naturally improves search efficiency. This matters for large-scale data storage. To find one key among 1 million with a binary tree, about 20 nodes must be visited, and if every node lives on disk that means about 20 disk I/O operations. Given the random-access performance of existing disks, that is unacceptable for large applications. The basic idea is to store multiple keys, or a whole subtree of the binary tree, together in one page and to access them as a unit. Organized this way, with enough keys per page, finding one key among 1 million can take as few as two I/O operations. With further optimization of this idea, the B-tree came into being.
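As a rough check on these figures, the number of levels (and so the worst-case number of node reads) can be computed from the fan-out. This is only a sketch: the function name levels_needed is made up for illustration, and the fan-out of 1,000 keys per page used in the note below is an assumption, not a value taken from LMDB.

```c
#include <stdint.h>

/* Smallest number of levels h such that fanout^h >= nkeys.
 * Hypothetical helper: approximates how many nodes one lookup must visit.
 * fanout must be at least 2, otherwise the loop never ends. */
int levels_needed(uint64_t nkeys, uint64_t fanout) {
    int levels = 0;
    uint64_t reach = 1;   /* number of keys distinguishable with `levels` levels */
    while (reach < nkeys) {
        reach *= fanout;
        levels++;
    }
    return levels;
}
```

Under these assumptions, levels_needed(1000000, 2) gives 20 for a binary tree, and levels_needed(1000000, 1000) gives 2 for a tree whose pages each hold about 1,000 keys.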

The B-tree family can be thought of as multiway balanced search trees. They effectively reduce the height of the tree and support organizing data in external memory.

Definition

A B-tree is also called a balanced multiway search tree. An order-m B-tree (an m-way tree) has the following properties:

(where ceil(x) rounds x up to the nearest integer)

1) Each node in the tree has at most m children;

2) Except for the root node and the leaf nodes, every node has at least ceil(m/2) children;

3) If the root node is not a leaf node, it has at least 2 children (special case: a root with no children, that is, the root is a leaf and the whole tree consists of the root alone);

4) All leaf nodes appear on the same level, and the leaf nodes carry no keyword information (they can be seen as external nodes, or as the nodes reached when a search fails; in fact these nodes do not exist, and the pointers to them are null);

5) Each non-terminal node holds n keywords in the form (n, P0, K1, P1, K2, P2, ..., Kn, Pn), where:

a) Ki (i = 1...n) are the keywords, sorted in ascending order: K(i-1) < Ki.

b) Pi are pointers to the roots of subtrees. All keywords in the subtree pointed to by P(i-1) are less than Ki and greater than K(i-1).

c) The number of keywords n must satisfy ceil(m/2)-1 <= n <= m-1.

Each node in a B-tree can hold a large number of keywords and branches, as the situation requires (of course, a node cannot exceed the size of a disk block, which depends on the disk drive; a block is typically around 1 KB to 4 KB). This is what reduces the depth of the tree. It means that finding an element only requires reading a few nodes from the external disk into memory, so the data being sought is reached quickly.

For simplicity, a small amount of data is used here to build a 3-way tree. Take the root node in the figure as an example: 17 is the name of a file on disk, the red square indicates where the contents of file 17 are stored on the hard disk, and P1 is the pointer to the left subtree of 17.

Its structure can be simply defined as:

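#define MAX_FILE_NUM 2          /* keys per node in the 3-way example */
typedef long file_hard_addr;    /* assumed type for an on-disk address */

typedef struct btnode {
    int file_num;                            /* number of file names (keys) in this node */
    char *file_name[MAX_FILE_NUM];           /* file names (keys) */
    struct btnode *btptr[MAX_FILE_NUM + 1];  /* pointers to child nodes */
    file_hard_addr offset[MAX_FILE_NUM];     /* where each file is stored on the hard disk */
} btnode;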

If each disk block holds exactly one B-tree node (that is, exactly 2 file names), then one btnode represents one disk block, and a subtree pointer holds the address of another disk block.

The process of finding file 29 can be simulated as follows:

(1) Use the root pointer to locate disk block 1, the root of the file directory, and load its contents into memory. [Disk I/O operation 1]

(2) Memory now holds two file names, 17 and 35, and three items that store the addresses of other disk pages. The algorithm finds 17 < 29 < 35, so it selects pointer P2.

(3) Follow P2 to disk block 3 and load its contents into memory. [Disk I/O operation 2]

(4) Memory now holds two file names, 26 and 30, and three items that store the addresses of other disk pages. The algorithm finds 26 < 29 < 30, so it selects pointer P2.

(5) Follow P2 to disk block 8 and load its contents into memory. [Disk I/O operation 3]

(6) Memory now holds two file names, 28 and 29. The algorithm finds file 29 and obtains the disk address where the file is stored.

Analyzing this procedure shows that it takes 3 disk I/O operations and 3 in-memory lookups. Because the file names within a node form an ordered table, the in-memory lookup can use binary search to improve efficiency. The 3 disk I/O operations are the determining factor in the overall lookup efficiency of the B-tree.

By comparison, if the same data were stored on disk as a balanced binary tree, the lookup would take at least 4 and at most 5 disk I/O operations. The more files there are, the fewer disk I/O operations the B-tree needs relative to the balanced binary tree, and the greater its efficiency advantage.
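The lookup steps above can be sketched in C. This is a minimal illustration rather than production code: it uses integer keys instead of file names, in-memory pointers stand in for disk block addresses, and only the three blocks named in the walk-through are built (the other children, which the text does not describe, are left NULL). The names DiskNode, btree_search and demo_find_29 are invented for this sketch.

```c
#include <stddef.h>

#define MAX_KEYS 2   /* 3-way tree: at most 2 keys per node */

typedef struct DiskNode {
    int nkeys;
    int key[MAX_KEYS];
    struct DiskNode *child[MAX_KEYS + 1];   /* stands in for disk block addresses */
} DiskNode;

/* Binary search inside one node: index of the first key >= target. */
static int lower_bound(const DiskNode *n, int target) {
    int lo = 0, hi = n->nkeys;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (n->key[mid] < target) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Returns 1 if target is found, 0 otherwise.
 * *reads counts the nodes visited, i.e. the simulated disk reads. */
int btree_search(const DiskNode *root, int target, int *reads) {
    const DiskNode *n = root;
    *reads = 0;
    while (n != NULL) {
        (*reads)++;                          /* "load the block into memory" */
        int i = lower_bound(n, target);
        if (i < n->nkeys && n->key[i] == target) return 1;
        n = n->child[i];                     /* follow the pointer between the keys */
    }
    return 0;
}

/* Rebuilds the path from the walk-through and searches for 29. */
int demo_find_29(int *reads) {
    DiskNode block8 = {2, {28, 29}, {NULL, NULL, NULL}};
    DiskNode block3 = {2, {26, 30}, {NULL, &block8, NULL}};
    DiskNode block1 = {2, {17, 35}, {NULL, &block3, NULL}};
    return btree_search(&block1, 29, reads);
}
```

Calling demo_find_29 finds the key after visiting three nodes, which corresponds to the three disk I/O operations counted in the text.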

The above is a brief introduction to how the B-tree structure is used to access data on an external disk. Next, the basic B-tree operations, insertion and deletion, are described in detail with another example.

The following is an example of an order-5 B-tree, as shown in the figure:

It satisfies the conditions above: apart from the root node and the leaf nodes, every node has at least ceil(5/2) = 3 children (at least 2 keywords) and at most 5 children (at most 4 keywords). The keywords are uppercase letters, in ascending alphabetical order.

The nodes are defined as follows:

typedef char ItemType;   /* assumed: keywords in this example are single letters */

typedef struct {
    int Count;           /* number of keywords currently in the node */
    ItemType Key[4];     /* array holding the keywords */
    long branch[5];      /* pseudo-pointers (record numbers), convenient for deciding merges and splits */
} NodeType;

Insert operation: To insert an element, first search for it in the B-tree. If it does not exist, the search ends at a leaf node, and the new element is inserted into that leaf. If the leaf has enough space, the elements larger than the new keyword are shifted to the right to make room. If the leaf is full, it is split: half of its keywords go to a new adjacent node on the right, and the middle keyword moves up into the parent node (of course, if the parent is also full, it must be split in turn). When keywords in a node shift right, the associated pointers shift right with them. If the split propagates to the root and the root is full, the root is split as well: its middle keyword moves up into a new root node, which increases the height of the tree by one level.

Let's step through an example. Insert the following letters into an empty order-5 B-tree: C N G A H E K Q M F W L T Z D P R X Y S. Order 5 means a node has at most 5 children and 4 keywords, and every node other than the root has at least 2 keywords. At first the node has enough space, and the first 4 letters go into the same node, as shown:

When we try to insert H, the node has no room, so it is split into 2 nodes and the middle element G moves up into a new root node. In the implementation, A and C stay in the current node, and H and N are placed in the new right neighbor, as shown:

Inserting E, K, and Q requires no split.

Inserting M requires a split. Note that M happens to be the middle keyword, so it moves up into the parent node.

Inserting F, W, L, and T requires no split.

When Z is inserted, the rightmost leaf is full and must be split. The middle element T moves up into the parent node. Note that moving the middle element up keeps the tree balanced, and each of the two nodes produced by the split holds 2 keywords.

Inserting D splits the leftmost leaf. D happens to be the middle element, so it moves up into the parent node. The letters P, R, X, and Y are then inserted without any split.

Finally, when S is inserted, the node containing N, P, Q, R must be split, and the middle element Q moves up into the parent node. But the parent node is already full, so it is split as well, and its middle element M moves up into a newly created root node. Note that the third pointer of the old parent node now belongs to the new left node, the one containing D and G. This completes the insertion example. The delete operation is described next; it has more cases to consider than insertion.
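The insertion walk-through can be reproduced with a small C sketch. It is one possible way to code the procedure, not the implementation from any particular library: each node keeps one spare key slot so that a full node can overflow temporarily and then be split around its middle key. The names Node, insert_rec, btree_insert and build_example are invented here, and error handling is reduced to aborting on allocation failure.

```c
#include <stdlib.h>
#include <string.h>

#define ORDER 5                  /* maximum children per node */
#define MAX_KEYS (ORDER - 1)     /* maximum keys per node */

typedef struct Node {
    int count;                       /* keys currently stored */
    char key[MAX_KEYS + 1];          /* one spare slot for a temporary overflow */
    struct Node *child[ORDER + 1];   /* all NULL in a leaf */
} Node;

static Node *node_new(void) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) abort();          /* sketch only: give up on allocation failure */
    return n;
}

/* Insert k below n. Returns 1 if n was split; then *up holds the middle
 * key that must move to the parent and *right the new right sibling. */
static int insert_rec(Node *n, char k, char *up, Node **right) {
    int i = 0;
    while (i < n->count && k > n->key[i]) i++;
    if (i < n->count && n->key[i] == k) return 0;    /* already present */

    char new_key = k;
    Node *new_child = NULL;
    if (n->child[0] != NULL &&                       /* internal node: descend */
        !insert_rec(n->child[i], k, &new_key, &new_child))
        return 0;                                    /* child absorbed the key */

    /* Shift larger keys (and their pointers) right, then place the key. */
    memmove(&n->key[i + 1], &n->key[i], (size_t)(n->count - i));
    memmove(&n->child[i + 2], &n->child[i + 1],
            (size_t)(n->count - i) * sizeof n->child[0]);
    n->key[i] = new_key;
    n->child[i + 1] = new_child;
    n->count++;
    if (n->count <= MAX_KEYS) return 0;

    /* Overflow: keep the left half, move the right half to a new node,
     * and hand the middle key to the caller. */
    int mid = n->count / 2;
    Node *r = node_new();
    r->count = n->count - mid - 1;
    memcpy(r->key, &n->key[mid + 1], (size_t)r->count);
    memcpy(r->child, &n->child[mid + 1],
           (size_t)(r->count + 1) * sizeof r->child[0]);
    *up = n->key[mid];
    *right = r;
    n->count = mid;
    return 1;
}

/* Insert k and return the (possibly new) root. */
Node *btree_insert(Node *root, char k) {
    char up;
    Node *right;
    if (root == NULL) root = node_new();
    if (insert_rec(root, k, &up, &right)) {          /* root split: tree grows */
        Node *nr = node_new();
        nr->count = 1;
        nr->key[0] = up;
        nr->child[0] = root;
        nr->child[1] = right;
        return nr;
    }
    return root;
}

/* Inserts the letters of the walk-through in order. */
Node *build_example(void) {
    const char *seq = "CNGAHEKQMFWLTZDPRXYS";
    Node *root = NULL;
    for (const char *p = seq; *p != '\0'; p++) root = btree_insert(root, *p);
    return root;
}
```

After build_example runs, the root holds the single key M, its left child holds D and G, and its right child holds Q and T, which matches the final state described in the text.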

Delete operation: First search the B-tree for the element to be deleted. If it exists, delete it from its node. When deleting, first check whether the element has left and right child nodes. If it does, move a neighboring element (the successor, for example) up from the child node into its place, and then handle the resulting situation in the child node. If it does not, simply delete the element and shift the remaining elements.

After an element has been deleted and the others shifted, if the number of elements in a node drops below ceil(m/2)-1, check whether one of its adjacent siblings is plump (holds more than ceil(m/2)-1 elements). If so, borrow an element through the parent node to restore the condition. If the adjacent siblings are themselves at the minimum, that is, each holds exactly ceil(m/2)-1 elements, the node is merged with an adjacent sibling into a single node to restore the condition. The examples below walk through these cases.
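The choice between borrowing and merging can be written down as a small decision function. This is a sketch with invented names (min_keys, choose_repair, Repair); it covers non-root nodes only, since the root is exempt from the minimum, and it arbitrarily prefers the right sibling when both siblings are plump.

```c
/* Minimum number of keys in a non-root node of an order-m B-tree:
 * ceil(m/2) - 1, written with integer arithmetic. */
int min_keys(int m) {
    return (m + 1) / 2 - 1;
}

typedef enum {
    REPAIR_NONE,          /* node still has enough keys */
    REPAIR_BORROW_LEFT,   /* left sibling is plump */
    REPAIR_BORROW_RIGHT,  /* right sibling is plump */
    REPAIR_MERGE          /* no plump sibling: merge with a neighbor */
} Repair;

/* count: keys left in the node after the deletion.
 * left, right: key counts of the adjacent siblings, or -1 if absent. */
Repair choose_repair(int m, int count, int left, int right) {
    int min = min_keys(m);
    if (count >= min) return REPAIR_NONE;
    if (right > min) return REPAIR_BORROW_RIGHT;
    if (left > min) return REPAIR_BORROW_LEFT;
    return REPAIR_MERGE;
}
```

For an order-5 tree, min_keys(5) is 2. A node left with 1 key next to a sibling holding 3 keys borrows, while a node left with 1 key between two siblings holding 2 keys each has to merge.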

Starting from the order-5 B-tree built by the insertions above, delete H, T, R, and E in turn.

First delete H. H is found in a leaf node, and that leaf holds 3 elements, more than the minimum of ceil(5/2)-1 = 2, so the operation is simple: move K into the position H occupied and L into the position K occupied (that is, the elements after the deleted one shift forward).

Next delete T. T is not in a leaf node but in an internal node, so we find its successor W (the next element in ascending alphabetical order), move W up into T's position, and then delete W from the child node that contained it. That child node held more than 2 elements, so no merge is needed.

Next delete R. R is in a leaf node, but that node holds only 2 elements, and deleting R leaves just 1, fewer than the minimum of ceil(5/2)-1 = 2. If one of its adjacent siblings is plump (holds more than 2 elements), an element can be borrowed through the parent node: the separating element moves down from the parent, and the first or last element of the plump sibling moves up into the parent to replace it. In this example the right sibling is plump (3 elements, more than 2). So S shifts forward into the position R occupied, W moves down from the parent into the leaf and takes the position after S, X moves up from the right sibling into the parent, and finally X is removed from the right sibling and the elements after it shift forward.
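The borrow step for R can be sketched as a rotation through the parent. To keep it short, the sketch below handles leaf nodes only (no child pointers) and passes the parent's separating key by pointer; the names Leaf and borrow_from_right are invented for illustration.

```c
#include <string.h>

#define MAX_KEYS 4   /* order-5 tree: at most 4 keys per node */

typedef struct {
    int count;            /* keys currently stored */
    char key[MAX_KEYS];   /* keys in ascending order */
} Leaf;

/* Borrow through the parent: the separator moves down to the end of
 * `node`, and the first key of `right` moves up to replace it.
 * The caller guarantees that `node` has room and `right` has a key to spare. */
void borrow_from_right(Leaf *node, char *separator, Leaf *right) {
    node->key[node->count++] = *separator;   /* parent key moves down */
    *separator = right->key[0];              /* sibling's first key moves up */
    right->count--;
    memmove(&right->key[0], &right->key[1], (size_t)right->count);
}
```

Applied to the example, a leaf holding only S, the separator W, and a right sibling holding X, Y, Z become a leaf holding S and W, the separator X, and a sibling holding Y and Z.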

The last step is to delete E, which causes more trouble. The node holding E has exactly the minimum number of elements (ceil(5/2)-1 = 2), and its adjacent siblings are in the same situation, so after the deletion there is nothing to borrow. The node must therefore be merged with an adjacent sibling: first the element in the parent that separates the two nodes being merged moves down, and then the two nodes are combined into one. In this example, the element D in the parent moves down into the node that, with E deleted, holds only F. The node holding D and F is then merged with the adjacent sibling holding A and C into a single node.
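The merge step can be sketched in the same style. Again this is an illustration on leaf nodes only, with invented names (Leaf, merge_leaves); removing the separator from the parent and fixing the parent's pointers is left to the caller.

```c
#include <string.h>

#define MAX_KEYS 4   /* order-5 tree: at most 4 keys per node */

typedef struct {
    int count;            /* keys currently stored */
    char key[MAX_KEYS];   /* keys in ascending order */
} Leaf;

/* Merge `right` into `left`, pulling the separating key down from the
 * parent between them. The caller guarantees that
 * left->count + 1 + right->count <= MAX_KEYS.
 * Returns the number of keys in the merged node. */
int merge_leaves(Leaf *left, char separator, Leaf *right) {
    left->key[left->count++] = separator;                 /* parent key moves down */
    memcpy(&left->key[left->count], right->key, (size_t)right->count);
    left->count += right->count;
    right->count = 0;                                     /* right node is now empty */
    return left->count;
}
```

For the deletion of E, merging the node holding A and C with the node holding F, with D coming down from the parent, yields one node holding A, C, D, F.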

You might think the deletion is finished, but in this particular case the parent node now contains only one element, G, which is below the minimum and not acceptable. If a sibling of the problem node were plump, an element could be borrowed through the parent. Suppose the right sibling (containing Q and X) had more than the minimum number of elements (that is, there were further elements to the right of Q). Then we would move M down into the node that is short of elements and move Q up into M's position; the left subtree of Q, the node containing N and P, would become the right subtree of M, attached to M's right pointer. In this example, however, there is nothing to borrow. The node can only be merged with its sibling into a single node, and M, the only element in the root, moves down into that merged node, so the height of the tree decreases by one level.

To examine deletion further, here is another example:

Here is a different order-5 B-tree. Let's try to delete C.

The element D in the right child of C is moved up into C's position, but after D moves up, the node it came from is left with only one element.

That node now holds only E, and its siblings are only just out of poverty (each holds exactly the minimum of 2 elements), so nothing can be borrowed through the parent and a merge is the only option. The left sibling, the node containing A, is therefore merged with the node containing E into a single node.

This leaves a node containing only the element F. Its adjacent sibling is plump (3 elements, more than the minimum of 2), so an element can be borrowed through the parent: J moves down from the parent into this node, the first element (or the last element) of the sibling moves up into the parent, and the sibling's remaining elements shift forward (or back) accordingly. Note that the node containing K and L was previously attached to the left of M and is now attached to the right of J. Each node now satisfies the structural properties of the B-tree.

B+-tree: a variant of the B-tree that arose from the needs of file systems.

An order-m B+-tree differs from an order-m B-tree as follows:

1. A node with n subtrees contains n keywords (in a B-tree, a node with n subtrees contains n-1 keywords).

2. The leaf nodes together contain all the keywords, along with pointers to the records for those keywords, and the leaf nodes themselves are linked together in ascending keyword order. (The leaf nodes of a B-tree do not contain all the information needed for a search.)

3. All non-terminal nodes can be regarded as the index part; each contains only the largest (or smallest) keyword of each of its subtrees. (The non-terminal nodes of a B-tree also carry actual data needed by the search.)

a) Why is the B+-tree better suited than the B-tree to file indexes and database indexes in practice?

1) The B+-tree has a lower disk read/write cost

The internal nodes of a B+-tree hold no pointers to the actual records for their keywords, so they are smaller than the internal nodes of a B-tree. If all the keywords of one internal node are stored in the same disk block, that block can hold more of them, so more of the keywords to be searched are brought into memory in a single read, and the number of I/O reads and writes drops accordingly.

For example, suppose a disk block holds 16 bytes, a keyword takes 2 bytes, and a pointer to a keyword's record takes 2 bytes. An internal node of an order-9 B-tree (a node with at most 8 keywords) needs 2 disk blocks, whereas the corresponding B+-tree internal node needs only 1. When an internal node has to be read into memory, the B-tree costs one more block seek than the B+-tree (on a disk, that is essentially rotation time).
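The arithmetic in this example can be checked with a few lines of C. The function name blocks_for_node is invented for the sketch, and, like the example in the text, it ignores the space taken by child pointers.

```c
/* Number of disk blocks needed to hold one internal node.
 * record_ptr_bytes is the per-keyword pointer to the record itself:
 * 2 in the B-tree of the example, 0 in the B+-tree, whose internal
 * nodes carry no record pointers. */
int blocks_for_node(int nkeys, int key_bytes, int record_ptr_bytes, int block_bytes) {
    int node_bytes = nkeys * (key_bytes + record_ptr_bytes);
    return (node_bytes + block_bytes - 1) / block_bytes;   /* round up */
}
```

With the numbers from the text, blocks_for_node(8, 2, 2, 16) is 2 for the B-tree node (32 bytes) and blocks_for_node(8, 2, 0, 16) is 1 for the B+-tree node (16 bytes).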

2) B+-tree query efficiency is more stable

Because a non-terminal node never points to the record itself but only indexes the keywords in the leaf nodes, every keyword search must follow a path from the root to a leaf. All keyword queries have the same path length, so every lookup costs the same.

The relevant code for the B-tree and B+-tree can be found in the references at the end of this article. The author also plans to publish a series of more detailed posts.

External Memory Operation

The theory above was excerpted from other sources. It describes in detail how the B-tree behaves in memory and the various cases that must be handled to preserve its structure. As the previous article explained, this data structure is well suited to persistent storage. The main reason is that each change to the data (an insertion or deletion) changes the shape of the tree only slightly, so when the tree is stored in external memory there is no need to rewrite the whole B-tree; only the affected nodes are modified, which greatly reduces the number of I/O operations.

In real systems (databases and file systems), B-tree data is not stored node by node in the way shown above; instead it is managed in pages. Paged memory management has proven, in operating system implementations, to be a mature and efficient approach. Using pages to manage data files in other applications has the following advantages:

1. I/O is batched, avoiding unnecessary I/O operations

Grouping adjacent I/O operations into a single page operation avoids multiple I/Os. The worst case is a busy operating system receiving many random I/O requests for the same page spread out over time: the system has to seek every time. Grouping them into one page operation avoids this scenario.

2. The strengths of the operating system are used as fully as possible

The operating system also works in pages. If the page size used to manage B-tree data is set to the system page size, the OS can store and transfer the data optimally during I/O and memory operations, which improves performance.

3. It makes the B-tree practical

As described earlier, a B-tree needs its order defined in advance, such as order 5 or order 7. Doing this in practice would require estimating in advance how much data the application will store, which is impossible for something like a database system or a file system. Managing data in pages sidesteps the problem with an approximation: in a real database system the index is stored in pages, one page is equivalent to one node, and a page is split when data is inserted into a page that is already full. This avoids having to define the order in advance, and it also avoids nodes of inconsistent size in memory caused by nodes holding data of different sizes. That is why pages are used instead of a predefined order; how the B-tree structure and properties are maintained under page management is the subject of the next section.
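The idea that a full page, rather than a fixed order, triggers a split can be illustrated with simple space accounting. Everything in this sketch is an assumption made for illustration: the 4096-byte page, the 16-byte header, the 2-byte slot per entry, and the names PageInfo, page_init and page_add. It does not reproduce LMDB's actual page layout.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size; real systems ask the OS */
#define SKETCH_PAGE_HEADER 16   /* assumed bytes reserved for page metadata */
#define SKETCH_SLOT_BYTES 2     /* assumed per-entry offset slot */

typedef struct {
    size_t used;     /* bytes consumed by the header and the entries */
    int nentries;    /* entries stored so far */
} PageInfo;

void page_init(PageInfo *p) {
    p->used = SKETCH_PAGE_HEADER;
    p->nentries = 0;
}

/* Account for one key/value entry.
 * Returns 1 on success, 0 if the page is full and must be split. */
int page_add(PageInfo *p, size_t key_len, size_t val_len) {
    size_t need = key_len + val_len + SKETCH_SLOT_BYTES;
    if (p->used + need > SKETCH_PAGE_SIZE) return 0;
    p->used += need;
    p->nentries++;
    return 1;
}
```

Under these assumptions a page takes 40 entries of 8-byte keys with 90-byte values before page_add reports that a split is needed, but 408 entries of 8-byte keys with no value. The effective order of a node thus follows from the data instead of being fixed in advance.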

Use in LMDB

How is the B-tree structure implemented in LMDB?

The LMDB code that implements the B-tree structure is divided into two parts.

1. Page Management

Page management implements the external memory operations described above. The specific functions include:

    • Create, allocate, and release pages (mdb_page_new, mdb_page_alloc, mdb_page_malloc, mdb_page_free)

mdb_page_malloc: allocates memory for new pages

Requests 1 or n pages from the operating system. Usually it is a single page; n pages are used for overflow pages. By default, the final page is initialized to 0 on allocation.

mdb_page_free: frees a single page and puts it on a list of reusable pages.

mdb_page_new: creates a new page

First calls mdb_page_alloc to allocate the page, then initializes it. A newly created page is treated as brand new, so its entire space must be available, and the initialization reflects this.

mdb_page_alloc: allocates pages

Allocates one page or n pages; when n pages are allocated, they are contiguous. If the transaction has no dirty room left, the allocation fails. The dirty room is the size of the array that stores dirty page IDs, configured as 131071. LMDB also keeps all reusable pages in a B-tree, the FreeDB, which records the ID of the transaction that released each set of pages. Each allocation searches the FreeDB for enough space to reuse. A single-page allocation can generally be satisfied right away; a run of contiguous pages may take several attempts. Multi-page allocations are generally overflow pages, which must be contiguous to meet the requirement. The FreeDB's construction and storage format are covered in other posts in this series.

    • Copy pages (mdb_page_copy)

Copies the content of one page to another page. This is mainly used for COW (copy-on-write).

    • Split and merge pages (mdb_page_merge, mdb_page_split)

Splitting and merging pages are the actions needed to keep child nodes balanced when the B-tree is modified by insertions and deletions. See the descriptions of insertion and deletion earlier in this article for the specific conditions.

Page splitting:

mdb_page_split: implements the B-tree split procedure described above. It handles the different cases of a page with only one node, append mode, and branch/leaf/LEAF2 pages. The basic process is to choose a split point by a certain algorithm. By the definition of the B-tree, the split need not be exactly even; it only has to leave each page node at least half full. Once the split point is determined, the data is moved, the item that caused the split is inserted, and the pointers are updated to maintain the B-tree structure. The function also determines whether the split propagates to the level above or to the root, and if so proceeds recursively.

Page merging:

mdb_page_merge: implements the merge procedure that results from node deletion. The basic process is to mark the merge target page as dirty, then copy the nodes over as the theory above describes, or, for internal nodes, adjust the page pointers and move nodes between the upper and lower levels. After that, the page is rebalanced; rebalancing may trigger further merges, until the B-tree satisfies its definition again.

    • Dirty page reading and writing (mdb_page_spill, mdb_page_unspill, mdb_page_dirty, mdb_page_flush, mdb_page_touch)

mdb_page_spill: writes dirty pages back to disk. It is designed for long nested transactions, some of which use a very large number of pages; to limit memory consumption, dirty pages can be written back to disk before the commit. Because only one write transaction can exist at a time across all processes and threads, writing back before the commit causes no problem. As long as there is room, pages are not flushed to disk. At execution time, the function calculates whether the room is sufficient; if it is not, it records the page IDs in an IDL array and flushes the pages to disk, then decides according to the environment flags whether to keep the P_DIRTY flag.

mdb_page_unspill: reads a spilled page back in. This does not require a touch; the dirty flag is simply set on the page. LMDB supports nested transactions, so if the page being looked up has been spilled, the whole chain of nested transactions must be searched, from the innermost outward. The function then checks that the MIDL list (the dirty room) has enough space: if not, it reports that the transaction is full; otherwise it loads the page and sets the dirty flag.

mdb_page_dirty: sets the dirty flag on a page and adds the page to the transaction's dirty page list.

mdb_page_flush: used when a transaction commits. It clears the dirty flag on each page and writes the data to disk (by writing to the file). With the experimental writable-mmap mode, clearing the dirty flag completes the job, since the actual write is left to the system. Otherwise, the start and end file offsets of each page must be calculated, the page written to disk, and finally the dirty page memory freed. Pages whose memory must be kept for special reasons take no part in the flush.

mdb_page_touch: implements COW. It copies a page and links the copy into the B-tree by updating the pointer that referred to the original. Modifications are thus made on the copy: other transactions keep seeing the data as it was before the writing transaction commits, and transactions started after the commit see the changed data.
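The copy-on-write idea behind mdb_page_touch can be shown with a heavily simplified model. This is not LMDB's code: the types and functions (CowPage, page_touch, cow_update) are invented, each page has a single child, and memory is never reclaimed. The sketch only shows that touching a page copies it and re-points the parent, while the committed pages stay untouched for readers.

```c
#include <stdlib.h>
#include <string.h>

typedef struct CowPage {
    int dirty;               /* 1 if this copy belongs to the write transaction */
    int value;               /* stand-in for the page's payload */
    struct CowPage *child;   /* a single child keeps the sketch small */
} CowPage;

/* Return a writable version of the page referenced by *slot.
 * A page that is already dirty is reused; otherwise it is copied,
 * the copy is marked dirty, and *slot is redirected to the copy.
 * The original page is never modified. Returns NULL if malloc fails. */
CowPage *page_touch(CowPage **slot) {
    CowPage *old = *slot;
    if (old->dirty) return old;
    CowPage *copy = malloc(sizeof *copy);
    if (copy == NULL) return NULL;
    memcpy(copy, old, sizeof *copy);
    copy->dirty = 1;
    *slot = copy;
    return copy;
}

/* Change the leaf value along a private root -> leaf path.
 * Returns the new root, or NULL on allocation failure. */
CowPage *cow_update(CowPage *committed_root, int new_value) {
    CowPage *root = committed_root;
    CowPage *r = page_touch(&root);          /* copy the root first */
    if (r == NULL) return NULL;
    CowPage *leaf = page_touch(&r->child);   /* copy the leaf, re-pointing the new root */
    if (leaf == NULL) return NULL;           /* sketch: the copied root is leaked here */
    leaf->value = new_value;
    return root;
}
```

After cow_update, a reader that still holds the committed root sees the old leaf value, while the returned root leads to the modified copy. Publishing the change then amounts to switching which root new transactions start from.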

    • Page lookup (mdb_page_search_root, mdb_page_search, mdb_page_search_lowest)

mdb_page_search_root: searches from the root of the B-tree. Given a key, it descends from the root through the subtrees, fetching the relevant page at each level, searching for the key within that page, and using the B-tree lookup rule to choose the child page at the next level. Descending level by level eventually either locates the key or establishes that the key is not in the B-tree. Each page visited is also stored on the cursor's page stack, which lets the cursor reuse those pages and makes subsequent updates easier.

mdb_page_search and mdb_page_search_lowest both call mdb_page_search_root to carry out the search.

mdb_page_search: besides performing the search, it ensures that the B-tree being used is the most recent version visible to this transaction and that pages are made dirty when needed.

mdb_page_search_lowest: retrieves the first qualifying value starting from the current branch page.

    • Page retrieval (mdb_page_get, mdb_page_list)

mdb_page_get: fetches a page. With mmap, reading a page would normally be trivial, since computing its address is enough. But LMDB allows for a transaction using so many pages that its dirty room fills up and some pages are spilled or flushed to disk. mdb_page_get therefore has to check the spill list to decide where to get the page from; otherwise the address is computed directly.

mdb_page_list: displays all keys on a page; it is a utility function.

2. Cursor Operations

Cursor operations implement the operations on B-tree nodes. A cursor points to the B-tree node that currently needs to be manipulated; the data is then changed by the requested operation (insert, delete), followed by a series of fairly complex steps to maintain the B-tree structure. The specific functions include:

    • Cursor traversal (mdb_cursor_sibling, mdb_cursor_next, mdb_cursor_prev, mdb_cursor_first, mdb_cursor_last)

mdb_cursor_first: positions the cursor at the smallest (first) leaf node of the B-tree, not at the first result of a lookup by key. If duplicate data is supported, special handling is needed to move to the first duplicate.

mdb_cursor_last: like mdb_cursor_first, but positions the cursor at the largest (last) leaf node.

mdb_cursor_next: moves the cursor to the next node.

mdb_cursor_prev: moves the cursor to the previous node.

mdb_cursor_sibling: moves the cursor to a sibling page, either the previous page or the next page.

If the current page still has keys in the direction of travel, mdb_cursor_next and mdb_cursor_prev simply move within the page; otherwise they move to the corresponding key position on the sibling page.
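How a cursor with a page stack walks the leaves in order can be sketched as follows. This is a simplified model, not LMDB's implementation: the types and functions (TPage, Cursor, cursor_first, cursor_next) are invented, keys are plain integers, every non-root page is assumed to be non-empty, and the tree is assumed to be no deeper than MAX_DEPTH.

```c
#include <stddef.h>

#define MAX_DEPTH 8
#define MAX_SLOTS 4

typedef struct TPage {
    int is_leaf;
    int n;                           /* keys in a leaf, children in a branch */
    int key[MAX_SLOTS];              /* used by leaf pages */
    struct TPage *child[MAX_SLOTS];  /* used by branch pages */
} TPage;

typedef struct {
    TPage *page[MAX_DEPTH];   /* pages on the path from the root */
    int idx[MAX_DEPTH];       /* position within each page */
    int top;                  /* index of the deepest page, -1 if empty */
} Cursor;

/* Push pages onto the stack along the leftmost path starting at p. */
static void descend_leftmost(Cursor *c, TPage *p) {
    for (;;) {
        c->top++;
        c->page[c->top] = p;
        c->idx[c->top] = 0;
        if (p->is_leaf) return;
        p = p->child[0];
    }
}

/* Position on the smallest key. Returns 0 if the tree is empty. */
int cursor_first(Cursor *c, TPage *root, int *key) {
    c->top = -1;
    descend_leftmost(c, root);
    if (c->page[c->top]->n == 0) return 0;
    *key = c->page[c->top]->key[0];
    return 1;
}

/* Move to the next leaf page: walk up until an ancestor has a child
 * further right, then descend that child's leftmost path. */
static int cursor_sibling_right(Cursor *c) {
    int level = c->top - 1;
    while (level >= 0 && c->idx[level] + 1 >= c->page[level]->n) level--;
    if (level < 0) return 0;   /* no page to the right; cursor unchanged */
    c->idx[level]++;
    c->top = level;
    descend_leftmost(c, c->page[level]->child[c->idx[level]]);
    return 1;
}

/* Advance to the next key. Returns 0 at the end of the tree. */
int cursor_next(Cursor *c, int *key) {
    if (c->idx[c->top] + 1 >= c->page[c->top]->n) {
        if (!cursor_sibling_right(c)) return 0;   /* page exhausted, no sibling */
    } else {
        c->idx[c->top]++;                         /* stay within the page */
    }
    *key = c->page[c->top]->key[c->idx[c->top]];
    return 1;
}
```

On a tree whose root has two leaf pages holding 1, 2 and 3, 4, 5, calling cursor_first and then cursor_next repeatedly visits 1 through 5 in order and then reports the end. The stack is what lets the cursor find the sibling page without any links between leaves.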

    • Insert, delete, update, and query (mdb_cursor_get, mdb_cursor_set, mdb_cursor_del, mdb_cursor_del0, mdb_cursor_put, mdb_cursor_count)

mdb_cursor_get: retrieves a value according to the cursor's position and the requested operation. The most common case is MDB_GET_CURRENT, which fetches the value of the node the cursor points to. The basic idea is to check whether the page index has reached the number of keys on the page; if it has, the cursor is effectively pointing at the next page, which is not valid when fetching the current value, so the call fails. Otherwise, if the page is a LEAF2 page (used for duplicate keys), the value is read through a macro. If not, the function checks whether the leaf node has duplicates (the same key with several values): if so, it initializes the xcursor and reads the value through it, and if not, it reads the value at the corresponding position directly.

mdb_cursor_set: positions the cursor at the specified key. If the cursor is already on the right page, it is enough to check whether the key falls within that page's key range, which comparing against the page's minimum and maximum keys determines. Then, depending on the flags, as described for mdb_cursor_get, it makes the necessary checks and reads or sets some variables. Otherwise it first finds the page containing the key (mdb_page_search), then finds the position within the page (mdb_node_search), and then sets the related variables.

mdb_cursor_count: returns the number of results at the cursor: 1 for a unique key, and the number of duplicates for a duplicated key.

mdb_cursor_put: stores a key/value pair in the database. By default it inserts a new item, or updates the item if the key already exists. The basic process is as follows. First it checks the preconditions: the cursor and key must be non-null and the flags must be legal (for example, passing multiple values to a database that does not support duplicate keys is illegal). Once the flags are validated, it determines whether the tree is empty. If it is not, the cursor is moved to the right location: the largest node in the database in append mode, or the position where the item should be inserted in the normal case. Then it touches all the pages on the path to make them writable. On a LEAF2 page, where the key and value are fully duplicated, adding the key is enough. Next it determines whether the value is too large; a value that is too large is converted to a subtree for storage. When converting to a sub-DB or sub-page, it first sets up various variables according to the flags, including requesting new pages and so on. The rest is the node insertion described in the theory above: the value is placed at the right position, pages are split when necessary, pages are unspilled, large values are put on overflow pages, and so on. If multiple items are inserted in one call, the insertion is repeated once per item.

mdb_cursor_del, mdb_cursor_del0: delete the specified key/value. mdb_cursor_del first sets various variables according to the flags, then makes sure the page is dirty, and then performs the deletion. For sub-DBs, sub-pages, overflow pages and the like, the corresponding pages must be returned to the free list; for example, when the last node of a sub-DB is deleted, the whole subtree must be deleted. The actual key removal happens in mdb_cursor_del0: it removes the key from the page, rebalances the whole B-tree once the deletion is complete, and fixes up every other cursor in the same transaction that points to the affected page, so that those cursors know the item has been deleted.

    • Open, close, reuse, initialize

mdb_cursor_touch: marks the database and all pages on the cursor stack as dirty. A few pages may be marked dirty without strictly needing to be, but this makes COW as convenient as possible: only the root page pointer needs to be modified, whereas otherwise many pages would have to be tracked.

mdb_cursor_open: opens a cursor. It first checks that the arguments are legal, then allocates memory and calls mdb_cursor_init to initialize the cursor.

mdb_cursor_renew: reuses a cursor. A cursor that is no longer in use can be renewed and used again.

mdb_cursor_close: closes the cursor, removes it from the transaction's cursor list, and frees its memory.

mdb_cursor_copy: copies a cursor, duplicating all of its contents into the new cursor.

mdb_cursor_shadow: backs up the cursors of a transaction (the parent transaction's cursors when a nested transaction begins).

mdb_cursor_init: sets up various variables. If the database state is DB_STALE, the latest root node must be fetched.

    • Page stack

mdb_cursor_pop: pops a page from the cursor stack.

mdb_cursor_push: pushes a page onto the cursor stack. Typically all the pages along the search path are pushed.

    • State

mdb_cursor_chk: checks that the cursor is consistent.

mdb_cursor_txn: returns the transaction the cursor belongs to.

mdb_cursor_dbi: returns the database the cursor belongs to.

The above is a brief introduction to the B-tree principles used inside LMDB and to the functions that implement B-tree operations. The storage of sub-DBs, sub-pages, overflow pages and the like in the B-tree has not been covered in enough detail; time permitting, a later post will describe how these non-standard key-value data are stored and operated on. The B-tree in LMDB is a B+-tree, and all leaf nodes are at the same level.

This article surely has many shortcomings; comments and corrections are welcome.

References:

https://en.wikipedia.org/wiki/B-tree

http://blog.csdn.net/hbhhww/article/details/8206846

http://slady.net/java/bt/view.php

The Art of Computer Programming
