Linux Kernel Radix Tree Application Analysis

Source: Internet
Author: User


Linux kernel radix tree application analysis -- Lvyilong316

The radix tree can be seen as a trie keyed by binary strings. It is a multi-way tree that resembles a multi-level index table: each intermediate node contains an array of pointers to child nodes, and each leaf node contains pointers to the actual objects. (Because the objects themselves carry no tree-node structure, their parent nodes are treated as the leaf nodes.)

Figure 1 shows a sample radix tree with a branching factor of 4 (2^2) and a height of 4. The tree is used to quickly locate an 8-bit offset within a file, so it can address 4 x 4 x 4 x 4 = 256 pages (the number of leaf slots). For example, the two leaf paths drawn with dotted lines in the figure spell out the binary values 00000010 and 11111010, and each leads to the cache page at the corresponding offset in the file.

Figure 1

In the Linux kernel, the radix tree maps an object's handle id or page index to a pointer to the object; concretely, the id is translated into a path through a series of pointers. Each segment of the id serves as an index into the pointer array of the node at one layer (the entries of the pointer array are called slots below). A segment is typically extracted by shifting the id right by a given number of bits and masking off a fixed length, for example (id >> n) & IDR_MASK. With 4-bit segments, a 32-bit id splits into eight segments of four bits each. From high to low, the segments are the slot indexes for layers 1 through 8: the slot selected at one layer yields the pointer to the node at the next layer, and so on until the last layer, where the slot points to the final object. In Figure 2 the id is 8 bits wide, so 4-bit segments give a two-layer radix tree. The lowest layer has (2^4) * (2^4) = 2^8 = 256 leaf slots in total, so the tree can store 256 objects and the largest object id is 256 - 1 = 255 (ids start from 0).

Figure 2
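
The segment extraction described above can be sketched in a few lines of C. This is only an illustration, not kernel code: SEG_BITS, SEG_MASK and slot_index() are hypothetical names, and the 4-bit segment width is chosen to match the Figure 2 example.

```c
#include <assert.h>

/* Hypothetical names for illustration; the kernel defines its own macros. */
#define SEG_BITS 4u                        /* bits consumed per tree layer */
#define SEG_MASK ((1u << SEG_BITS) - 1u)   /* 0xF: selects one segment */

/*
 * Return the slot index used at a given layer for an id.
 * Layers are counted from the top: layer 1 uses the highest segment,
 * layer `height` uses the lowest one.
 */
static unsigned int slot_index(unsigned int id, unsigned int height,
                               unsigned int layer)
{
    unsigned int shift;

    assert(layer >= 1 && layer <= height);
    shift = (height - layer) * SEG_BITS;
    return (id >> shift) & SEG_MASK;
}
```

For the two-layer tree of Figure 2, slot_index(0xA5, 2, 1) selects slot 0xA in the root node and slot_index(0xA5, 2, 2) selects slot 0x5 in the node below it.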

Seen this way, looking up an object in a radix tree is slightly slower than indexing a fixed array, but the tree trades a little time for space. It suits scenarios where the number of nodes changes dynamically, and its time complexity is acceptable: O(log_(2^n) N), where 2^n is the number of pointer slots in each node, n is the bit length of the segment mask, and N is the number of ids the tree can address.

1. Application of the radix tree in the Linux kernel

1.1 File page cache management

In earlier kernels (for example 2.4.0), the file page cache was organised through a single common hash table, page_hash_table, hashed on the index of the cached page. A hash lookup finds a given page of a given file quickly and costs little extra memory, but the drawback is just as clear: because the cached pages of every file share one hash table and lookups are serialised by pagecache_lock, concurrent access from multiple processes suffers, intolerably so in some situations. In the 2.6 kernel, therefore, each file's address space manages its own cache pages, so that page lookups in different files do not interfere with one another and concurrency improves. The file pages of the Linux 2.6 kernel are managed by a radix tree, and a page's index determines its position in the tree. The file address space object has the following data structure:

    
    
    struct address_space {
        struct inode *host;                  /* owner: inode, block_device */
        struct radix_tree_root page_tree;
        /* other fields omitted */
    };

page_tree is the root of the radix tree. It is a radix_tree_root structure:

    
    
    struct radix_tree_root {
        unsigned int height;              /* height of the tree */
        gfp_t gfp_mask;
        struct radix_tree_node *rnode;    /* points to the root node of the radix tree */
    };

rnode points to the root node of the radix tree. The root node is a radix_tree_node structure:

    
    
    struct radix_tree_node {
        unsigned int height;    /* height from the bottom */
        unsigned int count;
        struct rcu_head rcu_head;
        void *slots[RADIX_TREE_MAP_SIZE];
        unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
    };

height is the height of the node, and count is the number of child nodes (that is, the number of non-empty slots). slots is the array of child pointers; in a leaf node its entries point to the corresponding page structures. The tags array is a set of bitmaps that record, for each slot, whether the subtree below it contains a page with the corresponding tag. The two tags are dirty and writeback:

    #define PAGECACHE_TAG_DIRTY     0
    #define PAGECACHE_TAG_WRITEBACK 1
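
As a rough sketch of how such per-slot tag bitmaps behave, the following user-space C keeps one bit per slot for each tag. It is a simplified stand-in rather than the kernel implementation: struct tag_node, the helper names and the 64-slot node size are assumptions made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Simplified stand-ins for the kernel macros; the values are assumptions. */
#define MAP_SHIFT 6
#define MAP_SIZE  (1u << MAP_SHIFT)                  /* 64 slots per node */
#define MAX_TAGS  2
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define TAG_LONGS ((MAP_SIZE + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

#define TAG_DIRTY     0
#define TAG_WRITEBACK 1

struct tag_node {
    unsigned long tags[MAX_TAGS][TAG_LONGS];   /* one bit per slot, per tag */
};

static void tag_set(struct tag_node *n, unsigned int tag, unsigned int slot)
{
    assert(tag < MAX_TAGS && slot < MAP_SIZE);
    n->tags[tag][slot / BITS_PER_ULONG] |= 1UL << (slot % BITS_PER_ULONG);
}

static void tag_clear(struct tag_node *n, unsigned int tag, unsigned int slot)
{
    assert(tag < MAX_TAGS && slot < MAP_SIZE);
    n->tags[tag][slot / BITS_PER_ULONG] &= ~(1UL << (slot % BITS_PER_ULONG));
}

static int tag_get(const struct tag_node *n, unsigned int tag, unsigned int slot)
{
    assert(tag < MAX_TAGS && slot < MAP_SIZE);
    return (int)((n->tags[tag][slot / BITS_PER_ULONG] >>
                  (slot % BITS_PER_ULONG)) & 1UL);
}

/* Non-zero if any slot of the node carries the tag. */
static int any_tag_set(const struct tag_node *n, unsigned int tag)
{
    size_t i;

    for (i = 0; i < TAG_LONGS; i++)
        if (n->tags[tag][i])
            return 1;
    return 0;
}
```

A parent sets the tag bit for a slot when the child behind that slot has any bit set for the same tag, so a search for dirty pages can skip whole subtrees whose bit is clear.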

1.2 Inter-process communication (IPC) object management

In earlier kernels (such as 2.6.11), IPC objects (for example the shared memory object shm) were managed in a fixed array, with the object id as the array subscript. This has a defect: when the number of objects grows sharply and the array runs out of room, a new array has to be allocated through grow_ary() and the contents copied from the old array to the new one. When the number of objects fluctuates widely, arrays are allocated and freed frequently, which hurts performance. In 2.6.24, therefore, IPC objects are managed with the radix-tree-based idr. Locating an object through idr is not as direct as indexing an array (the time complexity is the height of the tree), but the dynamic behaviour is good in exchange: adding an object never triggers a large memory allocation, only the creation of one or a few tree nodes (when the tree is extended), and finding a free id is faster than in an array, which directly affects how quickly new objects can be inserted. The idr structure manages the radix tree that holds the IPC objects, indexed by object id:

    
    
    struct idr {
        struct idr_layer *top;
        struct idr_layer *id_free;
        int layers;
        int id_free_cnt;
        spinlock_t lock;
    };

idr_layer is the tree node structure, top points to the root node, and layers is the tree height. id_free maintains a temporary linked list of free nodes, and id_free_cnt is the number of nodes on that list.

2. Code analysis

2.1 linux/lib/radix-tree.c

(1) Converting the tree height to the maximum index:

    
    
    static __init unsigned long __maxindex(unsigned int height)
    {
        /*
         * With RADIX_TREE_MAP_SHIFT == 6 each node has 2^6 = 64 slots;
         * with a value of 4 each node has 2^4 = 16 slots.
         */
        unsigned int width = height * RADIX_TREE_MAP_SHIFT;
        int shift = RADIX_TREE_INDEX_BITS - width;

        if (shift < 0)
            return ~0UL;
        if (shift >= BITS_PER_LONG)
            return 0UL;
        return ~0UL >> shift;
    }

The function first computes the total index width from the height (the number of layers in the tree) and the per-node index width (RADIX_TREE_MAP_SHIFT, the number of bits each layer uses as a slot index), and then converts that width into the maximum index. For example, a full 32-bit index has a maximum value of 2^32 - 1, while a tree of height 2 with 4-bit slot indexes has a maximum index of 2^(2*4) - 1: each node has 2^4 = 16 slots, so the second layer holds 2^(2*4) slots, numbered from 0 starting at the leftmost one, and the largest index is 2^(2*4) - 1. Calling this function in a loop yields the maximum index for every possible tree height; the results are stored in the static array height_to_maxindex. This is done during initialisation through radix_tree_init() -> radix_tree_init_maxindex().
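
The table-building step can be tried out in user space with the sketch below. It mirrors the logic of the function above but is not the kernel source: MAP_SHIFT, INDEX_BITS, MAX_PATH, maxindex() and init_maxindex() are names chosen for this example, and a 4-bit slot index is assumed.

```c
#include <assert.h>
#include <limits.h>

#define MAP_SHIFT  4u                                  /* assumed: 16 slots per node */
#define INDEX_BITS (CHAR_BIT * sizeof(unsigned long))  /* width of an index */
#define MAX_PATH   ((INDEX_BITS + MAP_SHIFT - 1) / MAP_SHIFT)

/* Maximum index for each tree height, filled in by init_maxindex(). */
static unsigned long height_to_maxindex[MAX_PATH + 1];

static unsigned long maxindex(unsigned int height)
{
    unsigned int width = height * MAP_SHIFT;     /* index bits covered by the tree */
    int shift = (int)INDEX_BITS - (int)width;    /* unused high bits */

    if (shift < 0)
        return ~0UL;
    if (shift >= (int)INDEX_BITS)
        return 0UL;                              /* height 0: only index 0 */
    return ~0UL >> shift;
}

static void init_maxindex(void)
{
    unsigned int i;

    for (i = 0; i <= MAX_PATH; i++)
        height_to_maxindex[i] = maxindex(i);
    assert(height_to_maxindex[0] == 0UL);
}
```

After init_maxindex() runs, height_to_maxindex[1] is 15 and height_to_maxindex[2] is 255, which agrees with the 2^(2*4) - 1 figure in the text.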

(2) Insert an object

The root parameter points to the root of the tree, index is the page index, and item is the object to insert:

    
    
    int radix_tree_insert(struct radix_tree_root *root, unsigned long index, void *item)
    {
        struct radix_tree_node *node = NULL, *slot;
        unsigned int height, shift;
        int offset;
        int error;

        BUG_ON(radix_tree_is_indirect_ptr(item));

        /*
         * If index exceeds the maximum index of the tree, call
         * radix_tree_extend() to grow the tree until its maximum index
         * can accommodate the index passed in.
         */
        if (index > radix_tree_maxindex(root->height)) {
            error = radix_tree_extend(root, index);
            if (error)
                return error;
        }

        slot = radix_tree_indirect_to_ptr(root->rnode);

        /* Take the current tree height and the initial right shift of the index. */
        height = root->height;
        shift = (height - 1) * RADIX_TREE_MAP_SHIFT;

        offset = 0;    /* uninitialised var warning */
        /*
         * Walk down from layer `height` to the layer-1 node, following the
         * index; missing intermediate nodes are allocated on demand.
         */
        while (height > 0) {
            /* A NULL slot means an intermediate node has to be allocated. */
            if (slot == NULL) {
                /* Have to add a child node. */
                if (!(slot = radix_tree_node_alloc(root)))    /* from the slab allocator */
                    return -ENOMEM;
                slot->height = height;    /* set the node height */
                /*
                 * If node is non-NULL the new node becomes its child;
                 * otherwise the new node becomes the root node.
                 */
                if (node) {
                    /* Store the new node pointer in slot `offset` of node. */
                    rcu_assign_pointer(node->slots[offset], slot);
                    node->count++;    /* node gains one child */
                } else
                    rcu_assign_pointer(root->rnode, radix_tree_ptr_to_indirect(slot));
            }

            /*
             * Go a level down. offset is the slot of the item in the current
             * layer, computed from the item's index. For a 32-bit index with
             * 4-bit segments, the top layer uses the highest four bits, the
             * second layer (from the top) uses the next four bits, and so on.
             */
            offset = (index >> shift) & RADIX_TREE_MAP_MASK;
            node = slot;
            slot = node->slots[offset];
            shift -= RADIX_TREE_MAP_SHIFT;
            height--;
        }

        /*
         * node is now the lowest-layer node. Point the slot selected by the
         * lowest index bits at item to complete the insertion.
         */
        if (node) {
            node->count++;
            rcu_assign_pointer(node->slots[offset], item);
        ...
    }
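
To make the top-down walk easier to experiment with, here is a much reduced user-space sketch of insertion with on-demand node allocation, plus the matching lookup. It is not the kernel code: the toy_* names are hypothetical, the tree height is fixed at creation (no extension), nodes come from calloc() instead of the slab allocator, there is no RCU or tagging, and nodes are never freed.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_SHIFT 4u
#define MAP_SIZE  (1u << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1u)

struct toy_node {
    unsigned int count;       /* number of non-empty slots */
    void *slots[MAP_SIZE];    /* children, or items at the lowest layer */
};

struct toy_root {
    unsigned int height;      /* fixed at creation in this sketch; must be >= 1 */
    struct toy_node *rnode;
};

/* Sketch only: assumes height * MAP_SHIFT is below the width of unsigned long. */
static unsigned long toy_maxindex(unsigned int height)
{
    return (1UL << (height * MAP_SHIFT)) - 1UL;
}

/* Returns 0 on success, -1 if index is out of range, occupied, or on OOM. */
static int toy_insert(struct toy_root *root, unsigned long index, void *item)
{
    struct toy_node *node = NULL, *slot;
    unsigned int height, shift, offset = 0;

    assert(root != NULL);
    height = root->height;
    if (height == 0 || index > toy_maxindex(height))
        return -1;
    slot = root->rnode;
    shift = (height - 1) * MAP_SHIFT;

    while (height > 0) {
        if (slot == NULL) {                 /* allocate the missing node */
            slot = calloc(1, sizeof(*slot));
            if (slot == NULL)
                return -1;
            if (node) {
                node->slots[offset] = slot; /* hook into the parent */
                node->count++;
            } else {
                root->rnode = slot;         /* first node becomes the root */
            }
        }
        offset = (index >> shift) & MAP_MASK;
        node = slot;
        slot = node->slots[offset];
        shift -= MAP_SHIFT;                 /* unsigned wrap after the last layer is unused */
        height--;
    }
    if (slot != NULL)
        return -1;                          /* an item is already stored here */
    node->count++;
    node->slots[offset] = item;
    return 0;
}

static void *toy_lookup(const struct toy_root *root, unsigned long index)
{
    const struct toy_node *node = root->rnode;
    unsigned int height = root->height, shift;
    void *entry;

    if (height == 0 || node == NULL || index > toy_maxindex(height))
        return NULL;
    shift = (height - 1) * MAP_SHIFT;
    for (;;) {
        entry = node->slots[(index >> shift) & MAP_MASK];
        if (entry == NULL || --height == 0)
            return entry;                   /* missing path, or the item itself */
        node = entry;
        shift -= MAP_SHIFT;
    }
}
```

With a root initialised as { 2, NULL }, inserting at index 0xA5 allocates the root node and one child, and stores the item in slot 5 of the child reached through slot 0xA of the root.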

(3) Delete an object:


    
    
    void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
    {
        /*
         * The path array stores the node pointers and slot indexes along the
         * search path. Its length is the maximum path length (the maximum
         * height of the tree) + 1; the extra entry holds a NULL pointer that
         * acts as a sentinel.
         */
        struct radix_tree_path path[RADIX_TREE_MAX_PATH + 1], *pathp = path;
        struct radix_tree_node *slot = NULL;
        struct radix_tree_node *to_free;
        unsigned int height, shift;
        int tag;
        int offset;

        /* height is initialised to the height of the tree. */
        height = root->height;
        /* Check whether the index to delete lies beyond the range of the tree. */
        if (index > radix_tree_maxindex(height))
            goto out;

        /* slot starts at the root; in the loop below it always points to an intermediate node. */
        slot = root->rnode;
        /* A tree of height 0 has no nodes: clear the root and return directly. */
        if (height == 0) {
            root_tag_clear_all(root);
            root->rnode = NULL;
            goto out;
        }
        slot = radix_tree_indirect_to_ptr(slot);

        /* shift holds the number of bits by which the index must be shifted. */
        shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
        /* The first entry of the path array gets a NULL node as the sentinel. */
        pathp->node = NULL;

        /*
         * Walk from the root towards the object with the given id, saving
         * the nodes and slot indexes along the way in the path array.
         */
        do {
            /* A NULL pointer on the way means the object does not exist. */
            if (slot == NULL)
                goto out;

            /* Advance in the path array: offset stores the slot index, node the current node. */
            pathp++;
            offset = (index >> shift) & RADIX_TREE_MAP_MASK;
            pathp->offset = offset;
            pathp->node = slot;
            /* Fetch the next node through the slot index and adjust the shift count. */
            slot = slot->slots[offset];
            shift -= RADIX_TREE_MAP_SHIFT;
            height--;
        } while (height > 0);

        if (slot == NULL)
            goto out;
        ...
        to_free = NULL;
        /*
         * Use the records in the path array to walk back from the parent of
         * the object towards the root. At each node the slot pointer is set
         * to NULL (clearing the slot in the lowest node removes the object
         * from the tree) and the child count is decremented; nodes whose
         * slots are all empty are freed. The loop ends in one of two cases:
         * (1) the root node has been reached and processed, or
         * (2) a node whose child count is not 0 is encountered.
         */
        while (pathp->node) {
            pathp->node->slots[pathp->offset] = NULL;
            pathp->node->count--;
            /*
             * Queue the node for deferred freeing after the last
             * reference to it disappears (set NULL, above).
             */
            if (to_free)
                radix_tree_node_free(to_free);

            /*
             * A node that still has children ends the walk. If it is the
             * root node, call radix_tree_shrink() to try to shrink the tree.
             */
            if (pathp->node->count) {
                if (pathp->node == radix_tree_indirect_to_ptr(root->rnode))
                    radix_tree_shrink(root);
                goto out;
            }

            /* Node with zero slots in use so free it */
            to_free = pathp->node;
            pathp--;
        }

        /*
         * Reaching this point means the tree no longer contains any object.
         * The root node held in to_free is released, the tree height is set
         * to 0, and the root pointer is set to NULL.
         */
        root_tag_clear_all(root);
        root->height = 0;
        root->rnode = NULL;
        if (to_free)
            radix_tree_node_free(to_free);

    out:
        return slot;
    }
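
The path-recording idea can be isolated in a small user-space sketch. It is a simplification under stated assumptions, not the kernel routine: the toy_* names and MAX_PATH value are hypothetical, empty nodes are released with free() immediately instead of being queued for deferred freeing, tags are ignored, and the tree height is left unchanged with no shrinking.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_SHIFT 4u
#define MAP_SIZE  (1u << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1u)
#define MAX_PATH  4u              /* assumed upper bound on the tree height */

struct toy_node { unsigned int count; void *slots[MAP_SIZE]; };
struct toy_root { unsigned int height; struct toy_node *rnode; };
struct toy_path { struct toy_node *node; unsigned int offset; };

/* Remove the item at index and return it, or NULL if it is not present. */
static void *toy_delete(struct toy_root *root, unsigned long index)
{
    struct toy_path path[MAX_PATH + 1], *pathp = path;
    unsigned int height, shift;
    void *slot;

    assert(root != NULL);
    height = root->height;
    slot = root->rnode;
    if (height == 0 || height > MAX_PATH || slot == NULL)
        return NULL;
    if (index >> (height * MAP_SHIFT))
        return NULL;                      /* index beyond the tree range */

    shift = (height - 1) * MAP_SHIFT;
    pathp->node = NULL;                   /* sentinel in path[0] */
    do {
        if (slot == NULL)
            return NULL;                  /* path is missing: nothing to delete */
        pathp++;
        pathp->offset = (index >> shift) & MAP_MASK;
        pathp->node = slot;
        slot = pathp->node->slots[pathp->offset];
        shift -= MAP_SHIFT;
    } while (--height > 0);
    if (slot == NULL)
        return NULL;

    /* Walk back up: clear the slot and free nodes that become empty. */
    while (pathp->node) {
        struct toy_node *n = pathp->node;

        n->slots[pathp->offset] = NULL;
        if (--n->count > 0)
            return slot;                  /* node still has other entries: stop */
        free(n);
        pathp--;
    }
    root->rnode = NULL;                   /* the sentinel was reached: tree is empty */
    return slot;
}
```

The sentinel entry lets the upward loop stop without a separate bounds check, which is the same reason the kernel array has RADIX_TREE_MAX_PATH + 1 entries.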

(4) tree extension:


    
    
    static int radix_tree_extend(struct radix_tree_root *root, unsigned long index)
    {
        struct radix_tree_node *node;
        unsigned int height;
        int tag;

        /* Start from the current height plus 1. */
        height = root->height + 1;
        /*
         * Compare index with the maximum index for the candidate height,
         * raising the height until the tree can hold the given index.
         */
        while (index > radix_tree_maxindex(height))
            height++;

        /* For an empty tree only adjust the height; nodes are allocated later. */
        if (root->rnode == NULL) {
            root->height = height;
            goto out;
        }

        /*
         * Add a single-branch subtree above the original root so that the
         * tree reaches the required height. The root of the new subtree
         * replaces the old root, and the lowest node of the new subtree
         * points to the original root node. In every added node all slots
         * are NULL except slot 0, which points to the child, so the added
         * part is a leftmost single branch. This is equivalent to prefixing
         * the original id string with zeros, so every existing object id
         * still maps to the correct position in the extended tree.
         */
        do {
            unsigned int newheight;
            if (!(node = radix_tree_node_alloc(root)))
                return -ENOMEM;

            /* Increase the height. */
            node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);

            /* Propagate the aggregated tag info into the new root */
            for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
                if (root_tag_get(root, tag))
                    tag_set(node, tag, 0);
            }

            newheight = root->height + 1;
            node->height = newheight;
            node->count = 1;
            node = radix_tree_ptr_to_indirect(node);
            rcu_assign_pointer(root->rnode, node);
            root->height = newheight;
        } while (height > root->height);
    out:
        return 0;
    }
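
The claim that extension keeps every existing id at the right position can be checked with a small user-space sketch. As before this is an illustration under assumptions rather than the kernel function: the toy_* names are hypothetical, calloc() replaces the slab allocator, and tags and indirect pointers are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MAP_SHIFT 4u
#define MAP_SIZE  (1u << MAP_SHIFT)

struct toy_node { unsigned int count; void *slots[MAP_SIZE]; };
struct toy_root { unsigned int height; struct toy_node *rnode; };

static unsigned long toy_maxindex(unsigned int height)
{
    if (height * MAP_SHIFT >= sizeof(unsigned long) * CHAR_BIT)
        return ~0UL;                        /* tree covers the whole index range */
    return (1UL << (height * MAP_SHIFT)) - 1UL;
}

/* Grow the tree until index fits. Returns 0 on success, -1 on OOM. */
static int toy_extend(struct toy_root *root, unsigned long index)
{
    unsigned int height;

    assert(root != NULL);
    height = root->height + 1;
    while (index > toy_maxindex(height))
        height++;

    if (root->rnode == NULL) {              /* empty tree: only record the height */
        root->height = height;
        return 0;
    }
    do {
        struct toy_node *node = calloc(1, sizeof(*node));

        if (node == NULL)
            return -1;
        node->slots[0] = root->rnode;       /* old root becomes child 0 */
        node->count = 1;
        root->rnode = node;                 /* new node is the new root */
        root->height++;
    } while (height > root->height);
    return 0;
}
```

Extending a height-1 tree to hold index 0x1234 adds three nodes above the old root. An item that was at index 5 is now reached through slot 0 three times and then slot 5, that is, index 0x0005 with zeros prepended.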

(5) Shrinking the tree:

Starting from the root node, check whether each node satisfies the condition that every slot pointer except slot 0 is NULL, and continue until the node at layer n fails the condition. The single-branch chain formed by layers 1 through n-1 is then removed (the nodes along the way are freed back to the slab allocator), and the layer-n node becomes the new root node:


    
    
    static inline void radix_tree_shrink(struct radix_tree_root *root)
    {
        /* Try to shrink tree height */
        while (root->height > 0) {
            struct radix_tree_node *to_free = root->rnode;
            void *newptr;

            BUG_ON(!radix_tree_is_indirect_ptr(to_free));
            to_free = radix_tree_indirect_to_ptr(to_free);

            /* Stop if the current node does not have exactly one child. */
            if (to_free->count != 1)
                break;
            /* Stop if the only child is not in slot 0. */
            if (!to_free->slots[0])
                break;

            /* newptr holds the pointer to the only child of to_free. */
            newptr = to_free->slots[0];
            if (root->height > 1)
                newptr = radix_tree_ptr_to_indirect(newptr);
            /* The child becomes the new root node. */
            root->rnode = newptr;
            root->height--;    /* the height of the tree decreases */
            /* Must only free zeroed nodes into the slab */
            tag_clear(to_free, 0, 0);
            tag_clear(to_free, 1, 0);
            /* Release the to_free node. */
            to_free->slots[0] = NULL;
            to_free->count = 0;
            radix_tree_node_free(to_free);
        }
    }
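
A user-space sketch of the same loop is given below. It keeps the structure of the kernel function but drops tags and indirect pointers and uses free() directly; the toy_* names are hypothetical. As in the kernel, a tree of height 0 stores its single item directly in the root pointer, which is why rnode is a void pointer here.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_SHIFT 4u
#define MAP_SIZE  (1u << MAP_SHIFT)

struct toy_node { unsigned int count; void *slots[MAP_SIZE]; };

struct toy_root {
    unsigned int height;
    void *rnode;     /* a toy_node when height > 0, the item itself when height == 0 */
};

/* Remove single-child nodes from the top of the tree while slot 0 is the only child. */
static void toy_shrink(struct toy_root *root)
{
    assert(root != NULL);
    while (root->height > 0) {
        struct toy_node *to_free = root->rnode;

        if (to_free == NULL || to_free->count != 1)
            break;                        /* more than one child: cannot shrink */
        if (to_free->slots[0] == NULL)
            break;                        /* the only child is not in slot 0 */

        root->rnode = to_free->slots[0];  /* the child becomes the new root */
        root->height--;
        free(to_free);
    }
}
```

Shrinking is the inverse of extension: it strips leading zero segments from the path, which is only valid while every stored index still fits in the reduced height.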

(6) Looking up an object by page index:


    
    
    void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
    {
        unsigned int height, shift;
        struct radix_tree_node *node, **slot;

        node = rcu_dereference(root->rnode);
        ...
        height = node->height;
        if (index > radix_tree_maxindex(height))
            return NULL;

        /* Set the initial shift count. */
        shift = (height - 1) * RADIX_TREE_MAP_SHIFT;

        /*
         * Search layer by layer from the top down. The index, the shift and
         * the bit mask give the slot index; the current node and the slot
         * index give the slot; node then moves to the lower-layer node that
         * the slot points to. Finally the shift and the current height are
         * adjusted. The loop ends at layer 0, where node points to the object.
         */
        do {
            slot = (struct radix_tree_node **)
                (node->slots + ((index >> shift) & RADIX_TREE_MAP_MASK));
            node = rcu_dereference(*slot);
            if (node == NULL)
                return NULL;

            shift -= RADIX_TREE_MAP_SHIFT;
            height--;
        } while (height > 0);

        return node;
    }

2.2 linux/lib/idr.c

The idr mechanism is essentially a set of radix tree methods, but it adds the ability to find a free id, so it cannot simply reuse the mechanism above.

Code analysis is omitted.

3. Peripheral functions

3.1 File page cache

The add_to_page_cache() function calls radix_tree_insert() to insert a page at the given position in the page-cache radix tree of the given file. find_get_page() looks up the page with the given index in the radix tree of the address space. Both functions are in linux/mm/filemap.c; the first writes to the radix tree and the second reads from it. Because the radix tree functions in radix-tree.c do no synchronisation of their own, the peripheral functions that call them must provide it. These two use the read-write lock of the address space: add_to_page_cache() takes write_lock_irq(&mapping->tree_lock) before calling radix_tree_insert(), and find_get_page() takes read_lock_irq(&mapping->tree_lock) to lock for reading.

3.2 idr mechanism of ipc

ipc_findkey() calls idr_find() repeatedly, walking the ids upward from 0 until it finds the object with the specified key. ipc_addid() calls idr_get_new() to add the object to the idr tree and returns the id that corresponds to its position. ipc_rmid() calls idr_remove() to delete the object with the specified id from the idr tree. These functions are in linux/ipc/util.c. Their synchronisation is guaranteed by the read-write semaphore taken in the outermost IPC function. For example, the call path of ipc_rmid() is shm_close() -> shm_destroy() -> shm_rmid() -> ipc_rmid(), and shm_close() takes down_write(&shm_ids(ns).rw_mutex) to write-lock the whole set of shared memory ids. This sacrifices some concurrency but keeps the data consistent; future versions will probably use finer-grained locks or a better concurrency mechanism. Similarly, the call path of ipc_addid() is sys_shmget() -> ipcget() -> ipcget_new() -> newseg() -> shm_addid() -> ipc_addid(), and ipcget_new() likewise takes down_write(&ids->rw_mutex) to write-lock the entire ids structure.

4. Conclusion

Among data structures that locate an object by id, the fixed array is the most direct and the fastest, followed by the hash table whose hash function combines shift and logical operations. However, an array only suits cases where the number of objects changes little or the maximum number of objects is small, and it does not suit sparsely distributed objects, where it wastes a great deal of memory. A hash table must lock the whole table for every lookup, insertion, or deletion, so heavy sharing leads to poor concurrent performance; it also lacks a unique mapping between position and id, which makes it unsuitable where ids have to be generated automatically. The radix tree combines the strengths of both. Its lookup performance is within an acceptable range, its memory consumption is modest, and it is dynamic: it can shrink or grow as needed. More importantly, like an array it has a unique mapping between position and id, so an id is easy to generate when a new object is added, which a hash table cannot offer. In addition, many such trees can be created in the system side by side, which improves concurrency.
