Application Analysis of the Linux Kernel Radix Tree

--lvyilong316

A radix tree can be seen as a trie keyed on bit strings. It is a multi-way tree, similar to a multi-level index table: each intermediate node contains an array of pointers to child nodes, and the leaf nodes contain pointers to the actual objects. (Because the objects themselves carry no tree node structure, their parent nodes are treated as the leaf nodes.)

Figure 1 shows a sample radix tree. Its fan-out is 4 (2^2) and its height is 4. The tree locates a page from an 8-bit offset within a file, so it can address 4x4x4x4 = 256 pages (the number of leaf slots). For example, the paths to the two leaves marked with dashed lines in the figure are formed by the binary values 00000010 and 11111010, and lead to the cached pages at those offsets within the file.


Figure 1
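The path through a tree like the one in Figure 1 can be worked out by hand. The following is a minimal sketch, assuming 2 bits per level and a height of 4 as in the figure; the names slot_path, MAP_SHIFT, MAP_MASK and HEIGHT are made up for this illustration and are not kernel identifiers.

```c
#include <assert.h>

#define MAP_SHIFT 2                        /* bits consumed per level (fan-out 4) */
#define MAP_MASK  ((1u << MAP_SHIFT) - 1)  /* 0x3 */
#define HEIGHT    4                        /* number of levels in the sample tree */

/* Fill path[0..HEIGHT-1] with the slot index used at each level,
 * from the root (path[0]) down to the bottom level. */
static void slot_path(unsigned int offset, unsigned int path[HEIGHT])
{
    assert(offset < (1u << (MAP_SHIFT * HEIGHT)));  /* must fit in 8 bits */
    for (int level = 0; level < HEIGHT; level++) {
        unsigned int shift = (HEIGHT - 1 - level) * MAP_SHIFT;
        path[level] = (offset >> shift) & MAP_MASK;
    }
}
```

For the offset 00000010 this gives the slot sequence 0, 0, 0, 2, and for 11111010 it gives 3, 3, 2, 2: each pair of bits, read from the high end, selects one slot.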

In the Linux kernel, the radix tree maps an object's handle ID or a page index to a pointer to the object (more precisely, to a path made up of a series of pointers). It does so by splitting the ID into segments, each of which is an index into the pointer array of the node at one level (this array index is called the slot below). A segment is usually obtained by shifting the ID right by a given number of bits and ANDing the result with a bitmask of the given length, as in (id >> n) & IDR_MASK. For example, a 32-bit ID split into 4-bit segments yields 8 bit strings of 4 bits each, which serve, from high to low, as the slot indexes at levels 1 to 8. The slot index at one level selects the pointer to the node at the next level, and so on down to the last level, where the slot points to the final object. As shown in Figure 2, an 8-bit ID split into 4-bit segments forms a 2-level radix tree whose bottom level has (2^4) * (2^4) = 2^8 = 256 leaf slots, so the tree can hold 256 objects and the maximum object ID is 256 - 1 = 255 (IDs start from 0).

Figure 2

Seen this way, looking up an object in a radix tree is slightly slower than in a fixed array, but it trades time for space. It suits sets of nodes that change dynamically, and its time complexity is acceptable: O(log_(2^n) N), where N is the number of indexable objects, 2^n is the number of pointer slots per node, and n is the bit length of the segment mask.
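The logarithm above is simply the number of levels a lookup has to walk. As a rough sketch (the function name levels_needed is an assumption of this example, not a kernel function), the level count is the index width divided by the bits consumed per level, rounded up:

```c
#include <assert.h>

/* Number of tree levels needed to cover an index of index_bits bits
 * when every level consumes bits_per_level bits (ceiling division). */
static unsigned int levels_needed(unsigned int index_bits,
                                  unsigned int bits_per_level)
{
    assert(bits_per_level > 0);
    return (index_bits + bits_per_level - 1) / bits_per_level;
}
```

With a 32-bit index, 4 bits per level give 8 levels and 6 bits per level give 6 levels; the 8-bit examples in Figures 1 and 2 need 4 levels (2 bits each) and 2 levels (4 bits each).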

1. Applications of the radix tree in the Linux kernel

1.1 File page cache management

In earlier kernel versions (such as 2.4.0), the file page cache was organized in a single shared hash table, page_hash_table (hashed on the index of the cached page). The hash table could find a given page of a given file fairly quickly and cost little extra memory. Its drawback is equally clear: every file reaches its cached pages through the same hash table, and lookups are serialized by the spin lock pagecache_lock, which lowers the concurrent access performance of multiple processes and is intolerable in some situations. The 2.6 kernel therefore manages cached pages through the address space of each file, so that page lookups on different files do not interfere with each other, which improves concurrency. In the Linux 2.6 kernel the pages of a file are managed by a radix tree, and a page's index determines its position in the tree. The data structure of the file address space object is as follows:

    struct address_space {
        struct inode *host;              /* owner: inode, block_device */
        struct radix_tree_root page_tree;
        ...
    };

Here page_tree is the root of the radix tree, a radix_tree_root structure:

    struct radix_tree_root {
        unsigned int height;             /* height of the tree */
        gfp_t gfp_mask;
        struct radix_tree_node *rnode;   /* points to the root node of the radix tree */
    };

rnode points to the root node of the radix tree, and the root node is a radix_tree_node structure:

    struct radix_tree_node {
        unsigned int height;             /* height from the bottom */
        unsigned int count;
        struct rcu_head rcu_head;
        void *slots[RADIX_TREE_MAP_SIZE];
        unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
    };

height is the height of the node, count is the number of child nodes (that is, non-empty slots), and slots is the array of child pointers (for leaf nodes, the entries point to the corresponding page structures). The tags array uses bitmaps to record whether each subtree contains a page carrying the corresponding flag. The two flags are dirty and writeback:

    #define PAGECACHE_TAG_DIRTY     0
    #define PAGECACHE_TAG_WRITEBACK 1
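To make the role of the tags array concrete, here is a small user-space sketch of one bitmap per tag with one bit per slot. It assumes 64 slots per node; the toy_ names are invented for this example and only mirror the idea, not the kernel's actual helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_MAP_SIZE   64   /* slots per node (assumed, as with a 6-bit shift) */
#define TOY_MAX_TAGS   2    /* dirty and writeback */
#define TOY_LONG_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define TOY_TAG_LONGS  ((TOY_MAP_SIZE + TOY_LONG_BITS - 1) / TOY_LONG_BITS)

enum { TOY_TAG_DIRTY = 0, TOY_TAG_WRITEBACK = 1 };

struct toy_node {
    /* tags[t] is a bitmap: bit s is set if slot s carries tag t */
    unsigned long tags[TOY_MAX_TAGS][TOY_TAG_LONGS];
};

static void toy_tag_set(struct toy_node *n, int tag, unsigned int slot)
{
    assert(slot < TOY_MAP_SIZE);
    n->tags[tag][slot / TOY_LONG_BITS] |= 1UL << (slot % TOY_LONG_BITS);
}

static void toy_tag_clear(struct toy_node *n, int tag, unsigned int slot)
{
    assert(slot < TOY_MAP_SIZE);
    n->tags[tag][slot / TOY_LONG_BITS] &= ~(1UL << (slot % TOY_LONG_BITS));
}

static int toy_tag_get(const struct toy_node *n, int tag, unsigned int slot)
{
    assert(slot < TOY_MAP_SIZE);
    return (int)((n->tags[tag][slot / TOY_LONG_BITS] >> (slot % TOY_LONG_BITS)) & 1UL);
}

/* Nonzero if any slot of the node carries the tag. A parent would set
 * its own bit for this child when this returns nonzero. */
static int toy_any_tag_set(const struct toy_node *n, int tag)
{
    for (size_t i = 0; i < TOY_TAG_LONGS; i++)
        if (n->tags[tag][i])
            return 1;
    return 0;
}
```

Because a parent's bit summarizes a whole subtree, a search for dirty pages can skip every subtree whose bit is clear instead of visiting each page.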

1.2 Inter-process communication (IPC) object management

In earlier kernels (such as 2.6.11), IPC objects (such as the shared-memory object shm) were managed with a fixed array, with the object ID as the array subscript. The drawback is that when the number of objects grows sharply and the array runs out of room, a new array must be allocated through grow_ary() and the contents copied from the old array to the new one. When the number of objects fluctuates widely, the array is allocated and released frequently, which hurts performance. From 2.6.24 on, IPC objects are therefore managed with the radix-tree-based IDR. Locating an object through the IDR is not as direct as with an array (the time complexity is the height of the tree), but in exchange it behaves well dynamically: adding an object requires no large memory allocation, only the creation of one or a few tree nodes (a few when the tree is extended). Finding a free ID also performs better than with an array, which directly affects the speed of inserting new objects. The idr structure manages the radix tree that holds the IPC objects, indexed by object ID:

    struct idr {
        struct idr_layer *top;
        struct idr_layer *id_free;
        int layers;
        int id_free_cnt;
        spinlock_t lock;
    };

Here idr_layer is the tree node structure, top points to the root node, layers is the height of the tree, id_free maintains a temporary list of free nodes, and id_free_cnt is the number of nodes in that free list.
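The idea of handing out the lowest free ID can be illustrated with a deliberately simplified, single-level sketch. This is not the IDR implementation: the structure toy_ids, its 32-ID capacity and the function names are all assumptions made for this example, and a real IDR spreads the same bookkeeping over several tree levels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IDS 32                 /* capacity of this single-level toy */

struct toy_ids {
    uint32_t used;                 /* bit i set => ID i is allocated */
    void *objs[TOY_IDS];           /* object stored under each ID */
};

/* Store obj under the lowest free ID and return that ID, or -1 if full. */
static int toy_get_new(struct toy_ids *t, void *obj)
{
    for (int id = 0; id < TOY_IDS; id++) {
        if (!(t->used & (UINT32_C(1) << id))) {
            t->used |= UINT32_C(1) << id;
            t->objs[id] = obj;
            return id;
        }
    }
    return -1;
}

/* Return the object stored under id, or NULL if the ID is not in use. */
static void *toy_find(const struct toy_ids *t, int id)
{
    if (id < 0 || id >= TOY_IDS || !(t->used & (UINT32_C(1) << id)))
        return NULL;
    return t->objs[id];
}

/* Release id so that a later toy_get_new() can hand it out again. */
static void toy_remove(struct toy_ids *t, int id)
{
    assert(id >= 0 && id < TOY_IDS);
    t->used &= ~(UINT32_C(1) << id);
    t->objs[id] = NULL;
}
```

After allocating IDs 0, 1 and 2 and removing ID 1, the next allocation returns 1 again: the position in the structure and the ID are the same thing, which is what makes automatic ID generation cheap.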

2. Code analysis

2.1 linux/lib/radix-tree.c

(1) Conversion of tree height and maximum index:

    static __init unsigned long __maxindex(unsigned int height)
    {
        /* When RADIX_TREE_MAP_SHIFT is 6, each node has 2^6 = 64 slots;
         * when it is 4, each node has 2^4 = 16 slots. */
        unsigned int width = height * RADIX_TREE_MAP_SHIFT;
        int shift = RADIX_TREE_INDEX_BITS - width;

        if (shift < 0)
            return ~0UL;
        if (shift >= BITS_PER_LONG)
            return 0UL;
        return ~0UL >> shift;
    }

The total index width is obtained from the tree height (the number of levels) and the number of index bits per node (RADIX_TREE_MAP_SHIFT), and the width is then converted into the maximum index (that is, the number of leaves minus 1). For example, the maximum value of a 32-bit index is 2^32 - 1, and the maximum index of a two-level tree with 4 index bits per level is 2^(2*4) - 1. (With 4 index bits, each node has 2^4 = 16 branches (slots), so the second level has 2^(2*4) slots; numbering them from 0 starting at the leftmost gives a maximum index of 2^(2*4) - 1.) Calling this function in a loop yields the maximum index for every tree height, and the results are stored in the static array height_to_maxindex. This is done during initialization through radix_tree_init() -> radix_tree_init_maxindex().
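The table-filling loop described above can be tried out in user space. The sketch below assumes 6 index bits per level and uses the width of unsigned long as the index width; the toy_ names and TOY_MAX_PATH are stand-ins chosen for this example rather than the kernel's own definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOY_MAP_SHIFT   6                                  /* index bits per level */
#define TOY_INDEX_BITS  (CHAR_BIT * sizeof(unsigned long))
#define TOY_MAX_PATH    ((TOY_INDEX_BITS + TOY_MAP_SHIFT - 1) / TOY_MAP_SHIFT)

/* Maximum index for each tree height, filled by toy_init_maxindex(). */
static unsigned long toy_height_to_maxindex[TOY_MAX_PATH + 1];

/* Same arithmetic as __maxindex(): keep the low height*shift bits set. */
static unsigned long toy_maxindex(unsigned int height)
{
    unsigned int width = height * TOY_MAP_SHIFT;
    int shift = (int)TOY_INDEX_BITS - (int)width;

    assert(height <= TOY_MAX_PATH);
    if (shift < 0)
        return ~0UL;                /* tree already covers every index */
    if (shift >= (int)TOY_INDEX_BITS)
        return 0UL;                 /* height 0: only index 0 */
    return ~0UL >> shift;
}

static void toy_init_maxindex(void)
{
    for (size_t i = 0; i <= TOY_MAX_PATH; i++)
        toy_height_to_maxindex[i] = toy_maxindex((unsigned int)i);
}
```

After toy_init_maxindex(), entry 0 is 0, entry 1 is 63, entry 2 is 4095, and the last entry is the largest unsigned long, so a later range check is a single array lookup.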

(2) Inserting an object:

The parameter root points to the root of the tree, index is the page index, and item is the object to insert:

    int radix_tree_insert(struct radix_tree_root *root, unsigned long index, void *item)
    {
        struct radix_tree_node *node = NULL, *slot;
        unsigned int height, shift;
        int offset;
        int error;

        BUG_ON(radix_tree_is_indirect_ptr(item));

        /* If the index exceeds the maximum index of the tree, call
         * radix_tree_extend() to grow the tree until its maximum index
         * can hold the index passed in. */
        if (index > radix_tree_maxindex(root->height)) {
            error = radix_tree_extend(root, index);
            if (error)
                return error;
        }

        slot = radix_tree_indirect_to_ptr(root->rnode);

        /* height is the current tree height; shift is the initial
         * right-shift count applied to the page index. */
        height = root->height;
        shift = (height - 1) * RADIX_TREE_MAP_SHIFT;

        offset = 0;    /* uninitialised var warning */

        /* Walk down by index from level height to level 1,
         * allocating subtree nodes on demand. */
        while (height > 0) {
            /* A NULL slot means an intermediate node must be allocated. */
            if (slot == NULL) {
                /* Have to add a child node. */
                if (!(slot = radix_tree_node_alloc(root)))    /* allocate a node from the slab allocator */
                    return -ENOMEM;
                slot->height = height;    /* set the node height */

                /* If node is not NULL, the new node is its child;
                 * otherwise the new node is the root node. */
                if (node) {
                    /* store the new node pointer in slot offset of node */
                    rcu_assign_pointer(node->slots[offset], slot);
                    node->count++;    /* node has one more child */
                } else
                    rcu_assign_pointer(root->rnode, radix_tree_ptr_to_indirect(slot));
            }

            /* Go one level down: compute this level's slot from the index,
             * move node and slot down (slot becomes a child of node),
             * adjust the shift count and decrement the height. With a
             * 32-bit index and 4-bit keys, the top level uses the highest
             * 4 bits, the second level the next 4 bits, and so on. */
            offset = (index >> shift) & RADIX_TREE_MAP_MASK;
            node = slot;
            slot = node->slots[offset];
            shift -= RADIX_TREE_MAP_SHIFT;
            height--;
        }

        /* node is now the level-1 (bottom) node. Point the slot selected
         * by the index at item, which completes the insertion. */
        if (node) {
            node->count++;
            rcu_assign_pointer(node->slots[offset], item);
        ...
    }
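The on-demand allocation in the loop above can be reproduced in a simplified user-space sketch. It assumes a fixed tree height (no extension), 4 bits per level, plain calloc() instead of the slab allocator, and no RCU or indirect-pointer tagging; all toy_ names are hypothetical, and the check for an already occupied slot is an addition of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MAP_SHIFT 4
#define MAP_SIZE  (1u << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

struct toy_node {
    unsigned int count;            /* number of non-empty slots */
    void *slots[MAP_SIZE];         /* children, or items at the bottom level */
};

struct toy_root {
    unsigned int height;           /* fixed in this sketch; must be >= 1 */
    struct toy_node *rnode;
};

/* Insert item at index. Returns 0 on success, -1 if the index does not
 * fit, the slot is already occupied, or memory allocation fails. */
static int toy_insert(struct toy_root *root, unsigned long index, void *item)
{
    struct toy_node *node = NULL, *slot = root->rnode;
    unsigned int height = root->height;
    unsigned int shift, offset = 0;

    assert(height >= 1 && height * MAP_SHIFT < sizeof(unsigned long) * CHAR_BIT);
    if (index >> (height * MAP_SHIFT))
        return -1;                           /* index too large for this height */

    shift = (height - 1) * MAP_SHIFT;
    while (height > 0) {
        if (slot == NULL) {                  /* missing node on the path: allocate it */
            slot = calloc(1, sizeof(*slot));
            if (slot == NULL)
                return -1;
            if (node) {                      /* hook it into its parent ... */
                node->slots[offset] = slot;
                node->count++;
            } else {                         /* ... or make it the root */
                root->rnode = slot;
            }
        }
        offset = (index >> shift) & MAP_MASK;
        node = slot;
        slot = node->slots[offset];
        shift -= MAP_SHIFT;                  /* value is unused after the last level */
        height--;
    }
    if (slot != NULL)
        return -1;                           /* something is already stored here */
    node->slots[offset] = item;
    node->count++;
    return 0;
}

/* Walk the same path read-only; returns the item or NULL. */
static void *toy_lookup(const struct toy_root *root, unsigned long index)
{
    void *p = root->rnode;
    unsigned int height = root->height;

    if (index >> (height * MAP_SHIFT))
        return NULL;
    while (p != NULL && height > 0) {
        unsigned int shift = (height - 1) * MAP_SHIFT;
        p = ((struct toy_node *)p)->slots[(index >> shift) & MAP_MASK];
        height--;
    }
    return p;
}
```

With a height of 2, inserting at index 0x12 allocates the root and the bottom node reached through root slot 1, then stores the item in slot 2 of that bottom node; a second insert at 0xFF reuses the root and allocates only one more node.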

(3) Deletion of objects:

    void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
    {
        /* The path array records the node pointers and slot indexes along
         * the search path. Its length is the maximum path length (the
         * maximum tree height) + 1; the extra entry holds a NULL pointer
         * that acts as a sentinel. */
        struct radix_tree_path path[RADIX_TREE_MAX_PATH + 1], *pathp = path;
        struct radix_tree_node *slot = NULL;
        struct radix_tree_node *to_free;
        unsigned int height, shift;
        int tag;
        int offset;

        /* height is initialized to the tree height */
        height = root->height;

        /* check whether the index to delete is outside the range of the tree */
        if (index > radix_tree_maxindex(height))
            goto out;

        /* slot starts at the root node; in the code below it always
         * points to an intermediate node */
        slot = root->rnode;

        /* a tree of height 0 has no intermediate nodes: clear the root
         * and return directly */
        if (height == 0) {
            root_tag_clear_all(root);
            root->rnode = NULL;
            goto out;
        }

        slot = radix_tree_indirect_to_ptr(slot);

        /* shift is the number of bits the index currently has to be shifted */
        shift = (height - 1) * RADIX_TREE_MAP_SHIFT;

        /* entry 0 of the path array has a NULL node and serves as the sentinel */
        pathp->node = NULL;

        /* Walk from the root node down to the object for this index,
         * recording the nodes and slots along the way in the array
         * that pathp points into. */
        do {
            if (slot == NULL)    /* NULL pointer on the way: the object does not exist */
                goto out;

            pathp++;    /* advance; pathp->offset holds the slot index in the
                         * current node, pathp->node holds the current node */
            offset = (index >> shift) & RADIX_TREE_MAP_MASK;
            pathp->offset = offset;
            pathp->node = slot;

            /* fetch the pointer to the next node and adjust the shift count */
            slot = slot->slots[offset];
            shift -= RADIX_TREE_MAP_SHIFT;
            height--;
        } while (height > 0);

        if (slot == NULL)
            goto out;

        ...

        to_free = NULL;

        /* Walk back through the records in path, from the parent of the
         * deleted object toward the root: clear the corresponding slot
         * pointer (clearing the slot in the bottom node removes the object
         * from the tree), decrement the child count, and free nodes whose
         * slots are all empty. The loop ends in one of two cases: (1) the
         * root node has been reached and processed, or (2) a node whose
         * child count is not 0 is encountered. */
        while (pathp->node) {
            pathp->node->slots[pathp->offset] = NULL;
            pathp->node->count--;

            /* Queue the node for deferred freeing after the last
             * reference to it disappears (set NULL, above). */
            if (to_free)
                radix_tree_node_free(to_free);

            /* If the node still has children: if it is the root node, call
             * radix_tree_shrink() to try to shrink the tree; then leave
             * the loop. */
            if (pathp->node->count) {
                if (pathp->node == radix_tree_indirect_to_ptr(root->rnode))
                    radix_tree_shrink(root);
                goto out;
            }

            /* Node with zero slots in use, so free it */
            to_free = pathp->node;
            pathp--;
        }

        /* Reaching this point means the tree no longer contains any object
         * and has become empty: free the root node held in to_free, set
         * the tree height to 0 and clear the root pointer. */
        root_tag_clear_all(root);
        root->height = 0;
        root->rnode = NULL;
        if (to_free)
            radix_tree_node_free(to_free);

    out:
        return slot;
    }
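The path array and the bottom-up cleanup can be tried out in a reduced user-space sketch. It assumes a fixed two-level tree with 4 bits per level, uses calloc()/free() in place of the slab allocator, and leaves out tags, RCU and shrinking; the toy_ names and the toy_nodes_live counter are inventions of this example.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_SHIFT 4
#define MAP_SIZE  (1u << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)
#define HEIGHT    2                       /* fixed height: indexes 0..255 */

struct toy_node { unsigned int count; void *slots[MAP_SIZE]; };
struct toy_path { struct toy_node *node; unsigned int offset; };

static int toy_nodes_live;                /* allocated nodes, for illustration */

static struct toy_node *toy_alloc(void)
{
    struct toy_node *n = calloc(1, sizeof(*n));
    if (n != NULL)
        toy_nodes_live++;
    return n;
}

static void toy_free(struct toy_node *n) { free(n); toy_nodes_live--; }

/* Helper hard-wired for HEIGHT == 2, only used to build a tree to delete from. */
static int toy_insert(struct toy_node **rootp, unsigned int index, void *item)
{
    unsigned int hi = (index >> MAP_SHIFT) & MAP_MASK, lo = index & MAP_MASK;
    struct toy_node *top, *leaf;

    if (index >> (HEIGHT * MAP_SHIFT))
        return -1;
    if (*rootp == NULL && (*rootp = toy_alloc()) == NULL)
        return -1;
    top = *rootp;
    if (top->slots[hi] == NULL) {
        if ((top->slots[hi] = toy_alloc()) == NULL)
            return -1;
        top->count++;
    }
    leaf = top->slots[hi];
    if (leaf->slots[lo] == NULL)
        leaf->count++;
    leaf->slots[lo] = item;
    return 0;
}

/* Remove and return the item at index, freeing nodes that become empty. */
static void *toy_delete(struct toy_node **rootp, unsigned int index)
{
    struct toy_path path[HEIGHT + 1], *pathp = path;
    struct toy_node *to_free = NULL;
    void *slot;
    unsigned int height = HEIGHT, shift = (HEIGHT - 1) * MAP_SHIFT;

    assert(rootp != NULL);
    slot = *rootp;
    if (index >> (HEIGHT * MAP_SHIFT))
        return NULL;
    pathp->node = NULL;                   /* entry 0 is the sentinel */
    do {                                  /* descend, recording the path */
        if (slot == NULL)
            return NULL;
        pathp++;
        pathp->offset = (index >> shift) & MAP_MASK;
        pathp->node = slot;
        slot = pathp->node->slots[pathp->offset];
        shift -= MAP_SHIFT;               /* value is unused after the last level */
        height--;
    } while (height > 0);
    if (slot == NULL)
        return NULL;

    while (pathp->node) {                 /* climb back toward the root */
        pathp->node->slots[pathp->offset] = NULL;
        pathp->node->count--;
        if (to_free)
            toy_free(to_free);
        if (pathp->node->count)
            return slot;                  /* node still in use: stop here */
        to_free = pathp->node;
        pathp--;
    }
    *rootp = NULL;                        /* the whole tree became empty */
    if (to_free)
        toy_free(to_free);
    return slot;
}
```

Inserting at 0x12 and 0x34 creates three nodes (the root and two bottom nodes). Deleting 0x12 frees one bottom node and stops at the root, which still has a child; deleting 0x34 afterwards frees the remaining bottom node and the root.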

(4) Extension of the tree:

    static int radix_tree_extend(struct radix_tree_root *root, unsigned long index)
    {
        struct radix_tree_node *node;
        unsigned int height;
        int tag;

        /* start from the current height plus 1 */
        height = root->height + 1;

        /* Keep comparing the index with the maximum index of a tree of
         * this height, increasing the height until the tree can hold an
         * object with the given index. */
        while (index > radix_tree_maxindex(height))
            height++;

        /* For an empty tree, nodes are allocated later; only the tree
         * height is adjusted here. */
        if (root->rnode == NULL) {
            root->height = height;
            goto out;
        }

        /* Add a single-branch chain of nodes above the original root until
         * the tree reaches the required height. The root is replaced by the
         * top of the new chain, and the bottom node of the chain points to
         * the original root node. In each new node only slot 0 holds a
         * child pointer and all other slots are NULL, so the chain is a
         * leftmost single-branch tree. This is equivalent to prepending a
         * run of zero bits to the high end of every existing ID, so each
         * existing object's ID still maps to the correct position in the
         * new tree. */
        do {
            unsigned int newheight;

            if (!(node = radix_tree_node_alloc(root)))
                return -ENOMEM;

            /* Increase the height. */
            node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);

            /* Propagate the aggregated tag info into the new root */
            for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
                if (root_tag_get(root, tag))
                    tag_set(node, tag, 0);
            }

            newheight = root->height + 1;
            node->height = newheight;
            node->count = 1;
            node = radix_tree_ptr_to_indirect(node);
            rcu_assign_pointer(root->rnode, node);
            root->height = newheight;
        } while (height > root->height);

    out:
        return 0;
    }
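That extending the tree leaves existing indexes valid can be checked with a small user-space sketch. It assumes 4 bits per level and omits tags, RCU and indirect pointers; toy_extend, toy_lookup and toy_maxindex are hypothetical names used only here.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define MAP_SHIFT 4
#define MAP_SIZE  (1u << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

struct toy_node { unsigned int height, count; void *slots[MAP_SIZE]; };
struct toy_root { unsigned int height; struct toy_node *rnode; };

/* Largest index a tree of the given height can hold. */
static unsigned long toy_maxindex(unsigned int height)
{
    unsigned int bits = height * MAP_SHIFT;

    if (bits >= sizeof(unsigned long) * CHAR_BIT)
        return ~0UL;
    return (1UL << bits) - 1;
}

/* Grow the tree until it can hold index. Returns 0, or -1 on allocation failure. */
static int toy_extend(struct toy_root *root, unsigned long index)
{
    unsigned int height;

    assert(root != NULL);
    height = root->height + 1;
    while (index > toy_maxindex(height))
        height++;

    if (root->rnode == NULL) {            /* empty tree: just record the height */
        root->height = height;
        return 0;
    }
    do {                                  /* stack new roots on top, old root in slot 0 */
        struct toy_node *node = calloc(1, sizeof(*node));
        if (node == NULL)
            return -1;
        node->slots[0] = root->rnode;
        node->height = root->height + 1;
        node->count = 1;
        root->rnode = node;
        root->height = node->height;
    } while (height > root->height);
    return 0;
}

/* Plain top-down lookup; returns the item or NULL. */
static void *toy_lookup(const struct toy_root *root, unsigned long index)
{
    void *p = root->rnode;
    unsigned int height = root->height;

    if (index > toy_maxindex(height))
        return NULL;
    while (p != NULL && height > 0) {
        unsigned int shift = (height - 1) * MAP_SHIFT;
        p = ((struct toy_node *)p)->slots[(index >> shift) & MAP_MASK];
        height--;
    }
    return p;
}
```

Starting from a one-level tree with an item at index 5, extending for index 0x123 raises the height to 3. Index 5 now reads as the slot sequence 0, 0, 5, so the lookup passes through slot 0 of both new nodes and still reaches the same item.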

(5) Shrinking the tree:

Starting from the root node, check for nodes in which every slot pointer except slot 0 is NULL, until the node at level n fails this condition. The single-branch chain of levels 1 to n-1 is then removed, the nodes along the way are released to the slab allocator, and the level-n node becomes the new root node:

    static inline void radix_tree_shrink(struct radix_tree_root *root)
    {
        /* try to shrink tree height */
        while (root->height > 0) {
            struct radix_tree_node *to_free = root->rnode;
            void *newptr;

            BUG_ON(!radix_tree_is_indirect_ptr(to_free));
            to_free = radix_tree_indirect_to_ptr(to_free);

            /* leave the loop if the node does not have exactly one child */
            if (to_free->count != 1)
                break;

            /* also leave the loop if the only child is not in slot 0 */
            if (!to_free->slots[0])
                break;

            /* newptr holds the pointer to the only child of to_free */
            newptr = to_free->slots[0];
            if (root->height > 1)
                newptr = radix_tree_ptr_to_indirect(newptr);

            /* the child becomes the new root node */
            root->rnode = newptr;
            root->height--;    /* the tree height decreases by 1 */

            /* must only free zeroed nodes into the slab */
            tag_clear(to_free, 0, 0);
            tag_clear(to_free, 1, 0);

            /* release the to_free node */
            to_free->slots[0] = NULL;
            to_free->count = 0;
            radix_tree_node_free(to_free);
        }
    }
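The shrinking loop can be exercised on a hand-built chain. The following user-space sketch assumes 16 slots per node and drops tags, RCU and indirect pointers; the names toy_shrink and toy_single_child are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_SIZE 16

struct toy_node { unsigned int count; void *slots[MAP_SIZE]; };
struct toy_root { unsigned int height; void *rnode; };

/* Build a node whose only child sits in slot 0 (test scaffolding). */
static struct toy_node *toy_single_child(void *child)
{
    struct toy_node *n = calloc(1, sizeof(*n));

    if (n != NULL) {
        n->slots[0] = child;
        n->count = 1;
    }
    return n;
}

/* Remove root nodes that only forward to slot 0, lowering the height. */
static void toy_shrink(struct toy_root *root)
{
    assert(root != NULL);
    while (root->height > 0) {
        struct toy_node *to_free = root->rnode;

        if (to_free == NULL || to_free->count != 1)
            break;                        /* zero or several children: keep it */
        if (to_free->slots[0] == NULL)
            break;                        /* the only child is not in slot 0 */
        root->rnode = to_free->slots[0];  /* the child becomes the new root */
        root->height--;
        free(to_free);
    }
}
```

For a three-level tree whose two upper nodes use only slot 0 and whose bottom node holds two items, toy_shrink() frees the two upper nodes and leaves a one-level tree rooted at the bottom node.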

(6) Looking up an object by page index:

    void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
    {
        unsigned int height, shift;
        struct radix_tree_node *node, **slot;

        node = rcu_dereference(root->rnode);

        ...

        height = node->height;
        if (index > radix_tree_maxindex(height))
            return NULL;

        /* set the initial shift count */
        shift = (height - 1) * RADIX_TREE_MAP_SHIFT;

        /* Search level by level from top to bottom: compute the slot index
         * from the index, the shift count and the bitmask, then take the
         * slot of the current node at that slot index. node then moves to
         * the lower-level node that the slot points to, and finally the
         * shift count and the current height are adjusted. This repeats
         * until height reaches 0, at which point node points to the
         * object. */
        do {
            slot = (struct radix_tree_node **)(node->slots + ((index >> shift) & RADIX_TREE_MAP_MASK));
            node = rcu_dereference(*slot);
            if (node == NULL)
                return NULL;

            shift -= RADIX_TREE_MAP_SHIFT;
            height--;
        } while (height > 0);

        return node;
    }

2.2 linux/lib/idr.c

The IDR mechanism is essentially a set of methods built on the radix tree. Because it additionally has to find free IDs, it cannot simply reuse the mechanism above.

The detailed code analysis is omitted here.

3. Peripheral functions

3.1 File page cache

add_to_page_cache() calls radix_tree_insert() to insert the given page at the given position into the page cache radix tree of the given file, and find_get_page() looks up the page with the given index in the radix tree of the address space. Both functions are in linux/mm/filemap.c. Their operations on the radix tree are a write and a read, respectively. Because the radix tree functions in radix-tree.c perform no synchronization themselves, the peripheral functions that call them must provide it. These two peripheral functions use the read-write lock of the address space: add_to_page_cache() takes the write lock with write_lock_irq(&mapping->tree_lock) before calling radix_tree_insert(), while find_get_page() takes the read lock with read_lock_irq(&mapping->tree_lock).

3.2 The IDR mechanism of the IPC

ipc_findkey() calls idr_find() to walk the radix tree from ID 0 until it finds the object with the given key; ipc_addid() calls idr_get_new() to add an object to the IDR tree and returns the ID of the position it was placed at; ipc_rmid() calls idr_remove() to remove the object with the given ID from the IDR tree. These functions are in linux/ipc/util.c. Their synchronization is ensured by the outermost IPC functions, which use read-write semaphores. For example, the call path of ipc_rmid() is shm_close() -> shm_destroy() -> shm_rmid() -> ipc_rmid(), and shm_close() uses down_write(&shm_ids(ns).rw_mutex) to write-lock the whole set of shared memory IDs. This sacrifices some concurrency but ensures data consistency; later versions will presumably use finer-grained locks or better concurrency mechanisms. Similarly, the call path of ipc_addid() is sys_shmget() -> ipcget() -> ipcget_new() -> newseg() -> shm_addid() -> ipc_addid(), and ipcget_new() also uses down_write(&ids->rw_mutex) to write-lock the entire ids.

5. Conclusion

For data structures that locate objects by ID, the fixed array is the most direct and the fastest, followed by the hash table, whose hash function combines logical operations and shifts. But an array only suits cases where the number of objects changes little or the maximum number of objects is small, and it does not suit sparsely distributed objects, where the memory waste is serious. A hash table has to lock the entire table for lookups, insertions and deletions, which leads to poor concurrency when it is shared heavily. In addition, a hash table lacks the unique mapping between position and ID that an array has, so it does not suit cases where IDs are generated automatically. The radix tree complements both: its lookup performance is within an acceptable range, its memory consumption is modest, and it is dynamic, shrinking or growing on demand. More importantly, like the array it has a unique mapping between position and ID, which makes it easy to generate an ID while adding a new object, something a hash table cannot do. In addition, many such trees can be created in a system, which also improves concurrency.

Original address: http://blog.chinaunix.net/uid-28541347-id-5018036.html
