Page cache processing functions


Now that we have covered the data structures related to the page cache, let's look at the basic functions that operate on it.

The basic high-level operations on the page cache are finding, adding, and removing a page. Built on top of these, another function ensures that the cache contains an up-to-date version of a specified page.

1. Search Page

 

The find_get_page() function receives as parameters a pointer to an address_space object and an offset value. It acquires the address space's tree_lock (taken as a read lock in the code below) and calls radix_tree_lookup() to search for the leaf node of the radix tree with the specified offset:
struct page *find_get_page(struct address_space *mapping, unsigned long offset)
{
    struct page *page;

    read_lock_irq(&mapping->tree_lock);
    page = radix_tree_lookup(&mapping->page_tree, offset);
    if (page)
        page_cache_get(page);
    read_unlock_irq(&mapping->tree_lock);
    return page;
}

The radix_tree_lookup() function starts from the root of the tree and descends according to the bits of the offset value, as described in the preceding section. If it encounters a NULL pointer, it returns NULL; otherwise it returns the content of the leaf slot, that is, the pointer to the required page descriptor. If the page is found, find_get_page() increments the page's usage counter, releases the lock, and returns the address of the page; otherwise it releases the lock and returns NULL:
void *radix_tree_lookup(struct radix_tree_root *root, unsigned long index)
{
    void **slot;

    slot = __lookup_slot(root, index);
    return slot != NULL ? *slot : NULL;
}

static inline void **__lookup_slot(struct radix_tree_root *root,
                                   unsigned long index)
{
    unsigned int height, shift;
    struct radix_tree_node **slot;

    height = root->height;

    if (index > radix_tree_maxindex(height))
        return NULL;

    if (height == 0 && root->rnode)
        return (void **)&root->rnode;

    shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
    slot = &root->rnode;

    while (height > 0) {
        if (*slot == NULL)
            return NULL;

        slot = (struct radix_tree_node **)
            ((*slot)->slots +
                ((index >> shift) & RADIX_TREE_MAP_MASK));
        shift -= RADIX_TREE_MAP_SHIFT;
        height--;
    }

    return (void **)slot;
}
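
To make the bit slicing in __lookup_slot() concrete, here is a minimal user-space sketch. It assumes 6 index bits per level (64 slots per node), which is the usual value of RADIX_TREE_MAP_SHIFT but is not stated in this article; the sketch_ names are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

/* Assumption: 6 index bits per tree level, i.e. 64 slots per node. */
#define SKETCH_MAP_SHIFT 6U
#define SKETCH_MAP_MASK  ((1UL << SKETCH_MAP_SHIFT) - 1)

/* Largest index that fits in a tree of the given height. */
static unsigned long sketch_maxindex(unsigned int height)
{
    unsigned int bits = height * SKETCH_MAP_SHIFT;

    if (bits >= 8 * sizeof(unsigned long))
        return ~0UL;
    return (1UL << bits) - 1;
}

/* Slot offset chosen at a given level; level 0 is the root node. */
static unsigned long sketch_slot_offset(unsigned long index,
                                        unsigned int height,
                                        unsigned int level)
{
    unsigned int shift;

    assert(level < height);
    shift = (height - 1 - level) * SKETCH_MAP_SHIFT;
    return (index >> shift) & SKETCH_MAP_MASK;
}
```

Under these assumptions, in a tree of height 2 the index 131 selects slot 2 in the root node and slot 3 in the leaf node, because 131 = 2 * 64 + 3.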

The find_get_pages() function is similar to find_get_page(), but it looks up a group of pages with adjacent indexes in the cache. It receives as parameters a pointer to an address_space object, the offset within the address space at which the search starts, the maximum number of pages to retrieve, and a pointer to an array that the function fills with page descriptor addresses. find_get_pages() relies on radix_tree_gang_lookup() to perform the search; that function fills the array of pointers and returns the number of pages found. The returned pages have ascending indexes, although there may be gaps in the sequence because some pages may not be in the page cache.
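
The "ascending order with gaps" behavior can be illustrated with a small sketch. A flat array stands in for the radix tree here, which is a simplification made only for this example; sketch_gang_lookup() is a hypothetical name and does not reproduce the signature of radix_tree_gang_lookup().

```c
#include <assert.h>
#include <stddef.h>

/*
 * cache[i] is non-NULL when page index i is cached.
 * Collects up to max_items cached entries starting at index start,
 * in ascending index order, skipping the holes.
 */
static unsigned int sketch_gang_lookup(void *const cache[], size_t cache_len,
                                       size_t start, unsigned int max_items,
                                       void *results[])
{
    unsigned int found = 0;

    for (size_t i = start; i < cache_len && found < max_items; i++) {
        if (cache[i] != NULL)
            results[found++] = cache[i];
    }
    return found;
}
```

With entries cached at indexes 0, 2, and 5, a lookup starting at index 1 returns the entries for 2 and 5, in that order.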

 

Several other functions also search the page cache. For example, find_lock_page() is similar to find_get_page(), but it increments the usage counter of the returned page and calls lock_page() to set the PG_locked flag; therefore, when the function returns, the caller has exclusive access to the page. In turn, lock_page() blocks the current process if the page is already locked. To do this, it calls __wait_on_bit_lock() on the PG_locked bit. That function puts the current process in the TASK_UNINTERRUPTIBLE state, stores the process descriptor in a wait queue, executes the sync_page method of the address_space object to unplug the request queue of the block device that holds the file, and finally calls schedule() to suspend the process until the PG_locked flag is cleared. The kernel uses unlock_page() to unlock the page and wake up the processes sleeping in the wait queue.

The find_trylock_page() function is similar to find_lock_page(), except that it never blocks: if the requested page is already locked, the function returns an error code. The find_or_create_page() function executes find_lock_page(); however, if the requested page is not found, it allocates a new page and inserts it into the page cache.
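
The non-blocking test-and-set idea behind a "try lock" on PG_locked can be sketched in user space with C11 atomics. This is only an illustration: struct sketch_page and the sketch_ functions are hypothetical, and the sleeping path of lock_page() (wait queue, schedule()) is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_page {
    atomic_int  count;    /* usage counter */
    atomic_flag locked;   /* stands in for the PG_locked bit */
};

static void sketch_page_init(struct sketch_page *page)
{
    atomic_init(&page->count, 0);
    atomic_flag_clear(&page->locked);
}

/* Take a reference on the page. */
static void sketch_page_get(struct sketch_page *page)
{
    atomic_fetch_add(&page->count, 1);
}

/* Try to set the lock bit; returns false at once if it was already set. */
static bool sketch_trylock_page(struct sketch_page *page)
{
    return !atomic_flag_test_and_set(&page->locked);
}

/* Clear the lock bit so that another caller can take it. */
static void sketch_unlock_page(struct sketch_page *page)
{
    atomic_flag_clear(&page->locked);
}
```

A second sketch_trylock_page() call on the same page fails until sketch_unlock_page() has been called, which is the behavior the text describes for a page that is already locked.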

 

2. Add page

The add_to_page_cache() function inserts a new page descriptor into the page cache. It receives as parameters the address page of the page descriptor, the address mapping of the address_space object, the value offset giving the page index within the address space, and the memory allocation flags gfp_mask used when allocating new nodes for the radix tree. The function is implemented as follows:
int add_to_page_cache(struct page *page, struct address_space *mapping,
                      pgoff_t offset, gfp_t gfp_mask)
{
    int error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);

    if (error == 0) {
        write_lock_irq(&mapping->tree_lock);
        error = radix_tree_insert(&mapping->page_tree, offset, page);
        if (!error) {
            page_cache_get(page);
            SetPageLocked(page);
            page->mapping = mapping;
            page->index = offset;
            mapping->nrpages++;
            trace_add_to_page_cache(mapping, offset);
            __inc_zone_page_state(page, NR_FILE_PAGES);
        }
        write_unlock_irq(&mapping->tree_lock);
        radix_tree_preload_end();
    }
    return error;
}

add_to_page_cache() first calls radix_tree_preload(), which disables kernel preemption and fills the per-CPU variable radix_tree_preloads with a few free radix_tree_node structures. The radix_tree_node structures are allocated from the slab allocator cache radix_tree_node_cachep. If radix_tree_preload() fails to preallocate the radix_tree_node structures, add_to_page_cache() terminates and returns the error code -ENOMEM. Otherwise, if radix_tree_preload() succeeds, add_to_page_cache() can be sure that inserting the new page descriptor will not fail for lack of free memory, at least for files up to 64 GB in size:
int radix_tree_preload(gfp_t gfp_mask)
{
    struct radix_tree_preload *rtp;
    struct radix_tree_node *node;
    int ret = -ENOMEM;

    preempt_disable();
    rtp = &__get_cpu_var(radix_tree_preloads);
    while (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
        preempt_enable();
        node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
        if (node == NULL)
            goto out;
        preempt_disable();
        rtp = &__get_cpu_var(radix_tree_preloads);
        if (rtp->nr < ARRAY_SIZE(rtp->nodes))
            rtp->nodes[rtp->nr++] = node;
        else
            kmem_cache_free(radix_tree_node_cachep, node);
    }
    ret = 0;
out:
    return ret;
}
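
The preloading idea (allocate while failure is still harmless, so that a later critical section cannot run out of memory) can be sketched in user space as follows. The pool size, the use of malloc(), and the sketch_ names are assumptions made for this example; the per-CPU variable and the preemption handling of the kernel function are deliberately left out.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_POOL_MAX 4   /* arbitrary pool size chosen for the sketch */

struct sketch_pool {
    unsigned int nr;                    /* number of nodes currently held */
    void *nodes[SKETCH_POOL_MAX];
};

/* Top the pool up; returns 0 on success, -1 if an allocation fails. */
static int sketch_preload(struct sketch_pool *pool, size_t node_size)
{
    while (pool->nr < SKETCH_POOL_MAX) {
        void *node = malloc(node_size);

        if (node == NULL)
            return -1;
        pool->nodes[pool->nr++] = node;
    }
    return 0;
}

/* Free whatever is still held in the pool. */
static void sketch_pool_release(struct sketch_pool *pool)
{
    while (pool->nr > 0)
        free(pool->nodes[--pool->nr]);
}
```

Calling sketch_preload() on a pool that is already full allocates nothing, just as the kernel loop above exits immediately when rtp->nr has reached the array size.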

 

add_to_page_cache() then acquires the mapping->tree_lock lock (with write_lock_irq() in the code above); note that kernel preemption has already been disabled by radix_tree_preload().

Next, it calls radix_tree_insert() to insert the new node into the tree. This function performs the following operations:
int radix_tree_insert(struct radix_tree_root *root,
                      unsigned long index, void *item)
{
    struct radix_tree_node *node = NULL, *slot;
    unsigned int height, shift;
    int offset;
    int error;

    /* Make sure the tree is high enough. */
    if (index > radix_tree_maxindex(root->height)) {
        error = radix_tree_extend(root, index);
        if (error)
            return error;
    }

    slot = root->rnode;
    height = root->height;
    shift = (height - 1) * RADIX_TREE_MAP_SHIFT;

    offset = 0;     /* uninitialised var warning */
    while (height > 0) {
        if (slot == NULL) {
            /* Have to add a child node. */
            if (!(slot = radix_tree_node_alloc(root)))
                return -ENOMEM;
            if (node) {
                node->slots[offset] = slot;
                node->count++;
            } else
                root->rnode = slot;
        }

        /* Go a level down */
        offset = (index >> shift) & RADIX_TREE_MAP_MASK;
        node = slot;
        slot = node->slots[offset];
        shift -= RADIX_TREE_MAP_SHIFT;
        height--;
    }

    if (slot != NULL)
        return -EEXIST;

    if (node) {
        node->count++;
        node->slots[offset] = item;
        BUG_ON(tag_get(node, 0, offset));
        BUG_ON(tag_get(node, 1, offset));
    } else {
        root->rnode = item;
        BUG_ON(root_tag_get(root, 0));
        BUG_ON(root_tag_get(root, 1));
    }

    return 0;
}

Note: radix_tree_insert() first calls radix_tree_maxindex() to obtain the maximum index that can be stored in the radix tree at its current height. If the index of the new page cannot be represented with the current height, it calls radix_tree_extend() to increase the height of the tree by adding the appropriate number of nodes:
static int radix_tree_extend(struct radix_tree_root *root, unsigned long index)
{
    struct radix_tree_node *node;
    unsigned int height;
    int tag;

    /* Figure out what the height should be. */
    height = root->height + 1;
    while (index > radix_tree_maxindex(height))
        height++;

    if (root->rnode == NULL) {
        root->height = height;
        goto out;
    }

    do {
        if (!(node = radix_tree_node_alloc(root)))
            return -ENOMEM;

        /* Increase the height. */
        node->slots[0] = root->rnode;

        /* Propagate the aggregated tag info into the new root */
        for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
            if (root_tag_get(root, tag))
                tag_set(node, tag, 0);
        }

        node->count = 1;
        root->rnode = node;
        root->height++;
    } while (height > root->height);
out:
    return 0;
}

A new node is allocated by radix_tree_node_alloc(), which tries to get a radix_tree_node structure from the slab allocator cache. If that allocation fails (and, in the code below, only when the allocation mask does not include __GFP_WAIT), it takes a radix_tree_node structure from the pool of preallocated structures stored in radix_tree_preloads:
static struct radix_tree_node *
radix_tree_node_alloc(struct radix_tree_root *root)
{
    struct radix_tree_node *ret;
    gfp_t gfp_mask = root_gfp_mask(root);

    ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
    if (ret == NULL && !(gfp_mask & __GFP_WAIT)) {
        struct radix_tree_preload *rtp;

        rtp = &__get_cpu_var(radix_tree_preloads);
        if (rtp->nr) {
            ret = rtp->nodes[rtp->nr - 1];
            rtp->nodes[rtp->nr - 1] = NULL;
            rtp->nr--;
        }
    }
    return ret;
}
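
The fallback to the preallocated pool can be tried out in user space with the sketch below. The allocator is passed in as a function pointer so that a failing allocator can be simulated; that parameter, the pool layout, and all sketch_ names are assumptions of this example, and the __GFP_WAIT check of the kernel code is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_POOL_MAX 4   /* arbitrary pool size chosen for the sketch */

struct sketch_pool {
    unsigned int nr;                    /* number of nodes currently held */
    void *nodes[SKETCH_POOL_MAX];
};

typedef void *(*sketch_alloc_fn)(size_t size);

/* Stand-in for an allocator under memory pressure: it always fails. */
static void *sketch_failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

/* Try the regular allocator first; fall back to the last pooled node. */
static void *sketch_node_alloc(struct sketch_pool *pool,
                               sketch_alloc_fn alloc, size_t size)
{
    void *ret = alloc(size);

    if (ret == NULL && pool->nr > 0) {
        ret = pool->nodes[pool->nr - 1];
        pool->nodes[pool->nr - 1] = NULL;
        pool->nr--;
    }
    return ret;
}
```

When the regular allocator succeeds, the pool is left untouched; nodes are handed out from the pool, last one first, only when the allocator returns NULL.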

 

radix_tree_insert() then traverses the tree from the root (mapping->page_tree) down to the leaf according to the page index, calling radix_tree_node_alloc() to allocate new intermediate nodes where necessary.

Finally, radix_tree_insert() stores the page descriptor address in the proper slot of the last node visited in the radix tree and returns 0:
    node->count++;
    node->slots[offset] = item;
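
To experiment with the insertion logic outside the kernel, here is a simplified user-space sketch of the same steps (extend the tree if needed, walk down, allocate missing nodes, store the item). It assumes 64 slots per node and uses calloc() in place of the slab cache; tags, locking, preloading, and node freeing are left out, and the sk_ names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define SK_SHIFT 6                    /* assumption: 64 slots per node */
#define SK_SLOTS (1UL << SK_SHIFT)
#define SK_MASK  (SK_SLOTS - 1)

struct sk_node { unsigned int count; void *slots[SK_SLOTS]; };
struct sk_root { unsigned int height; void *rnode; };

static unsigned long sk_maxindex(unsigned int height)
{
    unsigned int bits = height * SK_SHIFT;
    return bits >= 8 * sizeof(unsigned long) ? ~0UL : (1UL << bits) - 1;
}

/* Grow the tree upward until index fits under the root. */
static int sk_extend(struct sk_root *root, unsigned long index)
{
    unsigned int height = root->height + 1;

    while (index > sk_maxindex(height))
        height++;
    if (root->rnode == NULL) {        /* empty tree: just record the height */
        root->height = height;
        return 0;
    }
    do {
        struct sk_node *node = calloc(1, sizeof(*node));
        if (node == NULL)
            return -ENOMEM;
        node->slots[0] = root->rnode; /* old root becomes child 0 */
        node->count = 1;
        root->rnode = node;
        root->height++;
    } while (height > root->height);
    return 0;
}

static int sk_insert(struct sk_root *root, unsigned long index, void *item)
{
    struct sk_node *node = NULL;
    void *slot;
    unsigned int height, shift, offset = 0;

    if (index > sk_maxindex(root->height) && sk_extend(root, index) != 0)
        return -ENOMEM;
    slot = root->rnode;
    height = root->height;
    shift = (height - 1) * SK_SHIFT;  /* not used when height is 0 */
    while (height > 0) {
        if (slot == NULL) {           /* allocate a missing node */
            slot = calloc(1, sizeof(struct sk_node));
            if (slot == NULL)
                return -ENOMEM;
            if (node) {
                node->slots[offset] = slot;
                node->count++;
            } else {
                root->rnode = slot;
            }
        }
        offset = (index >> shift) & SK_MASK;  /* go one level down */
        node = slot;
        slot = node->slots[offset];
        shift -= SK_SHIFT;
        height--;
    }
    if (slot != NULL)
        return -EEXIST;
    if (node) {
        node->count++;
        node->slots[offset] = item;
    } else {
        root->rnode = item;           /* height 0: the item hangs off the root */
    }
    return 0;
}

static void *sk_lookup(const struct sk_root *root, unsigned long index)
{
    void *slot = root->rnode;
    unsigned int height = root->height;

    if (index > sk_maxindex(height))
        return NULL;
    while (height > 0 && slot != NULL) {
        unsigned int shift = (height - 1) * SK_SHIFT;
        slot = ((struct sk_node *)slot)->slots[(index >> shift) & SK_MASK];
        height--;
    }
    return slot;
}
```

Inserting an item at index 0 leaves the sketch tree at height 0; inserting a second item at index 100 grows it to height 2, since 100 exceeds the largest index (63) of a one-level tree. A second insertion at an occupied index returns -EEXIST.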

 

Back in add_to_page_cache(): once the page has been inserted into the tree, the function increments the usage counter page->_count of the page descriptor.

Because the page is new, its contents are not yet valid: the function sets the PG_locked flag of the page frame to prevent other kernel control paths from accessing the page concurrently.

It then initializes page->mapping and page->index from the mapping and offset parameters. (Important!)

Finally, add_to_page_cache() increments the counter of cached pages in the address space (mapping->nrpages), releases the address space's tree_lock, calls radix_tree_preload_end() to re-enable kernel preemption, and returns 0 (success).

3. Delete page

 

The remove_from_page_cache() function removes a page descriptor from the page cache through the following steps:

1. Acquire the page->mapping->tree_lock spin lock and disable interrupts.

2. Call radix_tree_delete() to delete the node from the tree. This function receives as parameters the address of the tree's root (page->mapping->page_tree) and the index of the page to be deleted, and performs the following steps:

A) As described in the preceding section, traverse the tree from the root according to the page index until the leaf is reached. While doing so, build an array of radix_tree_path structures describing the path from the root to the leaf that corresponds to the page to be deleted.

B) Loop over the nodes in the path array, starting with the last node (the one containing the pointer to the page descriptor) and moving backward. For each node, set to NULL the slots element that points to the next node (or to the page descriptor) and decrement the count field. If count becomes 0, remove the node from the tree, release the radix_tree_node structure to the slab allocator cache, and continue the loop with the preceding node in the path array. Otherwise, if count does not become 0, continue with the next step.

C) Return the pointer to the page descriptor that has been removed from the tree.

3. Set the page->mapping field to NULL.

4. Decrement by 1 the nrpages counter of cached pages in the address space (the mapping pointer must be saved before step 3 clears page->mapping).

5. Release the address space's tree_lock spin lock, enable interrupts, and terminate.
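
Step 2.B above, the backward walk over the path array, can be sketched in user space as follows. The structures are simplified stand-ins for radix_tree_node and radix_tree_path (64 slots per node is assumed), free() replaces the slab cache, and resetting the root when the whole tree becomes empty is not handled; the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct sk_node { unsigned int count; void *slots[64]; };

/* One step of the recorded path: a node and the slot followed in it. */
struct sk_path { struct sk_node *node; unsigned int offset; };

/*
 * Walk the path from the deepest node back toward the root. Each visited
 * slot is cleared; a node whose count drops to 0 is freed and the walk
 * continues with its parent. Returns the number of nodes freed.
 */
static unsigned int sk_unwind_path(struct sk_path path[], unsigned int depth)
{
    unsigned int freed = 0;

    while (depth > 0) {
        struct sk_path *p = &path[depth - 1];

        p->node->slots[p->offset] = NULL;
        if (--p->node->count != 0)
            break;                  /* node still in use: stop here */
        free(p->node);
        freed++;
        depth--;
    }
    return freed;
}
```

If the leaf node held only the deleted page, it is freed and its slot in the parent is cleared; the parent survives as long as it still has other children.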

 

4. Update page

 

The read_cache_page() function ensures that the cache contains an up-to-date version of a specified page. Its parameters are a pointer mapping to an address_space object, an offset value index identifying the requested page, a pointer filler to a function that reads the page's data from disk (usually the function that implements the address space's readpage method), and a pointer data that is passed to the filler function (usually NULL). Here is a simplified description of what the function does:

1. Call find_get_page() to check whether the page is already in the page cache.

2. If the page is not in the page cache, perform the following substeps:

A) Call alloc_pages() to allocate a new page frame.
B) Call add_to_page_cache() to insert the corresponding page descriptor into the page cache.
C) Call lru_cache_add() to insert the page into the zone's inactive LRU list.

3. At this point the requested page is in the page cache. Call mark_page_accessed() to record the fact that the page has been accessed.

4. If the page is not up to date (the PG_uptodate flag is 0), call the filler function to read the page from disk.

5. Return the address of the page descriptor.
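
The control flow of these five steps can be sketched in user space as follows. A small fixed array stands in for the page cache, two booleans stand in for the page flags, and the LRU list is omitted; all sk_ names, the page size, and the sample filler are assumptions of this example. Only the filler's parameter order (data pointer first, page second) follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SK_CACHE_SLOTS 16   /* arbitrary sizes chosen for the sketch */
#define SK_PAGE_SIZE   32

struct sk_page {
    bool uptodate;              /* stands in for PG_uptodate */
    bool referenced;            /* set when the page is accessed */
    char data[SK_PAGE_SIZE];
};

struct sk_cache { struct sk_page *slots[SK_CACHE_SLOTS]; };

typedef int (*sk_filler_t)(void *data, struct sk_page *page);

/* Sample filler: counts its invocations and fills the page with 'x'. */
static int sk_count_filler(void *data, struct sk_page *page)
{
    int *calls = data;

    (*calls)++;
    memset(page->data, 'x', sizeof(page->data));
    return 0;
}

static struct sk_page *sk_read_cache_page(struct sk_cache *cache, size_t index,
                                          sk_filler_t filler, void *data)
{
    struct sk_page *page;

    if (index >= SK_CACHE_SLOTS)
        return NULL;
    page = cache->slots[index];             /* step 1: look the page up */
    if (page == NULL) {                     /* step 2: allocate and insert */
        page = calloc(1, sizeof(*page));
        if (page == NULL)
            return NULL;
        cache->slots[index] = page;
    }
    page->referenced = true;                /* step 3: record the access */
    if (!page->uptodate) {                  /* step 4: fill if stale */
        if (filler(data, page) != 0)
            return NULL;
        page->uptodate = true;
    }
    return page;                            /* step 5 */
}
```

The second request for the same index returns the cached page without invoking the filler again, which is the point of keeping an up-to-date flag per page.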

 

5. Radix tree tags

As stressed previously, the page cache not only allows the kernel to quickly retrieve the page containing specified data of a block device; it also allows the kernel to quickly retrieve pages in a given state from the cache.

For instance, suppose the kernel must retrieve from the cache all pages that belong to a specified owner and are dirty (that is, whose contents have not yet been written back to disk). The PG_dirty flag in the page descriptor indicates whether a page is dirty. However, if most pages are not dirty, traversing the whole radix tree to access all leaf nodes (page descriptors) in sequence is far too slow.

Instead, to allow a quick search for dirty pages, each intermediate node in the radix tree contains a dirty tag for each child node (or leaf); the tag is set if and only if at least one dirty tag of that child node is set. The dirty tags of the nodes at the bottom level are usually copies of the PG_dirty flags of the page descriptors. In this way, when the kernel traverses the radix tree looking for dirty pages, it can skip every subtree whose dirty tag is 0: if the dirty tag of an intermediate node is 0, no page descriptor in that subtree is dirty.

The same idea applies to the PG_writeback flag, which indicates that a page is being written back to disk. Thus, each node of the radix tree carries two tags per slot, mirroring the PG_dirty and PG_writeback page descriptor flags. The tags field of each node contains two 64-bit arrays that store them: the tags[0] array (PAGECACHE_TAG_DIRTY) is the dirty tag, while the tags[1] array (PAGECACHE_TAG_WRITEBACK) is the writeback tag.

When the PG_dirty or PG_writeback flag of a cached page is set, radix_tree_tag_set() is called. It takes three parameters: the root of the radix tree, the index of the page, and the type of tag to be set (PAGECACHE_TAG_DIRTY or PAGECACHE_TAG_WRITEBACK). The function starts from the root of the tree and descends to the leaf corresponding to the specified index. For each node on the path from the root to the leaf, it sets the tag associated with the pointer to the next node in the path. It then returns the address of the page descriptor. As a result, every node on the path from the root down to the leaf is tagged appropriately.

When the PG_dirty or PG_writeback flag of a cached page is cleared, radix_tree_tag_clear() is called. Its parameters are the same as those of radix_tree_tag_set(). The function starts from the root of the tree and descends to the leaf, building an array of radix_tree_path structures that describes the path. It then walks backward from the leaf to the root: it clears the tag in the bottom-level node and checks whether all tags in that node's array are now cleared. If so, it clears the corresponding tag in the parent node and repeats the check one level up. Finally, the function returns the address of the page descriptor.
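
The way a tag is propagated up on set and withdrawn on clear can be sketched with a fixed two-level tree. The layout below (one 64-bit word per node, dirty tag only, 64 leaves of 64 slots) is a simplification assumed for this example, and the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two-level tree: 64 leaf nodes with 64 page slots each, dirty tag only. */
struct sk_leaf { uint64_t dirty; };     /* one bit per page slot */
struct sk_tree {
    uint64_t dirty;                     /* one bit per leaf node */
    struct sk_leaf leaves[64];
};

/* Tag every node on the path from the root to the page's slot. */
static void sk_tag_set(struct sk_tree *tree, unsigned int index)
{
    unsigned int child = (index >> 6) & 63, slot = index & 63;

    tree->dirty |= UINT64_C(1) << child;
    tree->leaves[child].dirty |= UINT64_C(1) << slot;
}

/* Clear bottom-up; the parent bit is cleared only if the leaf is now clean. */
static void sk_tag_clear(struct sk_tree *tree, unsigned int index)
{
    unsigned int child = (index >> 6) & 63, slot = index & 63;

    tree->leaves[child].dirty &= ~(UINT64_C(1) << slot);
    if (tree->leaves[child].dirty == 0)
        tree->dirty &= ~(UINT64_C(1) << child);
}

/* Only the top level needs to be inspected to know if anything is tagged. */
static int sk_tagged(const struct sk_tree *tree)
{
    return tree->dirty != 0;
}
```

Pages 70 and 71 share a leaf in this layout, so clearing the tag of page 70 leaves the leaf's bit set in the top node until page 71 is cleared as well.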

 

When a page descriptor is deleted from the radix tree, the corresponding tags of the nodes on the path from the root to the leaf must be updated. radix_tree_delete() does this correctly (although this was not mentioned in the earlier description). radix_tree_insert(), on the other hand, does not update the tags, because every page descriptor inserted into the radix tree is assumed to have its PG_dirty and PG_writeback flags cleared. If necessary, the kernel can call radix_tree_tag_set() afterward.

The radix_tree_tagged() function uses the tags arrays of the nodes in the tree to test whether the radix tree contains at least one page in a given state. The function performs this task easily by executing the following code (root is a pointer to the radix_tree_root structure of the radix tree, and tag is the tag to be tested):
    for (idx = 0; idx < 2; idx++) {
        if (root->rnode->tags[tag][idx])
            return 1;
    }
    return 0;

The radix_tree_tagged() function only needs to check the tags of the first level. An example of its use is determining whether an inode containing dirty pages must be written back to disk. Note that in each iteration the function tests whether any of the 32 flags stored in one unsigned long is set.
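
The reason the loop runs exactly twice can be sketched as follows: with 64 slots per node and 32-bit words, the tags of one type occupy two words. The fixed-width uint32_t type stands in for unsigned long on a 32-bit machine, which is an assumption of this example, and the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit words, so the 64 per-slot tags of one type span two words. */
static void sk_word_tag_set(uint32_t tags[2], unsigned int slot)
{
    assert(slot < 64);
    tags[slot / 32] |= UINT32_C(1) << (slot % 32);
}

/* Each iteration tests 32 tags at once by comparing a whole word with 0. */
static int sk_any_tag_set(const uint32_t tags[2])
{
    for (int idx = 0; idx < 2; idx++) {
        if (tags[idx])
            return 1;
    }
    return 0;
}
```

Setting the tag of slot 40, for instance, sets bit 8 of the second word, and sk_any_tag_set() finds it in its second iteration.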

 

The find_get_pages_tag() function is similar to find_get_pages(), with one difference: it returns only pages that are tagged with the tag parameter. This function is crucial for quickly finding all the dirty pages of an inode.
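
A tagged lookup benefits from the tags because whole subtrees can be skipped. The sketch below shows this on a fixed two-level layout (64 leaves of 64 slots, dirty tag only), which is a simplification assumed for this example; the leaves_visited counter exists only to make the skipping observable, and the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sk_leaf { uint64_t dirty; };     /* one bit per page slot */
struct sk_tree {
    uint64_t dirty;                     /* one bit per leaf node */
    struct sk_leaf leaves[64];
};

/*
 * Collect up to max indexes of tagged pages, starting at index start,
 * in ascending order. Leaves whose bit is clear in the top node are
 * skipped without being examined.
 */
static unsigned int sk_find_tagged(const struct sk_tree *tree,
                                   unsigned int start, unsigned int max,
                                   unsigned int out[],
                                   unsigned int *leaves_visited)
{
    unsigned int found = 0;

    assert(start < 64 * 64);
    for (unsigned int child = start >> 6; child < 64 && found < max; child++) {
        unsigned int first = (child == start >> 6) ? (start & 63) : 0;

        if (!(tree->dirty & (UINT64_C(1) << child)))
            continue;                   /* whole subtree is clean */
        (*leaves_visited)++;
        for (unsigned int slot = first; slot < 64 && found < max; slot++) {
            if (tree->leaves[child].dirty & (UINT64_C(1) << slot))
                out[found++] = child * 64 + slot;
        }
    }
    return found;
}
```

With dirty pages at indexes 5, 129, and 191, a full scan examines only two of the 64 leaves.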
