File System Learning, Part 8: The Page Cache in SVR4

Source: Internet
Author: User

In SVR4, all file reads and writes go through the page cache. It differs from fixed-size physical caches such as the buffer cache or the DNLC in that its pages can be swapped in or out as the software requires. It also differs from the buffer cache in how it is indexed: the buffer cache is indexed by device and block number, whereas the page cache is indexed by vnode and offset.
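The difference in indexing can be illustrated with a small sketch. The structure names, field names, hash functions, and table size below are all assumptions made for illustration; they are not taken from the SVR4 sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key for a buffer cache lookup: device plus block number. */
struct buf_key {
    uint32_t dev;    /* device identifier */
    uint64_t blkno;  /* block number on that device */
};

/* Hypothetical key for a page cache lookup: file (vnode) plus byte offset. */
struct page_key {
    const void *vp;  /* stands in for a vnode pointer */
    uint64_t off;    /* byte offset within the file */
};

#define NBUCKETS 64u  /* illustrative hash table size */

/* Map a (device, block number) pair to a hash bucket. */
static unsigned buf_hash(struct buf_key k)
{
    return (unsigned)((k.dev * 31u + k.blkno) % NBUCKETS);
}

/* Map a (vnode, offset) pair to a hash bucket; offsets are page-granular. */
static unsigned page_hash(struct page_key k, unsigned page_shift)
{
    assert(page_shift < 64);
    uintptr_t v = (uintptr_t)k.vp >> 4;  /* drop low alignment bits */
    return (unsigned)((v + (k.off >> page_shift)) % NBUCKETS);
}

/* Two keys name the same cached page only if vnode and offset both match. */
static int page_key_equal(struct page_key a, struct page_key b)
{
    return a.vp == b.vp && a.off == b.off;
}
```

The point of the sketch is only that a page cache lookup never needs to know which disk block holds the data; the file and the offset within it are enough.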


    • The composition of the Pagecache

A segment that supports the seg_map operations, such as handling page faults that occur within the segment;

A free page list (freelist) that serves more than one purpose:


When a page holding file data is evicted from the cache, it is added to the free list;

Pages can also be taken from the free list for any other purpose. Note, however, that while a page sits on the free list its identity and data remain intact, so if the kernel wants to read the data again it does not need to fetch it from disk; it simply removes the page from the free list.
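A minimal sketch of this reclaim behavior follows. It is a user-space simulation under stated assumptions: the names page_get(), page_free(), and disk_reads, the fixed array of four pages, and the use of an integer in place of a vnode pointer are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4

struct page {
    int  vnode_id;     /* identity: which file (-1 = none yet) */
    long offset;       /* identity: offset within the file */
    int  on_freelist;  /* 1 if the page is currently on the free list */
};

static struct page pages[NPAGES] = {
    { -1, 0, 1 }, { -1, 0, 1 }, { -1, 0, 1 }, { -1, 0, 1 }
};
static int disk_reads;  /* counts simulated disk reads */

/* Put a page on the free list; its identity and data are left untouched. */
static void page_free(struct page *pp)
{
    assert(pp != NULL);
    pp->on_freelist = 1;
}

/*
 * Return the page for (vnode_id, offset). If a page with that identity
 * still exists, reclaim it without I/O; otherwise reuse a free page and
 * simulate a read from disk.
 */
static struct page *page_get(int vnode_id, long offset)
{
    struct page *victim = NULL;

    assert(vnode_id >= 0);
    for (int i = 0; i < NPAGES; i++) {
        struct page *pp = &pages[i];
        if (pp->vnode_id == vnode_id && pp->offset == offset) {
            pp->on_freelist = 0;   /* reclaim: no disk read needed */
            return pp;
        }
        if (pp->on_freelist && victim == NULL)
            victim = pp;
    }
    if (victim == NULL)
        return NULL;               /* no free page available */
    victim->vnode_id = vnode_id;   /* the old identity is lost only now */
    victim->offset = offset;
    victim->on_freelist = 0;
    disk_reads++;                  /* simulated read from disk */
    return victim;
}
```

Calling page_get() for a page, freeing it, and calling page_get() again with the same identity returns the same page without incrementing disk_reads, which is the effect described above.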

On UNIX systems, the page cache is actually organized in chunks, as follows:


The segmap structure is part of the kernel address space and is underpinned by the segmap_data structure that describes the properties of the segment. The size of the segment is tunable and is split into MAXBSIZE (8KB) chunks where each 8KB chunk represents an 8KB window into a file. Each chunk is referenced by an smap structure that contains a pointer to a vnode for the file and the offset within the file.
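The chunk arithmetic implied here can be sketched as follows. MAXBSIZE and its 8KB value come from the text; the helper names chunk_base(), chunk_offset(), and chunk_count() are hypothetical.

```c
#include <assert.h>

#define MAXBSIZE 8192UL  /* chunk size from the text: 8KB */

/* Start of the 8KB window that contains file offset `off`. */
static unsigned long chunk_base(unsigned long off)
{
    return off & ~(MAXBSIZE - 1);
}

/* Position of `off` inside its window. */
static unsigned long chunk_offset(unsigned long off)
{
    return off & (MAXBSIZE - 1);
}

/* Number of smap entries needed to cover a segment of `seg_size` bytes. */
static unsigned long chunk_count(unsigned long seg_size)
{
    assert(seg_size % MAXBSIZE == 0);  /* sketch assumes whole chunks only */
    return seg_size / MAXBSIZE;
}
```

For example, file offset 10000 falls in the window starting at 8192, at position 1808 within it, and a 1MB segment would need 128 smap entries.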


As you can see, the number of chunks and smap entries is limited, and when they are all in use some smap entries have to be evicted. Since the data may be accessed again later, the evicted pages are placed on the free list, so that the next read does not have to go to disk. In effect, the page cache implements a two-level cache. This could be a forerunner of the ZFS two-level cache. The overall diagram of the page cache data structures in SVR4 is as follows:

[Figure 8_1.PNG: https://s3.51cto.com/wyfs02/M02/8E/78/wKiom1jBQ_HDuwW-AACKdITHnMw058.png]


As can be seen, the address space member (as) of each process descriptor, the proc structure, points to the segment list of the current process. Each segment corresponds to a segmap_data structure that records its page mappings, and the smd_sm member of that structure points to a linked list of smap structures, each of which contains a vnode and an offset.


    • Operation of the Pagecache

Acquiring a mapping: what the buffer cache does with getblk(), the page cache does by calling segmap_getmap().


addr_t segmap_getmap(struct seg *seg, vnode_t *vp, uint_t *offset);

This function picks an address in the range s_base to s_base + s_size of the seg structure, adds an entry to the smd_sm list that the segment's segmap_data points to, and returns the virtual address. Note that only a kernel virtual address is allocated here; no physical page frame backs it yet.


Releasing a mapping: what the buffer cache does with brelse(), the page cache does by calling segmap_release().

int segmap_release(struct seg *seg, addr_t addr, u_int flags);


Note: This is an important difference between the page cache in SVR4 and the page cache in other, earlier kernels. In SVR4, the address returned by segmap_getmap() has no physical page frame behind it; the physical page is only obtained later, when an actual read takes place. The concrete steps can be seen in the following example of a complete read.


When reading data from a file, the file system executes code along the following lines at the page cache layer:

...............

kaddr = segmap_getmap(segkmap, vp, 8192);

uiomove(kaddr, 1024, UIO_READ, uiop);

segmap_release(segkmap, kaddr, SM_FREE);

...............

The virtual address returned by segmap_getmap() is passed to uiomove(), which performs the actual copy as directed by the UIO_READ flag. When uiomove() first accesses kaddr, no physical page backs the address, so a page fault is triggered. The fault code searches the kernel address space (kas), which records the base address and length of every kernel segment, to find the segment that contains the faulting address, and then calls that segment's fault handler. The core of this, in schematic form, is:

segkmap->s_ops->fault(seg, addr, ssize, type, rw);


Using the s_base and addr values passed to the fault handler, the corresponding smap structure, and through it the vnode, can be found. The handler then invokes VOP_GETPAGE(), which allocates a page and reads the data in from disk. Once this is done, page fault handling is complete and uiomove() resumes. For a more detailed view of the procedure, refer to the following flowchart:

[Figure 8_2.PNG: https://s4.51cto.com/wyfs02/M00/8E/78/wKiom1jBRAuj5y20AACxC4NC_r4368.png]
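The read path described above can be imitated in a small user-space simulation. This is only a sketch under stated assumptions: sketch_getmap(), sketch_fault(), the four-chunk segment, the integer vnode_id, and the getpage_calls counter are hypothetical stand-ins, and the real kernel structures contain far more than is shown here.

```c
#include <assert.h>
#include <stddef.h>

#define MAXBSIZE 8192UL
#define NSMAP    4           /* illustrative segment size: 4 chunks */

struct smap {
    int  vnode_id;           /* stands in for the vnode pointer (-1 = unused) */
    long offset;             /* file offset of this 8KB window */
    int  has_page;           /* 1 once a physical page backs the window */
};

struct seg {
    unsigned long s_base;    /* first virtual address of the segment */
    unsigned long s_size;    /* length of the segment in bytes */
    struct smap   sm[NSMAP]; /* one entry per chunk */
};

static int getpage_calls;    /* counts simulated VOP_GETPAGE() calls */

/* Like segmap_getmap(): hand out a window address, but no physical page. */
static unsigned long sketch_getmap(struct seg *seg, int vnode_id, long offset)
{
    for (int i = 0; i < NSMAP; i++) {
        if (seg->sm[i].vnode_id == -1) {
            seg->sm[i].vnode_id = vnode_id;
            seg->sm[i].offset = offset;
            seg->sm[i].has_page = 0;
            return seg->s_base + (unsigned long)i * MAXBSIZE;
        }
    }
    return 0;  /* no free chunk */
}

/*
 * Like the segment fault handler: map the faulting address back to its
 * smap entry and bring the page in on first touch only.
 */
static struct smap *sketch_fault(struct seg *seg, unsigned long addr)
{
    assert(addr >= seg->s_base && addr < seg->s_base + seg->s_size);
    struct smap *smp = &seg->sm[(addr - seg->s_base) / MAXBSIZE];
    if (!smp->has_page) {
        getpage_calls++;     /* the real handler would call VOP_GETPAGE() */
        smp->has_page = 1;
    }
    return smp;
}
```

After sketch_getmap() returns, has_page is still 0 and getpage_calls is unchanged; only the first sketch_fault() on that window increments the counter, and later accesses to the same window do not.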

As the process above shows, SVR4 relies on the page fault mechanism so that a physical page is allocated only when the data is actually read (deferred allocation). Writing a file follows a similar path up to the segmap_release() call; the likely difference lies in its flags parameter, which may include the following:

SM_WRITE: the page should be written back to the file by VOP_PUTPAGE()

SM_ASYNC: the page can be written asynchronously

SM_FREE: the page can be freed

SM_INVAL: the page should be invalidated

SM_DONTNEED: the file system will no longer access the page
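Since these flags can be combined, a release routine has to decode a bit mask. The sketch below shows one way that decoding might look. The flag names come from the list above, but the bit values, the release_plan structure, and plan_release() are assumptions made for illustration and do not reflect the actual kernel definitions.

```c
#include <assert.h>

/* Flag names are from the text; the bit values are assumed. */
#define SM_WRITE    0x01
#define SM_ASYNC    0x02
#define SM_FREE     0x04
#define SM_INVAL    0x08
#define SM_DONTNEED 0x10

struct release_plan {
    int write_back;     /* write the page back, as VOP_PUTPAGE() would */
    int wait_for_io;    /* block until the write completes */
    int free_page;      /* put the page on the free list */
    int invalidate;     /* discard the page's identity */
    int hint_dontneed;  /* caller will not access the page again */
};

/* Turn a flag mask into the actions a release routine might take. */
static struct release_plan plan_release(unsigned flags)
{
    struct release_plan p = { 0, 0, 0, 0, 0 };
    const unsigned known =
        SM_WRITE | SM_ASYNC | SM_FREE | SM_INVAL | SM_DONTNEED;

    assert((flags & ~known) == 0);  /* reject bits the sketch does not define */
    p.write_back    = (flags & SM_WRITE) != 0;
    p.wait_for_io   = p.write_back && (flags & SM_ASYNC) == 0;
    p.free_page     = (flags & SM_FREE) != 0;
    p.invalidate    = (flags & SM_INVAL) != 0;
    p.hint_dontneed = (flags & SM_DONTNEED) != 0;
    return p;
}
```

With these assumed values, plan_release(SM_WRITE | SM_ASYNC) asks for a write-back without waiting, while plan_release(SM_FREE), as in the read example earlier, only frees the page.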


Don't these operations look much like the basic operations of a physical cache?




This article is from the "Storage Chef" blog; please contact the author before reproducing it.
