Page cache and block buffer concepts in Linux



The page cache is described in detail in section 5.6 (file writing and reading) of the book "Linux Kernel Scenario Analysis"; the following is excerpted from it.

There are three main data structures in the file system layer: the file structure, the dentry structure and the inode structure.

File structure: a context representing an open target file. Different processes can establish different contexts on the same file, and one process can establish several contexts by opening a file multiple times. A buffer queue therefore cannot be attached to the file structure, because it would not be shared among these file structures.

Dentry structure: this is the file name structure. Through links, several dentry structures can correspond to one file, so dentry structures and files are not in a one-to-one relationship, and a buffer queue cannot be built on this structure either.

Inode structure: that leaves only the inode structure. Inodes and files are in a one-to-one relationship, so the inode is what represents the file. The inode structure contains the i_mapping pointer, which points to an address_space data structure (generally inode->i_data), and the buffer queue lives in that data structure.


The buffer queue holds memory pages rather than record blocks. When a process calls mmap() to map a file into its user space, the cached pages can therefore be mapped directly into the process's address space simply by setting up the corresponding page table entries. That is why the pointer is named i_mapping.
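The sharing described above can be observed from user space. The following is a minimal sketch, assuming a Linux system with a writable /tmp; the function name and the temp-file template are made up for illustration. It writes to a file, maps it with MAP_SHARED, changes the data through the mapping, and checks that pread() returns the changed bytes, which is what one would expect if both paths operate on the same cached page.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write through a shared mapping, then read the same bytes back with
 * pread(). Returns 0 if both views agree, -1 on any error or mismatch.
 */
int mmap_and_read_agree(void)
{
    char path[] = "/tmp/pagecache-demo-XXXXXX";  /* hypothetical template */
    const char before[] = "old data";
    const char after[]  = "new data";
    char buf[sizeof(after)];
    char *map;
    int fd, rc = -1;

    assert(sizeof(before) == sizeof(after));

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away once fd is closed */

    if (write(fd, before, sizeof(before)) != (ssize_t)sizeof(before))
        goto out;

    /* Map the file's cached page into this process's address space. */
    map = mmap(NULL, sizeof(before), PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* The mapping shows what write() placed in the cache. */
    if (memcmp(map, before, sizeof(before)) != 0)
        goto unmap;

    /* Modify the page through the mapping ... */
    memcpy(map, after, sizeof(after));

    /* ... and check that the read path sees the same page contents. */
    if (pread(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf) &&
        memcmp(buf, after, sizeof(after)) == 0)
        rc = 0;
unmap:
    munmap(map, sizeof(before));
out:
    close(fd);
    return rc;
}
```

Calling mmap_and_read_agree() from a small main() and checking for a return value of 0 is enough to try it out.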


The radix tree concept is also needed here; first look at the figure (image from "Deep Linux Kernel Architecture").


The radix tree is not a balanced tree. The tree itself consists of two different data structures: the root node and the non-leaf nodes. The root node is represented by a simple data structure that contains the height of the tree and a pointer to the first node that makes up the tree. A node is essentially an array: count is the number of pointers in use in the node, and the remaining entries are pointers to next-level nodes. At the leaf level, the pointers point to pages.

The node data structure also contains search tags, such as the dirty tag and the writeback tag, which make it possible to quickly find which pages are marked.
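The root/node layout described above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel's implementation: the names (struct root, struct node, tree_insert, tree_lookup) and the fan-out of 64 slots per node are assumptions, search tags are left out, and memory is never freed.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_SHIFT 6                      /* assumed: 64 slots per node */
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

struct page { unsigned long index; };    /* stand-in for the kernel's page */

struct node {
    unsigned int count;                  /* number of slots in use */
    void *slots[MAP_SIZE];               /* child nodes, or pages at the lowest level */
};

struct root {
    unsigned int height;                 /* number of levels; 0 means empty */
    struct node *rnode;                  /* first node of the tree */
};

/* Largest index that a tree of the given height can hold. */
static unsigned long max_index(unsigned int height)
{
    if (height * MAP_SHIFT >= sizeof(unsigned long) * 8)
        return ~0UL;
    return (1UL << (height * MAP_SHIFT)) - 1;
}

/* Insert page at index, growing the tree upward when needed. */
int tree_insert(struct root *root, unsigned long index, struct page *page)
{
    struct node *n;
    unsigned int level;

    while (root->height == 0 || index > max_index(root->height)) {
        n = calloc(1, sizeof(*n));
        if (!n)
            return -1;
        if (root->rnode) {               /* old tree becomes slot 0 */
            n->slots[0] = root->rnode;
            n->count = 1;
        }
        root->rnode = n;
        root->height++;
    }

    n = root->rnode;
    for (level = root->height; level > 1; level--) {
        unsigned long slot = (index >> ((level - 1) * MAP_SHIFT)) & MAP_MASK;

        if (!n->slots[slot]) {
            struct node *child = calloc(1, sizeof(*child));
            if (!child)
                return -1;
            n->slots[slot] = child;
            n->count++;
        }
        n = n->slots[slot];
    }
    if (!n->slots[index & MAP_MASK])
        n->count++;
    n->slots[index & MAP_MASK] = page;
    return 0;
}

/* Walk from the root, using MAP_SHIFT bits of the index per level. */
struct page *tree_lookup(const struct root *root, unsigned long index)
{
    struct node *n = root->rnode;
    unsigned int level;

    if (root->height == 0 || index > max_index(root->height))
        return NULL;
    for (level = root->height; level > 1 && n; level--)
        n = n->slots[(index >> ((level - 1) * MAP_SHIFT)) & MAP_MASK];
    return n ? n->slots[index & MAP_MASK] : NULL;
}
```

The lookup cost depends only on the height of the tree, which is why the structure does not need rebalancing.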



Block buffer

Structurally, a block buffer consists of two parts:

1. Buffer head: holds all management data for the buffer, such as its state, block number, length and users. The buffer's data is not stored directly after the buffer head; instead, a pointer in the buffer head points to a separate area of physical memory.

2. Data: the useful data is stored in specially allocated pages, which can be used by the page cache at the same time.


Buffer head:

/*
 * Historically, a buffer_head was used to map a single block
 * within a page, and of course as the unit of I/O through the
 * filesystem and block layers.  Nowadays the basic I/O unit
 * is the bio, and buffer_heads are used for extracting block
 * mappings (via a get_block_t call), for tracking state within
 * a page (via a page_mapping) and for wrapping bio submission
 * for backward compatibility reasons (e.g. submit_bh).
 */
struct buffer_head {
    unsigned long b_state;              /* buffer state bitmap */ // state flags, see bh_state_bits below
    struct buffer_head *b_this_page;    /* circular list of page's buffers */ // points to the next buffer head
    struct page *b_page;                /* the page this bh is mapped to */ // descriptor of the page that owns this block buffer

    sector_t b_blocknr;                 /* start block number */ // logical block number on the block device
    size_t b_size;                      /* size of mapping */ // block size
    char *b_data;                       /* pointer to data within the page */ // position of the block within the buffer page

    struct block_device *b_bdev;        // points to the block device descriptor
    bh_end_io_t *b_end_io;              /* I/O completion */ // I/O completion callback
    void *b_private;                    /* reserved for b_end_io */ // data argument for the I/O completion callback
    struct list_head b_assoc_buffers;   /* associated with another mapping */
    struct address_space *b_assoc_map;  /* mapping this buffer is associated with */
    atomic_t b_count;                   /* users using this buffer_head */ // usage counter
};


Common state flags of the buffer head:

enum bh_state_bits {
    BH_Uptodate,      /* Contains valid data */ // the buffer contains valid data
    BH_Dirty,         /* Is dirty */ // the buffer is dirty
    BH_Lock,          /* Is locked */ // the buffer is locked
    BH_Req,           /* Has been submitted for I/O */ // buffer initialized and a data transfer requested
    BH_Uptodate_Lock, /* Used by the first bh in a page, to serialise
                       * IO completion of other buffers in the page
                       */

    BH_Mapped,        /* Has a disk mapping */ // b_bdev and b_blocknr are valid
    BH_New,           /* Disk mapping was newly created by get_block */ // not yet accessed
    BH_Async_Read,    /* Is under end_buffer_async_read I/O */ // the buffer is being read asynchronously
    BH_Async_Write,   /* Is under end_buffer_async_write I/O */ // the buffer is being written asynchronously
    BH_Delay,         /* Buffer is not yet allocated on disk */ // no block allocated on disk yet
    BH_Boundary,      /* Block is followed by a discontiguity */
    BH_Write_EIO,     /* I/O error on write */ // I/O error
    BH_Unwritten,     /* Buffer is allocated on disk but not written */
    BH_Quiet,         /* Buffer error printks to be quiet */
    BH_Meta,          /* Buffer contains metadata */
    BH_Prio,          /* Buffer should be submitted with REQ_PRIO */

    BH_PrivateStart,  /* not a state bit, but the first bit available
                       * for private allocation by other entities
                       */
};
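Each enumerator above is a bit number within b_state, not a mask. The sketch below shows how such bit numbers can be set and tested; it is a user-space illustration with hypothetical helper names (bh_set, bh_clear, bh_test, bh_needs_writeback) and plain non-atomic operations, whereas the kernel uses atomic bit operations on b_state. Only the first four flags are reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* First few bit numbers, in the same order as the kernel enum. */
enum bh_state_bits { BH_Uptodate, BH_Dirty, BH_Lock, BH_Req };

/* Only the field needed for this illustration. */
struct buffer_head { unsigned long b_state; };

/* Non-atomic stand-ins for the kernel's bit operations. */
static void bh_set(struct buffer_head *bh, enum bh_state_bits bit)
{
    bh->b_state |= 1UL << bit;
}

static void bh_clear(struct buffer_head *bh, enum bh_state_bits bit)
{
    bh->b_state &= ~(1UL << bit);
}

static bool bh_test(const struct buffer_head *bh, enum bh_state_bits bit)
{
    return (bh->b_state >> bit) & 1UL;
}

/* A buffer is worth writing back when it is both valid and dirty. */
static bool bh_needs_writeback(const struct buffer_head *bh)
{
    return bh_test(bh, BH_Uptodate) && bh_test(bh, BH_Dirty);
}
```

Storing bit numbers rather than masks is what allows BH_PrivateStart to mark where other users may begin allocating their own bits.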


If a page is used as a buffer page, the buffer heads of all its block buffers are collected in a singly linked circular list. The private field of the buffer page's descriptor points to the buffer head of the first block in the page, and the b_this_page field of each buffer head points to the next buffer head in the list. The b_page field of each buffer head points to the descriptor of the buffer page it belongs to.



From the figure above you can see that one buffer page holds 4 block buffers, which is how the page cache and the buffer cache are unified: because the buffers live inside the page, modifying a buffer or its buffer page affects the other.
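The relationship between a buffer page and its buffer heads can be sketched as follows. This is a user-space model under stated assumptions: a 4096-byte page and 1024-byte blocks (which gives the 4 buffers per page mentioned above), a private_bh field standing in for the page descriptor's private field, and hypothetical helper names attach_buffers and count_buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE  4096    /* assumed page size */
#define DEMO_BLOCK_SIZE 1024    /* assumed block size */
#define BUFS_PER_PAGE   (DEMO_PAGE_SIZE / DEMO_BLOCK_SIZE)

struct page;

struct buffer_head {
    struct buffer_head *b_this_page;   /* next buffer head of the same page */
    struct page *b_page;               /* page that owns this buffer */
    char *b_data;                      /* block's position inside the page */
    size_t b_size;                     /* block size */
};

struct page {
    struct buffer_head *private_bh;    /* stand-in for the private field */
    char data[DEMO_PAGE_SIZE];
};

/* Link the buffer heads into a circular list and hook it to the page. */
void attach_buffers(struct page *page, struct buffer_head bh[BUFS_PER_PAGE])
{
    for (int i = 0; i < BUFS_PER_PAGE; i++) {
        bh[i].b_page = page;
        bh[i].b_data = page->data + i * DEMO_BLOCK_SIZE;
        bh[i].b_size = DEMO_BLOCK_SIZE;
        bh[i].b_this_page = &bh[(i + 1) % BUFS_PER_PAGE];
    }
    page->private_bh = &bh[0];
}

/* Walk the circular list once, starting from the page's first buffer. */
int count_buffers(const struct page *page)
{
    const struct buffer_head *head = page->private_bh;
    const struct buffer_head *bh = head;
    int n = 0;

    if (!head)
        return 0;
    do {
        n++;
        bh = bh->b_this_page;
    } while (bh != head);
    return n;
}
```

Since b_data points into the page itself, a write through a buffer head is immediately visible through the page, and the other way round.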



The address_space structure:

struct address_space {
    struct inode            *host;              /* owner: inode, block_device */ // points to the inode of the host file
    struct radix_tree_root  page_tree;          /* radix tree of all pages */ // root of the radix tree
    spinlock_t              tree_lock;          /* and lock protecting it */ // lock for the radix tree
    unsigned int            i_mmap_writable;    /* count VM_SHARED mappings */ // number of VM_SHARED mappings
    struct rb_root          i_mmap;             /* tree of private and shared mappings */ // private and shared mappings
    struct list_head        i_mmap_nonlinear;   /* list VM_NONLINEAR mappings */ // list of nonlinear mappings
    struct mutex            i_mmap_mutex;       /* protect tree, count, list */ // mutex
    /* Protected by tree_lock together with the radix tree */
    unsigned long           nrpages;            /* number of total pages */
    pgoff_t                 writeback_index;    /* writeback starts here */ // page index at which writeback starts
    const struct address_space_operations *a_ops;   /* methods */ // function pointers
    unsigned long           flags;              /* error bits/gfp mask */ // error bits and GFP mask
    struct backing_dev_info *backing_dev_info;  /* device readahead, etc */ // device readahead
    spinlock_t              private_lock;       /* for use by the address_space */
    struct list_head        private_list;       /* ditto */
    void                    *private_data;      /* ditto */
} __attribute__((aligned(sizeof(long))));


The host and page_tree fields tie the file to its memory pages: host points to the owning inode, and page_tree is the root of the radix tree that holds the file's cached pages.




struct address_space_operations {
    int (*writepage)(struct page *page, struct writeback_control *wbc);  // write a page to the owner's disk image
    int (*readpage)(struct file *, struct page *);  // read a page from the owner's disk image

    /* Write back some dirty pages from this mapping. */
    int (*writepages)(struct address_space *, struct writeback_control *);  // write a given number of the owner's dirty pages back to disk

    /* Set a page dirty.  Return true if this dirtied it */
    int (*set_page_dirty)(struct page *page);  // mark one of the owner's pages dirty

    int (*readpages)(struct file *filp, struct address_space *mapping,
            struct list_head *pages, unsigned nr_pages);  // read a list of the owner's pages from disk

    int (*write_begin)(struct file *, struct address_space *mapping,
            loff_t pos, unsigned len, unsigned flags,
            struct page **pagep, void **fsdata);
    int (*write_end)(struct file *, struct address_space *mapping,
            loff_t pos, unsigned len, unsigned copied,
            struct page *page, void *fsdata);

    /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
    sector_t (*bmap)(struct address_space *, sector_t);
    void (*invalidatepage)(struct page *, unsigned long);
    int (*releasepage)(struct page *, gfp_t);
    void (*freepage)(struct page *);
    ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
            loff_t offset, unsigned long nr_segs);
    int (*get_xip_mem)(struct address_space *, pgoff_t, int,
            void **, unsigned long *);
    /*
     * Migrate the contents of a page to the specified target. If sync
     * is false, it must not block.
     */
    int (*migratepage)(struct address_space *,
            struct page *, struct page *, enum migrate_mode);
    int (*launder_page)(struct page *);
    int (*is_partially_uptodate)(struct page *, read_descriptor_t *,
            unsigned long);
    int (*error_remove_page)(struct address_space *, struct page *);

    /* Swapfile support */
    int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
            sector_t *span);
    void (*swap_deactivate)(struct file *file);
};
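The operations table is a set of function pointers that generic code calls through mapping->a_ops, so each filesystem can supply its own way of filling or writing pages. The sketch below illustrates only that dispatch pattern in user space. Everything in it is simplified and hypothetical: the one-entry table, the readpage signature (which differs from the kernel's), the backing and backing_len fields standing in for the owner's disk image, and the names demo_readpage, demo_aops and fill_page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096     /* assumed page size */

struct page {
    unsigned long index;        /* page offset within the file */
    int uptodate;               /* set once the page holds valid data */
    char data[DEMO_PAGE_SIZE];
};

struct address_space;

/* One-entry table; the real structure has many more methods. */
struct address_space_operations {
    int (*readpage)(struct address_space *mapping, struct page *page);
};

struct address_space {
    const struct address_space_operations *a_ops;
    const char *backing;        /* stand-in for the owner's disk image */
    size_t backing_len;
};

/* Hypothetical filesystem method: copy one page worth of backing data. */
static int demo_readpage(struct address_space *mapping, struct page *page)
{
    size_t off = page->index * DEMO_PAGE_SIZE;
    size_t n;

    memset(page->data, 0, DEMO_PAGE_SIZE);
    if (off < mapping->backing_len) {
        n = mapping->backing_len - off;
        if (n > DEMO_PAGE_SIZE)
            n = DEMO_PAGE_SIZE;
        memcpy(page->data, mapping->backing + off, n);
    }
    page->uptodate = 1;
    return 0;
}

static const struct address_space_operations demo_aops = {
    .readpage = demo_readpage,
};

/* Generic caller: knows nothing about the filesystem behind a_ops. */
int fill_page(struct address_space *mapping, struct page *page)
{
    if (!mapping->a_ops || !mapping->a_ops->readpage)
        return -1;
    return mapping->a_ops->readpage(mapping, page);
}
```

The caller only needs the mapping; which routine actually runs is decided by whoever set a_ops when the mapping was created.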






















