The concept of page buffering and block buffering in Linux

Source: Internet
Author: User


Page buffering is described in detail in section 5.6, on reading and writing files, of "Linux Kernel Scenario Analysis"; the following is an excerpt from it.

The file system layer has three main data structures: the file structure, the dentry structure, and the inode structure.

File structure: A context for a target file. Different processes can each establish their own context on the same file, and one process can create several contexts by opening the same file several times. The buffer queue therefore cannot be placed in the file structure, because file structures are not shared.

Dentry structure: This is the file name structure. Through links (hard links in particular), several dentry structures can correspond to one file, so dentry and file are not in a one-to-one relationship, and the buffer queue cannot be built on this structure either.

Inode structure: That leaves only the inode structure. The inode and the file are in a one-to-one relationship; the inode can be said to represent the file. The inode structure holds an i_mapping pointer, which points to an address_space data structure. This is normally the inode's own i_data member (inode->i_data), and the buffer queue lives in that data structure.
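The relationship between the three structures can be sketched in user space as follows. This is only an illustrative model, not kernel code: every type and function name ending in _m, as well as link_name() and open_file(), is a hypothetical stand-in introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct address_space_m {
    unsigned long nrpages;              /* number of cached pages */
};

struct inode_m {
    struct address_space_m *i_mapping;  /* normally points at i_data below */
    struct address_space_m  i_data;     /* the cache lives with the inode */
    unsigned                i_nlink;    /* number of names (hard links) */
};

struct dentry_m {
    const char     *d_name;             /* one name of the file */
    struct inode_m *d_inode;            /* several names may share an inode */
};

struct file_m {
    struct dentry_m *f_dentry;
    long             f_pos;             /* per-open context, never shared */
};

static void inode_init(struct inode_m *inode)
{
    inode->i_data.nrpages = 0;
    inode->i_mapping = &inode->i_data;
    inode->i_nlink = 0;
}

/* Create one more name (dentry) for an existing inode. */
static struct dentry_m link_name(struct inode_m *inode, const char *name)
{
    struct dentry_m d = { name, inode };
    inode->i_nlink++;
    return d;
}

/* Every open yields a fresh context with its own file position. */
static struct file_m open_file(struct dentry_m *dentry)
{
    struct file_m f = { dentry, 0 };
    return f;
}

/* Whatever the context and the name, the cache is found via the inode. */
static struct address_space_m *file_mapping(const struct file_m *file)
{
    assert(file->f_dentry && file->f_dentry->d_inode);
    return file->f_dentry->d_inode->i_mapping;
}
```

In this model, two opens of the same name and one open of a second name all end up with different file_m contexts but the same address_space_m, which is why the inode is the only place where a shared buffer queue can live.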


What hangs on the buffer queue is not record blocks but memory pages. So when a process calls mmap() to map a file into its user space, these cached pages can simply be mapped into the address space of the process; only the corresponding memory mapping table entries have to be set up. That is why the pointer is named i_mapping.
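The effect can be observed from user space with a small experiment. The sketch below uses only standard POSIX calls (mkstemp, write, mmap, pread); the function name mmap_shares_cached_pages() and the temporary path under /tmp are assumptions made for this example. It relies on Linux serving read()/write() and MAP_SHARED mappings from the same cached pages.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if data written with write() is visible through a shared
 * mapping AND a store through the mapping is visible to pread(),
 * 0 if not, and -1 if a system call failed.
 */
static int mmap_shares_cached_pages(void)
{
    char path[] = "/tmp/pagecache-demo-XXXXXX";  /* hypothetical location */
    const char msg[] = "hello";
    char back[sizeof(msg)] = {0};
    char *map;
    int seen;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file lives on until fd is closed */

    /* Step 1: put data into the file through the ordinary write path. */
    if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))
        goto out;

    /* Step 2: map the same file; the cached page is mapped, not copied. */
    map = mmap(NULL, sizeof(msg), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;
    seen = memcmp(map, msg, sizeof(msg)) == 0;

    /* Step 3: modify through the mapping and read back via pread(). */
    map[0] = 'J';
    if (pread(fd, back, sizeof(back), 0) == (ssize_t)sizeof(back))
        result = seen && back[0] == 'J';

    munmap(map, sizeof(msg));
out:
    close(fd);
    return result;
}
```

No msync() is needed for the second check, because both access paths operate on the same page in memory rather than on two copies of the data.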


To follow this, the radix tree also needs to be understood. The original article shows a diagram here, taken from "Deep Linux kernel Architecture".


The radix tree is not a balanced tree. The tree itself consists of two different data structures, the root and the non-leaf nodes. The root is a simple data structure that holds the height of the tree and a pointer to the first node that makes up the tree. A node is essentially an array: count is the number of pointers in use in that node, and the remaining fields are pointers to the nodes of the next level. The leaves are pointers to pages.

The nodes also carry search tags, such as the dirty tag and the writeback tag, so that the pages carrying a given tag can be located quickly.
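A minimal user-space sketch of such a tree is shown below. It is not the kernel implementation: the tree has a fixed height instead of growing on demand, only a single "dirty" tag is modeled, and all names (rnode, rroot, rtree_insert and so on) are hypothetical. It keeps the ideas described above: a root with a height and a pointer, nodes that are arrays with a count, leaves that are page pointers, and tag bits that are propagated up to the root.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_SHIFT 6                       /* 64 slots per node */
#define MAP_SIZE  (1u << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)
#define HEIGHT    3u                      /* fixed: indices below 2^18 */

struct rnode {
    unsigned count;                       /* slots currently in use */
    uint64_t dirty;                       /* one "dirty" tag bit per slot */
    void    *slots[MAP_SIZE];             /* children, or pages at the bottom */
};

struct rroot {
    unsigned      height;
    struct rnode *rnode;                  /* first node of the tree */
};

/* Slot used at a given level; level 0 is the top of the tree. */
static unsigned slot_of(unsigned long index, unsigned level)
{
    assert(level < HEIGHT);
    return (unsigned)(index >> ((HEIGHT - 1 - level) * MAP_SHIFT)) & MAP_MASK;
}

/* Returns 0 on success, -1 if out of range, occupied, or out of memory. */
static int rtree_insert(struct rroot *root, unsigned long index, void *page)
{
    if (index >> (HEIGHT * MAP_SHIFT))
        return -1;
    if (!root->rnode) {
        root->rnode = calloc(1, sizeof(struct rnode));
        if (!root->rnode)
            return -1;
        root->height = HEIGHT;
    }
    struct rnode *node = root->rnode;
    for (unsigned level = 0; level < HEIGHT - 1; level++) {
        unsigned s = slot_of(index, level);
        if (!node->slots[s]) {
            node->slots[s] = calloc(1, sizeof(struct rnode));
            if (!node->slots[s])
                return -1;
            node->count++;
        }
        node = node->slots[s];
    }
    unsigned s = slot_of(index, HEIGHT - 1);
    if (node->slots[s])
        return -1;
    node->slots[s] = page;
    node->count++;
    return 0;
}

static void *rtree_lookup(const struct rroot *root, unsigned long index)
{
    if (!root->rnode || (index >> (HEIGHT * MAP_SHIFT)))
        return NULL;
    const struct rnode *node = root->rnode;
    for (unsigned level = 0; level < HEIGHT - 1; level++) {
        node = node->slots[slot_of(index, level)];
        if (!node)
            return NULL;
    }
    return node->slots[slot_of(index, HEIGHT - 1)];
}

/* Tag a stored page dirty by setting the tag bit on every level of its path. */
static int rtree_tag_dirty(struct rroot *root, unsigned long index)
{
    if (!rtree_lookup(root, index))
        return -1;
    struct rnode *node = root->rnode;
    for (unsigned level = 0; level < HEIGHT; level++) {
        unsigned s = slot_of(index, level);
        node->dirty |= (uint64_t)1 << s;
        if (level < HEIGHT - 1)
            node = node->slots[s];
    }
    return 0;
}

/* Thanks to the propagation, looking at the top node is enough. */
static int rtree_has_dirty(const struct rroot *root)
{
    return root->rnode && root->rnode->dirty != 0;
}
```

Because each tag bit is repeated on every level, a search for dirty pages can skip any subtree whose tag bit is clear, which is what makes tagged lookups fast.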



Block buffer

A block buffer consists structurally of two parts:

1. Buffer head: Holds all the management data related to the buffer's state, such as the block number, the length, and the use counter. The buffer data is not stored directly behind the buffer head; instead, a pointer in the buffer head points to a separate area of physical memory.

2. The useful data is stored in a specially allocated page, which can at the same time be used by the page buffer.


Buffer head:

/*
 * Historically, a buffer_head was used to map a single block
 * within a page, and of course as the unit of I/O through the
 * filesystem and block layers.  Nowadays the basic I/O unit
 * is the bio, and buffer_heads are used for extracting block
 * mappings (via a get_block_t call), for tracking state within
 * a page (via a page_mapping) and for wrapping bio submission
 * for backward compatibility reasons (e.g. submit_bh).
 */
struct buffer_head {
    unsigned long b_state;              /* buffer state bitmap (see above) */ // buffer status flags, listed below
    struct buffer_head *b_this_page;    /* circular list of page's buffers */ // points to the next buffer head
    struct page *b_page;                /* the page this bh is mapped to */ // page descriptor of the page that owns this block buffer

    sector_t b_blocknr;                 /* start block number */ // logical block number on the block device
    size_t b_size;                      /* size of mapping */ // block size
    char *b_data;                       /* pointer to data within the page */ // position of the block within the buffer page

    struct block_device *b_bdev;        // points to the block device descriptor
    bh_end_io_t *b_end_io;              /* I/O completion */ // I/O completion callback function
    void *b_private;                    /* reserved for b_end_io */ // data argument for the I/O completion callback
    struct list_head b_assoc_buffers;   /* associated with another mapping */
    struct address_space *b_assoc_map;  /* mapping this buffer is associated with */
    atomic_t b_count;                   /* users using this buffer_head */ // use counter of the block
};


Common flags for buffer heads

enum bh_state_bits {
    BH_Uptodate,        /* Contains valid data */ // the buffer contains valid data
    BH_Dirty,           /* Is dirty */ // the buffer is dirty
    BH_Lock,            /* Is locked */ // the buffer is locked
    BH_Req,             /* Has been submitted for I/O */ // the buffer was initialized and a data transfer was requested
    BH_Uptodate_Lock,   /* Used by the first bh in a page, to serialise
                         * IO completion of other buffers in the page
                         */

    BH_Mapped,          /* Has a disk mapping */ // b_bdev and b_blocknr are valid
    BH_New,             /* Disk mapping was newly created by get_block */ // just allocated, not yet accessed
    BH_Async_Read,      /* Is under end_buffer_async_read I/O */ // the buffer is being read asynchronously
    BH_Async_Write,     /* Is under end_buffer_async_write I/O */ // the buffer is being written asynchronously
    BH_Delay,           /* Buffer is not yet allocated on disk */ // the buffer has not been allocated on disk yet
    BH_Boundary,        /* Block is followed by a discontiguity */
    BH_Write_EIO,       /* I/O error on write */ // I/O error
    BH_Unwritten,       /* Buffer is allocated on disk but not written */
    BH_Quiet,           /* Buffer error printks to be quiet */
    BH_Meta,            /* Buffer contains metadata */
    BH_Prio,            /* Buffer should be submitted with REQ_PRIO */

    BH_PrivateStart,    /* not a state bit, but the first bit available
                         * for private allocation by other entities
                         */
};
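Each enumerator above is a bit position inside b_state, not a mask. The kernel generates helpers such as buffer_uptodate() and set_buffer_dirty() for these bits and uses atomic bit operations; the sketch below shows the same idea with plain, non-atomic operations. The names bh_m, bh_bits_m, bh_set() and the other helpers are hypothetical, and only the first four bits are modeled.

```c
#include <assert.h>
#include <limits.h>

/* Bit positions, mirroring the first few entries of enum bh_state_bits. */
enum bh_bits_m { BH_UPTODATE_M, BH_DIRTY_M, BH_LOCK_M, BH_REQ_M };

struct bh_m {
    unsigned long b_state;      /* one bit per state flag */
};

static unsigned long bh_mask(enum bh_bits_m bit)
{
    assert((unsigned)bit < sizeof(unsigned long) * CHAR_BIT);
    return 1UL << bit;
}

static void bh_set(struct bh_m *bh, enum bh_bits_m bit)
{
    bh->b_state |= bh_mask(bit);
}

static void bh_clear(struct bh_m *bh, enum bh_bits_m bit)
{
    bh->b_state &= ~bh_mask(bit);
}

static int bh_test(const struct bh_m *bh, enum bh_bits_m bit)
{
    return (bh->b_state & bh_mask(bit)) != 0;
}

/* Sets the bit and returns its previous value (NOT atomic in this sketch). */
static int bh_test_and_set(struct bh_m *bh, enum bh_bits_m bit)
{
    int old = bh_test(bh, bit);
    bh_set(bh, bit);
    return old;
}
```

A test-and-set style helper is useful for the dirty bit in particular: only the caller that sees the old value 0 was the one that actually made the buffer dirty.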


If a page is used as a buffer page, all buffer heads associated with its block buffers are collected in a singly linked circular list. The private field of the buffer page's descriptor points to the buffer head of the first block in the page, and the b_this_page field of each buffer head points to the next buffer head in the list. The b_page field of each buffer head points to the descriptor of the buffer page it belongs to.
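The layout described above can be sketched in user space like this. The types page_m and bh_m, the field private_bh (standing in for the private field of the page descriptor) and both functions are hypothetical; the sketch also allocates all heads in one array for brevity, which is a simplification of this example rather than how the kernel allocates buffer heads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_M 4096u       /* assumed page size */

struct page_m;

struct bh_m {
    struct bh_m   *b_this_page; /* next head in the ring */
    struct page_m *b_page;      /* page that owns this block buffer */
    char          *b_data;      /* where the block starts inside the page */
    size_t         b_size;      /* block size */
};

struct page_m {
    struct bh_m *private_bh;    /* stand-in for the page's private field */
    char         data[PAGE_SIZE_M];
};

/*
 * Split the page into blocks of blocksize bytes and link their heads
 * into a circular list. Returns the number of heads, or 0 on failure.
 */
static unsigned attach_buffers(struct page_m *page, size_t blocksize)
{
    assert(blocksize != 0 && PAGE_SIZE_M % blocksize == 0);
    unsigned n = (unsigned)(PAGE_SIZE_M / blocksize);
    struct bh_m *heads = calloc(n, sizeof(*heads));
    if (!heads)
        return 0;
    for (unsigned i = 0; i < n; i++) {
        heads[i].b_page = page;
        heads[i].b_data = page->data + i * blocksize;
        heads[i].b_size = blocksize;
        heads[i].b_this_page = &heads[(i + 1) % n];  /* last wraps to first */
    }
    page->private_bh = heads;
    return n;
}

/* Walk the ring once, starting and ending at the first head. */
static unsigned count_buffers(const struct page_m *page)
{
    const struct bh_m *head = page->private_bh;
    const struct bh_m *bh = head;
    unsigned n = 0;

    if (!head)
        return 0;
    do {
        n++;
        bh = bh->b_this_page;
    } while (bh != head);
    return n;
}
```

The do/while form is the natural way to traverse a ring that has no NULL terminator: the loop body runs for the first head before the end condition is checked.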



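In the figure from the original article, one buffer page corresponds to 4 buffers (for example, 1 KB blocks in a 4 KB page). This is how the page cache and the buffer cache are unified: the buffers and the buffer page share the same memory, so a modification made through either one is visible through the other.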
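The arithmetic behind "one page, several buffers" can be written down in a few lines. The sketch assumes 4 KiB pages; blk_pos, buffers_per_page() and locate() are hypothetical names. Note that the block number computed here is relative to the start of the file, whereas b_blocknr in a real buffer head is the block number on the device, which the file system supplies.

```c
#include <assert.h>

#define PAGE_SHIFT_M 12         /* assumes 4 KiB pages */

struct blk_pos {
    unsigned long      page_index;   /* which page of the file */
    unsigned           buf_in_page;  /* which buffer inside that page */
    unsigned long long block;        /* file-relative block number */
};

/* Number of block buffers that fit into one page for a block size of 2^blkbits. */
static unsigned buffers_per_page(unsigned blkbits)
{
    assert(blkbits >= 9 && blkbits <= PAGE_SHIFT_M);
    return 1u << (PAGE_SHIFT_M - blkbits);
}

/* Translate a byte offset in the file into page and buffer coordinates. */
static struct blk_pos locate(unsigned long long offset, unsigned blkbits)
{
    struct blk_pos p;

    p.block = offset >> blkbits;
    p.page_index = (unsigned long)(offset >> PAGE_SHIFT_M);
    p.buf_in_page = (unsigned)(p.block & (buffers_per_page(blkbits) - 1));
    return p;
}
```

With 1 KiB blocks (blkbits = 10), byte offset 11269 falls into file block 11, which is the fourth buffer (index 3) of the third page (index 2).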



The address_space structure:

struct address_space {
    struct inode            *host;              /* owner: inode, block_device */ // inode of the host file
    struct radix_tree_root  page_tree;          /* radix tree of all pages */ // radix tree root
    spinlock_t              tree_lock;          /* and lock protecting it */ // radix tree lock
    unsigned int            i_mmap_writable;    /* count VM_SHARED mappings */ // number of VM_SHARED mappings
    struct rb_root          i_mmap;             /* tree of private and shared mappings */ // tree of private and shared mappings
    struct list_head        i_mmap_nonlinear;   /* list VM_NONLINEAR mappings */ // list of non-linear mappings
    struct mutex            i_mmap_mutex;       /* protect tree, count, list */ // mutex protecting the tree, count and list
    /* Protected by tree_lock together with the radix tree */
    unsigned long           nrpages;            /* number of total pages */ // number of pages
    pgoff_t                 writeback_index;    /* writeback starts here */ // where write-back starts
    const struct address_space_operations *a_ops;   /* methods */ // function pointers
    unsigned long           flags;              /* error bits/gfp mask */ // error bits
    struct backing_dev_info *backing_dev_info;  /* device readahead, etc */ // device read-ahead
    spinlock_t              private_lock;       /* for use by the address_space */
    struct list_head        private_list;       /* ditto */
    void                    *private_data;      /* ditto */
} __attribute__((aligned(sizeof(long))));


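The members struct inode *host and struct radix_tree_root page_tree are what tie a file to its memory pages: host points back to the owning inode, and page_tree is the root of the radix tree that holds all cached pages of that file.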
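How these two members work together on a cache lookup can be sketched as follows. This is a user-space model under loud assumptions: a small flat array replaces page_tree to keep the example short, there is no locking (tree_lock is not modeled), and all names ending in _m are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PAGES_M 64          /* hypothetical cache capacity */

struct inode_m;
struct address_space_m;

struct page_m {
    unsigned long           index;    /* page offset within the file */
    struct address_space_m *mapping;  /* cache this page belongs to */
};

struct address_space_m {
    struct inode_m *host;                 /* owning inode */
    struct page_m  *pages[MAX_PAGES_M];   /* flat array in place of page_tree */
    unsigned long   nrpages;              /* number of cached pages */
};

struct inode_m {
    struct address_space_m i_data;
};

/* Pure lookup: returns the cached page or NULL on a miss. */
static struct page_m *find_page_m(struct address_space_m *m, unsigned long index)
{
    assert(m != NULL);
    return index < MAX_PAGES_M ? m->pages[index] : NULL;
}

/* Lookup that fills the cache on a miss (reading from disk is omitted). */
static struct page_m *find_or_create_page_m(struct address_space_m *m,
                                            unsigned long index)
{
    struct page_m *page = find_page_m(m, index);

    if (page || index >= MAX_PAGES_M)
        return page;
    page = calloc(1, sizeof(*page));
    if (!page)
        return NULL;
    page->index = index;
    page->mapping = m;
    m->pages[index] = page;
    m->nrpages++;
    return page;
}
```

A second request for the same index returns the page created by the first one, and from any cached page the owning file can be reached again through page->mapping->host.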




struct address_space_operations {
    int (*writepage)(struct page *page, struct writeback_control *wbc); // write operation: writes a page to the owner's disk image
    int (*readpage)(struct file *, struct page *); // read operation: reads from the owner's disk image into a page

    /* Write back some dirty pages from this mapping. */
    int (*writepages)(struct address_space *, struct writeback_control *); // writes a given number of the owner's dirty pages back to disk

    /* Set a page dirty.  Return true if this dirtied it */
    int (*set_page_dirty)(struct page *page); // marks the owner's page as dirty

    int (*readpages)(struct file *filp, struct address_space *mapping,
            struct list_head *pages, unsigned nr_pages); // reads a list of the owner's pages from disk

    int (*write_begin)(struct file *, struct address_space *mapping,
            loff_t pos, unsigned len, unsigned flags,
            struct page **pagep, void **fsdata);
    int (*write_end)(struct file *, struct address_space *mapping,
            loff_t pos, unsigned len, unsigned copied,
            struct page *page, void *fsdata);

    /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
    sector_t (*bmap)(struct address_space *, sector_t);
    void (*invalidatepage)(struct page *, unsigned long);
    int (*releasepage)(struct page *, gfp_t);
    void (*freepage)(struct page *);
    ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
            loff_t offset, unsigned long nr_segs);
    int (*get_xip_mem)(struct address_space *, pgoff_t, int,
            void **, unsigned long *);
    /*
     * migrate the contents of a page to the specified target. If sync
     * is false, it must not block.
     */
    int (*migratepage)(struct address_space *,
            struct page *, struct page *, enum migrate_mode);
    int (*launder_page)(struct page *);
    int (*is_partially_uptodate)(struct page *, read_descriptor_t *,
            unsigned long);
    int (*error_remove_page)(struct address_space *, struct page *);

    /* swapfile support */
    int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
            sector_t *span);
    void (*swap_deactivate)(struct file *file);
};
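The point of such a table of function pointers is that generic code can read and write pages without knowing which file system backs them. A minimal sketch of the pattern follows. It is a user-space model, not the kernel interface: aops_m has only two methods with simplified signatures, the backing store is a hypothetical one-page in-memory "disk", and every name ending in _m as well as mem_readpage(), mem_writepage() and flush_if_dirty() is invented for this example.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE_M 4096        /* assumed page size */

struct address_space_m;

struct page_m {
    char data[PAGE_SIZE_M];
    int  dirty;
};

/* Reduced method table: two of the many operations, simplified. */
struct aops_m {
    int (*readpage)(struct address_space_m *, struct page_m *);
    int (*writepage)(struct address_space_m *, struct page_m *);
};

struct address_space_m {
    const struct aops_m *a_ops;
    char backing[PAGE_SIZE_M];  /* hypothetical one-page "disk" */
};

/* One possible implementation: the backing store is plain memory. */
static int mem_readpage(struct address_space_m *m, struct page_m *p)
{
    memcpy(p->data, m->backing, PAGE_SIZE_M);
    p->dirty = 0;
    return 0;
}

static int mem_writepage(struct address_space_m *m, struct page_m *p)
{
    memcpy(m->backing, p->data, PAGE_SIZE_M);
    p->dirty = 0;
    return 0;
}

static const struct aops_m mem_aops = {
    .readpage  = mem_readpage,
    .writepage = mem_writepage,
};

/* Generic code: it only knows the table, not the implementation behind it. */
static int flush_if_dirty(struct address_space_m *m, struct page_m *p)
{
    assert(m->a_ops && m->a_ops->writepage);
    return p->dirty ? m->a_ops->writepage(m, p) : 0;
}
```

Swapping in a different aops_m table would change where the page contents end up without touching flush_if_dirty(), which mirrors how each file system supplies its own address_space_operations.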






















