[Linux driver] Linux block device learning notes (1)

1. Block devices vs. character devices: a block device is hardware that allows random access to fixed-size chunks of data, while a character device is accessed as a sequential stream of bytes.
The sector is the smallest physically addressable unit of a block device, typically 512 bytes. The block is the smallest addressable unit of the file system; its size is an integer multiple of the sector size and cannot exceed one page.
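
As a quick illustration of that constraint, here is a tiny user-space check (a sketch only; the 512-byte sector and 4 KB page values are assumptions, the real values come from the device and the architecture):

#include <stdbool.h>

#define SECTOR_SIZE  512UL     /* assumed smallest addressable unit of the device */
#define MY_PAGE_SIZE 4096UL    /* assumed page size; the real value is arch-specific */

/* A file-system block size is acceptable if it is a non-zero multiple of the
 * sector size and does not exceed one page. */
static bool block_size_is_valid(unsigned long block_size)
{
        return block_size >= SECTOR_SIZE &&
               block_size % SECTOR_SIZE == 0 &&
               block_size <= MY_PAGE_SIZE;
}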

Operating on a block device requires a corresponding buffer in memory, described by a struct buffer_head.

struct buffer_head {
        unsigned long b_state;             /* buffer state bitmap (status flags) */
        struct buffer_head *b_this_page;   /* circular list of the page's buffers */
        struct page *b_page;               /* the page this buffer is mapped to */
        sector_t b_blocknr;                /* start (logical) block number */
        size_t b_size;                     /* size of the mapping (block size) */
        char *b_data;                      /* pointer to the block's data within the
                                              page; for FAT32's boot-sector block, for
                                              example, it points at a fat_boot_sector */
        struct block_device *b_bdev;       /* the block device this buffer belongs to */
        bh_end_io_t *b_end_io;             /* I/O completion method */
        void *b_private;                   /* reserved for b_end_io */
        struct list_head b_assoc_buffers;  /* associated with another mapping */
        struct address_space *b_assoc_map; /* the address_space this buffer is
                                              associated with */
        atomic_t b_count;                  /* usage count of this buffer_head */
};
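
To make the role of buffer_head concrete, here is a minimal sketch of how file-system code typically obtains one through the buffer cache, assuming an already-mounted super_block and a caller-supplied block number; error handling is reduced to the essentials:

#include <linux/buffer_head.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/kernel.h>

/* Read one block of a mounted file system through the buffer cache and
 * peek at its contents. */
static int read_one_block(struct super_block *sb, sector_t blocknr)
{
        struct buffer_head *bh;

        bh = sb_bread(sb, blocknr);   /* maps the block and reads it in if needed */
        if (!bh)
                return -EIO;

        /* bh->b_data now points at b_size bytes of the block in memory;
         * for FAT32, reading block 0 this way yields the boot sector. */
        printk(KERN_INFO "block %llu, size %zu, first byte 0x%02x\n",
               (unsigned long long)bh->b_blocknr, bh->b_size,
               (unsigned char)bh->b_data[0]);

        brelse(bh);                   /* drop our reference (b_count) */
        return 0;
}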


2. The bio structure
The basic container for block I/O operations in the current kernel is the bio structure. Typically one bio corresponds to one I/O request; the I/O scheduler can merge contiguous bios into a single request, so one request may contain several bios.

struct bio {
        sector_t bi_sector;                /* associated sector on disk */
        struct bio *bi_next;               /* list of requests */
        struct block_device *bi_bdev;      /* associated block device */
        unsigned long bi_flags;            /* status and command flags */
        unsigned long bi_rw;               /* read or write? */
        unsigned short bi_vcnt;            /* number of bio_vecs in this bio */
        unsigned short bi_idx;             /* current index into bi_io_vec */
        unsigned short bi_phys_segments;   /* number of segments after coalescing */
        unsigned short bi_hw_segments;     /* number of segments after remapping */
        unsigned int bi_size;              /* I/O count (bytes remaining) */
        unsigned int bi_hw_front_size;     /* size of the first mergeable segment */
        unsigned int bi_hw_back_size;      /* size of the last mergeable segment */
        unsigned int bi_max_vecs;          /* maximum bio_vecs possible */
        struct bio_vec *bi_io_vec;         /* bio_vec list */
        bio_end_io_t *bi_end_io;           /* I/O completion method */
        atomic_t bi_cnt;                   /* usage counter */
        void *bi_private;                  /* owner-private data */
        bio_destructor_t *bi_destructor;   /* destructor method */
};

struct bio_vec {
        /* pointer to the physical page on which this buffer resides */
        struct page *bv_page;
        /* the length in bytes of this buffer */
        unsigned int bv_len;
        /* the byte offset within the page where the buffer resides */
        unsigned int bv_offset;
};
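
As a companion to the structures above, the sketch below walks the bio_vec list of a bio the way a simple driver would when servicing it. It assumes a 2.6-era kernel where bio_for_each_segment yields a struct bio_vec pointer (newer kernels iterate with a bvec_iter instead); the checksum is just a stand-in for real data movement:

#include <linux/bio.h>
#include <linux/highmem.h>

/* Visit every segment (bio_vec) of a bio and fold its bytes into a checksum. */
static unsigned long checksum_bio(struct bio *bio)
{
        struct bio_vec *bvec;
        unsigned long sum = 0;
        unsigned int j;
        int i;

        bio_for_each_segment(bvec, bio, i) {
                /* map the page holding this segment and get a byte pointer */
                char *buf = kmap(bvec->bv_page) + bvec->bv_offset;

                for (j = 0; j < bvec->bv_len; j++)
                        sum += (unsigned char)buf[j];

                kunmap(bvec->bv_page);
        }
        return sum;
}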

In the 2.6 kernel, the work that used to be done by buffer_head alone is now shared between buffer_head and bio. buffer_head only tells the upper layers the current state of the block it describes, while bio gathers as many blocks as possible, hands them to the lower-level driver, and ultimately gets them written to disk. In other words, buffer_head describes the mapping from a disk block to physical memory, and bio is the container for all block I/O operations.

3. The request queue
If the upper-level file system wants to read or write data on the underlying device, it first sends its request to a request queue, represented by the request_queue structure. The queue holds a doubly linked list of requests plus the related control information. Each request in the queue is represented by struct request, and a single request may be built from more than one bio structure; a driver consumes the queue roughly as in the sketch below.
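
This is a minimal sketch written against the older 2.6 request API that this note describes (elv_next_request/end_request; later kernels renamed these, e.g. blk_fetch_request). mydev_handle() is a hypothetical stub standing in for the code that would actually move the data:

#include <linux/blkdev.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(mydev_lock);  /* queue lock handed to blk_init_queue() */

/* Hypothetical data-moving helper (a stub here): a real driver would copy
 * nsect * 512 bytes to or from the device starting at 'sector'. */
static void mydev_handle(sector_t sector, unsigned long nsect,
                         char *buffer, int write)
{
}

/* Request function: the block layer calls this with the queue lock held;
 * we pull struct request entries off the queue one by one. */
static void mydev_request(struct request_queue *q)
{
        struct request *req;

        while ((req = elv_next_request(q)) != NULL) {
                if (!blk_fs_request(req)) {      /* skip non-filesystem requests */
                        end_request(req, 0);
                        continue;
                }
                /* req->sector, req->current_nr_sectors and req->buffer describe
                 * the next chunk; one request may be built from several bios. */
                mydev_handle(req->sector, req->current_nr_sectors,
                             req->buffer, rq_data_dir(req) == WRITE);
                end_request(req, 1);             /* 1 = success */
        }
}

/* At init time the queue would be created with:
 *     queue = blk_init_queue(mydev_request, &mydev_lock);
 */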


4. The I/O scheduler
The I/O scheduler tries to keep the request queue sorted by sector, so that the disk head can keep moving in one direction as much as possible, much like an elevator.

To optimize addressing on block devices, the I/O scheduler performs merging and sorting before requests are submitted to the disk, which improves overall system performance. Merging means that when two requests touch adjacent sectors of the underlying block device, the scheduler combines them into one request. Sorting means keeping the pending requests ordered by the physical position of the sectors they access.


The 2.4 kernel used the Linus elevator I/O scheduler. Its drawback is that frequent operations on one disk area can leave requests for other locations with no chance to run, which may starve reads and writes. The deadline I/O scheduler followed, to address this shortcoming of the Linus elevator.

When a new request is added to the I/O request queue, the Linus elevator scheduler may take one of the following four actions (a toy sketch of these rules follows the list):
1. If a request operating on an adjacent disk sector already exists in the queue, the new request is merged with it into a single request.
2. If a sufficiently old request exists in the queue, the new request is inserted at the tail of the queue, to keep the old request from starving.
3. If there is a suitable insertion point that preserves sector order, the new request is inserted there, keeping the queue ordered by the physical location being accessed.
4. If no suitable insertion point exists, the request is inserted at the tail of the queue.
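
The toy user-space sketch below models these four rules over a singly linked list. Sector arithmetic and the age check are deliberately simplified (only back-merging is handled), and nothing here is the kernel's actual elevator code:

#include <stdlib.h>

struct toy_req {
        long start, len;          /* sector range covered by the request */
        long age;                 /* how long it has been queued, in ticks */
        struct toy_req *next;
};

#define TOO_OLD 100               /* arbitrary "this request is starving" limit */

static void elevator_add(struct toy_req **queue, struct toy_req *nrq)
{
        struct toy_req **p = queue;

        for (; *p; p = &(*p)->next) {
                struct toy_req *cur = *p;

                /* 1. adjacent on disk: merge into the existing request */
                if (cur->start + cur->len == nrq->start) {
                        cur->len += nrq->len;
                        free(nrq);
                        return;
                }
                /* 2. an old request exists: stop scanning so the new request
                 *    ends up at the tail and cannot delay the old one further */
                if (cur->age > TOO_OLD)
                        break;
                /* 3. sector-ordered position found: insert here */
                if (cur->start > nrq->start) {
                        nrq->next = cur;
                        *p = nrq;
                        return;
                }
        }
        /* 2./4. no merge or ordered slot: append at the tail */
        for (; *p; p = &(*p)->next)
                ;
        nrq->next = NULL;
        *p = nrq;
}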

The deadline I/O scheduler gives each request a timeout: 500 ms by default for read requests and 5 s for write requests. In addition to a queue sorted by disk sector, the deadline scheduler maintains read and write FIFO queues ordered by submission time. Its drawback is reduced system throughput. (The response-time requirements of reads and writes differ: writes are generally not latency-critical, since the application can continue asynchronously after submitting them, but reads are usually synchronous; the application cannot proceed until the data it asked for arrives.)

When a new request is added to the I/O request queue, the deadline scheduler behaves as follows, compared with the Linus elevator (a toy sketch of the dispatch decision follows the list):
1. The new request is added to the sector-sorted queue, in much the same way the Linus elevator inserts new requests.
2. Depending on its type, the new request is also added to the tail of the read FIFO or the write FIFO (these queues are ordered by submission time, so new requests go to the tail).
3. The scheduler first checks whether the request at the head of the read or write FIFO has timed out; if so, it takes that request from the FIFO head and adds it to the dispatch queue.
4. If no request has timed out, it takes the request at the head of the sector-sorted queue and moves it to the dispatch queue.
5. The dispatch queue submits requests to the disk drive in order, completing the I/O operations.
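
The sketch below models just the dispatch decision in steps 3 and 4: expired FIFO heads win, otherwise the sector-sorted queue is followed. The queue-access helpers are hypothetical placeholders, not kernel functions:

#include <stdbool.h>

#define READ_EXPIRE_MS   500     /* default read timeout  */
#define WRITE_EXPIRE_MS 5000     /* default write timeout */

struct dl_req {
        long sector;
        long queued_ms;          /* time the request entered its FIFO */
        bool write;
};

/* hypothetical accessors for the three queues (NULL when a queue is empty) */
extern struct dl_req *read_fifo_front(void);
extern struct dl_req *write_fifo_front(void);
extern struct dl_req *sorted_queue_front(void);

/* Pick the next request to move to the dispatch queue. */
static struct dl_req *deadline_pick_next(long now_ms)
{
        struct dl_req *r = read_fifo_front();
        struct dl_req *w = write_fifo_front();

        /* step 3: a request whose deadline has passed is served first */
        if (r && now_ms - r->queued_ms > READ_EXPIRE_MS)
                return r;
        if (w && now_ms - w->queued_ms > WRITE_EXPIRE_MS)
                return w;

        /* step 4: otherwise keep following the sector-sorted queue */
        return sorted_queue_front();
}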


The anticipatory I/O scheduler builds on the deadline scheduler: each time it services a read request, it waits for a few milliseconds afterwards, 6 ms by default.

The deadline scheduling algorithm favors read-request response time, but when the system is under heavy write load, its throughput drops sharply. Because read timeouts are short, every read request interrupts the writes: the disk seeks to the read location, completes the read, then goes back to continue writing. This guarantees read responsiveness but hurts the system's global throughput (the head seeks to the read position and back again, two addressing operations).
When a new request is added to the I/O request queue, the anticipatory scheduler differs from the deadline scheduler as follows (a toy sketch of the anticipation step follows the list):
1. After a read request has been serviced, the scheduler does not immediately handle the next request but deliberately waits for a moment (6 ms by default).
2. If, during the wait, another read request for an adjacent disk location arrives, it is handled immediately.
3. If no other read request arrives during the wait, the waiting time has effectively been wasted.
4. When the wait ends, the scheduler continues with the remaining requests.
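
The decision in these steps boils down to a tiny predicate; this sketch is illustrative only, and the 6 ms window matches the default mentioned above:

#include <stdbool.h>

#define ANTIC_WAIT_MS 6          /* default anticipation window */

/* Should the scheduler keep the disk idle a little longer after a read?
 * 'nearby_read_arrived' would be set when a read for an adjacent location
 * shows up during the window (step 2). */
static bool keep_anticipating(long waited_ms, bool nearby_read_arrived)
{
        if (nearby_read_arrived)
                return false;             /* serve the new read right away */
        return waited_ms < ANTIC_WAIT_MS; /* otherwise idle until the window ends */
}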

The default I/O scheduler on Linux systems is the Complete Fair Queuing (CFQ) scheduler. CFQ places incoming I/O requests into per-process queues, organized by the process that issued the I/O; within each queue the requests are then merged and sorted. CFQ services the queues round-robin, in time slices, taking a number of requests from each queue in turn (4 by default).

Complete Fair Queuing (CFQ) I/O scheduling was originally designed for specialized workloads, and it differs fundamentally from the schedulers described above.
In the CFQ algorithm, each process has its own I/O queue.
The CFQ scheduler services the queues in time slices, selecting a certain number of requests from each queue (4 by default) before moving on to the next round.
CFQ provides fairness at the process level; its implementation lives in block/cfq-iosched.c.
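
A toy model of that round-robin step follows: one queue per process, at most a quantum of requests (4 by default) dispatched per turn. The structures and helpers are illustrative stand-ins, not the kernel's CFQ implementation:

#define CFQ_QUANTUM 4            /* requests taken from a queue per turn */

struct cfq_toy_req;              /* opaque request handle for this sketch */

struct cfq_toy_queue {
        struct cfq_toy_queue *next_queue;  /* circular list of per-process queues */
        struct cfq_toy_req *head;          /* sorted requests of this process */
};

/* hypothetical helpers: pop the head request of a queue / hand it to the drive */
extern struct cfq_toy_req *cfq_toy_pop(struct cfq_toy_queue *q);
extern void cfq_toy_dispatch(struct cfq_toy_req *req);

/* Service one queue for its turn, then hand the next queue its time slice. */
static struct cfq_toy_queue *cfq_one_turn(struct cfq_toy_queue *cur)
{
        int i;

        for (i = 0; i < CFQ_QUANTUM && cur->head; i++)
                cfq_toy_dispatch(cfq_toy_pop(cur));

        return cur->next_queue;
}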

