Xen PV Disk Indirect Descriptors


Early Xen used a "ring" to exchange I/O between the guest and the driver domain. Because of design constraints, the maximum amount of I/O a ring could carry at once was 1408KB, which became a performance bottleneck for large I/O. To solve this problem, the Xen community introduced the concept of indirect descriptors, borrowing from the virtio implementation.

The following is referenced and translated from: Indirect descriptors for Xen PV disks

Xen PV Disk Protocol

The Xen PV disk protocol is very simple: it only needs a single 4K shared memory page mapped between the guest and the driver domain to submit requests and read responses. The structure shared by the frontend and backend drivers is also very simple, and is what we usually call the "ring". The ring contains a circular queue plus some counters for requests and responses.
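To make this concrete, here is a minimal, self-contained sketch of what such a shared ring page looks like. The real layout is generated by the DEFINE_RING_TYPES macro in Xen's ring.h, so the field names and padding below are illustrative, and the request/response structures are stand-ins sized only for this example.

#include <stdint.h>

typedef uint32_t RING_IDX;

/* Stand-ins for the real request/response layouts (the request is shown
 * later in the article); they only exist to keep this sketch self-contained. */
struct blkif_request_stub  { uint8_t raw[112]; };
struct blkif_response_stub { uint8_t raw[16]; };

union blkif_sring_entry {
    struct blkif_request_stub  req;  /* written by the frontend */
    struct blkif_response_stub rsp;  /* written by the backend  */
};

/* One 4K shared page: a small header of producer counters and notification
 * thresholds, followed by the circular array of request/response slots. */
struct blkif_sring {
    RING_IDX req_prod, req_event;    /* request producer / notify threshold  */
    RING_IDX rsp_prod, rsp_event;    /* response producer / notify threshold */
    uint8_t  pad[48];                /* pads the header to 64 bytes          */
    union blkif_sring_entry ring[1]; /* slots fill the rest of the page      */
};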

After subtracting the space used by those counters, the number of slots is rounded down to the nearest power of 2, and the result is the number of requests and responses the ring can hold. By this calculation, a ring can hold 32 requests ((4096 - 64) / 112 = 36, rounded down to the nearest power of 2 gives 32).
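A minimal sketch of that calculation, using the article's own numbers (a 64-byte ring header and a 112-byte request); the rounding mirrors what Xen's ring macros do:

#include <stdio.h>

/* Round n down to the nearest power of 2. */
static unsigned int round_down_pow2(unsigned int n)
{
    unsigned int p = 1;
    while (p * 2 <= n)
        p *= 2;
    return p;
}

int main(void)
{
    const unsigned int page_size   = 4096; /* one shared 4K page             */
    const unsigned int ring_header = 64;   /* producer counters and padding  */
    const unsigned int req_size    = 112;  /* size of one blkif_request slot */

    unsigned int raw_slots = (page_size - ring_header) / req_size; /* 36 */
    unsigned int slots     = round_down_pow2(raw_slots);           /* 32 */

    printf("raw slots: %u, ring size: %u requests\n", raw_slots, slots);
    return 0;
}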

The following is the structure of a request:

struct blkif_request {
    uint8_t        operation;        /* BLKIF_OP_READ, BLKIF_OP_WRITE, ...  */
    uint8_t        nr_segments;      /* number of entries used in seg[]     */
    blkif_vdev_t   handle;           /* device this request targets         */
    uint64_t       id;               /* guest value, echoed in the response */
    blkif_sector_t sector_number;    /* start sector on the disk            */
    struct blkif_request_segment {
        grant_ref_t gref;            /* grant reference of the data page    */
        uint8_t     first_sect, last_sect; /* sector range within the page  */
    } seg[11];
};

The author omitted some fields that are not relevant to this article.

In this structure, the most important field is seg, because it carries the I/O data. Each seg holds a grant reference to a 4K shared memory page, which is what the frontend and backend use to transfer data. With 11 segs per request, a single request can carry at most 44KB of data (4KB * 11 = 44KB). A ring can hold 32 requests, so the maximum amount of in-flight data on one ring is 1408KB (44KB * 32 = 1408KB). That is not a lot of data, and today's disks can process it quickly, so this implementation becomes the bottleneck of the PV protocol itself.
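The same limits restated as compile-time constants; this is just an illustration of the numbers above, not code taken from any Xen header:

/* Classic (non-indirect) protocol limits, assuming 4K grant pages. */
enum {
    SEGS_PER_REQUEST  = 11,                                 /* seg[11]       */
    KB_PER_SEGMENT    = 4,                                  /* one 4K page   */
    REQUESTS_PER_RING = 32,                                 /* derived above */
    KB_PER_REQUEST    = SEGS_PER_REQUEST * KB_PER_SEGMENT,  /* 44KB          */
    KB_PER_RING       = KB_PER_REQUEST * REQUESTS_PER_RING, /* 1408KB        */
};

_Static_assert(KB_PER_REQUEST == 44, "44KB per request");
_Static_assert(KB_PER_RING == 1408, "1408KB of in-flight data per ring");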

Both Intel and Citrix have proposed solutions to this problem. Intel's approach was to create a separate ring that holds only segments, which raises the maximum amount of data per request to 4MB. Citrix's approach was to back the ring with multiple pages, increasing the amount of data the ring itself can hold. Neither approach was merged right away. Indirect descriptors are similar to Intel's approach in that both increase the amount of data a single request can carry, but indirect descriptors can carry far more than 4MB.

Xen Indirect Descriptors Implementation

Konrad suggested using virtio's indirect descriptors to solve this problem, and the original author agrees that this is the best solution, because it increases the amount of data a single request can carry, and modern storage devices are better at handling large requests. Indirect descriptors introduce a new request type, used only for read and write operations, that places the segments in shared memory pages instead of directly in the request. Here is the structure of the indirect request:

struct blkif_request_indirect {
    uint8_t        operation;          /* BLKIF_OP_INDIRECT                       */
    uint8_t        indirect_op;        /* the real operation: read or write       */
    uint16_t       nr_segments;        /* number of segments in the grant pages   */
    uint64_t       id;                 /* guest value, echoed in the response     */
    blkif_sector_t sector_number;      /* start sector on the disk                */
    blkif_vdev_t   handle;             /* device this request targets             */
    uint16_t       _pad2;
    grant_ref_t    indirect_grefs[8];  /* grant refs of the pages holding segments */
};

The main change is that the seg array is replaced with an array of grant references, and each of those grant shared memory pages is filled with the following structure:

struct blkif_request_segment_aligned {
    grant_ref_t gref;                  /* grant reference of the data page */
    uint8_t     first_sect, last_sect; /* sector range within the page     */
    uint16_t    _pad;                  /* pads the entry to 8 bytes        */
};

This structure is 8 bytes, which means a single shared page can hold 512 segments (4096 / 8 = 512). With 8 indirect grant pages per request, a request can describe up to 4096 segments, which works out to 16MB of data, far more than the previous 44KB. If every request in the ring is an indirect request, a single ring can carry up to 512MB of data.
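A small sketch of that arithmetic, again using only the numbers given in the article (8-byte segment entries, 8 indirect grant pages, 4K pages, and a 32-slot ring):

#include <stdio.h>

int main(void)
{
    const unsigned int page_size      = 4096; /* bytes per grant page          */
    const unsigned int seg_entry_size = 8;    /* blkif_request_segment_aligned */
    const unsigned int indirect_pages = 8;    /* indirect_grefs[8]             */
    const unsigned int ring_requests  = 32;   /* requests per ring             */

    unsigned int segs_per_page = page_size / seg_entry_size;     /* 512   */
    unsigned int segs_per_req  = segs_per_page * indirect_pages; /* 4096  */
    unsigned int mb_per_req    = segs_per_req * 4 / 1024;        /* 16MB  */
    unsigned int mb_per_ring   = mb_per_req * ring_requests;     /* 512MB */

    printf("segments per page:    %u\n", segs_per_page);
    printf("segments per request: %u\n", segs_per_req);
    printf("max data per request: %uMB\n", mb_per_req);
    printf("max data per ring:    %uMB\n", mb_per_ring);
    return 0;
}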

This is obviously a lot of data, but to balance memory usage against disk throughput, the backend driver caps the number of segments per request at 256, and the frontend driver defaults to 32. If needed, this can be changed with a boot option, for example by adding the following to the kernel command line:

xen_blkfront.max=64

This changes the number of segments per request from 32 to 64. There is no rule that says which value is appropriate for a given kind of storage, so it is best to test with your own storage.

Indirect descriptor support for the PV block protocol is already in the 3.11 kernel; make sure that both the guest kernel and the driver domain kernel support indirect descriptors.
