Creation of vring (based on kernel 3.10 and QEMU 2.0.0)

Source: Internet
Author: User


Performance problems in device I/O virtualization on KVM have existed for a long time. Virtio, developed by Rusty Russell, has attracted developers' attention and has gradually been adopted by virtualization platforms such as KVM as the main general-purpose framework for I/O virtualization.

Virtio implements its I/O mechanism with virtqueues; each virtqueue is a queue that carries bulk data between guest and host. The vring is the concrete implementation of a virtqueue.

Virtqueue: Abstraction of the Transport layer

Each device has one or more virtqueues for transferring bulk data. A virtqueue is a simple queue into which the guest inserts buffers, where each buffer is a scatter-gather array. The driver calls find_vqs() to create the structures associated with the queues. The number of virtqueues depends on the device: a block device has one virtqueue, while a network device has two, one for sending packets and one for receiving packets.

The data structures produced during virtio device creation are shown in the figure:

As the diagram shows, virtio-net associates two virtqueues, a send queue and a receive queue, and each queue is implemented by a vring.

The virtqueue operations include:


int virtqueue_add_buf(struct virtqueue *_vq, struct scatterlist sg[], unsigned int out, unsigned int in, void *data, gfp_t gfp)


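add_buf() adds a new buffer to the queue. The first out entries of sg are buffers the device reads, and the following in entries are buffers the device writes. The data parameter is a non-NULL token that identifies the buffer; it is handed back when the buffer's contents have been consumed. In virtio-net, data is the address of the skb, which is returned so that the skb can be freed.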
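To make the out/in split and the data token concrete, here is a minimal user-space sketch, not the kernel code. It assumes a toy queue whose free descriptors are linked through their next fields; the names toy_vq, toy_add_buf and struct toy_sg are hypothetical. Out entries become device-readable descriptors, in entries get VRING_DESC_F_WRITE, the chain is linked with VRING_DESC_F_NEXT, and data is remembered under the chain's head index. Publishing the head in the available ring is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptor flag values of the legacy vring layout. */
#define VRING_DESC_F_NEXT  1
#define VRING_DESC_F_WRITE 2

#define TOY_NUM 8 /* hypothetical ring size (a power of 2) */

/* Same fields as a vring descriptor. */
struct toy_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Hypothetical stand-in for one scatterlist entry. */
struct toy_sg {
	uint64_t addr;
	uint32_t len;
};

struct toy_vq {
	struct toy_desc desc[TOY_NUM];
	void *data[TOY_NUM];   /* token stored per chain head */
	unsigned int num_free;
	uint16_t free_head;    /* first free descriptor; free ones linked by next */
};

static void toy_vq_init(struct toy_vq *vq)
{
	for (uint16_t i = 0; i < TOY_NUM; i++) {
		vq->desc[i].next = (uint16_t)(i + 1);
		vq->data[i] = NULL;
	}
	vq->num_free = TOY_NUM;
	vq->free_head = 0;
}

/* Returns the head index of the new chain, or -1 if there is no room. */
static int toy_add_buf(struct toy_vq *vq, const struct toy_sg sg[],
		       unsigned int out, unsigned int in, void *data)
{
	unsigned int total = out + in;
	uint16_t head = vq->free_head;
	uint16_t i = head;

	assert(data != NULL && total > 0); /* the token must be non-NULL */
	if (vq->num_free < total)
		return -1;

	for (unsigned int n = 0; n < total; n++) {
		vq->desc[i].addr = sg[n].addr;
		vq->desc[i].len = sg[n].len;
		/* out entries: device reads; in entries: device writes */
		vq->desc[i].flags = (n < out) ? 0 : VRING_DESC_F_WRITE;
		if (n + 1 < total)
			vq->desc[i].flags |= VRING_DESC_F_NEXT;
		i = vq->desc[i].next; /* reuse the free-list link as the chain link */
	}
	vq->free_head = i;
	vq->num_free -= total;
	vq->data[head] = data;
	return head;
}
```

Reusing the free-list links as chain links means no extra bookkeeping is needed when a chain is built; only the flags decide where the chain ends.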

virtqueue_kick()

The guest notifies the host that one or more buffers have been added to the queue. It calls virtqueue_notify(), whose notify function writes the queue index to the queue-notify register (VIRTIO_PCI_QUEUE_NOTIFY) to notify the host.


void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)


Retrieves a buffer that the host has used. It returns the data token that was passed to add_buf() and stores in *len the length of the data written into the buffer; it then releases the buffer's descriptors and advances its index into the vring's used ring.

virtqueue_disable_cb()

Indicates that the guest no longer needs to be told when a buffer has been used, i.e. it turns off device interrupts for the queue. The driver registers a callback at initialization time, and disable_cb() is typically called inside that virtqueue callback to keep the callback from being invoked again.

virtqueue_enable_cb()

The counterpart of disable_cb(): it re-enables interrupt reporting from the device.
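A toy model of this callback suppression, assuming the legacy mechanism in which disable_cb() sets the VRING_AVAIL_F_NO_INTERRUPT bit in the available ring's flags and the host checks that bit before interrupting. All toy_* names are hypothetical. Like the kernel's enable_cb(), toy_enable_cb() reports whether used buffers slipped in while callbacks were off, so the caller can drain again instead of missing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1

struct toy_ring {
	uint16_t avail_flags;    /* written by the guest, read by the host */
	uint16_t used_idx;       /* advanced by the host when it uses a buffer */
	uint16_t last_used_idx;  /* how far the guest driver has consumed */
	unsigned int interrupts; /* interrupts the host actually raised */
};

static void toy_disable_cb(struct toy_ring *r)
{
	r->avail_flags |= VRING_AVAIL_F_NO_INTERRUPT;
}

/* Returns false if used buffers are already pending (caller must poll again). */
static bool toy_enable_cb(struct toy_ring *r)
{
	r->avail_flags &= (uint16_t)~VRING_AVAIL_F_NO_INTERRUPT;
	return r->last_used_idx == r->used_idx;
}

/* Host side: mark one buffer used; interrupt unless the guest opted out. */
static void toy_host_complete(struct toy_ring *r)
{
	r->used_idx++;
	if (!(r->avail_flags & VRING_AVAIL_F_NO_INTERRUPT))
		r->interrupts++;
}

/* Guest callback: suppress callbacks, drain the used ring, re-enable. */
static unsigned int toy_callback(struct toy_ring *r)
{
	unsigned int handled = 0;

	do {
		toy_disable_cb(r);
		while (r->last_used_idx != r->used_idx) {
			r->last_used_idx++; /* stands in for virtqueue_get_buf() */
			handled++;
		}
	} while (!toy_enable_cb(r));
	return handled;
}
```

The re-check in toy_enable_cb() closes the window in which the host completes a buffer after the drain loop but before interrupts are turned back on.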

The vring creation process

After the virtio-net driver is loaded, virtnet_probe() is called to identify, create, and initialize the device.

static struct virtio_driver virtio_net_driver = {
	.feature_table = features,
	.feature_table_size = ARRAY_SIZE(features),
	.driver.name = KBUILD_MODNAME,
	.driver.owner = THIS_MODULE,
	.id_table = id_table,
	.probe = virtnet_probe,	/* entry point: device recognition and initialization */
	.remove = virtnet_remove,
	.config_changed = virtnet_config_changed,
#ifdef CONFIG_PM_SLEEP
	.freeze = virtnet_freeze,
	.restore = virtnet_restore,
#endif
};

virtnet_probe() configures the features of the virtio-net device, initializes and registers the network device, and also creates the vrings:

/* Allocate/initialize the RX/TX queues, and invoke find_vqs */
init_vqs(vi);	/* create and initialize the send/receive queues */
	---> virtnet_alloc_queues()
	---> virtnet_find_vqs()

virtnet_alloc_queues() creates the send_queue and receive_queue structures of virtnet_info shown in the diagram; send and receive queues are allocated in pairs.

static int virtnet_alloc_queues(struct virtnet_info *vi)
{
	int i;

	vi->sq = kzalloc(sizeof(*vi->sq) * vi->max_queue_pairs, GFP_KERNEL);
	if (!vi->sq)
		goto err_sq;
	vi->rq = kzalloc(sizeof(*vi->rq) * vi->max_queue_pairs, GFP_KERNEL);
	if (!vi->rq)
		goto err_rq;

	INIT_DELAYED_WORK(&vi->refill, refill_work);
	for (i = 0; i < vi->max_queue_pairs; i++) {
		vi->rq[i].pages = NULL;
		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
			       napi_weight);

		/* initialize the receive-side scatterlist */
		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
		/* initialize the send-side scatterlist */
		sg_init_table(vi->sq[i].sg, ARRAY_SIZE(vi->sq[i].sg));
	}

	return 0;

err_rq:
	kfree(vi->sq);
err_sq:
	return -ENOMEM;
}

A scatterlist is an array of entries, each of which describes one memory region by its page, its offset within that page, and its length.
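As an illustration of what each scatterlist entry records, the sketch below splits a virtually contiguous buffer into per-page (page, offset, length) entries, assuming 4 KiB pages. struct toy_sg_entry and toy_fill_sg() are hypothetical; the kernel's struct scatterlist and its helpers differ in detail (for example, they store a struct page pointer rather than a page number).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size */

struct toy_sg_entry {
	uintptr_t page;      /* page number of the page */
	unsigned int offset; /* byte offset within that page */
	unsigned int length; /* bytes used in that page */
};

/* Fills at most max entries; returns the number used, or 0 if max is too small. */
static size_t toy_fill_sg(struct toy_sg_entry *sg, size_t max,
			  uintptr_t addr, size_t len)
{
	size_t n = 0;

	assert(sg != NULL);
	while (len > 0) {
		unsigned int offset = (unsigned int)(addr % TOY_PAGE_SIZE);
		size_t chunk = TOY_PAGE_SIZE - offset; /* room left in this page */

		if (chunk > len)
			chunk = len;
		if (n == max)
			return 0;
		sg[n].page = addr / TOY_PAGE_SIZE;
		sg[n].offset = offset;
		sg[n].length = (unsigned int)chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return n;
}
```

For a 9000-byte buffer that starts 100 bytes into page 5, this yields three entries: (5, 100, 3996), (6, 0, 4096) and (7, 0, 908).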

The vrings are created through find_vqs():

static int virtnet_find_vqs(struct virtnet_info *vi)
{
	......
	vi->vdev->config->find_vqs(vi->vdev, total_vqs, vqs, callbacks, names);
	......
}

The vdev's config pointer is initialized during the PCI bus probe phase:

static int virtio_pci_probe(struct pci_dev *pci_dev,
			    const struct pci_device_id *id)
{
	......
	vp_dev->vdev.config = &virtio_pci_config_ops;
	......
}

virtio_pci_config_ops is the set of operations used to configure a virtio device. It consists mainly of four parts: 1. reading and writing feature bits; 2. reading and writing the configuration space; 3. reading and writing status bits; 4. resetting the device.

static const struct virtio_config_ops virtio_pci_config_ops = {
	.get = vp_get,			/* read a field of the virtio configuration space */
	.set = vp_set,			/* write a field of the virtio configuration space */
	.get_status = vp_get_status,	/* read the status bits */
	.set_status = vp_set_status,	/* write the status bits */
	.reset = vp_reset,		/* reset the device */
	.find_vqs = vp_find_vqs,	/* create virtqueues */
	.del_vqs = vp_del_vqs,		/* delete virtqueues */
	.get_features = vp_get_features,
	.finalize_features = vp_finalize_features,
	.bus_name = vp_bus_name,
	.set_vq_affinity = vp_set_vq_affinity,
};
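virtio_pci_config_ops is an ordinary C table of function pointers, and bus-independent code reaches the PCI-specific functions only through vdev->config. The stripped-down sketch below shows that dispatch pattern under stated assumptions: toy_config_ops, toy_device and all toy_* functions are hypothetical, and only three of the real callbacks (get_status, set_status, reset) are mirrored.

```c
#include <assert.h>
#include <stdint.h>

struct toy_device;

/* Hypothetical subset of a config-ops table. */
struct toy_config_ops {
	uint8_t (*get_status)(struct toy_device *dev);
	void (*set_status)(struct toy_device *dev, uint8_t status);
	void (*reset)(struct toy_device *dev);
};

struct toy_device {
	const struct toy_config_ops *config; /* plays the role of vdev->config */
	uint8_t status_reg;                  /* stands in for the status register */
};

static uint8_t toy_get_status(struct toy_device *dev)
{
	return dev->status_reg;
}

static void toy_set_status(struct toy_device *dev, uint8_t status)
{
	assert(status != 0); /* writing 0 would mean reset; use ->reset instead */
	dev->status_reg = status;
}

static void toy_reset(struct toy_device *dev)
{
	dev->status_reg = 0;
}

/* Installed by the transport, as virtio_pci_probe() installs its ops table. */
static const struct toy_config_ops toy_pci_config_ops = {
	.get_status = toy_get_status,
	.set_status = toy_set_status,
	.reset = toy_reset,
};

/* Transport-independent code only ever calls through the table. */
static void toy_add_status(struct toy_device *dev, uint8_t bits)
{
	dev->config->set_status(dev, dev->config->get_status(dev) | bits);
}
```

Swapping in a different transport only means pointing config at a different table; toy_add_status() does not change.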

The most important of these is find_vqs: vp_find_vqs() creates each queue by calling setup_vq():

/*
 * Purpose:  get a queue for the target device
 * vdev:     target device
 * index:    index of the queue to use on the target device
 * callback: callback function for the queue
 * name:     name of the queue
 * msix_vec: MSI-X vector used by the queue
 */
static struct virtqueue *setup_vq(struct virtio_device *vdev, unsigned index,
				  void (*callback)(struct virtqueue *vq),
				  const char *name,
				  u16 msix_vec)
{
	......

	/* Select the queue we need (number index) through the VIRTIO_PCI_QUEUE_SEL field */
	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);

	/* Read the queue size (number of entries) from the VIRTIO_PCI_QUEUE_NUM field */
	num = ioread16(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NUM);

	/*
	 * If num is 0, the queue is not available.
	 * Reading VIRTIO_PCI_QUEUE_PFN returns the queue's address; if it is
	 * non-zero, the queue is already in use and cannot be used again.
	 */
	if (!num || ioread32(vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN))
		return ERR_PTR(-ENOENT);

	info = kmalloc(sizeof(struct virtio_pci_vq_info), GFP_KERNEL);
	......

	/* Compute the amount of memory the vring needs */
	size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN));

	/* Allocate size bytes of zeroed pages for the vring */
	info->queue = alloc_pages_exact(size, GFP_KERNEL | __GFP_ZERO);
	......

	/*
	 * Activate the queue: the vring's address (as a page frame number)
	 * is passed to QEMU through the VIRTIO_PCI_QUEUE_PFN field. Once the
	 * queue has an index, a size and memory, QEMU can exchange data with
	 * the guest through this shared vring memory.
	 */
	iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT,
		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);

	/* Create the vring: initialize its internal structures */
	vq = vring_new_virtqueue(index, info->num, VIRTIO_PCI_VRING_ALIGN, vdev,
				 true, info->queue, vp_notify, callback, name);
	......
}
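The register handshake in setup_vq() can be modelled in user space. In the sketch below a hypothetical toy_pci_regs structure plays the role of the legacy QUEUE_SEL, QUEUE_NUM and QUEUE_PFN registers: the guest selects a queue, reads its size, refuses it if the size is 0 or a PFN is already set, and then publishes its vring's page frame number, shifting by 12 as VIRTIO_PCI_QUEUE_ADDR_SHIFT does in the legacy interface. All toy_* names are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_QUEUE_ADDR_SHIFT 12 /* same value as VIRTIO_PCI_QUEUE_ADDR_SHIFT */
#define TOY_MAX_QUEUES 3

/* Host-side view of the per-queue registers (hypothetical). */
struct toy_pci_regs {
	uint16_t queue_sel;
	uint16_t queue_num[TOY_MAX_QUEUES]; /* size chosen by the host, e.g. 256 */
	uint32_t queue_pfn[TOY_MAX_QUEUES]; /* 0 means "not active" */
};

/* Guest side: mirrors the checks setup_vq() performs. */
static int toy_setup_vq(struct toy_pci_regs *regs, uint16_t index,
			uint64_t vring_phys, uint16_t *num_out)
{
	uint16_t num;

	assert(index < TOY_MAX_QUEUES);
	assert(vring_phys % (1u << TOY_QUEUE_ADDR_SHIFT) == 0); /* page aligned */
	regs->queue_sel = index;                      /* write QUEUE_SEL */
	num = regs->queue_num[regs->queue_sel];       /* read QUEUE_NUM */
	if (!num || regs->queue_pfn[regs->queue_sel]) /* read QUEUE_PFN */
		return -ENOENT;
	/* write QUEUE_PFN: this activates the queue */
	regs->queue_pfn[regs->queue_sel] =
		(uint32_t)(vring_phys >> TOY_QUEUE_ADDR_SHIFT);
	*num_out = num;
	return 0;
}

/* Host side: recover the guest physical address of the vring. */
static uint64_t toy_host_vring_addr(const struct toy_pci_regs *regs,
				    uint16_t index)
{
	return (uint64_t)regs->queue_pfn[index] << TOY_QUEUE_ADDR_SHIFT;
}
```

Passing a page frame number instead of a full address is why the vring memory has to be page aligned, which alloc_pages_exact() guarantees in the kernel code.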

As for the initial value of num read by the kernel: when QEMU initializes the virtio-net device, it sets the queue size to 256 (file: hw/virtio-net.c, line 999):

VirtIODevice *virtio_net_init(DeviceState *dev, NICConf *conf,
                              virtio_net_conf *net)
{
    ...
    n->rx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_rx);
    ...
    n->tx_vq = virtio_add_queue(&n->vdev, 256, virtio_net_handle_tx_bh);
    ...
}


In newer QEMU versions, the queues are created in virtio_net_device_realize() instead:

static void virtio_net_device_realize(DeviceState *dev, Error **errp)
{
    ...
    n->vqs[0].rx_vq = virtio_add_queue(vdev, 256, virtio_net_handle_rx);
    ...
    if (n->net_conf.tx && !strcmp(n->net_conf.tx, "timer")) {
        n->vqs[0].tx_vq = virtio_add_queue(vdev, 256,
                                           virtio_net_handle_tx_timer);
        ...
    } else {
        n->vqs[0].tx_vq = virtio_add_queue(vdev, 256,
                                           virtio_net_handle_tx_bh);
        ...
    }
    n->ctrl_vq = virtio_add_queue(vdev, 64, virtio_net_handle_ctrl);
    ...
}

VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
                            void (*handle_output)(VirtIODevice *, VirtQueue *))
{
    ...
    vdev->vq[i].vring.num = queue_size;
    vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
    vdev->vq[i].handle_output = handle_output;
    ...
}

A virtio_ring consists of three parts. The descriptor array (descriptor table) stores descriptors; each descriptor describes one buffer by an address/length pair. The available ring is used by the guest to indicate which descriptor chains are currently available to the device. The used ring is used by the host to indicate which descriptors it has finished using.

The ring size (number of entries) must be a power of 2. The structure is shown in the following figure:
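The space computed by vring_size() in setup_vq() follows directly from this layout. The sketch below reproduces the legacy layout arithmetic in user space, assuming the usual field sizes: 16-byte descriptors; an available ring of 16-bit flags, idx, num entries and a trailing used_event; padding up to align; then a used ring of 16-bit flags and idx, num 8-byte {id, len} elements and a trailing 16-bit avail_event. The toy_ names are hypothetical copies, not the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

struct toy_vring_used_elem {
	uint32_t id;  /* head index of the used descriptor chain */
	uint32_t len; /* bytes written into the buffer by the device */
};

/* Byte offsets of the three parts inside one contiguous allocation. */
struct toy_vring_layout {
	size_t desc, avail, used, total;
};

static struct toy_vring_layout toy_vring_compute_layout(unsigned int num,
							 size_t align)
{
	struct toy_vring_layout l;

	assert(num != 0 && (num & (num - 1)) == 0);       /* power of 2 */
	assert(align != 0 && (align & (align - 1)) == 0); /* power of 2 */
	l.desc = 0;
	l.avail = sizeof(struct toy_vring_desc) * num;
	/* avail ring: flags, idx, ring[num], used_event (all 16-bit),
	 * then round up so the used ring starts on an align boundary */
	l.used = (l.avail + sizeof(uint16_t) * (3 + num) + align - 1) &
		 ~(align - 1);
	/* used ring: flags, idx, ring[num] of used_elem, avail_event */
	l.total = l.used + sizeof(uint16_t) * 3 +
		  sizeof(struct toy_vring_used_elem) * num;
	return l;
}
```

Assuming VIRTIO_PCI_VRING_ALIGN is 4096 (its value in the legacy virtio-pci header) and num = 256 as set by QEMU, the descriptors take bytes 0 to 4095, the available ring starts at 4096, the used ring starts at 8192, and the total is 10246 bytes, which PAGE_ALIGN rounds up to 12288 bytes (three 4 KiB pages).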

Each vring descriptor points to a buffer used by the guest. Its fields are:
addr: guest physical address of the buffer
len: length of the buffer
flags: VRING_DESC_F_NEXT indicates that the next field is valid, i.e. the current buffer is not the last one in the chain; VRING_DESC_F_WRITE marks the buffer as write-only for the device (without it, the buffer is read-only); VRING_DESC_F_INDIRECT indicates that the buffer contains a table of buffer descriptors
next: index of the next descriptor in the chain

Buffers are linked into chains through next, and the descriptor table holds all of these chains.
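A sketch of how a consumer walks one descriptor chain: it follows next while VRING_DESC_F_NEXT is set and uses VRING_DESC_F_WRITE to separate device-readable from device-writable bytes. The flag values (1, 2, 4) are those of the legacy vring header; toy_walk_chain() and its loop and bounds checks are hypothetical, and indirect descriptors are simply rejected here.

```c
#include <assert.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_WRITE    2
#define VRING_DESC_F_INDIRECT 4

struct toy_vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* Returns the number of descriptors in the chain, or -1 if it is malformed. */
static int toy_walk_chain(const struct toy_vring_desc *desc, unsigned int num,
			  uint16_t head, uint32_t *readable, uint32_t *writable)
{
	unsigned int count = 0;
	uint16_t i = head;

	*readable = *writable = 0;
	for (;;) {
		if (i >= num || count++ >= num) /* bad index, or a loop */
			return -1;
		if (desc[i].flags & VRING_DESC_F_INDIRECT)
			return -1; /* not handled in this sketch */
		if (desc[i].flags & VRING_DESC_F_WRITE)
			*writable += desc[i].len;
		else
			*readable += desc[i].len;
		if (!(desc[i].flags & VRING_DESC_F_NEXT))
			return (int)count;
		i = desc[i].next;
	}
}
```

A chain may start at any free index and hop around the table, which is why the walk is driven by next rather than by consecutive slots.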

Each entry of the available ring holds the index of the head of a descriptor chain that the guest offers to the device.

Each entry of the used ring identifies a descriptor chain (by its head index) that the device (host) has finished using, together with the number of bytes written into it.
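The two rings can be sketched as a pair of single-producer, single-consumer queues with free-running 16-bit indices, where the slot is idx % num. In this hypothetical user-space model (all toy_ names are assumptions; memory barriers and notifications are omitted), the guest publishes chain heads in the available ring, the host consumes them and reports {id, len} in the used ring, and the guest reads them back, as virtqueue_get_buf() does with its last_used_idx.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM 4 /* ring size; a power of 2, so it divides 65536 */

struct toy_used_elem {
	uint32_t id;  /* head of the descriptor chain */
	uint32_t len; /* bytes the device wrote */
};

struct toy_rings {
	uint16_t avail_idx;                      /* written by the guest */
	uint16_t avail_ring[TOY_NUM];
	uint16_t used_idx;                       /* written by the host */
	struct toy_used_elem used_ring[TOY_NUM];
	uint16_t host_last_avail;                /* host's private read position */
	uint16_t guest_last_used;                /* guest's private read position */
};

/* Guest: offer a descriptor chain by its head index. */
static void toy_guest_offer(struct toy_rings *r, uint16_t head)
{
	/* at most TOY_NUM chains may be outstanding */
	assert((uint16_t)(r->avail_idx - r->guest_last_used) < TOY_NUM);
	r->avail_ring[r->avail_idx % TOY_NUM] = head;
	r->avail_idx++; /* wraps at 65536; only idx % TOY_NUM selects a slot */
}

/* Host: take the next available head and report it as used. */
static int toy_host_process(struct toy_rings *r, uint32_t written)
{
	uint16_t head;

	if (r->host_last_avail == r->avail_idx)
		return -1; /* nothing available */
	head = r->avail_ring[r->host_last_avail % TOY_NUM];
	r->host_last_avail++;
	r->used_ring[r->used_idx % TOY_NUM].id = head;
	r->used_ring[r->used_idx % TOY_NUM].len = written;
	r->used_idx++;
	return head;
}

/* Guest: return the next used head and its length, like virtqueue_get_buf(). */
static int toy_guest_get_used(struct toy_rings *r, uint32_t *len)
{
	const struct toy_used_elem *e;

	if (r->guest_last_used == r->used_idx)
		return -1; /* nothing used yet */
	e = &r->used_ring[r->guest_last_used % TOY_NUM];
	r->guest_last_used++;
	*len = e->len;
	return (int)e->id;
}
```

Because each index is written by only one side, neither side needs a lock; the real implementation adds memory barriers between filling a slot and bumping the index.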
