QEMU/KVM Memory Virtualization

I. QEMU physical memory registration
cpu_register_physical_memory calls cpu_notify_set_memory.
cpu_notify_set_memory calls kvm_client_set_memory.
kvm_client_set_memory calls kvm_set_phys_mem.
kvm_set_phys_mem calls kvm_set_user_memory_region.
kvm_set_user_memory_region calls kvm_vm_ioctl and enters the kernel.
In the kernel, kvm_vm_ioctl dispatches to kvm_vm_ioctl_set_memory_region, which in turn calls __kvm_set_memory_region.
__kvm_set_memory_region contains the following code:
	slots->memslots[mem->slot] = new;
	old_memslots = kvm->memslots;
	rcu_assign_pointer(kvm->memslots, slots);
	synchronize_srcu_expedited(&kvm->srcu);
So __kvm_set_memory_region creates and fills a temporary kvm_memslots structure and publishes it as kvm->memslots (the per-VM global) with rcu_assign_pointer.
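For reference, the userspace half of this registration is the KVM_SET_USER_MEMORY_REGION ioctl. A minimal sketch (register_guest_ram is a made-up wrapper name; the VM fd and the host-side allocation are assumed to exist already):

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* vm_fd: fd from KVM_CREATE_VM; host_mem: an hva, e.g. from mmap()/qemu_vmalloc() */
static int register_guest_ram(int vm_fd, void *host_mem,
                              unsigned long long gpa, unsigned long long size)
{
    struct kvm_userspace_memory_region region = {
        .slot            = 0,                    /* the mem->slot index used above */
        .flags           = 0,
        .guest_phys_addr = gpa,                  /* ends up as slot->base_gfn * PAGE_SIZE */
        .memory_size     = size,
        .userspace_addr  = (uintptr_t)host_mem,  /* ends up as slot->userspace_addr */
    };

    /* handled by kvm_vm_ioctl_set_memory_region -> __kvm_set_memory_region in the kernel */
    return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}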

II. Handling user-mode virtual addresses (mainly the case where the TLB misses)
1. The hardware (physical) TLB is looked up first; on a miss, the host's do_kvm_tlbmiss handler is entered.
2. do_kvm_tlbmiss first checks whether the address is an I/O address or an ordinary memory access. For a memory access it then looks up the guest TLB; if the guest TLB also misses, a guest TLB-miss exception is injected into the guest, and the guest kernel fills the guest TLB from its page tables. When the guest then executes the privileged TLBWI instruction, it traps back into the host and is handled by do_kvm_cpu.
3. do_kvm_cpu emulates the TLBWI instruction: it first fills in the guest TLB entry, then calls kvmmips_update_shadow_tlb to update the physical TLB (the shadow TLB).
4. In kvmmips_update_shadow_tlb, the gpa is converted to an hpa through the gfn_to_page and page_to_phys functions, and the hpa is then written into the physical TLB (see the sketch after this list).
5. The gfn_to_page function (the key point here) calls gfn_to_hva.
6. gfn_to_hva calls gfn_to_memslot and gfn_to_hva_memslot.
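Steps 4 and 5 boil down to the following (a hedged sketch; gpa_to_hpa_sketch is a made-up name, and the real shadow-TLB update adds pinning and error handling around these two calls):

/* hedged sketch of the gpa -> hpa step in points 4 and 5 above */
static phys_addr_t gpa_to_hpa_sketch(struct kvm *kvm, gfn_t gfn)
{
        struct page *page = gfn_to_page(kvm, gfn);  /* internally goes through gfn_to_hva */

        return page_to_phys(page);                  /* hpa written into the shadow (physical) tlb */
}

The rest of this section follows gfn_to_hva down into gfn_to_memslot and gfn_to_hva_memslot.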
The gfn_to_memslot code is as follows:
struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
{
	int i;
	struct kvm_memslots *slots = kvm_memslots(kvm);

	for (i = 0; i < slots->nmemslots; ++i) {
		struct kvm_memory_slot *memslot = &slots->memslots[i];

		if (gfn >= memslot->base_gfn
		    && gfn < memslot->base_gfn + memslot->npages)
			return memslot;
	}
	return NULL;
}
It first calls kvm_memslots to obtain the slots array. The kvm_memslots code is as follows:
static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
{
	return rcu_dereference_check(kvm->memslots,
			srcu_read_lock_held(&kvm->srcu)
			|| lockdep_is_held(&kvm->slots_lock));
}
In essence this is just return kvm->memslots, guarded by an RCU/lockdep check.
The gfn_to_hva_memslot code is as follows:
static unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
{
	return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE;
}
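Putting the two together, gfn_to_hva of that era is roughly the following (a simplified sketch; the real function also checks the slot's validity and returns bad_hva() on failure):

/* simplified sketch: gfn (i.e. gpa >> PAGE_SHIFT) -> hva */
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
{
	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);

	if (!slot)
		return 0;	/* the real code returns bad_hva() here */
	return gfn_to_hva_memslot(slot, gfn);
}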
Seen this way, the key to the gpa-to-hva translation is slot->userspace_addr. On the qemu side, kvm_set_user_memory_region fills this field with the value obtained from the qemu_safe_ram_ptr function.
The qemu_safe_ram_ptr code is as follows:
void *qemu_safe_ram_ptr(ram_addr_t addr)
{
    RAMBlock *block;

    QLIST_FOREACH(block, &ram_list.blocks, next) {
        if (addr - block->offset < block->length) {
            return block->host + (addr - block->offset);
        }
    }

    fprintf(stderr, "Bad ram offset %" PRIx64 "\n", (uint64_t)addr);
    abort();

    return NULL;
}
So the question becomes where block->host is assigned. It is set in qemu's qemu_ram_alloc_from_ptr function, which contains the line new_block->host = qemu_vmalloc(size); i.e. an hva is allocated from the host system.
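A simplified sketch of that allocation path (condensed from the qemu 0.14-era qemu_ram_alloc/qemu_ram_alloc_from_ptr behaviour described above; the device-name bookkeeping and the caller-supplied-pointer case are omitted, and qemu_ram_alloc_sketch is a made-up name):

/* simplified sketch of qemu_ram_alloc: allocate an hva and record it in a RAMBlock */
ram_addr_t qemu_ram_alloc_sketch(ram_addr_t size)
{
    RAMBlock *new_block = qemu_mallocz(sizeof(*new_block));

    new_block->offset = find_ram_offset(size);   /* the ram_addr handle on the gpa side */
    new_block->length = size;
    new_block->host   = qemu_vmalloc(size);      /* the hva that will back guest RAM */

    QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
    return new_block->offset;
}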

Conclusion:
Based on the above analysis: calling qemu_ram_alloc in qemu mainly allocates a RAMBlock structure and inserts it into the ram_list.blocks list; in essence it allocates an hva and stores it in the host field of the RAMBlock. Calling cpu_register_physical_memory fills the memslots field of the struct kvm structure; in essence it maps a gpa range onto that hva range, putting the hva in slot->userspace_addr and the gpa (as a frame number) in slot->base_gfn. Through these two functions qemu maps a gpa space onto an hva space.
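As a worked example with made-up numbers (assuming 4 KB pages): if qemu_vmalloc returned the hva 0x7f2a40000000 for a 512 MB RAM block registered at guest physical address 0, the slot would have base_gfn = 0, npages = 0x20000 and userspace_addr = 0x7f2a40000000, and a guest access to gpa 0x14000000 (gfn 0x14000) would resolve to hva 0x7f2a40000000 + 0x14000 * 0x1000 = 0x7f2a54000000.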

III. Console display process (based on cirrusfb)
First, look at a call stack:
[<4000000080451164>] cirrusfb_imageblit+0xa0/0x284
[<400000008043ce5c>] bit_putcs+0x3dc/0x48c
[<400000008046eb8c>] do_update_region+0x148/0x1a4
[<40000000804705f4>] update_region+0xb4/0xdc
[<40000000804393bc>] fbcon_switch+0x5b8/0x61c
[<4000000080470ef4>] redraw_screen+0x188/0x2a8
[<4000000080472c84>] take_over_console+0x368/0x3cc
[<4000000080436030>] fbcon_takeover+0x108/0x188
[<4000000080160204>] notifier_call_chain.isra.1+0x40/0x90
[<4000000080160540>] __blocking_notifier_call_chain+0x48/0x68
[<400000008042ee8c>] register_framebuffer+0x2b0/0x2dc
[<400000008010f4b4>] cirrusfb_pci_register+0x608/0x6c4
[<400000008042550c>] pci_device_probe+0x60/0xa0
[<4000000080489008>] driver_probe_device+0x108/0x1f0
[<400000008048915c>] __driver_attach+0x6c/0xa4
[<40000000804879f8>] bus_for_each_dev+0x54/0xa0
[<40000000804881ec>] bus_add_driver+0xf0/0x310
[<4000000080489838>] driver_register+0xe0/0x194
[<4000000080426214>] __pci_register_driver+0x5c/0x11c
[<4000000080886710>] cirrusfb_init+0x164/0x198
[<4000000080870c6>] do_one_initcall+0xbc/0x204
[<4000000080870ecc>] kernel_init+0x16c/0x244
[<40000000801189e8>] kernel_thread_helper+0x10/0x18
From the call stack we can see that register_framebuffer raises an FB_EVENT_FB_REGISTERED event, which invokes fbcon_fb_registered. That function calls fbcon_takeover so that fbcon takes over the console; from then on, console operations go through the following struct consw:
static const struct consw fb_con = {
	.owner			= THIS_MODULE,
	.con_startup		= fbcon_startup,
	.con_init		= fbcon_init,
	.con_deinit		= fbcon_deinit,
	.con_clear		= fbcon_clear,
	.con_putc		= fbcon_putc,
	.con_putcs		= fbcon_putcs,
	.con_cursor		= fbcon_cursor,
	.con_scroll		= fbcon_scroll,
	.con_bmove		= fbcon_bmove,
	.con_switch		= fbcon_switch,
	.con_blank		= fbcon_blank,
	.con_font_set		= fbcon_set_font,
	.con_font_get		= fbcon_get_font,
	.con_font_default	= fbcon_set_def_font,
	.con_font_copy		= fbcon_copy_font,
	.con_set_palette	= fbcon_set_palette,
	.con_scrolldelta	= fbcon_scrolldelta,
	.con_set_origin		= fbcon_set_origin,
	.con_invert_region	= fbcon_invert_region,
	.con_screen_pos		= fbcon_screen_pos,
	.con_getxy		= fbcon_getxy,
	.con_resize		= fbcon_resize,
	.con_debug_enter	= fbcon_debug_enter,
	.con_debug_leave	= fbcon_debug_leave,
};
Take the fbcon_putcs function as an example. The code for further analysis is as follows:
static void fbcon_putcs(struct vc_data *vc, const unsigned short *s,
			int count, int ypos, int xpos)
{
	struct fb_info *info = registered_fb[con2fb_map[vc->vc_num]];
	struct display *p = &fb_display[vc->vc_num];
	struct fbcon_ops *ops = info->fbcon_par;

	if (!fbcon_is_inactive(vc, info))
		ops->putcs(vc, info, s, count, real_y(p, ypos), xpos,
			   get_color(vc, info, scr_readw(s), 1),
			   get_color(vc, info, scr_readw(s), 0));
}
It calls ops->putcs, where ops is info->fbcon_par (info being a struct fb_info). ops->putcs is initialized in the fbcon_set_bitops function, which is as follows:
void fbcon_set_bitops(struct fbcon_ops *ops)
{
	ops->bmove = bit_bmove;
	ops->clear = bit_clear;
	ops->putcs = bit_putcs;
	ops->clear_margins = bit_clear_margins;
	ops->cursor = bit_cursor;
	ops->update_start = bit_update_start;
	ops->rotate_font = NULL;

	if (ops->rotate)
		fbcon_set_rotate(ops);
}
So bit_putcs is called next, and it eventually ends up calling info->fbops->fb_imageblit(info, image) (again, info is a struct fb_info). info->fbops is initialized in cirrusfb_set_fbinfo, which contains the line info->fbops = &cirrusfb_ops. The cirrusfb_ops structure is as follows:
static struct fb_ops cirrusfb_ops = {
	.owner		= THIS_MODULE,
	.fb_open	= cirrusfb_open,
	.fb_release	= cirrusfb_release,
	.fb_setcolreg	= cirrusfb_setcolreg,
	.fb_check_var	= cirrusfb_check_var,
	.fb_set_par	= cirrusfb_set_par,
	.fb_pan_display	= cirrusfb_pan_display,
	.fb_blank	= cirrusfb_blank,
	.fb_fillrect	= cirrusfb_fillrect,
	.fb_copyarea	= cirrusfb_copyarea,
	.fb_sync	= cirrusfb_sync,
	.fb_imageblit	= cirrusfb_imageblit,
};
Therefore, the cirrusfb_imageblit function is what finally gets called.

The process above is a console write operation; it ends up in cirrusfb_imageblit in the cirrusfb driver.
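For completeness, the registration that produces the stack trace at the top of this section follows the standard fbdev pattern. A minimal hedged sketch (example_fb_probe is a made-up name; the real cirrusfb_pci_register does much more setup):

/* minimal sketch of the registration side */
static int example_fb_probe(struct fb_info *info)
{
	info->fbops = &cirrusfb_ops;	/* done by cirrusfb_set_fbinfo in the real driver */

	/*
	 * register_framebuffer() raises FB_EVENT_FB_REGISTERED, which leads to
	 * fbcon_fb_registered()/fbcon_takeover() installing fb_con as the console
	 * driver, exactly as the stack trace above shows.
	 */
	return register_framebuffer(info);
}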

IV. X server display (based on fbmem)

Common graphics drivers in the X server usually program the hardware registers directly: they first mmap /dev/mem to obtain the base address of the I/O space, and then access registers at offsets from that base.
The fbdev driver (fbmem) is an exception. Instead of touching the hardware itself, it hands those operations to the kernel through ioctls on /dev/fb0.
When neither driver is accelerated, however, the framebuffer itself is handled the same way: the framebuffer region is mmap'ed into user space, and the mapped address is handed to the rest of Xorg for drawing.
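A minimal user-space sketch of that unaccelerated path (standard fbdev ioctls; error handling kept short):

#include <fcntl.h>
#include <linux/fb.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    struct fb_var_screeninfo var;
    struct fb_fix_screeninfo fix;
    int fd = open("/dev/fb0", O_RDWR);

    if (fd < 0 || ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0)
        return 1;

    /* this mapping is the GVA that section V talks about */
    uint8_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED)
        return 1;

    /* draw by plain stores: paint the top 100 scan lines white */
    memset(fb, 0xff, 100 * fix.line_length);

    munmap(fb, fix.smem_len);
    close(fd);
    return 0;
}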

V. Conclusion

In qemu, the cirrus_linear_writeb function is called many times when the console draws, but not when the X server draws. The reasons are as follows:

First, under the X server, the framebuffer region (the region starting at 0x14000000) is mapped to a guest virtual address by mmap'ing /dev/fb0; in other words, every framebuffer operation of the X server goes through this GVA.
On the qemu side, map_linear_vram calls cpu_register_physical_memory(s->vga.map_addr, s->vga.map_end - s->vga.map_addr, s->vga.vram_offset); which associates the GPA 0x14000000 with an HVA.
As a result, framebuffer accesses from the X server become ordinary memory accesses rather than I/O accesses: the path is GVA -> GPA -> HVA -> HPA and never returns to qemu, so qemu's cirrus_linear_writeb is of course never reached.
Second, on the console, every console access eventually reaches cirrusfb_imageblit, which contains the line memcpy(info->screen_base, image->data, size); and info->screen_base is the ioremap'ed 0x14000000 region (I/O space), so the access goes back into qemu and cirrus_linear_writeb is called.
Finally, why pay so much attention to cirrus_linear_writeb? Because inside qemu all framebuffer activity shows up as writes to, or reads from, s->vga.vram_ptr (s->vga.vram_ptr is what we have been calling the HVA). It is set up as follows:

s->vram_offset = qemu_ram_alloc(NULL, "vga.vram", vga_ram_size);
s->vram_ptr = qemu_get_ram_ptr(s->vram_offset);

And only in cirrus_linear_writeb, after the HVA s->vga.vram_ptr has been written, is the region marked through cpu_physical_memory_set_dirty; when the screen is updated, these dirty regions are the criterion for what has to be redrawn.
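A simplified sketch of that path (condensed from the qemu cirrus_vga code this article refers to; the real cirrus_linear_writeb also handles banked mode and the blitter):

/* simplified sketch: write the vram hva and mark the page dirty */
static void cirrus_linear_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
{
    CirrusVGAState *s = opaque;

    addr &= s->cirrus_addr_mask;
    s->vga.vram_ptr[addr] = val;                               /* the HVA write */
    cpu_physical_memory_set_dirty(s->vga.vram_offset + addr);  /* picked up by the screen update */
}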
