(Technical Analysis) KVM virtualization Principle

Source: Internet
Author: User
VMCS Structure
The VMCS (virtual-machine control structure) is a data structure kept in memory. It holds the register contents of a virtual CPU and the control information for that virtual CPU. Each VMCS corresponds to one virtual CPU.
A VMCS must be bound to a physical CPU before it is used. At any given moment the binding is one to one: a physical CPU can be bound to only one VMCS, and a VMCS can be bound to only one physical CPU. Over time a VMCS can be bound to different physical CPUs. For example, a VMCS bound to physical CPU 1 can be unbound and later re-bound to physical CPU 2. This change of binding is called migration of the VMCS.
VT-x provides two instructions for binding and unbinding a VMCS.
VMPTRLD <VMCS address>: binds the specified VMCS to the physical CPU that executes the instruction.
VMCLEAR <VMCS address>: unbinds the specified VMCS from the physical CPU that executes the instruction. It also writes any VMCS data cached on the physical CPU back to memory, so that memory holds the latest values when the VMCS is later bound to a new physical CPU.
VT-x defines the format and content of the VMCS. It must be a memory block of at most 4 KB and must be 4 KB aligned. The fields of the VMCS region are as follows:
Offset 0 holds the VMCS revision identifier, which gives the version of the VMCS data format.
Offset 4 holds the VMX-abort indicator. If a VM exit fails, a VMX abort occurs, and the CPU saves the abort reason at this offset for debugging.
Offset 8 is the start of the VMCS data area. Its format is CPU-specific: different CPU models may use different formats, and the format in use is identified by the VMCS revision identifier.
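The layout just described can be sketched as a C structure. This is only an illustration under stated assumptions: the type name vmcs_region, its member names, and the helper vmcs_addr_is_aligned are hypothetical, and software is not expected to touch the data area directly, since its format is CPU-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VMCS_REGION_SIZE 4096u /* the text caps the region at 4 KB */

/* Hypothetical view of a VMCS region; all names are illustrative. */
struct vmcs_region {
    uint32_t revision_id;                /* offset 0: VMCS revision identifier */
    uint32_t abort_indicator;            /* offset 4: VMX-abort indicator */
    uint8_t  data[VMCS_REGION_SIZE - 8]; /* offset 8: CPU-specific data area */
};

/* Compile-time checks that the sketch matches the offsets in the text. */
static_assert(offsetof(struct vmcs_region, abort_indicator) == 4,
              "VMX-abort indicator must sit at offset 4");
static_assert(offsetof(struct vmcs_region, data) == 8,
              "data area must start at offset 8");

/* Returns 1 if addr meets the 4 KB alignment requirement, 0 otherwise. */
static int vmcs_addr_is_aligned(uint64_t addr)
{
    return (addr & (VMCS_REGION_SIZE - 1u)) == 0;
}
```

A region allocated at physical address 0x1000 would pass the alignment check, while 0x1008 would not.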
Most of the information in a VMCS is stored in the VMCS data area. VT-x provides two instructions for accessing it.
VMREAD <index>: reads the VMCS field specified by the index.
VMWRITE <index> <data>: writes the VMCS field specified by the index.
VT-x defines an index (encoding) for every field in the VMCS data area, so these two instructions can access each field directly.
The VMCS data area contains the following six categories of information.
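The field indexes are not arbitrary numbers. In Intel's documented encoding, bit 0 selects the access type, bits 9:1 hold the index within a group, bits 11:10 hold the field type, and bits 14:13 hold the field width. The sketch below decodes an encoding into those parts; the struct and function names are hypothetical and are not taken from any kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the parts of a VMCS field encoding. */
struct vmcs_field_info {
    unsigned access_high; /* bit 0: 1 = upper 32 bits of a 64-bit field */
    unsigned index;       /* bits 9:1: index within the group */
    unsigned type;        /* bits 11:10: 0 control, 1 VM-exit information,
                             2 guest state, 3 host state */
    unsigned width;       /* bits 14:13: 0 16-bit, 1 64-bit, 2 32-bit,
                             3 natural width */
};

static struct vmcs_field_info vmcs_decode_field(uint32_t encoding)
{
    struct vmcs_field_info f;

    assert((encoding >> 15) == 0); /* bits 31:15 are reserved and must be 0 */
    f.access_high = encoding & 0x1u;
    f.index = (encoding >> 1) & 0x1ffu;
    f.type = (encoding >> 10) & 0x3u;
    f.width = (encoding >> 13) & 0x3u;
    return f;
}
```

For example, the encoding 0x00004402 decodes to a 32-bit VM-exit information field with index 1, and 0x00006c14 decodes to a natural-width host-state field.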

  1. Guest-state area: saves the CPU state while the guest runs, that is, in non-root mode. On VM exit the CPU saves its current state into the guest-state area. On VM entry the CPU restores its state from the guest-state area.
  2. Host-state area: saves the CPU state in root mode, when the VMM runs. On VM exit the CPU restores its state from this area.
  3. VM-entry control fields: control processor behavior during VM entry.
  4. VM-execution control fields: control processor behavior in VMX non-root mode. Typically they select which conditions trigger a VM exit and enable VMX virtualization features such as APIC virtualization and EPT.
  5. VM-exit control fields: control processor behavior when a VM exit occurs.
  6. VM-exit information fields: provide the cause and details of a VM exit. The VMM uses this information to decide how to manage and control the VM. These fields are read-only.

    Detailed analysis of each area of the VMCS:

    VM-execution control fields
    Virtual_processor_id = 0x00000000, /* Valid when secondary_exec_enable_vpid is 1; provides the 16-bit VPID */
    Posted_intr_nv = 0x00000002, /* Valid when pin_based_posted_intr is 1 */
    Io_bitmap_a = 0x00002000, /* The I/O bitmaps take effect when cpu_based_use_io_bitmaps is 1 */
    Io_bitmap_a_high = 0x00002001,
    Io_bitmap_b = 0x00002002,
    Io_bitmap_b_high = 0x00002003,
    /* Valid when cpu_based_use_msr_bitmaps is 1. The MSR bitmap is a 4 KB page made up of four 1 KB bitmaps.
    A bit set to 1 means that access to the corresponding MSR causes a VM exit:
    the read bitmap for low MSRs covers MSRs 00000000h to 00001fffh and controls MSR reads;
    the read bitmap for high MSRs covers MSRs c0000000h to c0001fffh and controls MSR reads;
    the write bitmap for low MSRs covers MSRs 00000000h to 00001fffh and controls MSR writes;
    the write bitmap for high MSRs covers MSRs c0000000h to c0001fffh and controls MSR writes.
    A bit set to 0 means that access to the corresponding MSR does not cause a VM exit. */
    Msr_bitmap = 0x00002004,
    Msr_bitmap_high = 0x00002005,
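The MSR bitmap layout described above can be sketched as a lookup that finds the byte and bit controlling a given MSR. This is a minimal illustration, assuming the four 1 KB bitmaps sit in the page in the order listed (read low, read high, write low, write high). The function names msr_bitmap_locate and msr_bitmap_intercept are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Find the byte offset and bit number that control `msr` in the 4 KB page.
 * Returns 0 on success, or -1 if the MSR lies outside both covered ranges.
 */
static int msr_bitmap_locate(uint32_t msr, int is_write,
                             uint32_t *byte, unsigned *bit)
{
    uint32_t base = is_write ? 2048u : 0u; /* write bitmaps follow the read bitmaps */

    assert(byte != NULL && bit != NULL);
    if (msr >= 0xc0000000u && msr <= 0xc0001fffu) {
        base += 1024u;  /* the high-MSR bitmap follows the low-MSR bitmap */
        msr &= 0x1fffu; /* keep the offset within the range */
    } else if (msr > 0x1fffu) {
        return -1;
    }
    *byte = base + msr / 8u;
    *bit = msr % 8u;
    return 0;
}

/* Set the controlling bit so that the given access to `msr` causes a VM exit. */
static int msr_bitmap_intercept(uint8_t *bitmap, uint32_t msr, int is_write)
{
    uint32_t byte;
    unsigned bit;

    assert(bitmap != NULL);
    if (msr_bitmap_locate(msr, is_write, &byte, &bit) != 0)
        return -1;
    bitmap[byte] |= (uint8_t)(1u << bit);
    return 0;
}
```

With this layout, intercepting writes to MSR c0000080h sets bit 0 of byte 3088 in the page.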
    Executive_vmcsp = 0x0000200c,
    Executive_vmcsp_high = 0x0000200d,
    /* Valid when cpu_based_use_tsc_offseting is 1. This field provides a 64-bit offset.
    When the guest reads the TSC with rdtsc, rdtscp, or rdmsr, the value returned is TSC + TSC offset. */
    Tsc_offset = 0x00002010,
    Tsc_offset_high = 0x00002011,
    /* Valid when cpu_based_tpr_shadow is 1. Holds the physical address of a 4 KB page. */
    Virtual_apic_page_addr = 0x00002012, /* Virtual-APIC address (full) */
    Virtual_apic_page_addr_high = 0x00002013, /* Virtual-APIC address (high) */
    /* Valid when secondary_exec_virtualize_apic_accesses is 1. Holds the physical address of a 4 KB page. */
    Apic_access_addr = 0x00002014, /* APIC-access address (full) */
    Apic_access_addr_high = 0x00002015, /* APIC-access address (high) */
    Posted_intr_desc_addr = 0x00002016,
    Posted_intr_desc_addr_high = 0x00002017,

    /* Valid when secondary_exec_enable_ept is 1. EPT translates guest-physical addresses into host-physical addresses.
    Bits 2:0 give the memory type (UC or WB) of the EPT paging structures; bits 5:3 give the depth of the EPT page table minus 1, so adding 1 to this value gives the real number of levels;
    bit 6 = 1 enables the accessed and dirty flags in EPT entries (bits 8 and 9 of an EPT entry), and the processor then updates these two flags;
    bits N-1:12 give the physical address of the EPT PML4 table.
    The EPT page table is located through a dedicated EPT pointer register, EPTP. EPT maps addresses in the same way that the guest page table maps addresses. */
    Ept_pointer = 0x0000201a, /* EPT pointer (EPTP; full) */
    Ept_pointer_high = 0x0000201b, /* EPT pointer (EPTP; high) */
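The EPTP bit layout described above can be illustrated by a small helper that assembles the value from its parts. This is a sketch under stated assumptions: the names eptp_make and eptp_levels are hypothetical, and the memory-type constants use the standard x86 encodings (0 for UC, 6 for WB).

```c
#include <assert.h>
#include <stdint.h>

#define EPT_MEMTYPE_UC 0u            /* uncacheable */
#define EPT_MEMTYPE_WB 6u            /* write-back */
#define EPTP_AD_ENABLE (1ull << 6)   /* enable accessed and dirty flags */

/*
 * Build an EPTP value. pml4_phys is the 4 KB aligned physical address of
 * the EPT PML4 table; levels is the real page-table depth (for example 4).
 */
static uint64_t eptp_make(uint64_t pml4_phys, unsigned memtype,
                          unsigned levels, int enable_ad)
{
    uint64_t eptp;

    assert((pml4_phys & 0xfffull) == 0); /* low 12 bits carry the flags */
    assert(memtype <= 7u);
    assert(levels >= 1u && levels <= 8u);

    eptp = pml4_phys;                      /* bits N-1:12 */
    eptp |= memtype;                       /* bits 2:0 */
    eptp |= (uint64_t)(levels - 1u) << 3;  /* bits 5:3 store depth minus 1 */
    if (enable_ad)
        eptp |= EPTP_AD_ENABLE;            /* bit 6 */
    return eptp;
}

/* Recover the real page-table depth from an EPTP value. */
static unsigned eptp_levels(uint64_t eptp)
{
    return (unsigned)((eptp >> 3) & 0x7u) + 1u;
}
```

A four-level table at physical address 0x12345000 with write-back memory and the accessed and dirty flags enabled gives the EPTP value 0x1234505e.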

    /* Valid when secondary_exec_virtual_intr_delivery is 1. Controls whether an EOI causes a VM exit.
    If the bit for a vector is 1, an EOI for that vector causes a VM exit. */
    Eoi_exit_bitmap0 = 0x0000201c, /* Vectors 00h to 3fh */
    Eoi_exit_bitmap0_high = 0x0000201d,
    Eoi_exit_bitmap1 = 0x0000201e, /* Vectors 40h to 7fh */
    Eoi_exit_bitmap1_high = 0x0000201f,
    Eoi_exit_bitmap2 = 0x00002020, /* Vectors 80h to bfh */
    Eoi_exit_bitmap2_high = 0x00002021,
    Eoi_exit_bitmap3 = 0x00002022, /* Vectors c0h to ffh */
    Eoi_exit_bitmap3_high = 0x00002023,
    /* VMCS shadowing bitmap addresses */
    Vmread_bitmap = 0x00002026,
    Vmwrite_bitmap = 0x00002028,

    /* Bit 0 = 1: an external interrupt causes a VM exit; bits 2:1 reserved, fixed to 1;
    bit 3 = 1: an NMI causes a VM exit; bit 4 reserved, fixed to 1;
    bit 5 = 1: enable virtual NMIs; bit 6 = 1: enable the VMX-preemption timer;
    bit 7 = 1: enable posted-interrupt processing to handle virtual interrupts;
    bits 31:8 reserved, fixed to 0 */
    Pin_based_vm_exec_control = 0x00004000, /* Pin-based VM-execution controls */

    /* Bit 0 reserved, fixed to 0; bit 1 reserved, fixed to 1;
    bit 2 = 1: a VM exit occurs as soon as IF = 1 and interrupts are not blocked; bit 3 = 1: reading the TSC returns the TSC value plus the TSC offset;
    bits 6:4 reserved, fixed to 1; bit 7 = 1: hlt causes a VM exit; bit 8 reserved, fixed to 1;
    bit 9 = 1: invlpg causes a VM exit; bit 10 = 1: mwait causes a VM exit;
    bit 11 = 1: rdpmc causes a VM exit; bit 12 = 1: rdtsc causes a VM exit; bits 14:13 reserved, fixed to 1;
    bit 15 = 1: writing CR3 causes a VM exit; bit 16 = 1: reading CR3 causes a VM exit; bits 18:17 reserved, fixed to 0;
    bit 19 = 1: writing CR8 causes a VM exit; bit 20 = 1: reading CR8 causes a VM exit; bit 21 = 1: use the virtual-APIC page to virtualize the local APIC;
    bit 22 = 1: a VM exit occurs when the virtual-NMI window opens; bit 23 = 1: reading or writing a debug register (DR) causes a VM exit;
    bit 24 = 1: in/out and ins/outs instructions cause a VM exit; bit 25 = 1: use the I/O bitmaps; bit 26 reserved, fixed to 1;
    bit 27 = 1: enable the monitor trap flag (MTF) for debugging; bit 28 = 1: use the MSR bitmap; bit 29 = 1: monitor causes a VM exit;
    bit 30 = 1: pause causes a VM exit; bit 31 = 1: the secondary processor-based VM-execution controls field is valid */
    Cpu_based_vm_exec_control = 0x00004002, /* Primary processor-based VM-execution controls */

    /* The exception bitmap is a 32-bit field with one bit per exception vector. In VMX non-root mode, when an exception occurs the processor checks the corresponding bit of the exception bitmap. If the bit is 1, a VM exit occurs. If the bit is 0, the exception is delivered through the guest IDT to the guest's handler. A triple fault always causes a VM exit. */
    Exception_bitmap = 0x00004004, /* Exception bitmap; controls which exceptions cause a VM exit */
    Page_fault_error_code_mask = 0x00004006,
    Page_fault_error_code_match = 0x00004008,
    /* The maximum value is 4. */
    Cr3_target_count = 0x0000400a,
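The exception bitmap check described above can be sketched as follows. The text does not explain the two page-fault fields, so the second function relies on Intel's documented rule, which is an addition here: for a page fault (vector 14), bit 14 of the bitmap applies as written when (error code AND mask) equals match, and is inverted otherwise. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* For vectors other than 14: bit N of the bitmap decides for exception N. */
static int exception_causes_vm_exit(uint32_t exception_bitmap, unsigned vector)
{
    assert(vector < 32u);
    return (int)((exception_bitmap >> vector) & 1u);
}

/*
 * For page faults (vector 14): the error code is filtered through the
 * mask and match fields before bit 14 of the bitmap is applied.
 */
static int page_fault_causes_vm_exit(uint32_t exception_bitmap,
                                     uint32_t error_code,
                                     uint32_t mask, uint32_t match)
{
    int bit14 = (int)((exception_bitmap >> 14) & 1u);
    int matches = (error_code & mask) == match;

    return matches ? bit14 : !bit14;
}
```

With mask and match both set to 0 every error code matches, so page faults follow bit 14 of the bitmap just like any other exception.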

/* Valid when cpu_based_tpr_shadow is 1. Provides a threshold for the interrupt priority; if the guest's virtual TPR falls below this value, a VM exit occurs. */
Tpr_threshold = 0x0000401c,

/* Bit 0 = 1: virtualize accesses to the APIC-access page; bit 1 = 1: enable EPT; bit 2 = 1: accesses to GDTR, LDTR, IDTR, and TR cause a VM exit;
bit 3 = 0: rdtscp causes a #UD exception; bit 4 = 1: virtualize accesses to the x2APIC MSRs; bit 5 = 1: enable VPID;
bit 6 = 1: wbinvd causes a VM exit; bit 7 = 1: the guest may run in unpaged protected mode or in real mode;
bit 8 = 1: virtualize accesses to registers in the virtual-APIC page; bit 9 = 1: enable virtual-interrupt delivery;
bit 10 = 1: a loop of pause instructions can cause a VM exit; bit 11 = 1: rdrand causes a VM exit;
bit 12 = 0: invpcid causes a #UD exception; bit 13 = 1: VMX non-root operation can execute vmfunc;
bits 31:14 reserved, fixed to 0 */
Secondary_vm_exec_control = 0x0000401e, /* Secondary processor-based VM-execution controls */
Ple_gap = 0x00004020,
Ple_window = 0x00004022,

/* A mask bit set to 1 means the corresponding control-register bit belongs to the host. A mask bit set to 0 means the guest may set that bit itself. */
Cr0_guest_host_mask = 0x00006000, /* Speeds up guest writes to CR0 */
Cr4_guest_host_mask = 0x00006002,
Cr0_read_shadow = 0x00006004, /* Speeds up guest reads of CR0 */
Cr4_read_shadow = 0x00006006,
Cr3_target_value0 = 0x00006008,
Cr3_target_value1 = 0x0000600a,
Cr3_target_value2 = 0x0000600c,
Cr3_target_value3 = 0x0000600e,
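How the guest/host mask and the read shadow work together can be sketched in a few lines. The sketch relies on Intel's documented behavior: a guest read returns the shadow value for host-owned bits and the real value for the rest, and a guest write causes a VM exit only if it would give a host-owned bit a value different from the shadow. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Value the guest sees when it reads CR0 or CR4. */
static uint64_t cr_guest_read(uint64_t real, uint64_t mask, uint64_t shadow)
{
    return (shadow & mask) | (real & ~mask);
}

/* Returns 1 if a guest write of new_value would cause a VM exit. */
static int cr_guest_write_exits(uint64_t new_value, uint64_t mask,
                                uint64_t shadow)
{
    return ((new_value ^ shadow) & mask) != 0;
}

/* Worked example: the host owns CR0 bit 0 (PE) and shows the guest PE = 0. */
static void cr_mask_example(void)
{
    const uint64_t mask = 0x1, shadow = 0x0, real = 0x80000031ull;

    assert(cr_guest_read(real, mask, shadow) == 0x80000030ull);
    assert(cr_guest_write_exits(0x1, mask, shadow));  /* PE would differ from the shadow */
    assert(!cr_guest_write_exits(0x8, mask, shadow)); /* host-owned bit keeps its shadow value */
}
```

This is why the two fields speed things up: reads never need a VM exit, and writes exit only when they affect a bit the host cares about.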

VM-entry control field
Vm_entry_msr_load_addr = 0x0000200a,
Vm_entry_msr_load_addr_high = 0x0000200b,

/* Bits 1:0 reserved, fixed to 1; bit 2 = 1: load the debug registers; bits 8:3 reserved, fixed to 1; bit 9 = 1: enter IA-32e mode; bit 10 = 1: enter SMM; bit 11 = 1: return to the executive monitor and deactivate the SMM dual-monitor treatment; bit 12 reserved, fixed to 1; bit 13 = 1: load ia32_perf_global_ctrl; bit 14 = 1: load ia32_pat; bit 15 = 1: load ia32_efer; bits 31:16 reserved, fixed to 0 */

Vm_entry_controls = 0x00004012, /* VM-entry controls; the allowed settings are reported by the MSR msr_ia32_vmx_entry_ctls */
Vm_entry_msr_load_count = 0x00004014,

/* Bits 7:0: interrupt or exception vector; bits 10:8: interruption type (0: external interrupt, 1: reserved, 2: non-maskable interrupt (NMI), 3: hardware exception, 4: software interrupt, 5: privileged software exception, 6: software exception, 7: other event); bit 11 = 1: an error code must be delivered; bits 30:12 reserved; bit 31 = 1: the vm_entry_intr_info_field field is valid */

Vm_entry_intr_info_field = 0x00004016, /* Event-injection control field */
Vm_entry_exception_error_code = 0x00004018, /* VM-entry exception error code */
Vm_entry_instruction_len = 0x0000401a, /* VM-entry instruction length */
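The event-injection format described above can be illustrated by a helper that packs the vector, the interruption type, and the flags into one 32-bit value. This is a sketch only; the enum and function names are hypothetical. A VMM would write the result to Vm_entry_intr_info_field with VMWRITE before VM entry, and the error code, if any, to Vm_entry_exception_error_code.

```c
#include <assert.h>
#include <stdint.h>

/* Interruption types, bits 10:8 of the field (values from the text). */
enum intr_type {
    INTR_TYPE_EXT_INTR = 0,
    INTR_TYPE_RESERVED = 1,
    INTR_TYPE_NMI = 2,
    INTR_TYPE_HW_EXCEPTION = 3,
    INTR_TYPE_SW_INTR = 4,
    INTR_TYPE_PRIV_SW_EXCEPTION = 5,
    INTR_TYPE_SW_EXCEPTION = 6,
    INTR_TYPE_OTHER_EVENT = 7
};

#define INTR_INFO_DELIVER_ERROR_CODE (1u << 11)
#define INTR_INFO_VALID              (1u << 31)

/* Pack an event to inject on the next VM entry. */
static uint32_t vm_entry_intr_info(unsigned vector, enum intr_type type,
                                   int has_error_code)
{
    uint32_t info;

    assert(vector <= 0xffu);
    assert((unsigned)type <= 7u && type != INTR_TYPE_RESERVED);

    info = vector | ((uint32_t)type << 8) | INTR_INFO_VALID;
    if (has_error_code)
        info |= INTR_INFO_DELIVER_ERROR_CODE;
    return info;
}
```

Injecting a general-protection fault (vector 13, hardware exception, with an error code) gives 0x80000b0d, and injecting an NMI (vector 2) gives 0x80000202.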

VM-Exit Control Field
Vm_exit_msr_store_addr = 0x00002006,
Vm_exit_msr_store_addr_high = 0x00002007,
Vm_exit_msr_load_addr = 0x00002008,
Vm_exit_msr_load_addr_high = 0x00002009,
/* Bits 1:0 reserved, fixed to 1; bit 2 = 1: save the debug registers; bits 8:3 reserved, fixed to 1; bit 9 = 1: return to IA-32e mode;
bits 11:10 reserved, fixed to 1; bit 12 = 1: load ia32_perf_global_ctrl; bits 14:13 reserved, fixed to 1;
bit 15 = 1: on VM exit, the processor acknowledges the interrupt controller and reads the interrupt vector; bits 17:16 reserved, fixed to 1;
bit 18 = 1: save ia32_pat; bit 19 = 1: load ia32_pat; bit 20 = 1: save ia32_efer; bit 21 = 1: load ia32_efer;
bit 22 = 1: on VM exit, save the VMX-preemption timer value; bits 31:23 reserved, fixed to 0 */
Vm_exit_controls = 0x0000400c, /* VM-exit controls */
Vm_exit_msr_store_count = 0x0000400e,
Vm_exit_msr_load_count = 0x00004010,

VM-exit information fields
Vm_instruction_error = 0x00004400, /* VM-instruction error: why a VMX instruction failed */
/* Basic information */
Guest_physical_address = 0x00002400, /* Guest-physical address (GPA) saved on a VM exit caused by */
Guest_physical_address_high = 0x00002401, /* an EPT violation or an EPT misconfiguration */
Vm_exit_reason = 0x00004402, /* Exit reason */
Exit_qualification = 0x00006400, /* Additional information about the event that caused the VM exit; the format depends on the exit reason */
Guest_linear_address = 0x0000640a, /* Linear address associated with certain events that cause a VM exit */
/* Information for exits caused by vectored events */
Vm_exit_intr_info = 0x00004404, /* VM-exit interruption information */
Vm_exit_intr_error_code = 0x00004406,
/* Information for exits that occur during event delivery (IDT vectoring) */
Idt_vectoring_info_field = 0x00004408,
Idt_vectoring_error_code = 0x0000440a,
/* Information for exits caused by instruction execution */
Vm_exit_instruction_len = 0x0000440c,
Vmx_instruction_info = 0x0000440e,
/* End of VM-exit information fields */
/* Start of guest-state area fields */
Guest_dr7 = 0x0000681a, /* Debug register */
Guest_rsp = 0x0000681c, /* Stack pointer */
Guest_rip = 0x0000681e, /* Instruction pointer */
Guest_rflags = 0x00006820, /* Flags register */
/* Control registers */
Guest_cr0 = 0x00006800,
Guest_cr3 = 0x00006802,
Guest_cr4 = 0x00006804,
/* Six code/data segment registers (ES, CS, SS, DS, FS, GS) and two system segment registers (LDTR and TR).
Each segment register has four fields:
selector: 16 bits; base: 64 bits on 64-bit systems, otherwise 32 bits;
limit: 32 bits; access rights: 32 bits.
Access-rights field format:
bits 3:0: segment type; bit 4: 0 = system, 1 = code/data; bits 6:5: descriptor privilege level (DPL);
bit 7: 0 = not present, 1 = present; bits 11:8 reserved; bit 12: available to system software;
bit 13: L flag in IA-32e mode, reserved in legacy mode; bit 14: default operand size, 0 = 16 bits, 1 = 32 bits;
bit 15: segment-limit granularity, 0 = 1 byte, 1 = 4 KB; bit 16: 0 = usable, 1 = unusable; bits 31:17 reserved */
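The access-rights format above can be illustrated by a small decoder. This is a sketch; the struct and function names are hypothetical, and the example value 0xa09b is simply a typical setting for a 64-bit code segment rather than something taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of a segment access-rights field. */
struct seg_access_rights {
    unsigned type;     /* bits 3:0: segment type */
    unsigned s;        /* bit 4: 0 = system, 1 = code/data */
    unsigned dpl;      /* bits 6:5: descriptor privilege level */
    unsigned present;  /* bit 7 */
    unsigned avl;      /* bit 12: available to system software */
    unsigned l;        /* bit 13: L flag (IA-32e mode) */
    unsigned db;       /* bit 14: default operand size */
    unsigned g;        /* bit 15: limit granularity */
    unsigned unusable; /* bit 16 */
};

static struct seg_access_rights seg_ar_decode(uint32_t ar)
{
    struct seg_access_rights r;

    r.type = ar & 0xfu;
    r.s = (ar >> 4) & 1u;
    r.dpl = (ar >> 5) & 3u;
    r.present = (ar >> 7) & 1u;
    r.avl = (ar >> 12) & 1u;
    r.l = (ar >> 13) & 1u;
    r.db = (ar >> 14) & 1u;
    r.g = (ar >> 15) & 1u;
    r.unusable = (ar >> 16) & 1u;
    return r;
}

/* Worked example: 0xa09b describes a present, ring-0, 64-bit code segment. */
static void seg_ar_example(void)
{
    struct seg_access_rights cs = seg_ar_decode(0xa09bu);

    assert(cs.type == 0xbu && cs.s == 1u && cs.dpl == 0u && cs.present == 1u);
    assert(cs.l == 1u && cs.db == 0u && cs.g == 1u && cs.unusable == 0u);
}
```

A value with only bit 16 set (0x10000) marks the segment register as unusable.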
Guest_es_selector = 0x00000800,
Guest_es_limit = 0x00004800,
Guest_es_ar_bytes = 0x00004814,
Guest_es_base = 0x00006806,
Guest_cs_selector = 0x00000802,
Guest_cs_limit = 0x00004802,
Guest_cs_ar_bytes = 0x00004816,
Guest_cs_base = 0x00006808,
Guest_ss_selector = 0x00000804,
Guest_ss_limit = 0x00004804,
Guest_ss_ar_bytes = 0x00004818,
Guest_ss_base = 0x0000680a,
Guest_ds_selector = 0x00000806,
Guest_ds_limit = 0x00004806,
Guest_ds_ar_bytes = 0x0000481a,
Guest_ds_base = 0x0000680c,
Guest_fs_selector = 0x00000808,
Guest_fs_limit = 0x00004808,
Guest_fs_ar_bytes = 0x0000481c,
Guest_fs_base = 0x0000680e,
Guest_gs_selector = 0x0000080a,
Guest_gs_limit = 0x0000480a,
Guest_gs_ar_bytes = 0x0000481e,
Guest_gs_base = 0x00006810,
/* LDTR: local descriptor table register, loaded by the lldt instruction */
Guest_ldtr_selector = 0x0000080c,
Guest_ldtr_limit = 0x0000480c,
Guest_ldtr_ar_bytes = 0x00004820,
Guest_ldtr_base = 0x00006812,
/* TR: task register */
Guest_tr_selector = 0x0000080e,
Guest_tr_limit = 0x0000480e,
Guest_tr_ar_bytes = 0x00004822,
Guest_tr_base = 0x00006814,
/* Two descriptor-table registers, GDTR and IDTR. Each consists of two fields: base, the base address of the descriptor table, and limit, the length of the descriptor table. GDTR is the global descriptor table register; the lgdt instruction loads the GDT base address into it. */
Guest_gdtr_limit = 0x00004810,
Guest_gdtr_base = 0x00006816,
/* IDTR: interrupt descriptor table register */
Guest_idtr_limit = 0x00004812,
Guest_idtr_base = 0x00006818,
Guest_ia32_debugctl = 0x00002802,
Guest_ia32_debugctl_high = 0x00002803,
Guest_ia32_pat = 0x00002804,
Guest_ia32_pat_high = 0x00002805,
Guest_ia32_efer = 0x00002806,
Guest_ia32_efer_high = 0x00002807,
Guest_ia32_perf_global_ctrl = 0x00002808,
Guest_ia32_perf_global_ctrl_high = 0x00002809,
Guest_sysenter_cs = 0x0000482a,
Guest_sysenter_esp = 0x00006824,
Guest_sysenter_eip = 0x00006826,
Non-register fields
Guest_intr_status = 0x00000810, /* Status of virtual interrupts */
Vmcs_link_pointer = 0x00002800,
Vmcs_link_pointer_high = 0x00002801,
Guest_pdptr0 = 0x0000280a, /* The Guest_pdptr fields are used when EPT is enabled */
Guest_pdptr0_high = 0x0000280b,
Guest_pdptr1 = 0x0000280c,
Guest_pdptr1_high = 0x0000280d,
Guest_pdptr2 = 0x0000280e,
Guest_pdptr2_high = 0x0000280f,
Guest_pdptr3 = 0x00002810,
Guest_pdptr3_high = 0x00002811,
Guest_activity_state = 0x00004826, /* Activity state of the virtual processor (for example, active) across VM entry and VM exit */
Guest_interruptibility_info = 0x00004824, /* Interruptibility state of the virtual processor */
Vmx_preemption_timer_value = 0x0000482e,
Guest_pending_dbg_exceptions = 0x00006822, /* Pending debug exceptions */

Host-state area fields
Host_rsp = 0x00006c14, /* Stack pointer */
Host_rip = 0x00006c16, /* Instruction pointer */
/* Control registers */
Host_cr0 = 0x00006c00,
Host_cr3 = 0x00006c02,
Host_cr4 = 0x00006c04,
/* Segment selectors */
Host_es_selector = 0x00000c00,
Host_cs_selector = 0x00000c02,
Host_ss_selector = 0x00000c04,
Host_ds_selector = 0x00000c06,
Host_fs_selector = 0x00000c08,
Host_gs_selector = 0x00000c0a,
Host_tr_selector = 0x00000c0c,
/* Segment base addresses */
Host_fs_base = 0x00006c06,
Host_gs_base = 0x00006c08,
Host_tr_base = 0x00006c0a,
Host_gdtr_base = 0x00006c0c,
Host_idtr_base = 0x00006c0e,
/* MSRs */
Host_ia32_pat = 0x00002c00,
Host_ia32_pat_high = 0x00002c01,
Host_ia32_efer = 0x00002c02,
Host_ia32_efer_high = 0x00002c03,
Host_ia32_perf_global_ctrl = 0x00002c04,
Host_ia32_perf_global_ctrl_high = 0x00002c05,
Host_ia32_sysenter_cs = 0x00004c00,
Host_ia32_sysenter_esp = 0x00006c10,
Host_ia32_sysenter_eip = 0x00006c12,
