English original: https://lwn.net/Articles/658511/. This article is a translation with some of my own notes added.
QEMU, VirtualBox, VMware, and Xen are all virtual machine monitors. Most users encounter VirtualBox and VMware, typically to run Windows on Ubuntu or Ubuntu on Windows.
QEMU is the most popular open source emulator. It can emulate x86, ARM, MIPS, and other architectures, and it can use hardware acceleration such as KVM on Linux and HAXM on Windows and Mac. These accelerators build on Intel VT-x, Intel VT-d, and AMD's equivalent technology, which provide VCPUs in hardware as well as hardware-assisted page tables (Intel EPT), greatly reducing the work QEMU has to do in software.
VirtualBox and QEMU-KVM both use QEMU, but only for its device emulation. QEMU's GPU emulation is poor, so the QEMU-based Android emulator implements its own OpenGL ES QEMU pipe and draws with OpenGL on the host.
Xen is used more in cloud computing and is not described here in detail. The other emulators are basically a process running on top of a normal operating system, with each virtual core being one thread of that process.
This article uses KVM on an Intel platform running Ubuntu 12.04 to implement the simplest possible virtual machine: the guest computes 2 + 2 and outputs the result through an I/O port.
The KVM API is described in the kernel tree in Documentation/virtual/kvm/api.txt; other documents are in Documentation/virtual/kvm/. The full source: https://lwn.net/Articles/658512/.
A real virtual machine built on KVM emulates many virtual devices and firmware, and has complex initial state (initialization of the individual devices, of the CPU registers, and so on) as well as initialized memory. The virtual machine in this article will run the following 16-bit x86 code (16-bit because an x86 processor starts in real mode, which is 16-bit, and only later switches to 32-bit protected mode):
    mov $0x3f8, %dx
    add %bl, %al
    add $'0', %al
    out %al, (%dx)
    mov $'\n', %al
    out %al, (%dx)
    hlt
This code plays the role of the guest OS; it is essentially a bare-metal program. It adds al and bl (2 + 2), adds '0' to turn 4 into ASCII '4', and outputs the result through port 0x3f8. It then outputs '\n' and halts the machine.
We put the machine code for this program into an array:
    const uint8_t code[] = {
        0xba, 0xf8, 0x03, /* mov $0x3f8, %dx */
        0x00, 0xd8,       /* add %bl, %al */
        0x04, '0',        /* add $'0', %al */
        0xee,             /* out %al, (%dx) */
        0xb0, '\n',       /* mov $'\n', %al */
        0xee,             /* out %al, (%dx) */
        0xf4,             /* hlt */
    };
How do we get this machine code?
    ~$ cat simple_os.asm
    mov $0x3f8, %dx
    add %bl, %al
    add $'0', %al
    out %al, (%dx)
    mov $'\n', %al
    out %al, (%dx)
    hlt
    ~$ as -o simple_os.o simple_os.asm
    ~$ objdump -d simple_os.o

    simple_os.o:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <.text>:
       0:   66 ba f8 03     mov    $0x3f8,%dx
       4:   00 d8           add    %bl,%al
       6:   04 30           add    $0x30,%al
       8:   ee              out    %al,(%dx)
       9:   b0 0a           mov    $0xa,%al
       b:   ee              out    %al,(%dx)
       c:   f4              hlt
The x86 instructions and their machine encodings can be looked up on this page: http://x86.renejeschke.de/
Note the extra 0x66 at the beginning. It is the operand-size override prefix: the assembler targets 64-bit mode by default, so it needs the prefix to select the 16-bit register %dx. The prefix is described under Prefix Group 3 at:
http://wiki.osdev.org/X86-64_Instruction_Encoding
So we need to add .code16 at the beginning of simple_os.asm. The assembler then generates the right code, but objdump still decodes it incorrectly by default and has to be told to disassemble 16-bit code:
    ~$ objdump -d -Mintel,i8086 simple_os.o

    simple_os.o:     file format elf64-x86-64

    Disassembly of section .text:

    0000000000000000 <.text>:
       0:   ba f8 03        mov    dx,0x3f8
       3:   00 d8           add    al,bl
       5:   04 30           add    al,0x30
       7:   ee              out    dx,al
       8:   b0 0a           mov    al,0xa
       a:   ee              out    dx,al
       b:   f4              hlt
https://sourceware.org/binutils/docs/as/i386_002d16bit.html
http://stackoverflow.com/questions/1737095/how-do-i-disassemble-raw-x86-code
We will place this code in the second page of the guest physical address (GPA) space, to avoid conflicting with the real-mode interrupt vector table at address 0 (which does not actually exist in our guest). al and bl are initialized to 2, cs is initialized to 0, and ip points to the start of the second page, 0x1000.
In addition, we have a virtual serial device at port 0x3f8, 8 bits wide, used to output characters.
To implement a virtual machine, we first need to open /dev/kvm:
    kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
Before using KVM, use the KVM_GET_API_VERSION ioctl() to check that the KVM API version is the expected one, and continue only if it is version 12:
    ret = ioctl(kvm, KVM_GET_API_VERSION, NULL);
    if (ret == -1)
        err(1, "KVM_GET_API_VERSION");
    if (ret != 12)
        errx(1, "KVM_GET_API_VERSION %d, expected 12", ret);
After checking the API version, you can use the KVM_CHECK_EXTENSION ioctl() to check whether other extensions are available. Here we check KVM_CAP_USER_MEMORY, which is required for the KVM_SET_USER_MEMORY_REGION ioctl() that sets up guest memory (for background on how KVM virtualizes the MMU, including hardware-assisted page tables, see http://royluo.org/2016/03/13/kvm-mmu-virtualization/):
    ret = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_USER_MEMORY);
    if (ret == -1)
        err(1, "KVM_CHECK_EXTENSION");
    if (!ret)
        errx(1, "Required extension KVM_CAP_USER_MEM not available");
Next, create a virtual machine (VM). The VM ties together the guest memory, the devices, and all VCPUs, and corresponds to one process on the host:
    vmfd = ioctl(kvm, KVM_CREATE_VM, (unsigned long)0);
The virtual machine needs some guest physical memory to hold the guest OS. When a guest memory access faults, KVM tries to resolve the fault using the mappings set up with KVM_SET_USER_MEMORY_REGION. If KVM cannot resolve it, it exits to user space with exit reason KVM_EXIT_MMIO, and the device is then emulated by QEMU or some other user-space program (see: Android QEMU-KVM memory management and IO mapping).
We first allocate one page of memory on the host and then copy the bare-metal guest code into it:
    mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, code, sizeof(code));
Then we use the KVM_SET_USER_MEMORY_REGION ioctl() to tell KVM how this memory in the host's virtual address space maps to guest physical memory:
    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0x1000,
        .memory_size = 0x1000,
        .userspace_addr = (uint64_t)mem,
    };
    ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &region);
Now, when the guest OS accesses guest physical addresses from 0x1000 up to (but not including) 0x2000, the access goes directly to the host memory that backs mem.
Now that we have a VM with some guest physical memory that contains the guest OS code, we need to add a core (a VCPU) to the virtual machine; each VCPU corresponds to one thread. You can also create a multicore VM by calling KVM_CREATE_VCPU several times:
    vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, (unsigned long)0);
Each VCPU is associated with a struct kvm_run, which is used to share information between kernel space and user space, for example the reason KVM exited back to user space: KVM_EXIT_MMIO, KVM_EXIT_IO, and so on. First get the size of the kvm_run area, then map it from the VCPU file descriptor:
    mmap_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, NULL);
    run = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpufd, 0);
The VCPU state also includes the processor registers, divided into two groups, struct kvm_regs and struct kvm_sregs. We need to set cs, al, bl, ip, and the flags register:
    ioctl(vcpufd, KVM_GET_SREGS, &sregs);
    sregs.cs.base = 0;
    sregs.cs.selector = 0;
    ioctl(vcpufd, KVM_SET_SREGS, &sregs);

    struct kvm_regs regs = {
        .rip = 0x1000,
        .rax = 2,
        .rbx = 2,
        .rflags = 0x2,
    };
    ioctl(vcpufd, KVM_SET_REGS, &regs);
Now everything is ready and we can start running the VCPU:
    while (1) {
        ioctl(vcpufd, KVM_RUN, NULL);
        switch (run->exit_reason) {
        /* Handle exit */
        }
    }
We handle each exit from KVM according to run->exit_reason. For example, the guest halting:
    case KVM_EXIT_HLT:
        puts("KVM_EXIT_HLT");
        return 0;
Failure to enter the guest, or an internal KVM error:
    case KVM_EXIT_FAIL_ENTRY:
        errx(1, "KVM_EXIT_FAIL_ENTRY: hardware_entry_failure_reason = 0x%llx",
             (unsigned long long)run->fail_entry.hardware_entry_failure_reason);
    case KVM_EXIT_INTERNAL_ERROR:
        errx(1, "KVM_EXIT_INTERNAL_ERROR: suberror = 0x%x",
             run->internal.suberror);
Finally, the device emulation we need to perform. Here there is only one device, the serial port at 0x3f8, and emulating it simply means printing the character:
    case KVM_EXIT_IO:
        if (run->io.direction == KVM_EXIT_IO_OUT && run->io.size == 1 &&
            run->io.port == 0x3f8 && run->io.count == 1)
            putchar(*(((char *)run) + run->io.data_offset));
        else
            errx(1, "unhandled KVM_EXIT_IO");
        break;
Test results:
    ~/desktop$ gcc -o kvmtest kvmtest.c
    ~/desktop$ ./kvmtest
    4
    KVM_EXIT_HLT
In QEMU-KVM, QEMU's main job is to emulate the virtual devices after KVM_EXIT_IO and KVM_EXIT_MMIO exits, and to initialize and set up those devices before calling KVM_RUN.
The simplest virtual machine using the KVM ioctl interface on Ubuntu 12.04