Summary
Docker is a virtualization tool that has emerged in recent years. Like a virtual machine, it can isolate resources and system environments. Drawing mainly on a research report published by IBM, this article discusses how Docker differs from traditional virtualization, compares the performance of physical machines, Docker containers, and virtual machines, and explains where the differences come from.
Comparison of the implementation principles of Docker and virtual machines
The following diagram shows the implementation frameworks of a virtual machine and of Docker.
Comparing the two diagrams, the guest OS layer and the hypervisor layer of the virtual machine (left) are replaced in Docker by the Docker Engine layer. The guest OS is the operating system installed in the virtual machine; it is a complete operating system with its own kernel. The hypervisor layer can be understood simply as a hardware virtualization platform that exists in the host OS as a kernel-mode driver.
Virtual machines implement resource isolation by running an independent OS and by using the hypervisor to virtualize the CPU, memory, I/O devices, and so on. For example, to virtualize the CPU, the hypervisor creates a data structure for each virtual CPU, simulates the values of all the CPU's registers, and tracks and modifies those values when appropriate. Note that in most cases the code running in the virtual machine executes directly on the hardware without hypervisor intervention. The hypervisor intervenes only for certain privileged requests, for example when the guest OS needs to run in kernel mode and modify CPU register data; it then modifies and maintains the state of the virtual CPU.
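The division of labor described above, where ordinary code runs directly and only privileged requests reach the hypervisor, can be illustrated with a toy model. This is a sketch for illustration only, not how any real hypervisor is implemented; the names `VirtualCPU`, `ToyHypervisor`, `PRIVILEGED_OPS`, and the tuple-based instruction format are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical operations that count as privileged in this toy model.
PRIVILEGED_OPS = {"write_cr3", "halt"}


@dataclass
class VirtualCPU:
    """Software copy of one virtual CPU's state, kept by the hypervisor."""
    registers: dict = field(default_factory=lambda: {"rax": 0, "cr3": 0})
    halted: bool = False


class ToyHypervisor:
    def __init__(self):
        # Number of times the hypervisor had to step in.
        self.interventions = 0

    def run(self, vcpu, program):
        """Run a list of (op, *args) tuples on the given virtual CPU."""
        for op, *args in program:
            if vcpu.halted:
                break
            if op in PRIVILEGED_OPS:
                # Privileged request: the hypervisor updates the vCPU state.
                self.interventions += 1
                self._emulate(vcpu, op, args)
            else:
                # Stand-in for guest code executing without intervention.
                self._run_directly(vcpu, op, args)

    def _run_directly(self, vcpu, op, args):
        if op != "add":
            raise ValueError(f"unknown instruction: {op}")
        register, value = args
        vcpu.registers[register] += value

    def _emulate(self, vcpu, op, args):
        if op == "write_cr3":
            vcpu.registers["cr3"] = args[0]
        elif op == "halt":
            vcpu.halted = True
```

Running a program that mixes `("add", "rax", 1)` steps with one `("write_cr3", 4096)` and one `("halt",)` leaves `interventions` at 2, however many unprivileged `add` steps the program contains.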
The hypervisor virtualizes memory by creating a shadow page table. Normally, a page table translates virtual memory addresses to physical memory addresses. Under virtualization, the memory that the guest regards as physical is itself virtual, so the shadow page table has to cover two mappings: virtual memory -> virtual physical memory -> real physical memory.
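The two-step mapping can be sketched with plain dictionaries standing in for page tables. This is a simplified illustration under stated assumptions: real page tables are multi-level hardware structures, and the names `build_shadow_table`, `translate`, and the 4096-byte `PAGE_SIZE` are chosen here for the example only.

```python
PAGE_SIZE = 4096  # assumed page size for this example


def build_shadow_table(guest_table, host_table):
    """Compose the two mappings into one.

    guest_table: guest-virtual page -> guest-"physical" page
    host_table:  guest-"physical" page -> real physical page
    Returns a table mapping guest-virtual pages straight to real pages.
    """
    shadow = {}
    for virtual_page, guest_physical_page in guest_table.items():
        if guest_physical_page in host_table:
            shadow[virtual_page] = host_table[guest_physical_page]
    return shadow


def translate(table, address):
    """Translate an address through a single page table."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in table:
        raise KeyError(f"page fault at {address:#x}")
    return table[page] * PAGE_SIZE + offset


guest_table = {0: 5, 1: 7}    # maintained by the guest OS
host_table = {5: 42, 7: 13}   # maintained by the hypervisor
shadow_table = build_shadow_table(guest_table, host_table)

address = PAGE_SIZE + 16
two_step = translate(host_table, translate(guest_table, address))
one_step = translate(shadow_table, address)
```

Here `two_step` and `one_step` refer to the same real address; the shadow table precomputes the composition so that a single lookup suffices.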
For I/O device virtualization, when the hypervisor receives a page fault and finds that the faulting virtual physical address corresponds to an I/O device, it simulates the device's behavior in software and returns the result. For example, when the guest writes to a disk, the hypervisor writes the data to a file on the host OS; that file is, in effect, the virtual disk.
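The idea of a virtual disk backed by an ordinary host file can be sketched as follows. This is a minimal illustration, not the implementation of any particular hypervisor; the class name `EmulatedDisk`, its methods, and the 512-byte sector size are assumptions made for the example.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumed sector size for this example


class EmulatedDisk:
    """A virtual disk whose contents live in a regular file on the host."""

    def __init__(self, backing_path, sectors):
        self.backing_path = backing_path
        self.sectors = sectors
        # Create a zero-filled backing file of the requested size.
        with open(backing_path, "wb") as backing:
            backing.truncate(sectors * SECTOR_SIZE)

    def _check(self, index):
        if not 0 <= index < self.sectors:
            raise IndexError(f"sector {index} out of range")

    def write_sector(self, index, data):
        """Handle a guest disk write by writing into the host file."""
        self._check(index)
        if len(data) > SECTOR_SIZE:
            raise ValueError("data larger than one sector")
        with open(self.backing_path, "r+b") as backing:
            backing.seek(index * SECTOR_SIZE)
            backing.write(data.ljust(SECTOR_SIZE, b"\0"))

    def read_sector(self, index):
        """Handle a guest disk read by reading from the host file."""
        self._check(index)
        with open(self.backing_path, "rb") as backing:
            backing.seek(index * SECTOR_SIZE)
            return backing.read(SECTOR_SIZE)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        disk = EmulatedDisk(os.path.join(directory, "disk.img"), sectors=8)
        disk.write_sector(3, b"hello")
        print(disk.read_sector(3)[:5])
```

From the guest's point of view these are disk operations; on the host they are ordinary file reads and writes, which is where the extra software layer comes from.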
Compared with the virtual machine approach to resource and environment isolation, Docker is much simpler. Docker Engine can be viewed as a thin wrapper around Linux namespaces, cgroups, and the file system operations used for image management. Unlike a virtual machine, Docker does not use a fully independent guest OS to isolate environments; it relies on the container support built into the Linux kernel. Put simply, Docker uses namespaces to isolate the system environment, cgroups to limit resources, and images to isolate the root directory environment.
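To make the cgroup part concrete, the sketch below computes the values that would be written to two cgroup v2 control files, `cpu.max` and `memory.max`, to cap a group of processes. It assumes the cgroup v2 interface (where `cpu.max` holds a quota and a period in microseconds) and only builds the strings; it does not touch `/sys/fs/cgroup`, and it does not claim to reproduce how Docker itself configures cgroups. The function name `cgroup_v2_limits` is hypothetical.

```python
def cgroup_v2_limits(cpu_cores, memory_bytes, period_us=100_000):
    """Return cgroup v2 control-file contents for a CPU and memory cap.

    cpu_cores:    maximum CPU time, expressed in cores (1.5 = one and a half)
    memory_bytes: maximum memory usage in bytes
    period_us:    accounting period for the CPU quota, in microseconds
    """
    if cpu_cores <= 0 or memory_bytes <= 0 or period_us <= 0:
        raise ValueError("limits must be positive")
    quota_us = int(cpu_cores * period_us)
    return {
        # Format: "<quota> <period>", both in microseconds.
        "cpu.max": f"{quota_us} {period_us}",
        # Format: a byte count.
        "memory.max": str(memory_bytes),
    }


limits = cgroup_v2_limits(cpu_cores=1.5, memory_bytes=512 * 1024 ** 2)
```

These values only set ceilings on what a group may consume, which matches the description of cgroups as a resource restriction mechanism.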
By comparing the principles of Docker and virtual machines, we can draw some conclusions:
(1) Docker has fewer abstraction layers than a virtual machine. Because Docker does not need a hypervisor to virtualize hardware resources, programs running in a Docker container use the hardware resources of the physical machine directly. As a result, Docker has an efficiency advantage in CPU and memory utilization; specific comparisons are given in the following sections. For I/O, Docker image management can be implemented in several ways, for example with the AUFS file system or with Device Mapper, and the efficiency of the various implementations differs slightly.
(2) Docker uses the host's kernel and needs no guest OS. Therefore, when a new container is created, Docker does not need to load an operating system kernel as a virtual machine does. Booting and loading an operating system kernel is time-consuming: when a new virtual machine is created, the VM software has to load the guest OS, which takes on the order of minutes. Docker skips this step because it uses the host's operating system directly, so creating a new Docker container takes only a few seconds. In addition, a modern operating system is a complex system, and adding another one to a physical machine carries a relatively large resource overhead, so Docker also has a considerable advantage over virtual machines in resource consumption. In fact, a single physical machine can easily host hundreds of containers, but only a few virtual machines.
Comparison of the computational efficiency of Docker and virtual machines
In the previous section, we inferred from the principles that Docker should be more efficient than virtual machines in CPU and memory utilization. In this section we analyze the data given in the IBM paper. The following data were measured on an IBM x3650 M4 server, whose main hardware parameters are:
(1) Two Intel® Xeon E5-2665 processors with a frequency of 2.4-3.0 GHz. Each processor has 8 cores, so there are 16 cores in total.
(2) 256 GB RAM.
In the test, computing power was measured by running the Linpack program. The results are shown in the following figure:
From left to right, the figure shows the computing power of the physical machine, Docker, and the virtual machine. Docker loses almost no computing power relative to the physical machine, while the virtual machine shows a very obvious loss, at around 50%.
Why is the performance loss so large? One reason is that the virtual machine adds a virtual hardware layer, so numerical code in the virtual machine runs on the virtual CPUs provided by the hypervisor. The other lies in the nature of the computing program itself. The virtual CPU architecture presented by the virtual machine differs from the actual CPU architecture. Numerical programs are generally optimized for a specific CPU architecture, and virtualization makes these optimizations ineffective or even counterproductive. For example, the actual architecture of this test platform is two physical CPUs with 8 cores each, 16 cores in total (32 logical cores with Hyper-Threading), arranged as a NUMA system, while the virtual machine presents them as a single CPU with 32 cores. As a result, the program cannot optimize for the actual CPU structure, which greatly reduces its computational efficiency.
Comparison of the memory access efficiency of Docker and virtual machines
Memory access efficiency is more complicated to compare, mainly because memory is accessed in several kinds of scenarios:
(1) Reading and writing large blocks of memory at contiguous addresses. The performance figure measured in this scenario is memory bandwidth, and the bottleneck is mainly the performance of the memory chips.
(2) Random memory access. Performance in this scenario depends mainly on memory bandwidth, the cache hit ratio, and the efficiency of virtual-to-physical address translation.
The following analysis focuses on these two scenarios. Before the analysis, we outline the difference between the memory access models of Docker and virtual machines. The following figure shows the two models:
As the figure shows, an application in a virtual machine goes through two mappings from virtual memory to physical memory on each access, so its memory reads and writes cost more than those of an application in Docker.
The following figure shows the test data for scenario (1), that is, memory bandwidth. The left graph shows the program running on one CPU (8 cores), and the right graph shows it running on two CPUs (16 cores). The unit is GB/s.
The data show that Docker and the virtual machine differ little in memory bandwidth. This is because in the memory bandwidth test the addresses read and written are contiguous and the volume is large, and the kernel is optimized for this kind of operation (data prefetching). Relatively few virtual-to-physical mappings are therefore needed, and the bottleneck is mainly the read and write speed of physical memory, so Docker and the virtual machine perform similarly in this test.
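Why contiguous access needs few address mappings can be illustrated with a toy model of a translation cache (TLB): each miss stands for one virtual-to-physical mapping that has to be looked up. This is a rough sketch under stated assumptions, not a model of the hardware in the IBM test; the 64-entry LRU cache, the 4096-byte page size, and the function names are all chosen for the example.

```python
import random
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this example


def count_translation_misses(addresses, cache_entries=64):
    """Count lookups that miss a small LRU cache of page translations."""
    cache = OrderedDict()
    misses = 0
    for address in addresses:
        page = address // PAGE_SIZE
        if page in cache:
            cache.move_to_end(page)        # mark as most recently used
        else:
            misses += 1                    # a mapping must be looked up
            cache[page] = True
            if len(cache) > cache_entries:
                cache.popitem(last=False)  # evict the least recently used
    return misses


def sequential_addresses(count, stride=64):
    """Addresses that walk through memory in order, `stride` bytes apart."""
    return [i * stride for i in range(count)]


def random_addresses(count, memory_bytes, seed=0):
    """Addresses scattered uniformly over `memory_bytes` of memory."""
    rng = random.Random(seed)
    return [rng.randrange(memory_bytes) for _ in range(count)]


sequential_misses = count_translation_misses(sequential_addresses(100_000))
random_misses = count_translation_misses(random_addresses(100_000, 1024 ** 3))
```

In this model the sequential walk needs one lookup per page touched, while accesses scattered over 1 GiB miss on nearly every access. Since each lookup costs more under the two-step mapping of a virtual machine, the access pattern determines how much that extra cost matters.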
The small difference between Docker and the virtual machine in the memory bandwidth test is attributed to the small number of virtual-to-physical address mappings that the test requires. On this assumption, we would expect the gap between the two to widen in a random memory access test, because random access requires many more virtual-to-physical address mappings. The results are shown in the following figure.
The left graph shows the program running on one CPU, and the right graph shows it running on two CPUs. The left graph confirms the prediction: in random memory access the gap between the container and the virtual machine becomes apparent, and the container performs significantly better than the virtual machine. Surprisingly, however, when the test program runs on two CPUs, the gap between the container and the virtual machine in random memory access is not obvious.
IBM's paper gives a reasonable explanation for this phenomenon. When two CPUs access memory, memory read and write control becomes more complex: the two CPUs may read and write data at the same address, so memory data must be synchronized, which reduces read and write performance. This loss affects even the physical machine, as shown by the fact that the memory access figures in the right graph are lower than those in the left. The loss caused by using two CPUs is very large, much larger than the difference caused by the different memory access models of the virtual machine and Docker, so the right graph shows no obvious difference between Docker and the virtual machine in random memory access.
Comparison of the start-up time and resource consumption of Docker and virtual machines
The two previous sections compare the performance of programs running in Docker with that of programs running in virtual machines. Another important reason why Docker is so popular with developers is that starting a Docker container costs far less than starting a virtual machine, in both start-up time and resource consumption. Docker uses the host's kernel directly, which avoids the boot time that a virtual machine requires and the resources that a guest operating system consumes. With Docker, a large number of containers can be started within a few seconds, which is impossible with virtual machines. Thanks to its fast start-up and low resource consumption, Docker has good prospects in elastic cloud platforms and in automated operation and maintenance systems.
The disadvantages of Docker
The previous sections focus on the advantages of Docker over virtual machines, but Docker is not a perfect system. Compared with virtual machines, Docker also has the following disadvantages:
1. Resource isolation is weaker than with virtual machines. Docker uses cgroups to enforce resource limits, which can only cap a container's maximum resource consumption; they cannot prevent other programs from occupying the resources the container needs.
2. Security issues. Docker currently cannot distinguish which user issued a command. Any user with permission to run Docker can perform every operation on any Docker container, regardless of whether that user created it. For example, if users A and B both have permission to run Docker, A can delete a container created by B, because the Docker server does not check which user's Docker client issued the request. This is a security risk.
3. Docker is still evolving rapidly, and functional details change considerably between versions. Some core modules depend on recent kernel versions, so there are version compatibility issues.