GPU Virtualization Technology

1. GPU Overview

GPU stands for Graphics Processing Unit, a term introduced by NVIDIA in 1999. The GPU exists as a counterpart to the CPU in a computer system: as demand for graphics grew, especially from home systems and game enthusiasts, traditional CPUs could no longer keep up, so a core processor dedicated to graphics was needed. The GPU is the "heart" of a graphics card, playing a role equivalent to the CPU's role in the computer system as a whole. The GPU also serves as the key criterion for distinguishing 2D graphics cards from 3D graphics cards. 2D cards rely mainly on the CPU to process graphics, in particular 3D images, which is known as "software acceleration"; 3D cards concentrate graphics processing power, in particular for 3D images, in the card's own hardware, which is "hardware acceleration." Most of the popular graphics cards on the market are produced by two companies, NVIDIA and ATI.

1.1. Why does graphics work need a dedicated GPU? Why not the CPU?

The GPU uses a parallel programming model that is fundamentally different from the CPU's serial programming model, so many algorithms that work well on a CPU cannot be mapped directly onto a GPU. Structurally, the GPU resembles a shared-memory multiprocessor, so parallel programs on the GPU differ greatly from serial programs on the CPU. The GPU relies on key techniques such as cube environment mapping of materials, hardware T&L (transform and lighting), vertex blending, bump mapping and texture compression, and dual-texture four-pixel 256-bit rendering engines. Because graphics rendering is highly parallel, the GPU can raise its processing power and memory bandwidth simply by adding parallel processing units and memory controllers. The GPU's design differs greatly from the CPU's: the CPU is designed to handle general-purpose tasks and therefore has a complex control unit, whereas the GPU mainly handles compute-intensive tasks with little control logic, so more of its transistors can be devoted to execution units. As a result, in applications with large amounts of repetitive dataset operations and frequent memory accesses, the GPU has an advantage the CPU cannot match.

1.2. How is the GPU used?

There are two ways to use the GPU. One is to develop applications that drive the GPU device through a general graphics library interface; the other is to use the API programming interface that the GPU vendor itself provides, with applications calling the GPU device directly through that API.

1.2.1. General graphics libraries

Using the GPU through a general graphics library means working entirely through an existing graphics library such as OpenGL or Direct3D, writing programs in a shading language to control the shaders inside the GPU to perform the required computation.

At present, the industry-recognized graphics programming interfaces are OpenGL and DirectX. OpenGL is the preferred environment for developing interactive, portable 2D and 3D graphics applications, and the most widely used standard for graphics applications today. OpenGL, developed by SGI, is a software interface to graphics hardware ("GL" stands for graphics library); an OpenGL application does not need to care about the operating system or platform it runs on, and any environment that implements the OpenGL standard will produce the same visual result. Like OpenGL, DirectX is also a graphics API. It is a multimedia programming interface created by Microsoft and has become the standard on Microsoft Windows. To meet the needs of GPU applications, DirectX defines new versions in step with the expansion and evolution of GPU capabilities, so the functionality it provides stays almost in sync with what the GPU can do.

1.2.2. The GPU's own programming interfaces

The GPU's own programming interfaces come mainly from the two companies that produce GPU devices: NVIDIA's CUDA framework, and the CTM (Close To Metal) framework introduced in 2006 by AMD (ATI) (note that GPU devices were originally produced by ATI, which was later acquired by AMD). AMD's CTM framework is no longer in use: AMD (ATI) introduced the ATI Stream SDK architecture in 2007 and then turned to the open OpenCL standard in 2008, so AMD (ATI) no longer has an independent, proprietary general-purpose computing framework.

CUDA (Compute Unified Device Architecture) is a proprietary general-purpose computing framework released by NVIDIA in June 2007. General-purpose computing with CUDA no longer requires a graphics API; development is done in a style very close to the C language. In the CUDA programming model there is a CPU, called the host, and one or more GPUs, called devices or coprocessors. In this model the CPU and GPU work together, each doing what it does best: the CPU handles logic and serial computation, while the GPU focuses on executing highly threaded parallel processing tasks. The CPU and GPU each have their own memory address space: host memory on the CPU side and device memory on the GPU side. Large applications such as oil exploration, fluid dynamics simulation, molecular dynamics simulation, bio-computing, audio and video codecs, and astronomical computing are commonly programmed directly against the CUDA framework. Most enterprise-class applications, however, are developed against a general graphics library to call GPU devices, for reasons of development cost and compatibility.
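The host/device division of labor described above can be sketched as follows. This is plain Python standing in for CUDA C, purely illustrative: in real CUDA, `vector_add` would be a `__global__` kernel launched over a grid of threads; here each call with index `i` plays the part of one GPU thread computing one output element, while the host function keeps the serial control flow.

```python
# Illustrative sketch of the CUDA host/device split (not real CUDA code).

def vector_add(a, b, c, i):
    """Device-side work: one 'thread' computes one element of the output."""
    c[i] = a[i] + b[i]

def host_program(a, b):
    """Host-side work: serial logic, data setup, and the 'kernel launch'."""
    n = len(a)
    c = [0] * n
    # A real launch would run all n threads in parallel on the GPU;
    # this loop only models the one-thread-per-element mapping.
    for i in range(n):
        vector_add(a, b, c, i)
    return c
```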

1.3. How does the GPU work?

For general-purpose computing and graphics processing, the GPU's internal components consist of two main parts: the vertex processor and the fragment processor. These processors follow the stream-processor model: there is no large, fast read/write cache or memory; instead, on-chip temporary registers are used to operate directly on streaming data.


When the GPU is used for graphics processing, the vertex rendering, pixel rendering, and geometry rendering operations inside the GPU can all be done by the stream processors. In effect, the set of stream processors inside the GPU amounts to a multi-core processor: data can move easily between the inputs and outputs of different stream processors, and because the stream processors are general-purpose, the GPU's dispatcher and control logic can dynamically assign them to the corresponding vertex, pixel, geometry, and other operations.


2. GPU Virtualization

Now to our topic. There are three approaches to graphics processing in virtual machine systems: the first uses a virtual graphics card, the second uses the physical graphics card directly, and the third uses GPU virtualization.

2.1. Virtual graphics card

The first approach, a virtual graphics card, is the choice of today's mainstream virtualization systems, because professional graphics hardware is expensive. Technologies currently used for virtual graphics include Virtual Network Computing (VNC), the Xen virtual frame buffer, VMware's virtual graphics display processor (virtual GPU), and VMGL (VMM-Independent Graphics Acceleration), a graphics acceleration system independent of the virtual machine monitor.

VNC (Virtual Network Computing) is essentially a display system: it can transmit a complete windowed desktop over the network to another computer's screen. The "Terminal Server" included in Windows Server is a design based on the same principle. VNC was developed by AT&T Laboratories and is licensed under the GPL (General Public License), so anyone can obtain it free of charge. VNC software consists of two parts: the VNC server and the VNC viewer. The user installs the VNC server on the computer to be operated remotely before the VNC viewer can control it from the client side.

The Xen virtual frame buffer is a virtual display device provided by Xen. This virtual display device employs a VNC server in the privileged domain, so it presents a VNC-like interface. The guest writes data into the Xen virtual frame buffer; the modified image is then transmitted over the VNC protocol, and finally the front end is notified to update the corresponding region. The source code of this virtual frame buffer device comes from the open-source QEMU. The virtual machine desktop we see on XenServer is this technology at work.

Neither VNC nor the Xen virtual frame buffer gives the virtual machine any hardware graphics acceleration: since there is still no mechanism that lets the virtual machine access the graphics hardware, these virtual display devices process graphics data using the CPU and memory, and the capabilities of the physical display device go unused. VMGL, however, does implement such a mechanism, commonly called front-end virtualization: data that needs graphics processing is sent to a host with hardware graphics acceleration for the corresponding processing.

As noted above, two graphics programming interfaces are available for GPU application development: OpenGL and Direct3D. Of the two, OpenGL is the only graphics API that works across all major operating systems. On virtual machine platforms, VMGL was the first project to virtualize the OpenGL API. VMGL works by deploying a fake library in the guest operating system to replace the standard OpenGL library. The fake library exposes the same interface as the standard OpenGL library, but internally it implements remote calls to the host operating system of a remote server. In this way, every local OpenGL call is interpreted as a service request to the remote server, whose host operating system has the real OpenGL library, graphics driver, and physical GPU, and which is responsible for completing the OpenGL request and displaying the result on screen.

Since VMGL is completely transparent throughout this process, applications that call OpenGL need no source changes and no binary rewriting, and no changes to the virtual machine platform are required.
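The fake-library mechanism can be sketched as follows. All names here are illustrative inventions, not VMGL's real interfaces: the guest-side stub exposes the same interface as the "real" library, encodes each call, forwards it to the host, and the host decodes and executes it against the real implementation.

```python
# Minimal sketch of front-end ("fake library") API interception, the
# mechanism VMGL uses for OpenGL. Names are illustrative, not VMGL's API.
import json

class RealLibrary:
    """Stands in for the host's real OpenGL library + driver + GPU."""
    def clear_color(self, r, g, b, a):
        return ("cleared", r, g, b, a)

class Host:
    """Host-side service: decodes requests and runs them on the real library."""
    def __init__(self):
        self.real = RealLibrary()

    def handle(self, request):
        msg = json.loads(request)
        return getattr(self.real, msg["call"])(*msg["args"])

class FakeLibrary:
    """Guest-side stub with the same interface as the real library.
    Each call is encoded and 'sent' to the host for execution."""
    def __init__(self, host):
        self.host = host

    def clear_color(self, r, g, b, a):
        request = json.dumps({"call": "clear_color", "args": [r, g, b, a]})
        return self.host.handle(request)
```

In a real system the `json.dumps` request would travel over a shared-memory or network transport between guest and host rather than an in-process call.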

2.2. Graphics card passthrough

Graphics card passthrough, also called graphics card penetration (pass-through), bypasses the virtual machine management system and assigns the GPU to a single virtual machine; only that virtual machine can use the GPU. This exclusive-device style of allocation preserves the GPU's integrity and independence, performs close to non-virtualized hardware, and can be used for general-purpose computing. However, passthrough requires support for certain special details of the graphics card, compatibility is poor, and only some GPUs can be used this way. Xen 4.0 added VGA passthrough technology, so XenServer has it too. XenServer's passthrough uses Intel's device virtualization (Intel VT-d) technology to expose a display device to one guest virtual machine: not only can other guest VMs not access it, but even the host virtual machine loses the ability to use the GPU. It implements special details of the graphics card in the guest virtual machine, such as the VGA BIOS, text mode, I/O ports, memory mapping, and VESA modes, to support direct access. A GPU used via XenServer's VGA passthrough is efficient and full-featured, but it is exclusively owned by a single system and loses the ability to be shared among devices. VMware ESX includes a VMDirectPath I/O framework that can likewise pass our graphics device straight through to a virtual machine. XenServer and VMware use different technologies to the same effect: the physical graphics device is handed directly to one virtual machine to provide 3D display and rendering for that VM.
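As a concrete illustration, on Xen a guest configuration fragment along the following lines hands a device to the VM exclusively. The PCI address is a placeholder for your own device, and exact option names vary across Xen versions; this is a sketch, not a complete configuration.

```
# xl.cfg fragment (illustrative): give this guest exclusive use of the
# device at PCI address 0000:03:00.0, with VGA passthrough enabled for
# the primary graphics adapter.
pci = [ '0000:03:00.0' ]
gfx_passthru = 1
```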

Because graphics passthrough means the guest operating system uses the native driver and hardware directly, there is no intermediate layer to track and maintain GPU state, so advanced virtual machine features such as live migration are not supported. XenServer forbids save/restore/migration operations for passthrough VMs, and in VMware's virtual machines, once the VMDirectPath I/O feature is turned on, the corresponding virtual machine loses the ability to suspend/resume or live-migrate.

2.3. Graphics Virtualization (GPU virtualization)

Graphics card virtualization slices the graphics card into pieces and assigns them to virtual machines. Because a card that supports virtualization can typically be split into time slices of different sizes as needed, it can be allocated to multiple virtual machines. The implementation principle is application-layer interface virtualization (API remoting): GPU-related application programming interface (API) calls are intercepted at the application layer and completed by redirection (still ultimately using the GPU), with the execution results returned to the application.

The 3D desktop virtualization solutions in use today mostly employ graphics virtualization technology provided by NVIDIA, namely vCUDA (virtual CUDA), which builds on the CUDA framework we discussed earlier. vCUDA intercepts and redirects the CUDA API in user space and establishes a logical image of the physical GPU inside the virtual machine, a virtual GPU (vGPU), achieving fine-grained partitioning, reorganization, and reuse of GPU resources while supporting advanced virtual machine features such as multi-machine concurrency and suspend/resume.

The implementation of vCUDA is roughly as follows. It includes three modules: the CUDA client, the CUDA server, and the CUDA management side. Taking XenServer as an example, a VMM runs on the physical hardware resources and presents a hardware image upward, with several virtual machines running on the VMM. One of them is the privileged virtual machine (host VM), which in XenServer is Domain 0; the operating system running in it is called the host OS. The host OS controls the hardware directly, and the native CUDA library and GPU driver are installed in it, so the host OS can access the GPU and use CUDA directly. The other virtual machines are non-privileged (guest VMs), and the operating systems running in them (guest OSes) cannot operate the GPU directly. Below, we call the CUDA client the client driver, the CUDA server the host driver, and the CUDA management side the GPU manager.

2.3.1. Client side

The client driver is essentially the driver we install in a virtual machine such as Windows 7. Its main function is to provide a CUDA API library at the user level, together with a virtual GPU (vGPU) that maintains CUDA-related hardware and software state. The client driver faces the CUDA application directly, and its roles include:

1) intercepting CUDA API calls in the application;

2) choosing a communication strategy and providing higher-level semantics for virtualization support;

3) encapsulating and encoding the interfaces and parameters of the calls;

4) decoding the data returned by the server and returning it to the application.

In addition, before the first API call arrives, the client driver requests GPU resources from the management side. Each independent calling process must request resources from the management side, enabling real-time scheduling of GPU resources and tasks.

The client driver also sets up the vGPU to maintain hardware and software state related to the graphics card. The vGPU itself is essentially a key-value data structure that stores the currently used address space, video memory objects, memory objects, and so on, and records the order of API calls. When computation results are returned, the client driver updates the vGPU accordingly.
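A client-side vGPU of this kind can be sketched as below. The field and method names are illustrative, not vCUDA's actual layout: a key-value structure records memory objects and the order of intercepted API calls, and is updated when results come back from the server.

```python
# Sketch of a vGPU shadow state: a key-value structure recording memory
# objects and the order of intercepted API calls. Names are illustrative.

class VirtualGPU:
    def __init__(self):
        self.state = {"memory_objects": {}, "call_log": []}

    def record_call(self, api_name, *args):
        """Log each intercepted CUDA API call in order."""
        self.state["call_log"].append((api_name, args))

    def update(self, handle, value):
        """Apply results returned by the server, e.g. a new allocation."""
        self.state["memory_objects"][handle] = value

# Example: the client intercepts a (hypothetical) allocation call, then
# records the device pointer the server reports back.
vgpu = VirtualGPU()
vgpu.record_call("cudaMalloc", 1024)
vgpu.update("devptr_0", {"size": 1024})
```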

2.3.2. Server side

The server-side component lives in the application layer of the privileged virtual machine (in XenServer terms, the privileged domain). Since the privileged virtual machine can interact with the hardware directly, the server-side component can operate the physical GPU directly to complete general-purpose computing tasks.

The server side faces the real GPU, and its roles include:

1) receiving the client's datagrams and parsing out the calls and parameters;

2) auditing the calls and parameters;

3) executing the calls that pass the audit, using CUDA and the physical GPU;

4) encoding the results and returning them to the client;

5) managing the CUDA-enabled GPUs in the computing system.

In addition, the first task the server performs at startup is to register the information of its CUDA-enabled GPU devices with the management side. When responding to client requests, the server assigns each application its own service thread. The server manages the local GPU resources in a unified way, provides GPU resources according to certain policies, and updates the hardware and software state modified by API calls into the vGPU.

2.3.3. Management side

The management-side component also lives in the privileged domain. On top of virtualizing the CUDA programming interface, it isolates, partitions, and dispatches the GPU's computing power and resources at a higher logical level. While the compute threads and worker threads of the CUDA server achieve some degree of load balancing among the GPUs of a single physical machine, the CUDA management-side component balances load at a higher logical level still, enabling GPU load balancing across a whole GPU virtual cluster.

The scheduling principle is to make GPU demand on a physical machine self-sufficient wherever possible: if the physical machine can satisfy the GPU resource demand, the GPU requests of the virtual machines on that machine are, in general, redirected to that machine's own CUDA server.

GPU resources are managed through a centralized, flexible mechanism:

1) Dynamic scheduling: when a user's resources stay idle beyond a certain threshold, or a task ends, the management side reclaims the resources; when the user submits a computing task again, GPU resources are assigned to that task;

2) Load balancing: when the computing pressure on a node is excessive, the computing load is adjusted, using dynamic scheduling to choose appropriate GPU resources to spread the load;

3) Failure recovery: when a failure occurs, the task is transferred to newly available GPU resources.
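The three policies above can be sketched in a few lines. Function names, thresholds, and the dictionary layout are all illustrative assumptions, not vCUDA's real interfaces: one helper picks the least-loaded healthy GPU (covering load balancing and failure recovery), and another reclaims leases whose idle time exceeds a threshold (dynamic scheduling).

```python
# Hedged sketch of the management side's centralized policies.
# All names and thresholds are illustrative, not a real vCUDA API.

IDLE_THRESHOLD = 60.0  # seconds of idleness before a lease is reclaimed

def pick_gpu(gpus):
    """Load balancing / failure recovery: choose the least-loaded GPU,
    skipping GPUs marked as failed."""
    healthy = [g for g in gpus if not g["failed"]]
    if not healthy:
        raise RuntimeError("no usable GPU available")
    return min(healthy, key=lambda g: g["load"])

def reclaim_idle(leases, now):
    """Dynamic scheduling: keep only leases used within the threshold;
    the rest are reclaimed for reassignment."""
    return [l for l in leases if now - l["last_used"] <= IDLE_THRESHOLD]
```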


This article is from the "I take fleeting chaos" blog; please be sure to keep this source: http://tasnrh.blog.51cto.com/4141731/1760062
