What do the new graphics APIs bring to Unity 5? What are the benefits of the next-generation graphics APIs?
Watermelon speech
PPT translation, explanation, and additional notes: wolf96
At the most basic level, these new APIs are designed to improve CPU performance and efficiency:
Reduce the CPU rendering bottleneck
Support more predictable and stable driver behavior
Give more control to applications, as when developing on a console
In traditional APIs, only a single CPU thread submits GPU work. When you try to render an extremely complex scene, that thread can become a bottleneck.
Therefore, most applications try to do as little as possible on the "render thread", and drivers do some multithreading of their own, but the scalability is of course limited.
In contrast, the new APIs do not work around this problem. They solve it directly by allowing GPU commands to be created on many threads.
As for driver predictability and stability: when your application submits a draw call or maps a buffer for writing, the driver may respond by compiling shader code just in time, inserting a fence, flushing caches to avoid conflicts, or even allocating memory. All this means that two calls to the same API function may take very different amounts of time (even from one frame to the next), which makes it difficult to achieve consistent frame times.
Compared with driver behavior on the PC, modern console graphics APIs give the application more control over these operations: when compilation occurs, when synchronization occurs, and when memory is allocated.
Command Buffer
In the simplest model, a CPU thread issues a command for the GPU, and the driver hands it to the GPU front end for processing.
A better model is that the CPU thread writes these commands into memory. This memory is what we call the "command buffer".
The format of the command buffer is generally GPU-specific, so only the driver knows the exact format.
When the command buffer is full or the application requests a flush, the buffer is submitted to the GPU for execution.
The driver adds the completed buffer to a queue for the GPU front end to process.
In this way, the CPU and GPU can run asynchronously.
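The flow described above can be sketched in a few lines. This is only an illustration: `CommandBuffer`, `Driver`, `gpu_execute`, and the capacity of 4 are hypothetical names and values, not part of any real driver or API.

```python
from collections import deque


class CommandBuffer:
    """Fixed-capacity block of memory that the CPU thread records commands into."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.commands = []

    def is_full(self):
        return len(self.commands) >= self.capacity


class Driver:
    """Hypothetical driver: records commands and queues completed buffers."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.current = CommandBuffer(capacity)
        self.gpu_queue = deque()  # completed buffers waiting for the GPU front end

    def record(self, command):
        self.current.commands.append(command)
        if self.current.is_full():
            self.flush()  # implicit flush when the buffer fills up

    def flush(self):
        # Hand the current buffer to the GPU queue and start a fresh one.
        if self.current.commands:
            self.gpu_queue.append(self.current)
            self.current = CommandBuffer(self.capacity)


def gpu_execute(driver):
    """Stand-in for the GPU front end: drains buffers in submission order."""
    executed = []
    while driver.gpu_queue:
        executed.extend(driver.gpu_queue.popleft().commands)
    return executed


driver = Driver(capacity=4)
for i in range(6):
    driver.record(f"draw {i}")
queued_before_flush = len(driver.gpu_queue)  # only the first, full buffer
driver.flush()                               # application-requested flush
executed = gpu_execute(driver)
```

In this sketch the CPU side never waits for `gpu_execute`; it only appends to the queue, which is what lets the two sides run asynchronously.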
Users of the previous generation of graphics APIs, such as D3D11 or OpenGL, usually do not know that these command buffers exist, so they can work with the simpler model.
However, the GPU can process commands faster than a single CPU thread can generate them. To improve rendering performance, we need to scale across multiple CPU threads.
But when the command buffer is hidden, it is impossible to spread GPU command creation across multiple CPU threads.
To solve these problems, all the new APIs expose an explicit concept of a command buffer or "command list".
As shown in the figure: four CPU threads are each writing commands to their own command buffer...
When one of the threads finishes recording, it submits its buffer to a queue for execution.
If it needs to generate more GPU commands, the thread begins filling a new command buffer.
All these APIs support multiple queues, so the GPU can consume multiple asynchronous command streams.
Most of these APIs are "free-threaded".
Any API function can be called on any thread, with no dedicated render thread required. However, the application must make sure that reads and writes to the same object are correctly synchronized.
Command buffer contents are opaque, so they cannot be pre-built offline as on consoles.
Metal has only single-use command buffers. Once submitted, a buffer is implicitly deleted.
Vulkan and D3D12 allow more buffer reuse. You can resubmit the same buffer across frames.
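A rough sketch of this multithreaded recording model follows. The names (`CommandBuffer`, `worker`, `submission_queue`) and the numbers (four threads, two buffers each, three draws per buffer) are assumptions for illustration. Each thread records into a buffer it owns, so recording needs no lock, and only the submission queue is shared.

```python
import queue
import threading


class CommandBuffer:
    """Command buffer owned by exactly one recording thread."""

    def __init__(self, owner):
        self.owner = owner
        self.commands = []

    def record(self, command):
        self.commands.append(command)


# The only shared object; queue.Queue does its own locking.
submission_queue = queue.Queue()


def worker(thread_id, draws_per_buffer, buffers):
    for b in range(buffers):
        cb = CommandBuffer(owner=thread_id)  # private buffer: no lock needed
        for d in range(draws_per_buffer):
            cb.record((thread_id, b, d))
        submission_queue.put(cb)             # submit, then start a new buffer


threads = [threading.Thread(target=worker, args=(t, 3, 2)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Stand-in for the GPU draining the queue after all threads have finished.
submitted = []
while not submission_queue.empty():
    submitted.append(submission_queue.get())
total_commands = sum(len(cb.commands) for cb in submitted)
```

The order in which buffers from different threads reach the queue is not deterministic here. Commands inside one buffer keep their recording order, which is why any cross-buffer ordering has to be arranged by the application.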
Unity 5
Unity 5.2 uses two-thread rendering: a main thread is responsible for high-level rendering logic, and a render thread is responsible for calling the graphics API and other work.
Unity has its own command buffer and ring buffer.
It can be used on any platform except WebGL, because WebGL has no threads.
A ring buffer is a circular queue: a queue data structure whose end wraps around to its beginning. It is used to pass data from the context of one thread to the context of another.
A ring buffer is faster than a linked list because it is an array and has a predictable access pattern.
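As a minimal sketch of the data structure, here is a fixed-size ring buffer built on one array. It is not Unity's implementation: the `RingBuffer` class and its method names are hypothetical, and a real cross-thread version would add atomic indices or locking, which this single-threaded sketch omits.

```python
class RingBuffer:
    """Fixed-capacity FIFO queue stored in one array; indices wrap around."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0   # next slot to read
        self._tail = 0   # next slot to write
        self._count = 0  # number of items currently stored

    def push(self, item):
        """Store an item; return False instead of overwriting when full."""
        if self._count == self._capacity:
            return False
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % self._capacity  # wrap to the start
        self._count += 1
        return True

    def pop(self):
        """Remove and return the oldest item, or None when empty."""
        if self._count == 0:
            return None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % self._capacity
        self._count -= 1
        return item


rb = RingBuffer(3)
pushed = [rb.push(cmd) for cmd in ("A", "B", "C", "D")]  # "D" does not fit
first = rb.pop()
rb.push("D")                           # reuses the slot that "A" occupied
drained = [rb.pop() for _ in range(3)]
```

The producer only advances the tail and the consumer only advances the head, and both walk the same array in order, which is where the predictable access pattern comes from.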
Pipeline State Object (PSO)
OpenGL is a state machine in which individual states are enabled, disabled, or changed one at a time. D3D9 has the similar "SetRenderState" call to change state. D3D10 and D3D11 moved to coarse-grained state objects, so that, for example, a single object holds all blend-related state.
The new APIs have a single state object that encapsulates almost the entire GPU state vector.
So to change an individual state, you switch to pipeline state A, draw, then switch to pipeline state B, draw, and so on.
Because these monolithic state objects are so coarse-grained, the driver can compile and validate the state at a predictable time. This avoids driver stalls when you start rendering with a new state.
What goes into the PSO? The most important part is the shaders for the various programmable stages. So you need a unique PSO for each combination of shaders you will use. Depending on how your engine works, this can be tricky if you rely on mixing and matching shaders.
The PSO also contains most of the fixed-function state, such as blend and rasterizer state.
It also contains the formats of all vertex attributes and of the color/depth targets.
What does not go into the PSO? Most importantly, your resource bindings: the actual vertex/index/constant buffers, textures, samplers, and so on.
In addition, each API keeps some fixed-function state separate from the PSO. Each API is a bit different, but as one example, you can set the constant blend color dynamically.
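One common way to live with monolithic PSOs is to cache them by their full description, so each unique combination is compiled once, ideally at load time. The sketch below assumes hypothetical names (`PipelineDesc`, `PipelineCache`) and uses strings in place of real shaders and formats. It is not any engine's actual scheme.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineDesc:
    """Everything baked into the PSO; resource bindings are deliberately absent."""
    vertex_shader: str
    fragment_shader: str
    blend_mode: str
    cull_mode: str
    vertex_format: tuple
    color_format: str
    depth_format: str


class PipelineCache:
    """Compiles each unique description once and reuses the result."""

    def __init__(self):
        self._cache = {}
        self.compile_count = 0

    def get(self, desc):
        pso = self._cache.get(desc)
        if pso is None:
            # Stand-in for the expensive compile-and-validate step.
            self.compile_count += 1
            pso = ("PSO", self.compile_count, desc)
            self._cache[desc] = pso
        return pso


cache = PipelineCache()
opaque = PipelineDesc("lit.vert", "lit.frag", "opaque", "back",
                      ("float3", "float3", "float2"), "RGBA8", "D32")
transparent = replace(opaque, blend_mode="alpha")  # one changed state, new PSO

pso_a = cache.get(opaque)
pso_b = cache.get(transparent)
pso_a_again = cache.get(opaque)  # cache hit: no second compile
```

Changing only the blend mode still produces a second PSO, which illustrates why engines that mix and match many shader and state combinations can end up with a large number of pipeline objects.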
Memory and resources
Memory management is split into three concepts:
Allocations, which represent a large range of physical or virtual address space
Resources, which combine memory with a specific layout
Views, which prepare a resource for a specific use (such as a color target)
For example:
When making an allocation, you may choose different caching behaviors and decide whether the memory is CPU-visible, GPU-visible, or both.
When creating a resource, you choose whether it is a linear buffer or some kind of tiled texture. You might decide to place a 2D multisampled texture in an allocation.
When creating a view, you might decide to use one layer of a texture array as a depth-stencil target in a specific format.
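The three levels can be modeled roughly as follows. All names (`Allocation`, `Resource`, `View`, `create_texture`), the 256-byte alignment, and the size formula are simplifying assumptions. Real tiled layouts add padding that only the driver knows about.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """A large block of address space with chosen visibility."""
    size: int
    cpu_visible: bool
    gpu_visible: bool
    used: int = 0

    def place(self, size, alignment=256):
        """Reserve an aligned sub-range and return its offset."""
        offset = (self.used + alignment - 1) // alignment * alignment
        if offset + size > self.size:
            raise MemoryError("allocation is full")
        self.used = offset + size
        return offset


@dataclass
class Resource:
    """Memory plus a specific layout."""
    allocation: Allocation
    offset: int
    width: int
    height: int
    layers: int
    samples: int
    fmt: str


@dataclass
class View:
    """A resource prepared for one specific use."""
    resource: Resource
    usage: str
    fmt: str
    first_layer: int
    layer_count: int


def create_texture(allocation, width, height, layers, samples, fmt, bytes_per_texel):
    # Simplified size estimate (assumption): no tiling padding or mip levels.
    size = width * height * layers * samples * bytes_per_texel
    offset = allocation.place(size)
    return Resource(allocation, offset, width, height, layers, samples, fmt)


heap = Allocation(size=64 * 1024 * 1024, cpu_visible=False, gpu_visible=True)
shadow_maps = create_texture(heap, 1024, 1024, layers=4, samples=1,
                             fmt="D32_FLOAT", bytes_per_texel=4)
msaa_color = create_texture(heap, 512, 512, layers=1, samples=4,
                            fmt="RGBA8", bytes_per_texel=4)
# Use one layer of the texture array as a depth target.
depth_view = View(shadow_maps, "depth_stencil_target", "D32_FLOAT",
                  first_layer=2, layer_count=1)
```

Both textures live in the same allocation at different offsets, and the view selects a single layer of one of them for one purpose.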
Resource binding
We have the whole GPU state vector and have seen which parts of it live in the PSO.
In addition, there are conceptually some "bind tables", which we fill in to bind textures, samplers, and buffers to the GPU state.
A descriptor is a piece of data that can be written, copied, and moved without the need to allocate or release memory.
For example, a texture descriptor might contain a pointer to the texture data along with its width/height, format, and so on.
There can be different types of descriptors for different kinds of resources.
Different GPUs store different information, and the format is opaque.
This leads to a new resource binding model: application-managed descriptor tables point to our textures, samplers, and buffers, and the GPU state only points to these tables.
The PSO contains the GPU state. The GPU state points to the descriptor tables, which hold pointers to the data.
Each shader places some requirements on the layout of the tables. A shader might say, "descriptor 2 in table 0 had better be a texture," otherwise the result is undefined.
To capture this information, the new APIs have the concept of a "pipeline layout", an explicit API object that describes which types of descriptors should appear in which slots of each bind table. It effectively forms the interface between the shaders in a PSO and the descriptor tables.
Multiple shaders (or, more specifically, multiple PSOs) can use the same layout, so you can bind a set of tables once and use them across multiple draw calls.
D3D12 and Vulkan have a heap or pool from which descriptor tables are allocated.
Vulkan calls descriptor tables "descriptor sets". D3D12 calls them "descriptor tables", but they are just sub-ranges of a heap.
Both APIs have an object that represents a complete binding layout. D3D12 calls it the "root signature" because it describes the layout of the root table, and Vulkan calls it the "pipeline layout".
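The relationship between descriptors, tables, and the layout can be sketched as below. `Descriptor`, `PipelineLayout`, and `validate` are hypothetical names, and real APIs leave a mismatch undefined rather than checking it at bind time. The check is here only to make the contract visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Small copyable record describing one resource."""
    kind: str      # "texture", "sampler", or "buffer"
    resource: str  # stand-in for the GPU address and metadata


class PipelineLayout:
    """Declares which descriptor kind each slot of each table must hold."""

    def __init__(self, tables):
        self.tables = [tuple(t) for t in tables]

    def validate(self, bound_tables):
        """Return True if the bound tables match the declared layout."""
        if len(bound_tables) != len(self.tables):
            return False
        for expected, table in zip(self.tables, bound_tables):
            if len(table) != len(expected):
                return False
            if any(d.kind != kind for d, kind in zip(table, expected)):
                return False
        return True


# Table 0: slot 0 is a buffer, slot 1 a sampler, slot 2 a texture.
layout = PipelineLayout([("buffer", "sampler", "texture")])

good_tables = [[Descriptor("buffer", "per_frame_constants"),
                Descriptor("sampler", "linear_clamp"),
                Descriptor("texture", "albedo")]]
bad_tables = [[Descriptor("buffer", "per_frame_constants"),
               Descriptor("sampler", "linear_clamp"),
               Descriptor("buffer", "not_a_texture")]]

good_ok = layout.validate(good_tables)
bad_ok = layout.validate(bad_tables)
```

Any number of PSOs built against `layout` could be drawn with `good_tables` bound once, which is the sharing benefit described above.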
Unity 5.2 now supports resource binding.
Unfortunately, the main functionality has not been implemented yet.
Overall
In addition, Unity currently becomes slower when using DX12.
Unity has not implemented Vulkan yet.
Vulkan is a new-generation, high-performance graphics and compute API developed by the Khronos Group. It is more efficient than OpenGL ES.
From http://imgtec.eetrend.com/article/5245: a demonstration of the difference in CPU usage between Vulkan and OpenGL ES 3.0
Vulkan gives applications direct control over GPU acceleration to maximize performance and predictability, while the new Khronos SPIR-V intermediate language specification brings greater shading-language flexibility. Vulkan minimizes driver overhead and improves multithreaded performance on mobile, desktop, console, and embedded platforms. http://cn.khronos.org/news/press/khronos-to-create-new-open-standard-for-computer-vision
The data comes from two different benchmark scenes (one static, the other with a lot of animated material parameters), tested on three PCs with different hardware configurations.
That is to say, Unity does not yet take full advantage of DX12.
Shading Language
Unity considers HLSL a less-than-ideal shading/compute language, but a huge number of shaders have already been written in it. Unity wants to keep using HLSL and have it end up as MetalSL through cross-compilation.
D3D9/11/12: use d3dcompiler_xx.dll
GL2, GLES2: HLSL -> (hlsl2glslfork) -> GLSL -> (glsl-optimizer) -> GLSL
GL3/4, GLES3: HLSL -> (hlslcc) -> GLSL
Metal: HLSL -> (hlsl2glslfork) -> GLSL -> (glsl-optimizer) -> MetalSL
Vulkan: none
In short, we hope that Unity will advance its rendering technology as soon as possible to take full advantage of DX12's powerful features.
The blogger may post several more articles on Unity rendering details later...
http://blog.csdn.net/wolf96 ----------- by wolf96
Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.