Watermelon's Speech
PPT Translation + explanation + other: wolf96
At the most basic level, these new APIs are designed to improve CPU performance and efficiency by:
Reducing CPU rendering bottlenecks,
Providing more predictable and stable driver behavior,
Giving the application more control, just as in console development
In traditional APIs, there is usually only a single CPU thread that submits work to the GPU. When rendering an extremely complex scene, that thread can become a bottleneck.
As a result, most applications try to do as little as possible on the "render thread". Driver-side multithreading can also take on some of the load, but its scalability is of course limited.
By contrast, the new APIs do not work around this problem. They address it directly by allowing GPU work to be created on many CPU threads.
As for driver predictability and stability: when your application submits a draw call, or maps a buffer for writing, the driver may respond by compiling shader code on the fly, inserting fences or flushing caches to avoid hazards, and possibly even allocating memory. All of this means that two calls to the same API function may take very different amounts of time (even from one frame to the next), which makes it difficult to get consistent frame times.
Compared to this driver behavior on the PC, modern console graphics APIs give the application much greater control over these operations: when memory is allocated and when synchronization occurs.
Command Buffers
In the simplest model, a CPU thread issues commands for the GPU, and the driver hands them over to the GPU's front end for processing.
A better model is to assume that CPU threads write these commands to memory. This memory is what we call the "command buffer".
The format of the command buffer is usually GPU-specific, so only the driver knows the exact format.
When the command buffer is full, or when the application requests a flush, the buffer is submitted to the GPU for execution.
The driver adds the full buffer to a queue for the GPU front end to process.
In this way, the CPU and GPU can run asynchronously.
Users of previous-generation graphics APIs such as D3D11 or OpenGL are often unaware that these command buffers exist, which allows for a simpler programming model.
However, the GPU can process commands faster than a single CPU thread can produce them. If we want to improve rendering performance, we need to scale across multiple CPU threads.
But when the command buffer is hidden, it is not possible to spread the creation of GPU commands across multiple CPU threads.
So in order to solve these problems, all the new APIs expose an explicit concept of a command buffer or "command list".
As illustrated: 4 CPU threads are each writing commands to their own command buffer ...
When one of the threads finishes writing commands, it submits its buffer to a queue for execution.
If it wants to generate more GPU commands, the thread starts filling a new command buffer.
All of these APIs also support multiple queues, so the GPU can consume multiple asynchronous command streams.
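The recording model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CommandBuffer`, `record_worker` and `render_frame` are hypothetical names, not part of any real graphics API, and the "GPU front end" is simulated by draining a queue.

```python
import queue
import threading


class CommandBuffer:
    """Hypothetical CPU-side command buffer: a plain list of recorded commands."""

    def __init__(self, owner):
        self.owner = owner
        self.commands = []

    def draw(self, mesh):
        self.commands.append(("draw", mesh))


def record_worker(thread_id, meshes, submit_queue):
    # Each thread fills its own buffer, so recording needs no locking.
    cmd = CommandBuffer(owner=thread_id)
    for mesh in meshes:
        cmd.draw(mesh)
    # Submission is the only shared step; queue.Queue is thread-safe.
    submit_queue.put(cmd)


def render_frame(meshes, num_threads=4):
    submit_queue = queue.Queue()
    # Split the draw list into one chunk per recording thread.
    chunks = [meshes[i::num_threads] for i in range(num_threads)]
    threads = [
        threading.Thread(target=record_worker, args=(i, chunk, submit_queue))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Stand-in for the GPU front end: drain the queue and "execute" each buffer.
    executed = []
    while not submit_queue.empty():
        executed.extend(submit_queue.get().commands)
    return executed
```

Note that the order in which buffers reach the queue depends on which thread finishes first, so an application that needs a specific order has to impose it at submission time.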
Most of these APIs are what is often called "free-threaded":
Any API function can be called on any thread, and no dedicated render thread is required, but the application must ensure that reads and writes of the same object are correctly synchronized.
The contents of a command buffer are opaque, so they cannot be pre-built in the same way as on a console.
In Metal, a command buffer can be used only once; once committed, it is implicitly deleted.
Vulkan and D3D12 allow more reuse: the same command buffer can be resubmitted from frame to frame.
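The difference between single-use and reusable command buffers can be modeled as follows. This is a toy sketch: `CommandBuffer`, `GpuQueue` and the `reusable` flag are hypothetical and only mimic the behavior described above, they do not correspond to real Metal, Vulkan or D3D12 calls.

```python
class CommandBuffer:
    """Hypothetical command buffer that is either single-use or reusable."""

    def __init__(self, reusable):
        self.reusable = reusable
        self.commands = []
        self.consumed = False

    def record(self, command):
        self.commands.append(command)


class GpuQueue:
    """Stand-in for a GPU queue; 'executing' just copies the commands."""

    def __init__(self):
        self.executed = []

    def submit(self, cmd):
        if cmd.consumed:
            raise RuntimeError("single-use command buffer was already submitted")
        self.executed.extend(cmd.commands)
        if not cmd.reusable:
            # Single-use model: the buffer is gone after it is committed.
            cmd.consumed = True
```

In the reusable model, a buffer recorded once for static geometry can be submitted every frame, which saves the CPU cost of re-recording it.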
The situation in Unity5
Unity 5.2 uses two-threaded rendering: a main thread handles the high-level rendering logic, and a render thread makes the API calls and does other work.
Unity has its own command buffer and ring buffer.
This works on every platform except WebGL, because WebGL has no threads.
A ring buffer is a queue data structure whose end wraps around to its beginning; it is used to pass data from one thread's context to another.
A ring buffer is faster than a linked list because it is backed by an array and has an easy-to-predict access pattern.
For more details, see http://www.cnblogs.com/shanyou/archive/2013/02/04/2891300.html and http://my.oschina.net/alphajay/blog/36602. I think they explain it quite clearly.
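A minimal ring buffer might look like the sketch below. It is not Unity's implementation (the source does not show that); the `RingBuffer` class and its method names are made up for illustration, and the sketch leaves out the synchronization a real producer/consumer pair on two threads would need.

```python
class RingBuffer:
    """Fixed-capacity FIFO backed by a preallocated list (illustrative only)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.head = 0   # index of the next item to read
        self.count = 0  # number of items currently stored

    def push(self, item):
        if self.count == self.capacity:
            return False  # full: the producer has to wait or retry
        tail = (self.head + self.count) % self.capacity  # wraps around the end
        self.slots[tail] = item
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None  # empty: the consumer has nothing to do
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return item
```

Because the storage is allocated once and indices only move forward modulo the capacity, pushing and popping never allocate memory and always touch neighboring array slots.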
Pipeline State Object (PSO)
OpenGL is a state machine whose state is changed by enabling/disabling features, and D3D9 has a similar "SetRenderState" call for changing state. D3D10 and 11 moved to coarser-grained state objects, so that a single object represents, for example, all blend-related state.
The new APIs have a state object that encapsulates almost the entire GPU state vector.
So instead of changing individual states, you switch to pipeline state A, draw, then switch to state B, draw, and so on.
These very coarse-grained, monolithic state objects let the driver compile and validate the state at a predictable time, which avoids a driver stall when you first start rendering with a new state.
What goes into a PSO? The most important part is the shaders for the different programmable stages. So you need a unique PSO for every combination of shaders you will use. That suits the way some engines work, but if you rely on the ability to mix and match shaders, it can be tricky.
The PSO also contains most of the fixed-function state, such as blend state, rasterizer state and so on.
It also contains the formats of all vertex attributes and of the color/depth targets.
What does not go into a PSO? The most important thing is your resource bindings: the actual vertex/index/constant buffers, textures, samplers and so on.
In addition, each API keeps some fixed-function state separate from the PSO. The details differ a little between APIs, but one example is that you can set the constant blend color dynamically.
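The idea can be sketched as an immutable description object that is "compiled" once, up front. Everything here is an assumption made for illustration: `PipelineDesc`, `Device.create_pso` and the fields chosen are hypothetical, and real PSOs carry far more state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineDesc:
    """Hypothetical PSO description: shaders plus fixed-function state and formats."""
    vertex_shader: str
    fragment_shader: str
    blend_enabled: bool
    cull_mode: str
    color_format: str


class Device:
    def __init__(self):
        self.pso_cache = {}
        self.compile_count = 0

    def create_pso(self, desc):
        # The expensive compile/validate step happens here, at a time the
        # application chooses, not in the middle of a draw call.
        if desc not in self.pso_cache:
            self.compile_count += 1
            self.pso_cache[desc] = ("compiled", desc)
        return self.pso_cache[desc]


class CommandBuffer:
    def __init__(self):
        self.commands = []

    def set_pipeline(self, pso):
        self.commands.append(("set_pipeline", pso))

    def set_blend_color(self, rgba):
        # Example of dynamic state that stays outside the PSO.
        self.commands.append(("set_blend_color", rgba))

    def draw(self, mesh):
        self.commands.append(("draw", mesh))
```

Because the description is frozen and hashable, it can be used directly as a cache key, so requesting the same combination twice does not compile it again.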
Memory and Resources
An allocation, representing some range of physical or virtual address space
A resource, a combination of memory and a specific layout
A view, which prepares a resource for a particular use (such as a color target)
As an example,
When allocating, you may choose different caching behaviors and decide whether you need CPU visibility, GPU visibility, or both.
When you create a resource, you choose whether it is a linear buffer or some kind of texture layout. You may decide to store a 2D multisampled texture in a single block of memory.
When creating a view, you may decide to use one layer of a texture array as a depth-stencil target in a particular format.
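The three levels can be modeled as plain data, as in the sketch below. The names `Allocation`, `Resource`, `View` and `create_resource`, and the format label `"D24S8"`, are assumptions for illustration and do not match any specific API.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """A block of address space with its visibility options."""
    size: int
    cpu_visible: bool
    gpu_visible: bool


@dataclass
class Resource:
    """Memory (an allocation plus offset) combined with a specific layout."""
    allocation: Allocation
    offset: int
    kind: str  # e.g. "buffer" or "texture2d_array"
    width: int
    height: int
    layers: int
    bytes_per_texel: int

    def size(self):
        return self.width * self.height * self.layers * self.bytes_per_texel


@dataclass
class View:
    """A resource prepared for one particular use."""
    resource: Resource
    usage: str  # e.g. "color_target" or "depth_stencil"
    fmt: str
    layer: int


def create_resource(allocation, offset, **layout):
    res = Resource(allocation, offset, **layout)
    # The application, not the driver, is responsible for fitting the
    # resource inside the memory it allocated.
    if offset + res.size() > allocation.size:
        raise ValueError("resource does not fit in the allocation")
    return res
```

For example, one GPU-only allocation can back a four-layer texture array, and a view can then select a single layer of it as a depth-stencil target.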
Resource Bindings
We have now seen the full GPU state vector, split into PSO state and non-PSO state.
In addition, we have some conceptual "binding tables" that we fill in to bind textures, samplers, and buffers to GPU state.
A descriptor is a small piece of data; you can write, copy, and move descriptors without allocating or freeing memory.
For example, a texture descriptor might contain a pointer to the texture data, along with width/height, format, and so on.
Depending on what the descriptor represents, you can have different types of descriptors.
Different GPUs store this information differently, so the format is opaque.
This leads to a new resource-binding model. The application now manages descriptor tables that point to our textures, samplers, and buffers, and the GPU state just points to these tables.
To tie it together: the PSO holds the bulk of the GPU state, the GPU state points to descriptor tables, and each descriptor points to the actual data.
Each material's shaders impose constraints on the layout of the tables. A shader might say that "descriptor 2 in table 0 had better be a texture," otherwise the result is undefined.
To capture this information, the new APIs have the concept of a "pipeline layout", an explicit API object that describes what type of descriptor should appear in each slot of the binding tables. It effectively forms the interface between the shaders of the PSO and the descriptor tables.
Multiple shaders (or, more precisely, multiple PSOs) can use the same layout, so you can bind a set of tables once and use them across multiple draw calls.
D3D12 and Vulkan have a heap or pool from which descriptors are allocated.
Vulkan calls groups of descriptors "descriptor sets". D3D12 calls them "descriptor tables", but a table is just a sub-range of a heap.
Both APIs have an object that represents a complete binding layout. D3D12 calls it a "root signature" because it is the layout of the root table, and Vulkan calls it a "pipeline layout".
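The relationship between descriptors, tables and the layout can be sketched like this. `Descriptor`, `PipelineLayout` and `bind_and_draw` are hypothetical names, and the sketch validates the tables explicitly, whereas a real API would typically leave a mismatch undefined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    """Small, copyable record describing one resource binding."""
    kind: str     # "texture", "sampler" or "buffer"
    address: int  # stand-in for a GPU pointer


class PipelineLayout:
    """Declares which kind of descriptor each slot of each table must hold."""

    def __init__(self, tables):
        # tables: one list of expected kinds per binding table
        self.tables = tables

    def matches(self, table_index, table):
        return [d.kind for d in table] == self.tables[table_index]


def bind_and_draw(layout, bound_tables, draws):
    # Bind once, then issue many draws that all read the same tables.
    for index, table in enumerate(bound_tables):
        if not layout.matches(index, table):
            raise ValueError(f"table {index} does not match the pipeline layout")
    return [mesh for mesh in draws]
```

Since the layout is separate from any one shader, several PSOs that agree on it can share the same bound tables without rebinding between draws.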
Unity 5.2 has now implemented resource bindings.
But the main functionality has not been implemented yet, which is a pity.
Overall, Unity currently runs very slowly when using DX12.
Unity has not implemented Vulkan support yet.
Vulkan is a new-generation, high-performance graphics and compute API developed by the Khronos Group, and it is more efficient than OpenGL ES.
The figure from http://imgtec.eetrend.com/article/5245 shows the difference in CPU usage between Vulkan and OpenGL ES 3.0.
Vulkan gives applications direct control over GPU acceleration, maximizing performance and predictability, while the new Khronos SPIR-V intermediate language specification provides greater flexibility in shading languages. Vulkan minimizes driver overhead and improves multithreaded performance on mobile, desktop, console, and embedded platforms. http://cn.khronos.org/news/press/khronos-to-create-new-open-standard-for-computer-vision
This data comes from two different benchmark scenes (one static, the other with many animated material parameters), tested on 3 PCs with different hardware configurations.
Which means Unity does not yet take full advantage of DX12.
Shading Languages
Unity believes that HLSL is not an ideal shading/compute language, but a huge number of shaders have already been written in it. Unity wants to keep using HLSL for now, but will eventually move to MetalSL.
D3D9/11/12: using d3dcompiler_xx.dll
GL2, GLES2: HLSL, hlsl2glslfork (GLSL), glsl-optimizer
GL3/4, GLES3: HLSL (HLSLcc), GLSL
Metal: HLSL, hlsl2glslfork, GLSL (glsl-optimizer), MetalSL
Vulkan: not currently supported
Hopefully Unity will develop its rendering technology as quickly as possible and make full use of DX12's powerful features.
I may write a few more posts about the current state of rendering in Unity ....
-----------by wolf96 http://blog.csdn.net/wolf96
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
What the new graphics API brings to Unity5 & benefits of the next generation of new graphics APIs