By cooljie
The release of DirectX 11 brought many exciting new features. Using compute shaders for graphics post-processing, multithreaded rendering, order-independent transparency, and other capabilities were previously out of reach for GPU hardware. These features mark a shift in how GPUs evolve. Instead of simply raising floating-point throughput to keep up with growing triangle counts and more complex shader instructions, developers now combine advanced rendering techniques with general-purpose GPU computing to implement complex effects.
Looking back at earlier DirectX generations, almost every one had a disruptive impact on GPU architecture. Most required GPUs to change their existing shader unit structure or to add resources to each shader unit, with the aim of supporting more shader instructions, more registers, larger textures, and higher texture precision. These improvements inevitably increased the transistor count, which meant that each shader unit inside the GPU grew larger.
In the DirectX 11 upgrade, the following features are worth noting:
● Shader Model is upgraded to version 5.0, which adopts object-oriented concepts and supports double-precision data;
● Tessellation (surface subdivision) is officially supported by Microsoft and has gradually matured;
● Multithreading support means graphics processing no longer sits awkwardly in a multithreaded programming environment;
● Microsoft's compute shader introduces a general-purpose computing model, pushing GPU general-purpose computing to new heights;
● New texture compression formats save hardware resources with very little loss of image quality.
Although multithreading has existed in the CPU world for decades, most programmers did not start caring about program parallelism until multi-core CPUs became popular in recent years. Before that, most code was simply single-threaded, and it was very difficult to find and exploit the performance gains that multithreading could bring to such code.
To change this situation, DirectX 11 includes support for multithreading. In both DirectX 10 and DirectX 11, all color information is eventually rasterized and shown on the display (whether the work is submitted serially or synchronously), but DirectX 11 adds support for multithreading.
Multithreading changes from DirectX 10 to DirectX 11
Thanks to this, applications can create resources and manage state concurrently, and issue draw commands from multiple threads of their own, which is clearly more efficient. DirectX 11's multithreading may not speed up the graphics subsystem itself (especially when GPU resources are the limiting factor), but it can make threaded game engines run more efficiently and lets them take advantage of the growing number of desktop CPU cores.
Multi-threaded rendering 1
In DirectX 11, Microsoft split the single Direct3D device into three separate interfaces: the device, the immediate context, and the deferred context.
Multi-threaded rendering 2
The three can be used from separate threads. The device and deferred contexts can be driven by multiple threads, which record pending work and hand it to the immediate context on the rendering thread for execution. This design lets the resources needed for rendering be prepared in advance. At the same time, the CPU can use multithreaded processing to speed up DirectX work and reduce CPU latency, so the game is less limited by CPU bottlenecks.
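The device, immediate context, and deferred context split can be illustrated with a small Python model. This is a sketch under stated assumptions, not the Direct3D 11 API. The classes `CommandList`, `DeferredContext`, and `ImmediateContext` and the functions `record_chunk` and `render_frame` are hypothetical. In real D3D11 code, deferred contexts come from `ID3D11Device::CreateDeferredContext`, a recording is closed with `FinishCommandList`, and the rendering thread replays it with `ExecuteCommandList` on the immediate context. The model only shows the shape of the idea: worker threads record in parallel, and one thread submits the recordings in a fixed order.

```python
from concurrent.futures import ThreadPoolExecutor


class CommandList:
    """Hypothetical stand-in for a recorded, immutable command list."""

    def __init__(self, commands):
        self.commands = tuple(commands)


class DeferredContext:
    """Hypothetical deferred context: records commands but never executes them."""

    def __init__(self):
        self._recorded = []

    def draw(self, mesh):
        self._recorded.append(("draw", mesh))

    def finish_command_list(self):
        # Hand back everything recorded so far and start a fresh recording.
        cmd_list = CommandList(self._recorded)
        self._recorded = []
        return cmd_list


class ImmediateContext:
    """Hypothetical immediate context: only the rendering thread calls it."""

    def __init__(self):
        self.executed = []

    def execute_command_list(self, cmd_list):
        self.executed.extend(cmd_list.commands)


def record_chunk(meshes):
    # Runs on a worker thread; each worker owns its own deferred context.
    ctx = DeferredContext()
    for mesh in meshes:
        ctx.draw(mesh)
    return ctx.finish_command_list()


def render_frame(meshes, workers=4):
    # Split the scene into contiguous chunks, one per worker.
    size = max(1, -(-len(meshes) // workers))  # ceiling division
    chunks = [meshes[i:i + size] for i in range(0, len(meshes), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Recording happens in parallel; map() returns results in input order.
        command_lists = list(pool.map(record_chunk, chunks))
    immediate = ImmediateContext()
    for cmd_list in command_lists:  # submission stays on one thread, in order
        immediate.execute_command_list(cmd_list)
    return immediate.executed


print(render_frame(["terrain", "trees", "water"], workers=2))
# [('draw', 'terrain'), ('draw', 'trees'), ('draw', 'water')]
```

Keeping submission on a single thread preserves draw order regardless of which worker finishes recording first. Only the recording work is spread across cores.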
Microsoft introduced DirectX Compute (DirectCompute) in DirectX 11, although the idea had already appeared in many applications before. DirectCompute strengthens general-purpose computing and gives further momentum to general-purpose GPU computing. OpenCL is an API standard for general-purpose GPU computing that provides parallel computing APIs and an extended programming language. Because the two have different focuses, DirectCompute is not a competitor to OpenCL. Instead, it makes OpenCL more practical: DirectX 11 improves the GPU's general-purpose performance, so OpenCL-based general-purpose programs will also run more efficiently.
DirectCompute includes the compute shader technology introduced next. In practice they can be understood as one technology under two names, so from here on we simply call it the compute shader, the technology that lets general-purpose GPU computing assist graphics processing.
The GPU is a graphics processor. In the past, general-purpose GPU computing required programmers to disguise their data as images the GPU could recognize, and then convert the GPU's output image back into the desired result. With the DirectX 11 compute shader, any kind of data (even non-graphics data) can be processed directly and read or written at any time without the constraints of the graphics rendering pipeline, which greatly improves general-purpose computing performance on the GPU.
Because of the GPU's powerful floating-point capability, technologies for general-purpose GPU computing are developing rapidly. NVIDIA and AMD have CUDA and Stream respectively, and the two companies have each gone their own way. Microsoft has now seen the promise of general-purpose GPU computing, and DirectX 11 adds the compute shader to unify today's general-purpose computing approaches. You can think of the compute shader standard as Microsoft's counterpart to OpenCL.
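A serial Python emulation can show how a compute dispatch indexes a plain buffer, without packing data into a texture first. It assumes the Direct3D 11 convention that a thread's global index (`SV_DispatchThreadID`) equals its group index (`SV_GroupID`) times the group size declared with `[numthreads]`, plus its index within the group (`SV_GroupThreadID`). The names `dispatch_1d` and `double_kernel` are hypothetical, and a real GPU runs the threads concurrently rather than in a loop.

```python
from typing import Callable, List


def dispatch_1d(kernel: Callable[[int, List[float]], None],
                num_groups: int, threads_per_group: int,
                data: List[float]) -> None:
    """Serially emulate a 1-D compute dispatch over a plain buffer."""
    for group_id in range(num_groups):                    # like SV_GroupID.x
        for group_thread_id in range(threads_per_group):  # like SV_GroupThreadID.x
            # like SV_DispatchThreadID.x
            dispatch_thread_id = group_id * threads_per_group + group_thread_id
            kernel(dispatch_thread_id, data)


def double_kernel(tid: int, data: List[float]) -> None:
    # Reads and writes an ordinary array element directly.
    if tid < len(data):  # the last group may run past the end of the buffer
        data[tid] *= 2.0


values = [float(i) for i in range(10)]
threads_per_group = 4  # stands in for [numthreads(4, 1, 1)]
num_groups = (len(values) + threads_per_group - 1) // threads_per_group  # rounds up to 3
dispatch_1d(double_kernel, num_groups, threads_per_group, values)
print(values)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
```

The bounds check matters because the group count is rounded up, so the last group can contain threads with no element to process.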
Compute shader graphics pipeline
The main features of the compute shader include data sharing between threads and a set of primitives for random-access and streaming I/O operations. These accelerate and simplify existing techniques such as image processing and post-processing. They also lay the groundwork for new techniques on DirectX 11 hardware, which is of great significance for game and application development.
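Data sharing between threads in a group is what makes patterns such as parallel reduction possible. The sketch below emulates one common approach serially in Python. Each thread group loads its slice into a shared array, which is the role `groupshared` memory plays in HLSL. The group then halves the number of active threads at each step, with a barrier between steps, which is the role of `GroupMemoryBarrierWithGroupSync`. The partial sums from all groups are combined at the end, as a second dispatch or the CPU would do. The names `group_sum` and `reduce_sum` are hypothetical.

```python
def group_sum(values, group_size):
    """Emulate one thread group summing its slice through shared memory."""
    if group_size <= 0 or group_size & (group_size - 1):
        raise ValueError("group_size must be a power of two")
    shared = [0] * group_size  # stands in for a groupshared array
    # Step 1: every thread loads one element (0 if past the end of the slice).
    for tid in range(group_size):
        shared[tid] = values[tid] if tid < len(values) else 0
    # (barrier: all loads finish before any thread starts adding)
    stride = group_size // 2
    while stride > 0:
        # Only threads with tid < stride are active. Each reads a slot in the
        # upper half, which no active thread writes during this step.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        # (barrier before the next, smaller stride)
        stride //= 2
    return shared[0]  # thread 0 holds the group's total


def reduce_sum(data, group_size=4):
    # One group per slice, like Dispatch(ceil(n / group_size), 1, 1).
    partials = [group_sum(data[i:i + group_size], group_size)
                for i in range(0, len(data), group_size)]
    return sum(partials), partials


print(reduce_sum(list(range(1, 11))))  # (55, [10, 26, 19])
```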
With DirectX 11 and the compute shader (CS), game developers can run more general algorithms over complex data structures. Like the other fully programmable DirectX 10 and DirectX 11 pipeline stages, the CS shares the same physical resources, namely the shader processors.
The compute shader has many uses. In games, the GPU can use it for ray tracing, A-buffer anti-aliasing, physics, AI, and other special effects. Outside of games, programmers can also use the CS for image processing and post-processing.
The compute shader is a new feature of the Microsoft DirectX 11 API that lets programmers use the GPU directly as a parallel processor. Beyond 3D rendering, the GPU then gains other computing capabilities, which is the idea behind GPGPU and physics acceleration.
Compute shader plays an important role in graphics computing
Figure 1 shows the compute shader performing post-processing in the game Metro 2033, where using the compute shader for depth-of-field processing improves efficiency. Figure 2 shows the compute shader used for artificial intelligence (AI). Figure 3 shows ray tracing with OptiX on CUDA, or in the future on the compute shader. Figure 4 shows a typical general-purpose computing application with high demands on shader performance.
The compute shader is in effect a bridge between general-purpose GPU computing and traditional graphics processing. In the future, more special effects will be implemented through general-purpose GPU computing. With this technology, the GPU's stream processors can act as a CPU-like computing center, handling tasks such as post-processing, image quality enhancement, high-quality shadow filtering, depth of field, and advanced ambient occlusion.
The compute shader was originally intended to bring general-purpose computing methods to post-processing.
Depth of field is an important characteristic of human vision. When the eye views the real world, it automatically adjusts its focal length for different distances and brings the object being looked at onto the focal plane, so that it is imaged sharply on the retina, while objects off the focal plane appear blurred. The focal length of the lens, the diameter of the pupil, and the distance of the object determine how blurred the object appears. GPU hardware has performed depth-of-field processing since Shader Model 2.0, and in the DirectX 11 era the compute shader has greatly improved its efficiency.
In the past, virtual reality systems rarely simulated depth of field, so the entire scene was rendered uniformly sharp. This made scenes look unrealistic and unnatural and removed the depth cues that depth of field provides. Depth of field helps create a stereoscopic impression, reduces eye fatigue in virtual reality systems, and strengthens the realism and immersion of a scene.
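The dependence of blur on focal length, aperture (pupil) diameter, and distance can be written down with the standard thin-lens circle-of-confusion formula. With aperture diameter A, focal length f, focus distance S1 and object distance S2, the blur-circle diameter is c = A * |S2 - S1| / S2 * f / (S1 - f). A depth-of-field post-process typically evaluates something like this per pixel from the depth buffer and blurs by that amount. This is a generic sketch, not the specific method used by Metro 2033 or any other title mentioned here. The function name and the use of millimetres are assumptions.

```python
def circle_of_confusion(aperture, focal_length, focus_dist, object_dist):
    """Thin-lens blur-circle diameter; all lengths in the same unit (mm here)."""
    if focus_dist <= focal_length:
        raise ValueError("focus distance must be greater than the focal length")
    if object_dist <= 0:
        raise ValueError("object distance must be positive")
    return (aperture * abs(object_dist - focus_dist) / object_dist
            * focal_length / (focus_dist - focal_length))


# A 50 mm lens with a 25 mm aperture, focused at 1 m (1000 mm):
for dist in (500, 1000, 2000, 4000):
    print(dist, round(circle_of_confusion(25, 50, 1000, dist), 3))
# 500 1.316
# 1000 0.0
# 2000 0.658
# 4000 0.987
```

The object on the focal plane gets no blur. Objects nearer than the focus distance blur faster than objects the same distance behind it, which is one reason a single uniform blur looks wrong.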
Call of Duty 4 game cover
In principle, the CS can be used for any of these effects, but its most common application is DOF (depth of field). HDR (high dynamic range) lighting does not currently require the CS, but implementing such effects with the CS is very cost-effective. Depth of field is mainly suited to first-person games such as Call of Duty, where film-like effects draw players into the game, so processing depth of field efficiently will be an important part of achieving realistic game effects.
Shortly after the first DirectX 11 graphics cards were released, AMD published a DirectX 11 demo named Ladybug that showcases the depth-of-field effect.
Hardware that supports the compute shader must be more flexible than the hardware of its time, because running CS code requires: random-access reads and writes; irregular arrays (rather than simple streams or fixed-size 2D arrays); multiple outputs; the ability to address individual threads or groups of threads directly; 32 KB of shared memory and a thread group management system; fine-grained data instructions; synchronization primitives; and unordered I/O that programmers can use as needed.
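The 32 KB of shared memory and the thread-group system translate into concrete limits that a compute shader configuration must respect. The check below encodes the commonly cited Shader Model 5.0 (cs_5_0) limits: at most 1024 threads per group, X and Y up to 1024, Z up to 64, 32 KB of group-shared memory, and at most 65535 groups per dispatch dimension. Treat these constants as assumptions to verify against the Direct3D 11 documentation. The function `check_cs_config` is hypothetical.

```python
# Commonly cited cs_5_0 limits (assumptions; verify against the D3D11 docs).
MAX_THREADS_PER_GROUP = 1024
MAX_NUMTHREADS = (1024, 1024, 64)   # per-axis limits for [numthreads(x, y, z)]
MAX_GROUPSHARED_BYTES = 32 * 1024   # the 32 KB of shared memory
MAX_GROUPS_PER_DIMENSION = 65535    # per-axis limit for Dispatch(x, y, z)


def check_cs_config(numthreads, groupshared_bytes, groups):
    """Return a list of limit violations; an empty list means the config fits."""
    problems = []
    for axis, n, limit in zip("xyz", numthreads, MAX_NUMTHREADS):
        if not 1 <= n <= limit:
            problems.append(f"numthreads.{axis}={n} outside 1..{limit}")
    x, y, z = numthreads
    if x * y * z > MAX_THREADS_PER_GROUP:
        problems.append(f"{x * y * z} threads per group exceeds {MAX_THREADS_PER_GROUP}")
    if groupshared_bytes > MAX_GROUPSHARED_BYTES:
        problems.append(f"{groupshared_bytes} bytes of groupshared exceeds {MAX_GROUPSHARED_BYTES}")
    for axis, g in zip("xyz", groups):
        if not 0 <= g <= MAX_GROUPS_PER_DIMENSION:
            problems.append(f"dispatch.{axis}={g} outside 0..{MAX_GROUPS_PER_DIMENSION}")
    return problems


# A 1920x1080 post-processing pass using 16x16 tiles and one float per thread in shared memory:
tiles = (-(-1920 // 16), -(-1080 // 16), 1)  # (120, 68, 1) groups, rounded up
print(check_cs_config((16, 16, 1), 16 * 16 * 4, tiles))  # []
```

A 32x32x2 group (2048 threads) or a 40000-byte shared array would each be reported as a violation.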
With DirectCompute, DirectX 11 can achieve order-independent transparency (OIT): regardless of the order in which objects are drawn, the GPU computes transparency according to their correct depth relationships, faithfully simulating how transparency behaves in the real world. OIT uses DirectCompute's atomic operations and append buffers to build per-pixel fragment lists and sort them on the GPU, which greatly improves performance and accuracy compared with traditional alpha blending.
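The per-pixel fragment lists can be sketched in Python. This follows the widely described linked-list approach to OIT as a serial model with hypothetical names (`FragmentStore`, `add_fragment`, `resolve`, `naive_blend`), and colors are single grayscale values to keep it short. On the GPU, the slot counter would be an atomic counter (for example the hidden counter of an HLSL `RWStructuredBuffer`, advanced with `IncrementCounter`), and the head update would be an atomic exchange (`InterlockedExchange`), because many pixel-shader invocations insert fragments at the same time. Depth here means smaller is nearer.

```python
class FragmentStore:
    """Per-pixel linked lists of transparent fragments (serial model)."""

    def __init__(self, num_pixels, capacity):
        self.head = [-1] * num_pixels   # head-pointer buffer; -1 marks an empty list
        self.nodes = [None] * capacity  # fragment buffer: (color, alpha, depth, next)
        self.counter = 0                # an atomic counter on the GPU

    def add_fragment(self, pixel, color, alpha, depth):
        # Pass 1 (any draw order): take a unique slot, then push onto the pixel's list.
        slot = self.counter             # atomic increment on the GPU
        self.counter += 1
        if slot >= len(self.nodes):
            return False                # buffer full: this sketch drops the fragment
        old_head, self.head[pixel] = self.head[pixel], slot  # atomic exchange on the GPU
        self.nodes[slot] = (color, alpha, depth, old_head)
        return True

    def resolve(self, pixel, background):
        # Pass 2: gather this pixel's fragments, sort far-to-near, blend back-to-front.
        frags = []
        i = self.head[pixel]
        while i != -1:
            color, alpha, depth, nxt = self.nodes[i]
            frags.append((depth, color, alpha))
            i = nxt
        frags.sort(key=lambda f: f[0], reverse=True)  # farthest first
        result = background
        for _, color, alpha in frags:
            result = alpha * color + (1 - alpha) * result  # "over" operator
        return result


def naive_blend(fragments, background):
    # Traditional alpha blending in submission order, with no sorting.
    result = background
    for color, alpha, _depth in fragments:
        result = alpha * color + (1 - alpha) * result
    return result


near = (1.0, 0.5, 0.2)  # (color, alpha, depth): white glass close to the camera
far = (0.0, 0.5, 0.8)   # black glass farther away
for order in ([near, far], [far, near]):
    store = FragmentStore(num_pixels=1, capacity=8)
    for color, alpha, depth in order:
        store.add_fragment(0, color, alpha, depth)
    print(store.resolve(0, background=0.0), naive_blend(order, background=0.0))
# 0.5 0.25
# 0.5 0.5
```

The sorted resolve gives the same result for either submission order, while plain alpha blending changes with the order in which the two layers were drawn.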
Simple alpha transparency produces a messy result; with OIT, the object's outline and internal structure are clearly visible
The Mecha demo shown here mainly demonstrates OIT (order-independent transparency). In the past, handling overlapping transparent objects in DirectX was very complicated because their layering can be intricate. Especially when smoke, fire, water, glass, and other elements are mixed together, it is difficult for a program to determine the order of the layers.
Previously, programmers had to specify this order manually. Some transparent objects, such as smoke and flames, have no fixed shape and change rapidly, so handling them cost programmers considerable effort and the GPU considerable performance.
Microsoft introduced OIT in DirectX 11 to blend multiple transparent objects quickly. OIT can handle many transparent layers submitted in arbitrary order and sorts them quickly and correctly.
In fact, OIT is part of DirectCompute 11: the sorting of transparent objects is implemented with DirectCompute 11, and the implementation is quite simple. Microsoft also introduced atomic operations for the first time in the context of OIT, although non-graphics programmers need not worry about their details. Microsoft further noted that OIT uses separate buffers to improve the sorting of transparent objects.
In principle, OIT could be computed by the stream processors alone, but OIT as defined in DirectX 11 requires atomic operations. Essentially, all of the new effects introduced with DirectX 11 except tessellation involve the compute shader and atomic operations. In Microsoft's description, OIT follows the CS 5.0 path, which means that the performance of atomic operations strongly affects the final performance of OIT.
Many test results indeed show that OIT benefits significantly from optimizations to the memory system, especially the cache, because atomic operations rely heavily on caches and shared memory. Implemented by traditional means, OIT would have to be rendered layer by layer through textures, and operations such as ordering comparisons and discards would have to be combined with atomic operations plus serialization and reduction. As with memory control, atomic operations must be implemented by fixed-function units; the shader only generates memory requests and cannot perform the control itself.