The Direct3D 10 System (5)
5.3 Resource Mapping and Access

One of the more complex issues in API and pipeline design is how to share resources between the CPU and the GPU. For example, both Direct3D and OpenGL allow a vertex buffer to be mapped into the application's address space regardless of whether the buffer is allocated in system memory or in video memory. The allocation location, however, causes a huge performance difference: on modern graphics accelerators the bandwidth between video memory and the accelerator may exceed 50 GB/s, while PCI-E provides only about 2.8 GB/s between the system and the GPU. There are other issues as well. Whether the CPU caches the accessed data can change performance significantly, and so can whether writes are write-combined. In addition, to improve spatial locality when the GPU accesses a resource, the data may be reordered from row-major order into another layout (e.g., Morton, boustrophedon, or π orderings) or tiled [Blinn 1990; Hakura and Gupta 1997; Igehy et al. 1999]. We believe the tiling pattern should be transparent to the application, so when a resource is mapped for the CPU it should appear in linear order. Because the tiling pattern also depends on the access pattern, 2D and 3D textures differ from each other, and this difference is not simply the CPU-versus-GPU difference. Given the large impact on performance we considered exposing all of these distinctions, but their complexity makes it hard for an application to truly exploit them, so we settled on a simple model that still captures most of the benefit. We classify the readers and writers of each resource, e.g., GPU versus CPU and read versus write. If only one client predominantly accesses a resource, the problem becomes simple, because the resource can be allocated in the way that favors that principal client.
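As a concrete illustration of the layout transformations mentioned above, the following sketch computes a Morton (Z-order) address by interleaving the bits of a texel's x and y coordinates, so that texels near each other in 2D land near each other in memory. This is an illustrative sketch of the general technique, not Direct3D code.

```python
def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y: x supplies the even bits of the
    result, y the odd bits.  Nearby (x, y) pairs map to nearby indices."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index

# A 2x2 texel quad occupies four consecutive addresses, unlike in a
# row-major layout where the two rows are a full pitch apart:
quad = [morton_index(x, y) for y in (0, 1) for x in (0, 1)]
```

In a row-major layout of a 1024-wide texture, the same quad would span addresses 0, 1, 1024, and 1025; the swizzled layout is why mapping such a resource for the CPU must present a linearized view.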
Similarly, it helps to declare in advance whether a resource will be used for reading or for writing. Fortunately, such declarations cover most typical applications. For example, render targets and textures are accessed mainly by the GPU, as write targets and read-only resources respectively. Vertex buffer usage, on the other hand, is more complicated: static geometry is mainly read by the GPU, but dynamic geometry is often generated by the CPU and consumed by the GPU as part of animation, producing frequent CPU writes and GPU reads. Direct3D 10 therefore divides resources into four usage categories: default, immutable, dynamic, and staging. Default corresponds to a simple texture, render target, or static vertex buffer accessed only by the GPU; a default resource is usually initialized by copying data from another resource. An immutable resource cannot be copied into; instead, its data is supplied once, at creation time. Neither default nor immutable resources can be mapped into the application's address space for CPU access. A dynamic resource can both be used in the pipeline and be mapped by the CPU for write-only access; this suits generated vertex data or video decoding. Finally, a staging resource can only be accessed by the CPU, but its data can be copied to and from other resources, which makes staging resources useful for initializing, or reading back, resources that only the GPU can access. A resource also declares at creation time where it may be bound to the pipeline, so the layout and encoding can be validated during resource creation. The bind locations are: vertex buffer, index buffer, constant buffer, shader resource (texture), stream-output buffer, render target, and depth/stencil buffer. This classification serves two purposes: it gives the driver layout information, and it simplifies error checking when the resource is used.
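The access rules implied by the four usage categories can be summarized in a small table. The sketch below is illustrative only; the encoding and names are ours, not the actual API, so consult the Direct3D 10 documentation for authoritative behavior.

```python
# Illustrative model of the four Direct3D 10 usage categories described
# above: which kinds of CPU mapping each usage permits.
CPU_ACCESS = {
    "default":   set(),              # GPU only; filled by copying
    "immutable": set(),              # GPU only; data supplied at creation
    "dynamic":   {"write"},          # CPU may map for writing only
    "staging":   {"read", "write"},  # CPU only; data moved via copies
}

def can_map(usage: str, mode: str) -> bool:
    """True if a resource of this usage may be mapped for `mode` access."""
    return mode in CPU_ACCESS[usage]
```

For instance, a dynamic vertex buffer can be mapped and filled by the CPU each frame, while reading results back from the GPU requires copying into a staging resource and mapping that for reading.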
5.4 HLSL 10

High-level shading languages have been widely and rapidly adopted, which shows the importance of this kind of language. To support the features of the new pipeline, we set some new goals for the high-level shading language HLSL. Simply put, we want application developers to develop efficiently in HLSL without having to understand low-level details of the virtual machine, such as register names or constant buffer indices. We refined this into the following goals:
1. The application does not need to know how resources are laid out and allocated.
2. Bind-by-position is the primary binding mechanism, rather than the current bind-by-name.
3. Programmers no longer write intermediate (assembly) language code.
The first goal addresses the following problem: in the current system, application developers must learn to control the layout of parameters in constant storage, performing a global allocation and placement so that several shaders can share certain variables. With multiple constant buffers available at each pipeline stage, we believe the compiler has enough information to lay out the buffers automatically. Programmers can still control the assignment of parameters to constant buffers: we extended the language so that the buffer name can be declared as part of a parameter's declaration. The second goal is a change of design philosophy, mainly concerning performance and future evolution. Bind-by-name is chiefly used to match input and output data between shaders, for example so that the output layout of the vertex shader matches the input layout of the pixel shader.
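For example, the extended HLSL syntax lets the programmer name the constant buffer a parameter belongs to, while leaving the layout within each buffer to the compiler. A minimal sketch (the buffer and variable names here are our own invention):

```hlsl
// Parameters grouped by the constant buffer they belong to; offsets
// within each buffer are assigned automatically by the compiler.
cbuffer PerFrame
{
    float4x4 viewProj;
    float3   eyePos;
};

cbuffer PerObject
{
    float4x4 world;
};
```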
Although name matching between source and destination can be performed fairly efficiently at run time, and source-target pairs can be cached, we felt these operations add unnecessary complexity and extra load on the runtime. The new system changes this in several ways. The outputs and inputs of a shader are declared as a signature, similar in form to a C function prototype. A pipeline configuration is valid only when the output signature of each stage is compatible with the input signature of the next. Compatibility means that inputs and outputs correspond element by element; we do allow the next stage to leave trailing outputs of the previous stage unused. Bind-by-position also affects vertex buffer binding in the IA and SO stages; for these stages we create a separate object that encapsulates the binding, so the costly matching operation runs only once, at creation time. The third goal is controversial: it means our implementation does not accept shaders written in the intermediate language (IL) as input. We believe shader programs have reached a level of complexity at which hand-written IL is rarely more efficient than compiler-generated code. Moreover, as we improve optimization techniques, linkage, and interaction with drivers, we cannot guarantee support for and compatibility with hand-written IL. As a diagnostic aid, the system still lets the compiler emit intermediate code as output, but application developers may not modify that output and inject it into the runtime. Many questions remain about optimizing compiler-generated code. The first is the scope of optimization: the driver may also optimize when translating the intermediate language into machine code. As shader complexity grows, developers need control over optimization, and the order in which operations execute matters greatly; in particular, the invariance of critical code must be guaranteed. A multi-pass algorithm must be able to produce identical intermediate values in each pass so those values can be shared between the separate shaders.
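The element-by-element compatibility rule can be made precise with a small sketch: each signature is an ordered list of elements, and a downstream input signature matches if it agrees position by position with a prefix of the upstream output signature, so trailing outputs may go unread. (Illustrative Python; real signatures carry semantics, types, and register assignments.)

```python
def signatures_compatible(outputs, inputs):
    """Stage-linkage check: every input element must match the output
    element in the same position; extra trailing outputs are allowed."""
    if len(inputs) > len(outputs):
        return False
    return all(o == i for o, i in zip(outputs, inputs))

# Hypothetical vertex-shader output and pixel-shader input signatures:
vs_out = [("SV_Position", "float4"), ("TEXCOORD0", "float2"), ("COLOR0", "float4")]
ps_in  = [("SV_Position", "float4"), ("TEXCOORD0", "float2")]
```

Because the check is positional, it can run once when a shader or binding object is created, rather than on every name lookup at draw time.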
We considered several schemes for specifying such invariant values in source code, for example requiring a subroutine to be compiled in a fixed way regardless of whether it is inlined. In the end, the study led us to a more general mechanism: optional, well-defined optimization levels for the driver compiler. Note that our preferred model is to compile HLSL to IL when the shader is authored, and have the driver compile the IL when the program runs; this reduces the time spent compiling shaders while the program is running. Compiling HLSL into IL at run time remains possible, however.
5.5 HLSL-FX 10

We have noticed that the success of the programmable pipeline has changed how people view shaders: a shader program is not only part of the engine but also one of the artist's creative tools. To serve this use, the effect (FX) system extends HLSL, allowing it to also initialize the fixed-function parts of the pipeline, similar to the approach of CgFX and Cg. Although these approaches share a common basis, HLSL-FX is new work. Our goal was that FX must first meet the needs of a real-time runtime, and second serve as a tool for content creators. For historical reasons the two uses differ in many respects: creative tools often sacrifice performance for flexibility, while our runtime puts performance first. Drawing on accumulated experience, the FX system has been improved in both ease of use and ease of achieving performance, so that in the end FX, HLSL, the API, the runtime, and the pipeline are closely integrated, complementary parts of one solution. We also restructured frequent state operations and separated out the name lookup and matching steps. Consider again how state changes are handled. One way to build an application is to render a series of geometries, each with its own pipeline configuration (an effect). Shader parameters in constant buffers, texture bindings, and other fixed-function state are the parameters passed to an effect. To maximize performance, the application should draw all objects that use an effect together; this is the traditional state-sorting solution in scene management systems. Within an effect, however, parameters change at several frequencies: the current time and the viewpoint are per-frame state; textures and vertex data are static state of a character; position and pose are dynamic state of an object; and so on. We use a separate constant buffer for each such frequency of shader parameters.
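In FX source this layering shows up naturally: parameters are declared in constant buffers named for their update frequency, and a technique binds the shaders for each pass. A minimal HLSL-FX 10 sketch (all names are our own invention):

```hlsl
cbuffer PerFrame  { float4x4 viewProj; float time; };
cbuffer PerObject { float4x4 world; };

float4 VS(float4 pos : POSITION) : SV_Position
{
    return mul(mul(pos, world), viewProj);
}

float4 PS() : SV_Target
{
    return float4(1, 1, 1, 1);
}

technique10 Draw
{
    pass P0
    {
        SetVertexShader(CompileShader(vs_4_0, VS()));
        SetGeometryShader(NULL);
        SetPixelShader(CompileShader(ps_4_0, PS()));
    }
}
```

The application then sets `PerFrame` once per frame and `PerObject` once per draw, rather than re-uploading every parameter for every object.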
When an object is drawn, the constant buffers holding its static parameters are bound directly, while buffers holding dynamic parameters are updated before being bound. In practice an application cannot always sort objects by effect; other factors, such as object distance and transparency, may also control draw order. We have greatly reduced the cost of changing state in the Direct3D 10 system, so reconfiguring the entire pipeline is efficient.

6 System Experience (omitted)
7 Future Work (omitted)
8 Conclusions (omitted)