Graphics Pipeline Tour Part 1

Original: "A trip through the Graphics Pipeline 2011" Translation: Sword of the past Reprint Please indicate the sourceYou can find many features of the PC graphics stack, but it's usually unclear. I will try to avoid the details of the hardware section to fill in these blank knowledge points. I'm going to talk about the hardware that runs D3D9/10/11 's dx11 interface on Windows, because the stack details that happen on the PC are more familiar to me than the API. This first section will tell you a lot about the local instructions we actually perform on the GPU. ApplicationThis is your code section and includes various bugs, yes, the runtime APIs and drivers have bugs, but not both, and are now ready to fix them. run-time APIsThat is, through API resource creation/status setting/draw call, etc., the runtime API tracks the status of application settings, validates parameters, handles errors, consistency detection, manages user visible resources, or verifies shader code and link shader (in D3D, OpenGL is done on the drive layer, it may also be processed in batches for more tasks, and then all things are handled in the graphics driver-more precisely, in user-mode drivers (User-mode driver) User-mode driver (UMD)This is the most mysterious part of CPU-side hiding, if your application crashes due to some API calls, that's usually the result:). It may be "Nbd3dum.dll" (NVidia) or "Atiumd*.dll" (AMD).   This is the user-mode code, as the name suggests. It runs in the same context and address space as your application, and there's nothing special about it. It implements the underlying API (DDI) that is D3D. This API is similar to what you see on the surface, but it is a bit different in memory management.   This module occurs during shader compilation. D3D pass a pre-calibrated shader token to UMD, i.e. the code has been checked, is syntactically correct, conforms to the D3D constraint (refers to using the correct type, not exceeding the available texture/sampler, not exceeding the available constant buffers, etc.). This is compiled from HLSL code, usually with quite a lot of advanced optimizations (various loop optimizations, elimination of useless code, constant passing, predictive branching, etc.)-this is good, and a lot of optimizations during the compilation phase are helpful. However, there are a lot of bottom-level optimizations (such as register allocation and loop expansion) that are driven by the driver. Long story short, it is usually first converted to an intermediate representation (intermediate Representation,ir) and then compiled. Shader hardware instructions are close to D3D bytecode (the HLSL compiler has helped to do a high degree of optimization), but there are still some underlying details (such as hardware resource constraints and scheduling constraints) that D3D don't know or care about, which is not an important process.   Of course, if your app is a well-known game, NV/AMD programmers might look at your shader and give the pin for hardware-optimized shader. These shader can be detected and replaced by UMD and are very friendly.   More interesting: Some API states may actually eventually be compiled into shader-for example, relative to some infrequently used features, such as texture borders (texture borders) may not be implemented into the texture sampler, However, the implementation is simulated with additional shader code (or is not supported at all). This means that sometimes the same shader has multiple versions that correspond to different API states.   
By the way, this is also why you'll often see a delay the first time you use a new shader or resource: a lot of the creation/compilation work is deferred by the driver and only executed when it actually becomes necessary (you simply wouldn't believe how many unused resources some applications create!). Graphics programmers know the flip side of this story: if you want to make sure something is actually created (not just memory reserved for it), you need to issue a dummy draw call that uses it to "warm it up". Annoying, but it has been this way ever since I first started using 3D hardware in 1999, and it's a fact of life you have to adapt to. :)

Moving on. The UMD also deals with fun things like all the legacy D3D9 shader versions and the fixed-function pipeline; yes, all of that still gets faithfully passed through by D3D. The 3.0 shader profile isn't bad (actually quite reasonable), but 2.0 is quirky, and the various 1.x shader versions are a complete mess. Remember 1.3 pixel shaders? Or the fixed-function vertex pipeline with vertex lighting? Yeah, modern cards and drivers still support all of it, although nowadays it simply gets translated to the newest shader version (and has been for quite some time).

Then there's memory management. The UMD receives things like texture creation commands and needs to allocate space for them. In fact, the UMD just suballocates from larger memory blocks it gets from the KMD (the kernel-mode driver); actually mapping and unmapping pages, and managing which video memory the UMD can see and, conversely, which parts of system memory the GPU may access, is a kernel-mode privilege the UMD cannot touch.

What the UMD can do is things like swizzling textures (unless the GPU can do that in hardware, typically using 2D blitting units rather than the real 3D pipeline) and scheduling transfers between system memory and (mapped) video memory. Most importantly, once the KMD has allocated command buffers and handed them over, the UMD gets to write them (I'll use "command buffer" and "DMA buffer" interchangeably). A command buffer contains, well, commands: all your state changes and draw operations are converted by the UMD into commands the hardware understands. So are many things you never trigger manually, such as uploading textures and shaders to video memory.
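About that swizzling: the actual layouts are hardware-specific and proprietary, but a Morton (Z-order) curve is the classic textbook example of the idea, so here is a small illustrative sketch (not any vendor's real format). The point is that texels that are close together in 2D end up close together in memory, which is much friendlier to caches than a plain row-major layout.

```cpp
#include <cstdint>

// Spread the low 16 bits of v so each bit lands in an even position:
// bit k of the input moves to bit 2k of the output.
static uint32_t part1by1(uint32_t v) {
    v &= 0x0000ffff;
    v = (v | (v << 8)) & 0x00ff00ff;
    v = (v | (v << 4)) & 0x0f0f0f0f;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

// Linear offset of texel (x, y) in a Morton-swizzled texture:
// the bits of x and y are interleaved, so 2D locality becomes 1D locality.
static uint32_t mortonOffset(uint32_t x, uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}
```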
In general, the driver tries to do as much as possible in the UMD. The UMD is user-mode code, so anything that runs there needs no expensive kernel-mode transitions, can freely allocate memory, can farm work out to multiple threads, and so on; it's just a regular DLL (even though it's loaded by the API rather than directly by your app). That helps driver development too: if the UMD crashes, the app crashes, but not the whole system; the UMD can simply be replaced while the system is running (it's just a DLL!); it can be debugged with a regular debugger; and so on. So it's not only efficient, it's also convenient.

But there's one thing I haven't mentioned yet. Did I say "user-mode driver"? I should have said "user-mode drivers". As said before, the UMD is just a DLL, one that happens to have the direct ear of D3D and a direct pipe to the KMD, but still a regular DLL running in the address space of its calling process. And we've been using multitasking operating systems for quite a while now. The GPU I keep talking about is a shared resource: only one of them drives your main display (even with SLI/Crossfire). Yet we have multiple applications that try to access it, each pretending it's the only one. That doesn't just work automatically; in the old days the solution was to grant 3D access only to the active application at a time and lock all others out, but that can't work if your windowing system itself uses the GPU for rendering. Which is why you need some component that arbitrates access to the GPU and hands out time slices.

Enter the Scheduler

This is a system component; note that "the" scheduler here means the graphics scheduler, not the CPU or IO scheduler. It does exactly what you'd expect: it arbitrates access to the 3D pipeline, time-slicing it between the different applications that want to use it. A context switch costs at least some GPU state switching (which generates extra commands into the command buffer) and may also need to swap some resources in and out of video memory. And of course, only one process at any given time actually gets to submit commands to the 3D pipeline.

You'll often hear console programmers complain that the PC's 3D APIs are too high-level, not controllable, and cost performance. But compared to a console game, the PC 3D API/driver really does have a much harder problem to solve: it genuinely needs to track all current state, because the rug can be pulled out from under it at any moment! It also works around broken applications and tries to fix performance problems behind their backs. This is an annoying practice that nobody likes, least of all the driver authors themselves, but in business terms people want applications to keep running (and to run smoothly). Just shouting "YOU'RE DOING IT WRONG!" at the app and sulking won't win you any friends.

Onwards through the pipeline. Next stop: kernel mode!

The Kernel-Mode Driver (KMD)

This is the part that actually deals with the hardware. There may be multiple UMD instances running at a time, but there is only ever one KMD, and if that crashes, you're done: it used to mean an instant blue screen, though nowadays Windows knows how to kill the crashed driver and reload it. That only works as long as it merely crashed and didn't corrupt kernel memory; once that happens, all bets are off.

The KMD deals with everything that exists only once. There's only one GPU memory, even though multiple applications compete for it, and somebody has to actually manage the physical memory: that's the KMD. Similarly, somebody has to initialize the GPU at startup, set display modes (and query mode information from the display devices), manage the hardware mouse cursor, program the hardware watchdog timer so the GPU gets reset if it stays unresponsive for a certain time, and so on. That's what the KMD does.

There's also content protection: DRM'ed video playback paths, where the GPU-decoded pixels must stay invisible to unauthorized software, which could otherwise do nasty things like write them out to disk. The KMD is involved in that too.

Most important for us: the KMD manages the actual command buffer, the one the hardware really consumes. The command buffers the UMD produces aren't the real thing; they're just random slices of GPU-addressable memory. What actually happens is this: the UMD finishes them and submits them to the scheduler, which waits until the process is up and then passes the UMD command buffer on to the KMD. The KMD then writes a call to that command buffer into the main command buffer and, depending on whether the GPU command processor can read from main memory, may first need to DMA it to video memory. The main command buffer is usually a (rather small) ring buffer; the only things ever written there are system/initialization commands and calls to the real, meaty 3D command buffers.

But this is still just a memory buffer, and the graphics card has to know where the action is. There is usually a read pointer, the GPU's current position in the main command buffer, and a write pointer marking how far the KMD has written so far (or, more precisely, how far it has told the GPU it has written). These are memory-mapped registers, and the KMD updates them periodically, usually whenever it submits a new chunk of work...
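Here is a hypothetical sketch of that handshake, just to pin down the moving parts. The register names and struct layout are invented for illustration; the real interface is vendor-specific and lives behind memory-mapped registers.

```cpp
#include <cstdint>

// Invented-for-illustration view of the main ring buffer handshake.
struct MainRingBuffer {
    uint32_t*          base;        // ring buffer memory, GPU-visible
    uint32_t           sizeInWords; // total capacity
    volatile uint32_t* readReg;     // MMIO: GPU publishes how far it has read
    volatile uint32_t* writeReg;    // MMIO: KMD publishes how far it has written
    uint32_t           writePos;    // KMD's local write cursor
};

// Words that can be written without overtaking the GPU's read position.
static uint32_t freeSpace(const MainRingBuffer& rb) {
    uint32_t readPos = *rb.readReg;
    return (readPos + rb.sizeInWords - rb.writePos - 1) % rb.sizeInWords;
}

// KMD-side submit: copy a small command packet (e.g. "call this DMA buffer")
// into the ring, then publish the new write pointer so the GPU's command
// processor knows there is more work available.
static bool submit(MainRingBuffer& rb, const uint32_t* cmds, uint32_t count) {
    if (freeSpace(rb) < count)
        return false; // ring full: wait for the GPU to catch up
    for (uint32_t i = 0; i < count; ++i) {
        rb.base[rb.writePos] = cmds[i];
        rb.writePos = (rb.writePos + 1) % rb.sizeInWords;
    }
    // A real driver needs the proper memory barriers before this store.
    *rb.writeReg = rb.writePos;
    return true;
}
```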
The Bus

Those register writes don't reach the graphics card directly (unless it's integrated with the CPU!); they first have to cross the PCI Express bus. DMA transfers take the same route. This doesn't take long, but it is one more station on our graphics pipeline journey.

The Command Processor

This is the front end of the GPU: the place where the commands the KMD wrote finally get read. I'll continue from here in the next article; this one has gotten long enough. :)

Small Aside: OpenGL

OpenGL is fairly similar to what I described above, except that the API and the UMD layer aren't separated as cleanly. And unlike D3D, GLSL shader compilation is not handled by the API at all; it is done entirely by the driver. One unfortunate consequence: there are as many GLSL frontends as there are 3D hardware vendors, all of them implementing the same spec but each with its own bugs and quirks. It also means the drivers have to do all the optimization themselves. The D3D bytecode format is the cleaner solution to this problem: there is only one compiler (so no subtly incompatible dialects between vendors!), and it permits more expensive data-flow analysis than would otherwise be practical.
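To make the contrast tangible, here is a minimal sketch of the OpenGL side, where the raw GLSL text goes straight to the driver. It assumes a current GL 2.0+ context with entry points loaded by whatever loader your project uses (GLEW here, but glad or similar works just as well).

```cpp
#include <cstdio>
#include <GL/glew.h> // assumes GLEW (or swap in your own GL loader)

// Hand GLSL source straight to the driver, which runs the *entire*
// compiler at run time. Contrast with the D3DCompile sketch earlier,
// where validated bytecode is produced ahead of time.
GLuint compileFragmentShader(const char* source) {
    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(shader, 1, &source, nullptr);
    glCompileShader(shader); // the vendor's own frontend, quirks included

    GLint ok = GL_FALSE;
    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (ok != GL_TRUE) {
        char log[1024];
        glGetShaderInfoLog(shader, sizeof(log), nullptr, log);
        // The exact errors and warnings you get here differ per vendor.
        fprintf(stderr, "GLSL compile failed:\n%s\n", log);
    }
    return shader;
}
```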
