Author: i_dovelemon
Date: 2014/8/31
Source: CSDN blog
Article: GPU hardware architecture
Introduction
In 3D graphics, the arrival of the programmable rendering pipeline was a groundbreaking development. This article briefly introduces the hardware architecture behind the two most important stages of today's programmable pipeline, the vertex shader and the pixel shader, and points to how shaders can be written in assembly language.
Vertex shader
In hardware, all vertex shader operations are performed by a vertex arithmetic logic unit (ALU), so we can drive the vertex ALU directly with assembly language. The following figure shows the structure of the vertex ALU:
The figure shows that, besides the central ALU, there are many auxiliary registers that serve as the ALU's inputs and outputs. They fall into several kinds. We will go through their roles and how to use them in turn.
The 16 registers v0 to v15 correspond to the vertex format described by a vertex declaration or FVF descriptor in DirectX. On the GPU each register is 128 bits wide, that is, four 32-bit floats, or one vector. These 16 registers are the input registers of the vertex ALU. To feed them, we define the vertex format in the application and declare in the shader which register receives which vector. What does that mean in practice? If you are familiar with DirectX, you know that when we define a vertex format we state what each vector in a vertex is used for, with usages such as D3DDECLUSAGE_NORMAL and D3DDECLUSAGE_POSITION, each with usage index 0. In the shader we then use declarations such as dcl_normal0 and dcl_position0 to say which register receives that data.
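The binding between vertex data and input registers can be sketched in plain C++. This is a conceptual model only, not Direct3D code: the names Float4, Usage, VertexInputs, Vertex, registerFor and loadVertex are all hypothetical, and the choice of v0 for position and v1 for normal is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One GPU register: four 32-bit floats, 128 bits in total.
struct Float4 {
    float x, y, z, w;
};
static_assert(sizeof(Float4) == 16, "a register holds four 32-bit floats");

// Hypothetical usage tags, loosely mirroring D3DDECLUSAGE_POSITION and
// D3DDECLUSAGE_NORMAL on the application side.
enum class Usage { Position, Normal };

// Hypothetical model of the 16 vertex input registers v0..v15.
struct VertexInputs {
    std::array<Float4, 16> v{};
};

// A vertex as the application might lay it out (illustrative only).
struct Vertex {
    Float4 position;
    Float4 normal;
};

// Plays the role of shader declarations such as "dcl_position v0" and
// "dcl_normal v1": each usage is routed to the register declared for it.
inline std::size_t registerFor(Usage usage) {
    return usage == Usage::Position ? 0 : 1;
}

// Copies one vertex into the input registers, as the hardware would do
// before the vertex shader runs.
inline VertexInputs loadVertex(const Vertex& vertex) {
    VertexInputs in;
    const std::size_t pos = registerFor(Usage::Position);
    const std::size_t nrm = registerFor(Usage::Normal);
    assert(pos < in.v.size() && nrm < in.v.size());
    in.v[pos] = vertex.position;
    in.v[nrm] = vertex.normal;
    return in;
}
```

In this model, calling loadVertex on a vertex leaves its position in v[0] and its normal in v[1], while the other 14 registers stay zeroed because the shader declared nothing for them.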
In addition to v0 to v15, there are registers that hold constants: the registers starting with c in the upper right corner of the figure, called constant registers. As the name suggests, they store constant data: the vertex ALU can only read them and cannot write to them. They receive the extra constant data that the application supplies for use in the vertex shader, such as the world transform matrix, the matrix that transforms model coordinates into clip space, or the position, color, and material of a light source. In short, any constant data the shader needs can be set from the application, and once it is set, the shader simply reads it.
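The split between "the application writes, the shader only reads" can be sketched as a small C++ class. This is a conceptual model under stated assumptions: ConstantFile and its methods are hypothetical names, the set method is only loosely modeled on IDirect3DDevice9::SetVertexShaderConstantF, and the count of 96 registers is the minimum for vs_1_1, assumed here for illustration (real hardware may expose more).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Float4 {
    float x, y, z, w;
};

// Assumed register count: the vs_1_1 minimum of 96 constant registers.
constexpr std::size_t kConstantCount = 96;

// Hypothetical model of the constant register file c0..c95.
class ConstantFile {
public:
    // Application side: write vectorCount registers starting at start.
    // data must point to 4 * vectorCount floats.
    void set(std::size_t start, const float* data, std::size_t vectorCount) {
        assert(start + vectorCount <= kConstantCount);
        for (std::size_t i = 0; i < vectorCount; ++i) {
            c_[start + i] = Float4{data[4 * i], data[4 * i + 1],
                                   data[4 * i + 2], data[4 * i + 3]};
        }
    }

    // Shader side: read-only access, there is deliberately no writer here.
    const Float4& read(std::size_t index) const {
        assert(index < kConstantCount);
        return c_[index];
    }

private:
    std::array<Float4, kConstantCount> c_{};
};
```

A 4x4 matrix occupies four consecutive registers in this model, so uploading 16 floats with set(0, matrix, 4) fills c0 to c3.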
The two kinds of registers described above are the inputs of the vertex ALU. So where are the results computed by the vertex ALU stored?
In the lower right of the figure there is a row of registers. These are the output registers of the ALU: the results of its calculations are written to them. The first two, labeled color0 and color1, are the oD0 and oD1 registers. oD0 holds the diffuse color of the vertex computed in the vertex shader, and oD1 holds its specular color. This is a convention rather than a requirement. In the end they are just registers that hold data, and how the data is interpreted is up to you, so you can keep other values in them, although it is best to follow the rules of the shader version.
The result of applying the world * view * projection transformation to a model vertex is stored in a register named oPos. Note that this register may only hold the transformed vertex position and cannot be used for any other purpose.
To communicate between the vertex shader and the pixel shader, the registers oT0 to oT7 are available. oT0 usually holds the texture coordinates of the vertex, so that they can be passed to the pixel shader, and the remaining registers can be used however we like. If you have written HLSL, you will know that pixel shader inputs can be declared with semantics such as : TEXCOORD0 and : TEXCOORD1. These correspond to the oT registers (oT0, oT1, and so on) in hardware.
Besides the input and output registers, we also need somewhere to hold intermediate values during ALU calculations. These are the registers starting with r in the lower left corner of the figure, called temporary registers, which hold temporary data computed in the vertex shader.
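To see how the four kinds of registers work together, here is a conceptual C++ model of a very small vertex shader. It is a sketch, not GPU code: the function and struct names are hypothetical, and the assembly quoted in the comments (vs_1_1 style) is an assumed example rather than a shader from this article. It also assumes that c0 to c3 hold the transform matrix, v1 holds a color, and v2 holds texture coordinates.

```cpp
#include <array>
#include <cassert>
#include <vector>

struct Float4 {
    float x, y, z, w;
};

// dp4: four-component dot product, the building block of m4x4.
inline float dp4(const Float4& a, const Float4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Hypothetical subset of the vertex ALU output registers.
struct VertexOutputs {
    Float4 oPos;  // transformed position
    Float4 oD0;   // diffuse color
    Float4 oT0;   // texture coordinates for the pixel shader
};

// Models this assumed shader:
//   m4x4 r0, v0, c0   ; r0 = v0 transformed by the matrix in c0..c3
//   mov  oPos, r0
//   mov  oD0, v1
//   mov  oT0, v2
inline VertexOutputs runVertexShader(const std::array<Float4, 16>& v,
                                     const std::vector<Float4>& c) {
    assert(c.size() >= 4);        // m4x4 reads c0..c3
    std::array<Float4, 12> r{};   // temporaries (vs_1_1 provides 12)

    // m4x4 expands to four dp4 instructions, one per matrix register.
    r[0] = Float4{dp4(v[0], c[0]), dp4(v[0], c[1]),
                  dp4(v[0], c[2]), dp4(v[0], c[3])};

    VertexOutputs out{};
    out.oPos = r[0];
    out.oD0 = v[1];
    out.oT0 = v[2];
    return out;
}
```

The inputs (v, c) are only read, the temporary r0 holds the intermediate result, and only the output registers carry data out of the shader.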
That concludes the brief introduction to the vertex ALU.
Pixel shader
Just as the vertex shader has an ALU to do its processing, so does the pixel shader. The following figure shows the hardware architecture of the pixel ALU:
Its architecture is very similar to that of the vertex ALU. The registers starting with c are constant registers and are used the same way as in the vertex ALU. The registers starting with r are again temporary registers, with one difference: r0 holds the final computed color of the pixel, so it is the only output register of the pixel ALU. v0 and v1 correspond to oD0 and oD1 of the vertex ALU. Whatever data you wrote to them in the vertex shader arrives here interpolated across the primitive and is interpreted by the same rule in the pixel shader.
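The role of r0 as the single output can be sketched with a conceptual C++ model of a one-instruction pixel shader. The names here are hypothetical, and the assembly in the comment (ps_1_1 style) is an assumed example: it adds the interpolated diffuse and specular colors and clamps the result to the range 0 to 1.

```cpp
#include <algorithm>
#include <cassert>

struct Float4 {
    float x, y, z, w;
};

// Clamp a component to [0, 1], like the _sat instruction modifier.
inline float saturate(float value) {
    return std::min(1.0f, std::max(0.0f, value));
}

// Models this assumed shader:
//   add_sat r0, v0, v1   ; r0 = clamp(diffuse + specular)
// v0 and v1 carry what the vertex shader wrote to oD0 and oD1.
// The returned value is r0, the final pixel color.
inline Float4 runPixelShader(const Float4& v0, const Float4& v1) {
    Float4 r0{saturate(v0.x + v1.x), saturate(v0.y + v1.y),
              saturate(v0.z + v1.z), saturate(v0.w + v1.w)};
    assert(r0.x >= 0.0f && r0.x <= 1.0f);  // the output is always in range
    return r0;
}
```

Whatever else the shader computes in other temporaries, only the value left in r0 reaches the frame buffer in this model.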
In addition to the registers above, there are several registers starting with t on the left of the figure. Do you remember the function device->SetTexture? It is declared as follows:
HRESULT SetTexture( [in] DWORD Sampler, [in] IDirect3DBaseTexture9 *pTexture);
The DirectX SDK documentation explains the first parameter as follows:
It is the sampler number. Different textures are bound to different samplers, and in a programmable rendering pipeline we reference a texture through its sampler number.
That is, different textures can be bound to different samplers and then used in the pixel shader, which is how multi-texturing is implemented.
However, the registers starting with t have another use as well. Recall from the vertex ALU section that the vertex shader communicates with the pixel shader through oT0 to oT7. That is, oT0 to oT7 correspond to t0 to t7. (Oops, there is no t7 in the figure. The figure is only a conceptual diagram. For the actual number of registers, check the device capabilities reported in D3DCAPS9.) If we want to pass some data from the vertex shader to the pixel shader, we can store it in oT1 or another of these registers, then read it with specific instructions in the pixel shader and use it for whatever work we need.
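Both uses of the t registers can be sketched in one conceptual C++ model. Everything here is hypothetical and for illustration only: Texture is modeled as a plain function of (u, v), SamplerStages stands in for the sampler slots that SetTexture fills, and the count of four stages is an assumption matching ps_1_1. The assembly in the comments is an assumed ps_1_1 style example in which tex samples the bound texture and texcoord passes the interpolated data through, clamped to the range 0 to 1.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

struct Float4 {
    float x, y, z, w;
};

// Hypothetical texture: returns a color for coordinates (u, v).
using Texture = std::function<Float4(float, float)>;

// Hypothetical sampler slots, filled the way SetTexture(Sampler, pTexture)
// binds a texture to a sampler number. Four slots are assumed (ps_1_1).
struct SamplerStages {
    std::array<Texture, 4> stage;

    void setTexture(std::size_t sampler, Texture texture) {
        assert(sampler < stage.size());
        stage[sampler] = std::move(texture);
    }
};

// "tex tn": sample the texture bound to sampler n at the coordinates
// interpolated from oTn.
inline Float4 tex(const SamplerStages& s, std::size_t n, const Float4& coord) {
    assert(n < s.stage.size() && s.stage[n]);
    return s.stage[n](coord.x, coord.y);
}

// "texcoord tn": use the interpolated oTn data directly, clamped to [0, 1].
inline Float4 texcoord(const Float4& data) {
    auto clamp01 = [](float f) { return std::min(1.0f, std::max(0.0f, f)); };
    return Float4{clamp01(data.x), clamp01(data.y),
                  clamp01(data.z), clamp01(data.w)};
}

// Models this assumed shader:
//   tex      t0        ; color from the texture on sampler 0
//   texcoord t1        ; custom data passed through oT1
//   mul      r0, t0, t1
inline Float4 runPixelShader(const SamplerStages& s,
                             const Float4& oT0, const Float4& oT1) {
    const Float4 t0 = tex(s, 0, oT0);
    const Float4 t1 = texcoord(oT1);
    return Float4{t0.x * t1.x, t0.y * t1.y, t0.z * t1.z, t0.w * t1.w};
}
```

Binding a second texture with setTexture(1, ...) and sampling it with tex(s, 1, ...) would be the multi-texture case in this model. The exact instruction set differs between shader versions, so check the shader reference for the version you target.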
Well, that is all for today on GPU hardware architecture. Moore's law, as it is commonly quoted, says that computer hardware performance doubles roughly every 18 months, so hardware architecture is always changing. The architecture described here may not match the GPU you are using now, but the basic principles and layout are the same.
I hope this article helps you understand how shaders work.
Here I have only explained the structure of the GPU. How do you use assembly language to drive the GPU and the registers above? See the asm shader reference on the official MSDN website.
ZFXEngine development notes: GPU hardware architecture