[Original] Graphics system in the Linux environment and AMD R600 graphics card programming (11): R600 instruction set

Source: Internet
Author: User

1 The Low-Level Shading Language TGSI

OpenGL programs use GLSL to program the programmable graphics processor (the high-level shading language mentioned below always refers to GLSL). GLSL is a high-level language with C-like syntax. In the GLSL specification, GLSL code is first translated into an assembly-like low-level language and then into a hardware-specific instruction set. The OpenGL Architecture Review Board (ARB) approved two official extensions, ARB_vertex_program in June 2002 and ARB_fragment_program in September 2002, to unify support for low-level shading languages. GLSL is compiled into the low-level shading languages of these two extensions (so the two extensions can be seen as the virtual machine that GLSL runs on), and the graphics vendor's driver translates the low-level shading language into GPU instructions. The 1.0 versions of the two languages are ARBvp1.0 and ARBfp1.0, and the 2.0 versions are ARBvp2.0 and ARBfp2.0.

Figure 1

Currently, in Mesa, GLSL is first translated by the compiler into the TGSI intermediate language, and the driver for the specific graphics card then compiles the TGSI code into GPU instructions, as shown in Figure 1 (ignoring geometry shaders and tessellation shaders). Here is a detailed description of TGSI. The master's thesis "High-Level Shading Language and Its Optimized Compilation" discusses GLSL and the low-level shading languages in more detail.

2 R600 Instructions

For a program running on the CPU, all fetch instructions and arithmetic instructions are executed in the order in which they appear in the code (ignoring instruction-level parallelism); when a jump instruction is encountered, execution jumps to the target location.

The GPU is mainly used for computation, and early GPUs contained no complex control logic. Compared with CPU programs, R600 shader programs are rather primitive: dedicated instructions must state the order in which the program executes (control flow instructions, called CF instructions below). CF instructions, arithmetic instructions (called ALU instructions below), and fetch instructions (as they are called on the R600 GPU) must be stored by category: instructions of the same type are placed together, and the different types are stored in a fixed order. A group of instructions of the same type (other than CF instructions) forms a clause. Figure 2 shows how an R600 GPU shader program is laid out in memory.

Figure 2

The R600 instruction set contains control flow (CF) instructions, ALU (arithmetic) instructions, vertex fetch instructions, and texture fetch instructions. The instruction formats are called microcode formats.

Each shader program (pixel shader or vertex shader) consists of two parts: the CF instructions and the clauses. The clauses are initiated by CF instructions (loosely speaking, a clause can be thought of as being called by a CF instruction). Each R600 instruction format (called a microcode format from here on, to stay consistent with the manual) occupies 2 or 4 DWORDs: CF and ALU instructions take 2 DWORDs, while vertex fetch and texture fetch instructions take 4 DWORDs. The addresses used in the examples below count 2-DWORD (64-bit) slots, so each fetch instruction occupies two slots. The microcode formats are documented in the "R600 Family Instruction Set Architecture" manual.

Two examples are used below to illustrate the R600 instruction set. They are roughly equivalent to the following GLSL programs:

Vertex shader

attribute vec4 a_position;
attribute vec3 a_texture;
varying vec2 v_texcoord;

void main()
{
    gl_Position = a_position;
    /* only two texture coordinate components are used */
    v_texcoord = a_texture.xy;
}

Pixel shader

uniform sampler2D sampler;
varying vec2 v_texcoord;

void main()
{
    gl_FragColor = texture2D(sampler, v_texcoord);
}

  

3 Vertex Shader Example

The following concrete example is the vertex shader used by the copy operation of the R600 EXA driver (see later chapters). Following the layout in Figure 2, the program is divided into two parts. The first part consists of the CF instructions, four in total, instructions 0 to 3 (instruction 3 is an empty instruction used for alignment). The second part consists of the vertex fetch instructions, two in total, instructions 4 and 5, which fetch the vertex position coordinates and the texture coordinates respectively.

int R600_copy_vs(RADEONChipFamily ChipSet, uint32_t* shader)
{
    int i = 0;

    /* 0: instruction 0 */
    shader[i++] = CF_DWORD0(ADDR(4));
    shader[i++] = CF_DWORD1(POP_COUNT(0), CF_CONST(0),
                            COND(SQ_CF_COND_ACTIVE), I_COUNT(2), CALL_COUNT(0),
                            END_OF_PROGRAM(0), VALID_PIXEL_MODE(0), CF_INST(SQ_CF_INST_VTX),
                            WHOLE_QUAD_MODE(0), BARRIER(1));
    /* 1: instruction 1 */
    shader[i++] = CF_ALLOC_IMP_EXP_DWORD0(ARRAY_BASE(CF_POS0), TYPE(SQ_EXPORT_POS), RW_GPR(1),
                                          RW_REL(ABSOLUTE), INDEX_GPR(0), ELEM_SIZE(0));
    shader[i++] = CF_ALLOC_IMP_EXP_DWORD1_SWIZ(SRC_SEL_X(SQ_SEL_X), SRC_SEL_Y(SQ_SEL_Y), SRC_SEL_Z(SQ_SEL_Z),
                                               SRC_SEL_W(SQ_SEL_W), R6xx_ELEM_LOOP(0), BURST_COUNT(0), END_OF_PROGRAM(0),
                                               VALID_PIXEL_MODE(0), CF_INST(SQ_CF_INST_EXPORT_DONE), WHOLE_QUAD_MODE(0),
                                               BARRIER(1));
    /* 2: instruction 2 */
    shader[i++] = CF_ALLOC_IMP_EXP_DWORD0(ARRAY_BASE(0), TYPE(SQ_EXPORT_PARAM), RW_GPR(0),
                                          RW_REL(ABSOLUTE), INDEX_GPR(0), ELEM_SIZE(0));
    shader[i++] = CF_ALLOC_IMP_EXP_DWORD1_SWIZ(SRC_SEL_X(SQ_SEL_X), SRC_SEL_Y(SQ_SEL_Y),
                                               SRC_SEL_Z(SQ_SEL_Z), SRC_SEL_W(SQ_SEL_W), R6xx_ELEM_LOOP(0),
                                               BURST_COUNT(0), END_OF_PROGRAM(1), VALID_PIXEL_MODE(0),
                                               CF_INST(SQ_CF_INST_EXPORT_DONE), WHOLE_QUAD_MODE(0), BARRIER(0));
    /* 3: instruction 3 (empty, for alignment) */
    shader[i++] = 0x00000000;
    shader[i++] = 0x00000000;
    /* 4/5: instruction 4 (fetch the vertex position) */
    shader[i++] = VTX_DWORD0(VTX_INST(SQ_VTX_INST_FETCH), FETCH_TYPE(SQ_VTX_FETCH_VERTEX_DATA),
                             FETCH_WHOLE_QUAD(0), BUFFER_ID(0), SRC_GPR(0), SRC_REL(ABSOLUTE),
                             SRC_SEL_X(SQ_SEL_X), MEGA_FETCH_COUNT(16));
    shader[i++] = VTX_DWORD1_GPR(DST_GPR(1), DST_REL(0), DST_SEL_X(SQ_SEL_X), DST_SEL_Y(SQ_SEL_Y),
                                 DST_SEL_Z(SQ_SEL_0), DST_SEL_W(SQ_SEL_1), USE_CONST_FIELDS(0),
                                 DATA_FORMAT(FMT_32_32_FLOAT), NUM_FORMAT_ALL(SQ_NUM_FORMAT_SCALED),
                                 FORMAT_COMP_ALL(SQ_FORMAT_COMP_SIGNED), SRF_MODE_ALL(SRF_MODE_ZERO_CLAMP_MINUS_ONE));
    shader[i++] = VTX_DWORD2(OFFSET(0),
#if X_BYTE_ORDER == X_BIG_ENDIAN
                             ENDIAN_SWAP(SQ_ENDIAN_8IN32),
#else
                             ENDIAN_SWAP(SQ_ENDIAN_NONE),
#endif
                             CONST_BUF_NO_STRIDE(0), MEGA_FETCH(1));
    shader[i++] = VTX_DWORD_PAD;
    /* 6/7: instruction 5 (fetch the texture coordinates) */
    shader[i++] = VTX_DWORD0(VTX_INST(SQ_VTX_INST_FETCH), FETCH_TYPE(SQ_VTX_FETCH_VERTEX_DATA),
                             FETCH_WHOLE_QUAD(0), BUFFER_ID(0), SRC_GPR(0), SRC_REL(ABSOLUTE),
                             SRC_SEL_X(SQ_SEL_X), MEGA_FETCH_COUNT(8));
    shader[i++] = VTX_DWORD1_GPR(DST_GPR(0), DST_REL(0), DST_SEL_X(SQ_SEL_X), DST_SEL_Y(SQ_SEL_Y),
                                 DST_SEL_Z(SQ_SEL_0), DST_SEL_W(SQ_SEL_1), USE_CONST_FIELDS(0),
                                 DATA_FORMAT(FMT_32_32_FLOAT), NUM_FORMAT_ALL(SQ_NUM_FORMAT_SCALED),
                                 FORMAT_COMP_ALL(SQ_FORMAT_COMP_SIGNED),
                                 SRF_MODE_ALL(SRF_MODE_ZERO_CLAMP_MINUS_ONE));
    shader[i++] = VTX_DWORD2(OFFSET(8),
#if X_BYTE_ORDER == X_BIG_ENDIAN
                             ENDIAN_SWAP(SQ_ENDIAN_8IN32),
#else
                             ENDIAN_SWAP(SQ_ENDIAN_NONE),
#endif
                             CONST_BUF_NO_STRIDE(0), MEGA_FETCH(0));
    shader[i++] = VTX_DWORD_PAD;

    return i;
}

The figures below show in detail how the program above runs.

Figure 3

Figure 4

Figure 5

Figure 3 shows the initial state. There are two threads in the figure, and each thread is processing the data of one vertex.

    • CF instruction 0: the ADDR field of instruction 0 tells the program to start running at address 4 (instruction 4), and the I_COUNT field indicates two instructions in total (instructions 4 and 5). After they have executed, control returns to instruction 0. The END_OF_PROGRAM bit of instruction 0 indicates that the program has not finished, so execution proceeds to CF instruction 1.
    • Vertex fetch clause: CF instruction 0 directs execution to begin at instruction 4. Instructions 4 and 5 form a vertex fetch clause, and the two instructions together fetch the vertex data. Because this is a copy operation, the vertex data consists of the vertex position and the texture coordinates, and because it is a 2D operation, only two components of each coordinate are valid. The VTX_INST field of instruction 4 marks it as a data fetch instruction. It fetches vertex data (FETCH_TYPE) from buffer 0 (BUFFER_ID(0)), SRC_GPR names the source register that holds the index, and 16 bytes are fetched at a time (the size of a four-component vector). The fetched data is placed in register 1 (DST_GPR(1)): DST_SEL_X(SQ_SEL_X) places the X component of the fetched vector in the first DWORD of the destination register, DST_SEL_Y(SQ_SEL_Y) places the Y component in the second DWORD, the third DWORD is set to 0, and the fourth DWORD is set to 1 (0, 1, or 0.5 can be used). Instruction 5 is similar to instruction 4. Because all vertex attributes have been fetched by then, the index originally held in GPR0 is no longer needed and can be overwritten. See Figure 4.
    • CF instructions 1 and 2: these are two export instructions, and their work is shown in Figure 5. Instruction 1 exports the vertex position (TYPE(SQ_EXPORT_POS)): it reads the data from GPR1 (RW_GPR(1)) and writes it to position buffer 0 (ARRAY_BASE(CF_POS0)). The export also applies a swizzle; the swizzle in this instruction leaves the vector components unchanged. The END_OF_PROGRAM bit of instruction 1 indicates that the program has not finished, so execution continues with instruction 2, which exports the texture coordinates from GPR0 as a parameter (TYPE(SQ_EXPORT_PARAM)). The END_OF_PROGRAM bit of instruction 2 indicates that the program ends here (any instructions written after it are not executed).

4 Pixel Shader Example

/* copy ps --------------------------------------- */
int R600_copy_ps(RADEONChipFamily ChipSet, uint32_t* shader)
{
    int i = 0;

    /* CF INST: instruction 0 */
    shader[i++] = CF_DWORD0(ADDR(2));
    shader[i++] = CF_DWORD1(POP_COUNT(0), CF_CONST(0), COND(SQ_CF_COND_ACTIVE), I_COUNT(1),
                            CALL_COUNT(0), END_OF_PROGRAM(0), VALID_PIXEL_MODE(0), CF_INST(SQ_CF_INST_TEX),
                            WHOLE_QUAD_MODE(0), BARRIER(1));
    /* CF INST: instruction 1 */
    shader[i++] = CF_ALLOC_IMP_EXP_DWORD0(ARRAY_BASE(CF_PIXEL_MRT0), TYPE(SQ_EXPORT_PIXEL), RW_GPR(0),
                                          RW_REL(ABSOLUTE), INDEX_GPR(0), ELEM_SIZE(1));
    shader[i++] = CF_ALLOC_IMP_EXP_DWORD1_SWIZ(SRC_SEL_X(SQ_SEL_X), SRC_SEL_Y(SQ_SEL_Y),
                                               SRC_SEL_Z(SQ_SEL_Z), SRC_SEL_W(SQ_SEL_W), R6xx_ELEM_LOOP(0), BURST_COUNT(1),
                                               END_OF_PROGRAM(1), VALID_PIXEL_MODE(0), CF_INST(SQ_CF_INST_EXPORT_DONE),
                                               WHOLE_QUAD_MODE(0), BARRIER(1));
    /* TEX INST: instruction 2 */
    shader[i++] = TEX_DWORD0(TEX_INST(SQ_TEX_INST_SAMPLE), BC_FRAC_MODE(0), FETCH_WHOLE_QUAD(0),
                             RESOURCE_ID(0), SRC_GPR(0), SRC_REL(ABSOLUTE), R7xx_ALT_CONST(0));
    shader[i++] = TEX_DWORD1(DST_GPR(0), DST_REL(ABSOLUTE),
                             DST_SEL_X(SQ_SEL_X), /* R */
                             DST_SEL_Y(SQ_SEL_Y), /* G */
                             DST_SEL_Z(SQ_SEL_Z), /* B */
                             DST_SEL_W(SQ_SEL_W), /* A */
                             LOD_BIAS(0),
                             COORD_TYPE_X(TEX_UNNORMALIZED), COORD_TYPE_Y(TEX_UNNORMALIZED),
                             COORD_TYPE_Z(TEX_UNNORMALIZED), COORD_TYPE_W(TEX_UNNORMALIZED));
    shader[i++] = TEX_DWORD2(OFFSET_X(0), OFFSET_Y(0), OFFSET_Z(0), SAMPLER_ID(0), SRC_SEL_X(SQ_SEL_X),
                             SRC_SEL_Y(SQ_SEL_Y), SRC_SEL_Z(SQ_SEL_0), SRC_SEL_W(SQ_SEL_1));
    shader[i++] = TEX_DWORD_PAD;

    return i;
}

There are three instructions in total: instructions 0 and 1 are CF instructions, and instruction 2 is a texture fetch instruction.

Instruction 0 indicates that the program starts at instruction 2 (ADDR(2)). Instruction 2 is a texture fetch instruction: using the texture coordinates in GPR0 (SRC_GPR(0); according to the semantic configuration described earlier, the interpolated texture coordinates are stored in GPR0), it fetches the texture value from the texture resource with ID 0 and writes it to GPR0 (DST_GPR(0)).

After the texture fetch completes, instruction 1 is executed. Instruction 1 is an export instruction that writes the fetched texture value directly to render target 0 (ARRAY_BASE(CF_PIXEL_MRT0)).

  

The R600 graphics card has far more instructions than those shown here. After understanding the material above, readers should have little difficulty reading the R600 instruction set manual, and those who are interested can study the remaining instructions in depth.
