R700 Instruction Set Architecture Reference Manual, Chapter 2: Program Organization and State


An R700 program consists of control-flow (CF), ALU (arithmetic logic unit), texture-fetch, and vertex-fetch instructions. ALU instructions can have up to three source operands and one destination operand. The instructions operate on 32-bit or 64-bit IEEE floating-point values and on signed or unsigned integers. Executing some instructions causes predicate bits to be written, which affects subsequent instructions. Graphics programs typically use vertex-fetch and texture-fetch instructions to load data, while general-purpose computing applications typically use texture-fetch instructions to load data.

 

2.1 Program Types

 

 

The following program types commonly run on the R700 (see Figure 1.2):

1. Vertex shader (VS) -- Reads vertices and processes them. Depending on whether a geometry shader (GS) is active, it writes its results either to the VS ring buffer or to the parameter cache and position buffer. It does not introduce new primitives. A vertex shader can call a fetch subroutine (FS), a special global program for fetching vertex data that is treated, for execution purposes, as part of the vertex program. The FS gives the driver independence between the process of fetching the data a VS requires and the VS itself.

2. Geometry shader (GS) -- Reads primitives from the VS ring buffer and, for each input primitive, writes one or more primitives as output to the GS ring buffer. This program type is optional; when active, it requires a DMA copy (DC) program to be active as well. The GS reads up to six vertices from the memory buffer created by the VS and outputs a variable number of primitives to a second memory buffer.

3. DMA copy (DC) -- Transfers data from the GS ring buffer to the parameter cache and position buffer. It is required on any system that runs a geometry shader.

4. Pixel shader (PS) or fragment shader -- This program type:

-- Receives pixel data from the rasterizer to be shaded,

-- Processes sets of pixel quads (four pixel-data elements arranged in a 2x2 array), and

-- Writes output data to up to eight local-memory buffers, called multiple render targets (MRTs), which can include one or more frame buffers.

5. Compute shader (CS) -- A generic program that uses its thread ID as an index to perform:

-- Gather reads from one or more sets of input data,

-- Arithmetic computation, and

-- Scatter writes of one or more sets of output data to memory.
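The compute-shader pattern above (gather, compute, scatter, all indexed by thread ID) can be pictured with a small host-side model. This is a minimal sketch in plain Python, not R700 code: the function names, the index tables, and the arithmetic are all assumptions made for illustration.

```python
# Toy model of a compute-shader style kernel: one invocation per thread ID.
# Every name here is hypothetical; real kernels are R700 microcode.

def run_kernel(thread_id, src_a, src_b, gather_index, scatter_index, dst):
    # Gather read: the thread ID selects where to read in each input set.
    a = src_a[gather_index[thread_id]]
    b = src_b[gather_index[thread_id]]
    # Arithmetic computation on the gathered values (arbitrary example).
    result = a * b + 1.0
    # Scatter write: the thread ID selects where the result is stored.
    dst[scatter_index[thread_id]] = result


def dispatch(num_threads, src_a, src_b, gather_index, scatter_index):
    dst = [0.0] * num_threads
    # The hardware would run these invocations in parallel; a loop is enough here.
    for thread_id in range(num_threads):
        run_kernel(thread_id, src_a, src_b, gather_index, scatter_index, dst)
    return dst


out = dispatch(
    4,
    src_a=[1.0, 2.0, 3.0, 4.0],
    src_b=[10.0, 20.0, 30.0, 40.0],
    gather_index=[3, 2, 1, 0],
    scatter_index=[1, 0, 3, 2],
)
print(out)
```

The sequential loop stands in for the parallel threads only because each invocation touches a distinct output location.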

 

All program types accept the same instruction types, and every program type can run on any available DPP array pipeline that supports it. However, each kernel type has certain restrictions, which are described with that type.

 

2.1.1 Data Flows

 

The host can initialize the R700 to run in one of two configurations: with or without a geometry shader program and a DMA copy program. See Figure 1.2. Each data flow is described in the following sections.
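As a quick summary of the two configurations, the order in which program types run can be written as a small function. This is only a sketch: the function name and the string labels are assumptions, and the fetch subroutine (FS) is left out because it runs as part of the VS.

```python
def program_sequence(geometry_shader_active):
    """Return the order in which program types run (illustrative only)."""
    if geometry_shader_active:
        # An active GS requires a DC program, which moves GS output from the
        # GS ring buffer into the parameter cache and position buffer.
        return ["VS", "GS", "DC", "PS"]
    # Without a GS, the VS exports positions and parameters itself.
    return ["VS", "PS"]


print(program_sequence(False))
print(program_sequence(True))
```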

 

2.1.2 Geometry Program Absent

 

This configuration consists of the following steps:

1. The VS program sends a pointer to a buffer in local memory that contains up to 64 vertex indices.

2. The R700 hardware groups the vectors for these vertices in its input buffers (remote memory). (Note: compare OpenGL calls such as glDrawArrays, where vertices are grouped according to the draw mode, for example GL_LINES or GL_TRIANGLE_STRIP.)

3. When all vertices are ready to be processed, the R700 allocates general-purpose registers (GPRs) and thread space for each of the 64 vertices, based on sizes provided by the compiler.

4. The VS program calls the FS program, which fetches vertex data into GPRs and returns control to the VS program.

5. The transform, lighting, and other parts of the VS program run.

6. The VS program allocates space in the position buffer and exports the vertex positions (xyzw).

7. The VS program allocates parameter-cache and position-buffer space and exports the parameters and positions of each vertex.

8. The VS program exits, and the R700 releases its GPR space.

9. When the VS program has completed, the pixel shader (PS) program starts.

10. The R700 hardware assembles primitives from the data in the position buffer and from the vertex geometry translator (VGT), performs scan conversion and final pixel interpolation, and loads these values into GPRs.

11. The PS program then runs for each pixel.

12. The program exports data to a frame buffer, and the R700 releases the GPR space.

 

2.1.3 Geometry Program Present

 

Table 2.2 shows the order in which programs run when a geometry program is present.

This configuration consists of the following steps:

1. The R700 hardware loads the input indices and the primitive and vertex IDs from the vertex geometry translator (VGT) into GPRs.

2. The VS program fetches the vertex or vertices it needs.

3. The transform, lighting, and other parts of the VS program run.

4. The VS program ends by writing its vertices out to the VS ring buffer.

5. The GS program reads multiple vertices from the VS ring buffer, performs its geometry functions, and outputs one or more vertices per input vertex to the GS ring buffer. The VS program can write only one vertex per input; the GS program can write a large number of vertices per input. Each time a GS program outputs a vertex, it signals to the VGT that a new vertex has been output (using the emit_* instructions). The VGT counts the total number of vertices created by each GS program. A GS program splits primitive strips by issuing cut_vertex instructions.

6. The GS program ends when all vertices have been output. No positions or parameters are exported.

7. The DC program reads vertex data from the GS ring buffer and uses one of the mem_* memory export instructions to transfer the data to the parameter cache and position buffer.

8. The DC program exits, and the R700 releases the GPR space.

9. The PS program runs.

10. The R700 assembles primitives from the data in the position buffer, the parameter cache, and the VGT.

11. The hardware performs scan conversion and final pixel interpolation and loads these values into GPRs.

12. The PS program runs.

13. When the PS program reaches the end of the data, it uses export instructions to write the data to a frame buffer or other render targets (up to eight).

14. The program exits by executing an export_done instruction, and the processor releases the GPR space.
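Step 5 above is the part that differs most from the VS-only flow: one input can produce many output vertices, each one is reported to the VGT, and a cut ends the current strip. The following toy model illustrates that bookkeeping only. The class, its methods, and the four-vertex expansion are hypothetical stand-ins, not the hardware interface.

```python
class VertexCounter:
    """Hypothetical stand-in for the VGT's bookkeeping of GS output."""

    def __init__(self):
        self.total_vertices = 0
        self.strips = [[]]  # primitive strips, each a list of vertices

    def emit(self, vertex):
        # Models an emit_* instruction: one more vertex has been output.
        self.total_vertices += 1
        self.strips[-1].append(vertex)

    def cut(self):
        # Models cut_vertex: close the current strip and start a new one.
        if self.strips[-1]:
            self.strips.append([])


def geometry_program(input_vertex, vgt):
    # A GS may write many vertices per input; here, an arbitrary 4-vertex strip.
    x, y = input_vertex
    for dx, dy in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        vgt.emit((x + dx, y + dy))
    vgt.cut()


vgt = VertexCounter()
for vertex in [(0, 0), (10, 10)]:
    geometry_program(vertex, vgt)

finished_strips = [strip for strip in vgt.strips if strip]
print(vgt.total_vertices, len(finished_strips))
```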

 

2.2 Instructions

 

Table 2.3 summarizes some of the instruction-related terms used in this document. The instructions themselves are described in the remaining chapters. Details of each instruction are given in Chapter 9. Register types are described in "Register".

 

1. Microcode format -- 32 bits. One of several encoding formats used for all instructions. They are described in sections 3.1, 4.1, 5.1, and 6.1.

2. Instruction -- 64 or 128 bits. Two to four microcode formats that specify one of the following:

(1) Control-flow (CF) instructions (64 bits). These include general control-flow instructions (such as branches and loops), instructions that allocate buffer space and export data, and instructions that initiate ALU, texture-fetch, or vertex-fetch clauses.

(2) ALU instructions (64 bits)

(3) Texture-fetch instructions (128 bits)

(4) Vertex-fetch instructions (128 bits)

(5) Data share instructions (128 bits)

(6) Memory read instructions (128 bits)

Instructions are identified in the microcode formats by the _inst_ string in their field names and mnemonics. The function of each instruction is described in Chapter 9.

3. ALU instruction group -- 64 to 448 bits. A variable-sized group of instructions and constants consisting of:

(1) One to five 64-bit ALU instructions

(2) Zero to two 64-bit literal constants

ALU instruction groups are described in section 4.3.

4. Literal constant -- 64 bits. A literal constant specifies two 32-bit values, which can represent the values associated with two elements of a 128-bit vector. These constants can optionally be included in ALU instruction groups.

Literal constants are described in section 4.3.

5. Slot -- 64 bits. An ordered position within an ALU instruction group. Each ALU instruction group has one to seven slots, corresponding to the number of ALU instructions and literal constants in the group.

Slots are described in section 4.3.

6. Clause -- 64 to 64x128 bits (64 128-bit words). A set of instructions of the same type. The types of clauses include:

(1) ALU clauses (which contain ALU instruction groups)

(2) Texture-fetch clauses

(3) Vertex-fetch clauses

Clauses are initiated by control-flow (CF) instructions and are described in section 2.3.

7. Export -- To do any of the following:

(1) Write data from GPRs to an output buffer (a scratch buffer, frame buffer, ring buffer, stream buffer, or reduction buffer).

(2) Write an address for data inputs to the memory controller.

(3) Read data from an input buffer (a scratch buffer or ring buffer) into GPRs.

8. Fetch -- To load data using a vertex-fetch or texture-fetch instruction clause. Loads do not necessarily go to general-purpose registers (GPRs); specific types of loads may be restricted to specific types of storage destinations.

9. Vertex -- A set of (x, y) 2D coordinates.

10. Quad -- Four (x, y) data elements in a 2-by-2 array.

11. Primitive -- A point, line segment, or polygon before rasterization. Its vertices are specified by geometric coordinates. Additional data can be associated with the vertices by linear interpolation across the primitive.

12. Fragment -- For graphics programming:

(1) The result of rasterizing a primitive. A fragment has no vertices; instead, it is represented by (x, y) coordinates.

For general-purpose programming:

(1) A set of (x, y) data elements.

13. Pixel -- For graphics programming:

(1) The result of placing a fragment in an (x, y) frame buffer.

For general-purpose programming:

(1) A set of (x, y) data elements.
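The size ranges quoted for ALU instruction groups, slots, and clauses follow from simple arithmetic on the 64-bit slot size. The sketch below works through it; the function and constant names are made up for illustration, and only the limits stated in the definitions above are used.

```python
SLOT_BITS = 64  # each ALU instruction or literal constant fills one 64-bit slot


def alu_group_layout(num_alu_instructions, num_literal_constants):
    """Return (slots, bits) for one ALU instruction group."""
    if not 1 <= num_alu_instructions <= 5:
        raise ValueError("an ALU instruction group holds 1 to 5 ALU instructions")
    if not 0 <= num_literal_constants <= 2:
        raise ValueError("an ALU instruction group holds 0 to 2 literal constants")
    slots = num_alu_instructions + num_literal_constants
    return slots, slots * SLOT_BITS


smallest = alu_group_layout(1, 0)  # one ALU instruction, no literals
largest = alu_group_layout(5, 2)   # five ALU instructions, two literals
max_clause_bits = 64 * 128         # upper clause limit: 64 words of 128 bits

print(smallest, largest, max_clause_bits)
```

The smallest group is one slot (64 bits) and the largest is seven slots (448 bits), which matches the 64-to-448-bit range and the one-to-seven slot count given in the definitions.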

 

2.3 Control Flow and Clauses

 

Each program consists of two parts:

1. Control flow -- Control-flow instructions can:

-- Initiate the execution of ALU, texture-fetch, or vertex-fetch instructions.

-- Export data to a buffer.

-- Control branching, looping, and stack operations.

2. Clause -- A homogeneous group of instructions. Each clause consists exclusively of ALU, texture-fetch, vertex-fetch, local data share, or memory read instructions. A control-flow instruction that initiates an ALU, texture-fetch, or vertex-fetch clause does so by referring to the appropriate clause.

 

Table 2.4 shows the flow of a typical program. The two right-hand columns give the microcode formats used: control-flow (CF) code and clause code.

Function                                    CF code                         Clause code

Start loop                                  cf_dword[0, 1]

Initiate texture-fetch clause               cf_dword[0, 1]

Texture-fetch or vertex-fetch clause                                        tex_dword[0, 1, 2]
that loads data from memory into GPRs

Initiate ALU clause                         cf_alu_dword[0, 1]

ALU clause that computes on loaded data                                     alu_dword[0, 1]
and literal constants. This example                                         alu_dword[0, 1]
shows a single clause made up of a                                          alu_dword[0, 1]
single ALU instruction group: five ALU                                      alu_dword[0, 1]
instructions (two dwords each) and two                                      alu_dword[0, 1] (last bit set to 1)
quadwords of literal constants. (Note:                                      literal[x, y]
a word here is 16 bits.)                                                    literal[z, w]

End loop                                    cf_dword[0, 1]

Allocate space in an output buffer          cf_alloc_export_dword0
                                            cf_alloc_export_dword1_buffer

Export (write) results from GPRs to the     cf_alloc_export_dword0
output buffer                               cf_alloc_export_dword1_buffer
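The flow in Table 2.4 can also be read as a short list of control-flow instructions, some of which point at a clause. The sketch below models that structure and walks it. The class names, the op labels, and the loop handling are assumptions made for illustration; they are not real opcodes or the hardware's loop mechanism.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clause:
    kind: str                                  # "TEX" or "ALU" in this sketch
    microcode: List[str] = field(default_factory=list)


@dataclass
class CFInstruction:
    op: str                                    # hypothetical label, not a real opcode
    clause: Optional[Clause] = None            # set when the instruction starts a clause


tex_clause = Clause("TEX", ["tex_dword[0, 1, 2]"])
alu_clause = Clause("ALU", ["alu_dword[0, 1]"] * 5 + ["literal[x, y]", "literal[z, w]"])

cf_program = [
    CFInstruction("loop_start"),
    CFInstruction("start_tex_clause", tex_clause),
    CFInstruction("start_alu_clause", alu_clause),
    CFInstruction("loop_end"),
    CFInstruction("alloc"),
    CFInstruction("export"),
]


def trace(program, iterations):
    """Walk the CF program; the loop body runs `iterations` times (at least once)."""
    events = []
    loop_stack = []  # entries: [index of loop_start, iterations left]
    pc = 0
    while pc < len(program):
        cf = program[pc]
        if cf.op == "loop_start":
            loop_stack.append([pc, iterations])
        elif cf.op == "loop_end":
            loop_stack[-1][1] -= 1
            if loop_stack[-1][1] > 0:
                pc = loop_stack[-1][0] + 1  # jump back to the first body instruction
                continue
            loop_stack.pop()
        elif cf.clause is not None:
            events.append(cf.clause.kind)   # the referenced clause executes here
        else:
            events.append(cf.op)
        pc += 1
    return events


print(trace(cf_program, 2))
```

Note that the clauses hold no control flow of their own; all looping lives in the CF list, which mirrors the split the section describes.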

 

 

Control-flow instructions:

1. Constitute the main program. Branch statements, loops, and subroutine calls are expressed directly in the control-flow part of the program.

2. Include mechanisms for synchronizing operations.

3. Indicate when a clause has completed.

4. Are required for allocating space in, and writing to, a program block's output buffer.

 

Some program types (VS, GS, DC, PS) have control-flow instructions for synchronizing with other blocks.

Each clause is invoked by a control-flow instruction and is a sequential list of instructions of limited length. Clauses contain no control-flow statements, but ALU clause instructions can apply a predicate on a per-instruction basis. Instructions within a single clause execute serially. Multiple clauses of a program can execute in parallel if they contain different types of instructions and the clauses are independent of one another (such parallel execution is invisible to the programmer, apart from the performance gain).

ALU clauses contain instructions that perform operations in each of the ALUs (ALU.[x, y, z, w] and ALU.trans), including setting and using predicates, and pixel kill (see section 4.8.1). Texture-fetch clauses contain instructions that perform texture reads and constant-fetch reads from memory. Vertex-fetch clauses are used to obtain vertex data from memory. Systems without a vertex cache can perform vertex-fetch operations in a texture clause instead.

 

A predicate is a bit that is set to 1 or 0 as the result of evaluating some condition. It is then used either to mask the writing of an ALU result or as a condition in its own right. There are two kinds of predicates, both of which are set in an ALU clause:

1. The first is a single predicate local to the ALU clause itself. Once computed, the predicate can be referenced by a subsequent instruction to conditionally write an ALU result to the indicated general-purpose register.

2. The second is a bit in a predicate stack. An ALU clause computes the predicate bit in the stack and manipulates the stack. A predicate bit in the stack can be referenced by a control-flow instruction to cause a conditional branch.
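The two uses of a predicate can be illustrated with a small model: a local bit that masks a register write inside the clause, and a stacked bit that control flow tests afterwards. This is a sketch under assumed names; the register names, the comparison, and the stack being a Python list are all illustrative choices, not the hardware design.

```python
def predicated_write(gprs, dest, value, predicate):
    # Local predicate: the ALU result is written only when the bit is 1.
    if predicate:
        gprs[dest] = value


def run_alu_clause(gprs, predicate_stack):
    # Evaluate a condition and record the outcome as a predicate bit.
    predicate = 1 if gprs["r0"] > 0.0 else 0
    # Use the bit inside the clause to mask a register write.
    predicated_write(gprs, "r1", gprs["r0"] * 2.0, predicate)
    # Also place the bit on the predicate stack for control flow to test.
    predicate_stack.append(predicate)


def control_flow(gprs):
    predicate_stack = []
    run_alu_clause(gprs, predicate_stack)
    # Conditional branch driven by the stacked predicate bit.
    return "then-path" if predicate_stack.pop() else "else-path"


positive = {"r0": 3.0, "r1": 0.0}
negative = {"r0": -1.0, "r1": 5.0}
print(control_flow(positive), positive["r1"])
print(control_flow(negative), negative["r1"])
```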

 

 

2.4 Instruction Types and Groups

 

The R700-family devices recognize the following instruction types:

1. Control-flow instructions

2. Clause instructions: ALU, texture-fetch, vertex-fetch, local data share, and memory read

 

Each instruction type has an independent instruction cache in the processor.

 

A CF program has no maximum size; each clause, however, has a maximum size. When a program is laid out in memory, its instructions must be ordered as follows:

1. All CF instructions

2. All ALU clauses

3. All texture-fetch and vertex-fetch clauses

4. All local data share clauses

5. All memory read clauses
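The required ordering can be expressed as a small layout helper that assigns each section an offset in the order listed above. This is a minimal sketch: the section labels, the byte sizes, and the function name are assumptions, and any alignment or padding the hardware may require is not modeled.

```python
# Order in which the sections of a program must appear in memory.
SECTION_ORDER = ["CF", "ALU", "FETCH", "LDS", "MEM_READ"]


def layout(section_sizes):
    """Map each non-empty section to its starting offset (in bytes)."""
    offsets = {}
    cursor = 0
    for kind in SECTION_ORDER:
        size = section_sizes.get(kind, 0)
        if size:
            offsets[kind] = cursor
            cursor += size
    return offsets


# Hypothetical sizes; the input order does not matter, the output order does.
offsets = layout({"ALU": 64, "CF": 32, "FETCH": 48})
print(offsets)
```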

 

The CPU host configures the base address of each program type before executing a program.

 

2.5 Program Status

 

2.6 Data Sharing

 
