R700 Instruction Set Architecture Reference Manual, Chapter 2, Section 2.5: Program State

Source: Internet
Author: User

Tables 2.5 to 2.8 summarize, from the programmer's point of view, the R700 program state that is accessible to a single thread of an R700 program.

 

The tables do not include:

1. State that is maintained exclusively by R700 hardware, such as internal loop-control registers,

2. State that is accessible only to host software, such as configuration registers, or

3. The duplication of state for many execution threads.

 

The column headers in Tables 2.5 to 2.8 have the following meanings:

1. Access by R700 software-- Whether software executing on the R700 processor can read (R), write (W), or both read and write (R/W) the state.

2. Access by host software-- Whether software executing on the host processor can read (R), write (W), or both read and write (R/W) the state. The tables do not include state objects, such as R700 configuration registers, that are accessible only to host software.

3. Number per thread-- The maximum number of such state objects available to each thread. In some cases, the maximum number is shared by all executing threads.

4. Width-- The bit width of the state object.

 

Table 2.5 Control-flow state

 

State-- Access by R700 software; access by host software; number per thread; width (bits); description

 

Integer constant register (I)-- R; W; 1; 96 (3*32-bit); the loop-variable constant specified in the CF_DWORD1 microcode format of the current LOOP* instruction.

 

 

Loop index (aL)-- R; no; 1; 13; a register that is initialized by LOOP* instructions and incremented by hardware on each iteration of a loop, based on values provided in the CF_CONST field of the LOOP* instruction's CF_DWORD1 microcode format.
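As a rough illustration of the loop-index entry, the following Python sketch models a 13-bit register that is set when a loop starts and stepped once per iteration. The `init`, `step`, and `iterations` parameters are hypothetical stand-ins for whatever the CF_CONST field supplies, since the field layout is not given here, and wrapping at 13 bits is an assumption about overflow rather than documented behavior.

```python
AL_WIDTH_BITS = 13                     # width of the loop index, from Table 2.5
AL_MASK = (1 << AL_WIDTH_BITS) - 1     # largest value that fits in 13 bits


def loop_index_values(init, step, iterations):
    """Return the loop-index value seen in each iteration of a loop.

    init, step and iterations are hypothetical stand-ins for the values a
    LOOP* instruction obtains through its CF_CONST field. Wrapping to
    13 bits is an assumption, not documented behavior.
    """
    values = []
    al = init & AL_MASK                # initialized by the LOOP* instruction
    for _ in range(iterations):
        values.append(al)
        al = (al + step) & AL_MASK     # stepped by hardware on each iteration
    return values
```

For example, `loop_index_values(0, 1, 4)` walks the index through four consecutive values starting at zero.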

 

 

Stack-- No; no; chip-specific; chip-specific; the hardware maintains a single, multi-entry stack for saving and restoring the state of nested loops, pixels (valid mask and active mask), predicates, and other execution details. The total number of stack entries is divided among all executing threads.

 

 

 

Table 2.6 ALU state

General-purpose registers (GPRs)-- R/W; no; 127 - (2 * clause-temporary GPRs); 128 (4*32-bit); each thread can access up to 127 GPRs, minus two times the number of clause-temporary GPRs. Four GPRs are reserved as clause-temporary GPRs that persist only for one ALU clause (and thus cannot be ...). GPRs can hold data in one of several formats: the ALU can work with 32-bit IEEE floating-point values (S23E8 format with special values), 32-bit unsigned integers, and 32-bit signed integers.
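The GPR budget is simple arithmetic: 127 minus twice the number of clause-temporary GPRs. A minimal Python sketch of that formula follows; the function name and the range check are illustrative choices, not part of the manual.

```python
MAX_GPRS_PER_THREAD = 127     # upper bound per thread, from the GPR entry
MAX_CLAUSE_TEMP_GPRS = 4      # number of clause-temporary GPRs listed in the table


def available_gprs(clause_temp_gprs):
    """GPRs left to a thread: 127 - 2 * (number of clause-temporary GPRs)."""
    if not 0 <= clause_temp_gprs <= MAX_CLAUSE_TEMP_GPRS:
        raise ValueError("clause_temp_gprs must be between 0 and 4")
    return MAX_GPRS_PER_THREAD - 2 * clause_temp_gprs
```

With no clause temporaries a thread keeps all 127 GPRs; with all four in use it keeps 119.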

 

 

Clause-temporary GPRs-- No; yes; 4; 128 (4*32-bit); GPRs that hold clause-temporary variables. The number of clause-temporary GPRs used by each thread reduces the total number of GPRs available to the thread.

 

SIMD-global GPRs-- R/W; no; defined by the driver; 128 (4*32-bit); a set of GPRs that persists across all threads during the execution of a kernel. They can be used to pass data between threads.

 

Address register (AR)-- W; no; 1; 36 (4*9-bit); a register containing a four-element vector of indices, written by MOVA instructions and read by hardware. The indices are used for relative addressing of the constant file (called constant waterfalling). This state persists only for one ALU clause. When the register is used for relative addressing, a specific vector element must be selected.
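To make the 4*9-bit layout and the element-selection rule concrete, here is a small Python sketch that packs four indices into a 36-bit value and uses one selected element as an offset into the constant file. The bit ordering (X in the low bits), the unsigned treatment of each index, and the bounds check against 512 constant registers are all assumptions made for illustration; the helper names are hypothetical.

```python
ELEM_BITS = 9                          # each AR element is 9 bits wide
ELEM_MASK = (1 << ELEM_BITS) - 1
ELEMS = "xyzw"                         # four-element vector
NUM_CONSTANT_REGISTERS = 512           # size of the constant file in the CR entry


def pack_ar(x, y, z, w):
    """Pack four 9-bit indices into one 36-bit value (X in the low bits, assumed)."""
    packed = 0
    for slot, value in enumerate((x, y, z, w)):
        if not 0 <= value <= ELEM_MASK:
            raise ValueError("each index must fit in 9 bits")
        packed |= value << (slot * ELEM_BITS)
    return packed


def select_index(packed_ar, elem):
    """Extract the index held in one vector element ('x', 'y', 'z' or 'w')."""
    return (packed_ar >> (ELEMS.index(elem) * ELEM_BITS)) & ELEM_MASK


def relative_constant_address(base, packed_ar, elem):
    """Constant-file address formed from a base plus the selected AR element."""
    address = base + select_index(packed_ar, elem)
    if not 0 <= address < NUM_CONSTANT_REGISTERS:
        raise IndexError("address falls outside the constant file")
    return address
```

For instance, after `ar = pack_ar(1, 2, 3, 511)`, calling `relative_constant_address(10, ar, "y")` adds the Y element to the base address.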

 

Constant registers (CRs)-- R; W; 512; 128 (4*32-bit); registers that contain constants. Each register is organized as four 32-bit elements of a vector. Software can use either the CRs or the constant cache, but not both at the same time. DirectX calls these the floating-point constant (F) registers.

 

Previous vector (PV)-- R; no; 1; 128 (4*32-bit); a register that contains the results of the previous ALU.[X,Y,Z,W] operations. This state persists only for one ALU clause.

 

Previous scalar (PS)-- R; no; 1; 32; a register that contains the result of the previous ALU.Trans operation. This state persists only for one ALU clause.

 

Local data share (LDS)-- R/W (read: 16 KB; write: 16 to 256 bytes; each thread can both read and write); no; on-chip shared memory for each SIMD. Data can be shared between the elements of a SIMD using an owner-write, shared-read model. Applications should query the ATI runtime function to obtain the actual size.
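The owner-write, shared-read model can be sketched in Python as a memory in which each owner writes only inside its own slot while any reader may read any address. The class name, the even partitioning into fixed-size slots, and the byte-level interface are assumptions made for illustration, not a description of how the hardware divides the LDS.

```python
class OwnerWriteSharedRead:
    """Toy model: each owner writes only its own slot, anyone reads anywhere."""

    def __init__(self, num_owners, slot_bytes):
        self.num_owners = num_owners
        self.slot_bytes = slot_bytes
        self.memory = bytearray(num_owners * slot_bytes)

    def write(self, owner, offset, data):
        """Write data into the slot that belongs to the given owner."""
        if not 0 <= owner < self.num_owners:
            raise IndexError("unknown owner")
        if offset < 0 or offset + len(data) > self.slot_bytes:
            raise IndexError("write would leave the owner's slot")
        start = owner * self.slot_bytes + offset
        self.memory[start:start + len(data)] = data

    def read(self, address, length):
        """Read from any address, regardless of which owner wrote it."""
        if address < 0 or address + length > len(self.memory):
            raise IndexError("read outside the shared memory")
        return bytes(self.memory[address:address + length])
```

A writer with index 1 and 16-byte slots places its data at addresses 16 to 31, and every other element can read those addresses back.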

 

Predicate register-- R/W; no; 1; 1; a register containing predicate bits. The bits are set to 1 or 0 by ALU instructions as the result of evaluating some condition. They are used later either to mask the writing of an operation's result or as a condition in their own right. An ALU clause computes the predicate bit in this register, and a control-flow instruction can reference that bit to perform a conditional branch. This state persists only for one ALU clause.
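The two uses of a predicate bit, computing it from a condition and then using it to suppress a write, can be sketched in a few lines of Python. Both function names are hypothetical and do not correspond to R700 mnemonics; the greater-than comparison is just one example condition.

```python
def set_predicate_gt(a, b):
    """Hypothetical compare: predicate bit is 1 when a > b, otherwise 0."""
    return 1 if a > b else 0


def predicated_write(current, new_value, predicate):
    """Return the destination contents after a write masked by the predicate."""
    return new_value if predicate else current
```

When the comparison holds, `predicated_write` returns the new value; otherwise the destination keeps its old contents.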

 

Pixel state-- No; no; 1; 192 (printed as 64*2-bit, although 64*2 is 128); state bits that reflect each pixel's active status as conditional instructions are executed. The state can be Active, Inactive Branch, or Inactive Termination.

 

Valid mask-- No; no; 1; 64; a mask that indicates which pixels have been killed by a pixel-kill operation. The mask is updated when a CF_INST_KILL instruction is executed.

 

Active mask-- W (indirect); no; 1; one bit per pixel; a mask that indicates which pixels are currently executing and which are not (1 = execute, 0 = skip). It can be updated with the PRED_SET* ALU instructions, but the update does not take effect until the end of the ALU clause. The CF_ALU instruction can update the mask with the result of the last PRED_SET* instruction in the clause.
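The deferred update of the active mask can be modeled with a small Python class that stages the result of each PRED_SET*-style operation and applies only the last one when the clause ends. The class and method names are hypothetical, and the rule that a pixel must already be active to stay active is an assumption of this sketch.

```python
class ActiveMaskModel:
    """Toy model: mask updates are staged and take effect at clause end."""

    def __init__(self, active_mask):
        self.active_mask = active_mask   # one bit per pixel: 1 = execute, 0 = skip
        self._pending = None             # result of the last staged update, if any

    def pred_set(self, per_pixel_bits):
        """Stage a new mask; the visible mask is unchanged until the clause ends.

        Restricting the result to already-active pixels is an assumption.
        """
        self._pending = per_pixel_bits & self.active_mask

    def end_clause(self):
        """Apply the last staged update, as happens when the ALU clause ends."""
        if self._pending is not None:
            self.active_mask = self._pending
            self._pending = None
        return self.active_mask
```

Calling `pred_set` twice inside one clause leaves the mask untouched until `end_clause`, at which point only the second result is applied.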

 

Table 2.7 Vertex-fetch state

 

Vertex-fetch constants-- R; W; 128; 84; these describe the buffer format and similar properties.

 

Table 2.8 Texture-fetch and constant-fetch state

 

Texture samplers-- No; W; 18; 96; 18 samplers (16 for DirectX, plus 2 spares) are available for each of the VS, GS, and PS program types. A texture-sampler constant specifies how a texture is to be accessed. It contains information such as filtering and clamping modes.

 

Texture resources-- No; W; 160; 160; 160 resources are available for each of the VS, GS, and PS program types, and 16 are given to the FS program type.

 

Border color-- No; W; 1; 128 (4*32-bit); this state is stored in the texture pipeline, but is referenced in texture-fetch instructions.

 

Bicubic weights-- No; W; 2; 176; these define the weights, one horizontal and one vertical, for bicubic interpolation. This state is stored in the texture pipeline, but is referenced in texture-fetch instructions.

 

Kernel size for ClearType filtering-- These define the kernel sizes, one horizontal and one vertical, used for filtering with Microsoft's ClearType™ subpixel rendering and display technology. This state is stored in the texture pipeline, but is referenced in texture-fetch instructions.

 
