R700 Instruction Set Architecture Reference Manual, Chapter 1: Introduction

Source: Internet
Author: User
Tags: pixel shader

The R700-family processors implement a parallel microarchitecture that provides an excellent platform not only for computer graphics applications but also for general-purpose streaming applications. Any data-intensive application that can be mapped to a 2D matrix is a candidate for running on an R700-family processor.
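The sketch below shows one simple way to map data onto a 2D matrix. It lays a flat array of n elements out row by row on a 2D domain of a chosen width and pads the last row. The helper names (`domain_shape`, `to_2d`, `to_flat`) and the row-major layout are assumptions for illustration. The R700 documentation does not prescribe them.

```python
from __future__ import annotations


def domain_shape(n: int, width: int) -> tuple[int, int]:
    """Return (width, height) of the smallest 2D domain of this width that holds n elements."""
    if n < 0 or width <= 0:
        raise ValueError("n must be >= 0 and width must be > 0")
    height = (n + width - 1) // width  # ceiling division; the last row may be only partly used
    return width, height


def to_2d(i: int, width: int) -> tuple[int, int]:
    """Map flat index i to an (x, y) position, assuming row-major order."""
    return i % width, i // width


def to_flat(x: int, y: int, width: int) -> int:
    """Inverse of to_2d: map (x, y) back to the flat index."""
    return y * width + x


# 10 elements on a domain 4 wide need 3 rows (the last row has 2 padding slots).
print(domain_shape(10, 4))  # (4, 3)
print(to_2d(9, 4))          # (1, 2)
```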

Figure 1.1 shows a block diagram of the R700-family processors.

It includes a data-parallel processor (DPP) array, a command processor, a memory controller, and other logic (not shown). The R700 command processor reads commands that the host has written to memory-mapped R700 registers in the system-memory address space. When a command completes, the command processor sends a hardware-generated interrupt to the host. The R700 memory controller has direct access to R700 local memory and to the areas of system memory specified by the host. To satisfy read and write requests, the memory controller performs the functions of a direct memory access (DMA) controller, including computing memory-address offsets based on the format of the requested data in memory.
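The sketch below computes such an offset for a linear, row-major 2D surface. The byte address of element (x, y) is base + (y * pitch + x) * bytes_per_element. The format names and sizes in `BYTES_PER_ELEMENT` and the untiled layout are assumptions for illustration. The text does not say what surface layouts the R700 actually uses, and they may be tiled.

```python
from __future__ import annotations

# Hypothetical format table: bytes per element for a few common formats.
# The names and the choice of formats are illustrative assumptions.
BYTES_PER_ELEMENT = {
    "R8": 1,
    "R32_FLOAT": 4,
    "RGBA8": 4,
    "RGBA32_FLOAT": 16,
}


def element_offset(base: int, x: int, y: int, pitch_elems: int, fmt: str) -> int:
    """Byte address of element (x, y) in a linear (untiled) 2D surface.

    pitch_elems is the row length in elements; it may exceed the visible width.
    """
    bpe = BYTES_PER_ELEMENT[fmt]
    if not (0 <= x < pitch_elems) or y < 0:
        raise ValueError("coordinates outside the surface")
    return base + (y * pitch_elems + x) * bpe


# Element (3, 2) of an RGBA32_FLOAT surface with a 64-element pitch, starting at 0x1000:
# 0x1000 + (2 * 64 + 3) * 16 = 4096 + 2096 = 6192
print(element_offset(0x1000, 3, 2, 64, "RGBA32_FLOAT"))  # 6192
```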

 

A host application cannot write directly to R700 local memory, but it can command the R700 to copy programs and data between system memory and R700 memory. The CPU can write to GPU memory in two ways:

1. Request the GPU's DMA engine to write the data. The request points the engine to the location of the source data in CPU memory and then to the destination offset in GPU memory.

2. Load a kernel and run it on the shaders. The shaders access the source data through the PCIe link, process it, and store the results in GPU memory.
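A toy model can contrast the two methods, with both memories represented as byte arrays. `dma_copy` and `run_copy_kernel` are hypothetical names. The model ignores PCIe transfers, real addresses, and timing. It only shows that the DMA path moves bytes unchanged, while the kernel path can also process them on the way.

```python
from __future__ import annotations

from typing import Callable

# Toy model: both memories are plain byte arrays.
system_memory = bytearray(64)  # CPU-visible memory
gpu_memory = bytearray(64)     # R700 local memory (not directly writable by the host)


def dma_copy(src: bytearray, src_off: int, dst: bytearray, dst_off: int, size: int) -> None:
    """Method 1: copy `size` bytes unchanged from a source location to a destination offset."""
    dst[dst_off:dst_off + size] = src[src_off:src_off + size]


def run_copy_kernel(src: bytearray, src_off: int, dst: bytearray, dst_off: int,
                    size: int, op: Callable[[int], int]) -> None:
    """Method 2: a 'kernel' reads each byte, applies op to it, and stores the result."""
    for i in range(size):
        dst[dst_off + i] = op(src[src_off + i]) & 0xFF  # keep the result within one byte


system_memory[0:4] = bytes([1, 2, 3, 4])
dma_copy(system_memory, 0, gpu_memory, 16, 4)                          # plain copy
run_copy_kernel(system_memory, 0, gpu_memory, 32, 4, lambda b: b * 2)  # copy plus processing
print(list(gpu_memory[16:20]), list(gpu_memory[32:36]))  # [1, 2, 3, 4] [2, 4, 6, 8]
```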

 

A complete R700 application consists of two parts:

1. a program running on the host processor, and

2. programs, called kernels, running on the R700 processor.

R700 programs are controlled by host commands, which:

1. set the R700 internal base-address and other configuration registers,

2. specify the data domain on which the R700 operates,

3. invalidate and flush the caches on the R700, and

4. start execution of an R700 program.
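The four kinds of host command can be pictured as a command list that the host builds before a launch. This is a hypothetical model. The `Op` names, the `BASE_ADDR` register name, and the `CommandBuffer` class are illustrative and do not correspond to the real R700 command-packet format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto


class Op(Enum):
    SET_REGISTER = auto()            # step 1: base address and other configuration registers
    SET_DOMAIN = auto()              # step 2: data domain the R700 operates on
    CACHE_INVALIDATE_FLUSH = auto()  # step 3: invalidate and flush caches
    RUN_PROGRAM = auto()             # step 4: start executing a program


@dataclass
class CommandBuffer:
    """Hypothetical host-side command list; not a real driver API."""
    commands: list[tuple[Op, tuple]] = field(default_factory=list)

    def emit(self, op: Op, *args: object) -> CommandBuffer:
        self.commands.append((op, args))
        return self  # return self so calls can be chained


def build_dispatch(base_addr: int, width: int, height: int, kernel_addr: int) -> CommandBuffer:
    """One kernel launch, issuing the four kinds of host command in the order listed above."""
    return (CommandBuffer()
            .emit(Op.SET_REGISTER, "BASE_ADDR", base_addr)  # register name is made up
            .emit(Op.SET_DOMAIN, width, height)
            .emit(Op.CACHE_INVALIDATE_FLUSH)
            .emit(Op.RUN_PROGRAM, kernel_addr))


cb = build_dispatch(0x10000, 256, 256, 0x2000)
print([op.name for op, _ in cb.commands])
# ['SET_REGISTER', 'SET_DOMAIN', 'CACHE_INVALIDATE_FLUSH', 'RUN_PROGRAM']
```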

 

The R700 driver program runs on the host.

The DPP array is the heart of the R700 processor. The array is organized as a set of SIMD pipelines that operate in parallel on streams of floating-point or integer data, each independent of the others. A SIMD pipeline can process data or, through the memory controller, transfer data to and from memory. Computation in a SIMD pipeline can be made conditional, and so can the outputs it writes to memory.
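The two kinds of conditionality can be illustrated with a per-lane mask (conditional computation) and a predicated store (conditional output). This is a sketch under assumptions. The doubling operation, the sign test, and the `threshold` parameter are made up. The independent pipelines simply run one after another here rather than concurrently.

```python
from __future__ import annotations


def simd_step(data: list[float], out: list[float], threshold: float) -> None:
    """Toy model of one SIMD pipeline processing a stream of floats.

    Conditional computation: lanes holding a negative value are masked off and
    keep their input value. Conditional output: only results above `threshold`
    are written to `out`; the other slots of `out` are left untouched.
    """
    compute_mask = [v >= 0.0 for v in data]  # which lanes execute the math
    results = [v * 2.0 if m else v for v, m in zip(data, compute_mask)]
    for i, r in enumerate(results):
        if r > threshold:                    # predicated store
            out[i] = r


# Several independent pipelines, each with its own stream (run sequentially here).
streams = [[1.0, -2.0, 3.0], [0.5, 4.0, -1.0]]
outputs = [[0.0] * 3 for _ in streams]
for s, o in zip(streams, outputs):
    simd_step(s, o, threshold=1.5)
print(outputs)  # [[2.0, 0.0, 6.0], [0.0, 8.0, 0.0]]
```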

A host command requests a SIMD pipeline to execute a kernel by passing it:

1. an identifier pair (x, y),

2. a conditional value, and

3. the location of the kernel code in memory.
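A host request can be modelled as a record that carries the three values listed above. This sketch assumes the conditional value simply enables or skips the request, which the text does not state. `KernelRequest`, `KERNEL_MEMORY`, and `execute` are hypothetical names, and a Python function stands in for kernel code stored at an address.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Kernel = Callable[[int, int], int]

# Hypothetical "memory" that maps code addresses to kernels.
KERNEL_MEMORY: dict[int, Kernel] = {
    0x2000: lambda x, y: x + 10 * y,
}


@dataclass(frozen=True)
class KernelRequest:
    xy: tuple[int, int]  # identifier pair (x, y)
    condition: bool      # conditional value passed with the request
    code_addr: int       # location of the kernel code in memory


def execute(req: KernelRequest) -> int | None:
    """Toy pipeline entry point: look up the kernel and run it for (x, y) if enabled."""
    if not req.condition:
        return None      # assumed behaviour: a false condition skips the request
    kernel = KERNEL_MEMORY[req.code_addr]
    x, y = req.xy
    return kernel(x, y)


reqs = [KernelRequest((x, y), condition=(x != 1), code_addr=0x2000)
        for y in range(2) for x in range(3)]
print([execute(r) for r in reqs])  # [0, None, 2, 10, None, 12]
```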

 

When a SIMD pipeline receives a request, it loads instructions and data from memory, begins execution, and continues until the end of the kernel. While a kernel is running, the R700 hardware automatically fetches instructions and data from memory into on-chip caches; R700 software plays no role in this. R700 software can also load data from off-chip memory into on-chip general-purpose registers (GPRs) and caches.
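To make the division of labour concrete, the toy model below places a cache that fills itself on a miss (the automatic path) next to an explicit software load into registers. The direct-mapped organization, the 4-line by 4-word geometry, and the `ToyCache` class are illustrative assumptions, not R700 cache parameters.

```python
from __future__ import annotations


class ToyCache:
    """Direct-mapped cache that fills itself on a miss, with no help from the program.

    Line count and line size are arbitrary illustrative values, not R700 parameters.
    """

    def __init__(self, memory: list[int], lines: int = 4, line_size: int = 4) -> None:
        self.memory = memory
        self.lines = lines
        self.line_size = line_size
        self.tags: list[int | None] = [None] * lines
        self.data: list[list[int]] = [[] for _ in range(lines)]
        self.misses = 0

    def read(self, addr: int) -> int:
        block = addr // self.line_size
        slot = block % self.lines
        if self.tags[slot] != block:  # miss: the whole line is fetched automatically
            self.misses += 1
            start = block * self.line_size
            self.data[slot] = self.memory[start:start + self.line_size]
            self.tags[slot] = block
        return self.data[slot][addr % self.line_size]


memory = list(range(100, 164))  # 64 words of "off-chip" memory
cache = ToyCache(memory)

# Reads issued while a kernel runs go through the cache transparently.
total = sum(cache.read(a) for a in range(8))  # 8 sequential reads -> 2 line fills

# Software can also load values explicitly into "GPRs" (modelled as a list).
gprs = [memory[a] for a in (40, 41)]
print(total, cache.misses, gprs)  # 828 2 [140, 141]
```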

 

Conceptually, each SIMD pipeline maintains a separate interface to memory. The interface consists of index pairs and a field that identifies the type of request (program instruction, floating-point constant, integer constant, Boolean constant, input read, or output write). The requesting R700 instructions specify the index pairs for inputs, outputs, and constants from the hardware-maintained program state in the pipelines.
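The request format described above, an index pair plus a field naming the request type, can be modelled as a small record. `Domain`, `MemRequest`, and `route` are hypothetical names. The read/write split in `route` is an assumption drawn from the listed request types, not a documented hardware rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    """The request types listed in the text."""
    INSTRUCTION = "program instruction"
    FLOAT_CONST = "floating-point constant"
    INT_CONST = "integer constant"
    BOOL_CONST = "Boolean constant"
    INPUT_READ = "input read"
    OUTPUT_WRITE = "output write"


@dataclass(frozen=True)
class MemRequest:
    domain: Domain
    index: tuple[int, int]  # index pair taken from the pipeline's program state


def route(req: MemRequest) -> str:
    """Hypothetical routing: output writes go one way, every other type is a read."""
    direction = "W" if req.domain is Domain.OUTPUT_WRITE else "R"
    return f"{direction} {req.domain.name}{req.index}"


print(route(MemRequest(Domain.FLOAT_CONST, (0, 3))))   # R FLOAT_CONST(0, 3)
print(route(MemRequest(Domain.OUTPUT_WRITE, (5, 7))))  # W OUTPUT_WRITE(5, 7)
```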

R700 programs do not support exceptions, interrupts, errors, or any other events that can interrupt their pipeline operation. In particular, they do not support IEEE floating-point exceptions. The interrupts from the command processor to the host shown in Figure 1.1 are hardware-generated interrupts that signal command completion and related management functions.
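The toy model below shows the only interrupt path described here. The command processor runs host commands in order and signals the host as each one completes, modelled as a host-registered callback. `ToyCommandProcessor` is a hypothetical name. The work items are assumed never to raise, matching the statement that R700 programs have no exception mechanism.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class ToyCommandProcessor:
    """Runs queued host commands in order and 'interrupts' the host after each one.

    The interrupt is modelled as a callback registered by the host. Work items are
    assumed not to raise, since the text says programs have no exception mechanism.
    """

    def __init__(self, on_interrupt: Callable[[int], None]) -> None:
        self.queue: deque[tuple[int, Callable[[], None]]] = deque()
        self.on_interrupt = on_interrupt

    def submit(self, cmd_id: int, work: Callable[[], None]) -> None:
        self.queue.append((cmd_id, work))

    def run(self) -> None:
        while self.queue:
            cmd_id, work = self.queue.popleft()
            work()                     # run the command to completion
            self.on_interrupt(cmd_id)  # completion "interrupt" to the host


completed: list[int] = []
cp = ToyCommandProcessor(on_interrupt=completed.append)
results: list[int] = []
cp.submit(1, lambda: results.append(1 + 1))
cp.submit(2, lambda: results.append(2 * 3))
cp.run()
print(completed, results)  # [1, 2] [2, 6]
```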

 

Figure 1.2 shows a programmer's view of the dataflow for three versions of an R700 application:

- The top version (a) is a graphics application that includes a geometry shader program and a DMA copy program.
- The middle version (b) is a graphics application without the geometry shader and DMA copy programs.
- The bottom version (c) is a general-purpose application.

The blocks represent programs that run on the DPP array. The circles and clouds represent non-programmable hardware functions. In graphics applications, each block in the chain processes a particular kind of data and passes its result to the next block. In general-purpose applications, a single processing block performs all the computation.
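The difference between a chain of stages and a single processing block can be sketched as function composition. The stage bodies are placeholder arithmetic. The stage order VS, GS, DC, PS for version (a) is an assumption about how the chain in Figure 1.2 is arranged. `run_chain` and the stage functions are hypothetical names.

```python
from functools import reduce
from typing import Callable, List, Sequence, Tuple

Vertex = Tuple[int, int]


def run_chain(stages: Sequence[Callable[[list], list]], data: list) -> list:
    """Feed data through the stages in order; each stage gets the previous stage's output."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


# Placeholder stages with made-up arithmetic; real shader programs do far more.
def vertex_shader(verts: List[Vertex]) -> List[Vertex]:
    return [(2 * x, 2 * y) for x, y in verts]      # "transform" each vertex


def geometry_shader(verts: List[Vertex]) -> List[Vertex]:
    return verts + [(x + 1, y) for x, y in verts]  # emit extra vertices


def dma_copy(verts: List[Vertex]) -> List[Vertex]:
    return list(verts)                             # copy the data unchanged


def pixel_shader(verts: List[Vertex]) -> List[int]:
    return [x + y for x, y in verts]               # reduce each vertex to one "pixel" value


def compute_shader(verts: List[Vertex]) -> List[int]:
    return [2 * x + 2 * y for x, y in verts]       # a single program does all the work


verts = [(1, 1), (2, 0)]
print(run_chain([vertex_shader, geometry_shader, dma_copy, pixel_shader], verts))  # (a) [4, 4, 5, 5]
print(run_chain([vertex_shader, pixel_shader], verts))                             # (b) [4, 4]
print(run_chain([compute_shader], verts))                                          # (c) [4, 4]
```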

 

Figure 1.2 abbreviations:

CS - compute shader program

DC - DMA copy program

GS - geometry shader program

PAC - parameter cache

POC - position cache

PS - pixel shader program

RB - ring buffer

VS - vertex shader program

The dataflow sequence starts by reading 2D vertices, 2D textures, or other 2D data from local R700 memory or system memory. It ends by writing 2D pixels or other 2D results to local R700 memory. The R700 processor hides memory latency in two ways. It keeps track of potentially hundreds of threads in different stages of execution, and it overlaps compute operations with memory-access operations.
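A toy timing model shows why many threads hide memory latency. Each thread repeatedly waits `latency` cycles for a fetch, then needs `compute` cycles on a single shared ALU. Fetches from different threads may be in flight at the same time. The functions and the model are simplifications made up for illustration, not an R700 performance model.

```python
def serial_cycles(threads: int, latency: int, compute: int, rounds: int) -> int:
    """No overlap: every fetch is waited out before any other work happens."""
    return threads * rounds * (latency + compute)


def overlapped_cycles(threads: int, latency: int, compute: int, rounds: int) -> int:
    """Toy event model: any number of fetches may be in flight; one ALU computes at a time.

    Each thread repeats `rounds` times: its data arrives `latency` cycles after the
    fetch is issued, then it runs `compute` cycles on the shared ALU.
    """
    ready_at = [latency] * threads  # every thread issues its first fetch at cycle 0
    remaining = [rounds] * threads
    alu_free = 0                    # cycle at which the ALU next becomes idle
    while any(remaining):
        # Run the unfinished thread whose data arrives earliest (ties go to the lowest id).
        t = min((i for i in range(threads) if remaining[i]), key=lambda i: ready_at[i])
        start = max(alu_free, ready_at[t])
        alu_free = start + compute
        remaining[t] -= 1
        ready_at[t] = alu_free + latency  # next fetch is issued as soon as this compute ends
    return alu_free


print(overlapped_cycles(1, 100, 10, 4))                                      # 440
print(serial_cycles(11, 100, 10, 4), overlapped_cycles(11, 100, 10, 4))      # 4840 540
```

With one thread the overlapped model gives the same 440 cycles as the serial one. With 11 threads the ALU stays busy from cycle 100 onward, so the total drops from 4840 to 540 cycles.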

 
