How the GPU works

Source: Internet
Author: User

As recently as 1990, ubiquitous interactive 3D graphics were still the stuff of science fiction. Ten years later, almost every new computer contained a graphics processing unit (GPU). Today, the raw computing power of a GPU exceeds that of the most powerful CPU, and the gap is steadily widening. GPUs can now use graphics hardware directly to implement many parallel algorithms, and algorithms that make good use of this underlying computing power often achieve large speedups.

The task of any 3D graphics system is to synthesize an image from the description of a scene, at 60 images per second for real-time graphics such as games. The scene description contains the geometric primitives to be viewed, the lights illuminating the scene, the way each object reflects light, and the position and orientation of the viewer.

Graphics Pipeline Input

Most real-time graphics systems assume that everything is made of triangles, so they first divide any more complex shape (such as a quadrilateral or a curved surface) into triangles. The developer uses a graphics library (such as OpenGL or Direct3D) to pass each triangle to the graphics pipeline one vertex at a time, and the GPU assembles the vertices into triangles as needed.
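As a rough illustration of the two steps above, the sketch below splits a quadrilateral into two triangles and then regroups a flat vertex stream into triangles, three vertices at a time. The function names are hypothetical and do not correspond to any OpenGL or Direct3D call; real libraries also offer more compact layouts (strips, indexed vertices) that are not shown here.

```python
def quad_to_triangles(v0, v1, v2, v3):
    """Split a convex quad (vertices given in winding order) into two triangles."""
    return [(v0, v1, v2), (v0, v2, v3)]


def assemble_triangles(vertex_stream):
    """Group a flat stream of vertices into triangles, three vertices at a time."""
    if len(vertex_stream) % 3 != 0:
        raise ValueError("vertex count must be a multiple of 3")
    return [tuple(vertex_stream[i:i + 3])
            for i in range(0, len(vertex_stream), 3)]


# A unit square in the z = 0 plane, listed counter-clockwise.
quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
triangles = quad_to_triangles(*quad)

# The application submits one vertex at a time; the receiver rebuilds triangles.
stream = [vertex for triangle in triangles for vertex in triangle]
assembled = assemble_triangles(stream)
```

Here `assembled` contains the same two triangles that `quad_to_triangles` produced, which is the sense in which the GPU "assembles the vertices into triangles".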

Model Transformation

A GPU lets each object in a scene be specified in its own locally defined coordinate system, which is convenient for objects that are defined hierarchically. This convenience comes at a cost: before rendering, the GPU must first transform all objects into a common coordinate system. To ensure that triangles are not warped or distorted, the transformations are limited to simple affine operations such as rotation, translation, and scaling. The output of this stage is a stream of triangles expressed in a common coordinate system, in which the viewer is located at the origin and faces the positive direction of the Z axis.
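The affine operations named above are commonly written as 4x4 matrices so that they can be chained by multiplication. The following is a minimal sketch in plain Python, assuming row-major nested lists and points with an implicit fourth coordinate of 1 (see the homogeneous coordinates section); the helper names and the example object are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z(angle_rad):
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def transform_point(m, p):
    """Apply a 4x4 affine matrix to a 3D point (implicit w = 1)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


# Hypothetical object: rotate 90 degrees about Z, then move 5 units along +Z.
# The rightmost matrix is applied to the point first.
model = mat_mul(translation(0.0, 0.0, 5.0), rotation_z(math.pi / 2))

local_triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
common_triangle = [transform_point(model, v) for v in local_triangle]
```

Because every object's matrix maps into the same coordinate system, triangles from different objects can be lit and projected together afterwards.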

Illumination

Once each triangle is in a global coordinate system, the GPU can compute its color based on the lights in the scene. The GPU handles multiple lights by summing the contributions of each individual light source. The traditional graphics pipeline supports the Phong lighting model, which outputs the color C = Kd × Li × (N · L) + Ks × Li × (R · V)^s, where Kd is the diffuse color, Li is the color of the light, N is the surface normal, L is the vector from the vertex to the light, Ks is the specular color, R is the reflection of the light vector about the normal, V is the vector from the vertex to the camera, and s is the shininess exponent.
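The formula can be evaluated directly for a single light. The sketch below is one possible reading of it, with two assumptions that the text does not state: all direction vectors are normalized first, and negative dot products are clamped to zero so that surfaces facing away from the light receive no contribution. Colors are RGB tuples and the multiplications are per channel.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def reflect(l, n):
    """Reflect the to-light vector L about the unit normal N: R = 2(N.L)N - L."""
    d = dot(n, l)
    return tuple(2.0 * d * nc - lc for nc, lc in zip(n, l))


def phong(kd, ks, li, n, l, v, s):
    """Evaluate C = Kd*Li*(N.L) + Ks*Li*(R.V)^s for one light source."""
    n, l, v = normalize(n), normalize(l), normalize(v)
    r = reflect(l, n)
    diffuse = max(dot(n, l), 0.0)            # clamping is an assumption
    specular = max(dot(r, v), 0.0) ** s if diffuse > 0.0 else 0.0
    return tuple(kd_c * li_c * diffuse + ks_c * li_c * specular
                 for kd_c, ks_c, li_c in zip(kd, ks, li))


# Light and camera both directly above a surface whose normal points along +Z.
color = phong(kd=(0.5, 0.2, 0.1), ks=(0.3, 0.3, 0.3), li=(1.0, 1.0, 1.0),
              n=(0.0, 0.0, 1.0), l=(0.0, 0.0, 2.0), v=(0.0, 0.0, 3.0), s=16)

# The same surface lit from behind receives nothing.
dark = phong(kd=(0.5, 0.2, 0.1), ks=(0.3, 0.3, 0.3), li=(1.0, 1.0, 1.0),
             n=(0.0, 0.0, 1.0), l=(0.0, 0.0, -1.0), v=(0.0, 0.0, 1.0), s=16)
```

For several lights, the per-light results would simply be added channel by channel, matching the accumulation described above.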

Camera Simulation

The graphics pipeline then projects each colored triangle onto the image plane of the virtual camera. The output of this stage is a stream of triangles, now in screen space, ready to be converted to pixels.
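A minimal sketch of this projection, assuming the setup described earlier (viewer at the origin, looking along +Z) and a simple pinhole camera whose image plane sits at a hypothetical distance `focal_length` in front of the viewer. Real pipelines use a full projection matrix and clip geometry against the view volume, which is omitted here.

```python
def project_point(p, focal_length=1.0):
    """Perspective-project a 3D point onto the plane z = focal_length."""
    x, y, z = p
    if z <= 0.0:
        raise ValueError("point is not in front of the camera")
    return (focal_length * x / z, focal_length * y / z)


def to_pixel(image_point, width, height):
    """Map image-plane coordinates in [-1, 1] to pixel coordinates.

    Pixel (0, 0) is assumed to be the top-left corner, so Y is flipped.
    """
    x, y = image_point
    return ((x + 1.0) * 0.5 * width, (1.0 - y) * 0.5 * height)


def project_triangle(triangle, width, height, focal_length=1.0):
    return [to_pixel(project_point(v, focal_length), width, height)
            for v in triangle]


triangle = [(1.0, 1.0, 2.0), (-1.0, 1.0, 2.0), (0.0, -1.0, 4.0)]
screen_triangle = project_triangle(triangle, width=640, height=480)
```

Dividing by z is what makes distant objects appear smaller: the third vertex is twice as far away, so its offset from the image center is halved.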

Rasterization

Each visible screen-space triangle overlaps some pixels on the display; determining the set of pixels that best approximates the triangle is called rasterization. GPU designers have incorporated many rasterization algorithms over the years, all of which exploit one crucial observation: each pixel can be processed independently of every other pixel. Machines can therefore process all pixels in parallel, and some exotic machines have had a processor for each pixel. This inherent independence has led GPU designers to build increasingly parallel sets of pipelines.
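One common way to decide pixel coverage is the edge-function test, sketched below as an illustration rather than as the algorithm any particular GPU uses. A pixel is considered covered when its center lies on the inner side of all three triangle edges. The test for each pixel depends only on that pixel's coordinates, which is the independence the text refers to. Tie-breaking rules for pixels exactly on a shared edge are left out, so edge pixels are simply counted as inside.

```python
def edge(a, b, p):
    """Signed area term: positive if p is to the left of the edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def covers(triangle, x, y):
    """Return True if the center of pixel (x, y) lies inside the triangle."""
    a, b, c = triangle
    area = edge(a, b, c)
    if area == 0:
        return False                     # degenerate triangle covers nothing
    p = (x + 0.5, y + 0.5)               # sample at the pixel center
    w0, w1, w2 = edge(b, c, p), edge(c, a, p), edge(a, b, p)
    if area > 0:
        return w0 >= 0 and w1 >= 0 and w2 >= 0
    return w0 <= 0 and w1 <= 0 and w2 <= 0


def rasterize(triangle, width, height):
    """List every covered pixel; each test is independent of the others."""
    return [(x, y) for y in range(height) for x in range(width)
            if covers(triangle, x, y)]


pixels = rasterize([(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)], width=4, height=4)
```

The double loop here is sequential only because this is plain Python; since `covers` has no shared state, the same test could run for every pixel at once.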

Texture

Although the color of each pixel can be computed by lighting alone, textures are often applied to the geometry for the sake of realism. The GPU stores these textures in high-speed memory, which must be accessed when the color of each pixel is computed. In practice, when a texture appears on the screen larger or smaller than its native size, the GPU must access the texture several times per pixel and filter the samples to reduce visual artifacts. Because the access pattern to texture memory is usually very regular (neighboring pixels tend to sample neighboring texture coordinates), specialized cache designs help hide the latency of memory accesses.
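Bilinear filtering is one widely used example of sampling a texture several times per pixel. The sketch below reads the four texels nearest to a coordinate and blends them by distance. It assumes a single-channel texture stored as a list of rows and coordinates `u`, `v` in [0, 1] that map the corner texels to 0 and 1; real GPUs use a slightly different texel-center convention and add mipmapping, which is not shown.

```python
def sample_bilinear(texture, u, v):
    """Blend the four texels surrounding (u, v), with u and v in [0, 1]."""
    height, width = len(texture), len(texture[0])
    x = u * (width - 1)
    y = v * (height - 1)
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, width - 1)          # clamp at the texture border
    y1 = min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0

    # Four memory accesses for one output value.
    top = texture[y0][x0] * (1.0 - fx) + texture[y0][x1] * fx
    bottom = texture[y1][x0] * (1.0 - fx) + texture[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy


texture = [[0.0, 1.0],
           [2.0, 3.0]]
center = sample_bilinear(texture, 0.5, 0.5)
```

Neighboring pixels on screen produce nearby `(u, v)` values and therefore touch the same or adjacent texels, which is why a cache works well for this access pattern.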

Hidden Surfaces

In most scenes, some objects obscure other objects. If each pixel were simply written to display memory, the most recently submitted triangle would always appear on top. All modern GPUs therefore provide a depth buffer, a region of memory that stores the distance from each pixel to the viewer. Before writing a new pixel to display memory, the GPU compares that pixel's distance to the viewer with the value already in the depth buffer. Only if the new pixel is closer does the GPU write it and update the value in the depth buffer.
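The per-pixel distance test described above can be sketched in a few lines. The buffers here are plain nested lists, the depth buffer starts at infinity so that the first pixel always passes, and `draw_pixel` is a hypothetical helper name.

```python
def make_buffers(width, height, background=(0, 0, 0)):
    """Create a color buffer and a depth buffer initialized to 'infinitely far'."""
    color_buffer = [[background for _ in range(width)] for _ in range(height)]
    depth_buffer = [[float("inf") for _ in range(width)] for _ in range(height)]
    return color_buffer, depth_buffer


def draw_pixel(color_buffer, depth_buffer, x, y, depth, color):
    """Write the pixel only if it is closer to the viewer than the stored one."""
    if depth < depth_buffer[y][x]:
        depth_buffer[y][x] = depth
        color_buffer[y][x] = color
        return True
    return False


color_buffer, depth_buffer = make_buffers(2, 2)
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

first = draw_pixel(color_buffer, depth_buffer, 0, 0, depth=5.0, color=RED)
second = draw_pixel(color_buffer, depth_buffer, 0, 0, depth=2.0, color=BLUE)
third = draw_pixel(color_buffer, depth_buffer, 0, 0, depth=9.0, color=GREEN)
```

The green pixel is submitted last but is farther away than the blue one, so it is rejected and the blue pixel stays visible regardless of submission order.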

Homogeneous Coordinates

In the 3D world, points are usually written as (x, y, z). In computer graphics, however, it is often useful to add a fourth coordinate, w. To convert a point to this new representation, we set w = 1. To recover the original point, we apply the conversion (x, y, z, w) -> (x/w, y/w, z/w). Although this seems unnecessary at first glance, it has significant advantages. For example, we can represent a direction vector (x, y, z) as (x, y, z, 0). With this unified representation of points and vectors, many useful transformations can be carried out as simple matrix-vector multiplications.
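A small sketch of why the fourth coordinate is useful, using a translation matrix as the example transformation. A point (w = 1) is moved by the translation, while a direction vector (w = 0) passes through unchanged, and both are handled by the same matrix-vector multiplication. The helper names are hypothetical.

```python
def point(x, y, z):
    """A position: fourth coordinate w = 1."""
    return (x, y, z, 1.0)


def vector(x, y, z):
    """A direction: fourth coordinate w = 0."""
    return (x, y, z, 0.0)


def from_homogeneous(h):
    """Recover the 3D point: (x, y, z, w) -> (x/w, y/w, z/w)."""
    x, y, z, w = h
    if w == 0.0:
        raise ValueError("w = 0 denotes a direction, not a point")
    return (x / w, y / w, z / w)


def mat_vec(m, h):
    """Multiply a 4x4 matrix by a homogeneous 4-tuple."""
    return tuple(sum(m[i][k] * h[k] for k in range(4)) for i in range(4))


# Translation by (2, 3, 4) written as a single 4x4 matrix.
translate = [[1.0, 0.0, 0.0, 2.0],
             [0.0, 1.0, 0.0, 3.0],
             [0.0, 0.0, 1.0, 4.0],
             [0.0, 0.0, 0.0, 1.0]]

moved_point = mat_vec(translate, point(1.0, 1.0, 1.0))
moved_vector = mat_vec(translate, vector(1.0, 1.0, 1.0))
```

Without the fourth coordinate, translation is not a linear map on (x, y, z) and could not be expressed as a 3x3 matrix product at all.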

The Evolving Graphics Pipeline

Traditional graphics pipelines provided only 8-bit integers for each color value, allowing a range from 0 to 255. The ATI Radeon 9700 supported 24-bit floating-point values, while the NVIDIA GeForce FX supported both 16-bit and 32-bit floating-point values. Current graphics cards support 64-bit double-precision floating-point values. To meet the demand for graphics performance, GPUs have aggressively embraced parallel design, and the number of stream processors keeps growing.
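To make the limitation of 8-bit color concrete, the sketch below stores a floating-point intensity in the 0 to 255 range and reads it back. It assumes the usual convention that 0.0 maps to 0 and 1.0 maps to 255; the function names are hypothetical.

```python
def to_8bit(value):
    """Quantize an intensity to an integer in 0..255, clamping out-of-range input."""
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)


def from_8bit(stored):
    """Convert a stored 0..255 integer back to a float intensity."""
    return stored / 255.0


half = to_8bit(0.5)
round_trip_error = abs(from_8bit(half) - 0.5)

# Anything brighter than 1.0 is indistinguishable from 1.0 once stored.
overbright = to_8bit(1.7)
```

Two effects are visible: intensities are rounded to one of only 256 levels, and values above 1.0 are lost entirely. Floating-point color formats avoid both, which matters once intermediate results are reused in further computation.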

[Figure: development and evolution of the graphics pipeline]

GPGPU

The highly parallel workload of real-time computer graphics demands extremely high arithmetic throughput and streaming memory bandwidth, but because the final image is displayed only once every 16 milliseconds, considerable latency can be tolerated in individual computations. These characteristics have shaped the underlying GPU architecture: the GPU is optimized for high throughput, whereas the CPU is optimized for low latency. The raw computing power of a GPU is striking: a GeForce 8800 chip can perform 330 billion floating-point operations per second. The increasing power, programmability, and precision of GPUs have motivated a great deal of research on general-purpose computation on graphics hardware (GPGPU). GPGPU researchers and developers use the GPU as a computational coprocessor rather than as an image-synthesis device.
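The numbers in this paragraph can be related with a little arithmetic. The frame rate and the 330 billion operations per second come from the text; the 1920 x 1080 resolution is a hypothetical value chosen only to give a per-pixel figure, and the result is a peak upper bound rather than a measurement.

```python
FRAMES_PER_SECOND = 60            # real-time target stated in the text
PEAK_FLOPS = 330e9                # GeForce 8800 figure quoted in the text
WIDTH, HEIGHT = 1920, 1080        # hypothetical resolution, not from the text

# Time available to produce one image.
frame_time_ms = 1000.0 / FRAMES_PER_SECOND

# Peak floating-point operations available within one frame.
flops_per_frame = PEAK_FLOPS / FRAMES_PER_SECOND

# The same budget spread evenly over every pixel of the assumed display.
flops_per_pixel = flops_per_frame / (WIDTH * HEIGHT)

print(f"frame time:      {frame_time_ms:.2f} ms")
print(f"flops per frame: {flops_per_frame:.3g}")
print(f"flops per pixel: {flops_per_pixel:.0f}")
```

The frame time works out to roughly 16.7 ms, which is the "16 milliseconds" mentioned above. Any single computation may take a long time as long as enough of them are in flight to fill that budget, which is why throughput matters more than latency here.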
