Analysis of the advantages and disadvantages of the three main mobile GPU architectures

Source: Internet
Author: User



Guide: GPU is short for Graphics Processing Unit; as the name implies, it is a graphics processor. The concept of the GPU originated in graphics workstations, and once PCs became widespread in the 1990s the GPU entered an era of rapid development. During the 1990s, desktop GPUs made the leap from 2D to 3D, and 3D rendering replaced 2D as the mainstream for PC games.
1. Mobile GPU and desktop GPU

Compared with the desktop GPU, the mobile GPU is very much the underdog. Its disadvantage shows mainly in theoretical performance and bandwidth. A mobile GPU is limited by chip size, energy consumption and cost, so it has to sacrifice some performance and bandwidth to balance cost against battery life. A desktop GPU has a 256-bit or even 512-bit memory bus and dedicated video memory clocked at 1.2-1.5 GHz. A mobile GPU not only has to share memory bandwidth with the CPU, but also commonly uses a dual-channel 32-bit memory system with LPDDR2-800 or LPDDR2-1066. Total bandwidth is generally within 10 GB/s.
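The "within 10 GB/s" figure can be checked with simple arithmetic: theoretical peak bandwidth is the bus width in bytes multiplied by the transfer rate. The sketch below is a back-of-envelope calculation, not a measurement. The function name is hypothetical, and the desktop line assumes a 256-bit bus at an effective 6000 MT/s, a value chosen here for illustration rather than taken from the text.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


# Dual-channel 32-bit LPDDR2, as described in the text.
lpddr2_800 = peak_bandwidth_gb_s(2 * 32, 800)
lpddr2_1066 = peak_bandwidth_gb_s(2 * 32, 1066)

# Assumed desktop configuration for comparison: 256-bit bus, 6000 MT/s effective.
desktop_256bit = peak_bandwidth_gb_s(256, 6000)

print(f"LPDDR2-800,  2 x 32-bit: {lpddr2_800:.1f} GB/s")
print(f"LPDDR2-1066, 2 x 32-bit: {lpddr2_1066:.1f} GB/s")
print(f"Assumed desktop, 256-bit: {desktop_256bit:.1f} GB/s")
```

Both mobile configurations stay below 10 GB/s, and that bandwidth is shared with the CPU, while the assumed desktop configuration is more than twenty times higher.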


Among mobile processors, the highest memory bandwidth belongs to the iPad 3 and iPad 4: their 2048x1536 Retina screens place high demands on GPU bandwidth. Even so, the 17 GB/s of these two products is trivial next to the 200 GB/s of a PC graphics card. Without high bandwidth, large textures cannot be used, and without them there is no high image quality. Bandwidth is not the only factor restricting the development of mobile GPUs, but under current constraints the top concern of mobile GPU vendors is how to get the most performance and image quality out of the bandwidth that is available. Texture compression is one method; a different rendering architecture is another. At present there are three mainstream architectures in the GPU field: IMR, TBR and TBDR.

2. Rendering modes of mobile GPUs
2.1. IMR mode

IMR (Immediate Mode Rendering) works as the name says: every render command is executed as soon as it is submitted, and the next render command starts only after the current one has gone through the entire pipeline.
The advantages of this mode are:
1. The GPU architecture is more straightforward than in TBR mode.
2. When FBO operations are performed within a frame, performance does not suffer, because there are no buffered rendering commands that must be flushed first.
3. Unlike the TBR architecture, it does not need an on-chip cache to hold intermediate results.
4. It does not need to cache a triangle list as the TBR architecture does, so it has an advantage over TBR in scenes with a large number of vertex operations. For example, a complex model on the PC may have millions of triangles.
The disadvantages of this mode are:
1. IMR rendering can waste bandwidth and computation. For example, when two draws overlap and one occludes the other, IMR executes the two draw commands in submission order, so pixels of the occluded surface may go through the pixel shader only to be rejected by the depth test afterwards, which wastes computing power. Fortunately, almost all current IMR GPUs provide early Z: occlusion is usually tested inside the rasterizer, and if the fragment to be rendered is occluded it is discarded directly, without running the pixel shader.
2. Another disadvantage of IMR is that its render commands may read and write the frame buffer, depth buffer and stencil buffer at any time, which consumes a large amount of memory bandwidth. On mobile platforms, accessing external memory is among the most power-hungry and time-consuming operations.
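The early-Z behaviour described in the first point can be illustrated with a toy model. This is a minimal sketch and not how any particular GPU implements it. The function name and the representation of a draw as a (depth, pixels) pair are hypothetical, and a smaller depth value is assumed to mean "nearer to the camera".

```python
def count_shaded_fragments(draws, early_z):
    """Count pixel-shader invocations for opaque draws in submission order.

    draws   -- list of (depth, pixels); smaller depth is nearer (assumption)
    early_z -- if True, occluded fragments are rejected before shading
    """
    depth_buffer = {}
    shaded = 0
    for depth, pixels in draws:
        for pixel in pixels:
            visible = depth < depth_buffer.get(pixel, float("inf"))
            if early_z and not visible:
                continue  # rejected in the rasterizer, no shader cost
            shaded += 1  # the pixel shader runs for this fragment
            if visible:
                depth_buffer[pixel] = depth
    return shaded


# Two full-coverage layers over a 4x4 pixel area.
area = [(x, y) for x in range(4) for y in range(4)]
near_layer = (0.2, area)
far_layer = (0.8, area)

late_z = count_shaded_fragments([near_layer, far_layer], early_z=False)
early_front_to_back = count_shaded_fragments([near_layer, far_layer], early_z=True)
early_back_to_front = count_shaded_fragments([far_layer, near_layer], early_z=True)
print(late_z, early_front_to_back, early_back_to_front)
```

In this model early Z only helps when the nearer surface is submitted first. With back-to-front submission every fragment is still shaded, which is why applications targeting IMR hardware are usually advised to draw opaque geometry front to back.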
Therefore, in the desktop GPU field, TBR's trade of lower performance for bandwidth savings does not fit the requirements of the PC, and IMR dominates. In the mobile GPU field, however, TBR's low bandwidth consumption and low power draw are exactly what mobile devices need. In contrast to how it fares on the PC side, TBR almost completely dominates mobile devices.
2.2. TBR mode

In contrast to IMR's simple, brute-force approach, TBR (Tile Based Rendering) divides the image to be rendered into rectangular blocks (tiles). A tile is described here as generally a 4x4 or 8x4 block, although many shipping GPUs use larger tiles such as 16x16 or 32x32 pixels. The vertices of the model are assembled into triangles after the vertex shader runs, and these triangles are stored in a triangle cache. If a triangle needs to be drawn within a tile, its index is stored in that tile's triangle list. Once the vertex shader has run for all rendering commands of a frame and all triangles have been generated, each tile has a triangle list containing every triangle that needs to be drawn inside it. The GPU then performs rasterization and per-fragment operations tile by tile, based on each tile's triangle list.
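The binning step described above can be sketched as follows. This is a simplified illustration under stated assumptions: triangles are given as three integer screen-space (x, y) points, and a triangle is assigned to every tile its bounding box touches, which is conservative. Real hardware may use a more precise test. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def bin_triangles(triangles, tile_w, tile_h, screen_w, screen_h):
    """Build a per-tile triangle list: {(tile_x, tile_y): [triangle indices]}."""
    tile_lists = defaultdict(list)
    for index, triangle in enumerate(triangles):
        xs = [x for x, _ in triangle]
        ys = [y for _, y in triangle]
        # Clamp the bounding box to the screen.
        min_x, max_x = max(min(xs), 0), min(max(xs), screen_w - 1)
        min_y, max_y = max(min(ys), 0), min(max(ys), screen_h - 1)
        if min_x > max_x or min_y > max_y:
            continue  # entirely off screen
        for tile_y in range(min_y // tile_h, max_y // tile_h + 1):
            for tile_x in range(min_x // tile_w, max_x // tile_w + 1):
                tile_lists[(tile_x, tile_y)].append(index)
    return dict(tile_lists)


# A 16x8 screen split into 8x4 tiles gives a 2x2 grid of tiles.
triangles = [
    [(1, 1), (6, 1), (1, 3)],    # fits inside tile (0, 0)
    [(2, 2), (13, 2), (2, 6)],   # bounding box touches all four tiles
]
tile_lists = bin_triangles(triangles, tile_w=8, tile_h=4, screen_w=16, screen_h=8)
for tile in sorted(tile_lists):
    print(tile, tile_lists[tile])
```

Only indices are stored per tile. The transformed vertex data itself is kept once, in the triangle cache described in the text.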
The advantages of TBR are:
There is no need to repeatedly access the frame buffer, depth buffer and stencil buffer in external memory during rasterization and per-fragment operations. This is because the GPU can keep the frame buffer, depth buffer and stencil buffer of an entire tile in an on-chip cache, so it accesses the tile directly without going to external memory. This greatly reduces memory bandwidth consumption, which in turn reduces energy consumption.
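A rough estimate shows the size of this saving. The sketch below is back-of-envelope arithmetic under loud assumptions: 4 bytes of colour and 4 bytes of depth per pixel, an average overdraw of 3, 60 frames per second, no caches or compression on the IMR side, and a TBR that writes only the final colour of each tile to external memory. None of these numbers come from the text, and the function names are hypothetical.

```python
def imr_bytes_per_frame(width, height, color_bytes, depth_bytes, overdraw):
    """Every fragment reads depth, writes depth and writes colour externally."""
    per_fragment = depth_bytes + depth_bytes + color_bytes
    return width * height * overdraw * per_fragment


def tbr_bytes_per_frame(width, height, color_bytes):
    """Depth stays on chip; each pixel's final colour is written out once."""
    return width * height * color_bytes


WIDTH, HEIGHT = 2048, 1536  # the Retina resolution mentioned earlier
FPS = 60                    # assumed frame rate

imr_frame = imr_bytes_per_frame(WIDTH, HEIGHT, color_bytes=4, depth_bytes=4, overdraw=3)
tbr_frame = tbr_bytes_per_frame(WIDTH, HEIGHT, color_bytes=4)

print(f"IMR estimate: {imr_frame * FPS / 1e9:.2f} GB/s")
print(f"TBR estimate: {tbr_frame * FPS / 1e9:.2f} GB/s")
print(f"Ratio: {imr_frame // tbr_frame}x")
```

Texture reads and the triangle lists are left out on both sides, so this is an illustration of the frame buffer traffic alone rather than a full bandwidth model.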
The disadvantages of TBR are:
1. The results of vertex shader execution and the triangle list of every tile must be saved. If the scene has a lot of vertices, the cache cannot hold that much vertex data and that many triangle lists, so external memory has to be used, and that costs additional bandwidth. Fortunately, current mobile 3D scenes do not contain very many triangles. Even a complex model has only a little over 10,000 triangles, so a typical scene probably has hundreds of thousands of triangles. As mobile games become more sophisticated, model complexity will rise rapidly, which is one of the challenges the TBR architecture will face in the future.
2. If a frame contains two or more rendering passes, a frame buffer object is needed to hold the intermediate results, and this is a significant performance loss for TBR. As explained above, TBR needs to buffer all the primitives of a frame and starts rasterization and per-fragment operations only once all of them have been collected. So when a later draw command needs the result of an earlier render, the GPU must first execute all the draw commands it has cached and then clear the cached content before that command can run. In the extreme case where every draw reads the result of the previous draw, TBR degenerates directly into IMR.

Given these shortcomings, TBR has no advantage in the desktop GPU field, so it has dropped out of the desktop GPU market entirely. In the mobile GPU market, however, it is better suited to balancing performance, bandwidth and energy.
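The cost of render-to-texture dependencies can be illustrated by counting how often the deferred work has to be flushed. This is a toy model under stated assumptions, not a description of a real driver: each draw names one render target and the set of render targets it samples as textures, and a flush happens whenever a draw samples a target that still has deferred draws, plus once per pending target at the end of the frame. The function name and data layout are hypothetical.

```python
def count_flushes(draws):
    """draws: list of (target, reads); reads is a set of targets sampled as textures."""
    pending = set()  # render targets with binned but not yet rasterized draws
    flushes = 0
    for target, reads in draws:
        for source in reads:
            if source in pending:
                # The earlier result is needed now: rasterize and write it out.
                pending.discard(source)
                flushes += 1
        pending.add(target)
    return flushes + len(pending)  # end of frame flushes whatever is left


# Three independent draws to the screen: everything is deferred together.
simple_frame = [("screen", set()), ("screen", set()), ("screen", set())]

# Every draw reads the result of the previous one (ping-pong between targets).
dependent_frame = [
    ("a", set()),
    ("b", {"a"}),
    ("a", {"b"}),
    ("screen", {"a"}),
]

simple_flushes = count_flushes(simple_frame)
dependent_flushes = count_flushes(dependent_frame)
print(simple_flushes, dependent_flushes)
```

In the dependent frame the number of flushes equals the number of draws, which is the degenerate case described above where TBR effectively behaves like IMR.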
2.3. TBDR mode



TBDR (Tile Based Deferred Rendering) is a close relative of TBR and works on a similar principle, but it adds an HSR (Hidden Surface Removal) step before the pixel shader runs, which further reduces the number of fragments that need to be rendered and lowers the bandwidth requirement. Before the pixel shader is executed, every pixel produced by the rasterizer goes through a depth test, and occluded pixels are removed; this is the principle of HSR. In theory, after HSR culling, the upper limit on the pixels TBDR has to shade per frame is the number of pixels on the screen (ignoring alpha blending). A traditional TBR may need to render up to 6 times the number of screen pixels when running a more complex game.
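The HSR bound can be illustrated with a small model of one tile. It is a sketch under stated assumptions: all draws are opaque, a smaller depth value is nearer, and the comparison case is the worst case in which every fragment is shaded. The function names and the (depth, pixels) layout are hypothetical.

```python
def shaded_with_hsr(draws):
    """Resolve visibility for the whole tile first, then shade each covered pixel once."""
    nearest = {}
    for depth, pixels in draws:
        for pixel in pixels:
            if depth < nearest.get(pixel, float("inf")):
                nearest[pixel] = depth
    return len(nearest)


def shaded_worst_case(draws):
    """Every rasterized fragment runs the pixel shader (no culling before shading)."""
    return sum(len(pixels) for _, pixels in draws)


# One 4x4 tile covered by six opaque full-tile layers at different depths.
tile_pixels = [(x, y) for x in range(4) for y in range(4)]
layers = [(depth, tile_pixels) for depth in (0.9, 0.7, 0.5, 0.3, 0.2, 0.1)]

hsr_count = shaded_with_hsr(layers)
worst_count = shaded_worst_case(layers)
print(hsr_count, worst_count, worst_count // hsr_count)
```

With six layers the worst case shades six times as many fragments as there are pixels, matching the factor quoted in the text, while the HSR count equals the pixel count regardless of submission order.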
TBDR is PowerVR's trump card, because the bandwidth and computation saved by TBR and HSR give phones remarkable battery life. The following figure shows the block diagram of PowerVR's SGX series GPUs, which makes the complexity of the architecture apparent.



