Mali GPU OpenGL ES Application Performance Optimization: Basic Methods

Source: Internet
Author: User

1. Common Optimization Tools



2. Common Optimization Techniques

The main task in OpenGL ES optimization is to find the bottlenecks in the graphics pipeline that limit performance. Bottlenecks generally appear in the following places:

  • Application code, such as collision detection
  • Data transfer between the GPU and main memory
  • Vertex processing in the VP (vertex processor)
  • Fragment processing in the FP (fragment processor)

Bottlenecks can be located with DS-5 Streamline. To improve performance, start with the specific areas below:

2.1 Textures

High-resolution textures occupy a large amount of memory, which is a major load on the Mali GPU. They can be optimized in the following ways:
  • Do not use large textures unless necessary
  • Always enable mipmapping, although it can sometimes reduce rendering quality
  • If possible, order triangles so that triangles that overlap each other are rendered one after another
  • Compress textures to reduce memory footprint and transfer bandwidth. The Mali-400 MP GPU supports ETC texture compression (4 bits per pixel, no alpha channel), and the GPU hardware can decompress ETC textures. The disadvantage is reduced image quality
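
To put the memory savings in numbers, the sketch below estimates the size of a texture with and without ETC compression. It assumes 32-bit RGBA for the uncompressed case and the ETC1 layout of 8 bytes per 4x4 pixel block (which is the 4 bits per pixel quoted above). The type and function names are hypothetical and are not part of any Mali or OpenGL ES API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format tags used only by this sketch. */
typedef enum { FORMAT_RGBA8888, FORMAT_ETC1 } TextureFormat;

/* Bytes needed for a single mip level of the given size. */
static size_t level_bytes(unsigned width, unsigned height, TextureFormat format)
{
    if (format == FORMAT_ETC1) {
        /* ETC1 packs every 4x4 pixel block into 8 bytes (4 bits per pixel). */
        size_t blocks_x = (width + 3) / 4;
        size_t blocks_y = (height + 3) / 4;
        return blocks_x * blocks_y * 8;
    }
    return (size_t)width * height * 4;  /* 32 bits per pixel */
}

/* Total bytes for the base level, plus the full mip chain if requested. */
size_t texture_bytes(unsigned width, unsigned height,
                     TextureFormat format, int with_mipmaps)
{
    assert(width > 0 && height > 0);
    size_t total = level_bytes(width, height, format);
    while (with_mipmaps && (width > 1 || height > 1)) {
        width = width > 1 ? width / 2 : 1;
        height = height > 1 ? height / 2 : 1;
        total += level_bytes(width, height, format);
    }
    return total;
}
```

Under these assumptions a 1024x1024 texture takes 4 MiB as RGBA8888 and 512 KiB as ETC1, and the mip chain adds roughly one third on top of either figure.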

2.2 Anti-Aliasing

  • The GPU supports 4x full-scene anti-aliasing (FSAA) with negligible performance loss. To enable 4x FSAA, select an EGL configuration with EGL_SAMPLES set to 4 when creating the context and the rendering surface
  • The Mali GPU also supports 16x FSAA, which reduces performance to about 1/4 of that of 4x FSAA

2.3 Drawing Modes

In a large mesh, a vertex is shared by multiple triangles, so the number of vertices processed depends on which drawing function is called:

  • glDrawElements: each vertex is processed only once, which is more efficient.
  • glDrawArrays: each vertex is transferred and processed once for every triangle that uses it.

Storing vertex data in the order in which it is used improves vertex cache efficiency and reduces the amount of data transferred from RAM to the vertex cache.
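
As a rough illustration of the difference, the sketch below counts the vertices submitted for a regular grid of quads drawn as a triangle list. It assumes the ideal case in which every indexed vertex is processed exactly once; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t indexed;      /* unique vertices referenced through an index buffer */
    size_t non_indexed;  /* vertices submitted when each triangle carries its own three */
} VertexCounts;

/* Vertex counts for a grid of cols x rows quads, two triangles per quad. */
VertexCounts grid_vertex_counts(size_t cols, size_t rows)
{
    assert(cols > 0 && rows > 0);
    VertexCounts counts;
    counts.indexed = (cols + 1) * (rows + 1);
    counts.non_indexed = cols * rows * 2 * 3;
    return counts;
}
```

For a 10x10 grid this gives 121 vertices for the indexed path against 600 for the non-indexed path, which is the kind of gap that makes glDrawElements attractive for large meshes.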

2.4 Vertex Buffer Objects

  • Vertex data stored in a vertex array resides in client memory (that is, main memory). When glDrawArrays or glDrawElements is called, the vertex data is copied from client memory to graphics memory.

  • Vertex buffer objects let OpenGL ES 2.0 applications allocate and cache vertex data in high-performance graphics memory and then render from that memory. This avoids resending the data every time a primitive is drawn.

  • Types of vertex buffer objects:
1) Array buffer objects: identified by GL_ARRAY_BUFFER and used to store vertex data
2) Element array buffer objects: identified by GL_ELEMENT_ARRAY_BUFFER and used to store the indices of primitives

2.5 Data Precision

Wherever possible, use low-precision data and avoid floating-point and other 32-bit data types:
  • Define vertex positions using GL_SHORT
  • Define surface normals using GL_BYTE
  • Define colors using GL_UNSIGNED_BYTE
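
The saving per vertex can be sketched as follows. The two vertex layouts and the quantization helper are hypothetical illustrations; the helper assumes that positions lie in a known range and that the shader rescales them (for example with a uniform scale factor), which is one common approach rather than something the text above prescribes. The sizes in the comments hold on common ABIs.

```c
#include <assert.h>
#include <stdint.h>

/* All-float layout: 10 floats, 40 bytes per vertex. */
typedef struct {
    float position[3];
    float normal[3];
    float color[4];
} VertexFloat;

/* Low-precision layout: 16 bytes per vertex. */
typedef struct {
    int16_t position[3];  /* GL_SHORT */
    int16_t pad0;         /* keeps the following attributes 4-byte aligned */
    int8_t  normal[3];    /* GL_BYTE */
    int8_t  pad1;
    uint8_t color[4];     /* GL_UNSIGNED_BYTE */
} VertexPacked;

/* Map a coordinate in [-max_abs, max_abs] to a 16-bit integer, rounding to
   nearest and clamping values outside the range. */
int16_t quantize_position(float value, float max_abs)
{
    assert(max_abs > 0.0f);
    float scaled = value / max_abs * 32767.0f;
    if (scaled > 32767.0f) scaled = 32767.0f;
    if (scaled < -32767.0f) scaled = -32767.0f;
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}
```

The packed layout cuts the per-vertex transfer from 40 to 16 bytes at the cost of quantization error, so it is worth checking that the reduced precision is acceptable for the model in question.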

2.6 Volume of Data Processed

Reduce the amount of data processed by the Mali GPU in the following ways:
  • Draw only the primitives that are visible in the current frame: achieve this in the application through clipping or frustum culling
  • Use ETC to compress textures
  • Sort geometry by depth: order the geometry, and therefore the draw calls, from front to back.
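
A minimal sketch of the front-to-back ordering, using the standard library qsort. The DrawItem struct and its fields are hypothetical; a real application would carry whatever it needs to issue the draw call for each object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int   object_id;   /* hypothetical handle for the object to draw */
    float view_depth;  /* distance from the camera; smaller means nearer */
} DrawItem;

/* qsort comparator: nearer items sort first. */
static int compare_front_to_back(const void *a, const void *b)
{
    float da = ((const DrawItem *)a)->view_depth;
    float db = ((const DrawItem *)b)->view_depth;
    return (da > db) - (da < db);
}

/* Reorder the items so that draw calls can be issued nearest first. */
void sort_front_to_back(DrawItem *items, size_t count)
{
    assert(items != NULL);
    qsort(items, count, sizeof *items, compare_front_to_back);
}
```

Drawing opaque geometry nearest first lets the depth test reject hidden fragments early, which is the reason for sorting in this direction.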

2.7 Render Targets

The following points relate to render targets:
  • Render in cause-and-effect order:
1) Render to a texture before that texture is used
2) Render to the back buffer last
  • Draw to only one render target at a time: ensure that all draw calls for the current target have been issued before drawing to the next target
  • Do not modify textures within a frame: set up all the textures needed for the current frame before issuing the drawing calls

2.8 Processing Pipeline

The following points relate to the graphics processing pipeline:

  • Use eglSwapBuffers:
If the application displays an animation, end each frame by calling eglSwapBuffers and only then start producing the next frame. This ensures that the current frame remains stably displayed while the next frame is computed.

  • Avoid glReadPixels:
Reading back even a few pixels has a large performance impact because it stalls the processing pipeline.

  • Limit the number of vertices per glDrawElements call:
After glDrawElements is called, the polygon list is not built until the preceding operations (such as vertex shading, transformation, and lighting) have completed. To let these stages run in parallel, ensure that a single glDrawElements call contains no more than 1/5 of the vertices in the current frame. This is especially important immediately before or after a call to glDrawArrays.
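
The 1/5 rule can be turned into a simple batching calculation. The sketch below is an assumption-laden illustration: it treats the vertex count of a call as the number of indices in a triangle list and rounds the limit down to whole triangles. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Largest vertex count allowed in one call: one fifth of the frame total,
   rounded down to a whole number of triangles (a multiple of 3). */
size_t max_vertices_per_call(size_t frame_vertices)
{
    size_t limit = frame_vertices / 5;
    limit -= limit % 3;
    return limit;
}

/* Number of glDrawElements calls needed to draw mesh_vertices under that limit. */
size_t draw_calls_needed(size_t mesh_vertices, size_t frame_vertices)
{
    size_t limit = max_vertices_per_call(frame_vertices);
    assert(limit > 0);
    return (mesh_vertices + limit - 1) / limit;
}
```

For a frame of 3000 vertices the limit is 600 per call, so a 1500-vertex mesh would be split into three calls. Each extra call carries driver overhead, so the split is worth measuring rather than applying blindly.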

2.9 Shader Programs

  • Compile shaders first: complete all calls to the shading language compiler when the application starts, before it begins sending vertex or texture data to the driver
  • Use specialized shader programs: small, specialized shaders typically run faster. Split a large general-purpose shader into versions that contain only what each surface needs
  • Consider program size: use the offline shader compiler to check program size. A single GPU instruction can contain several ESSL operations
  • Loops and conditional branches: do not unroll loops manually. Instead, put the data in an array and use a for loop where possible. if statements can also be used
  • Avoid excessive varyings: use as few varyings as possible in shader programs, because memory bandwidth is required to pass varyings from the VP to memory and from memory to the FP
  • Avoid too many matrix multiplications: multiplying a 4x4 matrix by a 4x1 vector requires 16 multiplications and 12 additions, which is expensive. If a vector must be multiplied by several matrices, multiply the vector by each matrix in turn rather than multiplying the matrices together first and then multiplying the result by the vector
  • Evaluate the cost of operations: typical cost levels are shown in the table below, and the offline shader compiler gives a more precise cost for a program
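
The matrix advice above can be checked by counting operations. The sketch below is plain C rather than ESSL and uses hypothetical helper names; it multiplies row-major 4x4 matrices and tallies the scalar multiplications and additions for the two orderings.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long multiplies; long additions; } OpCount;

/* out = m * v for a row-major 4x4 matrix and a 4-component vector. */
void mat4_mul_vec4(const float m[16], const float v[4], float out[4], OpCount *ops)
{
    assert(ops != NULL);
    for (int row = 0; row < 4; ++row) {
        float sum = m[row * 4] * v[0];
        ops->multiplies += 1;
        for (int col = 1; col < 4; ++col) {
            sum += m[row * 4 + col] * v[col];
            ops->multiplies += 1;
            ops->additions += 1;
        }
        out[row] = sum;
    }
}

/* out = a * b for row-major 4x4 matrices. */
void mat4_mul_mat4(const float a[16], const float b[16], float out[16], OpCount *ops)
{
    assert(ops != NULL);
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            float sum = a[row * 4] * b[col];
            ops->multiplies += 1;
            for (int k = 1; k < 4; ++k) {
                sum += a[row * 4 + k] * b[k * 4 + col];
                ops->multiplies += 1;
                ops->additions += 1;
            }
            out[row * 4 + col] = sum;
        }
    }
}
```

Computing a * (b * v) costs 32 multiplications and 24 additions, while (a * b) * v costs 80 and 60 for the same result. When the matrices are constant for a whole draw call, a common alternative is to multiply them once on the CPU and pass the product to the shader as a single uniform.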


2.10 Shader Arithmetic

  • The vertex processor works with 32-bit floating-point values: the vertex shader represents integers as floating-point values. To avoid 32-bit values, declare the output varyings of the vertex shader program as mediump or lowp.
  • The fragment shader works with 16-bit floating-point values, composed of a sign bit, a 5-bit exponent with a bias of 15, and a 10-bit mantissa with an implied most significant 1 bit
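
The 16-bit layout described above can be decoded as in the following sketch. It assumes the encoding matches IEEE 754 half precision, including subnormals, infinities, and NaN; whether the Mali fragment processor handles those special cases the same way is not stated in the text. The function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* 2 raised to a small integer power, computed without libm. */
static float pow2_small(int exponent)
{
    assert(exponent >= -24 && exponent <= 15);
    float result = 1.0f;
    for (; exponent > 0; --exponent) result *= 2.0f;
    for (; exponent < 0; ++exponent) result *= 0.5f;
    return result;
}

/* Decode a 16-bit float: 1 sign bit, 5 exponent bits (bias 15),
   10 mantissa bits with an implied leading 1 for normal values. */
float half_to_float(uint16_t bits)
{
    int sign = (bits >> 15) & 0x1;
    int exponent = (bits >> 10) & 0x1F;
    int mantissa = bits & 0x3FF;
    float value;

    if (exponent == 0) {
        /* Zero or subnormal: no implied leading 1. */
        value = ((float)mantissa / 1024.0f) * pow2_small(-14);
    } else if (exponent == 31) {
        value = mantissa ? NAN : INFINITY;
    } else {
        value = (1.0f + (float)mantissa / 1024.0f) * pow2_small(exponent - 15);
    }
    return sign ? -value : value;
}
```

With only 10 mantissa bits, the largest finite value is 65504 and integers above 2048 can no longer all be represented exactly, which is worth keeping in mind for texture coordinates and positions computed in the fragment shader.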

2.11 Other

  • Use point sprites:
Use point sprites instead of triangles or quads to represent particles.

  • Use triangles of an appropriate size:
Avoid long, thin triangles. The FP (fragment processor, or pixel processor) always processes fragments in groups of 4 neighboring fragments. Therefore, processing a strip 1 pixel wide costs more time per pixel than processing a strip 2 pixels wide.
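
The effect of the 4-fragment groups can be estimated with a little arithmetic. The sketch below assumes the groups are 2x2 quads and that the strip is axis-aligned with its corner on a quad boundary; both are simplifying assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 2x2 fragment quads touched by a width x height pixel strip. */
size_t quads_touched(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    return ((width + 1) / 2) * ((height + 1) / 2);
}

/* Fraction of the processed fragment slots that end up as visible pixels. */
double quad_utilization(size_t width, size_t height)
{
    return (double)(width * height) / (double)(4 * quads_touched(width, height));
}
```

Under these assumptions a 1x100 strip and a 2x100 strip both touch 50 quads, so the thin strip does the same amount of work for half as many visible pixels.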

  • Optimize state changes:
Avoid state changes. Group calls that use the same state together to reduce the number of state changes.

  • Clear the entire framebuffer:
Always call glClear to clear the entire framebuffer. If possible, clear all buffers (color, depth, and stencil) when the framebuffer is cleared.

       void glClear(GLbitfield mask);
       Parameter description:
       mask: a bitwise OR (|) of buffer flag bits that selects the buffers to clear. For example, glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT) clears the color buffer and the depth buffer. The flag bits are:
       1) GL_COLOR_BUFFER_BIT:     the color buffers currently enabled for writing
       2) GL_DEPTH_BUFFER_BIT:     the depth buffer
       3) GL_ACCUM_BUFFER_BIT:     the accumulation buffer (desktop OpenGL only; not defined in OpenGL ES)
       4) GL_STENCIL_BUFFER_BIT:   the stencil buffer

  • Minimize draw calls:
When glDrawArrays or glDrawElements is called, the GPU driver collects all the current OpenGL ES state, texture, and vertex attribute data, then processes this data into commands that the GPU hardware can execute to perform the actual draw. This can take a lot of time, so issuing many draw calls can become a rendering bottleneck. If multiple objects share the same rendering parameters but use different textures, you can merge the textures into one large texture and adjust the texture coordinates accordingly.
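
Adjusting the texture coordinates after merging textures comes down to an offset and a scale per sub-image. The sketch below shows that remapping; the AtlasRegion and TexCoord types are hypothetical and stand in for however the application stores its atlas layout.

```c
#include <assert.h>

typedef struct {
    float offset_u, offset_v;  /* corner of the sub-image inside the merged texture */
    float scale_u, scale_v;    /* sub-image size as a fraction of the merged texture */
} AtlasRegion;

typedef struct { float u, v; } TexCoord;

/* Convert a coordinate in the original texture's [0, 1] range into the
   matching coordinate in the merged texture. */
TexCoord atlas_remap(TexCoord local, AtlasRegion region)
{
    assert(local.u >= 0.0f && local.u <= 1.0f);
    assert(local.v >= 0.0f && local.v <= 1.0f);
    TexCoord merged;
    merged.u = region.offset_u + local.u * region.scale_u;
    merged.v = region.offset_v + local.v * region.scale_v;
    return merged;
}
```

Note that coordinates outside [0, 1] no longer wrap within a sub-image once textures are merged, so objects that rely on repeating textures are poor candidates for this technique.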

  • Avoid glFlush and glFinish:
Unless it is unavoidable, do not call glFlush or glFinish. Use eglSwapBuffers to mark the end of a frame instead. (Note: glFlush only submits the commands to the server and does not wait for execution to complete. To wait until the server has finished executing, call glFinish, but this can severely affect performance.)

3. Finding and Eliminating Bottlenecks

Basic methods:

1) Use professional tools (e.g. DS-5 Streamline)

2) Increase or decrease the load on the suspected stage of the graphics pipeline, then observe how performance changes
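
The second method amounts to comparing frame times before and after changing the load on one stage. A minimal sketch of that comparison follows; the function name and the tolerance parameter are hypothetical, and the frame times are assumed to be averages measured by the application or a profiler.

```c
#include <assert.h>

/* Returns 1 if changing the load on one stage moved the frame time by more
   than the given fraction of the baseline, which suggests that the stage
   limits performance; returns 0 otherwise. */
int stage_limits_frame_time(double baseline_ms, double modified_ms, double tolerance)
{
    assert(baseline_ms > 0.0 && tolerance >= 0.0);
    double change = (modified_ms - baseline_ms) / baseline_ms;
    if (change < 0.0) {
        change = -change;
    }
    return change > tolerance;
}
```

For example, if halving the render target resolution drops the frame time from 20 ms to 12 ms, fragment processing is a likely bottleneck; if the frame time barely moves, the limit is probably elsewhere.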

The following scenarios can serve as a reference for finding and eliminating bottlenecks:

Bottleneck: Application code
Solution: Reduce the amount of processing that is unrelated to OpenGL ES calls, such as input processing, game logic, collision detection, and audio processing.

Bottleneck: Driver overhead
Solution: Group geometry with similar state together and eliminate unnecessary state changes.

Bottleneck: Vertex attribute transfer
Solution: Use smaller data types for the values. Also, use a more economical triangle scheme, and in general use glDrawElements rather than glDrawArrays.

Bottleneck: Vertex shader processing, or transform and lighting in OpenGL ES 1.1
Solution: Try the following options:
1) Use glDrawElements rather than glDrawArrays.
2) For OpenGL ES 1.1, reduce the number of lights.
3) Minimize the transformations of texture coordinates. You can avoid these transformations by setting the transformation matrix using the OpenGL ES 1.1 function glLoadIdentity.
4) For OpenGL ES 2.0, simplify the vertex shader program.

Bottleneck: Polygon list building
Solution: Use fewer graphics primitives. Also, avoid drawing significant amounts of the total geometry in any single call to glDrawElements.

Bottleneck: Varying data transfer
Solution: In OpenGL ES 1.1, use fewer texture coordinates. In OpenGL ES 2.0, use fewer varyings, and specify lower precision on varying variables output by the vertex shader.

Bottleneck: Fragment shader processing, or texture, color sum, and fog in OpenGL ES 1.1
Solution: Lower the resolution of the render target or reduce the size of the viewport. For OpenGL ES 1.1, use fewer texture stages. For OpenGL ES 2.0, simplify the fragment shader program.

Bottleneck: Texture bandwidth
Solution: Try the following options:
1) Use fewer texture stages.
2) Reduce the size of the textures by using a smaller data format for each pixel, a lower resolution, or texture compression.
3) Use a simpler texture filtering mode.
4) Collapse the texture coordinates so that they always read from the same position in the texture.

Bottleneck: Transfer to display framebuffer
Solution: Try the following options:
1) Use a mode with lower pixel precision.
2) Lower the resolution of the render target.

4. ESSL Limit Values

The OpenGL ES Shading Language specification defines minimum values for the sizes of various shader resources. In Mali GPU implementations, some of these values are greater than the minimums defined in the specification. Commonly used values are shown in the following table:



Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
