3D Scene Rendering Optimization

(1) Effective Performance Evaluation

For any 3D application, pursuing realistic scenes is a constant goal. The result is that our scenes grow ever more complex and detailed, which inevitably places a heavy load on the graphics hardware, to the point where the frame rate can no longer be maintained in real time. Rendering optimization is therefore essential. Before optimizing, we need to evaluate the application's performance systematically, identify the bottleneck, and then treat the cause. Performance varies greatly between 3D applications, and the bottleneck shifts with different hardware configurations. So to evaluate an application's performance, we need not only a deep understanding of how the rendering pipeline works, but also some profiling tools to make the work easier.

We know that the speed of the rendering pipeline is determined by its slowest stage. So the first step in evaluating a 3D application is to determine whether the bottleneck lies on the CPU or the GPU, because that determines what we optimize. Since today's graphics acceleration hardware is powerful, the bottleneck frequently turns out to be on the CPU. Tools such as NVIDIA NVPerfHUD can tell us: among its readouts are the CPU and GPU busy percentages. When the CPU is 100% busy while the GPU is not, the bottleneck is on the CPU side; we should optimize the CPU-side work and try to "feed" the GPU as much as possible, shifting expensive computations onto the GPU, such as hardware (GPU) skeletal skinning. When the GPU side is the bottleneck, the GPU is overloaded, most likely because there is too much fill (overdraw) or the polygon count is too high (with today's powerful GPUs, this is the rarer case).

CPU bottlenecks generally have two causes. One is heavy AI computation or inefficient code; the other is poor batching or resource management in the renderer. In the first case, a profiler such as Intel VTune can sort all of the application's functions by time consumed, from largest to smallest, making the hot spots easy to find. In the second case, NVPerfHUD again helps: check the number of draw (DrawPrimitive) calls per frame to see whether there are too many batches (rough formulas exist relating CPU speed to a sustainable batch count), and check texture memory usage to see whether too much video memory is being consumed. With these tools we can usually locate the application's bottleneck. It also pays to build an embedded profiler into the application itself to make evaluation convenient. In addition, an embedded scripting language such as Lua lets us experiment and debug at run time, further improving evaluation efficiency.
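As a rough illustration of such an embedded profiler, here is a minimal sketch (in Python for brevity; the names `profile` and `report` are my own, not from any engine) that accumulates time per labeled section and reports the totals sorted largest-first, VTune-style:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock time per named section
_totals = defaultdict(float)

@contextmanager
def profile(name):
    """Time the enclosed block and add it to the running total for `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        _totals[name] += time.perf_counter() - start

def report():
    """Return (name, seconds) pairs sorted by time spent, largest first."""
    return sorted(_totals.items(), key=lambda kv: kv[1], reverse=True)
```

In a game loop one would wrap sections like `with profile("ai"): ...` and `with profile("render"): ...`, then dump `report()` on demand to see where the frame time goes.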

(2) Static Scene Optimization

Static scenes comprise the entities that generally do not move, such as terrain, vegetation, and buildings. Optimizing these entities is the most important part of scene optimization. This section discusses the common problems of static scene optimization.

1. Batch Optimization

The batch is one of the most important concepts in scene optimization. A batch is one rendering call (a DP, i.e. DrawPrimitive, call), and the batch size is the number of polygons submitted by that call. Every call consumes a certain amount of CPU time, and for the graphics card the polygon count of a typical batch is nowhere near the maximum it could draw in one call. The basic principle of batch optimization is therefore to put more polygons into each batch, reducing the number of batches and ultimately the CPU time. In practice things are rarely that tidy: various conditions break up what could have been one batch and incur extra overhead, such as texture changes or differing matrix states. The methods below help avoid these problems as far as possible so that batch sizes can be maximized.

(1) Merge multiple small textures into a large texture

In a typical scene there may be a dozen or more different vegetation types on the ground whose render states are identical except for their textures. We can pack their textures into one large texture (a texture atlas) and remap each vegetation model's UVs into its sub-rectangle, so that all of them can be drawn with a single rendering call; the batch count drops from over ten to one. This method suits objects with modest texture-resolution requirements and low polygon counts.
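The UV remapping involved can be sketched as follows, assuming a square atlas laid out as a row-major grid of equally sized tiles (the helper `atlas_uv` is illustrative, not a real API):

```python
def atlas_uv(u, v, tile_index, tiles_per_row):
    """Remap a per-model UV in [0, 1] into the sub-rectangle of tile
    `tile_index` in a square atlas with `tiles_per_row` tiles per side."""
    scale = 1.0 / tiles_per_row
    tx = tile_index % tiles_per_row   # tile column
    ty = tile_index // tiles_per_row  # tile row
    return ((tx + u) * scale, (ty + v) * scale)
```

For example, with a 4x4 atlas, tile 5 sits at column 1, row 1, so its UVs map into the square from (0.25, 0.25) to (0.5, 0.5). A small padding border per tile is advisable in practice to avoid bleeding from mipmapping and bilinear filtering.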

(2) Use a vertex shader to unify different matrices

Even when all objects in a scene share the same material, differing matrix states will break the batch (this is especially common in scene-graph engines, where each node carries its own transform). Vertex shader technology can avoid this: the per-object transform matrices can be passed into the shader program through constant registers. This unifies the matrix state across objects so that they can be rendered in one batch.
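What the vertex shader does with such a matrix palette can be sketched on the CPU side like this (a simulation for illustration only; in a real engine this loop runs per-vertex on the GPU, with the palette uploaded to constant registers and each vertex carrying an index selecting its matrix):

```python
def apply_palette(vertices, palette):
    """Transform vertices through a matrix palette.

    vertices: list of (x, y, z, matrix_index) tuples, where matrix_index
              selects the object's transform, as a vertex attribute would.
    palette:  list of 4x4 row-major matrices (the 'constant registers').
    Returns the transformed (x, y, z) positions.
    """
    out = []
    for x, y, z, idx in vertices:
        m = palette[idx]
        out.append(tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                         for r in range(3)))
    return out
```

Because every object's matrix lives in the same palette, vertices from many objects can be submitted in one draw call and still end up in their own world positions.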

2. Render State Management

Render states control the behavior of the renderer; in D3D they are set through SetRenderState. By changing render states we control texture stages, depth writes, and so on. State changes are costly for the graphics card: when a state changes, the card must reconfigure its rendering path accordingly, which consumes time on both the CPU and the GPU (the CPU has to wait). The larger the state change, the more work is required. Managing render states so as to minimize changes therefore has a large impact on rendering performance. (The GeForce 8 series stores some common state parameter sets in the GPU core; when a state change matches a stored set, the saved parameters can be read directly, eliminating unnecessary overhead.) For this reason most 3D engines group and sort render passes by render state.
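The grouping idea can be sketched as below: sort draw calls by a state key so that calls sharing a state land next to each other, then only issue a state change when the key actually differs from the previous call's (the dict-based draw-call representation and function names are invented for illustration):

```python
def state_key(dc):
    """State key for a draw call; here just a (shader, texture) pair."""
    return (dc["shader"], dc["texture"])

def sort_by_state(draw_calls):
    """Reorder draw calls so that calls sharing a state sit next to each other."""
    return sorted(draw_calls, key=state_key)

def count_state_changes(draw_calls):
    """Count the SetRenderState-style switches the sequence would issue,
    skipping redundant changes to an already-active state."""
    changes, current = 0, None
    for dc in draw_calls:
        key = state_key(dc)
        if key != current:
            changes += 1
            current = key
    return changes
```

Four calls alternating between two textures cost four state switches in submission order, but only two once sorted; the savings grow with scene size. (Sorting by state must still respect ordering constraints such as back-to-front transparency, which a real engine handles by sorting within layers.)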

3. Level of Detail (LOD)

There is no need to rehash this well-worn technique, which has been discussed endlessly; let us talk about practical applications instead. There are many ways to handle terrain detail, but in my experience the most practical is the chunk-based (tiled) approach. For mesh LOD, automatic polygon-reduction algorithms such as VDPM (view-dependent progressive meshes) are common enough, but their results are mediocre; the conventional practice is to have artists build low-poly replacement models. In complex scenes the benefit of mesh LOD is quite noticeable. With some care, objects beyond the fog, terrain included, can be given a single uniform fog-colored material so that their render state is unified; whether to then pack them into one draw call depends on the situation (it is also worth disabling lighting on that uniform material, since lighting is time-consuming). As for character models, LOD works much as it does for ordinary meshes: a low-poly model has fewer vertices and therefore naturally less skinning computation. In my view it is not always necessary — judge by the specific case.
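Whichever LOD representation is used, the runtime side usually reduces to picking a detail level from the camera distance. A sketch (the thresholds and function name are invented for illustration):

```python
def select_lod(distance, thresholds):
    """Pick an LOD index for an object: 0 is full detail, higher is coarser.
    `thresholds` are the switch distances, sorted ascending, e.g. [10, 50, 200]."""
    for i, t in enumerate(thresholds):
        if distance < t:
            return i
    return len(thresholds)  # beyond the last threshold: coarsest model
```

Real engines typically add hysteresis (slightly different thresholds for switching up versus down) or cross-fading so that objects hovering near a threshold do not visibly pop between levels.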

4. Scene Management Optimization

Scene management optimization covers scene partitioning and visibility culling. There are many reference articles on these, so I will not repeat them here. For today's outdoor scenes, a quadtree or octree is generally used. When profiling shows that traversing the tree is slow, there are usually two causes. One is that the tree's depth is set unreasonably; a little experimentation will find the best depth. The other is that we have allocated dedicated nodes not only for large objects but for every small object as well, leaving the tree with a redundant number of nodes; the fix is to merge those small objects into the larger nodes that contain them.
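A minimal quadtree sketch of this policy: an object descends only while some child quadrant fully contains its bounds, so large or boundary-straddling objects stop in high (large) nodes and only genuinely small objects create deep ones; a depth cap keeps clusters of tiny objects from exploding the node count (class and method names are mine, for illustration):

```python
class QuadNode:
    """Quadtree node; an object is stored at the deepest node whose
    quadrant fully contains it, subject to a maximum depth."""

    def __init__(self, x, y, size, depth=0, max_depth=5):
        self.x, self.y, self.size = x, y, size
        self.depth, self.max_depth = depth, max_depth
        self.objects = []    # objects that stop at this node
        self.children = {}   # quadrant index -> child node, created lazily

    def insert(self, obj, ox, oy, radius):
        """Insert an object with center (ox, oy) and bounding radius.
        Returns the node the object was stored in."""
        if self.depth < self.max_depth:
            half = self.size / 2.0
            quadrants = [(self.x, self.y), (self.x + half, self.y),
                         (self.x, self.y + half), (self.x + half, self.y + half)]
            for i, (cx, cy) in enumerate(quadrants):
                # Descend only if this child quadrant fully contains the
                # object; large or straddling objects stay in the larger node.
                if (cx <= ox - radius and ox + radius <= cx + half and
                        cy <= oy - radius and oy + radius <= cy + half):
                    child = self.children.setdefault(
                        i, QuadNode(cx, cy, half, self.depth + 1, self.max_depth))
                    return child.insert(obj, ox, oy, radius)
        self.objects.append(obj)
        return self
```

Tuning `max_depth` against the measured traversal time is exactly the depth experiment described above.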

Visibility culling is the most common optimization of all. Frustum culling is what we usually rely on, and it is very effective; there are in turn many ways to optimize frustum culling itself, which will not be detailed here. Occlusion culling is also frequently used, horizon culling being a common example. In some situations, however, occlusion culling buys little: when CPU usage is already at 100%, the CPU is the bottleneck, and the extra CPU time spent computing occlusion can cancel out the savings. In such cases, pre-generating visibility information offline can reduce the complexity of the occlusion computation, improve its efficiency, and still improve scene performance.
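The core frustum test can be sketched as a conservative sphere-versus-planes check (the plane convention and function names are my own: each plane is (a, b, c, d) with its normal pointing into the frustum, so a point is inside the half-space when a*x + b*y + c*z + d >= 0):

```python
def sphere_outside_plane(center, radius, plane):
    """True if the sphere lies entirely on the outside of the plane."""
    a, b, c, d = plane
    x, y, z = center
    return a * x + b * y + c * z + d < -radius

def sphere_in_frustum(center, radius, planes):
    """Conservative visibility test: cull the sphere only when it is
    entirely outside at least one frustum plane. Spheres that merely
    straddle a plane are kept (possibly rendering a little extra)."""
    return not any(sphere_outside_plane(center, radius, p) for p in planes)
```

Combined with the quadtree above, whole subtrees can be rejected with one such test against the node's bounding volume, which is what makes hierarchical culling cheap.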
