Unity Performance Optimization: The CPU Chapter

Last Update:2018-07-25 Source: Internet

Author: User

Original link: http://blog.uwa4d.com/archives/optimzation_cpu.html

Performance optimization is a perennial topic in game development. Player expectations and project requirements keep growing, and player counts, on-screen effects, and scene complexity always tend to squeeze the hardware dry. So however far hardware advances and however experienced the development team is, performance optimization remains a hard problem that cannot be bypassed.

In today's games, performance optimization centers on the CPU, the GPU, and memory. Below, we discuss the common problems that current mobile game projects run into in these three areas, along with the corresponding solutions.

CPU Module

For current Unity mobile games, CPU performance overhead falls into two main categories: the overhead of the engine modules and the overhead of the project's own code. The engine modules can be further divided into the rendering module, animation module, physics module, UI module, particle system, loading module, GC calls, and so on. For this reason, the UWA Assessment Report analyzes the performance of each of these modules in detail, so that you can quickly locate a project's performance bottlenecks and then troubleshoot and resolve them based on our analysis and recommendations.

From a large amount of performance evaluation data, we found that the rendering module, UI module, and loading module usually take the top three places in a game's CPU overhead.

I. Rendering Module

The rendering module is arguably the most CPU-intensive engine module in any game, because almost every game has to render scenes, objects, and special effects. To optimize the rendering module, we mainly start from the following two aspects:

(1) Reduce draw calls

Draw calls are the top priority in optimizing the rendering module: in general, the more draw calls there are, the greater the rendering module's CPU overhead. The reason lies in the underlying driver and the GPU rendering pipeline, which we will not cover in detail here for reasons of space. Interested readers can look up the related technical literature.

The main ways to reduce draw calls are to reduce the number of distinct materials used by the rendered objects and to merge calls through draw call batching. The Unity documentation explains the principles and caveats of draw call batching in great detail, and interested readers can consult the official Unity documentation directly.
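To make the idea concrete, here is a minimal toy model in Python rather than Unity code. It assumes, as a simplification, that every unbatched object costs one draw call and that all objects sharing a material can be merged into a single call. Real batching in Unity has further conditions that this sketch ignores, and the object and material names are invented for illustration.

```python
from collections import defaultdict

def count_draw_calls(objects):
    """Unbatched case: assume one draw call per rendered object."""
    return len(objects)

def batch_by_material(objects):
    """Group object names by material; each group stands in for one batched draw call."""
    batches = defaultdict(list)
    for name, material in objects:
        batches[material].append(name)
    return dict(batches)

# Hypothetical scene: (object name, material name) pairs.
scene = [
    ("rock_a", "stone"),
    ("rock_b", "stone"),
    ("tree_a", "bark"),
    ("tree_b", "bark"),
    ("crate", "wood"),
]

batches = batch_by_material(scene)
print(count_draw_calls(scene))  # 5 draw calls without batching
print(len(batches))             # 3 draw calls, one per material
```

Under these assumptions, cutting the number of distinct materials is what lowers the batched count: if the crate shared the "stone" material, the scene would need only two calls.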

Note, however, that fewer draw calls do not always mean better performance. Besides draw calls, the rendering module's performance also depends on the bus bandwidth used to transfer rendering data. When we use draw call batching to merge meshes that share a material, the amount of data that must be transferred at once (textures, VB/IB, and so on) can grow considerably and congest the bus. If the resources cannot be transferred in time, the GPU can only wait, which lowers the game's frame rate.

Draw calls and bus bandwidth sit at the two ends of a scale. What we need to do is keep the two in balance as far as possible: pushing either side too high or too low does not help performance.

(2) Simplify resources

Simplifying resources is a very effective optimization. In a large number of mobile games, the rendering resources are in fact excessive: oversized mesh resources, non-compliant texture resources, and so on. For this reason, the UWA Assessment Report shows resource usage in detail (the number of triangles rendered per frame, the usage of mesh and texture resources, and so on) to help you quickly find and fix the problem resources.

There are many other ways to reduce the rendering module's CPU cost, such as LOD, occlusion culling, and culling distance. We will explain them in more detail in a later technical topic on the rendering module.

II. UI Module

The UI module is also a must-have for almost every game project. A well-performing UI module can lift the game's user experience a notch. NGUI is still used as the UI solution in a large share of domestic projects. The UWA Assessment Report therefore provides strong support for NGUI performance analysis, and we offer different analyses and optimization recommendations depending on the UI solution (UGUI or NGUI) the user has chosen.

In optimizing NGUI, UIPanel.LateUpdate is the top priority: it is the single most CPU-expensive function in NGUI, bar none. The difficulty in building a UI module does not lie in its visual presentation, which is determined by the designer. Rather, two UI systems that look exactly the same can differ enormously in performance overhead. Achieving the designer's intended look with as little CPU overhead as possible is a real test of every UI developer's skills.

Currently, the UWA Assessment Report lists the most time-consuming functions with statistically significant CPU overhead, along with their detailed CPU usage and heap memory allocations, so that the development team can see the UI system's performance overhead directly. Combined with project screenshots, this makes it easy to pinpoint where the UI module incurs a large overhead.

To optimize UIPanel.LateUpdate, focus on how the UIPanels are laid out, following these principles:

(1) Separate dynamic and static UI elements into different UIPanels as far as possible (the UI is rebuilt per UIPanel), so that rebuilds caused by changing UI elements are confined to as small a scope as possible.
(2) Group dynamic UI elements by how synchronously they change; that is, put UI elements that move at different frequencies in different UIPanels as far as possible.
(3) Limit the number of dynamic UI elements in the same UIPanel. The more elements there are, the larger the mesh that is created, and the rebuild cost rises significantly. For example, many moving health bars may appear on the HUD during combat. In this case, we recommend that the development team split the moving health bars across several UIPanels, with about 5 to 10 dynamic UI elements per UIPanel.

The essence of this approach is to reduce, in probabilistic terms, the UIPanel rebuild overhead in any single frame.
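The effect of splitting panels can be illustrated with a small toy model, written in Python rather than against the NGUI API. It assumes a deliberately simple cost model: a panel is rebuilt in full whenever any one of its elements changes, and the cost is proportional to the number of elements in that panel. The element names and counts are made up.

```python
def rebuild_cost(panels, changed):
    """Elements re-meshed in one frame: a panel is rebuilt in full
    if any of its elements changed (assumed cost model)."""
    return sum(len(panel) for panel in panels
               if any(element in changed for element in panel))

def split_into_panels(elements, group_size):
    """Chunk a list of elements into panels of at most group_size elements."""
    return [elements[i:i + group_size]
            for i in range(0, len(elements), group_size)]

static_ui = [f"label_{i}" for i in range(30)]   # elements that never move
moving_bars = [f"bar_{i}" for i in range(20)]   # moving HUD bars
changed = {"bar_3"}                             # only one bar moved this frame

single_panel = [static_ui + moving_bars]
split_panels = [static_ui] + split_into_panels(moving_bars, 5)

print(rebuild_cost(single_panel, changed))  # 50: everything is rebuilt
print(rebuild_cost(split_panels, changed))  # 5: only the panel holding bar_3
```

The model leaves out one cost on the other side of the trade-off: each additional panel tends to add draw calls, so the group size is a balance rather than "smaller is always better".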

For reasons of space, we describe only the most important performance issues in NGUI here. The UGUI system and the draw call issues of the UI system itself will be explained in detail in a later technical topic on the UI module.

III. Loading Module

The loading module is also an essential part of any game project. Unlike the previous two modules, its performance overhead is relatively concentrated: it occurs mainly during scene switching, and its CPU peaks are high.

Let's start with where the performance overhead of a scene switch comes from. In the current version of Unity, it comes mainly from two places: unloading the previous scene and loading the next one. We discuss the bottlenecks of each below:

(1) Scene unloading
In Unity, scene unloading is typically done automatically by the engine: when we call an API such as Application.LoadLevel, the engine begins to tear down the previous scene. Its performance overhead is dominated by the following parts:

Destroy
When switching scenes, the engine collects the GameObjects and components that are not marked "DontDestroyOnLoad" and destroys them. This also triggers OnDestroy in script code, so the performance overhead here depends primarily on the logic in the OnDestroy callbacks.

Resources.UnloadUnusedAssets
In general, this API is called twice during a scene switch: once automatically by the engine when the scene is switched, and once manually by the user (typically after the new scene has loaded, to make sure the previous scene's resources are unloaded cleanly). In the large number of projects we have evaluated, the CPU cost of this API mostly falls between 500 ms and 3000 ms. It depends primarily on the number of Assets and Objects in the scene: the more there are, the longer it takes.

(2) Scene loading
The performance overhead of scene loading can be subdivided into the following parts:

Resource loading
Resource loading accounts for more than 90% of the time of the entire loading process. Its efficiency depends primarily on how the resources are loaded (Resources.Load or AssetBundle), how much is loaded (the size of resource data such as textures, meshes, and materials), and the resource formats (texture format, audio format, and so on). Loading efficiency varies widely across loading methods and resource formats, so the UWA Assessment Report shows the specific usage of each resource to help users find problem resources immediately and correct them in time.

Instantiate
Scene loading is often accompanied by a large number of Instantiate calls, for example to instantiate UI screens, characters and monsters, scene buildings, and so on. When Instantiate is called, the engine checks whether the related resources have been loaded; if not, it loads them before instantiating. This is the root cause of most of the "Instantiate is slow" problems we have encountered. It is also why, in our earlier AssetBundle articles, we advocated packaging resources by dependency and preloading them to relieve the pressure on Instantiate. (Loading AssetBundle resources is another big topic, which we will explain in detail in a future technical topic on AssetBundle loading.)
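The benefit of preloading dependencies can be sketched with a toy cache model. This is Python, not Unity code: the class, the resource names, and the millisecond costs are all hypothetical, and the only behavior modeled is the one described above, namely that an instantiate call first loads any dependency that is not yet loaded.

```python
class ResourceCache:
    """Toy stand-in for the engine's loaded-resource set."""

    def __init__(self, load_cost_ms):
        self.load_cost_ms = load_cost_ms  # hypothetical per-resource load cost
        self.loaded = set()

    def load(self, name):
        """Load a resource if it is not loaded yet; return the time spent in ms."""
        if name in self.loaded:
            return 0
        self.loaded.add(name)
        return self.load_cost_ms[name]

    def instantiate(self, dependencies, base_cost_ms=1):
        """Time taken by one instantiate call: a base cost plus the cost of
        loading every dependency that is still missing."""
        return base_cost_ms + sum(self.load(dep) for dep in dependencies)

costs = {"hero_mesh": 12, "hero_texture": 30, "hero_anim": 8}  # made-up numbers
deps = ["hero_mesh", "hero_texture", "hero_anim"]

cold = ResourceCache(costs)
cold_spike = cold.instantiate(deps)            # loads everything inside the call

warm = ResourceCache(costs)
preload_ms = sum(warm.load(dep) for dep in deps)  # paid earlier, e.g. behind a loading screen
warm_spike = warm.instantiate(deps)            # dependencies already loaded

print(cold_spike, preload_ms, warm_spike)  # 51 50 1
```

The total work is the same in both cases; preloading only moves the loading cost out of the instantiate call, so the spike no longer lands at the moment the object is created.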

On the other hand, the cost of Instantiate also shows up in the serialization of script code: if a script contains a lot of data that needs to be serialized, Instantiate will take a long time. The most direct example is NGUI, whose code contains many SerializeField attributes, which leads to considerable serialization overhead at instantiation. So whenever you add serialized data to your code, keep this cost in mind.

These are the three most expensive modules in game projects. Of course, depending on the game type and design, other modules can still have a large CPU cost: for example, the animation and physics systems in ARPG games, or the audio and particle systems in music and casual games. We will cover these in detail in later technical topics.

IV. Code Efficiency

Logic code often accounts for a large share of the performance overhead in more complex game projects. This is very common in genres such as MOBA, ARPG, and MMORPG.

During project optimization, we often want to know which functions account for most of the CPU overhead. The performance overhead of most projects follows the 80/20 rule: 80% of the overhead is concentrated in 20% of the functions. So in the UWA Assessment Report we collect statistics on the CPU overhead of the project's code, showing not only each function's overall cumulative CPU cost but also a breakdown of the cost inside the function, to help you locate the problem functions faster.
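As a minimal sketch of how the 80/20 rule can be applied to profiling data, the Python snippet below ranks functions by cost and returns the smallest set that covers a given share of total CPU time. The function names and timings are invented for illustration and do not come from any real report.

```python
def hot_functions(cpu_ms_by_function, share=0.8):
    """Return the most expensive functions, in descending order of cost,
    until their combined time reaches `share` of the total."""
    total = sum(cpu_ms_by_function.values())
    ranked = sorted(cpu_ms_by_function.items(),
                    key=lambda item: item[1], reverse=True)
    picked, running = [], 0.0
    for name, ms in ranked:
        if running >= share * total:
            break
        picked.append(name)
        running += ms
    return picked

# Hypothetical per-function CPU time in ms (10 functions, 100 ms in total).
profile = {
    "UpdateAI": 45.0, "Pathfind": 37.0, "UpdateBuffs": 4.0,
    "SyncNetwork": 3.0, "UpdateQuests": 3.0, "PlaySfx": 2.0,
    "UpdateChat": 2.0, "SaveState": 2.0, "PollInput": 1.0, "TickTimers": 1.0,
}

print(hot_functions(profile))  # ['UpdateAI', 'Pathfind']
```

In this made-up profile, 2 of 10 functions account for 82% of the time, so they are where optimization effort should go first.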

Of course, we also hope to provide more code performance information, such as a more detailed breakdown of a function's cost in any given frame, more accurate screenshots, and so on. We are currently working on these features, and they will be available in subsequent releases.

This article is an English translation of an article originally published in Chinese on aliyun.com and is provided for information purposes only.