Summary of Unity + NGUI performance optimization methods

Source: Internet
Author: User

1 Package and load shared resources separately

Many places in a game use the same resources. For example, several interfaces may share the same font or the same atlas, several scenes may share the same texture, several monsters may use the same Animator, and so on. When building the game's installation package, you can separate these shared resources from the rest and package them on their own. For example, if resources A and B both reference resource C, put C in its own bundle. At run time, loading A loads C first; when B is loaded later, the instance of C is already in memory, so you only load B and let it point to that C. If C is not separated from A and B, the bundle for A contains a copy of C and the bundle for B contains another. The redundant copy makes the installation package larger, and if A and B are both loaded at run time, there are two instances of C in memory, which increases memory consumption.

Packaging and loading shared resources separately is the most effective way to reduce both the package size and run-time memory consumption. The finer the packaging granularity, the smaller these two figures become. In addition, two draw calls that are adjacent in the render queue can be merged when they use the same texture, material, and shader instances. But finer granularity is not always better. If the game has to load a large number of small bundles at the same time, loading becomes very slow, because time is wasted on thread scheduling and on many small batches of I/O. Merging draw calls also does not necessarily improve performance and sometimes degrades it, as described later. The packaging granularity therefore has to be controlled deliberately. In general we separate out only the larger shared resources, such as fonts and textures.

You can use AssetDatabase.GetDependencies to find out which other resources a resource uses.
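The selection step can be sketched outside the engine. The following is a minimal Python model, not Unity code: it assumes the dependency lists have already been collected (for example with AssetDatabase.GetDependencies) into a plain dictionary, and the function name split_shared and the asset names are hypothetical.

```python
from collections import defaultdict

def split_shared(dependencies):
    """Split dependencies into shared and private ones.

    dependencies: {root asset: set of assets it depends on}
    Returns (shared, private): the assets used by more than one root,
    and for each root the dependencies that only it uses.
    """
    users = defaultdict(set)
    for root, deps in dependencies.items():
        for dep in deps:
            users[dep].add(root)
    # Anything referenced by two or more roots is a candidate for its own bundle.
    shared = {dep for dep, roots in users.items() if len(roots) > 1}
    private = {root: deps - shared for root, deps in dependencies.items()}
    return shared, private

# Hypothetical example: A and B both reference C.
deps = {"A": {"C", "a_only"}, "B": {"C", "b_only"}}
shared, private = split_shared(deps)
```

In practice the shared set would still be filtered by size, since only the larger shared resources are worth a bundle of their own.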

2 Separate the texture's alpha channel and set the compression format to ETC/PVRTC

We originally used DXT5 as the texture compression format, hoping to reduce the memory footprint of textures, but we soon found that the GPUs on mobile platforms do not support hardware decompression of DXT5. For a 1024x1024 RGBA32 texture, DXT5 compresses 4 MB down to 1 MB, but before the texture is sent to the GPU, the CPU decompresses it in memory back to 4 MB of RGBA32 (software decompression) and then uploads those 4 MB to video memory. During that time the texture occupies 5 MB of main memory and 4 MB of video memory. Mobile platforms usually have no dedicated video memory and carve video memory out of RAM, so a texture that we expected to occupy only 1 MB actually occupies 9 MB!
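The 9 MB figure can be checked with a few lines of arithmetic. This is a small Python sketch; the helper names are hypothetical, and the only fact added is that DXT5 stores one byte per pixel (16 bytes per 4x4 block).

```python
MB = 1024 * 1024

def rgba32_bytes(width, height):
    # Uncompressed RGBA32: 4 bytes per pixel.
    return width * height * 4

def dxt5_bytes(width, height):
    # DXT5: 16 bytes per 4x4 block, i.e. 1 byte per pixel.
    return width * height

w = h = 1024
compressed = dxt5_bytes(w, h)        # the asset as loaded from disk
decompressed = rgba32_bytes(w, h)    # software-decompressed copy in RAM
ram_peak = compressed + decompressed # both live in main memory at once
video = decompressed                 # copy uploaded to video memory
total = ram_peak + video             # video memory is carved out of RAM on mobile

print(f"RAM {ram_peak / MB} MB + video {video / MB} MB = {total / MB} MB")
```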

Every compression format that lacks hardware decompression support has this problem. After some research, we found that the format with the widest hardware support on Android is ETC, and on Apple devices it is PVRTC. However, as we use them, neither format carries a transparent (alpha) channel. So we separate the alpha channel of each source texture and write it into the red channel of a second texture. Both textures are compressed with ETC/PVRTC, and both are sent to video memory when rendering. We also modified the NGUI shader so that, at render time, the red channel of the second texture is written into the alpha channel of the first, restoring the original color:

  fixed4 frag (v2f i) : COLOR
  {
      fixed4 col;
      // Color comes from the main texture, which has no alpha channel.
      col.rgb = tex2D(_MainTex, i.texcoord).rgb;
      // Alpha is stored in the red channel of the second texture.
      col.a = tex2D(_AlphaTex, i.texcoord).r;
      return col * i.color;
  }

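In this way, a 4 MB 1024x1024 RGBA32 source texture is separated and compressed into two 0.5 MB ETC/PVRTC textures (we use the 4-bit ETC/PVRTC formats). Their memory footprint while rendering is 2 x 0.5 + 2 x 0.5 = 2 MB: each of the two textures has one copy in main memory and one in video memory.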
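The channel separation itself is simple to illustrate. The sketch below is plain Python over lists of pixel tuples, not the editor script we actually use; the function names are hypothetical, and the lossy ETC/PVRTC compression step is left out, so the round trip here is exact, which it would not be on a device.

```python
def split_alpha(pixels):
    """Split RGBA pixels into an RGB texture and an alpha-in-red texture."""
    color = [(r, g, b) for r, g, b, a in pixels]
    # The alpha value goes into the red channel; green and blue are unused.
    alpha = [(a, 0, 0) for r, g, b, a in pixels]
    return color, alpha

def recombine(color, alpha):
    """Mirror of the shader: rgb from the first texture, a from the second's red."""
    return [(r, g, b, a_px[0]) for (r, g, b), a_px in zip(color, alpha)]

source = [(255, 0, 0, 128), (0, 255, 0, 255), (10, 20, 30, 0)]
color_tex, alpha_tex = split_alpha(source)
restored = recombine(color_tex, alpha_tex)
```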

3 Turn off the Read/Write Enabled option for textures

Every texture imported into Unity has a Read/Write Enabled switch; the corresponding script property is TextureImporter.isReadable. You can see the switch in the Import Settings tab after selecting the texture. Only with this switch on can you call Texture2D.GetPixel on the texture and read or overwrite its pixels, but it requires the system to keep a copy of the texture in memory for CPU access. A game normally has no such need at run time, so we turn the switch off for all textures and turn it on only while a texture is being processed in the editor (for example, when separating the alpha channel from the source texture). With that, the 2 MB run-time memory consumption of the 1024x1024 texture mentioned above is halved, to 1 MB.

4 Reduce the number of GameObjects in a scene

We once reduced the number of GameObjects in a scene by nearly 20,000, and the game's memory footprint on the iPhone 3S immediately dropped by 20 MB. Although those GameObjects were mostly hidden (activeInHierarchy was false), they still consumed a lot of memory. They also carried many scripts, and every script on every GameObject has to be instantiated, which costs a great deal of memory. As a result, we set a rule that a scene must not contain more than 10,000 GameObjects, and the GameObject count became one of the performance indicators monitored for each weekly build.

5 Organize the atlases

The primary purpose of organizing atlases is to save run-time memory (although it can sometimes also help merge draw calls). From this point of view, the total size of the atlases sent to video memory when an interface is displayed should be as small as possible. The following approaches generally help:

1) When designing an interface, have the artists design controls so that they can be stretched as nine-slice sprites wherever possible, that is, with the UISprite type set to Sliced. The artists then only need to cut out a small image, and we stretch it in Unity. Of course, a nine-slice control has at least 16 vertices instead of 4 (more if the center cell uses the Tiled type), so building its draw call is more expensive (see section 6). But as long as the draw calls are arranged sensibly (again, see section 6), this is generally not a problem.

2) Also when designing an interface, have the artists make patterns symmetrical wherever possible. The artists then only need to cut out part of a pattern, and we assemble the complete pattern in Unity. For a circular pattern, for example, the artists can cut out just one quarter; for a face, just one half. As with point 1, however, this approach has other performance costs: the pattern needs more vertices and more GameObjects. As noted in section 4, a larger number of GameObjects can sometimes cost significantly more memory. This approach is therefore generally used only for large patterns.

3) Make sure that unneeded texture assets do not stay resident in memory and that unrelated texture assets are not sent to video memory during rendering. To achieve this, split atlases by interface: in general, one atlas holds the assets of only one interface, and the UISprites in an interface do not use the atlases of other interfaces. Suppose interfaces A and B both contain the same small gold coin icon. Do not, for convenience during production, let a UISprite in interface A reference the coin asset in B's atlas. Otherwise, displaying A sends B's entire atlas to video memory as well, and B's atlas stays in memory for as long as A does. Instead, put a copy of the coin icon in both atlases, so that the UISprites in A use only A's atlas and the UISprites in B use only B's.

However, if two interfaces share a large number of identical assets, they can share one atlas, which reduces the total memory used by all interfaces. How to do this in practice has to be weighed against the art design. In general, the more the interfaces have in common, the smaller the memory burden on the program. But if the interfaces have too much in common, the art may look monotonous, so this is a point where art and programming need to find a balance.

In addition, large sets of icons (such as item icons) should not be put into an atlas; use UITexture for them instead.

4) Reduce the empty space in an atlas. Fully transparent pixels in an atlas occupy the same amount of memory as non-transparent ones, so for the same amount of art we should keep the empty space as small as possible. If the assets in a 1024x1024 atlas cover less than half of its area, consider splitting it into two 512x512 atlases. (Someone might ask why not make a 1024x512 atlas instead. The reason is that the iOS platform seems to require textures sent to video memory to be square.) Of course, draw calls that use two different atlases cannot be merged, but that is not a problem (see section 6).
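The saving from splitting a half-empty atlas is easy to quantify. A minimal Python sketch, assuming square atlases and uncompressed RGBA32 (4 bytes per pixel) for simplicity; with a fixed-rate compressed format the absolute numbers shrink but the ratio stays the same. The function name atlas_bytes is hypothetical.

```python
def atlas_bytes(size, bytes_per_pixel=4):
    # A square atlas costs the same whether its pixels are used or empty.
    return size * size * bytes_per_pixel

one_large = atlas_bytes(1024)      # one 1024x1024 atlas, less than half used
two_small = 2 * atlas_bytes(512)   # the same art repacked into two 512x512 atlases
saved = one_large - two_small

print(f"{one_large} bytes -> {two_small} bytes, saving {saved} bytes")
```

Two 512x512 atlases hold half the pixels of one 1024x1024 atlas, so the split only works when the art really fits into that area after repacking.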

It should be said that there is no fixed standard for organizing atlases in practice. Often you have to weigh the pros and cons to decide, because every measure comes with some other performance cost.

6 Place controls on panels according to each UI control's design to separate draw calls

We once found that NGUI's UIPanel.LateUpdate function had a very high CPU overhead. After careful study, we found that too many controls had been merged into the same draw calls; in particular, UI controls that move at run time shared draw calls with static ones. When a property of a UI control (UIWidget) such as its position, size, or color changes, UIPanel has to rebuild the draw call used by that control, and in some cases every draw call on the panel. Rebuilding a draw call can be expensive, because the vertex information (positions, UVs, and colors) of all the controls on that draw call has to be recalculated. If many controls are concentrated on the same draw call, a small change to a single control forces the vertices of all of them to be traversed again. Our UI also makes heavy use of nine-slice stretching, which gives the controls even more vertices, so rebuilding a draw call costs even more.

So we grouped the UI controls. Controls that change over a given period (such as the health bars above monsters' heads and the floating damage numbers) are placed on one panel that contains only those controls, and the remaining, mostly unchanging controls are placed on another panel. The two kinds of controls thus end up in the draw calls of different panels, and when a change to a control triggers a draw call rebuild, the unchanged controls no longer need to be traversed. Because the art design always has only a few controls changing in any given period, the optimization is very effective and can save up to 25% of CPU usage.
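Why the grouping helps can be shown with a toy cost model. The Python sketch below is not NGUI code: it assumes, as a simplification, one draw call per panel and a rebuild cost equal to the number of vertices on that draw call. The function name rebuild_cost, the widget names, and the vertex counts are all hypothetical.

```python
def rebuild_cost(panels, changed):
    """Vertices re-traversed when the widgets named in `changed` change.

    panels: {panel name: {widget name: vertex count}}
    Simplification: each panel is a single draw call, and a rebuild
    walks every vertex of every widget on that draw call.
    """
    cost = 0
    for widgets in panels.values():
        if any(name in widgets for name in changed):
            cost += sum(widgets.values())
    return cost

# 50 static nine-slice frames (16 vertices each) and two dynamic widgets.
static = {f"frame{i}": 16 for i in range(50)}
dynamic = {"hp_bar": 16, "damage_text": 4}

merged = {"ui": {**static, **dynamic}}
separated = {"static": static, "dynamic": dynamic}

cost_merged = rebuild_cost(merged, {"hp_bar"})
cost_separated = rebuild_cost(separated, {"hp_bar"})
```

In this model, moving the health bar touches every vertex on the merged panel but only the dynamic widgets once the panels are separated.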

This method adds some draw calls, but that has no noticeable effect on performance. In the early stages of our project we paid too much attention to squeezing down the number of draw calls, and only later found that adding a few draw calls is not so frightening. Our lead programmer once even ran an experiment with cocos2d-x: with as many as 500 draw calls, the animation still ran smoothly. By contrast, texture size has a much greater impact on smoothness.

7 Optimize the anchor's internal logic so that it updates only when necessary

After optimizing the panels' draw call rebuilds as described in the previous section, we found that the update logic of NGUI's anchors also has a high CPU overhead. Even when a control is stationary, its anchor is updated every frame (see the UIWidget.OnUpdate function), and the update is recursive, which raises CPU usage further. So we modified NGUI's internal code so that an anchor is updated only when necessary, which generally means only when the control is initialized and when the screen size changes. The cost of this optimization is that when a control's vertex positions change (for example, when the control moves or is resized), the higher-level logic is responsible for updating the anchor.
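The idea is a standard dirty-flag pattern. The following Python sketch is only an illustration of that pattern, not our modified NGUI code; the class AnchoredWidget, its fields, and the bottom-right anchoring rule are hypothetical.

```python
class AnchoredWidget:
    """Widget anchored to the bottom-right corner of the screen."""

    def __init__(self, margin):
        self.margin = margin         # distance from the right and bottom edges
        self.position = None
        self.anchor_updates = 0      # how many times the anchor was recomputed
        self._last_screen = None
        self._dirty = True           # force one update after initialization

    def mark_dirty(self):
        """Must be called by higher-level logic after moving or resizing the widget."""
        self._dirty = True

    def update(self, screen_size):
        """Called every frame; recomputes only when something relevant changed."""
        if not self._dirty and screen_size == self._last_screen:
            return
        width, height = screen_size
        self.position = (width - self.margin, height - self.margin)
        self._last_screen = screen_size
        self._dirty = False
        self.anchor_updates += 1

widget = AnchoredWidget(margin=10)
for _ in range(100):                 # 100 frames with nothing changing
    widget.update((1280, 720))
widget.update((1920, 1080))          # screen size changes once
```

The mark_dirty method is where the cost described above shows up: forgetting to call it after moving a control leaves the anchor stale.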

8 Reduce the resolution of texture assets

This trick simply shrinks the texture assets themselves. For example, a texture that is 100x80 in the source art is reduced to 50x40, half the size in each dimension, when we import it into Unity, and the game uses the reduced texture. However, this inevitably lowers the art quality noticeably, and the artists will see at once that the image has become blurrier, so we generally do not use it unless there is no other choice.

9 Delayed loading and timed unloading of interfaces (not implemented)

If some interfaces are less important and rarely used, you can wait until the interface is about to be opened before loading its resources from the bundle, and unload them from memory when it is closed, or some time after it is closed. This approach has two costs. First, it affects the player's experience: when the player asks to open the interface, its display is delayed. Second, it is more prone to bugs: the higher-level logic has to take asynchrony into account, because the interface may not be in memory when the programmer tries to access it. So far we have not implemented this scheme. For now, we only unload, on entering a new scene, the interfaces that the previous scene used and the new scene will not.
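Since we have not implemented this scheme, the following is only a sketch of how the policy could look, written in Python rather than against Unity's bundle API. The class InterfaceCache and its methods are hypothetical, the loader is synchronous for brevity (a real one would be asynchronous, which is exactly the source of the bugs mentioned above), and time is passed in explicitly to keep the example deterministic.

```python
class InterfaceCache:
    """Load an interface on first open; unload it some time after it is closed."""

    def __init__(self, loader, keep_seconds):
        self._loader = loader        # function: interface name -> loaded resource
        self._keep = keep_seconds    # grace period after closing
        self._loaded = {}            # name -> resource
        self._closed_at = {}         # name -> time the interface was closed

    def open(self, name):
        self._closed_at.pop(name, None)   # reopening cancels a pending unload
        if name not in self._loaded:
            self._loaded[name] = self._loader(name)
        return self._loaded[name]

    def close(self, name, now):
        if name in self._loaded:
            self._closed_at[name] = now

    def tick(self, now):
        """Call periodically; drops interfaces closed for at least keep_seconds."""
        expired = [n for n, t in self._closed_at.items() if now - t >= self._keep]
        for n in expired:
            del self._loaded[n]
            del self._closed_at[n]

    def is_loaded(self, name):
        return name in self._loaded
```

Setting keep_seconds to 0 gives the "unload when closed" variant; a larger value trades memory for fewer reloads of interfaces that are reopened quickly.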

Of the 9 methods above, 4, 5, and 6 require some cooperation from game design and art, and they need ongoing monitoring to stay in an optimal state (because there will always be new interfaces to design and changed requirements for old ones). The others are one-off solutions: once the implementation is stable, no further effort is needed. Methods 2 and 8, however, reduce art quality, 8 in particular. If the loss of quality is more than the art can tolerate, these two methods may not be allowed.

Postscript

Later I learned another trick:

Avoid frequent calls to GameObject.SetActive
