Unity Loading Module In-Depth Analysis: Shaders

Tags unity 5

Original link: http://blog.uwa4d.com/archives/LoadingPerformance_Shader.html

Following the mesh chapter of our in-depth analysis of the Unity loading module, where we focused on the load performance of mesh resources, today we turn to the load efficiency of shader resources.

This is an original article from Tiger Science and Technology. You are welcome to forward and share it, but please do not reprint it without the author's authorization. If you have insights or findings of your own, you are also welcome to contact us and discuss them (QQ group 465082844).

Resource load performance test code

As in the previous article, we use the same test code to analyze the load performance of shader resources. We pack the shader files into AssetBundle files of a given size and load them one by one on different devices with that code, in order to compare the load performance of the corresponding resources.

Test environment
Engine version: Unity 5.2
Test devices: three Android mobile devices of different grades (Redmi 2, Redmi Note 2 and Samsung S6)

Shader resources

Shader resources differ from the mesh and texture resources covered earlier in that their physical size is small. In general, a shader resource is only a few KB on disk and only tens of KB in memory. The bottleneck in loading a shader is therefore not reading its data but parsing its content. In this article, we pick the four shaders most commonly used in our projects to give you a better understanding of the concrete cost of shader loading.

Test 1: Load efficiency test for different kinds of shader resources
We selected the four shaders most commonly used in current projects: Mobile/Diffuse, Mobile/VertexLit, Mobile/Bumped Diffuse and Mobile/Particles Additive. Their memory footprints are 93KB, 115KB, 110KB and 6.9KB respectively, and the corresponding AssetBundle sizes are 10KB, 7KB, 12KB and 3KB (LZMA compression).

We load these shader resources on three different device models. To reduce the effect of chance, we repeat the load operation 10 times on each device and take the average as the final performance cost. The specific test results are shown in the following table.
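The measurement procedure described here (repeat each load 10 times and average the results) can be sketched independently of the engine. The snippet below is a minimal Python illustration, not the Unity test code used for this article; `average_load_ms`, `load_fn` and the injectable `clock` are hypothetical names introduced only for this sketch.

```python
import time
from typing import Callable


def average_load_ms(load_fn: Callable[[], None],
                    repeats: int = 10,
                    clock: Callable[[], float] = time.perf_counter) -> float:
    """Run load_fn `repeats` times and return the mean wall time in ms.

    load_fn stands in for "load one shader AssetBundle"; it is a
    hypothetical placeholder, not a Unity API.
    """
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    total = 0.0
    for _ in range(repeats):
        start = clock()          # timestamp before the load
        load_fn()
        total += clock() - start  # accumulate elapsed seconds
    return total / repeats * 1000.0


if __name__ == "__main__":
    # Dummy workload so the sketch runs anywhere.
    print(f"{average_load_ms(lambda: sum(range(100_000))):.3f} ms")
```

Passing the clock as a parameter keeps the helper easy to check with fixed timestamps; on a real device the default timer would be used.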


From the above test, we can draw the following conclusions:

1. The physical size and memory footprint of shader resources are very small, but loading them is CPU-expensive. The main reason is the high CPU cost of shader parsing, which is the performance bottleneck of shader loading.

2. Mobile/Particles Additive takes far less time to parse than Mobile/Diffuse, Mobile/Bumped Diffuse or even Mobile/VertexLit.

3. Apart from Mobile/Particles Additive, the other three mainstream shaders cause obvious frame drops, or even stutter, when they are loaded. Development teams should therefore avoid loading shaders outside of scene switches as far as possible.

4. As hardware performance improves, the difference in parsing efficiency becomes less and less noticeable.

Test 2: Mobile shader vs. normal shader
In UWA performance assessment reports, we found that besides the Mobile shaders, Diffuse, Bumped Diffuse and other normal shaders are also used extensively in projects. You probably already know how the two kinds differ in rendering, but is there also a performance gap when loading them?

To find out, we ran the following experiment. We added the corresponding normal shaders to the shader resources from Test 1, repeated the load operation 10 times on each of the three test devices, and took the average as the final performance cost, as shown in the following figure.

From the above test, we can draw the following conclusions:

1. Compared with the equivalent normal shader, a Mobile shader does load somewhat faster.

2. The lower the device performance, the larger the gap. For example, on the low-end Redmi 2, the load-time gap between Mobile/Bumped Diffuse and Bumped Diffuse exceeds 30ms.

At this point, most readers will probably be surprised that a few small shaders take longer to load than several atlas textures or a mesh with a very high face count. Yes, shader loading overhead often reaches hundreds of milliseconds or more.

In UWA performance assessment reports, we found that shader loading accounts for a significant performance overhead in a large number of projects. The following figure shows the time spent on shader loading in one project at runtime. As you can see, shader loading costs hundreds of milliseconds of CPU time when switching scenes, and tens of milliseconds even in ordinary game levels. Both have a significant impact on the runtime frame rate and on the efficiency of scene switching.

So the question is: how do we optimize it?

Before optimizing, the first thing we need to do is understand what actually makes shader parsing slow. In general, the CPU time spent loading a shader is related to its number of keywords: the more keywords, the higher the load cost. In the Unity 5.x Inspector, you can see that Mobile/Bumped Diffuse has 39 keyword variants, Mobile/Diffuse has 27, Mobile/VertexLit has 15, and Mobile/Particles Additive has 1. Similarly, in Unity 4.x, Mobile/Bumped Diffuse has 44 keyword variants, Mobile/Diffuse has 25, Mobile/VertexLit has 6, and Mobile/Particles Additive has 0. This is the main reason why the parsing overhead of Mobile/Particles Additive is so low.

Note: The number of keywords in a shader varies with the scene settings. In Unity 5.x, Unity by default adjusts a shader's keywords according to the scene settings, shader passes, and so on. For example, if lightmaps are used, the corresponding keywords are enabled by default, while for projects that do not use fog, the related keywords are disabled outright.

Having seen how much keywords matter for shader loading efficiency, the next step is to reduce the number of keywords in our shaders. We recommend that development teams try the following methods:

Method One:
For Unity 5.x projects, the relevant keywords can be removed directly from the shader with the #pragma skip_variants directive.

For example, in Unity 5.2, Mobile/VertexLit has 15 keyword variants by default, as shown in the following figure. Eight of them come from a single shader snippet. In this case we can add skip_variants directives to the corresponding shader code to remove them. After removal, the keyword count of Mobile/VertexLit drops to 8: the original eight variants of that snippet are no longer used, leaving only the single default one.


As can be seen from the above, this method effectively reduces the number of keywords, but it also has limitations. First, skip_variants is only available in Unity 5.0 or later. Second, it requires the development team to understand shaders well enough to modify them according to the actual needs of the project.
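To see why skipping keywords shrinks a shader, it helps to model how variants arise. The sketch below is a simplified model, not Unity's actual compiler logic. It assumes that each group of mutually exclusive keywords multiplies the variant count of a snippet, and that skipping a keyword removes it from its group, with a group never shrinking below its one default option. The keyword groups are illustrative and were chosen only so that the snippet has eight variants, like the example above; the real keyword list from the article's figure is not reproduced here. `count_variants` is a hypothetical helper.

```python
from math import prod
from typing import Iterable, List, Set


def count_variants(groups: List[List[str]], skipped: Iterable[str] = ()) -> int:
    """Count keyword combinations for one shader snippet (toy model).

    Each inner list is one group of mutually exclusive options, where
    "_" means "no keyword enabled". Skipped keywords are dropped from
    their group; at least one option per group always remains.
    """
    skip: Set[str] = set(skipped)
    sizes = []
    for group in groups:
        remaining = [kw for kw in group if kw not in skip]
        sizes.append(max(1, len(remaining)))
    return prod(sizes)


# Illustrative groups: 4 fog options x 2 lightmap options = 8 variants.
groups = [
    ["_", "FOG_LINEAR", "FOG_EXP", "FOG_EXP2"],
    ["_", "LIGHTMAP_ON"],
]

print(count_variants(groups))                                  # 8
print(count_variants(groups, ["FOG_EXP", "FOG_EXP2"]))         # 4
print(count_variants(
    groups, ["FOG_LINEAR", "FOG_EXP", "FOG_EXP2", "LIGHTMAP_ON"]))  # 1
```

In this model, skipping every non-default keyword collapses the eight combinations into one, which mirrors the drop from 15 to 8 described for Mobile/VertexLit (seven variants removed).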

Method Two:
Remove the Fallback option directly from the shader. The purpose of Fallback is to let hardware that cannot run the current shader render with a lower-level fallback shader instead, which guarantees rendering stability. In the current mobile market, however, devices that do not support Mobile/Diffuse and Mobile/Bumped Diffuse are already very rare (at least, we have not yet received any report of a device that does not support the Mobile/Diffuse shader). For projects that use Mobile shaders, you can therefore try removing Fallback directly to drastically reduce the number of keywords. In our test project, removing Fallback brought Mobile/Bumped Diffuse down from 39 keywords to 12, and Mobile/Diffuse down from 27 to 12.


This method does not remove "useless" keywords as thoroughly as Method One, but it is simple and easy to apply, which makes it very cost-effective. It also fully supports Unity 4.x projects.

At this point you will certainly ask: even if the number of keywords can be reduced, how much does shader parsing efficiency actually improve? To answer this, we ran the following experiment:

Test 3: Load efficiency test with Fallback on and off
For simplicity, we turn off Fallback for Mobile/Bumped Diffuse and Mobile/Diffuse to produce a set of comparison data. With Fallback turned off, both shaders have 12 keywords, whereas the original shaders have 39 and 27 respectively.
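As a quick piece of arithmetic on the keyword counts quoted above, the following Python snippet computes the relative reduction for each shader. It only restates the counts from this article; it says nothing about load time, which is what the measurements in this test report. `reduction_percent` is a hypothetical helper name.

```python
def reduction_percent(before: int, after: int) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Keyword counts with Fallback on -> off, as quoted in the article.
counts = {
    "Mobile/Bumped Diffuse": (39, 12),
    "Mobile/Diffuse": (27, 12),
}

for name, (before, after) in counts.items():
    pct = reduction_percent(before, after)
    print(f"{name}: {before} -> {after} ({pct:.1f}% fewer)")
# Mobile/Bumped Diffuse: 39 -> 12 (69.2% fewer)
# Mobile/Diffuse: 27 -> 12 (55.6% fewer)
```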

As in Test 1, we repeat the load operation 10 times on three Android models of different grades and take the average as the final performance cost. The specific test results are shown in the following figure.

As the test shows, reducing the number of keywords greatly reduces shader parsing time and improves loading efficiency.

Load mode

Above, we used concrete experiments to show and analyze the loading performance of shaders, the cause of the cost, and ways to optimize it. In real project development, there is one more point that needs special attention: the way shaders are loaded. In the shader loading profile shown in the figure below, shader loading consumes a lot of CPU time at every scene switch, but most of that time is actually spent parsing the same shaders over and over. The reason is that the shaders are packed into many different AssetBundle files. Each time the scene is switched, those AssetBundles are loaded and unloaded, so a large number of identical shaders are repeatedly loaded and unloaded with them.

To address this problem, if you are using AssetBundles to load resources, we recommend the following loading approach:

1. Using dependency packaging, extract all the shaders in the project into a separate AssetBundle file and make the other AssetBundles depend on it.

2. Load the shader AssetBundle right after the game launches and keep it resident in memory. A project generally has 50~100 kinds of shaders and each one is very small, so even if all of them stay resident, the total memory consumption will not exceed 2MB.

3. When prefabs are loaded and instantiated afterwards, the Unity engine finds the corresponding shader resources directly through the dependencies between AssetBundles, and no further loading or parsing takes place.
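The effect of the approach above can be illustrated with a small counting model. This is a toy Python sketch under stated assumptions, not Unity code: it assumes that a shader is parsed once each time its bundle is loaded, that per-scene bundles are unloaded on every scene switch, and that the shared bundle holds exactly the shaders the scenes use. Both function names and the scene contents are hypothetical.

```python
from typing import List, Set


def parses_with_per_scene_bundles(scenes: List[Set[str]]) -> int:
    """Shaders are packed into per-scene bundles that are unloaded on
    every switch, so each scene parses every shader it uses again."""
    return sum(len(shaders) for shaders in scenes)


def parses_with_shared_bundle(scenes: List[Set[str]]) -> int:
    """All shaders live in one bundle loaded at startup and kept
    resident, so each distinct shader is parsed exactly once."""
    resident: Set[str] = set()
    for shaders in scenes:
        resident |= shaders
    return len(resident)


# Hypothetical sequence of three scene switches.
scenes = [
    {"Mobile/Diffuse", "Mobile/Bumped Diffuse"},
    {"Mobile/Diffuse", "Mobile/VertexLit"},
    {"Mobile/Diffuse", "Mobile/Bumped Diffuse"},
]

print(parses_with_per_scene_bundles(scenes))  # 6
print(parses_with_shared_bundle(scenes))      # 3
```

The more scenes share the same shaders, the larger the gap between the two counts becomes, which is why the repeated parsing shows up at every scene switch in the profile.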

Note: In Unity 4.x, all shaders are loaded and parsed once the shader AssetBundle has been loaded. In Unity 5.x, you need to call LoadAllAssets and, in addition, Shader.WarmupAllShaders, because in Unity 5.x shader parsing and the CreateGPUProgram operation are separate.

For development teams that are on Unity 5.x and load resources with Resources.Load, you can try preloading shaders with a ShaderVariantCollection. This also avoids loading the same shader repeatedly.

Note: On Unity 5.x, if you can load and parse shaders through AssetBundles, we do not recommend handling shader loading through ShaderVariantCollection. In our tests with the latest Unity 5.3.5, ShaderVariantCollection still had some problems with loading and managing shaders. We are not yet sure whether this is an engine issue, and it is beyond the scope of this article, so we will not go into it here.

Based on the above tests and analysis, our recommendations for managing shader resources are as follows:

1. While still meeting the rendering quality and project requirements, reduce the number of shader keywords as far as possible to improve shader load efficiency.

2. For simple shaders, try removing Fallback. This works very well for Mobile/Diffuse, Mobile/Bumped Diffuse and other built-in shaders that are currently in wide use.

3. As far as possible, extract shaders into their own bundle, package them with dependencies and preload them, to avoid unnecessary loading costs later on.

The above is our load-time performance test for shader resources. On the loading module, we will continue with a series of technical articles covering load performance analysis for audio, animation and other resources, resource unloading performance, resource instantiation performance, and the performance of different loading modes. We will also summarize the common problems found in projects that UWA has tested so far, to give you a deeper understanding of your project's load efficiency and better control over the loading module.
