[Dog Learning Network] Over the past month I have been supporting several projects built on next-generation Unity rendering, all of them related to physically based rendering (PBR). I recommend Marmoset Skyshop, one of the better PBR solutions for Unity 4.x. In terms of usage and performance, however, it still has pitfalls and shortcomings. Here are some of my experiences; I hope they help others who are interested in this solution.
I. FPS drop
In content, there is no difference between the version of the project from before the National Day holiday and the latest version. On a real device, however, the new version runs poorly, at only 7.5 fps.
Development and runtime environment: Xcode 6.1, iOS 8.1, iPad Air, resolution 2048x1536, graphics level GLES 3.0.
Figure 1: Running the old version. For comparison, I copied the scene file from the new version into the old project.
In the same scene, the new version reaches fewer than 8 fps.
I then used Xcode to analyze both versions. FPS analysis of the old version:
FPS analysis of the new version: the same shader takes 5 times as many milliseconds as in the old version.
Locating the bug
1. At first I assumed it was a Marmoset version problem, but after overwriting it completely with the old version, the fps did not improve.
2. After deleting all the resources related to the scene and rebuilding, the fps was restored. However, this is only a temporary fix. It should be enough to replace the files in ProjectSettings with those from the old project; troubleshooting of the specific cause continues.
In addition, some versions of Skyshop's Mobile shaders do not support GLES 3.0. You need to add the gles3 keyword to every Marmoset shader used in the scenes:
#pragma surface MarmosetSurf MarmosetDirect vertex:MarmosetVert exclude_path:prepass noforwardadd approxview
#pragma only_renderers d3d9 opengl gles d3d11 d3d11_9x gles3
Fortunately, Marmoset is designed as an UberShader, so only a few places need to be changed. To make Unity use ES 3.0, select Automatic or Force GLES 3.0 under Graphics Level in Player Settings.
If only ES 2.0 is used, Unity replaces texCubeLod with texCube or texCubeBias because GLES 2.0 does not support it, and this affects the use of the cubemap's mipmaps to implement roughness: the roughness texture may have no visible effect in the final image. Once this bug is solved, performance should match the report from before October 1: the iPad Air can handle full screen, with 100 draw calls of IBL objects. Beyond that, the IBL shader is the bottleneck, and it should be in the pixel shader: if one or two IBL materials cover the entire 2048x1536 screen, that is also a potential bottleneck.
II. Solving the problem in another scene
Draw calls can be checked first in the Editor or with the online Profiler. Some devices are not supported by the Unity Editor's GPU profiler, so Xcode is used.
Without any optimization there are 592 draw calls, and the two shader instances that take the most GPU time, 9.8 ms and 7.94 ms, are both used by the terrain (terrain, other, and shadow textures). With shadows disabled, the count drops to 289 draw calls. With the terrain disabled, 289 -> 219 (terrain: 70 dc), and with the water-surface reflection disabled, -> 246 dc.
First of all, the terrain is heavy in both shader cost and draw calls. Xcode's frame view shows the terrain being drawn in unreasonable ways. Judging from the scene, 70 draw calls for this kind of terrain is excessive. As for the shader, if the shadows are statically baked, they can be merged into a single shader for drawing. We recommend starting the optimization with the terrain: reduce the draw calls and merge the shaders. If Unity's terrain cannot be optimized well, build a mesh in 3ds Max to replace it directly.
On the water surface, the reflection can be optimized: reflections of the static scene can be pre-baked into a texture, and the objects that still need to be reflected should be placed on a separate layer, to reduce the number of draw calls.
Xcode's analysis also points at the camera's view distance, angle, and shader LOD: a mountain very far away is still drawn, with a shader of the same high quality as the nearby character. This should be solvable with Unity's built-in settings. In addition, the camera angle in this demo is too low, so distant objects are rendered as well. Adjusting the camera angle, for example to the traditional 45-degree view, should cull some scene objects and reduce both draw calls and pixel-shader fill rate.
III. Suggestions for improving the Marmoset shaders
1. If you do not use Skyshop's dynamic skybox feature, SkyManager's update function can be disabled:
public void LateUpdate() {
    if (firstFrame) {
        if (_GlobalSky) {
            firstFrame = false;
            _GlobalSky.Apply(0);
            _GlobalSky.Apply(1);
            if (_SkyboxMaterial) {
                _GlobalSky.Apply(_SkyboxMaterial, 0);
                _GlobalSky.Apply(_SkyboxMaterial, 1);
            }
        }
    }
#if UNITY_EDITOR
    if (!Application.isPlaying) return;
#endif
Returning directly before #if UNITY_EDITOR saves the CPU cost and GC of SkyManager's update.
2. GlossyMap usage. While trying to solve the 7 fps problem, we tried one approach. In Xcode's analysis we directly modified the line
tmpvar_40 = textureLod(_SpecCubeIBL, lookup_38.xyz, lookup_38.w);
to
tmpvar_40 = textureLod(_SpecCubeIBL, lookup_38.xyz, 1.0);
which more than doubled the efficiency.
highp float glossLod_36;
glossLod_36 = tmpvar_27;
mediump vec4 spec_37;
mediump vec4 lookup_38;
highp vec4 tmpvar_39;
tmpvar_39.xyz = (((v_33.xyz * tmpvar_32.x) + (v_34.xyz * tmpvar_32.y)) + (v_35.xyz * tmpvar_32.z));
tmpvar_39.w = glossLod_36;
lookup_38 = tmpvar_39;
lowp vec4 tmpvar_40;
tmpvar_40 = textureLod(_SpecCubeIBL, lookup_38.xyz, lookup_38.w);
Above is the unmodified shader. In it, the value lookup_38.w is computed in the shader from the value of the glossy map.
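To illustrate how a glossy-map value can become the LOD carried in lookup_38.w, here is a minimal sketch written in Python rather than shader code. The linear mapping and the names gloss_to_lod, mip_size, and mip_count are assumptions for illustration only; the article does not show Marmoset's actual formula.

```python
def gloss_to_lod(gloss: float, mip_count: int) -> float:
    """Map a glossiness value in [0, 1] to a cubemap mip level.

    Assumed convention (not taken from Marmoset): gloss 1.0 samples the
    sharpest mip (LOD 0) and gloss 0.0 samples the blurriest mip
    (LOD mip_count - 1).
    """
    if mip_count < 1:
        raise ValueError("mip_count must be at least 1")
    gloss = min(max(gloss, 0.0), 1.0)  # clamp, like saturate() in a shader
    return (1.0 - gloss) * (mip_count - 1)


def mip_size(base_size: int, lod: float) -> int:
    """Edge length in texels of the mip nearest to the given LOD."""
    level = int(round(lod))
    return max(base_size >> level, 1)
```

Under these assumptions, a 256-texel cubemap with 9 mips and a gloss of 0.5 gives LOD 4.0, the 16-texel mip, while a constant LOD of 1.0, as in the experiment described earlier, always reads the 128-texel mip regardless of the glossy map.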
tmpvar_39.w = glossLod_36;
lookup_38 = tmpvar_39;
lowp vec4 tmpvar_40;
tmpvar_40 = textureLod(_SpecCubeIBL, lookup_38.xyz, 1.0);
This version passes a constant value directly in the shader. After the hot update, the number of milliseconds the shader takes drops somewhat.
However, after restarting the game, the ms cost drops by about 40%. This is due to an iOS optimization: if the lookup is already determined before the shader runs, the pre-fetched texture is used directly, and there is no need to re-execute a sample in every pixel shader invocation. We therefore recommend using glossy maps in moderation; for materials that need no detail, use a constant roughness parameter directly. There is also the optimization of the PBR shader algorithm itself. Marmoset has a MARMO_HQ keyword internally, and toggling it provides some optimization. First, a comparison of the two quality levels:
With MARMO_HQ, the difference lies in the shader's precision and in the brightness of the output lighting.
MARMO_HQ mainly normalizes vectors, which plays a role similar to energy conservation. Together with the choice of Fresnel function, it makes the total amount of light output by the render more physically plausible.
// Self-shadowing blinn
#ifdef MARMO_DIFFUSE_DIRECT
spec *= saturate(10.0 * dp);
#else
spec *= saturate(10.0 * dot(N, L));
#endif
#ifdef MARMO_HQ
localN = normalize(localN);
#endif
The above is a sample of the normalization settings in the shader code. There are other optimizations possible in the shader.
For example, Specular Intensity, Sharpness, and Fresnel Strength are not standard parameters of a PBR material. Generally you only need one roughness parameter, either a roughness value or a glossy map; dropping the other parameters reduces the shader's calculation workload. Switching between glossy map and roughness requires shader support, and the editor must be modified as well.
For the Fresnel reflection equation, Marmoset provides two methods, splineFresnel and fastFresnel, but the actual calculation cost is still large. Following the Unity 5 approach presented at GDC 2014, it can be changed to a simple power-of-4 pow form. Determining the direction for texCubeLod and the algorithm for the dump value also leave room for optimization.
The UberShader design is an advantage. It reduces the workload during development, and at runtime only a small number of shaders are created and compiled, which makes it easier to analyze and locate shader problems and to optimize and debug in Xcode.
In addition, the SkyManager part of Marmoset still needs to be modified. Its original design goal is to generate the IBL cubemap from the skybox, whereas real IBL lighting is generated from the surrounding environment, so Skyshop's approach of one unified cubemap for the whole scene is open to question in terms of realism and effect. Its IBL generation and management interfaces are not very convenient for games. The advantage is that it ships all the code for the editor and for cubemap generation, so both extending it and fixing bugs are feasible.
In Unity 4.x, mobile cannot use deferred rendering, so light sources in the scene are tightly restricted: one directional light is the limit, and other lighting has to come from lightmaps. The specific lightmap algorithm also has to be implemented by the user. Marmoset implements directional-lightmap lighting in MarmosetDirect.cginc. The function name LightingMarmosetDirect_DirLightmap must follow Unity's custom shader naming convention so that the Unity rendering pipeline recognizes it automatically. If you want to optimize it, or to support other lightmap types such as Dual Lightmaps, you had better provide your own implementation. We hope to add TBDR support in our own projects in the future to support a large number of light sources.
Finally, a comparison with UE4's mobile offering:
UE4's Sun Temple project is a good reference: it achieves good results even under the GLES 2.0 graphics specification.
UE4's mobile graphics setup is essentially one directional light plus distance-field shadows to produce lighting and shadows. Other lighting in the scene, in its various colors and intensities, is implemented with lightmaps. For IBL, you can place a Reflection Capture to generate IBL for the objects in a specified region, so multiple cubemaps can be used in the same scene (in the slides he says mobile uses one unified cubemap; this needs to be verified by analyzing the project on a real device). There are also post-processing effects such as bloom, AA, light shafts, and DOF, but because of the limits of the graphics specification, ES 2.0 does not support true HDR. Next, I will prepare an analysis of UE4's rendering and optimization methods.
IV. Conclusion
This Unity PBR experiment shows that making PBR games on hardware of the iPad Air and K1 class is feasible. At first I worried about the fill rate on a Retina screen, but in actual tests it was workable. However, the whole development team needs some optimization awareness to keep runtime efficiency good, for example in how IBL is allocated. On the production side, consider what kind of game will actually benefit from PBR rendering, in particular how indirect lighting improves the quality of game scenes (the most expensive part, the IBL shader, is what provides the specular highlights of indirect lighting). Also use Unity's batching to minimize draw calls and shader state switches.
Unfortunately, because of the 7 fps bug, this time we had no time to implement Unity's post-effect part. Personally, I think this part could be ported from UE4. UE4 is a strong competitor to Unity, and comparing against it is very helpful for future PBR performance testing and optimization.
Disclaimer: This document is from the "Dog Learning Network" community. It is a Unity3D learning article published by a community member. If any content infringes on your rights and interests, please contact the official website and it will be handled promptly. More highlights: www.gopedu.com