[Unity Skills] Optimization Techniques in Unity
Preface
This article is based on a series of tutorials by Digital Tutors, with some extended content. Digital Tutors is a great tutorial site that covers many areas of the multimedia field. I also referred to a tutorial from Unity Cookie, and there are many other references in the links below.
This article aims to briefly describe common optimization strategies. It does not explain each topic in depth; readers who want more can look up the relevant materials themselves.
There are also some reference articles that I think are very good:
Performance Optimization for Mobile Devices
4 Ways To Increase Performance of your Unity Game
Unite 2013 Optimizing Unity Games for Mobile Platforms
Unity optimization Tips
Factors Affecting Performance
First, we need to know the factors that affect the performance of a game. A game draws on two main computing resources, the CPU and the GPU, which work together to run the game at the expected frame rate and resolution. Roughly speaking, the CPU tends to limit the frame rate, while the GPU tends to limit the resolution.
To sum up, the main performance bottleneck lies in:
- CPU
  - Too many Draw Calls
  - Complex scripts or physics simulation
- GPU
  - Vertex processing
    - Too many vertices
    - Too much per-vertex computation
  - Fragment processing
    - Too many fragments, overdraw
    - Too much per-pixel computation
- Bandwidth
  - Large, uncompressed textures
  - High-resolution framebuffer
For the CPU, the main limit is the number of Draw Calls in the game. So what is a Draw Call? If you have learned OpenGL, you will remember that before each draw we first prepare the vertex data (positions, normals, colors, texture coordinates, and so on), then call a series of APIs to put it where the GPU can access it, and finally call a `glDraw*` command to tell the GPU: "Hey, I have prepared everything, hurry up and render!" Each call to a `glDraw*` command is one Draw Call. So why do Draw Calls become a performance bottleneck (a CPU bottleneck, to be precise)? As mentioned above, each draw requires a Draw Call beforehand. Suppose a scene contains water and a tree: we use one material and shader to render the water, but the tree needs a completely different material and shader, so the CPU must prepare the vertex data again and reset the shader state, which is time-consuming. If every object in a scene uses a different material and texture, a great many Draw Calls are generated, lowering the frame rate and hurting game performance. This is of course a simplified account; search for more details. Other CPU bottlenecks include physics, cloth simulation, particle simulation, and so on.
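To make the idea concrete, here is a small hypothetical sketch (plain C#, not a Unity API): each time the material changes between consecutive objects, the CPU must re-set render state and issue a new draw call, so sorting objects by material keeps identical materials adjacent and reduces the count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DrawCallDemo
{
    // Simplified model: a new draw call is needed whenever the material
    // differs from the previous object's material.
    static int CountDrawCalls(IEnumerable<string> materialsInDrawOrder)
    {
        int drawCalls = 0;
        string current = null;
        foreach (var mat in materialsInDrawOrder)
        {
            if (mat != current)   // material switch -> state change -> new draw call
            {
                drawCalls++;
                current = mat;
            }
        }
        return drawCalls;
    }

    static void Main()
    {
        var unsorted = new[] { "Water", "Tree", "Water", "Tree" };
        Console.WriteLine(CountDrawCalls(unsorted));                  // 4
        Console.WriteLine(CountDrawCalls(unsorted.OrderBy(m => m)));  // 2
    }
}
```

This is only an illustration of why engines sort and batch by material; real renderers track far more state than a material name.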
The GPU is responsible for the whole rendering pipeline. It processes the model data submitted by the CPU, runs the vertex shader, fragment shader, and other stages, and finally outputs each pixel on the screen. Its performance is therefore bound by factors such as the number of vertices to process, the screen resolution, and video memory. GPU bottlenecks occur on both the vertex side and the pixel side. Overdraw is one of the most common pixel-side bottlenecks: it means the same pixel on the screen is drawn more than once.
After learning the basic content above, the following optimization technologies are involved:
- Vertex Optimization
- Using the Level of detail Technology
- Occlusion culling Technology
- Pixel Optimization
- Control the draw Sequence
- Be alert to transparent objects
- Reduce real-time illumination
- CPU Optimization
- Bandwidth Optimization
- Reduce texture size
- Scaling
The first part is vertex optimization.
Vertex Optimization
Optimizing Geometry
This step addresses the "vertex processing" bottleneck above. Geometry here refers to the mesh structure of the objects in the scene.
3D games are created by models. In modeling, we need to remember the following:
Minimize the number of triangles in the model. Vertices that have no influence on the model, or whose removal is hard to detect with the naked eye, should be removed as much as possible. For example, in the left figure below, many vertices of the cube are unnecessary, yet importing this model into Unity gives the result on the right:
In the Game view, we can view the number of triangles and the number of vertices in the scenario:
We can see that a simple cube produces far more vertices than we would like.
At the same time, reuse vertices whenever possible. Many 3D modeling packages have optimization options that automatically optimize the mesh structure. After optimization, a cube may have only eight vertices:
It corresponds to the number of vertices and the number of triangles as follows:
Wait! You may ask: why is the vertex count 24 rather than 8? Artists often run into this problem: the vertex count shown in the modeling software differs from the one in Unity, and Unity's is usually larger. Who is right? In fact they count from different perspectives and each has its rationale, but what we should really care about is Unity's number.
Here is a brief explanation. 3D software understands vertices from a human perspective: one point we see is one vertex. Unity counts vertices from the GPU's perspective, and the GPU may need to process what looks like one vertex as several, producing extra vertices. There are two main causes of vertex splitting: UV splits and smoothing splits. Both arise because, for the GPU, each attribute of a vertex must be in one-to-one correspondence with that vertex. A UV split occurs when a vertex ends up with multiple UV coordinates after modeling. In the cube example above, faces share vertices, so the same vertex may have different UV coordinates on different faces. The GPU cannot handle this, so it splits the vertex into two vertices with different UV coordinates. Smoothing splits arise similarly: a vertex may correspond to multiple normals or tangents, usually because we need to decide whether an edge is a hard edge or a smooth edge. A hard edge typically looks like this (note the crease in the middle):
If you inspect the vertex normals, you will find that each vertex along the crease actually carries two different normals. The GPU cannot represent that in a single vertex, so it splits the vertex in two. A smooth edge, by contrast, looks like this:
The GPU cares only about the number of vertices, so minimizing that number is what really matters. Hence the last optimization suggestion: remove unnecessary hard edges and UV seams, avoiding smoothing splits and UV splits.
Using the Level of detail Technology
Where mipmaps build a pyramid of textures, the level of detail (LOD) technique builds a pyramid of models and selects a model of the appropriate precision based on the distance between the camera and the object. The advantage is that, when appropriate, the number of vertices to draw can be greatly reduced. The disadvantages are that it occupies more memory and, if the switching distances are poorly tuned, can cause a visible pop when models change.
In Unity, you can use the LOD Group component to implement the level of detail technique:
Through the panel above, we can choose the models to control and set the switching distances. The following example shows a barrel going from the complete mesh to a simplified mesh:
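The same setup can also be built from script. Below is a minimal sketch (field names and thresholds are my own example values): LOD 0 shows the full mesh when the object is large on screen, LOD 1 a simplified mesh, and below the last threshold the object is culled.

```csharp
using UnityEngine;

public class BarrelLODSetup : MonoBehaviour
{
    public Renderer fullDetail;  // assigned in the Inspector
    public Renderer lowDetail;

    void Start()
    {
        var group = gameObject.AddComponent<LODGroup>();
        var lods = new LOD[]
        {
            // Argument is the screen-relative height at which this LOD ends.
            new LOD(0.6f,  new[] { fullDetail }),  // screen height > 60%
            new LOD(0.05f, new[] { lowDetail }),   // 5%..60%; culled below 5%
        };
        group.SetLODs(lods);
        group.RecalculateBounds();
    }
}
```

In practice it is usually easier to configure the LOD Group in the Inspector; a script like this is mainly useful when objects are generated at runtime.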
Occlusion culling Technology
Occlusion culling eliminates objects that are hidden behind other objects, so no resources are wasted computing invisible vertices, which improves performance. Unity Taiwan has a series of articles on it (access may require a VPN):
Unity 4.3 Occlusion Culling: Basic
Unity 4.3 Occlusion Culling: Best Practices
Unity 4.3 Occlusion Culling: error diagnosis
You can find the specific content on your own.
Now let's talk about pixel optimization.
Pixel Optimization
The focus of pixel optimization is reducing overdraw. As mentioned earlier, overdraw means the same pixel is drawn multiple times. The key is to control the draw order.
Unity provides a view for inspecting overdraw: in the Scene view, choose Render Mode -> Overdraw. Note that this view only shows how object layers occlude each other, not the final overdraw on screen; in other words, it shows the overdraw that would occur if no depth testing were used. In this view, every object is rendered as a transparent silhouette, and occlusion is judged by how the transparent colors accumulate.
As shown in the figure, the deeper the red, the more serious the overdraw. Here all the transparent objects contribute, which means performance will be heavily affected.
Control the draw Sequence
The main reason to control the draw order is to avoid overdraw as much as possible, that is, to avoid drawing the pixel at the same position repeatedly. On PC, resources are comparatively abundant, so to get the most accurate rendering result the draw order may be back to front for opaque objects, followed by transparent objects for blending. On mobile platforms, however, that order causes a great deal of overdraw and is clearly unsuitable; we should draw front to back as much as possible. The reason front-to-back drawing avoids overdraw is the depth test.
In Unity, objects whose shaders are in the "Geometry" queue are always drawn front to back, while objects in the other built-in queues (such as "Transparent" and "Overlay") are drawn back to front. This means we should put objects into the "Geometry" queue whenever possible.
In addition, we can exploit Unity's queues to control the draw order ourselves. For example, a skybox covers almost every pixel, and we know it will always lie behind all other objects, so its queue can be set to "Geometry+1". It is then drawn after all opaque objects, and the depth test prevents it from causing any overdraw.
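Besides setting the queue in the shader with a `Tags { "Queue" = "Geometry+1" }` line, the queue can be overridden per material from script. A minimal sketch (the component and its placement are my own example):

```csharp
using UnityEngine;

public class SkyboxQueue : MonoBehaviour
{
    void Start()
    {
        // In Unity's built-in queues, Geometry = 2000,
        // so 2001 is equivalent to "Geometry+1" in ShaderLab.
        var mat = GetComponent<Renderer>().material;
        mat.renderQueue = 2001;
    }
}
```

With this setting, the object is drawn last among opaque objects, and the depth test rejects every pixel that is already covered.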
Always be alert to transparent objects
For transparent objects, their very nature (see my earlier article on alpha test and alpha blending) dictates that to get the correct rendering result they must be rendered back to front (leaving depth-based methods aside), with depth writing turned off. This means transparent objects will almost always cause overdraw. If we are not careful, this can hurt performance badly on some machines. For example, GUI objects are mostly semi-transparent; if the GUI occupies a large portion of the screen, and the main camera is left at its default so that it projects across the whole screen, the GUI produces a large amount of overdraw.
Therefore, large areas of transparent objects, many overlapping layers of transparent objects (even if each one is small), or transparent particle effects will all cause heavy overdraw on mobile devices. This should be avoided as much as possible.
For the GUI case, we can minimize the area it occupies in the window. If that is impossible, we can render the GUI and the 3D scene with different cameras, keeping the 3D camera's view from overlapping the GUI as much as possible. In other cases, we can only use transparency sparingly. Of course this affects the game's look, so we can also judge the machine's capability in code: first disable all expensive effects, and if the device turns out to handle things well, enable some of them.
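The "start conservative, enable effects on strong devices" idea can be sketched as below. The thresholds and the `expensiveEffects` array are hypothetical examples, not values from the original article:

```csharp
using UnityEngine;

public class EffectGate : MonoBehaviour
{
    public GameObject[] expensiveEffects;  // particle systems, glows, ...

    void Start()
    {
        // Assume a device with plenty of memory and a modern shader
        // level is "fast"; tune these cutoffs per project.
        bool fastDevice = SystemInfo.systemMemorySize >= 1024
                       && SystemInfo.graphicsShaderLevel >= 30;

        foreach (var fx in expensiveEffects)
            fx.SetActive(fastDevice);  // everything stays off on weak machines
    }
}
```

`SystemInfo` exposes many other fields (GPU name, video memory, supported texture formats) that can feed the same decision.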
Reduce real-time illumination
Real-time lighting is very expensive on mobile platforms. One directional light is fine, but if the scene contains many light sources and uses multi-pass shaders, performance is likely to suffer, and on some machines the shaders may even fail. For example, if a scene contains three per-pixel point lights and uses a per-pixel shader, Draw Calls are likely to triple and overdraw increases as well, because objects lit by per-pixel lights are rendered once more per light. Worse, these additional per-pixel passes cannot be batched, whether by dynamic or static batching (the documentation only mentions the effect on dynamic batching, but my experiments show static batching does not help either), so they interrupt batching.
For example, in the following scene all four objects are marked "Static" and use the built-in Bumped Diffuse shader, and all point lights are marked "Important", i.e., per-pixel lights. After running, Draw Calls is 23, not 3. Only the "Forward Base" pass is statically batched (dynamic batching has completely broken down here because of the multiple passes), saving one Draw Call; in the subsequent "Forward Add" passes, every render is a separate Draw Call (and the Tris and Verts counts also grow):
As the documentation says, the draw calls for "additional per-pixel lights" will not be batched. I am not sure why; there is a discussion here claiming it has no effect on static batching, which differs from my results. Please leave a message if you know the reason; I have also asked in the Unity Forum.
We have seen many successful mobile games whose visuals seem to contain many light sources, but they are all faked.
Use Lightmaps
Lightmapping is a common optimization strategy, mainly used for the overall lighting of a scene. The technique bakes the scene's lighting information into a lightmap texture ahead of time, then samples that texture at runtime to obtain the lighting.
It is often combined with the Light Probes technique. Feng Yuchong has a series of articles on this; they are somewhat dated, but there are many tutorials online.
Use God Rays
God rays simulate the effect of many small light sources. They are generally not produced by real lights; in many cases they are faked with transparent textures. For more information, see my earlier article.
CPU Optimization
Reduce Draw Calls
Batching
The technique that optimization tutorials mention most is batching. As the name suggests, it means processing multiple objects together in one go. What kinds of objects can be processed together? The answer is objects that use the same material. For objects sharing a material, the only difference between them is their vertex data, that is, their meshes, so we can merge the vertex data and send it to the GPU together, completing one batch.
Unity has two kinds of batching: dynamic batching and static batching. For dynamic batching, the good news is that everything is automatic and objects can move; the bad news is that the restrictions are many and the mechanism is easily broken by accident, leaving Unity unable to batch objects that share the same material. For static batching, the good news is high freedom and few restrictions; the bad news is that it may occupy more memory, and statically batched objects can no longer move.
First, dynamic batching.
Unity performs dynamic batching when objects use the same material and meet certain conditions. Unity is always dynamically batching for us without our noticing. Take the following scene:
This scene contains four objects, two of which use the same material. As you can see, Draw Calls is now 3 and "Saved by batching" is 1; that is, batching saved us one Draw Call. Next, let's change the scale of one of the boxes and see what happens:
Draw Calls becomes 4, and the number saved by batching drops to 0. Why, when they still share one material? The reason lies in the other conditions that must be met. Dynamic batching, though automatic, places many demands on the model:
- A model can contain at most 900 vertex attributes in total. This limit may change in the future; do not rely on it.
- In general, objects must all use the same scale (it can be (1, 1, 1), (1, 2, 3), (1.5, 1.4, 1.3), and so on, as long as it is identical for all of them). There is an exception for non-uniform scales (where the factor differs per axis, e.g. (1, 2, 1)): objects can still be batched if every object uses a different non-uniform scale. This requirement looks odd: why would batching depend on scale? It relates to the technique Unity uses behind the scenes; if you are interested, you can Google it, for example here.
- Objects using lightmaps are not batched.
- Multi-pass shaders interrupt batching.
- Objects receiving real-time shadows are not batched.
Besides scale, the most common batch breaker, there is also the constraint on vertex attributes. For example, let's add the unoptimized box model from earlier to the scene above:
Draw Calls jumps to 5. This is because the newly added box model contains 474 vertices, and its vertex attributes include position, UV coordinates, and normals: 474 × 3 = 1422 attributes in total, exceeding 900.
With so many conditions, dynamic batching breaks the moment we are careless, so Unity offers another option: static batching. Continuing the example above, we keep the modified scale but tick the "Static" flag on all four objects:
Clicking the triangle drop-down next to Static shows that this step actually sets many flags; here we only need "Batching Static". Looking at Draw Calls again: still no change. But don't worry. Press Play and the change appears:
Draw Calls returns to 3, and "Saved by batching" is 1 again. That is the benefit of static batching. Moreover, if we inspect a model's Mesh at runtime, we find they have all become something called "Combined Mesh (root: scene)". This mesh is the result of Unity merging all objects marked "Static", four objects in our case:
You may ask: these four objects do not all use the same material, so how can they be merged? Look carefully and you will see "4 submeshes": the combined mesh actually contains four sub-meshes, one per object. For the combined mesh, Unity then batches the sub-meshes that share the same material.
On closer inspection, though, our two boxes share the same mesh, yet after merging it becomes two copies. Also, comparing "VBO total" in the Stats window before and after running, it grows from 241.6KB to 286.2KB. Remember the downside of static batching? It may occupy more memory. The documentation puts it this way:
"Using static batching will require additional memory for storing the combined geometry. if several objects shared the same geometry before static batching, then a copy of geometry will be created for each object, either in the Editor or at runtime. this might not always be a good idea-sometimes you will have to sacrifle ice rendering performance by avoiding static batching for some objects to keep a smaller memory footprint. for example, marking trees as static in a dense forest level can have serous memory impact."
In other words, if some objects shared the same mesh before static batching (like our two boxes), each object gets its own copy of that mesh, so one mesh becomes many and they are all sent to the GPU; that is why the VBO size grows noticeably in the example above. If many objects share a mesh, this becomes a problem, and we may need to forgo static batching, sacrificing some rendering performance. For instance, statically batching a forest of 1000 identical trees produces 1000 copies of the mesh, with serious memory impact. In that case we can either accept trading memory for performance, or skip static batching in favor of dynamic batching (provided every tree uses the same scale, or each uses a different non-uniform scale), or write our own batching method. Dynamic batching is probably the best choice here.
TIPS:
- Prefer static batching whenever possible, but always watch the memory consumption.
- If static batching is not an option and you want dynamic batching, heed the caveats above. For example:
  - Keep such objects few and their vertex attributes small in number.
  - Give them all the same scale, or give each a different non-uniform scale.
- Dynamic batching suits small props, such as collectible coins.
- Objects containing animation cannot be statically batched as a whole, but if some parts never move, we can mark those parts as "Static".
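Static batching can also be triggered from script instead of the Static flag. A minimal sketch: `StaticBatchingUtility.Combine` merges the meshes of all children of a root object at runtime. As with editor static batching, the objects can no longer move afterwards, and the combined copies cost extra memory.

```csharp
using UnityEngine;

public class BatchAtRuntime : MonoBehaviour
{
    public GameObject root;  // parent of the objects to combine

    void Start()
    {
        // Children sharing a material will then render in one batch.
        StaticBatchingUtility.Combine(root);
    }
}
```

This is handy for level geometry that is instantiated at runtime, where the editor's Static checkbox cannot be set in advance.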
Some discussions:
How static batching works
Static batching use a ton of memory?
Unity3D draw call optimization
Merge textures (Atlas)
Although batching is a good approach, its rules are easy to break. For example, objects in a scene may all use Diffuse materials but different textures. It is therefore a good idea to merge small textures into a larger atlas whenever possible.
Use the mesh's vertex data
Sometimes, apart from different textures, different objects need small parameter differences in their materials, such as different colors or float parameters. But the iron law is: both dynamic and static batching require the same material. The same, not merely identical; the material references must point to the same object. This means that as soon as we adjust a parameter, every object using that material is affected. So how do we make small per-object adjustments? Since Unity's rule is rigid, we must resort to tricks, and one of them is to use the mesh's vertex data (most commonly the vertex color).
As described above, batched objects are merged into a VBO and sent to the GPU together. The data in the VBO is passed as input to the vertex shader, so we can cleverly control that data to achieve different per-object effects. For example, all the trees in a forest use the same material and we want them to batch dynamically, but different trees should have different colors. We can make that work by adjusting the vertex data of each tree's mesh; see the following article for details.
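A minimal sketch of the trick (the tint range is a made-up example, and the shader is assumed to read the vertex color via its COLOR input): every tree shares one material, so batching still works, while the per-tree tint is baked into the mesh's vertex colors.

```csharp
using UnityEngine;

public class TintByVertexColor : MonoBehaviour
{
    void Start()
    {
        // Accessing .mesh (not .sharedMesh) instantiates a copy,
        // so we modify this tree's mesh rather than the shared asset.
        var mesh = GetComponent<MeshFilter>().mesh;
        var colors = new Color[mesh.vertexCount];
        var tint = new Color(Random.Range(0.5f, 1f), 1f, 0.5f);

        for (int i = 0; i < colors.Length; i++)
            colors[i] = tint;   // same tint for the whole tree

        mesh.colors = colors;   // the shader multiplies by this color
    }
}
```

Dynamic batching allows different meshes under the same material, so the instantiated meshes do not break it (subject to the vertex-attribute limit discussed earlier).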
However, this method needs extra memory to store the vertex data that carries the per-object parameters. There is no perfect solution.
Bandwidth Optimization
Reduce texture size
As mentioned earlier, a texture atlas helps reduce Draw Calls, but the size of the textures themselves also matters. Before anything else: textures should ideally be square, with side lengths that are powers of two, because many optimization strategies only take full effect under those conditions.
You can view texture parameters in Unity through the texture panel:
You can adjust the parameters through the texture's Advanced panel:
For details on these parameters, see the official documentation. The optimization-related options are "Generate Mip Maps", "Max Size", and "Format".
"Generate Mip Maps" creates many small textures of different sizes for the same texture to form a texture pyramid. In the game, you can dynamically choose which texture to use based on the distance between objects. This is because even if we use a very fine texture when we are far away from an object, it cannot be identified by the naked eye, in this case, we can use smaller and more fuzzy textures instead, which greatly reduces the number of access pixels. But its disadvantage is that it needs to create an image pyramid for each texture, so it needs to occupy more memory. For example, in the above example, before selecting "Generate Mip Maps", the memory usage is 0.5 Mb, and after selecting "Generate Mip Maps", the memory usage is 0.7 MB. In addition to memory usage, we do not want to use Mipmaps, such as GUI textures, in some cases. We can also view the generated Mip Maps in the panel:
Unity can also visualize mipmap usage for the objects in a scene; more precisely, it shows each object's ideal texture size. Red means the object could use a smaller texture, and blue means a larger texture is needed.
"Max Size" determines the length and width of the texture. If the texture itself exceeds the maximum value, Unity will narrow down the texture to meet this condition. Again, the aspect ratio of all textures is preferably a square, and the length value is preferably an integer power of 2. This is because there are many optimization strategies that can be used to the maximum extent only in this case.
"Format" is used to compress the texture. Generally, you can select this automatic mode. Unity will be responsible for selecting the appropriate compression mode based on different platforms. For GUI textures, we can choose whether to perform Compression Based on the image quality requirements. For details, see the previous article on image quality.
We can also ship textures at different resolutions for different devices, so the game still runs on older machines.
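One coarse way to do this without authoring multiple texture sets is Unity's global mipmap bias. A hedged sketch (the memory threshold is a made-up example, and it requires mipmaps to be enabled on the textures):

```csharp
using UnityEngine;

public class TextureDetail : MonoBehaviour
{
    void Awake()
    {
        // masterTextureLimit skips the top n mip levels globally:
        // 0 = full size, 1 = half resolution, 2 = quarter, ...
        // On low-memory devices, drop every texture to half resolution.
        QualitySettings.masterTextureLimit =
            SystemInfo.systemMemorySize < 512 ? 1 : 0;
    }
}
```

This trades texture sharpness for memory and bandwidth across the whole game at once, which is often good enough for a low-end fallback.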
Scaling
Resolution itself is often a cause of poor performance. Many domestic phones in particular pair a very high resolution with otherwise mediocre hardware, which hits exactly the two bottlenecks of game performance: a large screen resolution plus a weak GPU. We may therefore need to scale down the resolution on specific devices. This degrades the visuals, of course, but performance versus image quality is always a trade-off.
In Unity you can call Screen.SetResolution directly to set the screen resolution. There are some subtleties in actual use; Yu Song MOMO has an article about this technique worth reading.
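A minimal sketch of the idea (the 0.75 factor and the video-memory threshold are example values, not recommendations): render at a fraction of the native resolution on weak devices.

```csharp
using UnityEngine;

public class ResolutionScaler : MonoBehaviour
{
    void Start()
    {
        int w = Screen.width, h = Screen.height;
        bool weakGpu = SystemInfo.graphicsMemorySize < 256;  // example threshold (MB)

        if (weakGpu)  // drop to 75% of native resolution, keep fullscreen
            Screen.SetResolution((int)(w * 0.75f), (int)(h * 0.75f), true);
    }
}
```

Since Screen.width and Screen.height report the current resolution, read them once at startup before the first SetResolution call if you need the native values later.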
Conclusion
This article is only a summary, so it does not explain each technique in detail. I strongly recommend reading the various links at the beginning of the article; they are well written.