Let's Talk About Optimization: From Draw Calls to GC



Contents

    • Preface
    • Where should optimization start?
    • CPU optimization
    • GPU optimization
    • Memory optimization
    • Update: using the Unity Profiler to monitor memory

Preface

When I first wrote this article, I gave it a very plain title: "A Complete Analysis of Unity3D Optimization". I wrote it on a whim, and it mostly states established facts, so it leaves me much less room for my usual joking around. Still, I think this topic has to be covered: many people neglect it and end up piecing together scraps of information from the web. It was also just before the New Year holiday, and I wanted to write something over the weekend to welcome it, without being sure anyone would read it. So I simply dashed this article off. The old title was plain because "Unity3D" pinned down the direction and left little room for imagination. A grandiose word like "complete" would also have made me an easy target... So I changed it to the title you see now. Enough talk; here is the main text. As the saying goes, "a faint trail in the grass can run a thousand miles", so let's start at the beginning~

Where Should Optimization Start?

My impression is that whenever people talk about optimizing a Unity3D project, they bring up draw calls. That is naturally correct, but it also has a bad side effect: it gives people the wrong idea that optimization simply means cutting the number of draw calls.

Quite a few people have this first impression of optimization. The draw call count is indeed an important metric, but it is not everything. So that we share as much common ground as possible, I will first introduce several concepts this article uses. Then I will lay out the three main areas of optimization:

    • What's a draw call? It's a call to the underlying graphics API (for example, OpenGL ES) to draw something on the screen. So who makes these calls? The CPU.
    • What's a fragment? People often talk about "VF" (vertex and fragment). Everyone knows a vertex is a vertex, but what is a fragment? Before answering, let's talk about pixels. A pixel is the basic unit that makes up a digital image. And a fragment? It is something that may become a pixel. Why "may"? Because it will not necessarily end up being drawn; it is a potential pixel. Who handles this? The GPU.
    • What's batching? Batching takes objects that would otherwise need many calls (draw calls) and merges them, so the underlying graphics API only has to be called once. It sounds like the ultimate optimization! However, ideals are beautiful and reality is cruel; we will get to its drawbacks later.
    • Memory allocation: remember that besides Unity3D's own memory usage, there is also Mono with its managed heap. And if that is not enough, you may have brought in a few DLLs of your own. All of these are memory costs that need to be taken into account.

With these concepts explained up front, you can probably already see the areas of optimization I think deserve attention:

    • CPU
    • GPU
    • Memory

So this article follows the order CPU -> GPU -> memory.

CPU Optimization

As mentioned above, draw calls affect CPU efficiency and are also the best-known optimization point. But besides draw calls, what other factors affect CPU efficiency? Here is a list of what comes to mind for now:

    • Draw calls
    • Physics components
    • GC (What? Isn't GC about memory? Don't lie to me! Well, GC does manage memory, but who does the work of running the GC?)
    • And, of course, code quality.

As I said earlier, a draw call is the CPU calling the underlying graphics API. Suppose there are thousands of objects, each needing its own call, and each call costs the CPU a lot of work; the CPU will be overwhelmed. For the GPU, however, the amount of graphics work is the same either way. So draw call optimization is mainly about freeing the CPU from the overhead of calling the graphics API. The main idea is to render each object as few times as possible, and to render multiple objects together where possible. This idea leads to the following approaches:

    1. Use draw call batching. Unity can combine some objects at run time and render them with a single draw call. Details follow below.
    2. Reduce the number of materials by packing textures into an atlas.
    3. Use reflections, shadows and similar effects as little as possible, because they make an object render more than once.
Draw Call Batching

First, we need to understand why two objects that do not use the same material cannot reduce the number of draw calls or improve performance, even with batching.

The reason is that the meshes of the objects being batched must use the same material, and therefore the same texture, to be rendered together in one call. Sharing a material guarantees that they are rendered with the same textures.

Therefore, to combine 2 different textures, we take the second approach listed above and pack the textures into an atlas. The 2 textures then become a single texture, so one material can replace the previous 2 materials.
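To make the atlas idea concrete, here is a minimal C# sketch of the bookkeeping involved. It assumes each original texture has been placed as a pixel rectangle inside the atlas. A UV coordinate of the original texture is remapped into that rectangle, so meshes that used either texture can share one material. The `AtlasRect` type and `AtlasUv.Remap` method are hypothetical names for illustration. In Unity, `Texture2D.PackTextures` builds an atlas and returns the normalized rectangles for you.

```csharp
using System;

// Placement of one original texture inside the atlas, in pixels (hypothetical type).
public struct AtlasRect
{
    public readonly int X, Y, Width, Height;

    public AtlasRect(int x, int y, int width, int height)
    {
        X = x;
        Y = y;
        Width = width;
        Height = height;
    }
}

public static class AtlasUv
{
    // Maps a UV coordinate (0..1) of the original texture to the matching
    // UV coordinate inside the atlas texture.
    public static void Remap(float u, float v, AtlasRect rect, int atlasWidth, int atlasHeight,
                             out float atlasU, out float atlasV)
    {
        if (atlasWidth <= 0 || atlasHeight <= 0)
            throw new ArgumentOutOfRangeException("atlasWidth", "Atlas size must be positive.");
        // Offset into the sub-rectangle, then normalize by the atlas size.
        atlasU = (rect.X + u * rect.Width) / atlasWidth;
        atlasV = (rect.Y + v * rect.Height) / atlasHeight;
    }
}
```

Take a 1024x1024 atlas in which a 512x512 texture sits at pixel (512, 0). The center of that texture, (0.5, 0.5), maps to atlas UV (0.75, 0.25).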

Draw call batching itself is further divided into 2 kinds.

Static Batching

Judging by the name, you can guess where it is used.

Static? That means not moving. What else? It sounds like something whose state never changes and has no "life", such as mountains, rocks and buildings. What does that remind you of? Clever readers will have guessed: scenery! So our scenes look like a good place to use this approach to reduce draw calls.

So, a definition: as long as the objects do not move and share the same material, static batching lets the engine batch geometry of any size to reduce draw calls.

So how do you use static batching to reduce draw calls? You only need to mark which objects are stationary and will never move, rotate or scale in the game. To do that, just tick the Static checkbox in the Inspector.

What is the effect?

For example, create 4 new objects: a Cube, a Sphere, a Capsule and a Cylinder. They have different meshes but share the same material (Default-Diffuse).

First, without marking them static, the draw call count is 4.

Now we mark all 4 of them as static and run again: the draw call count becomes 1, and "Saved by batching" becomes 3.
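The counting in this example can be expressed as a toy model in C#. This is not Unity's actual batching code. It assumes that every non-static object costs one draw call and that all static objects sharing a material collapse into one. That is enough to reproduce the 4 -> 1 numbers above. `RenderItem` and `BatchingModel` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of one renderable object in this toy model.
public sealed class RenderItem
{
    public readonly string Mesh;
    public readonly string Material;
    public readonly bool IsStatic;

    public RenderItem(string mesh, string material, bool isStatic)
    {
        Mesh = mesh;
        Material = material;
        IsStatic = isStatic;
    }
}

public static class BatchingModel
{
    // Toy model of static batching: each non-static object costs one draw
    // call, while all static objects that share a material merge into one.
    public static int CountDrawCalls(IList<RenderItem> items)
    {
        if (items == null) throw new ArgumentNullException("items");
        var batchedMaterials = new HashSet<string>();
        int drawCalls = 0;
        for (int i = 0; i < items.Count; i++)
        {
            RenderItem item = items[i];
            if (!item.IsStatic)
                drawCalls++;                                // drawn on its own
            else if (batchedMaterials.Add(item.Material))
                drawCalls++;                                // first object of a new static batch
        }
        return drawCalls;
    }

    // "Saved by batching": draw calls without batching minus draw calls with it.
    public static int SavedByBatching(IList<RenderItem> items)
    {
        return items.Count - CountDrawCalls(items);
    }
}
```

The mesh name is carried only for readability. In this model, as in the example, differing meshes do not stop static objects with a shared material from being merged.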

One of the many benefits of static batching is that it has far fewer constraints than dynamic batching, described below. So static batching is generally the recommended way to reduce draw calls. Next, let's talk about dynamic batching.

Dynamic Batching

Where there is yin there is yang, and where there is static there is dynamic, so static batching is naturally followed by dynamic batching. The first thing to be clear about is that Unity3D's dynamic batching is done automatically by the engine. Unlike static batching, you don't have to mark anything as static. Take dynamically instantiated prefabs as an example: if the dynamic objects share the same material, the engine automatically batches them to reduce draw calls. First, we make a cube into a prefab, instantiate it 500 times, and check the draw call count.

    // prefab: a GameObject field that references the cube prefab
    for (int i = 0; i < 500; i++)
    {
        GameObject cube;
        cube = GameObject.Instantiate(prefab) as GameObject;
    }


You can see that the draw call count is 1 and "Saved by batching" is 499. In this process we did nothing except instantiate the objects; the Unity3D engine handled the batching for us automatically.

But many people run into this situation: my objects are also instantiated from a prefab, so why is my draw call count still so high? As I said earlier, dynamic batching has many constraints. Below, I will show how even a simple object such as a cube can make the draw call count soar if you are slightly careless.

Again we create 500 objects, but give 100 of them each a different size, that is, a different scale.

    for (int i = 0; i < 500; i++)
    {
        GameObject cube = GameObject.Instantiate(prefab) as GameObject;
        // i / 100 == 0 holds for i = 0..99, so the first 100 cubes each get their own scale
        if (i / 100 == 0)
        {
            cube.transform.localScale = new Vector3(2 + i, 2 + i, 2 + i);
        }
    }


The draw call count rises to 101, and "Saved by batching" drops to 399. So even for simple cubes, giving them different scales prevents them from being batched. And this is only one of the constraints on dynamic batching. Let's summarize them; this may also explain why dynamic batching isn't working in your own project:

    1. Batching dynamic objects has a per-vertex overhead, so dynamic batching only applies to meshes with at most 900 vertex attributes in total.
    2. If your shader uses vertex position, normal and a single UV, you can batch objects of up to 300 vertices. If it uses vertex position, normal, UV0, UV1 and tangent, you can batch only objects of up to 180 vertices.
    3. Don't use scaling. Two objects with scales (1,1,1) and (2,2,2) will not be batched.
    4. Uniformly scaled objects are not batched with non-uniformly scaled objects.
    5. Two objects with scales (1,1,1) and (1,2,1) will not be batched, but two objects with scales (1,2,1) and (1,3,1) will be.
    6. Using different material instances prevents batching, even if the materials are essentially the same.
    7. Lightmapped objects carry additional (hidden) material parameters, such as the lightmap offset and scale. Therefore, lightmapped objects will not be batched (unless they point to the same part of the lightmap).
    8. Multi-pass shaders break batching. For example, almost all shaders in Unity support multiple lights in forward rendering, which effectively adds extra passes for them.
    9. Instances of a prefab automatically use the same mesh and material.
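Constraints 1 and 2 are really the same rule: the 900 budget is shared by every attribute the shader reads per vertex. Here is a small sketch of that arithmetic. It assumes the 900-attribute limit stated above; the exact limits may differ between Unity releases, so check the documentation for your version. `DynamicBatchingBudget` is a hypothetical helper.

```csharp
using System;

public static class DynamicBatchingBudget
{
    // Total vertex-attribute budget for one dynamically batched mesh (from the list above).
    public const int MaxVertexAttributes = 900;

    // Largest vertex count that still fits the budget, given how many
    // attributes (position, normal, UV0, ...) the shader reads per vertex.
    public static int MaxVertices(int attributesPerVertex)
    {
        if (attributesPerVertex <= 0)
            throw new ArgumentOutOfRangeException("attributesPerVertex");
        return MaxVertexAttributes / attributesPerVertex;
    }

    public static bool CanBatch(int vertexCount, int attributesPerVertex)
    {
        return vertexCount <= MaxVertices(attributesPerVertex);
    }
}
```

Position + normal + UV is 3 attributes, so 900 / 3 = 300 vertices. Position + normal + UV0 + UV1 + tangent is 5 attributes, so 900 / 5 = 180 vertices.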

Therefore, use static batching as much as possible.

Physics Components

Once, in a strategy game, I needed to place formations on a grid and detect which soldier was in which cell, and I chose raycasts for this. There were many soldier units, and for accuracy the detection ran every frame, so the CPU load was miserable. I eventually abandoned that approach, and I have been a little wary of physics components ever since.

Here I will mention only the 2 optimizations I consider most important:

1. Set a suitable Fixed Timestep. The setting is under Edit > Project Settings > Time.

What counts as "suitable"? First we need to understand the relationship between the fixed timestep and physics. What matters most in a physics component, that is, a component that simulates physical effects in a game? Computation. Presenting realistic physics in a virtual game takes computation, and the fixed timestep controls how often that computation runs: the smaller the timestep, the more often physics is computed. If the frequency is too high, CPU overhead naturally rises. If it is too low to meet the game design's requirements, features may not work correctly. So analyze your specific case and choose a suitable value.
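To see why the timestep drives CPU cost, here is a sketch of the classic fixed-timestep accumulator, the general pattern behind Unity's FixedUpdate. Real frame time is accumulated, and physics steps run until the simulation catches up. Times are integer microseconds so the arithmetic is exact. `FixedStepModel` is a hypothetical name, and Unity's own scheduling details (such as its Maximum Allowed Timestep cap) are not modeled.

```csharp
using System;

public static class FixedStepModel
{
    // Counts how many fixed-timestep physics updates run over a series of
    // frames of equal length, using the accumulator pattern.
    public static int CountPhysicsSteps(long frameTimeUs, int frameCount, long fixedStepUs)
    {
        if (fixedStepUs <= 0) throw new ArgumentOutOfRangeException("fixedStepUs");
        long accumulator = 0;
        int steps = 0;
        for (int frame = 0; frame < frameCount; frame++)
        {
            accumulator += frameTimeUs;          // real time that passed this frame
            while (accumulator >= fixedStepUs)   // run physics until it catches up
            {
                accumulator -= fixedStepUs;      // one physics step (one FixedUpdate)
                steps++;
            }
        }
        return steps;
    }
}
```

Consider 60 frames of about 1/60 s (16667 microseconds each). Unity's default Fixed Timestep of 0.02 s runs 50 physics updates in that second. Halving the step to 0.01 s doubles that to 100, which means twice the physics work on the CPU.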

2. Avoid Mesh Colliders. Why? Because they are too complex. A Mesh Collider takes a mesh asset and builds a collider from it. For collision detection on complex meshes it is much more accurate than the primitive colliders, and a Mesh Collider marked as Convex can collide with other Mesh Colliders. Search the web for images of Mesh Colliders and you will understand at once. Our mobile games naturally don't need something this expensive.

Of course, from the standpoint of performance optimization, the less you use physics components, the better.

GC: Handles Memory, Hurts the CPU

Does it feel strange to discuss GC in the CPU section? I don't think it is. Although GC is used to manage memory, it does increase CPU overhead. It achieves the goal of freeing memory, but at a heavy price, adding to the CPU's burden. So the goal of GC optimization is to trigger GC as rarely as possible.

First, be clear that GC is a mechanism of the Mono runtime, not of the Unity3D engine. It targets Mono objects and manages Mono's managed heap. With that in mind, you will understand that GC is not designed to release the engine's assets (textures, sounds, etc.). The Unity3D engine has its own memory heap rather than using Mono's managed heap.

Second, we need to know what gets allocated on the managed heap: reference types, such as class instances, strings and arrays. Value types such as int, float and structs are not allocated on the heap as separate objects. As local variables they live on the stack, and a value-type field of a class instance is stored inline inside that object. So what we need to watch is class instances, strings and arrays.
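The value-type versus reference-type distinction is easy to see in plain C#. Assigning a struct copies its data. Assigning a class instance copies a reference to one heap object, and that object is what the GC later has to reclaim. `PointValue`, `PointRef` and `CopySemantics` are hypothetical names for this sketch.

```csharp
// Value type: the data itself is copied on assignment.
public struct PointValue
{
    public int X;
}

// Reference type: instances live on the managed heap.
public class PointRef
{
    public int X;
}

public static class CopySemantics
{
    // Assigning a struct copies its data; the original is untouched.
    public static int MutateStructCopy()
    {
        var a = new PointValue { X = 1 };
        var b = a;     // b is an independent copy
        b.X = 42;
        return a.X;    // still 1
    }

    // Assigning a class instance copies only the reference; both variables
    // point to the same heap object, which the GC reclaims once unreachable.
    public static int MutateClassReference()
    {
        var a = new PointRef { X = 1 };   // heap allocation
        var b = a;
        b.X = 42;
        return a.X;    // 42
    }
}
```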

So when is GC triggered? In two cases:

    1. First, of course, when the heap runs low on memory, GC is invoked automatically.
    2. Second, we programmers can invoke GC manually (for example with System.GC.Collect()).

Therefore, to optimize CPU usage we must not trigger GC frequently. As noted above, GC deals with the managed heap, not the Unity3D engine's resources, so GC optimization is, plainly put, code optimization. I think the following points deserve attention:

    1. String concatenation. Concatenating two strings actually creates a new string, and the old strings become garbage. Strings are reference types allocated on the heap, so the space of the discarded strings has to be reclaimed by the GC.
    2. Try not to use foreach; use for instead. foreach uses an enumerator. Reportedly, with the Mono compiler Unity used at the time, each iteration of the loop produces 24 bytes of garbage, so looping 10 times produces 240 bytes.
    3. Do not access a GameObject's tag property directly. For example, replace if (go.tag == "Human") with if (go.CompareTag("Human")), because reading an object's tag property allocates a new string on the heap. Do this in a loop and imagine the garbage left behind.
    4. Use pools so that memory can be reused.
    5. Preferably avoid LINQ, because it allocates temporary objects that the GC must later collect. What I dislike even more about LINQ is that in some cases it may not work well under AOT compilation. For example, OrderBy generates an internal generic class, OrderedEnumerable. This class may not be generated at AOT compile time, because it is only used inside the OrderBy method. So if you use OrderBy, you may get errors on the iOS platform.
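Points 1 and 4 can be sketched in plain C#. The first helper shows why concatenation in a loop creates garbage, and how a reused `StringBuilder` avoids most of it. The second is a minimal generic object pool. All names here (`GcFriendly`, `SimplePool<T>`) are hypothetical. Newer Unity versions also ship a built-in `UnityEngine.Pool.ObjectPool<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class GcFriendly
{
    // Each += creates a brand-new string; every intermediate one becomes garbage.
    public static string JoinWithConcat(IList<string> parts)
    {
        string result = "";
        for (int i = 0; i < parts.Count; i++)
            result += parts[i];
        return result;
    }

    // StringBuilder appends into one growable buffer; only ToString() creates
    // a new string. Reusing the caller's builder avoids allocating a new one.
    public static string JoinWithBuilder(IList<string> parts, StringBuilder sb)
    {
        sb.Length = 0;   // clear previous contents, keep the buffer
        for (int i = 0; i < parts.Count; i++)
            sb.Append(parts[i]);
        return sb.ToString();
    }
}

// Minimal object pool: Get() reuses a released instance when one is
// available instead of allocating a new one on the managed heap.
public sealed class SimplePool<T> where T : class, new()
{
    private readonly Stack<T> _free = new Stack<T>();

    public int FreeCount { get { return _free.Count; } }

    public T Get()
    {
        return _free.Count > 0 ? _free.Pop() : new T();
    }

    public void Release(T item)
    {
        if (item == null) throw new ArgumentNullException("item");
        _free.Push(item);
    }
}
```

A pooled object keeps whatever state it had when released, so reset it before reuse (for example, call Clear() on a pooled list).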
Code? Script?

Some people may think bringing up code is superfluous. Code quality varies from person to person, and it is hard to give a clear standard like the ones above; besides, everyone thinks they are right. But the code quality I mean here rests on a premise: Unity3D is written in C++, while our code is written in C# as scripts. So the question is: do we need to consider the overhead of the interaction between scripts and the underlying engine? The scripting language we use for Unity3D games, C#, is hosted by the Mono runtime, while the functionality is implemented by the engine's underlying C++ code. The features in our "game scripts" therefore cannot avoid calling that underlying code. What does this part cost, and how should we optimize it?

    1. Take an object's Transform component as an example: we should access it once and keep the reference, rather than looking it up on every use. Someone ran a small experiment comparing the time needed to get the Transform in three ways: via GetComponent<Transform>(), via MonoBehaviour's transform property, and via a retained reference:
      • GetComponent = 619ms
      • MonoBehaviour = 60ms
      • CachedMB = 8ms
      • Manual Cache = 3ms

    2. As mentioned above, avoid calling GetComponent frequently, especially inside loops.

    3. Use OnBecameVisible() and OnBecameInvisible() to control whether an object's Update() runs, to reduce overhead.

    4. Use built-in values, for example Vector3.zero instead of new Vector3(0, 0, 0).

    5. Optimize method parameters with the ref keyword. A value-type parameter is passed by copying the argument's value into the parameter, which is what we usually call pass-by-value. Copying always feels wasteful, especially for a larger value type such as Matrix4x4. Instead of copying it, pass a reference to the value as the parameter.
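Here is a sketch of point 5 in plain C#. `Matrix16` is a hypothetical stand-in with the same layout as Unity's `Matrix4x4` (16 floats, 64 bytes), so the example runs without Unity. Passing it by value copies all 64 bytes on every call. `ref` passes only a reference, and it also lets the method update the caller's value in place. Compilers that support C# 7.2 or later also offer `in` for read-only by-reference parameters.

```csharp
// 16 floats = 64 bytes, the same layout as a 4x4 matrix (hypothetical type).
public struct Matrix16
{
    public float M00, M01, M02, M03;
    public float M10, M11, M12, M13;
    public float M20, M21, M22, M23;
    public float M30, M31, M32, M33;
}

public static class RefParams
{
    // By value: all 64 bytes are copied into 'm' for this call.
    public static float TraceByValue(Matrix16 m)
    {
        return m.M00 + m.M11 + m.M22 + m.M33;
    }

    // By ref: only a reference to the caller's struct is passed; no copy.
    public static float TraceByRef(ref Matrix16 m)
    {
        return m.M00 + m.M11 + m.M22 + m.M33;
    }

    // With ref, the method can also modify the caller's struct in place.
    public static void ScaleDiagonal(ref Matrix16 m, float factor)
    {
        m.M00 *= factor;
        m.M11 *= factor;
        m.M22 *= factor;
        m.M33 *= factor;
    }
}
```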

Well, I think that about covers the CPU. Next is a brief chat about GPU optimization, a part I'm honestly not very familiar with.

GPU Optimization

The GPU differs from the CPU, so the focus naturally differs too. GPU bottlenecks lie mainly in the following areas:

    1. Fill rate, which can be roughly understood as the number of pixels the GPU renders per second.
    2. Pixel complexity, such as dynamic shadows, lighting and complex shaders.
    3. Geometric complexity (number of vertices).
    4. And, of course, GPU memory bandwidth.
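Fill-rate pressure is easy to estimate with simple arithmetic. Multiply pixels per frame by frames per second by overdraw (how many fragments are shaded, on average, for each screen pixel). The sketch below uses hypothetical numbers: a 1920x1080 screen at 60 fps with an average overdraw of 3. `FillRate` is a hypothetical helper.

```csharp
public static class FillRate
{
    // Fragments the GPU must shade per second:
    // width * height * frames per second * average overdraw.
    public static long RequiredFragmentsPerSecond(int width, int height, int fps, int overdraw)
    {
        return (long)width * height * fps * overdraw;   // long avoids int overflow
    }
}
```

That example works out to 1920 x 1080 x 60 x 3 = 373,248,000 fragments per second. This is why cutting overdraw and per-pixel shader cost matters so much on mobile GPUs.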

Looking at these 4 points, what affects GPU performance comes down to two things. On one hand, too many vertices and overly complex per-pixel computation; on the other, GPU memory bandwidth. The two corresponding countermeasures are obvious:

    1. Reduce the number of vertices and simplify computation.
    2. Compress textures to ease the pressure on memory bandwidth.
Reducing the Amount of Drawing

So the first direction of optimization is to reduce the vertex count and simplify computation. The specific measures are:

    • Keep the number of materials as small as possible. This makes it easier for Unity to batch.
    • Use a texture atlas (one large texture containing many smaller ones) instead of many individual small textures. Atlases load faster, cause fewer state changes, and are friendlier to batching.
    • If you use a texture atlas and a shared material, use renderer.sharedMaterial instead of renderer.material. Accessing renderer.material creates a copy of the material for that renderer, which breaks batching.
    • Use lightmaps instead of real-time lights.
    • Use LOD, so that details of distant objects, which can't be seen anyway, can be dropped.
    • Use occlusion culling.
    • Use mobile versions of shaders, because they are simpler.
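The LOD idea boils down to picking a cheaper version of a model as it gets farther away. Below is a minimal distance-based sketch. Unity's LODGroup actually switches levels by the object's screen-relative height rather than raw distance, so `LodPicker` and its thresholds are hypothetical simplifications.

```csharp
using System;

public static class LodPicker
{
    // Returns the LOD level for an object at 'distance', given ascending
    // distance thresholds: level 0 below thresholds[0], level 1 below
    // thresholds[1], and so on. Beyond the last threshold it returns -1 (culled).
    public static int Select(float distance, float[] thresholds)
    {
        if (thresholds == null) throw new ArgumentNullException("thresholds");
        for (int level = 0; level < thresholds.Length; level++)
        {
            if (distance < thresholds[level])
                return level;
        }
        return -1;
    }
}
```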
Optimizing Memory Bandwidth

What's the second direction? Compress textures to reduce the pressure on memory bandwidth.

    • For OpenGL ES 2.0, use ETC1 compression and similar formats; these can be chosen in the build settings.
    • Use mipmaps.

Here I want to dwell on what a mipmap is. Some people say mipmaps take up extra memory, so how can they optimize memory bandwidth? To answer that, we have to start with what a mipmap is, and one picture makes it clear.

A typical mipmap illustration shows how it is stored: the main image on the left is accompanied by a series of progressively smaller copies.

Clear at a glance, isn't it? Each mipmap level is a copy of the main image scaled down by a specific ratio, with correspondingly less detail. Because the main image and all its smaller copies are stored, memory use is larger than before. So why is memory bandwidth optimized? Because the GPU can render from a suitably small level, depending on how large the object actually appears, so far fewer texels have to be read. So although mipmaps cost some memory, they are recommended for the sake of rendering quality (better than relying on compression alone).
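The memory-versus-bandwidth trade-off can be quantified. Each mip level has a quarter of the texels of the one above, so a full chain adds about one third to the texture's memory, while a distant object can sample a much smaller level. This sketch assumes square power-of-two textures and uncompressed 4-byte pixels, and it picks the level with a simple size ratio. Real GPUs choose levels per pixel from texture-coordinate derivatives. `MipMath` is a hypothetical helper.

```csharp
using System;

public static class MipMath
{
    // Number of levels in a full chain: 1024 -> 11 levels (1024, 512, ..., 1).
    public static int LevelCount(int size)
    {
        int levels = 1;
        while (size > 1) { size /= 2; levels++; }
        return levels;
    }

    // Total bytes of the full mip chain of a square texture.
    public static long ChainBytes(int size, int bytesPerPixel)
    {
        long total = 0;
        for (int s = size; s >= 1; s /= 2)
            total += (long)s * s * bytesPerPixel;
        return total;
    }

    // Level whose resolution first fits the on-screen size; each level halves it.
    public static int LevelForScreenSize(int textureSize, int screenSize)
    {
        if (screenSize <= 0) throw new ArgumentOutOfRangeException("screenSize");
        int level = 0;
        while (textureSize > screenSize && textureSize > 1)
        {
            textureSize /= 2;
            level++;
        }
        return level;
    }
}
```

Consider a 1024x1024 RGBA32 texture. The base level is 4,194,304 bytes, and the full 11-level chain is 5,592,404 bytes (about 1.33x). Drawn 128 pixels wide, the texture would sample level 3, a 128x128 image of only 65,536 bytes.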

Memory Optimization

Since we are going to talk about memory optimization at Unity3D runtime, we first need to know how Unity3D allocates memory. It can be divided into three parts:

    1. Unity3D internal memory
    2. Mono managed memory
    3. Memory for DLLs we bring in ourselves, or for third-party DLLs

The third category is not our focus. We will look at Unity3D internal memory and Mono managed memory, and finally analyze an official AssetBundle example to illustrate memory management.

Unity3D Internal Memory

What does Unity3D's internal memory hold? Think about it: besides code that drives the logic, what else does a game need? Right, all kinds of resources. In short, Unity3D internal memory holds:

    • Resources: textures, meshes, audio, and so on
    • GameObjects and their components
    • Memory needed by the engine's internal logic: rendering, physics, particle systems, etc.
Mono Managed Memory

Our game scripts are written in C# and must run cross-platform, so a Mono runtime is clearly needed, and Mono's managed memory naturally falls within the scope of memory optimization. How does what lives in Mono managed memory differ from what lives in Unity3D internal memory? Mono's memory allocation is actually very conventional runtime allocation:

    • Value types: int, float, struct, bool, etc. As local variables they are stored on the stack (note: the stack, not the heap, so no GC is involved).
    • Reference types: narrowly speaking, instances of classes, for example the C# classes through which game scripts wrap the engine's various objects. Naturally, every engine object needs a corresponding C# class, and that C# wrapper is what lives here. Because it is allocated on the heap, GC is involved.

Consider the wrapper objects on the Mono managed heap. Besides the memory for the wrapper instance itself on the managed heap, Unity3D must also allocate internal memory for the engine object behind it.

For example:

Take a WWW object declared in a .cs script. Mono allocates the memory the WWW object needs on the managed heap. In addition, memory must be allocated for the engine resources behind that instance.

The resources behind a WWW instance:

    • The compressed file
    • The buffer needed for decompression
    • The decompressed file

Here is an AssetBundle example:

AssetBundle Memory Handling

Let's use downloading an AssetBundle as an example of how memory is allocated. I found the following AssetBundle usage scenario in the Manual on the official website:

    // BundleURL, AssetName and version are fields of the enclosing MonoBehaviour,
    // as in the manual's example (needs using System; using System.Collections; using UnityEngine;).
    // Unity 4-era API: WWW and AssetBundle.Load were later replaced by UnityWebRequest and LoadAsset.
    IEnumerator DownloadAndCache()
    {
        // Wait for the Caching system to be ready
        while (!Caching.ready)
            yield return null;

        // Load the AssetBundle file from the cache if it exists with the same version,
        // or download it and store it in the cache
        using (WWW www = WWW.LoadFromCacheOrDownload(BundleURL, version))
        {
            yield return www;                       // www is the 1st part
            if (www.error != null)
                throw new Exception("WWW download had an error:" + www.error);
            AssetBundle bundle = www.assetBundle;   // the AssetBundle is the 2nd part
            if (AssetName == "")
                Instantiate(bundle.mainAsset);      // the instantiated object is the 3rd part
            else
                Instantiate(bundle.Load(AssetName));
            // Unload the AssetBundle's compressed contents to conserve memory
            bundle.Unload(false);
        } // Memory is freed from the web stream (www.Dispose() gets called implicitly)
    }

I have marked the three parts of memory allocation in the code:

    1. Web stream: contains the compressed file, the buffer needed for decompression, and the decompressed file.
    2. AssetBundle: a mapping of, or reference to, the files in the web stream.
    3. Instantiated objects: the engine's various resources, created in memory.

Let's analyze each part in turn:

WWW www = WWW.LoadFromCacheOrDownload(BundleURL, version);
    1. Reads the compressed file into memory.
    2. Creates the buffer needed for decompression.
    3. Decompresses the file into memory.
    4. Releases the buffer created for decompression.
AssetBundle bundle = www.assetBundle;
    1. The AssetBundle is now a bridge between the decompressed files in the web stream and the objects that are finally instantiated.
    2. So the AssetBundle is essentially a mapping of the objects in the web stream's decompressed files, not the real objects.
    3. The actual resources still live in the web stream, so the web stream is kept alive at this point.
Instantiate(bundle.mainAsset);
    1. Gets the resource through the AssetBundle and instantiates the object.

Finally, you may notice that the official example uses:

    using (WWW www = WWW.LoadFromCacheOrDownload(BundleURL, version)) { }

This using statement releases the web stream's memory once it is no longer needed. WWW implements the IDisposable interface, so it can be used with using. This is equivalent to calling, at the end:

    // Delete the web stream
    www.Dispose();

OK, the web stream is gone. What's left? The AssetBundle. For that, use the following, which frees the AssetBundle's data while keeping the objects already loaded from it:

    // Delete the AssetBundle
    bundle.Unload(false);
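The using statement mentioned above is compiler shorthand for a try/finally that calls Dispose(), so the web stream is released even if an exception is thrown inside the block. The sketch below shows both forms. A hypothetical `FakeWebStream` class stands in for the native buffers behind a WWW instance, so the example runs without Unity. `UsingDemo` is likewise a hypothetical name.

```csharp
using System;

// Hypothetical stand-in for an object that owns native buffers,
// like the web stream behind a WWW instance.
public sealed class FakeWebStream : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;   // a real implementation would free native memory here
    }
}

public static class UsingDemo
{
    // The using form: Dispose() runs automatically when the block exits.
    public static FakeWebStream WithUsing()
    {
        FakeWebStream captured = null;
        using (var stream = new FakeWebStream())
        {
            captured = stream;   // still open inside the block
        }
        return captured;         // Dispose() has already run here
    }

    // Roughly what the compiler generates for the using form above.
    public static FakeWebStream ExpandedUsing(Action<FakeWebStream> body)
    {
        var stream = new FakeWebStream();
        try
        {
            body(stream);
        }
        finally
        {
            if (stream != null)
                ((IDisposable)stream).Dispose();   // runs even if body throws
        }
        return stream;
    }
}
```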

OK, that's it for now. It has gotten a bit long and was written in a bit of a rush, so I'll add to it later.

Update: Using the Unity Profiler to Monitor Memory

This article was written in some haste, so it did not specifically introduce the Unity Profiler or how to use it to monitor memory. But monitoring with the Unity Profiler is still essential, so here is a brief supplement.

The Profiler offers two modes for monitoring memory usage: Simple mode and Detailed mode. In Simple mode, total memory is shown in two columns: Used Total (all memory in use) and Reserved Total (reserved memory). Both are physical memory. Reserved Total is the total amount of memory Unity has requested from the system. So that it does not have to ask the system for memory frequently, Unity's lower layers allocate a larger block up front as a cache. At run time, Unity first takes memory from this reserved block, and when memory is no longer used it first returns it to the reserved block, which keeps the game running smoothly. In general, the larger Used Total is, the larger Reserved Total is. When Used Total falls, Reserved Total falls too, though not necessarily in step with Used Total.

Unity3D's memory can be broadly divided into the following sections:

    1. Unity: memory allocated by Unity3D's underlying code.
    2. Mono: the managed heap, i.e., the memory the Mono runtime needs to run the game scripts. In other words, the size of the managed heap has nothing to do with the number of GameObjects or the amount of resources; it depends only on the script code. This part of memory is subject to garbage collection.
    3. GfxDriver: can be understood as GPU memory overhead, consisting mainly of textures, vertex buffers and index buffers. So reducing or releasing resources such as textures and meshes reduces GfxDriver memory.
    4. FMOD: memory overhead for audio.
    5. Profiler: memory used by the Profiler itself.

At the bottom of the Simple mode view, the most common resource types and the memory they use are listed:

    1. Textures
    2. Meshes
    3. Materials
    4. Animations
    5. Audio
    6. Number of GameObjects

Detailed mode requires clicking the "Take Sample" button to capture detailed memory usage. Note that collecting the data takes time, so we cannot get detailed memory usage in real time. In Detailed mode, we can see the memory used by each individual resource and game object.

If you think this article is well written, I humbly beg you to give it a "recommendation". Thank you~

A solemn statement, for what it's worth: unless otherwise noted, all articles on this blog are original. If you repost, please keep the original link and the author information: Murong Small Bastard.
