Using Lua + Unity Well to Make Performance Fly: The Lua and C# Interaction Chapter

Source: Internet
Author: User
Tags float double lua

Preface

After reading UWA's earlier article comparing the performance of common Lua solutions for Unity projects, we decided to write an article on performance optimization for Lua + Unity. Integrating Lua is currently the most powerful hot-update solution for Unity; after all, it is the only approach that supports hot updates on iOS. As heavy uLua users, however, we fell into many pits before getting uLua to a state where it could be used at scale in a project. Even now, a Lua + Unity solution is not something you can use carelessly; using it well requires knowing a lot. This article therefore starts from a handful of simple optimization recommendations and gradually digs into the reasons behind them. Only by understanding the reasons can you be clear about what each optimization is for and how much it gains.

From the earliest approach of Lua calling C# purely through reflection, through the attempt by Yun Feng's team at a Lua virtual machine written in pure C#, to today's various LuaJIT + C# static export solutions, Lua + Unity has only now reached a practically usable level of performance. Even so, in real use we find that, compared with LuaJIT in the cocos2d-x era, Lua + Unity still has considerable performance bottlenecks. Test 1 of the "performance comparison" alone shows that 200,000 position assignments take 3000 ms on an iPhone 4S, that is, 0.015 ms per assignment. For a COC-style game that does no other logic at all, about a thousand position assignments per frame (for example, several hundred units, effects, and health bars) would cost 15 ms, which is clearly on the high side. What keeps Lua + Unity from reaching peak performance, and how can it be used better? We will start from some examples and gradually dig into the details behind them.
Our project mainly uses uLua (with topameng's cstolua integrated, although we have since changed a great deal in the course of ongoing performance work), so most conclusions in this article are based on tests with uLua + cstolua. Statements about slua are based on reading its source code (by our analysis the two work on essentially the same principles and differ only in implementation details); we have not tested it in depth. If you find a problem, feel free to get in touch.

Since this is Lua + Unity, performance basically comes down to two things:

  how well Lua interacts with C#
  how well the pure Lua code itself performs

Each of these needs to be explored in depth, so we will split the discussion of how to optimize Lua + Unity into several chapters.

Lua and C# Interaction Chapter

1. Starting with the deadly gameobj.transform.position = pos

Code like gameobj.transform.position = pos is perfectly ordinary in Unity, but in uLua, heavy use of this style is very bad. Why?

Because this one short line does a great many things. To make this concrete, we list the key Lua API calls and uLua-related steps that the line triggers (based on the uLua + cstolua export; gameobj is a GameObject and pos is a Vector3):

Step 1:
  GameObjectWrap.get_transform: Lua wants to get the transform from gameobj; this corresponds to gameobj.transform.
  LuaDLL.luanet_rawnetobj: turns the gameobj in Lua into an ID that C# can recognize.
  ObjectTranslator.TryGetValue: uses this ID to fetch the C# GameObject from the ObjectTranslator.
  gameobject.transform: after all that preparation, C# finally executes the actual gameobject.transform getter.
  ObjectTranslator.AddObject: assigns an ID to the transform. This ID represents the transform in Lua, and the transform is saved in the ObjectTranslator for future lookups.
  LuaDLL.luanet_newudata: allocates a userdata in Lua and stores the ID in it, to represent the transform that is about to be returned to Lua.
  LuaDLL.lua_setmetatable: attaches a metatable to this userdata so that you can write transform.position.
  LuaDLL.lua_pushvalue: returns the transform, followed by some cleanup:
  LuaDLL.lua_rawseti
  LuaDLL.lua_remove

Step 2:
  TransformWrap.set_position: Lua wants to assign pos to transform.position.
  LuaDLL.luanet_rawnetobj: turns the transform in Lua into an ID that C# can recognize.
  ObjectTranslator.TryGetValue: uses this ID to fetch the C# Transform from the ObjectTranslator.
  LuaDLL.tolua_getfloat3: reads the three float values of the Vector3 from Lua and returns them to C#.
  lua_getfield + lua_tonumber, 3 times: fetches the x, y, and z values, then pops the stack.
  lua_pop
  transform.position = new Vector3(x, y, z): after all that preparation, the assignment transform.position = pos is finally executed.

A single line of code turns out to do all of this! In C++, a.b.c = x would be optimized down to taking an address and writing to memory. Here, however, the repeated value fetches, stack pushes, and C#-to-Lua type conversions each cost solid CPU time, and that is before counting the memory allocations made along the way and the GC that follows. Below we explain step by step that some of this work is unnecessary and can be skipped. In the end the whole thing can be reduced to: lua_isnumber + lua_tonumber, 4 times, and nothing else.
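The bookkeeping in the two steps above can be pictured with a toy model. The sketch below is not the uLua source: ObjectTableSketch, FakeGameObject, FakeTransform, and SetViaTransform are hypothetical names, and the Lua side is only simulated. It counts how many table lookups and ID registrations one assignment costs when the transform handed to Lua is not kept alive.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for an ID <-> object table. Hypothetical names throughout;
// this only models the idea, not the real ObjectTranslator implementation.
public sealed class ObjectTableSketch
{
    private readonly Dictionary<int, object> objects = new Dictionary<int, object>();
    private int nextId = 1;

    public int Lookups { get; private set; }
    public int Registrations { get; private set; }

    // Hand a C# object to "Lua": allocate an ID and keep a strong reference.
    public int AddObject(object obj)
    {
        int id = nextId++;
        objects[id] = obj;
        Registrations++;
        return id;
    }

    // Resolve an ID coming back from "Lua" into the C# object.
    public bool TryGetValue(int id, out object obj)
    {
        Lookups++;
        return objects.TryGetValue(id, out obj);
    }

    // Models the Lua-side userdata being garbage collected.
    public void Remove(int id) { objects.Remove(id); }
}

public sealed class FakeTransform { public float X, Y, Z; }
public sealed class FakeGameObject { public FakeTransform Transform = new FakeTransform(); }

public static class TranslatorDemo
{
    // Models gameobj.transform.position = pos with a temporary transform handle.
    public static void SetViaTransform(ObjectTableSketch table, int goId, float x, float y, float z)
    {
        table.TryGetValue(goId, out object go);                     // step 1: ID -> GameObject
        int tId = table.AddObject(((FakeGameObject)go).Transform);  // new ID for the transform
        table.TryGetValue(tId, out object t);                       // step 2: ID -> Transform
        var tr = (FakeTransform)t;
        tr.X = x; tr.Y = y; tr.Z = z;
        table.Remove(tId);                                          // temporary handle collected
    }

    public static void Main()
    {
        var table = new ObjectTableSketch();
        int goId = table.AddObject(new FakeGameObject());
        for (int i = 0; i < 1000; i++) SetViaTransform(table, goId, 1f, 2f, 3f);
        Console.WriteLine("lookups=" + table.Lookups + ", registrations=" + table.Registrations);
    }
}
```

In this model every assignment pays two lookups plus one registration and one removal. That per-call overhead is what the later recommendations try to eliminate.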
2. Referencing a C# object in Lua is expensive

The example above shows that merely getting a transform from gameobj already carries a high cost. A C# object cannot be handed to C as a pointer to operate on directly (it can in fact be done by pinning through GCHandle, but we have not tested the performance, and a pinned object cannot be managed by the GC). Mainstream Lua + Unity solutions therefore represent a C# object by an ID, and map IDs to objects on the C# side with a dictionary. The reference held by this dictionary also guarantees that a C# object is not garbage collected while Lua still references it.

As a result, every time a parameter is an object, converting the ID used in Lua back into the C# object requires a dictionary lookup; and every call to a member method of an object must first find that object, which again means a dictionary lookup.

If the object has been used in Lua before and has not been collected, the cost is just that dictionary lookup. If it turns out to be a new object that Lua has not used before, you pay for the whole string of preparation work shown in the example above.

If the returned object is only used briefly in Lua, things are even worse. The userdata and dictionary entry that were just allocated may be removed as soon as the Lua reference is collected, and the next time you use the object all the preparation has to be done again. The result is repeated allocation and GC, and poor performance.

The gameobj.transform in the example is a huge trap: .transform is only returned temporarily, nothing holds a reference to it afterwards, so Lua releases it very quickly. Every subsequent .transform can therefore mean one more allocation and one more GC.

3. Passing Unity-specific value types (Vector3, Quaternion, and so on) between Lua and C# is even more expensive

Since Lua calls into C# objects are slow, performance would basically collapse if every vector3.x had to go through C#. Mainstream solutions therefore implement Vector3 and similar types in pure Lua code: a Vector3 is simply a table {x, y, z}, which makes it fast to use inside Lua.

Once this is done, however, Vector3 is represented in two entirely different ways in C# and in Lua, so passing one as a parameter involves converting between the Lua type and the C# type. When C# passes a Vector3 to Lua, for example, the whole process is:

  1. C# reads the three values x, y, z of the Vector3.
  2. It pushes these 3 floats onto the Lua stack.
  3. It constructs a table and assigns the table's x, y, z fields.
  4. It pushes this table as the return value.

One simple parameter thus costs 3 pushes, a table allocation, and 3 table insertions; the performance is easy to imagine.

How can this be optimized? Our tests show that passing three floats directly to a function is faster than passing a Vector3. For example, change

  void SetPos(GameObject obj, Vector3 pos)

to

  void SetPos(GameObject obj, float x, float y, float z)

The test data later in this article shows the effect; the improvement is very noticeable.

4. When passing parameters and return values between Lua and C#, avoid the following types as far as possible

  Severe: Vector3, Quaternion, and other Unity value types; arrays
  Less severe: bool, string, all kinds of objects
  Recommended: int, float, double

Although we speak of passing parameters between Lua and C#, there is actually a layer of C sandwiched between the two (Lua itself is implemented in C, after all). Lua, C, and C# differ in how they represent many data types and in their memory allocation strategies, so data passed among the three often has to be converted (the term is parameter marshalling), and the cost of this conversion varies greatly with the type.
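As one concrete picture of why the conversion cost depends on the type, the sketch below contrasts a double with a string. It is only an illustration under stated assumptions: MarshalCostSketch and its methods are hypothetical names, no real Lua binding is involved, and UTF-8 is assumed as the encoding on the Lua side. A double has the same bit layout on both sides and is copied as-is, while a C# string (UTF-16) has to be re-encoded and copied into newly allocated memory in each direction.

```csharp
using System;
using System.Text;

public static class MarshalCostSketch
{
    // A double has the same representation in C and C#; nothing to convert.
    public static double PassNumber(double value) { return value; }

    // C# strings are UTF-16; UTF-8 bytes are assumed on the Lua side.
    // Each direction re-encodes the text and allocates a new buffer.
    public static byte[] ToLuaBytes(string managed) { return Encoding.UTF8.GetBytes(managed); }
    public static string FromLuaBytes(byte[] native) { return Encoding.UTF8.GetString(native); }

    public static void Main()
    {
        string name = "Player_01";
        byte[] forLua = ToLuaBytes(name);    // allocation 1: a new byte[]
        string back = FromLuaBytes(forLua);  // allocation 2: a new string
        Console.WriteLine(forLua.Length);    // number of UTF-8 bytes copied
        Console.WriteLine(back == name);     // same text, but a separate copy
        Console.WriteLine(PassNumber(1.5));
    }
}
```

For a function that is called many times per frame, these per-call copies add up, which is why the recommended types are the plain numeric ones.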
First, the bool and string types in the less severe class. These involve the cost of interaction between C and C#. According to Microsoft's official documentation, C# divides data types into blittable types and non-blittable types. bool and string are non-blittable, meaning their memory representation differs between C and C#, so passing them from C to C# requires a type conversion, which lowers performance. A string also involves memory allocation (copying the string's memory to the managed heap, plus conversion between UTF-8 and UTF-16).

See https://msdn.microsoft.com/zh-cn/library/ms998551.aspx for more detailed performance guidelines on C and C# interaction.

The severe class is basically a bottleneck caused by the way uLua and similar solutions try to map Lua objects to C# objects.

The cost of value types such as Vector3 has already been discussed above.

Arrays are even worse. An array in Lua can only be represented as a table, which is a completely different thing from a C# array, with no direct correspondence. Converting a C# array into a Lua table therefore means copying it element by element, and if the elements are objects, strings, and the like, each one must also be converted.

5. Keep the number of parameters under control for frequently called functions

Whether it is Lua's pushint/checkint or parameter passing from C to C#, parameter conversion is the main cost, and it is performed parameter by parameter. The performance of a Lua call into C# therefore depends not only on the parameter types but also heavily on the number of parameters. As a rule, a frequently called function should not take more than 4 parameters. If a function with a dozen or more parameters is called frequently, you will see an obvious performance drop; on a phone, a few hundred calls per frame may already cost on the order of 10 ms.

6. Prefer exporting static functions over exporting member methods

As mentioned earlier, accessing a member method or member variable of an object requires looking up the reference between the Lua userdata and the C# object, or looking up the metatable, which takes a lot of time. Exporting a static function directly reduces this cost.

Take obj.transform.position = pos. The approach we recommend is to write a static export function, along these lines:

  class LuaUtil {
    static void SetPos(GameObject obj, float x, float y, float z) {
      obj.transform.position = new Vector3(x, y, z);
    }
  }

and then call LuaUtil.SetPos(obj, pos.x, pos.y, pos.z) in Lua. This performs far better, because it removes the frequent returning of the transform and avoids the Lua GC caused by the transform being returned temporarily again and again.

7. Note that a C# object cannot be freed while Lua holds a reference to it; this is a common cause of memory leaks

As described earlier, when a C# object is returned to Lua, a dictionary associates the Lua userdata with the C# object. As long as the userdata in Lua has not been collected, the dictionary keeps a reference to the C# object, so it cannot be collected either.

The most common cases are GameObject and Component. If Lua references them, then even after you call Destroy you will find them still lingering in the Mono heap.

Because this dictionary is the only link between Lua and C#, however, the problem is not hard to detect: traversing the dictionary makes it easy to find. In uLua the dictionary is in the ObjectTranslator class; in slua it is in the ObjectCache class.

8. Consider using only IDs that you manage yourself in Lua, instead of referencing C# objects directly

One way to avoid the various performance problems of Lua referencing C# objects is to assign your own IDs to index the objects, and to have the related C# export functions take an int instead of an object as the parameter. This brings several benefits:

  1. Function calls perform better.
  2. You manage the lifetime of these objects explicitly instead of letting uLua manage references to them automatically. If such objects are wrongly referenced in Lua, they cannot be released, which leaks memory.
  3. When a C# object is returned to Lua and Lua holds no reference to it, it is easily collected right away and the ObjectTranslator's reference to it is deleted. Managing the reference relationship yourself means this GC and allocation behavior no longer happens frequently.

For example, LuaUtil.SetPos(GameObject obj, float x, float y, float z) above can be further optimized to LuaUtil.SetPos(int objID, float x, float y, float z). We then record the mapping between objID and GameObject in our own code. If possible, use an array rather than a dictionary for this record, which makes lookups faster. This further reduces the time spent in Lua calls to C#, and makes object management more efficient.

9. Use the out keyword sensibly to return complex return values

Returning values of various types from C# to Lua is similar to passing parameters, and has the same kinds of costs.

For example,

  Vector3 GetPos(GameObject obj)

can be written as

  void GetPos(GameObject obj, out float x, out float y, out float z)

On the surface the number of parameters has increased, but judging by the generated export code (we take uLua as the reference), the work changes from

  LuaDLL.tolua_getfloat3 (which contains get_field + tonumber, 3 times)

to

  isnumber + tonumber, 3 times

get_field is essentially a table lookup, which is bound to be slower than isnumber accessing the stack, so this version performs better.

Measurements

Having said all this, it would be too abstract without some data to look at.

To see the cost of the language interaction itself more faithfully, we did not use the gameobj.transform.position from the example directly, because part of the time there is spent inside Unity.

Instead we wrote simplified versions, GameObject2 and Transform2.
  class Transform2 {
    public Vector3 position = new Vector3();
  }

  class GameObject2 {
    public Transform2 transform = new Transform2();
  }

We then set the transform's position using several different calling styles (x, y, z stand for three numeric arguments):

  Method 1: gameobject.transform.position = Vector3.New(x, y, z)
  Method 2: gameobject:SetPos(Vector3.New(x, y, z))
  Method 3: gameobject:SetPos2(x, y, z)
  Method 4: GOUtil.SetPos(gameobject, Vector3.New(x, y, z))
  Method 5: GOUtil.SetPos2(gameobjectid, Vector3.New(x, y, z))
  Method 6: GOUtil.SetPos3(gameobjectid, x, y, z)

Each was run 1,000,000 times, with the following results. The test environment was the Windows build on an i7-4770 CPU with LuaJIT's JIT mode turned off. Results on phones will differ because of factors such as the LuaJIT architecture and IL2CPP, which we will cover in the next article.

  Method 1: 903 ms
  Method 2: 539 ms
  Method 3: 343 ms
  Method 4: 559 ms
  Method 5: 470 ms
  Method 6: 304 ms

Most steps bring a clear gain, and removing the .transform access and the Vector3 conversion brings by far the largest. Merely by changing how the functions are exported, without paying any high cost, we already save 66% of the time (903 ms down to 304 ms).

Can we go further still? Yes. Building on method 6, we can get down to just 200 ms. We will keep that as a teaser and explain it in the next article, on LuaJIT integration. In general, reaching the level of method 6 is sufficient.

This is only the simplest of cases. There are many other commonly used exports (for example the performance pit that is GetComponentsInChildren, or a function that passes a dozen or more parameters) that everyone needs to optimize according to their own usage. With the analysis given here of the performance principles behind Lua integration, it should be easy to work out what to do.
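The 66% figure can be checked directly from the six timings reported above. The short sketch below (TimingSummary and SavedPercent are hypothetical names introduced only for this check) computes the share of time saved relative to the first timing. It also uses the fact that a total in milliseconds for 1,000,000 calls is numerically equal to the average cost per call in nanoseconds.

```csharp
using System;

public static class TimingSummary
{
    // Share of time saved relative to a baseline, in percent.
    public static double SavedPercent(double baselineMs, double ms)
    {
        return (baselineMs - ms) / baselineMs * 100.0;
    }

    public static void Main()
    {
        // Timings reported in the article, in ms per 1,000,000 calls.
        double[] ms = { 903, 539, 343, 559, 470, 304 };

        for (int i = 0; i < ms.Length; i++)
        {
            // ms per 1,000,000 calls is numerically the same as ns per call.
            double nsPerCall = ms[i];
            double saved = Math.Round(SavedPercent(ms[0], ms[i]), 1);
            Console.WriteLine("Method " + (i + 1) + ": " + nsPerCall + " ns/call, saved " + saved + "%");
        }
    }
}
```

For the last timing this gives (903 - 304) / 903, or roughly 66%, matching the figure quoted in the text.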
The next article will cover the second part of Lua + Unity performance optimization: the performance pits of LuaJIT integration. Compared with this first part, where reading the export code is enough to get a rough idea of the cost, the problems of LuaJIT integration are far more complex and obscure.

The C# code for the test cases:
  using System.Collections.Generic;
  using UnityEngine;

  public class Transform2
  {
      // A public field rather than a property, so position.x can be assigned directly.
      public Vector3 position = new Vector3();
  }

  public class GameObject2
  {
      public Transform2 transform = new Transform2();

      public void SetPos(Vector3 pos)
      {
          transform.position = pos;
      }

      public void SetPos2(float x, float y, float z)
      {
          transform.position.x = x;
          transform.position.y = y;
          transform.position.z = z;
      }
  }

  public class GOUtil
  {
      // The loop bound is unreadable in the source; 1000 is a placeholder, not the original value.
      private const int ObjectCount = 1000;

      private static List<GameObject2> mObjs = new List<GameObject2>();

      // Fills the list on first use; call this before SetPos2 or SetPos3.
      public static GameObject2 GetByID(int id)
      {
          if (mObjs.Count == 0)
          {
              for (int i = 0; i < ObjectCount; i++)
              {
                  mObjs.Add(new GameObject2());
              }
          }
          return mObjs[id];
      }

      public static void SetPos(GameObject2 go, Vector3 pos)
      {
          go.transform.position = pos;
      }

      public static void SetPos2(int id, Vector3 pos)
      {
          mObjs[id].transform.position = pos;
      }

      public static void SetPos3(int id, float x, float y, float z)
      {
          var t = mObjs[id].transform;
          t.position.x = x;
          t.position.y = y;
          t.position.z = z;
      }
  }
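The test code above indexes a list by ID but never releases an entry. Section 8 recommends managing object lifetimes yourself and using array-style lookup, and the sketch below shows one possible shape for that. IdRegistry and its methods are hypothetical names, not part of uLua or slua, and the free-list reuse policy is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: hands out int IDs and resolves them by index.
public sealed class IdRegistry<T> where T : class
{
    private readonly List<T> slots = new List<T>();
    private readonly Stack<int> freeIds = new Stack<int>();

    // Stores the object and returns the ID that Lua code would hold.
    public int Register(T obj)
    {
        if (obj == null) throw new ArgumentNullException(nameof(obj));
        if (freeIds.Count > 0)
        {
            int reused = freeIds.Pop();
            slots[reused] = obj;
            return reused;
        }
        slots.Add(obj);
        return slots.Count - 1;
    }

    // Index lookup; returns null for unknown or released IDs.
    public T Get(int id)
    {
        return (id >= 0 && id < slots.Count) ? slots[id] : null;
    }

    // Drops the reference held here so the object can be collected.
    public void Release(int id)
    {
        if (id < 0 || id >= slots.Count || slots[id] == null) return;
        slots[id] = null;
        freeIds.Push(id);
    }
}

public static class IdRegistryDemo
{
    public static void Main()
    {
        var registry = new IdRegistry<string>();
        int a = registry.Register("unitA");
        int b = registry.Register("unitB");
        registry.Release(a);
        Console.WriteLine(registry.Get(a) == null);
        int c = registry.Register("unitC");
        Console.WriteLine(c == a);
        Console.WriteLine(registry.Get(b));
    }
}
```

One trade-off of reusing IDs: a stale ID still held in Lua can silently resolve to a different object after reuse. A real project would likely pair each ID with a generation counter, or simply never reuse IDs.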
