(8) New Features of Unity 5.0: IL2CPP Internals: A Tour of Generated Code


Sun Guangdong 2015.5.25

If you reprint this article, please cite the source.

This is the second blog post in the IL2CPP Internals series. In this post, we will investigate the C++ code generated by il2cpp.exe. Along the way, we will see how managed types are represented in native code, look at the runtime checks used to support the .NET virtual machine, see how loops are generated, and more!
We will get into some very version-specific code that is certain to change in later versions of Unity. Still, the concepts will remain the same.
Example project:
I will use Unity version 5.0.1p1 for this example. As in the first post in this series, I'll start with an empty project and add one script file. This time, it has the following contents:
using System;
using UnityEngine;

public class HelloWorld : MonoBehaviour {
  private class Important {
    public static int ClassIdentifier = 42;
    public int InstanceIdentifier;
  }

  void Start () {
    Debug.Log("Hello, IL2CPP!");
    Debug.LogFormat("Static field: {0}", Important.ClassIdentifier);

    // Two instances stored in an array
    var importantData = new [] {
      new Important { InstanceIdentifier = 0 },
      new Important { InstanceIdentifier = 1 } };
    Debug.LogFormat("First value: {0}", importantData[0].InstanceIdentifier);
    Debug.LogFormat("Second value: {0}", importantData[1].InstanceIdentifier);

    // Throw and catch a managed exception
    try {
      throw new InvalidOperationException("Don't panic");
    }
    catch (InvalidOperationException e) {
      Debug.Log(e.Message);
    }

    for (var i = 0; i < 3; ++i) {
      Debug.LogFormat("Loop iteration: {0}", i);
    }
  }
}

I'll build this project for WebGL, running the Unity editor on Windows. I've selected the Development Player option in the Build Settings, so that we get relatively readable names in the generated C++ code. I've also set the Enable Exceptions option in the WebGL Player Settings to Full.

Overview of the generated code:
After the WebGL build is complete, the generated C++ code is available in the Temp\StagingArea\Data\il2cppOutput directory inside my project directory. Once the editor is closed, this directory is deleted. As long as the editor stays open, though, the directory remains untouched, so we can inspect it.

The il2cpp.exe utility generated a large number of files, even for this small project: I see 4625 header files and 89 C++ source code files. To get a handle on all of this code, I like to use a text editor that works with Exuberant CTags. CTags usually generates a tags file for this code quickly, which makes it much easier to navigate.

Initially, you can see that many of the generated C++ files do not come from the simple script code, but are instead the converted version of code in the standard libraries, such as mscorlib.dll. As mentioned in the first post in this series, the IL2CPP scripting backend uses the same standard library code as the Mono scripting backend. Note that we convert the code in mscorlib.dll and the other standard library assemblies each time il2cpp.exe runs. This might seem unnecessary, since that code does not change.

However, the IL2CPP scripting backend always uses byte code stripping to decrease the size of the executable. So even small changes in the script code can cause many different parts of the standard library code to be used or not, depending on the circumstances. Therefore, we need to convert the mscorlib.dll assembly each time. We are researching ways to do incremental builds better, but we do not have any good solutions yet.


How managed code maps to generated C++ code:

For each type in the managed code, il2cpp.exe generates one header file with the C++ definition of the type and another header file with the method declarations for the type. For example, let's look at the contents of the converted UnityEngine.Vector3 type. The header file for the type is named UnityEngine_UnityEngine_Vector3.h. The name is built from the name of the assembly, UnityEngine.dll, followed by the namespace and the name of the type. The code looks like this:
// UnityEngine.Vector3
struct Vector3_t78
{
  // System.Single UnityEngine.Vector3::x
  float ___x_1;
  // System.Single UnityEngine.Vector3::y
  float ___y_2;
  // System.Single UnityEngine.Vector3::z
  float ___z_3;
};

The il2cpp.exe utility has converted each of the three instance fields and done a little name mangling to avoid conflicts and reserved words. By using leading underscores, we are using some names that are reserved in C++, but so far we have not seen any conflicts with C++ standard library code.
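Part of the mangling can be inferred from two generated names that appear elsewhere in this article: ObjectU5BU5D_t4 stands for System.Object[], and AssemblyU2DCSharp stands for the Assembly-CSharp assembly. Both are consistent with replacing each character that is not valid in a C++ identifier by "U" plus its two-digit hexadecimal code. The following is only a sketch of that inferred rule; EscapeName is a hypothetical helper, not a function from il2cpp.exe, and the real tool may handle other characters differently.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of il2cpp.exe): keeps letters, digits and
// underscores, and rewrites every other character as 'U' followed by its
// two-digit uppercase hexadecimal code.
std::string EscapeName(const std::string& name)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string result;
    for (unsigned char c : name)
    {
        if (std::isalnum(c) || c == '_')
        {
            result += static_cast<char>(c);
        }
        else
        {
            result += 'U';
            result += kHex[c >> 4];   // high nibble
            result += kHex[c & 0x0F]; // low nibble
        }
    }
    return result;
}
```

With this rule, "Object[]" becomes "ObjectU5BU5D" and "Assembly-CSharp" becomes "AssemblyU2DCSharp", matching the two names seen in the generated code.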
The UnityEngine_UnityEngine_Vector3MethodDeclarations.h file contains the declarations for all of the methods in Vector3. For example, Vector3 overrides the Object.ToString method:

// System.String UnityEngine.Vector3::ToString()
extern "C" String_t* Vector3_ToString_m2315 (Vector3_t78 * __this, MethodInfo* method) IL2CPP_METHOD_ATTR;
 
Note the comment, which indicates the managed method that this native declaration represents. I often find it useful to search the output files for the name of a managed method in this format, especially for methods with common names like ToString.
Notice a few interesting things about all methods converted by il2cpp.exe:

· These are not member functions in C++. All methods are free functions, where the first argument is the "this" pointer. For static functions in managed code, IL2CPP always passes NULL for this first argument. By always declaring methods with the "this" pointer as the first argument, we simplify the method generation code in il2cpp.exe and we make invoking methods via other methods (like delegates) simpler for generated code.

· Every method has an additional argument of type MethodInfo*, which carries the metadata about the method that is used for things like virtual method invocation. The Mono scripting backend uses platform-specific trampolines to pass this metadata. For IL2CPP, we decided to avoid trampolines to aid portability.

· All methods are declared extern "C" so that il2cpp.exe can sometimes lie to the C++ compiler and treat all methods as if they had the same type.

· Types are named with a "_t" suffix. Methods are named with a "_m" suffix. Naming conflicts are resolved by appending a unique number to each name. These numbers will change if anything in the user script code changes, so you cannot depend on them from build to build.
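To make the first three points concrete, here is a minimal sketch of the calling convention, not actual il2cpp.exe output. Counter_t1, the two Counter_* functions, MethodPointer and Delegate are hypothetical names invented for illustration. The generated code declares a typed first parameter and relies on the "lie" mentioned above; this sketch takes void* and casts inside the function so that it stays within defined C++ behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the runtime's method metadata.
struct MethodInfo { const char* name; };

// Hypothetical managed type with one instance field.
struct Counter_t1 { int32_t value; };

extern "C"
{
    // One shared function shape: target object first, metadata last.
    typedef int32_t (*MethodPointer)(void* self, MethodInfo* method);

    // Instance method: the object arrives as the explicit first argument.
    int32_t Counter_Increment_m1(void* self, MethodInfo* method)
    {
        (void)method; // metadata is not needed by this simple body
        Counter_t1* counter = static_cast<Counter_t1*>(self);
        return ++counter->value;
    }

    // Static method: same shape, callers pass a null first argument.
    int32_t Counter_GetDefault_m2(void* self, MethodInfo* method)
    {
        (void)self;
        (void)method;
        return 42;
    }
}

MethodInfo Counter_Increment_m1_MethodInfo = { "Counter::Increment" };
MethodInfo Counter_GetDefault_m2_MethodInfo = { "Counter::GetDefault" };

// A delegate-like record can hold any method, static or instance,
// because every method shares the same shape.
struct Delegate
{
    MethodPointer method_ptr;
    void* target;
    MethodInfo* method;
};

int32_t InvokeDelegate(const Delegate& d)
{
    return d.method_ptr(d.target, d.method);
}
```

A Delegate built from Counter_Increment_m1 and a Counter_t1 instance increments that instance, while one built from Counter_GetDefault_m2 is invoked the same way with a null target. The caller needs no special case for static methods, which is the simplification the first bullet describes.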

The first two points imply that every method has at least two parameters: the "this" pointer and the MethodInfo pointer. Do these extra parameters cause unnecessary overhead? They do add some overhead, but so far we have not seen anything that suggests these extra arguments cause performance problems. Although it may seem that they would, profiling has shown that the difference in performance is not measurable.

We can jump to the definition of this ToString method using Ctags. It is in the Bulk_UnityEngine_0.cpp file. The code in that method definition does not look much like the C# code of the Vector3::ToString() method. However, if you use a tool like ILSpy to inspect the IL of Vector3::ToString(), you will see that the generated C++ code looks very similar to the IL code.

Why doesn't il2cpp.exe generate a separate C++ file for the method definitions of each type, as it does for the method declarations? This Bulk_UnityEngine_0.cpp file is pretty large: 20,481 lines! We found that the C++ compilers we were using had trouble with a large number of source files. Compiling four thousand .cpp files took much longer than compiling the same source code in 80 .cpp files. So il2cpp.exe batches the method definitions for types into groups and generates one C++ file per group.

Now jump back to the method declarations header file, and notice this line near the top of the file:
#include "codegen/il2cpp-codegen.h"


The il2cpp-codegen.h file contains the interface that generated code uses to access the libil2cpp runtime services. We'll discuss some of the ways generated code uses the runtime later.

Method prologues
Let's take a look at the definition of the Vector3::ToString() method. Specifically, it has a common prologue section that il2cpp.exe emits in all methods.


StackTraceSentry _stackTraceSentry(&Vector3_ToString_m2315_MethodInfo);
static bool Vector3_ToString_m2315_init;
if (!Vector3_ToString_m2315_init)
{
  ObjectU5BU5D_t4_il2cpp_TypeInfo_var = il2cpp_codegen_class_from_type(&ObjectU5BU5D_t4_0_0_0);
  Vector3_ToString_m2315_init = true;
}
 

The first line of this prologue creates a local variable of type StackTraceSentry. This variable is used to track the managed call stack, so that IL2CPP can report it in calls like Environment.StackTrace. Generation of this line is actually optional; it is enabled in this case by the --enable-stacktrace option passed to il2cpp.exe (because I set the Enable Exceptions option in the WebGL Player Settings to Full). For small functions, we found that the overhead of this variable has a negative impact on performance. So for iOS and other platforms where we can use platform-specific stack trace information, we never emit this line into generated code. For WebGL, we do not have platform-specific stack trace support, so the line is necessary for managed exceptions to work properly.
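The article does not show how StackTraceSentry is implemented, but a local object whose only job is to track the call stack suggests the usual C++ pattern of pushing a frame in the constructor and popping it in the destructor. The sketch below is written under that assumption. The g_managedStack variable, CaptureStackTrace, and the Inner and Outer functions are hypothetical and only mimic the role described above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the generated MethodInfo metadata.
struct MethodInfo { const char* name; };

// Assumed per-thread record of the managed call stack.
static thread_local std::vector<const MethodInfo*> g_managedStack;

// Pushes a frame on construction and pops it on destruction, so the frame
// is removed both on normal return and when an exception unwinds the stack.
class StackTraceSentry
{
public:
    explicit StackTraceSentry(const MethodInfo* method) { g_managedStack.push_back(method); }
    ~StackTraceSentry() { g_managedStack.pop_back(); }
    StackTraceSentry(const StackTraceSentry&) = delete;
    StackTraceSentry& operator=(const StackTraceSentry&) = delete;
};

static MethodInfo Inner_MethodInfo = { "Example::Inner" };
static MethodInfo Outer_MethodInfo = { "Example::Outer" };

// Builds a trace string with the innermost frame first.
std::string CaptureStackTrace()
{
    std::string trace;
    for (auto it = g_managedStack.rbegin(); it != g_managedStack.rend(); ++it)
    {
        trace += (*it)->name;
        trace += '\n';
    }
    return trace;
}

std::string Inner()
{
    StackTraceSentry sentry(&Inner_MethodInfo); // same position as in the prologue
    return CaptureStackTrace();
}

std::string Outer()
{
    StackTraceSentry sentry(&Outer_MethodInfo);
    return Inner();
}
```

Calling Outer() yields a two-line trace, and the stack is empty again once it returns. The push and pop on every call is also a plausible source of the overhead for small functions mentioned above.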

The second part of the prologue performs lazy initialization of the type metadata for any array or generic types used in the method body. Here, the name ObjectU5BU5D_t4 is the name of the type System.Object[]. This part of the prologue is executed only once, and it often does nothing if the type was already initialized elsewhere, so we have not seen any adverse performance impact from this generated code.

Is this code thread safe, though? What if two threads call Vector3::ToString() at the same time? Actually, this code is not a problem, because all of the code in the libil2cpp runtime used for type initialization is safe to call from multiple threads. It is possible (maybe even likely) that the il2cpp_codegen_class_from_type function will be called more than once, but the actual work it does will happen only once, on one thread. Method execution will not continue until that initialization is complete. So this method prologue is thread safe.
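The argument above can be sketched as follows. This is an illustration of the idea and not libil2cpp code: ClassFromType is a hypothetical stand-in for il2cpp_codegen_class_from_type, the mutex is an assumed way of making the runtime function safe to call concurrently, and the per-method flag is a std::atomic here (the generated code uses a plain bool) so the sketch stays within defined C++ behavior.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for runtime type metadata.
struct TypeInfo { bool initialized; };

static TypeInfo g_objectArrayTypeInfo = { false };
static std::mutex g_typeInitMutex;
static int g_lookupCalls = 0;   // times the runtime function was entered
static int g_initWorkCount = 0; // times the real initialization work ran

// Stand-in for a runtime lookup such as il2cpp_codegen_class_from_type.
// Callers are serialized by the lock, and only the first one does the work.
TypeInfo* ClassFromType()
{
    std::lock_guard<std::mutex> lock(g_typeInitMutex);
    ++g_lookupCalls;
    if (!g_objectArrayTypeInfo.initialized)
    {
        ++g_initWorkCount;
        g_objectArrayTypeInfo.initialized = true;
    }
    return &g_objectArrayTypeInfo;
}

static std::atomic<TypeInfo*> s_typeInfoVar(nullptr);
static std::atomic<bool> s_methodInit(false);

// Same shape as the generated prologue: check the flag, look up, set the flag.
TypeInfo* MethodWithPrologue()
{
    if (!s_methodInit.load())
    {
        s_typeInfoVar.store(ClassFromType());
        s_methodInit.store(true);
    }
    return s_typeInfoVar.load();
}
```

If two threads both see the flag as false, both call ClassFromType, but the second one waits on the lock and then finds the type already initialized. The lookup may therefore run more than once while the initialization work runs exactly once.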

Runtime checks
The next part of the method creates an object array, stores the value of the x field of Vector3 in a local, then boxes the local and adds it to the array at index zero. Here is the generated C++ code (with some annotations):
 
// Create a new single-dimension, zero-based object array
ObjectU5BU5D_t4* L_0 = ((ObjectU5BU5D_t4*)SZArrayNew(ObjectU5BU5D_t4_il2cpp_TypeInfo_var, 3));
// Store the Vector3::x field in a local
float L_1 = (__this->___x_1);
float L_2 = L_1;
// Box the float instance, since it is a value type.
Object_t * L_3 = Box(InitializedTypeInfo(&Single_t264_il2cpp_TypeInfo), &L_2);
// Here are three important runtime checks
NullCheck(L_0);
IL2CPP_ARRAY_BOUNDS_CHECK(L_0, 0);
ArrayElementTypeCheck (L_0, L_3);
// Store the boxed value in the array at index 0
*((Object_t **)SZArrayLdElema(L_0, 0)) = (Object_t *)L_3;

The three runtime checks are not present in the IL code; they are injected by il2cpp.exe.
• The NullCheck code throws a NullReferenceException if the value of the array is null.
• The IL2CPP_ARRAY_BOUNDS_CHECK code throws an IndexOutOfRangeException if the array index is out of range.
• The ArrayElementTypeCheck code throws an ArrayTypeMismatchException if the type of the element being added to the array is not correct.

These three runtime checks are all guarantees provided by the .NET virtual machine. Rather than injecting code, the Mono scripting backend uses platform-specific signaling mechanisms to handle these same runtime checks. For IL2CPP, we wanted to be more platform agnostic and to support platforms like WebGL, where there is no platform-specific signaling mechanism, so il2cpp.exe injects these checks.
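As an illustration of what injected checks of this kind do, here is a small self-contained sketch. The function bodies are assumptions and not the libil2cpp implementations. TypeInfo, ObjectArray_t, ArrayBoundsCheck, StoreElement and TryStore are hypothetical names, and the three exception structs only stand in for the managed exception types.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified metadata and object layouts.
struct TypeInfo { const TypeInfo* parent; };
struct Object_t { const TypeInfo* klass; };
struct ObjectArray_t
{
    const TypeInfo* elementType;
    int32_t length;
    Object_t** items;
};

// C++ stand-ins for the three managed exception types.
struct NullReferenceException : std::runtime_error
{
    NullReferenceException() : std::runtime_error("NullReferenceException") {}
};
struct IndexOutOfRangeException : std::runtime_error
{
    IndexOutOfRangeException() : std::runtime_error("IndexOutOfRangeException") {}
};
struct ArrayTypeMismatchException : std::runtime_error
{
    ArrayTypeMismatchException() : std::runtime_error("ArrayTypeMismatchException") {}
};

void NullCheck(const void* pointer)
{
    if (pointer == nullptr) throw NullReferenceException();
}

void ArrayBoundsCheck(const ObjectArray_t* array, int32_t index)
{
    if (index < 0 || index >= array->length) throw IndexOutOfRangeException();
}

// Walks the parent chain to decide whether 'actual' derives from 'target'.
bool IsAssignableFrom(const TypeInfo* target, const TypeInfo* actual)
{
    for (; actual != nullptr; actual = actual->parent)
        if (actual == target) return true;
    return false;
}

void ArrayElementTypeCheck(const ObjectArray_t* array, const Object_t* value)
{
    if (value != nullptr && !IsAssignableFrom(array->elementType, value->klass))
        throw ArrayTypeMismatchException();
}

// The same order as in the generated code: three checks, then the store.
void StoreElement(ObjectArray_t* array, int32_t index, Object_t* value)
{
    NullCheck(array);
    ArrayBoundsCheck(array, index);
    ArrayElementTypeCheck(array, value);
    array->items[index] = value;
}

// Returns 0 on success, or 1, 2, 3 for null, bounds and type failures.
int TryStore(ObjectArray_t* array, int32_t index, Object_t* value)
{
    try { StoreElement(array, index, value); return 0; }
    catch (const NullReferenceException&) { return 1; }
    catch (const IndexOutOfRangeException&) { return 2; }
    catch (const ArrayTypeMismatchException&) { return 3; }
}
```

Because each check is ordinary portable C++ that runs before the store, nothing here depends on hardware traps or operating system signals, which is what makes this approach usable on a platform like WebGL.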
 
Do these runtime checks cause performance problems? In most cases, we have not seen any adverse impact on performance, and the checks provide the safety guarantees that the .NET virtual machine requires. In a few specific cases, though, we do see these checks degrade performance, especially in tight loops. We are currently working on a way to annotate managed code so that il2cpp.exe can omit these runtime checks when it generates C++ code. Stay tuned on this one.
 
Static fields
Now that we have seen how instance fields look (in the Vector3 type), let's see how static fields are converted and accessed. Find the definition of the HelloWorld_Start_m3 method, which is in the Bulk_Assembly-CSharp_0.cpp file in my build. From there, jump to the Important_t1 type (in the AssemblyU2DCSharp_HelloWorld_Important.h file):

struct Important_t1 : public Object_t
{
  // System.Int32 HelloWorld/Important::InstanceIdentifier
  int32_t ___InstanceIdentifier_1;
};
struct Important_t1_StaticFields
{
  // System.Int32 HelloWorld/Important::ClassIdentifier
  int32_t ___ClassIdentifier_0;
};
 

Notice that il2cpp.exe has generated a separate C++ struct to hold the static field for this type, since the static field is shared between all instances of the type. So at runtime, there will be one instance of the Important_t1_StaticFields type, and all instances of the Important_t1 type will share that instance. In generated code, the static field is accessed like this:

int32_t L_1 = (((Important_t1_StaticFields*)InitializedTypeInfo(&Important_t1_il2cpp_TypeInfo)->static_fields)->___ClassIdentifier_0);

The type metadata for Important_t1 holds a pointer to the one instance of the Important_t1_StaticFields type, and that instance is used to obtain the value of the static field.
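The relationship between the instance struct, the static fields struct and the type metadata can be sketched like this. It is a simplified model, not the real runtime layout: the TypeInfo members other than static_fields are hypothetical, and InitializedTypeInfo is assumed here to run the type's static initializer once before returning the metadata, which the article does not state explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Per-instance data: every object gets its own copy.
struct Important_t { int32_t InstanceIdentifier; };

// Per-type data: exactly one copy exists at runtime.
struct Important_StaticFields { int32_t ClassIdentifier; };

// Hypothetical, simplified type metadata.
struct TypeInfo
{
    void* static_fields;                        // points at the single statics instance
    void (*static_constructor)(TypeInfo* type); // assumed hook for static initializers
    bool initialized;
};

// Plays the role of the C# initializer "ClassIdentifier = 42".
void Important_cctor(TypeInfo* type)
{
    static_cast<Important_StaticFields*>(type->static_fields)->ClassIdentifier = 42;
}

static Important_StaticFields g_importantStaticFields = { 0 };
static TypeInfo Important_TypeInfo = { &g_importantStaticFields, &Important_cctor, false };

// Assumed behavior: make sure the static initializer ran, then return the metadata.
TypeInfo* InitializedTypeInfo(TypeInfo* type)
{
    if (!type->initialized)
    {
        type->initialized = true;
        type->static_constructor(type);
    }
    return type;
}

// Same access path as the generated line: metadata, then statics, then field.
int32_t ReadClassIdentifier()
{
    return static_cast<Important_StaticFields*>(
        InitializedTypeInfo(&Important_TypeInfo)->static_fields)->ClassIdentifier;
}

void WriteClassIdentifier(int32_t value)
{
    static_cast<Important_StaticFields*>(
        InitializedTypeInfo(&Important_TypeInfo)->static_fields)->ClassIdentifier = value;
}
```

Any number of Important_t objects can exist, yet a write through WriteClassIdentifier is visible to every later ReadClassIdentifier call because both go through the same metadata pointer. The instance struct itself carries no storage for the static field.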

Exceptions

Managed exceptions are converted by il2cpp.exe into C++ exceptions. We chose this path to, once again, avoid platform-specific solutions. When il2cpp.exe needs to emit code that raises a managed exception, it calls the il2cpp_codegen_raise_exception function.
The code in our HelloWorld_Start_m3 method that throws and catches a managed exception looks like this:
try
{ // begin try (depth: 1)
  InvalidOperationException_t7 * L_17 = (InvalidOperationException_t7 *)il2cpp_codegen_object_new (InitializedTypeInfo(&InvalidOperationException_t7_il2cpp_TypeInfo));
  InvalidOperationException__ctor_m8(L_17, (String_t*) &_stringLiteral5, /*hidden argument*/&InvalidOperationException__ctor_m8_MethodInfo);
  il2cpp_codegen_raise_exception(L_17);
  // IL_0092: leave IL_00a8
  goto IL_00a8;
} // end try (depth: 1)
catch(Il2CppExceptionWrapper& e)
{
  __exception_local = (Exception_t8 *)e.ex;
  if(il2cpp_codegen_class_is_assignable_from (&InvalidOperationException_t7_il2cpp_TypeInfo, e.ex->object.klass))
    goto IL_0097;
  throw e;
}
IL_0097:
{ // begin catch(System.InvalidOperationException)
  V_1 = ((InvalidOperationException_t7 *)__exception_local);
  NullCheck(V_1);
  String_t* L_18 = (String_t*)VirtFuncInvoker0< String_t* >::Invoke(&Exception_get_Message_m9_MethodInfo, V_1);
  Debug_Log_m6(NULL /*static, unused*/, L_18, /*hidden argument*/&Debug_Log_m6_MethodInfo);
  // IL_00a3: leave IL_00a8
  goto IL_00a8;
} // end catch (depth: 1)
 

All managed exceptions are wrapped in the C++ Il2CppExceptionWrapper type. When the generated code catches an exception of that type, it unpacks the C++ representation of the managed exception (which has type Exception_t8). In this case, we are looking only for an InvalidOperationException, so if the exception is not of that type, a copy of the C++ exception is thrown again. If the type is correct, the code jumps to the implementation of the catch handler and writes out the exception message.
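The wrap, catch, test and rethrow sequence described above can be reduced to the following sketch. It is a model of the pattern, not the generated code: ExceptionWrapper, RaiseException, IsAssignableFrom and CatchInvalidOperation are hypothetical stand-ins for Il2CppExceptionWrapper, il2cpp_codegen_raise_exception, il2cpp_codegen_class_is_assignable_from and the generated handler. It also uses a plain `throw;` and structured control flow where the generated code uses `throw e;` and goto.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified metadata and managed exception object.
struct TypeInfo { const TypeInfo* parent; const char* name; };
struct Exception_t { const TypeInfo* klass; std::string message; };

// The only C++ type that is ever thrown; it carries the managed exception.
struct ExceptionWrapper
{
    Exception_t* ex;
    explicit ExceptionWrapper(Exception_t* exception) : ex(exception) {}
};

static const TypeInfo Exception_TypeInfo = { nullptr, "System.Exception" };
static const TypeInfo InvalidOperationException_TypeInfo =
    { &Exception_TypeInfo, "System.InvalidOperationException" };
static const TypeInfo ArgumentException_TypeInfo =
    { &Exception_TypeInfo, "System.ArgumentException" };

// Walks the parent chain to decide whether 'actual' derives from 'target'.
bool IsAssignableFrom(const TypeInfo* target, const TypeInfo* actual)
{
    for (; actual != nullptr; actual = actual->parent)
        if (actual == target) return true;
    return false;
}

[[noreturn]] void RaiseException(Exception_t* ex)
{
    throw ExceptionWrapper(ex);
}

// Models "catch (InvalidOperationException e)": returns the message when the
// managed type matches, and lets every other managed exception propagate.
std::string CatchInvalidOperation(Exception_t* toThrow)
{
    Exception_t* exceptionLocal = nullptr;
    try
    {
        RaiseException(toThrow);
    }
    catch (ExceptionWrapper& e)
    {
        exceptionLocal = e.ex;
        if (!IsAssignableFrom(&InvalidOperationException_TypeInfo, e.ex->klass))
            throw; // not ours: pass it on to an outer handler
    }
    return exceptionLocal->message;
}
```

Since the C++ compiler only ever sees one exception type, the managed type test has to happen inside the handler at runtime. An InvalidOperationException is handled and its message returned, while an ArgumentException leaves the function still wrapped.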

Goto!?!
This code brings up an interesting point. What are those labels and goto statements doing in there? These constructs are not necessary in structured programming! However, IL does not have structured programming concepts like loops and if/then statements. Since IL is lower-level code, il2cpp.exe follows the lower-level concepts in the code it generates.
For example, let's look at the for loop in the HelloWorld_Start_m3 method:
 
IL_00a8:
{
  V_2 = 0;
  goto IL_00cc;
}
IL_00af:
{
  ObjectU5BU5D_t4* L_19 = ((ObjectU5BU5D_t4*)SZArrayNew(ObjectU5BU5D_t4_il2cpp_TypeInfo_var, 1));
  int32_t L_20 = V_2;
  Object_t * L_21 =
    Box(InitializedTypeInfo(&Int32_t5_il2cpp_TypeInfo), &L_20);
  NullCheck(L_19);
  IL2CPP_ARRAY_BOUNDS_CHECK(L_19, 0);
  ArrayElementTypeCheck (L_19, L_21);
  *((Object_t **)SZArrayLdElema(L_19, 0)) = (Object_t *)L_21;
  Debug_LogFormat_m7(NULL /*static, unused*/, (String_t*) &_stringLiteral6, L_19, /*hidden argument*/&Debug_LogFormat_m7_MethodInfo);
  V_2 = ((int32_t)(V_2+1));
}
IL_00cc:
{
  if ((((int32_t)V_2) < ((int32_t)3)))
  {
    goto IL_00af;
  }
}
 
 
Here the V_2 variable is the loop index. It starts with a value of 0 and is incremented at the bottom of the loop body in this line:
V_2 = ((int32_t)(V_2+1));

The end condition of the loop is then checked here:
if ((((int32_t)V_2) < ((int32_t)3)))

As long as V_2 is less than 3, the goto statement jumps to the IL_00af label, which is the top of the loop body. You might be able to guess that il2cpp.exe currently generates C++ code directly from IL, without using an intermediate abstract syntax tree representation. If you guessed this, you are correct. You may also have noticed that some of the generated code in the Runtime checks section above looks like this:
float L_1 = (__this->___x_1);
float L_2 = L_1;
 
Clearly, the L_2 variable is not necessary here. Most C++ compilers can optimize away this additional assignment, but we would like to avoid emitting it at all. We are currently researching the possibility of using an AST to better understand the IL code and generate better C++ code for cases involving local variables and for loops, among others.
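To see that the label-and-goto form from HelloWorld_Start_m3 is just a for loop in disguise, the two shapes can be placed side by side. This is a stripped-down sketch: the boxing and logging in the loop body are replaced by recording the index in a std::vector, and LoopWithGoto and LoopStructured are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the shape of the generated code: initialize, jump to the
// condition, and jump back to the body while the condition holds.
std::vector<int32_t> LoopWithGoto()
{
    std::vector<int32_t> seen;
    int32_t V_2 = 0;
    goto IL_00cc;

IL_00af:
    seen.push_back(V_2); // loop body
    V_2 = V_2 + 1;       // increment at the bottom of the body

IL_00cc:
    if (V_2 < 3)
    {
        goto IL_00af;
    }
    return seen;
}

// The structured loop that the original C# source expressed.
std::vector<int32_t> LoopStructured()
{
    std::vector<int32_t> seen;
    for (int32_t i = 0; i < 3; ++i)
    {
        seen.push_back(i);
    }
    return seen;
}
```

Both functions visit the indices 0, 1 and 2 in the same order. Recovering the second form from the first is the kind of work an intermediate AST would make possible.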

Conclusion
We have just scratched the surface of the C++ code generated by the IL2CPP scripting backend for a very simple project. If you have not done so already, I encourage you to dig into the generated code for your own project. As you explore, keep in mind that the generated C++ code will look different in future versions of Unity, as we are constantly working to improve the build and runtime performance of the IL2CPP scripting backend.

By converting IL code to C++, we have been able to strike a nice balance between portable and performant code. We can keep many of the developer-friendly features of managed code while still getting the quality machine code that the C++ compiler provides on various platforms.

In future posts, we will explore more generated code, including method invocations, sharing of method implementations, and wrappers for calls to native libraries. But next time, we will debug some of the generated code for an iOS 64-bit build using Xcode.

Source address of the article:
http://blogs.unity3d.com/2015/05/13/il2cpp-internals-a-tour-of-generated-code/

