In [Part 24: Understanding metadata and IL (I)], we laid the groundwork by looking at how PE files, assemblies, and managed modules relate to metadata and IL, and we got a basic feel for disassembling with the ildasm tool. Now we turn to metadata and IL themselves. Let's continue.
I have wanted to write about metadata and IL (intermediate language) for a long time. Starting with this article, I will settle down and give these two companions my full attention. They may not be as quick a read as "Part 1: Feuds and grudges: is and as", but metadata and IL are heavyweight topics that deserve our attention at any time. This article is where we begin.
3. What is metadata?
Metadata is data that describes data. The concept is not unique to the CLR: metadata exists wherever one set of data describes another. For example, the assembly manifest is also called assembly metadata. Metadata in different systems has its own characteristics, and .NET metadata is no exception. So what does CLR metadata describe? As mentioned earlier, after compilation the type information is saved as metadata in the PE file. .NET is object-oriented, so metadata mainly describes the basic object-oriented elements: classes, types, properties, methods, fields, parameters, attributes, and so on. It mainly includes:
Definition tables, which describe the types and members defined in the source code, including TypeDef, MethodDef, FieldDef, ModuleDef, and PropertyDef.
Reference tables, which describe the types and members referenced in the source code. The referenced elements can live in other modules of the same assembly or in modules of other assemblies. These tables include AssemblyRef, TypeRef, ModuleRef, and MemberRef.
Pointer tables, which are used to reference code that is not yet known, including MethodPtr, FieldPtr, and ParamPtr.
Heaps, which store information in the form of streams, mainly #Strings, #Blob, #US, and #GUID.
As described in the previous article, we can disassemble an assembly with ildasm.exe and press Ctrl+M to list the metadata it uses. In .NET, a module can contain up to 45 CLR metadata tables (numbered 0 to 44), as shown below:
Table number | Metadata table | Description |
0 (0x0) | ModuleDef | Describes the current module. |
1 (0x1) | TypeRef | Describes referenced types; one record is stored for each referenced type. |
2 (0x2) | TypeDef | Describes type definitions; each type stores one record in the TypeDef table. |
3 (0x3) | FieldPtr | Describes field pointers; an intermediate lookup table used when defining the fields of a class. |
4 (0x4) | FieldDef | Describes field definitions. |
5 (0x5) | MethodPtr | Describes method pointers; an intermediate lookup table used when defining the methods of a class. |
6 (0x6) | MethodDef | Describes method definitions. |
7 (0x7) | ParamPtr | Describes parameter pointers; an intermediate lookup table used when defining parameters. |
8 (0x8) | ParamDef | Describes the parameter definitions of methods. |
9 (0x9) | InterfaceImpl | Describes which interfaces each type implements. |
10 (0xa) | MemberRef | Describes referenced members, which can be methods or fields. |
11 (0xb) | Constant | Describes the constant values of parameters, fields, and properties. |
12 (0xc) | CustomAttribute | Describes custom attribute definitions. |
13 (0xd) | FieldMarshal | Describes how parameters and fields are marshaled when interacting with unmanaged code. |
14 (0xe) | DeclSecurity | Describes the security settings of classes, methods, and assemblies. |
15 (0xf) | ClassLayout | Describes the layout information used when a class is loaded. |
16 (0x10) | FieldLayout | Describes the offset or ordinal of a single field. |
17 (0x11) | StandAloneSig | Describes signatures that are not referenced by any other table. |
18 (0x12) | EventMap | Describes the event list of a class. |
19 (0x13) | EventPtr | Describes event pointers; an intermediate lookup table used when defining events. |
20 (0x14) | Event | Describes events. |
21 (0x15) | PropertyMap | Describes the property list of a class. |
22 (0x16) | PropertyPtr | Describes property pointers; an intermediate lookup table used when defining the properties of a class. |
23 (0x17) | Property | Describes properties. |
24 (0x18) | MethodSemantics | Describes the association between events, properties, and methods. |
25 (0x19) | MethodImpl | Describes method implementations. |
26 (0x1a) | ModuleRef | Describes references to external modules. |
27 (0x1b) | TypeSpec | Describes type specifications that refer to a TypeDef or TypeRef. |
28 (0x1c) | ImplMap | Describes all the unmanaged methods used by the assembly. |
29 (0x1d) | FieldRVA | An extension of the field table; the RVA gives the location of a field's initial value. |
30 (0x1e) | ENCLog | Describes which metadata has been modified in edit-and-continue mode. |
31 (0x1f) | ENCMap | Describes the mapping in edit-and-continue mode. |
32 (0x20) | Assembly | Describes the assembly definition. |
33 (0x21) | AssemblyProcessor | Unused. |
34 (0x22) | AssemblyOS | Unused. |
35 (0x23) | AssemblyRef | Describes referenced assemblies. |
36 (0x24) | AssemblyRefProcessor | Unused. |
37 (0x25) | AssemblyRefOS | Unused. |
38 (0x26) | File | Describes external files. |
39 (0x27) | ExportedType | Describes types defined in other modules of the same assembly. |
40 (0x28) | ManifestResource | Describes resource information. |
41 (0x29) | NestedClass | Describes nested type definitions. |
42 (0x2a) | GenericParam | Describes the generic parameters used by generic type definitions or generic method definitions. |
43 (0x2b) | MethodSpec | Describes the instantiation of generic methods. |
44 (0x2c) | GenericParamConstraint | Describes the constraints on each generic parameter. |
There are also six named heaps:
Heap | Description |
#Strings | An array of UTF-8 strings referenced by the metadata tables, holding method names, field names, class names, variable names, and resource-related strings. It does not contain string literals. |
#Blob | Binary objects referenced by the metadata. It does not contain user-defined objects. |
#US | An array of Unicode strings holding the string literals defined in the code. These strings can be loaded directly with the ldstr instruction. Remember the string creation process we discussed in "Part 22: String interning (I): thinking with questions"? |
#GUID | Stores 16-byte GUID values referenced by the metadata tables. |
#~ | A special heap that contains all the metadata tables, which in turn reference the other heaps. |
#- | The uncompressed form of the #~ heap. Every heap other than #- is compressed. |
Note: a simple way to see the difference between #Strings and #US is:
string hello = "Hello, world";
The variable name hello is saved in #Strings, while the string literal "Hello, world" in the code is saved in #US.
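The split between names and literals can also be observed at run time. The following is only a rough sketch, not a general IL reader: the class `UserStringDemo` and the helper `FirstLiteral` are hypothetical names introduced for illustration, and the helper naively scans the method body for the first `ldstr` opcode (0x72), assuming a little-endian machine and a method as trivial as `Greeting`. It then asks `Module.ResolveString` to look the token up in the user string heap.

```csharp
using System;
using System.Reflection;

public static class UserStringDemo
{
    // The method name "Greeting" is stored in #Strings;
    // the literal "Hello, world" is stored in #US.
    public static string Greeting()
    {
        return "Hello, world";
    }

    // Hypothetical helper: naive scan for the first ldstr (0x72) opcode.
    // A real IL reader must decode every instruction; this shortcut is
    // only safe for very small methods such as Greeting.
    public static string FirstLiteral(MethodInfo method)
    {
        byte[] il = method.GetMethodBody().GetILAsByteArray();
        for (int i = 0; i + 4 < il.Length; i++)
        {
            if (il[i] == 0x72)
            {
                // The operand is a 4-byte little-endian #US token.
                int token = BitConverter.ToInt32(il, i + 1);
                return method.Module.ResolveString(token);
            }
        }
        return null;
    }

    public static void Main()
    {
        MethodInfo m = typeof(UserStringDemo).GetMethod("Greeting");
        Console.WriteLine(m.Name);           // name read from metadata
        Console.WriteLine(FirstLiteral(m));  // literal resolved via its token
    }
}
```

The point of the sketch is only that the name and the literal are reached through different routes: the name through the method's metadata record, the literal through a token embedded in the IL.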
For a detailed description of the metadata, such as the columns in each table and the relationships between tables, see [Standard ECMA-335] and [The .NET File Format].
Within the PE file format, metadata has a complex structure. I find it helpful to look at it the way a database manages data: logically, metadata is organized into metadata tables. Just as a database table has a primary key and a schema, a metadata table uses the RID (the row index within a table) and its own schema to express similar concepts. Taking the TypeDef table as an example, its records reference tables such as Field, Method, and TypeRef. Other tables are related in similar ways, and together they form a complex database-like structure.
Metadata therefore stores the type data produced by compilation. While a .NET program is running, we can obtain metadata information dynamically through reflection, using .NET Framework classes such as Type and MethodInfo. MSDN gives a simple example of the relationships among these classes.
For any CLR type, the Object.GetType method returns its Type, through which all of its runtime metadata can be obtained:
// Release : code04, 2009/02/21
// Author : Anytao, http://www.anytao.com
// List : Program.cs
// Requires: using System; using System.Reflection;
private static void ShowMemberInfo()
{
    // Walk every assembly loaded into the current application domain.
    var assems = AppDomain.CurrentDomain.GetAssemblies();
    foreach (Assembly ass in assems)
    {
        foreach (Type t in ass.GetTypes())
        {
            // Print the name and kind of each public member of the type.
            foreach (MemberInfo mi in t.GetMembers())
            {
                Console.WriteLine("Name:{0}, Type:{1}", mi.Name, mi.MemberType.ToString());
            }
        }
    }
}
Run the method above and you will get a long list full of familiar names :-)
4. What is IL?
IL, also known as CIL or MSIL, stands for intermediate language and is fully defined and standardized by ECMA (Standard ECMA-335). As the name implies, every CLR-compatible compiler generates intermediate language code, which is one of the foundations of the CLR's cross-language support. IL is like a bridge: its instruction set is independent of any CPU's instructions, and the JIT compiler translates it into native code at runtime. By connecting every high-level language that complies with the CLS, it provides the most basic support for the .NET platform. In the book [You Must Know .NET], I devoted an entire chapter (Chapter 3, "Everything starts with IL") to the basics of IL, so topics such as basic types, IL analysis methods, common instructions, and basic operations will not be repeated here. I will only summarize the key points:
IL is an object-oriented machine language, so it has all the features of an object-oriented language: classes, objects, inheritance, polymorphism, and so on remain basic concepts in IL.
IL instructions are independent of CPU instructions; the CLR converts them to native code through JIT compilation.
IL and metadata are central to understanding how the CLR operates, and they are the key to unlocking its secrets.
For example, in the previous article I described how to use ildasm.exe or Reflector to disassemble managed code and view its IL. In many cases, analyzing the IL reveals the syntactic sugar hidden by high-level languages. The automatic properties, implicitly typed variables, anonymous types, and extension methods introduced in C# 3.0 can all be explained quickly through IL analysis, so a reasonable understanding of IL is necessary. Below, we will look at JIT compilation to understand the role IL code plays in the execution of managed programs.
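As a small illustration of the syntactic sugar mentioned above, the sketch below uses reflection rather than ildasm to show what the compiler emits for an automatic property. The class `One` with an `ID` property is an assumed stand-in, and `SugarDemo` is a hypothetical name. The exact name of the generated backing field is a compiler implementation detail, so the code checks for `CompilerGeneratedAttribute` instead of relying on the name.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public class One
{
    public int ID { get; set; }   // automatic property (C# 3.0)
}

public static class SugarDemo
{
    public static void Main()
    {
        BindingFlags flags = BindingFlags.Instance | BindingFlags.Public
                           | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        // The compiler adds a hidden backing field for the property.
        foreach (FieldInfo f in typeof(One).GetFields(flags))
        {
            bool generated = f.IsDefined(typeof(CompilerGeneratedAttribute), false);
            Console.WriteLine("{0} (compiler generated: {1})", f.Name, generated);
        }

        // The property itself becomes a pair of ordinary methods.
        foreach (MethodInfo m in typeof(One).GetMethods(flags))
        {
            Console.WriteLine(m.Name);
        }
    }
}
```

The accessor methods listed here, get_ID and set_ID, are the same kind of members that IL code calls directly when a property is read or assigned.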
In addition, metadata describes the static structure, while IL describes the dynamic execution. IL code references the metadata tables through 4-byte values called metadata tokens, each of which also records which metadata table it refers to. In ildasm.exe, select "Show token values" to see how the IL code uses metadata tokens to reference the metadata tables:
.method /*06000003*/ private hidebysig static
        void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 36 (0x24)
  .maxstack 2
  .locals /*11000002*/ init ([0] int32 id,
           [1] class Anytao.Insidenet.MetadataIL.One/*02000004*/ one,
           [2] class Anytao.Insidenet.MetadataIL.Two/*02000002*/ two)
  IL_0000: nop
  IL_0001: ldc.i4.1
  IL_0002: stloc.0
  IL_0003: newobj instance void Anytao.Insidenet.MetadataIL.One/*02000004*/::.ctor() /* 06000007 */
  IL_0008: stloc.1
  IL_0009: ldloc.1
  IL_000a: ldloc.0
  IL_000b: callvirt instance void Anytao.Insidenet.MetadataIL.One/*02000004*/::set_ID(int32) /* 06000006 */
  IL_0010: nop
  IL_0011: newobj instance void Anytao.Insidenet.MetadataIL.Two/*02000002*/::.ctor() /* 06000002 */
  IL_0016: stloc.2
  IL_0017: ldloc.2
  IL_0018: callvirt instance string Anytao.Insidenet.MetadataIL.Two/*02000002*/::SayHello() /* 06000001 */
  IL_001d: call void [mscorlib/*23000001*/]System.Console/*01000012*/::WriteLine(string) /* 0A000011 */
  IL_0022: nop
  IL_0023: ret
} // end of method Program::Main
According to the ECMA specification, the first byte of a metadata token identifies the metadata table being referenced, and the remaining three bytes identify the record within that table. For example, 06000003 refers to record 000003 of the MethodDef (06) table, which is the Main method.
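The token layout just described can be checked with a few lines of bit arithmetic. This is a minimal sketch; `TokenDemo` and `DecodeToken` are hypothetical names introduced here, not part of the .NET Framework.

```csharp
using System;

public static class TokenDemo
{
    // Splits a 4-byte metadata token into the table number (high byte)
    // and the record index, or RID (low three bytes).
    public static void DecodeToken(int token, out int table, out int rid)
    {
        table = (int)((uint)token >> 24);
        rid = token & 0x00FFFFFF;
    }

    public static void Main()
    {
        int table, rid;
        DecodeToken(0x06000003, out table, out rid);
        // prints "Table: 0x06, RID: 3"
        Console.WriteLine("Table: 0x{0:X2}, RID: {1}", table, rid);
    }
}
```

Table 0x06 is MethodDef in the table list earlier in this article, so the token from the ildasm listing decodes to the third MethodDef record.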
During runtime reflection, you can use the MetadataToken property of Type to obtain the metadata token of a type. For example:
static void Main(string[] args)
{
    Console.WriteLine(typeof(One).MetadataToken);
}
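A token can also be turned back into the type it identifies. The sketch below assumes a stand-in class `One` and a hypothetical `ResolveDemo` class. It prints the token in hexadecimal, so that the table byte is visible, and then resolves it with `Module.ResolveType`. The actual RID depends on how the assembly was compiled, so no specific value is claimed here.

```csharp
using System;

public class One { }

public static class ResolveDemo
{
    public static void Main()
    {
        Type type = typeof(One);
        int token = type.MetadataToken;

        // A type defined in this module gets a TypeDef token,
        // so the high byte is expected to be 0x02.
        Console.WriteLine("0x{0:X8}", token);

        // Resolve the token back to the type through its module.
        Type resolved = type.Module.ResolveType(token);
        Console.WriteLine(resolved == type);
    }
}
```

Printing the token in hexadecimal makes it easier to compare with the values shown by ildasm's "Show token values" option.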
With all these preparations in place, we can start to analyze the roles of metadata and IL, and the relationship between them, during program execution.