Understanding metadata and IL (Part 2) <Article 4>


In [Episode 24: Understanding metadata and IL (Part 1)], we laid the groundwork by looking at how PE files, assemblies, and managed modules relate to metadata and IL, and we became familiar with the basics of disassembling with the ildasm tool. Now let's continue with metadata and IL themselves.

I have wanted to write about metadata and IL (Intermediate Language) for a long time. Starting with this article, I am settling down to give these two close siblings my full attention. It may not be as quick a read as "Episode 1: Resentment: is and as", but metadata and IL are heavyweight content that deserves our attention at any time. This article is the beginning.

3. What is metadata?

Metadata is data that describes data. The concept is not unique to the CLR; metadata exists wherever one piece of data describes another. For example, the assembly manifest is also called assembly metadata. The metadata of each system has its own characteristics, and .NET metadata is no exception. So what does CLR metadata describe? As described above, after compilation the type information is saved as metadata in the PE file. .NET is object-oriented, so metadata mainly describes the basic object-oriented elements: classes, types, properties, methods, fields, parameters, attributes, and so on. It mainly includes:

Definition tables, which describe the types and members defined in the source code, including TypeDef, MethodDef, FieldDef, ModuleDef, and PropertyDef.

Reference tables, which describe the types and members referenced in the source code. A referenced element can live in another module of the same assembly or in a module of a different assembly. They include AssemblyRef, TypeRef, ModuleRef, and MemberRef.

Pointer tables, which are used to reference code that is not yet known, including MethodPtr, FieldPtr, and ParamPtr.

Heaps, which store information in the form of streams, mainly including #Strings, #Blob, #US, #GUID, and so on.

As described in the previous article, we can disassemble an assembly with ildasm.exe and press the CTRL + M shortcut to list the metadata it uses. In .NET, the CLR metadata tables of a module are numbered 0 through 44, as shown below:

Table record | Metadata table | Description
0 (0) | ModuleDef | Describes the current module.
1 (0x1) | TypeRef | Describes referenced types; one record is saved for each referenced type.
2 (0x2) | TypeDef | Describes type definitions; each type saves one record in the TypeDef table.
3 (0x3) | FieldPtr | Describes the field pointer, an intermediate lookup table used when defining the fields of a class.
4 (0x4) | FieldDef | Describes field definitions.
5 (0x5) | MethodPtr | Describes the method pointer, an intermediate lookup table used when defining the methods of a class.
6 (0x6) | MethodDef | Describes method definitions.
7 (0x7) | ParamPtr | Describes the parameter pointer, an intermediate lookup table used when defining the parameters of a method.
8 (0x8) | ParamDef | Describes the parameter definitions of methods.
9 (0x9) | InterfaceImpl | Describes which interfaces each type implements.
10 (0xa) | MemberRef | Describes references to members. The referenced members can be methods or fields.
11 (0xb) | Constant | Describes the constant values of parameters, fields, and properties.
12 (0xc) | CustomAttribute | Describes the definition of custom attributes.
13 (0xd) | FieldMarshal | Describes how parameters and fields are marshaled when interacting with unmanaged code.
14 (0xe) | DeclSecurity | Describes the security declarations of classes, methods, and assemblies.
15 (0xf) | ClassLayout | Describes the layout information used when a class is loaded.
16 (0x10) | FieldLayout | Describes the offset or ordinal of a single field.
17 (0x11) | StandAloneSig | Describes signatures that are not referenced by any other table.
18 (0x12) | EventMap | Describes the event list of a class.
19 (0x13) | EventPtr | Describes the event pointer, an intermediate lookup table used when defining events.
20 (0x14) | Event | Describes events.
21 (0x15) | PropertyMap | Describes the property list of a class.
22 (0x16) | PropertyPtr | Describes the property pointer, an intermediate lookup table used when defining the properties of a class.
23 (0x17) | Property | Describes properties.
24 (0x18) | MethodSemantics | Describes the association between events or properties and their methods.
25 (0x19) | MethodImpl | Describes method implementations.
26 (0x1a) | ModuleRef | Describes references to external modules.
27 (0x1b) | TypeSpec | Describes a type specification built on a TypeDef or TypeRef.
28 (0x1c) | ImplMap | Describes all the unmanaged methods used by the assembly.
29 (0x1d) | FieldRVA | An extension of the field table; the RVA gives the location of a field's initial value.
30 (0x1e) | ENCLog | Describes which metadata has been modified in edit-and-continue mode.
31 (0x1f) | ENCMap | Describes the mapping in edit-and-continue mode.
32 (0x20) | Assembly | Describes the assembly definition.
33 (0x21) | AssemblyProcessor | Unused.
34 (0x22) | AssemblyOS | Unused.
35 (0x23) | AssemblyRef | Describes referenced assemblies.
36 (0x24) | AssemblyRefProcessor | Unused.
37 (0x25) | AssemblyRefOS | Unused.
38 (0x26) | File | Describes external files.
39 (0x27) | ExportedType | Describes types defined in other modules of the same assembly.
40 (0x28) | ManifestResource | Describes resource information.
41 (0x29) | NestedClass | Describes nested type definitions.
42 (0x2a) | GenericParam | Describes the generic parameters used by generic type definitions or generic method definitions.
43 (0x2b) | MethodSpec | Describes the instantiation of generic methods.
44 (0x2c) | GenericParamConstraint | Describes the constraints on each generic parameter.

Then there are six named heaps:

Heap | Description
#Strings | An array of UTF-8 strings referenced by the metadata tables. It holds method names, field names, class names, variable names, and resource-related strings, but not string literals.
#Blob | Binary objects referenced by the metadata; it does not contain user-defined objects.
#US | An array of Unicode strings holding the strings (string literals) defined in the code. These strings can be loaded directly with the ldstr instruction. Do you remember our discussion of how strings are created in "Episode 22: String interning (Part 1): thinking with questions"?
#GUID | Stores 16-byte GUID values referenced by the metadata tables.
#~ | A special heap that contains all the metadata tables, which in turn reference the other heaps.
#- | An uncompressed #~ heap. Apart from the #- heap, the heaps are compressed.

Note: a simple way to tell #Strings and #US apart is:

string hello = "Hello, world";

The name hello is saved in #Strings, while the string literal "Hello, world" in the code is saved in #US.
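To make the #US side of this note concrete, here is a minimal sketch that reads the operand of an ldstr instruction and resolves it through reflection. The class UserStringDemo and its helper are hypothetical names of my own, and the byte scan assumes a tiny method and a little-endian machine; it is not a real IL decoder.

```csharp
using System;
using System.Reflection;

public static class UserStringDemo
{
    // The literal below is a user string, so the compiler emits an ldstr
    // instruction whose operand is a token pointing into the #US heap.
    public static string Greet()
    {
        return "Hello, world";
    }

    // Scans the IL of a simple method for the first ldstr opcode (0x72) and
    // returns its 4-byte operand, or 0 if none is found. This naive scan is
    // only safe for tiny methods such as Greet.
    public static int FindFirstLdstrToken(MethodInfo method)
    {
        byte[] il = method.GetMethodBody().GetILAsByteArray();
        for (int i = 0; i + 4 < il.Length; i++)
        {
            if (il[i] == 0x72)
            {
                // Assumes a little-endian machine, matching the IL encoding.
                return BitConverter.ToInt32(il, i + 1);
            }
        }
        return 0;
    }

    public static void Main()
    {
        MethodInfo greet = typeof(UserStringDemo).GetMethod("Greet");
        int token = FindFirstLdstrToken(greet);

        // User-string tokens carry 0x70 in the high byte.
        Console.WriteLine("Token:   0x{0:X8}", token);
        Console.WriteLine("Literal: {0}", greet.Module.ResolveString(token));
    }
}
```

The method name Greet, by contrast, never appears behind a 0x70 token: it is an identifier, so it lives in #Strings and is reached through the MethodDef record.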

For a detailed description of the metadata, such as the columns in each table and the relationships between tables, see [Standard ECMA-335] and [The .NET File Format].

Inside the PE file format, metadata has a complex structure. I find it easiest to understand the structure and relationships of metadata by thinking of how a database manages data, so the logical structure of metadata becomes a set of metadata tables. Just as a database table has a primary key and a schema, a metadata table uses the RID (the index of a record in the table) and metadata to express similar concepts. Taking the TypeDef table as an example, its records are associated by reference with tables such as Field, Method, and TypeRef. The other tables are related in similar ways, and together they form a complex database-like structure.
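The database analogy can be sketched with a toy model. In ECMA-335, a TypeDef record does not list its fields one by one; it stores the RID of its first record in the Field table, and the run ends where the next type's run begins. The classes and sample data below are hypothetical and heavily simplified (real rows have more columns and store heap offsets rather than strings).

```csharp
using System;
using System.Collections.Generic;

public static class MetadataTableModel
{
    // Toy TypeDef row: FieldList is the 1-based RID of the type's first
    // record in the Field table.
    public sealed class TypeDefRow
    {
        public string Name;
        public int FieldList;
    }

    // Returns the field names owned by the type with the given 1-based RID.
    public static List<string> FieldsOf(IList<TypeDefRow> typeDefs, IList<string> fields, int typeRid)
    {
        int start = typeDefs[typeRid - 1].FieldList;
        // The run ends where the next type's run starts, or at the end of the table.
        int end = typeRid < typeDefs.Count ? typeDefs[typeRid].FieldList : fields.Count + 1;

        var result = new List<string>();
        for (int rid = start; rid < end; rid++)
        {
            result.Add(fields[rid - 1]);
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical Field table (RIDs 1..3) and TypeDef table (RIDs 1..2).
        var fields = new List<string> { "id", "name", "greeting" };
        var typeDefs = new List<TypeDefRow>
        {
            new TypeDefRow { Name = "One", FieldList = 1 },
            new TypeDefRow { Name = "Two", FieldList = 3 }
        };

        for (int rid = 1; rid <= typeDefs.Count; rid++)
        {
            Console.WriteLine("{0}: {1}", typeDefs[rid - 1].Name,
                string.Join(", ", FieldsOf(typeDefs, fields, rid).ToArray()));
        }
        // Prints:
        // One: id, name
        // Two: greeting
    }
}
```

The RID plays the role of the primary key here, and FieldList behaves like a foreign key into another table.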

Metadata therefore stores the compiled type data that a .NET program needs at run time, and we can obtain it dynamically at run time through reflection, using .NET Framework classes such as Type, MethodInfo, and so on. MSDN gives a simple illustration of the relationships among these classes.

For every CLR type, the Object.GetType method returns its Type, through which all of its run-time metadata can be obtained:

// Release : code04, 2009/02/21
// Author : Anytao, http://www.anytao.com
// List  : Program.cs
// Requires: using System; using System.Reflection;
private static void ShowMemberInfo()
{
   // Walk every assembly loaded into the current application domain.
   Assembly[] assemblies = AppDomain.CurrentDomain.GetAssemblies();

   foreach (Assembly assembly in assemblies)
   {
     // Enumerate the types defined in the assembly.
     foreach (Type type in assembly.GetTypes())
     {
       // GetMembers returns the public members of the type.
       foreach (MemberInfo member in type.GetMembers())
       {
         Console.WriteLine("Name:{0}, Type:{1}", member.Name, member.MemberType);
       }
     }
   }
}

Running this method prints a long list containing many familiar names :-)

4. What is IL?

IL, also called CIL or MSIL, stands for Intermediate Language and is fully defined and standardized by ECMA (Standard ECMA-335). As the name implies, every CLR-compatible compiler generates intermediate language code, which is one of the foundations of the CLR's cross-language support. IL is like a bridge: its instruction set is independent of any CPU instruction set, and at run time the JIT compiler translates it into native code for execution. It connects every high-level language that complies with the CLS specification and gives the .NET platform its most basic support. In my book [What You Must Know .NET], I devoted an entire chapter (Chapter 3, "Everything starts with IL") to the basics of IL, so topics such as basic types, IL analysis methods, common instructions, and basic operations are not repeated here. I will only summarize the key points:

IL is an object-oriented machine language, so it has all the features of an object-oriented language. Classes, objects, inheritance, polymorphism, and so on are still basic concepts in IL.

IL instructions are independent of CPU instructions, and the CLR converts them to native code through the JIT compilation mechanism.

IL and metadata are key to understanding how the CLR operates, and they are a great help in uncovering the secrets of the CLR.
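The second point can be tried out directly. The sketch below emits four IL instructions by hand with System.Reflection.Emit and lets the runtime compile them when the delegate is invoked. JitDemo and BuildAdder are hypothetical names, and the example assumes a runtime that supports DynamicMethod (it is not available in ahead-of-time-only environments).

```csharp
using System;
using System.Reflection.Emit;

public static class JitDemo
{
    // Builds a method whose body consists only of hand-written IL:
    // load both arguments, add them, and return the result.
    public static Func<int, int, int> BuildAdder()
    {
        var method = new DynamicMethod(
            "Add", typeof(int), new[] { typeof(int), typeof(int) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);   // push the first argument
        il.Emit(OpCodes.Ldarg_1);   // push the second argument
        il.Emit(OpCodes.Add);       // pop both, push their sum
        il.Emit(OpCodes.Ret);       // return the value on top of the stack

        // The IL is CPU-independent; the runtime turns it into native code.
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> add = BuildAdder();
        Console.WriteLine(add(2, 3)); // prints 5
    }
}
```

Nothing in the emitted instructions mentions registers or a particular processor, which is exactly why the same IL can run wherever a CLR implementation exists.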

In the previous article I described how to use ildasm.exe or the Reflector tool to disassemble managed code and view its IL. In many cases, analyzing the IL exposes the syntactic-sugar tricks hidden by high-level languages. For example, the automatic properties, implicitly typed variables, anonymous types, and extension methods introduced in C# 3.0 can all be explained quickly from the IL, so a proper understanding of IL is well worth having. Below we will look at part of JIT compilation to understand the role IL code plays in the execution of managed programs.

In addition, metadata describes the static structure, while IL expresses the dynamic execution. IL code references the metadata tables through a 4-byte value called a metadata token, which also records which metadata table is being referenced. In ildasm.exe, select "Show token values" to see how the IL code uses metadata tokens to reference the metadata tables:
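As one small illustration of the syntactic sugar mentioned above, the sketch below uses reflection to show what an automatic property leaves behind in the metadata. The class One is a hypothetical stand-in, and the exact name of the generated field is a compiler implementation detail that should not be relied on.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical sample type with a single automatic property.
public class One
{
    public int ID { get; set; }
}

public static class AutoPropertyDemo
{
    public static void Main()
    {
        // The compiler adds a hidden private field to back the property.
        BindingFlags fieldFlags = BindingFlags.Instance | BindingFlags.NonPublic;
        foreach (FieldInfo field in typeof(One).GetFields(fieldFlags))
        {
            bool generated = field.IsDefined(typeof(CompilerGeneratedAttribute), false);
            Console.WriteLine("Field:  {0} (compiler generated: {1})", field.Name, generated);
        }

        // The accessors are ordinary methods named get_ID and set_ID.
        BindingFlags methodFlags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in typeof(One).GetMethods(methodFlags))
        {
            Console.WriteLine("Method: {0}", method.Name);
        }
    }
}
```

With Microsoft's C# compilers the field is typically reported as <ID>k__BackingField, and the set_ID accessor is the same kind of method that a callvirt instruction targets when a property is assigned.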

.method /*06000003*/ private hidebysig static
     void Main(string[] args) cil managed
{
  .entrypoint
  // Code size    36 (0x24)
  .maxstack 2
  .locals /*11000002*/ init ([0] int32 id,
       [1] class Anytao.Insidenet.MetadataIL.One/*02000004*/ one,
       [2] class Anytao.Insidenet.MetadataIL.Two/*02000002*/ two)
  IL_0000: nop
  IL_0001: ldc.i4.1
  IL_0002: stloc.0
  IL_0003: newobj   instance void Anytao.Insidenet.MetadataIL.One/*02000004*/::.ctor() /* 06000007 */
  IL_0008: stloc.1
  IL_0009: ldloc.1
  IL_000a: ldloc.0
  IL_000b: callvirt  instance void Anytao.Insidenet.MetadataIL.One/*02000004*/::set_ID(int32) /* 06000006 */
  IL_0010: nop
  IL_0011: newobj   instance void Anytao.Insidenet.MetadataIL.Two/*02000002*/::.ctor() /* 06000002 */
  IL_0016: stloc.2
  IL_0017: ldloc.2
  IL_0018: callvirt  instance string Anytao.Insidenet.MetadataIL.Two/*02000002*/::SayHello() /* 06000001 */
  IL_001d: call    void [mscorlib/*23000001*/]System.Console/*01000012*/::WriteLine(string) /* 0A000011 */
  IL_0022: nop
  IL_0023: ret
} // end of method Program::Main

According to the ECMA specification, the first (most significant) byte of a metadata token identifies the referenced metadata table, and the remaining three bytes identify the record in that table. For example, 06000003 refers to record 000003 of the MethodDef (06) table, which is the Main method.

At run time, the MetadataToken property of Type returns the metadata token of a type through reflection. For example:
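The split into table and record is plain bit arithmetic. Here is a minimal sketch that decodes a few tokens copied from the listing above; TokenDecoder and its two helpers are hypothetical names, not part of any library.

```csharp
using System;

public static class TokenDecoder
{
    // The high byte of a token is the index of the metadata table.
    public static int TableOf(int token)
    {
        return (int)((uint)token >> 24);
    }

    // The low three bytes are the record index (RID) within that table.
    public static int RidOf(int token)
    {
        return token & 0x00FFFFFF;
    }

    public static void Main()
    {
        // Tokens taken from the ildasm output shown earlier.
        int[] tokens = { 0x06000003, 0x02000004, 0x0A000011, 0x23000001 };
        foreach (int token in tokens)
        {
            Console.WriteLine("0x{0:X8} -> table 0x{1:X2}, RID {2}",
                token, TableOf(token), RidOf(token));
        }
        // Prints:
        // 0x06000003 -> table 0x06, RID 3
        // 0x02000004 -> table 0x02, RID 4
        // 0x0A000011 -> table 0x0A, RID 17
        // 0x23000001 -> table 0x23, RID 1
    }
}
```

Looking the table numbers up in the table list gives MethodDef, TypeDef, MemberRef, and AssemblyRef respectively, which matches what each token is attached to in the listing.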

static void Main(string[] args)
{
   Console.WriteLine(typeof(One).MetadataToken);
}
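The call above prints the token as a decimal number. As a small follow-up sketch, the token can be formatted in hexadecimal so that it is easy to compare with ildasm output, and it can be handed back to the module to recover the same Type or method. TokenRoundTrip is a hypothetical class name; the concrete token values depend on the compiled assembly.

```csharp
using System;
using System.Reflection;

public static class TokenRoundTrip
{
    public static void Main()
    {
        Type type = typeof(TokenRoundTrip);

        // Type tokens belong to the TypeDef table, so the high byte is 0x02.
        int typeToken = type.MetadataToken;
        Type resolvedType = type.Module.ResolveType(typeToken);
        Console.WriteLine("0x{0:X8} resolves to {1}", typeToken, resolvedType.FullName);

        // Method tokens belong to the MethodDef table, so the high byte is 0x06.
        MethodBase main = type.GetMethod("Main");
        MethodBase resolvedMethod = type.Module.ResolveMethod(main.MetadataToken);
        Console.WriteLine("0x{0:X8} resolves to {1}", main.MetadataToken, resolvedMethod.Name);
    }
}
```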

With these preparations in place, we can start to analyze the roles of metadata and IL, and the connection between them, during program execution.
