Interaction during runtime (PE, Windows loader, application domain, assembly list, metadata, type, object, thread stack, managed heap)

Source: Internet
Author: User
This article explains the PE format, the Windows loader, application domains, the assembly list, metadata, types, objects, the thread stack, the managed heap, and related topics. To have something to debug, I first wrote a simple demo. The code is as follows:
using System;

namespace ClrTest
{
    // Simple reference type used throughout the article as the debugging target.
    public class Circle
    {
        public double Radius { get; set; }

        public Circle() { }

        public Circle(double r)
        {
            this.Radius = r;
        }

        public double GetCircumference()
        {
            return 2 * Math.PI * this.Radius;
        }

        public double GetArea()
        {
            return Math.PI * Math.Pow(this.Radius, 2.0);
        }

        // Overrides the virtual Object.ToString so the call is dispatched at run time.
        public override string ToString()
        {
            return string.Format("Radius: {0} Perimeter: {1} Area: {2}", this.Radius, this.GetCircumference(), this.GetArea());
        }
    }
}
using System;

namespace ClrTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Allocates a Circle on the managed heap; the reference is kept in a local variable.
            Circle circle = new Circle(4.0);
            Console.WriteLine(circle.ToString());
            Console.ReadKey();
        }
    }
}

1. Loading a .NET assembly

Programs on Windows can be started in several ways. In every case Windows does the related work: it sets up the process address space, loads the executable, and instructs the processor to start executing. Once the processor starts executing the program's instructions, it continues until the process exits.

With that in mind, let's take a closer look at PE files. PE is the file format of Windows executables, which include *.exe, *.dll, *.obj, and *.sys files. To support .NET, the PE format was extended to carry assemblies. The PE file format is as follows:

To support execution of a PE image, the PE header contains a field called AddressOfEntryPoint, which gives the location of the file's entry point. In a .NET assembly, this value points to a small stub in the .text section ("jmp _CorExeMain"). When a .NET compiler generates an assembly, it also adds a data directory entry to the PE file. This is the 15th entry of the data directory (zero-based index 14), and it holds the location and size of the CLR header. Using this location, the CLR header can be found in the .text section of the PE file. The CLR header is an IMAGE_COR20_HEADER structure. It contains information such as the entry point of the managed application, the major and minor version numbers of the target CLR, and the assembly's strong-name signature. From this structure, Windows can tell which version of the CLR to load and learn some facts about the assembly itself. The .text section also contains the assembly's metadata tables, the IL, and the unmanaged startup stub. The unmanaged startup stub is the code that the Windows loader executes to start the PE file.
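The header fields described above can be read directly from a file. The following is a minimal sketch rather than a full PE parser: the class name PeClrHeaderProbe is made up for illustration, the offsets are the standard PE32 and PE32+ optional header offsets, and it assumes a little-endian, well-formed image.

```csharp
using System;
using System.IO;
using System.Text;

public static class PeClrHeaderProbe
{
    // Zero-based index of the CLR header entry in the data directory
    // (IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR in winnt.h).
    private const int ClrHeaderDirectoryIndex = 14;

    // Returns AddressOfEntryPoint plus the RVA and size of the CLR header.
    // For a native (non-.NET) image the CLR header RVA and size are both 0.
    public static (uint EntryPointRva, uint ClrHeaderRva, uint ClrHeaderSize) Read(Stream pe)
    {
        var reader = new BinaryReader(pe, Encoding.ASCII, leaveOpen: true);

        pe.Position = 0x3C;                        // e_lfanew field of the DOS header
        uint peOffset = reader.ReadUInt32();

        pe.Position = peOffset;
        if (reader.ReadUInt32() != 0x00004550)     // "PE\0\0" signature
            throw new BadImageFormatException("Not a PE file.");

        long optionalHeader = peOffset + 4 + 20;   // skip signature and COFF file header
        pe.Position = optionalHeader;
        ushort magic = reader.ReadUInt16();        // 0x10B = PE32, 0x20B = PE32+

        pe.Position = optionalHeader + 16;         // AddressOfEntryPoint
        uint entryPointRva = reader.ReadUInt32();

        // The data directory starts 96 bytes into a PE32 optional header, 112 for PE32+.
        long dataDirectory = optionalHeader + (magic == 0x20B ? 112 : 96);
        pe.Position = dataDirectory + ClrHeaderDirectoryIndex * 8;   // each entry is RVA + size
        uint clrHeaderRva = reader.ReadUInt32();
        uint clrHeaderSize = reader.ReadUInt32();

        return (entryPointRva, clrHeaderRva, clrHeaderSize);
    }
}
```

Calling PeClrHeaderProbe.Read on a stream opened over a managed assembly should return a non-zero CLR header RVA, while a purely native executable should return zeros for the last two values.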

When Windows loads a .NET assembly, the _CorExeMain (or _CorDllMain) function of mscoree.dll is called first to start the CLR. mscoree.dll performs a series of operations when starting the CLR:

(1) Examine the metadata in the PE file (specifically, MajorRuntimeVersion and MinorRuntimeVersion in the CLR header) to find out which CLR version the .NET assembly was built against.

(2) Locate the installation path of the correct CLR version on the system.

(3) Load and initialize the CLR.

After the CLR is initialized, it finds the assembly's entry point (Main()) in the CLR header of the PE file. The JIT compiler then compiles the entry point, and execution starts there.

To sum up, a .NET assembly is loaded in the following steps:

(1) The user runs a .NET assembly.

(2) The Windows loader reads the AddressOfEntryPoint field and locates the .text section of the PE file.

(3) The bytes at AddressOfEntryPoint are a JMP instruction that jumps to a function imported from mscoree.dll.

(4) Control is transferred to the _CorExeMain function in mscoree.dll. This function starts the CLR and transfers control to the assembly's entry point.

Note: in Windows XP and later, the loader has been improved to recognize whether a PE file is a .NET assembly. When loading a .NET assembly, it no longer needs the stub to call the function imported from mscoree.dll; it loads the CLR automatically.
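From managed code, some of the facts the loader relies on can be observed after the fact. The sketch below is only an approximation of the steps above: RuntimeVersionInfo is a made-up helper name, and Assembly.ImageRuntimeVersion reports the version string saved in the assembly's metadata, which is assumed here to stand in for the version check the article describes.

```csharp
using System;
using System.Reflection;

public static class RuntimeVersionInfo
{
    // Version string of the CLR that the assembly was built against,
    // as saved in the file that contains the manifest.
    public static string BuiltAgainst(Assembly assembly)
    {
        return assembly.ImageRuntimeVersion;
    }

    // Version of the runtime that is actually executing the current process.
    public static Version RunningOn()
    {
        return Environment.Version;
    }

    // The managed entry point (Main) of the process, or null when there is no
    // managed entry assembly, for example when unmanaged code hosts the CLR.
    public static MethodInfo EntryPoint()
    {
        Assembly entry = Assembly.GetEntryAssembly();
        return entry == null ? null : entry.EntryPoint;
    }
}
```

For the demo, RuntimeVersionInfo.EntryPoint() called from Main would be expected to return the MethodInfo of Program.Main itself.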

2. Application domain

Windows uses processes to isolate applications. On top of this, .NET introduces another logical isolation layer, the application domain. Creating and managing processes is expensive, and application domains greatly reduce the cost of creating and destroying an isolation layer.

The relationship between processes and application domains is as follows:

Any Windows process that hosts the CLR defines one or more application domains, each of which contains executable code, data, metadata structures, and resources. In addition to the protection provided by the process itself, an application domain adds the following guarantees:

    • Faulty code in one application domain does not affect code running in another application domain in the same process.
    • Code in one application domain cannot directly access resources in another application domain.
    • Each application domain can have its own configuration, such as security settings.

For applications that do not create application domains explicitly, the CLR creates three: the system application domain, the shared application domain, and the default application domain.

(1) system application domain

The main functions of the system application domain are as follows:

    • Creates the other two application domains (the shared application domain and the default application domain).
    • Loads mscorlib.dll into the shared application domain.
    • Keeps track of all other application domains in the process and provides functions such as loading and unloading application domains.
    • Maintains the interned string literals in the string pool, so that only one copy of each literal exists per process.
    • Initializes certain predefined exception types.
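The string pool mentioned in the list above can be observed through string.Intern. The following is a small sketch with a made-up class name (StringPoolDemo); it builds two equal strings at run time so that neither is a compile-time literal, and it assumes a non-empty input.

```csharp
using System;

public static class StringPoolDemo
{
    // Compares two equal, separately allocated strings by reference,
    // first as they are and then after interning both of them.
    public static (bool SameBeforeIntern, bool SameAfterIntern) Compare(string text)
    {
        string a = new string(text.ToCharArray());   // distinct heap object
        string b = new string(text.ToCharArray());   // another distinct heap object

        bool before = ReferenceEquals(a, b);
        bool after = ReferenceEquals(string.Intern(a), string.Intern(b));
        return (before, after);
    }
}
```

StringPoolDemo.Compare("circle") returns (false, true): the two strings are separate objects until both are replaced by the single pooled copy.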

(2) Shared application domain

The shared application domain contains domain-neutral code. mscorlib.dll is loaded into this application domain, along with some basic types in the System namespace (such as String and Array). In most cases, only non-user code is loaded into the shared application domain. Applications that host the CLR can inject user code into it by using loader optimization attributes.

(3) default application domain

Typically, .NET programs run in the default application domain. Code in the default application domain is valid only within that domain. Because an application domain forms a logical and reliable boundary, any access across application domains must go through .NET remoting.
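The domain that managed code is running in can be inspected with the AppDomain class. This is a minimal sketch with a hypothetical helper name (AppDomainInfo); as far as I know the system and shared domains are internal to the CLR and are not visible through this API, so only the current domain is shown.

```csharp
using System;
using System.Linq;

public static class AppDomainInfo
{
    // Summarizes the current application domain and the assemblies loaded into it.
    public static string Describe()
    {
        AppDomain domain = AppDomain.CurrentDomain;
        string[] assemblies = domain.GetAssemblies()
                                    .Select(a => a.GetName().Name)
                                    .ToArray();

        return string.Format(
            "Domain '{0}' (default: {1}) has {2} assemblies loaded: {3}",
            domain.FriendlyName,
            domain.IsDefaultAppDomain(),
            assemblies.Length,
            string.Join(", ", assemblies));
    }
}
```

Printing AppDomainInfo.Describe() from the demo's Main would be expected to report the default domain with the core library and the demo assembly among the loaded assemblies.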

The application domain information for the demo created at the beginning of this article is shown below:

3. Resolving type references

When an application runs, the CLR is loaded and initialized. The CLR then reads the assembly's CLR header and finds the MethodDef token that identifies the application's entry method (Main()). Next, the CLR looks up the MethodDef metadata table to find the offset of the method's IL code in the file, and JIT-compiles that IL into native code. The code is verified during compilation to ensure type safety. Finally, the native code is executed. During JIT compilation, the CLR checks every reference to a type or member and loads the assembly that defines it (if it is not already loaded), which means the CLR must locate and load that assembly. When resolving a referenced type, the CLR may find the type in one of three places:

    • Same file
    • Different file, same assembly
    • Different file, different assembly

If an error occurs while resolving a type reference, such as a file that cannot be found, a file that cannot be loaded, or a hash value mismatch, an exception is thrown. The type binding process is shown below:

(Note that the ModuleDef, ModuleRef, and FileDef metadata tables refer to files by file name including the extension, whereas AssemblyRef refers to an assembly by file name without the extension. To bind to an assembly, the system locates the file by probing directories.)
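The binding failures mentioned above surface as ordinary exceptions. The sketch below uses a made-up helper (BindingFailureDemo.TryLoad) to trigger a binding explicitly with Assembly.Load; in the demo itself the binding happens implicitly during JIT compilation, so this is only an approximation of that path.

```csharp
using System;
using System.IO;
using System.Reflection;

public static class BindingFailureDemo
{
    // Tries to bind to an assembly by name. Returns null and a short
    // description of the failure when the assembly cannot be bound.
    public static Assembly TryLoad(string assemblyName, out string failure)
    {
        try
        {
            failure = null;
            return Assembly.Load(assemblyName);
        }
        catch (FileNotFoundException ex)      // probing found no matching file
        {
            failure = "not found: " + ex.FileName;
        }
        catch (FileLoadException ex)          // file was found but could not be loaded
        {
            failure = "load failed: " + ex.Message;
        }
        catch (BadImageFormatException ex)    // file is not a valid assembly
        {
            failure = "bad image: " + ex.Message;
        }
        return null;
    }
}
```

Passing a name that does not exist, such as the invented "Hypothetical.Missing.Assembly", takes the first catch branch.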

To the CLR, an assembly is identified by its name, version, culture, and public key. The GAC, however, identifies an assembly by name, version, culture, public key, and CPU architecture. When searching the GAC for an assembly, the CLR determines what kind of process the application is running in (32-bit or 64-bit). It first searches for a version of the assembly specific to that CPU architecture; if none is found, it searches for the CPU-agnostic version.
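The identity parts listed above map onto the AssemblyName class. This is a small sketch with a made-up helper name (AssemblyIdentityDemo); it only formats the identity and reports the process bitness, and it does not reproduce the GAC lookup itself.

```csharp
using System;
using System.Reflection;

public static class AssemblyIdentityDemo
{
    // Formats name, version, culture, and public key token of an assembly identity.
    public static string Describe(AssemblyName name)
    {
        byte[] token = name.GetPublicKeyToken() ?? new byte[0];
        string culture = string.IsNullOrEmpty(name.CultureName) ? "neutral" : name.CultureName;
        string tokenText = token.Length == 0
            ? "null"
            : BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();

        return string.Format("{0} | {1} | {2} | {3}", name.Name, name.Version, culture, tokenText);
    }

    // True in a 64-bit process, false in a 32-bit one.
    public static bool Is64BitProcess()
    {
        return Environment.Is64BitProcess;
    }
}
```

For example, AssemblyIdentityDemo.Describe(typeof(object).Assembly.GetName()) describes the core library, and a display name string can be parsed by passing it to the AssemblyName constructor first.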

4. Types

The type is the basic programming unit of a .NET program. A .NET application either defines its own types or uses existing ones. Types fall into two categories: value types and reference types. Value types are stored on the thread stack when used as local variables; they include enumerations, structures, and primitive types (such as int, bool, and char). Value types generally occupy a small amount of memory. Reference types are allocated on the managed heap and managed by the garbage collector (GC). A reference type can also contain value types, in which case those values live on the heap as well and are managed by the garbage collector.
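The practical difference between the two categories shows up on assignment. The following sketch uses two made-up types (PointValue and PointReference) that differ only in being a struct or a class.

```csharp
public struct PointValue
{
    public double X;
}

public class PointReference
{
    public double X;
}

public static class CopySemanticsDemo
{
    // Returns the X of the original value-type variable and of the original
    // reference-type variable after their copies have been modified.
    public static (double ValueOriginal, double ReferenceOriginal) Run()
    {
        var v1 = new PointValue { X = 1.0 };
        var v2 = v1;          // copies the data itself
        v2.X = 2.0;           // changes only the copy

        var r1 = new PointReference { X = 1.0 };
        var r2 = r1;          // copies only the reference to the heap object
        r2.X = 2.0;           // changes the single shared object

        return (v1.X, r1.X);
    }
}
```

CopySemanticsDemo.Run() returns (1, 2): the value-type original is untouched, while both reference-type variables see the same heap object.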

The structure of an object on the managed heap is as follows:

Each object instance on the managed heap contains the following information:

    • Sync block index: this field can be a bit mask or an index into the sync block table maintained by the CLR, which holds auxiliary information about the object.
    • Type handle: the type handle is the basic unit of the CLR type system and fully describes the type of an object on the managed heap.
    • Object instance: the actual object data follows the sync block index and the type handle.

The contents of the demo's Circle object are shown below:

(1) Sync block table

Every object on the managed heap is preceded by a sync block index, which points into the sync block table kept on a private heap created by the CLR. The sync block table contains pointers to the individual sync blocks, each of which holds information such as the object's lock, interop data, application domain index, and object hash code. An object may have no sync block data at all, in which case its sync block index is 0. Note that this header field does not necessarily contain just a simple index; it can also hold other auxiliary information about the object.

(When using the index, note that the CLR is free to move or grow the sync block table without necessarily adjusting the headers of all objects that hold a sync block index.)
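The sync block itself is not exposed to managed code, but the object lock that the text associates with it is what the lock statement and the Monitor class operate on. The sketch below (ObjectLockDemo is a made-up name) only shows that lock state is tied to an individual object; it makes no claim about where the runtime stores it.

```csharp
using System.Threading;

public static class ObjectLockDemo
{
    // Reports whether the current thread holds the lock on an object
    // inside and outside of a lock statement.
    public static (bool HeldInside, bool HeldOutside) Run()
    {
        object gate = new object();
        bool inside;

        lock (gate)   // Monitor.Enter and Monitor.Exit on this particular object
        {
            inside = Monitor.IsEntered(gate);
        }

        bool outside = Monitor.IsEntered(gate);
        return (inside, outside);
    }
}
```

ObjectLockDemo.Run() returns (true, false).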

(2) Type handle

All instances of reference types live on the managed heap, which is controlled by the GC. Every instance contains a type handle. Put simply, the type handle points to the type's method table, which contains the metadata that fully describes the type. The overall memory layout of the method table is as follows:

The type handle is the glue of the CLR type system: it associates an object instance with all of its type data. An object's type handle is stored on the managed heap and is a pointer to the type's method table. The method table holds a large amount of information about the object's type, including pointers to other key CLR data structures (such as the EEClass). The first block of data that the type handle points to describes the type itself (flags, size, number of methods, parent method table, and so on). The next field of interest is a pointer to the EEClass. After that comes a pointer to the module information for the type. The remaining fields include the type's virtual method table. Note that some method pointers in the method table may point to unmanaged code, because a method may not have been JIT-compiled yet. The JIT stub that starts compilation is itself unmanaged code. Until a method has been compiled, its entry points to this stub; once compilation finishes, control is transferred to the newly compiled code.
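Managed code can obtain a type handle through Type.TypeHandle. The sketch below uses a made-up helper (TypeHandleDemo); treating the returned value as the method table address that a debugger shows is an assumption about the runtime's internals, not something the API documents.

```csharp
using System;

public static class TypeHandleDemo
{
    // Returns the raw value behind the runtime type handle of an object's type.
    public static IntPtr HandleOf(object instance)
    {
        return instance.GetType().TypeHandle.Value;
    }
}
```

Two instances of the same type yield the same value, for example TypeHandleDemo.HandleOf("a") and TypeHandleDemo.HandleOf("b"), while instances of different types yield different values.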

(3) Method descriptor

The method table contains the virtual method table, whose entries point to the code behind the type's methods. These methods are self-describing thanks to method descriptors. A method descriptor contains detailed information about a method, such as its textual representation, its module, its token, and the address of the code that implements it.
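Reflection exposes a few of the same facts about a method. The sketch below uses a made-up helper (MethodHandleDemo) and looks up a public parameterless method by name; whether MethodHandle.Value corresponds exactly to the method descriptor address shown by a debugger is an assumption here.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class MethodHandleDemo
{
    // Describes a public parameterless method: name, metadata token,
    // runtime method handle, and the address of its code.
    public static string Describe(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName, Type.EmptyTypes);
        if (method == null)
            throw new MissingMethodException(type.FullName, methodName);

        // Ask the runtime to compile the method now instead of on first call.
        RuntimeHelpers.PrepareMethod(method.MethodHandle);

        return string.Format(
            "{0}.{1} token=0x{2:X8} handle=0x{3:X} code=0x{4:X}",
            type.Name,
            method.Name,
            method.MetadataToken,
            method.MethodHandle.Value.ToInt64(),
            method.MethodHandle.GetFunctionPointer().ToInt64());
    }
}
```

For the demo, MethodHandleDemo.Describe(typeof(Circle), "GetCircumference") would print the token and addresses for that method; the concrete values differ from run to run.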

The method table and method descriptors of the demo's Circle object are shown below:

The IL of the GetCircumference method:

Further information about the method:

(4) Module

The information for the module that contains the Circle type is shown below:

(5) Metadata token

CLR metadata is stored in tables inside the runtime engine. A metadata token is a 4-byte value: the high-order byte identifies the metadata table, and the low-order three bytes give the row index within that table. Its layout is as follows:

The method table of Circle shows the metadata token:

A metadata token with the value 02000004 can be read as row 4 of the type definition (TypeDef) table.
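Splitting a token into its two parts is simple bit arithmetic. A minimal sketch, with MetadataTokenDemo as a made-up name:

```csharp
public static class MetadataTokenDemo
{
    // Splits a 4-byte metadata token into its table number (high byte)
    // and row index (low three bytes).
    public static (int Table, int Row) Decode(int token)
    {
        int table = (int)((uint)token >> 24);
        int row = token & 0x00FFFFFF;
        return (table, row);
    }
}
```

MetadataTokenDemo.Decode(0x02000004) returns (2, 4), where table 0x02 is the TypeDef table. The same helper can be applied to the MetadataToken property of any System.Type obtained through reflection.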

(6) EEClass

The EEClass data structure can be regarded as the logical counterpart of the method table, so it is also part of the mechanism that makes the CLR type system self-describing. Internally, the EEClass and the method table are two distinct structures, but logically they represent the same concept. They are split because the CLR uses a type's fields with different frequency: frequently used fields are kept in the method table, and rarely used fields in the EEClass. The general structure of the EEClass is as follows:

The type hierarchy in C# carries over to EEClass. When the CLR loads types, it builds a hierarchy of EEClass nodes with pointers to parent and sibling nodes, so the whole hierarchy can be traversed. The method descriptor chunk field of the EEClass points to the first group of method descriptors of the type, so all of the type's method descriptors can be enumerated. Each group of method descriptors contains a pointer to the next group in a linked list.
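The EEClass nodes are internal to the CLR, but the same parent chain is visible through reflection. The sketch below (TypeHierarchyDemo is a made-up name) walks Type.BaseType; it shows the managed view of the hierarchy, not the EEClass structures themselves.

```csharp
using System;
using System.Collections.Generic;

public static class TypeHierarchyDemo
{
    // Lists the names of a type and all of its base types, ending at Object.
    public static List<string> Chain(Type type)
    {
        var names = new List<string>();
        for (Type current = type; current != null; current = current.BaseType)
            names.Add(current.Name);
        return names;
    }
}
```

For the demo's Circle type the chain would be Circle followed by Object; for ArgumentNullException it is ArgumentNullException, ArgumentException, SystemException, Exception, Object.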

The EEClass of Circle is shown below:

5. Interaction between objects, the thread stack, and the managed heap at run time

Running the demo starts a process. Because the program is single-threaded, the process has only one thread. When a thread is created, it is given a 1 MB stack. The stack is used to pass arguments to methods and to hold the local variables defined inside methods.
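The stack size can also be chosen explicitly when a thread is created. The sketch below (StackSizeDemo is a made-up name) runs a small recursive method, whose arguments and return addresses live on the stack, on a thread with a caller-supplied maximum stack size.

```csharp
using System.Threading;

public static class StackSizeDemo
{
    // Computes 1 + 2 + ... + n recursively on a new thread whose
    // maximum stack size is given in bytes.
    public static int SumTo(int n, int maxStackSizeBytes)
    {
        int result = 0;
        var worker = new Thread(() => result = Sum(n), maxStackSizeBytes);
        worker.Start();
        worker.Join();   // wait until the worker has stored the result
        return result;
    }

    // Each call adds one frame (argument and return address) to the thread's stack.
    private static int Sum(int n)
    {
        return n == 0 ? 0 : n + Sum(n - 1);
    }
}
```

StackSizeDemo.SumTo(1000, 256 * 1024) returns 500500. A much deeper recursion on a small stack would end the process with a stack overflow, which is why the example keeps n small.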

At this point the Windows process has started, the CLR has been loaded into it, the managed heap has been initialized, and a thread has been created (along with its 1 MB stack). Execution has entered the Main() method and is about to run its first statement, so the stack and heap look as follows (for simplicity, I drew only the custom type):

When the JIT compiler converts the IL of Main() into native CPU instructions, it notes every type referenced inside the method. The CLR then makes sure that all assemblies defining these types are loaded. Using the assemblies' metadata, the CLR extracts information about these types and creates data structures that represent the types themselves (type objects). All of these type objects are created before the thread executes the native code. The state after the type objects have been created for the call to Main is as follows:

Once the CLR has determined that all type objects needed by the method exist and that the code of Main has been compiled, it lets the thread start executing the native code. "Circle circle = new Circle(4.0);" runs first. It creates a local variable of type Circle and assigns a value to it. The new operator causes a Circle instance to be created on the managed heap. Whenever a new object is created on the heap, the CLR automatically initializes its internal type object pointer (type handle) member so that it refers to the type object corresponding to the object. The CLR also initializes the sync block index and sets all instance fields of the object to null or 0 before calling the type's constructor. The new operator returns the memory address of the Circle object, which is stored in the local variable circle (on the thread stack). The state is as follows:
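The zeroing of instance fields before the constructor runs can be observed from inside a constructor. A minimal sketch, with FieldDefaultsSample as a made-up type that has no field initializers:

```csharp
public class FieldDefaultsSample
{
    public int Number;
    public double Ratio;
    public string Text;
    public bool SawDefaults;

    public FieldDefaultsSample()
    {
        // When the constructor body starts, the fields already hold 0, 0.0, and null.
        SawDefaults = Number == 0 && Ratio == 0.0 && Text == null;
        Number = 42;   // the constructor then assigns the real values
    }
}
```

After new FieldDefaultsSample(), SawDefaults is true and Number is 42.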

Next, "Console.WriteLine(circle.ToString());" runs. ToString() is a virtual method. For a virtual method call, the JIT compiler generates some additional code at the call site, which runs every time the method is called. This code first examines the variable used to make the call and follows the address in it to the object. In this example, the variable circle refers to an object of type Circle. The code then examines the object's internal "type handle" member, which points to the object's actual type. Next, the code looks up the entry for the called method in the method table of that type object, JIT-compiles the method (if necessary), and calls the compiled code. In this example, Circle's implementation of ToString is called. (For a non-virtual instance method, the JIT compiler locates the type object that corresponds to the type of the variable used to make the call. If that type does not define the method, the JIT compiler walks up the class hierarchy toward Object, looking for the method in each type along the way.)
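The difference between the two kinds of calls is visible when the variable's type and the object's type differ. The sketch below uses made-up types (Shape and Disc) rather than the demo's Circle, so that one method can be overridden and another hidden.

```csharp
public class Shape
{
    public virtual string Describe() { return "Shape.Describe"; }
    public string Kind() { return "Shape.Kind"; }
}

public class Disc : Shape
{
    public override string Describe() { return "Disc.Describe"; }
    public new string Kind() { return "Disc.Kind"; }   // hides, does not override
}

public static class DispatchDemo
{
    // Calls a virtual and a non-virtual method through a base-typed variable.
    public static (string VirtualCall, string NonVirtualCall) Run()
    {
        Shape shape = new Disc();

        // Virtual call: resolved at run time from the object's actual type.
        string viaObjectType = shape.Describe();

        // Non-virtual call: resolved from the declared type of the variable.
        string viaVariableType = shape.Kind();

        return (viaObjectType, viaVariableType);
    }
}
```

DispatchDemo.Run() returns ("Disc.Describe", "Shape.Kind").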

WriteLine(string) is a static method. When a static method is called, the CLR locates the type object for the type that defines it. The JIT compiler then looks up the entry for the called method in that type object's method table, JIT-compiles the method (if necessary), and calls the compiled code. In summary, the result of running "Console.WriteLine(circle.ToString());" is shown below:

Finally, "Console.ReadKey();" runs. It works like WriteLine(string), so the details are not repeated here. Notice that the Circle type object also contains a "type handle" member. This is because a type object is itself an object, and the CLR must initialize these members when it creates one. When the CLR starts running in a process, it immediately creates a special type object for the System.Type type defined in mscorlib.dll. The Circle type object is an instance of this type, so at initialization its type handle is set to refer to the System.Type type object, as shown below:

The System.Type type object is itself an object, and its internal type handle refers to itself. The GetType method of System.Object returns the address stored in the type handle member of the specified object, that is, a pointer to the object's type object.
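Both statements can be checked with GetType. The sketch below (TypeObjectDemo is a made-up name) uses System.Version as a stand-in for Circle. Note that the concrete class of a type object is a runtime-internal subclass of System.Type, so the self-reference is checked on whatever GetType returns rather than on System.Type itself.

```csharp
using System;

public static class TypeObjectDemo
{
    // Checks that instances of one type share a single type object, and that
    // the type object of a type object refers to itself.
    public static (bool SameTypeObject, bool TypeOfTypeIsItself) Run()
    {
        object a = new Version(1, 0);
        object b = new Version(2, 0);

        bool same = ReferenceEquals(a.GetType(), b.GetType())
                    && ReferenceEquals(a.GetType(), typeof(Version));

        Type typeOfType = a.GetType().GetType();   // the type of the type object
        bool self = ReferenceEquals(typeOfType.GetType(), typeOfType);

        return (same, self);
    }
}
```

TypeObjectDemo.Run() returns (true, true).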
