In-depth exploration of the .NET Framework to understand how the CLR creates runtime objects

Last Update:2018-12-05 Source: Internet

Author: User



Content on this page

Domains created by the CLR bootstrap

System domain

Shared domain

Default domain

Loader heaps

Type fundamentals

Object instance

Method table

Base instance size

Method slot table

Method descriptor (MethodDesc)

Interface vtable map and interface map

Virtual dispatch

Static variables

EEClass

Conclusion

Because the common language runtime (CLR) is set to be the primary infrastructure for building Windows applications for some time to come, a deep understanding of it will help you build efficient, industrial-strength applications. In this article, we explore CLR internals, including object instance layout, method table layout, method dispatching, interface-based dispatching, and various data structures.

We use simple code examples written in C#, so any implicit reference to language syntax should be read as C#. Some of the data structures and algorithms discussed here may change in Microsoft .NET Framework 2.0, but the main concepts should remain the same. We use the Visual Studio .NET 2003 debugger and the debugger extension Son of Strike (SOS) to view the data structures discussed in this article. SOS understands the internal data structures of the CLR and dumps useful information. See the "Son of Strike" sidebar to learn how to load SOS.dll into the process space of the Visual Studio .NET 2003 debugger. Throughout the article we describe classes that are implemented in the Shared Source CLI (SSCLI), which can be downloaded from msdn.microsoft.com/net/sscli. Figure 1 will help you find the referenced structures among the megabytes of SSCLI code.

Before we start, a word of caution: the information in this article is valid only for .NET Framework 1.1 running on the x86 platform (it also applies to Shared Source CLI 1.0 in most cases, with the exception of some interop scenarios). It may change in .NET Framework 2.0, so do not build software that relies on these internal structures staying the same.

Domains created by the CLR bootstrap

Before the CLR executes the first line of managed code, it creates three application domains. Two of them are invisible to managed code and even to CLR hosts. They can only be created by the CLR bootstrapping process, which is carried out by the shim, mscoree.dll, and mscorwks.dll (or mscorsvr.dll on multiprocessor systems). As shown in Figure 2, these are the system domain and the shared domain, both of which are singletons. The third domain is the default AppDomain, which is an instance of the AppDomain class and the only named domain. For simple CLR hosts, such as console programs, the default domain name is derived from the name of the executable image. Additional domains can be created from managed code with the AppDomain.CreateDomain method, or from unmanaged hosting code with the ICorRuntimeHost interface. Complex hosts such as ASP.NET create multiple domains based on the number of applications in a given website.
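The default domain can be inspected from managed code. The following is a minimal sketch, not part of the original samples; the class and method names are ours. It assumes a runtime that provides AppDomain.IsDefaultAppDomain, which was added after .NET Framework 1.1.

```csharp
using System;

public static class DomainInfoDemo
{
    // Returns the friendly name of the application domain that runs this code.
    public static string CurrentDomainName()
    {
        return AppDomain.CurrentDomain.FriendlyName;
    }

    // True when the current domain is the default AppDomain created at startup.
    public static bool IsDefaultDomain()
    {
        return AppDomain.CurrentDomain.IsDefaultAppDomain();
    }

    public static void Main()
    {
        Console.WriteLine("Domain name: " + CurrentDomainName());
        Console.WriteLine("Is default domain: " + IsDefaultDomain());
    }
}
```

For a simple console host, we would expect the printed name to be derived from the executable, in line with the description above.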

Figure 2. Domains created by the CLR bootstrap

System domain

The system domain (SystemDomain) is responsible for creating and initializing the shared domain and the default application domain. It loads the system library mscorlib.dll into the shared domain and keeps the process-wide string literals that have been interned, implicitly or explicitly.

String interning is an optimization feature that is somewhat heavy-handed in .NET Framework 1.1, because the CLR does not give assemblies the opportunity to opt out of it. Nonetheless, it saves memory, because only a single instance of the string is kept for a given literal across all application domains.
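Explicit interning can be observed from managed code with String.Intern. The sketch below is illustrative only, and the class and helper names are ours. It builds two equal strings at run time, so that neither is a compiled literal, and then compares references before and after interning.

```csharp
using System;

public static class InterningDemo
{
    // Builds a new string instance at run time from the given text.
    public static string BuildAtRuntime(string text)
    {
        return new string(text.ToCharArray());
    }

    // True when both strings resolve to the same interned instance.
    public static bool ShareInternedInstance(string first, string second)
    {
        return object.ReferenceEquals(string.Intern(first), string.Intern(second));
    }

    public static void Main()
    {
        string a = BuildAtRuntime("sample");
        string b = BuildAtRuntime("sample");

        // Two separately built strings are distinct objects on the GC heap.
        Console.WriteLine("Same instance before interning: " + object.ReferenceEquals(a, b));

        // After interning, both resolve to the single entry in the intern table.
        Console.WriteLine("Same instance after interning: " + ShareInternedInstance(a, b));
    }
}
```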

The system domain is also responsible for generating process-wide interface IDs, which are used to create the InterfaceVtableMaps in each application domain. The system domain keeps track of all domains in the process and implements the functionality for loading and unloading application domains.

Shared domain

All domain-neutral code is loaded into the shared domain (SharedDomain). The system library mscorlib is needed by user code in all application domains, so it is automatically loaded into the shared domain. Fundamental types of the System namespace, such as Object, ValueType, Array, Enum, String, and Delegate, are preloaded into this domain during CLR bootstrapping. User code can also be loaded into this domain by using the LoaderOptimization settings that the CLR host specifies when calling CorBindToRuntimeEx. A console program can also load code into the shared domain by annotating its Main method with System.LoaderOptimizationAttribute. The shared domain also manages an assembly map indexed by base address. This map acts as a lookup table for managing the shared dependencies of assemblies that are loaded into the default domain (DefaultDomain) and into other application domains created in managed code. Non-shared user code is loaded into the default domain.

Default domain

The default domain is an instance of AppDomain in which application code typically runs. Although some applications need to create additional application domains at run time (for example, applications with plug-in architectures or applications that generate significant amounts of code at run time), most applications create only one domain during their lifetime. All code that runs in this domain is context-bound at the domain level. If an application has multiple application domains, any cross-domain access goes through .NET Remoting proxies. Additional intra-domain context boundaries can be created using types derived from System.ContextBoundObject. Each application domain has its own SecurityDescriptor, SecurityContext, and DefaultContext, as well as its own loader heaps (high-frequency heap, low-frequency heap, and stub heap), handle tables, interface vtable map manager, and assembly cache.

Loader heaps

Loader heaps are used to load various CLR runtime artifacts and optimization artifacts that live for the lifetime of the domain. These heaps grow in predictable chunks, which minimizes fragmentation. Loader heaps are different from the garbage collector (GC) heap (or multiple heaps on a symmetric multiprocessor): the GC heap hosts object instances, whereas the loader heaps hold the type system together. Frequently accessed artifacts, such as method tables, method descriptors (MethodDesc), field descriptors (FieldDesc), and interface maps, are allocated on the high-frequency heap. Less frequently accessed data structures, such as EEClass and ClassLoader and its lookup tables, are allocated on the low-frequency heap. The stub heap hosts the stubs that facilitate code access security (CAS), COM wrapper calls, and platform invoke (P/Invoke).

Having looked at domains and loader heaps at a high level, we now look at their physical details in the context of the simple application shown in Figure 3. We stopped the program at mc.Method1() and used the SOS debugger extension command DumpDomain to dump the domain information (see the "Son of Strike" sidebar for information about loading SOS). Here is the edited output:

    !DumpDomain
    System Domain: 793e9d58, LowFrequencyHeap: 793e9dbc,
    HighFrequencyHeap: 793e9e14, StubHeap: 793e9e6c,
    Assembly: 0015aa68 [mscorlib], ClassLoader: 0015ab40
    Shared Domain: 793eb278, LowFrequencyHeap: 793eb2dc,
    HighFrequencyHeap: 793eb334, StubHeap: 793eb38c,
    Assembly: 0015aa68 [mscorlib], ClassLoader: 0015ab40
    Domain 1: 149100, LowFrequencyHeap: 00149164,
    HighFrequencyHeap: 001491bc, StubHeap: 00149214,
    Name: Sample1.exe, Assembly: 00164938 [Sample1],
    ClassLoader: 00164a78

Our console program, Sample1.exe, is loaded into an application domain named "Sample1.exe". Mscorlib.dll is loaded into the shared domain, but it is also listed under the system domain because it is the core system library. A high-frequency heap, a low-frequency heap, and a stub heap are allocated in each domain. The system domain and the shared domain use the same class loader, while the default application domain uses its own.

The output does not show the reserved and committed sizes of the loader heaps. The high-frequency heap initially reserves 32 KB, and its commit size is 4 KB. The SOS output also does not show the InterfaceVtableMap heap. Each domain has an InterfaceVtableMap (IVMap for short), which is created on its own loader heap during domain initialization. The IVMap heap reserves 4 KB and initially commits 4 KB. We discuss the significance of the IVMap when we explore type layout in later sections.

Figure 2 shows the default process heap, the JIT code heap, the GC heap (for small objects), and the large object heap (for objects of 85,000 bytes or more) to illustrate the semantic difference between these heaps and the loader heaps. The just-in-time (JIT) compiler generates x86 instructions and stores them on the JIT code heap. The GC heap and the large object heap are the garbage-collected heaps on which managed objects are instantiated.

Type fundamentals

A type is the fundamental unit of programming in .NET. In C#, a type can be declared with the class, struct, and interface keywords. Most types are created explicitly by the programmer. However, in special interop scenarios and in remote object invocation (.NET Remoting), the CLR generates types implicitly. These generated types include COM and runtime callable wrappers and transparent proxies.

We begin our study of .NET type fundamentals from a stack frame that contains an object reference (typically, the stack is one of the places where an object instance begins its life). The code shown in Figure 4 contains a simple program with a console entry point that calls a static method. Method1 creates an instance of the SmallClass type, which contains a byte array used to demonstrate the creation of an object on the large object heap. The code is trivial, but it serves our discussion.

Figure 5 shows a snapshot of a typical fastcall stack frame, stopped at a breakpoint on the "return smallObj;" line of the Create method. (Fastcall is the .NET calling convention, which specifies that function arguments are passed in registers when possible, with all other arguments pushed onto the stack from right to left and popped later by the called function.) The value-type local variable objSize is inlined in the stack frame. Reference-type variables such as smallObj are stored on the stack at a fixed size (a 4-byte DWORD) and contain the address of an object allocated on the normal GC heap. In traditional C++, this is an object pointer; in the managed world, it is an object reference. Either way, it contains the address of an object instance. We use the term object instance (ObjectInstance) for the data structure located at the address that the object reference points to.

Figure 5. Stack frame and heaps of SimpleProgram

The smallObj object instance on the normal GC heap contains a byte array referenced by _largeObj, whose size is 85,000 bytes (note that the figure shows 85,016 bytes, which is the actual storage size). The CLR treats objects of 85,000 bytes or more differently from smaller objects. Large objects are allocated on the large object heap (LOH), while smaller objects are created on the normal GC heap, which optimizes object allocation and garbage collection. The LOH is not compacted, whereas the GC heap is compacted whenever a collection occurs. Moreover, the LOH is collected only during full GC collections.
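One way to observe the two allocation paths from managed code is GC.GetGeneration. The sketch below is ours, not the Figure 4 sample, and the names are hypothetical. It relies on the commonly observed behavior that objects on the LOH are reported as generation 2 right after allocation, whereas freshly allocated small objects start in generation 0.

```csharp
using System;

public static class LargeObjectDemo
{
    // Allocates a byte array with the given element count and returns
    // the GC generation that the runtime reports for it.
    public static int GenerationOfNewArray(int length)
    {
        byte[] buffer = new byte[length];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        // A small array is allocated on the normal GC heap.
        Console.WriteLine("1,000-element array: generation " + GenerationOfNewArray(1000));

        // An array of 85,000 elements exceeds the large-object threshold.
        Console.WriteLine("85,000-element array: generation " + GenerationOfNewArray(85000));
    }
}
```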

The object instance of smallObj contains a TypeHandle that points to the method table of the corresponding type. There is one method table for each declared type, and all object instances of the same type point to the same method table. The method table contains information about the kind of type (interface, abstract class, concrete class, COM wrapper, or proxy), the number of interfaces implemented, the interface map used for method dispatch, the number of slots in the method table, and a table of slots that point to the implementations.

The method table points to an important data structure named EEClass. The CLR class loader creates the EEClass from metadata before the method table is laid out. In Figure 4, the SmallClass method table points to its EEClass. These structures point to their modules and assemblies. Method tables and EEClass structures are typically allocated on the domain-specific loader heaps; for types that live in the shared domain, such as Byte[], they are allocated on the shared domain's loader heaps. Loader heaps are specific to an application domain, and once the data structures mentioned here are loaded, they do not go away until the application domain is unloaded. Moreover, the default application domain cannot be unloaded, so the code lives until the CLR shuts down.

Object instance

As mentioned, all value-type instances are inlined either on the thread stack or on the GC heap. All reference types are created on the GC heap or the LOH. Figure 6 shows a typical object instance layout. An object can be referenced from stack-based local variables, from handle tables in interop or P/Invoke scenarios, from registers (the this pointer and method arguments while a method executes), or from the finalizer queue for objects that have finalizer methods. The OBJECTREF does not point to the beginning of the object instance but to a DWORD offset (4 bytes) past it. That DWORD is called the object header, and it holds an index (a 1-based syncblk number) into the SyncTableEntry table. Because the chaining is done through an index, the CLR can move the table in memory when the table needs to grow. The SyncTableEntry maintains a weak reference back to the object so that the CLR can track ownership of the SyncBlock. Weak references allow the GC to collect the object when no other strong references exist. The SyncTableEntry also stores a pointer to the SyncBlock, which contains useful information that is rarely needed by all instances of an object. This information includes the object's lock, its hash code, any thunking data, and its application domain index. For most object instances, no storage is allocated for the actual SyncBlock and the syncblk number is 0. This changes when the executing thread reaches a statement such as lock(obj) or obj.GetHashCode(), as shown below:


    SmallClass obj = new SmallClass();
    // Do some work here
    lock(obj) { /* Do some synchronized work here */ }
    obj.GetHashCode();

In this code, obj starts with a syncblk number of 0 (no syncblk). The lock statement causes the CLR to create a syncblk entry and update the object header with the corresponding number. Because the C# lock keyword expands to a try-finally block that uses the Monitor class, a Monitor object used for synchronization is created on the syncblk. A call to GetHashCode populates the syncblk with the object's hash code.
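The expansion of lock into try-finally can be written out by hand. The following sketch is an approximation under our own names (Gate, counter); it uses the Monitor.Enter overload with a lockTaken flag that newer compilers target, whereas the .NET Framework 1.1 compiler emitted a plain Monitor.Enter call before the try block.

```csharp
using System;
using System.Threading;

public static class LockExpansionDemo
{
    private static readonly object Gate = new object();
    private static int counter;

    public static int Counter
    {
        get { return counter; }
    }

    // Uses the C# lock statement.
    public static void IncrementWithLock()
    {
        lock (Gate)
        {
            counter++;
        }
    }

    // Roughly what the compiler generates for the lock statement above.
    public static void IncrementWithMonitor()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(Gate, ref lockTaken);
            counter++;
        }
        finally
        {
            // Release the monitor only if it was actually acquired.
            if (lockTaken)
            {
                Monitor.Exit(Gate);
            }
        }
    }

    public static void Main()
    {
        IncrementWithLock();
        IncrementWithMonitor();
        Console.WriteLine("Counter: " + Counter);
    }
}
```

Both methods synchronize on the same object, so either form causes the runtime to associate synchronization state with Gate.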

There are other fields in the SyncBlock that are used for COM interop and for marshaling delegates to unmanaged code, but they are not relevant to typical object usage.

The TypeHandle follows the syncblk number in the object instance. To maintain continuity, I will discuss the TypeHandle after describing the instance variables. A variable-length list of instance fields follows the TypeHandle. By default, instance fields are packed so that memory is used efficiently and the padding needed for alignment is minimized. The code in Figure 7 shows SimpleClass, which contains a number of instance variables of different sizes.

Figure 8 shows a SimpleClass object instance in the memory window of the Visual Studio debugger. We set a breakpoint on the return statement in Figure 7 and used the address of simpleObj held in the ECX register to display the object instance in the memory window. The first 4-byte block is the syncblk number. Because we did not use this instance in any synchronization code (and did not access its hash code), the syncblk number is 0. The object reference stored in the stack variable points to the 4 bytes that start at offset 4. The byte variables b1, b2, b3, and b4 are packed side by side. The two short variables s1 and s2 are also packed together. The string variable str is a 4-byte OBJECTREF that points to the actual string instance allocated on the GC heap. String is a special type, because all instances containing the same literal are made to point to the same instance in a global string table when the assembly is loaded. This process is called string interning and is designed to optimize memory usage. As mentioned earlier, in .NET Framework 1.1 an assembly cannot opt out of this process, although future versions of the CLR may provide that capability.

By default, then, the lexical order of member variables in the source code is not preserved in memory. In interop scenarios where the lexical order must be carried into memory, you can use StructLayoutAttribute, which takes a LayoutKind enumeration value as its argument. LayoutKind.Sequential preserves the lexical order for the marshaled data, although in .NET Framework 1.1 it does not affect the managed layout (in .NET Framework 2.0, however, it may). In interop scenarios where you really need extra padding and explicit control of the field order, LayoutKind.Explicit can be used together with the FieldOffset attribute at the field level.
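The effect of the two layout kinds on marshaled data can be checked with the Marshal class. The structs below are hypothetical examples of ours, not the SimpleClass from Figure 7. The sizes reported are those of the marshaled (unmanaged) layout, which is what StructLayoutAttribute controls in interop scenarios.

```csharp
using System;
using System.Runtime.InteropServices;

// Fields keep their lexical order; padding is inserted for alignment.
[StructLayout(LayoutKind.Sequential)]
public struct SequentialSample
{
    public byte B1;
    public int I1;
    public byte B2;
}

// Every field position is stated explicitly with FieldOffset.
[StructLayout(LayoutKind.Explicit)]
public struct ExplicitSample
{
    [FieldOffset(0)] public byte B1;
    [FieldOffset(1)] public byte B2;
    [FieldOffset(4)] public int I1;
}

public static class LayoutDemo
{
    // Size in bytes of the marshaled representation of the type.
    public static int MarshaledSize(Type type)
    {
        return Marshal.SizeOf(type);
    }

    // Byte offset of the named field in the marshaled representation.
    public static int MarshaledOffset(Type type, string fieldName)
    {
        return Marshal.OffsetOf(type, fieldName).ToInt32();
    }

    public static void Main()
    {
        Console.WriteLine("Sequential size: " + MarshaledSize(typeof(SequentialSample)));
        Console.WriteLine("Sequential I1 offset: " + MarshaledOffset(typeof(SequentialSample), "I1"));
        Console.WriteLine("Explicit size: " + MarshaledSize(typeof(ExplicitSample)));
    }
}
```

In the sequential struct, the int forces 3 bytes of padding after B1 and 3 more after B2, so the marshaled size is 12 bytes. The explicit struct places both bytes ahead of the int and fits in 8 bytes.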

Having looked at the raw memory contents, we now use SOS to examine the object instance. One useful command is DumpHeap, which lists all heap contents and all instances of a particular type. Without relying on registers, DumpHeap can show the address of the only instance we created.

    !DumpHeap -type SimpleClass
    Loaded Son of Strike data table version 5 from
    "C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\mscorwks.dll"
     Address       MT     Size
    00a8197c 00955124       36
    Last good object: 00a819a0
    total 1 objects
    Statistics:
          MT    Count TotalSize Class Name
      955124        1        36 SimpleClass

The total object size is 36 bytes. No matter how large the string is, a SimpleClass instance contains only a DWORD OBJECTREF to it. The instance variables of SimpleClass occupy only 28 bytes. The remaining 8 bytes consist of the TypeHandle (4 bytes) and the syncblk number (4 bytes). Having found the address of the simpleObj instance, we can use the DumpObj command to dump its contents, as shown below:

    !DumpObj 0x00a8197c
    Name: SimpleClass
    MethodTable 0x00955124
    EEClass 0x02ca33b0
    Size 36(0x24) bytes
    FieldDesc*: 00955064
          MT    Field   Offset                 Type       Attr    Value Name
    00955124  400000a        4         System.Int64   instance      31 l1
    00955124  400000b        c                CLASS   instance 00a819a0 str
    << some fields omitted from the display for brevity >>
    00955124  4000003       1e          System.Byte   instance        3 b3
    00955124  4000004       1f          System.Byte   instance        4 b4

As mentioned before, the C# compiler uses LayoutKind.Auto as the default layout for classes (and LayoutKind.Sequential for structs). Therefore, the class loader rearranges the instance fields to minimize padding. We can use ObjSize to dump the size of the object graph, which includes the space taken up by the str instance, as shown below:

    !ObjSize 0x00a8197c
    sizeof(00a8197c) =       72 (    0x48) bytes (SimpleClass)

If you subtract the size of the SimpleClass instance (36 bytes) from the overall size of the object graph (72 bytes), you get the size of str, that is, 36 bytes. Let's dump the str instance to verify the result:

    !DumpObj 0x00a819a0
    Name: System.String
    MethodTable 0x009742d8
    EEClass 0x02c4c6c4
    Size 36(0x24) bytes

If you add the size of the string instance str (36 bytes) to the size of the SimpleClass instance (36 bytes), you get the total of 72 bytes reported by the ObjSize command.

Note that ObjSize does not include the memory occupied by the syncblk infrastructure. Also, in .NET Framework 1.1, the CLR is not aware of the memory occupied by unmanaged resources such as GDI objects, COM objects, and file handles, so they are not reported by this command.

The TypeHandle, a pointer to the method table, is located right after the syncblk number. Before creating an object instance, the CLR looks up the loaded types, loads the type if it is not found, obtains the method table address, creates the object instance, and populates the object instance with the TypeHandle value. The code generated by the JIT compiler uses the TypeHandle to locate the method table for method dispatch. The CLR uses the TypeHandle whenever it has to backtrack to the loaded type through the method table.

Method table

Each class and interface, when loaded into an application domain, is represented in memory by a method table (MethodTable). This is the result of the class-loading activity that takes place before the first instance of the object is created. The object instance represents state, while the method table represents behavior. Through the EEClass, the method table binds the object instance to the memory-mapped metadata structures generated by the language compiler. The information in the method table and the data structures hanging off it can be accessed from managed code through System.Type. A pointer to the method table can be obtained in managed code through the Type.TypeHandle property, which returns a RuntimeTypeHandle. The TypeHandle contained in the object instance points to an offset from the beginning of the method table. This offset is 12 bytes by default, and the area before it contains GC information, which we do not discuss here.
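The statement that all instances of a type share one method table can be checked through the type handle. This is a small sketch with hypothetical names of ours; it compares the raw RuntimeTypeHandle.Value pointers and does not interpret what they point to.

```csharp
using System;

public static class TypeHandleDemo
{
    // Returns the raw type handle value for the run-time type of the instance.
    public static IntPtr HandleOf(object instance)
    {
        return instance.GetType().TypeHandle.Value;
    }

    public static void Main()
    {
        IntPtr first = HandleOf("first string");
        IntPtr second = HandleOf("second string");
        IntPtr number = HandleOf(42);

        // Two instances of the same type report the same handle.
        Console.WriteLine("Strings share a handle: " + (first == second));

        // The handle also matches the one obtained from typeof.
        Console.WriteLine("Matches typeof(string): " + (first == typeof(string).TypeHandle.Value));

        // An instance of a different type reports a different handle.
        Console.WriteLine("Differs from int: " + (first != number));
    }
}
```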

Figure 9 shows the typical layout of a method table. We describe some of the important fields reached through the TypeHandle; for a more complete list, see the figure. Let's start with the base instance size, because it correlates directly with the run-time memory profile.

Base instance size

The base instance size is the size of the object as computed by the class loader from the field declarations in the code. As discussed earlier, the current GC implementation requires an object instance to be at least 12 bytes. If a class defines no instance fields, it carries an overhead of 4 bytes. The other 8 bytes are taken up by the object header (which may contain a syncblk number) and the TypeHandle. Again, the object size can be influenced by StructLayoutAttribute.
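The size arithmetic described in this article can be summarized in a few lines. This is only a sketch of the rule as stated here for x86 and .NET Framework 1.1, under hypothetical names; it is not how the class loader is implemented, and it assumes the field byte count already includes any alignment padding.

```csharp
using System;

public static class InstanceSizeSketch
{
    public const int ObjectHeaderBytes = 4;   // holds the syncblk number
    public const int TypeHandleBytes = 4;     // pointer to the method table
    public const int MinimumObjectBytes = 12; // smallest instance the GC accepts

    // Computes the instance size from the bytes occupied by instance fields
    // (padding included), applying the 12-byte minimum.
    public static int BaseInstanceSize(int instanceFieldBytes)
    {
        int size = ObjectHeaderBytes + TypeHandleBytes + instanceFieldBytes;
        return Math.Max(size, MinimumObjectBytes);
    }

    public static void Main()
    {
        // SimpleClass: 28 bytes of fields plus 8 bytes of overhead.
        Console.WriteLine("28 bytes of fields: " + BaseInstanceSize(28));

        // A class without instance fields still occupies the minimum size.
        Console.WriteLine("No fields: " + BaseInstanceSize(0));
    }
}
```

With 28 bytes of fields the result is 36, which matches the DumpHeap output for SimpleClass; with no fields it is 12, the minimum object size.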

Look at the memory snapshot (Visual Studio .NET 2003 memory window) of the method table for MyClass from Figure 3 (MyClass with two interfaces) and compare it with the SOS output. In Figure 9, the base instance size is located at a 4-byte offset and its value is 12 (0x0000000C) bytes. The following is the output of the SOS DumpHeap command:

    !DumpHeap -type MyClass
     Address       MT     Size
    00a819ac 009552a0       12
    total 1 objects
    Statistics:
        MT  Count TotalSize Class Name
    9552a0      1        12    MyClass

Method slot table

Embedded in the method table is a table of slots that point to the respective method descriptors (MethodDesc), which enable the behavior of the type. The method slot table is built from a linearized list of the implementation methods, laid out in the following order: inherited virtual methods, introduced virtual methods, instance methods, and static methods.

The class loader walks the metadata of the current class, the parent class, and the interfaces, and then creates the method table. During layout, it replaces any overridden virtual methods, replaces any parent class methods that are hidden, creates new slots, and duplicates slots as needed. Slot duplication is necessary to create the illusion that each interface has its own mini vtable. However, the duplicated slots point to the same physical implementation. MyClass has three instance methods, a class constructor (.cctor), and an object constructor (.ctor). The object constructor is generated automatically by the C# compiler for every class that does not explicitly define a constructor. Because we define and initialize a static variable, the compiler also generates a class constructor. Figure 10 shows the layout of the MyClass method table. The layout shows 10 methods because the Method2 slot is duplicated for the IVMap, which is discussed below. Figure 11 shows the edited SOS dump of the MyClass method table.
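The two compiler-generated constructors can be observed through reflection. The class below is a hypothetical stand-in of ours for MyClass: it declares no constructor and initializes one static field, which is enough for the C# compiler to emit both a .ctor and a .cctor.

```csharp
using System;
using System.Reflection;

public class StaticInitSample
{
    // The initializer makes the compiler emit a class constructor (.cctor).
    public static string Label = "initialized";

    // No constructor is declared, so the compiler emits a default .ctor.
}

public static class ConstructorDemo
{
    // Name of the class constructor, or null if the type has none.
    public static string ClassConstructorName()
    {
        ConstructorInfo cctor = typeof(StaticInitSample).TypeInitializer;
        return cctor == null ? null : cctor.Name;
    }

    // Name of the parameterless instance constructor, or null if the type has none.
    public static string InstanceConstructorName()
    {
        ConstructorInfo ctor = typeof(StaticInitSample).GetConstructor(Type.EmptyTypes);
        return ctor == null ? null : ctor.Name;
    }

    public static void Main()
    {
        Console.WriteLine("Class constructor: " + ClassConstructorName());
        Console.WriteLine("Instance constructor: " + InstanceConstructorName());
    }
}
```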

The first four methods of any type are always ToString, Equals, GetHashCode, and Finalize. These are the virtual methods inherited from System.Object. The Method2 slot is duplicated, but both copies point to the same method descriptor. The explicitly coded .cctor and .ctor are grouped with the static methods and the instance methods, respectively.
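The four inherited virtual methods can be listed with reflection. This is a sketch under our own names. Reflection does not return methods in slot order, so the names are sorted before printing and the output says nothing about the order of slots in the method table.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ObjectVirtualsDemo
{
    // Returns the sorted names of the virtual instance methods declared by System.Object.
    public static string[] VirtualMethodNames()
    {
        MethodInfo[] methods = typeof(object).GetMethods(
            BindingFlags.Instance | BindingFlags.Public |
            BindingFlags.NonPublic | BindingFlags.DeclaredOnly);

        List<string> names = new List<string>();
        foreach (MethodInfo method in methods)
        {
            // Keep only methods that occupy a virtual slot.
            if (method.IsVirtual)
            {
                names.Add(method.Name);
            }
        }

        names.Sort(StringComparer.Ordinal);
        return names.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", VirtualMethodNames()));
    }
}
```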

Method descriptor (MethodDesc)

A method descriptor (MethodDesc) is an encapsulation of a method implementation as the CLR knows it. There are several kinds of method descriptors; in addition to managed implementations, they facilitate calls to various interop implementations. In this article, we examine only the managed method descriptors in the context of the code in Figure 3. A method descriptor is generated during class loading and initially points to the Intermediate Language (IL). Each method descriptor is padded with a PreJitStub, which is responsible for triggering JIT compilation. Figure 12 shows a typical layout. The method table slot actually points to the stub rather than to the actual MethodDesc data structure. The stub is at a negative offset of 5 bytes from the actual method descriptor and is part of the 8-byte padding that every method carries. These 5 bytes contain the instructions for a call to the PreJitStub routine. The 5-byte offset can be seen in the DumpMT output of SOS, because the method descriptor is always 5 bytes after the location that the method slot table entry points to. On the first invocation, the JIT compilation routine is called. After compilation completes, the 5 bytes that contain the call instruction are overwritten with an unconditional jump to the JIT-compiled x86 code.
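The patch-on-first-call idea can be illustrated with a delegate that replaces itself. This is an analogy in managed code with names of our own choosing; the CLR patches machine instructions in the stub, which this sketch does not attempt to reproduce.

```csharp
using System;

public sealed class PatchableSlot
{
    // Stands in for the method table slot: it holds the current call target.
    private Func<int, int> target;

    // Counts how often the simulated compilation step runs.
    public int CompileCount { get; private set; }

    public PatchableSlot()
    {
        // Initially the slot points at the stub, not at the real code.
        target = FirstCallStub;
    }

    // Every caller goes through the slot.
    public int Invoke(int value)
    {
        return target(value);
    }

    // Plays the role of the PreJitStub: "compiles" once, then patches the slot.
    private int FirstCallStub(int value)
    {
        CompileCount++;
        target = CompiledBody;
        return target(value);
    }

    // Plays the role of the JIT-compiled method body.
    private static int CompiledBody(int value)
    {
        return value * 2;
    }

    public static void Main()
    {
        PatchableSlot slot = new PatchableSlot();
        Console.WriteLine(slot.Invoke(3));
        Console.WriteLine(slot.Invoke(4));
        Console.WriteLine("Compilations: " + slot.CompileCount);
    }
}
```

The second call reaches the body directly, just as later calls through the patched stub jump straight to the JIT-compiled code.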

Figure 12. Method descriptor

Disassembling the code pointed to by the method table slot entry in Figure 12 shows the call to the PreJitStub. The following is an abridged display of the disassembly before Method2 is JIT-compiled.

    !u 0x00955263
    Unmanaged code
    00955263 call        003C3538        ;call to the jitted Method2()
    00955268 add         eax,68040000h   ;ignore this and the rest
                                         ;as !u thinks it as code

Now we execute the method and disassemble the same address again:

    !u 0x00955263
    Unmanaged code
    00955263 jmp     02C633E8        ;call to the jitted Method2()
    00955268 add     eax,0E8040000h  ;ignore this and the rest
                                     ;as !u thinks it as code

Only the first 5 bytes at this address are code; the remaining bytes contain the data of the Method2 method descriptor. The "!u" command is unaware of this and generates gibberish, so you can ignore everything after the first 5 bytes.

Before JIT compilation, CodeOrIL contains the relative virtual address (RVA) of the method's IL implementation. The field is flagged to indicate that it holds IL. After on-demand compilation, the CLR updates this field with the address of the JIT-compiled code. Let's pick one of the listed methods and use the DumpMD command to dump its method descriptor before and after JIT compilation:

    !DumpMD 0x00955268
    Method Name : [DEFAULT] [hasThis] Void MyClass.Method2()
    MethodTable 9552a0
    Module: 164008
    mdToken: 06000006
    Flags : 400
    IL RVA : 00002068

After compilation, the method descriptor looks like this:

    !DumpMD 0x00955268
    Method Name : [DEFAULT] [hasThis] Void MyClass.Method2()
    MethodTable 9552a0
    Module: 164008
    mdToken: 06000006
    Flags : 400
    Method VA : 02c633e8

The Flags field of the method descriptor is encoded to contain information about the kind of method, such as static, instance, interface method, or COM implementation. Let's look at another complicated aspect of the method table: interface implementation. It is made to look simple to the managed environment by absorbing all the complexity into the layout process. Next, we explain how interfaces are laid out and how interface-based method dispatch really works.

Interface vtable map and interface map

At offset 12 in the method table is an important pointer, the interface vtable map (IVMap). As shown in Figure 9, the IVMap points to an application-domain-level mapping table that is indexed by a process-level interface ID. The interface ID is generated when the interface type is first loaded. Each interface implementation has an entry in the IVMap. If MyInterface1 is implemented by two classes, there are two entries in the IVMap table. The entry points back to the beginning of the sub-table embedded in the MyClass method table, as shown in Figure 9. This is the reference through which interface-based method dispatch occurs. The IVMap is created from the interface map information embedded in the method table. The interface map is created from the class metadata during method table layout. Once type loading is complete, only the IVMap is used for method dispatch.

The interface map at offset 28 points to the InterfaceInfo entries embedded in the method table. In this case, there are two entries, one for each of the two interfaces implemented by MyClass. The first 4 bytes of the first InterfaceInfo entry point to the TypeHandle of MyInterface1 (see Figure 9 and Figure 10). The next WORD (2 bytes) is taken up by Flags (0 means inherited from the parent class, and 1 means implemented by the current class). The WORD after Flags is the start slot, which the class loader uses to lay out the sub-table of the interface implementation. For MyInterface1, the start slot value is 4 (counting from 0), so slots 5 and 6 point to the implementation. For MyInterface2, the start slot value is 6, so slots 7 and 8 point to the implementation. The class loader duplicates slots as needed to create the illusion that each interface has its own implementation, while physically mapping to the same method descriptor. In MyClass, MyInterface1.Method2 and MyInterface2.Method2 point to the same implementation.
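The shared implementation behind the two interface methods can be confirmed with Type.GetInterfaceMap. The type definitions below are our reconstruction from the names used in the text (Figure 3 is not reproduced here), so treat their exact shape as an assumption; TargetOf is a hypothetical helper.

```csharp
using System;
using System.Reflection;

public interface MyInterface1
{
    void Method1();
    void Method2();
}

public interface MyInterface2
{
    void Method2();
    void Method3();
}

// Assumed shape of MyClass: one Method2 satisfies both interfaces.
public class MyClass : MyInterface1, MyInterface2
{
    public void Method1() { }
    public void Method2() { }
    public virtual void Method3() { }
}

public static class InterfaceMapDemo
{
    // Finds the method of implementingType that implements interfaceType.methodName.
    public static MethodInfo TargetOf(Type implementingType, Type interfaceType, string methodName)
    {
        InterfaceMapping map = implementingType.GetInterfaceMap(interfaceType);
        for (int i = 0; i < map.InterfaceMethods.Length; i++)
        {
            if (map.InterfaceMethods[i].Name == methodName)
            {
                return map.TargetMethods[i];
            }
        }
        return null;
    }

    public static void Main()
    {
        MethodInfo viaFirst = TargetOf(typeof(MyClass), typeof(MyInterface1), "Method2");
        MethodInfo viaSecond = TargetOf(typeof(MyClass), typeof(MyInterface2), "Method2");

        // Both interface methods resolve to the same method handle.
        Console.WriteLine(viaFirst.MethodHandle.Value == viaSecond.MethodHandle.Value);
    }
}
```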

Interface-based method dispatch occurs through the IVMap, while direct method dispatch occurs through the method descriptor address stored in the respective slot. As mentioned earlier, the .NET Framework uses the fastcall calling convention. The first two arguments are typically passed through the ECX and EDX registers when possible. The first argument of an instance method is always the this pointer, which is passed through the ECX register, as the "mov ecx,esi" statement shows:

    mi1.Method1();
    mov    ecx,edi                 ;move "this" pointer into ecx
    mov    eax,dword ptr [ecx]     ;move "TypeHandle" into eax
    mov    eax,dword ptr [eax+0Ch] ;move IVMap address into eax at offset 12
    mov    eax,dword ptr [eax+30h] ;move the ifc impl start slot into eax
    call   dword ptr [eax]         ;call Method1

    mc.Method1();
    mov    ecx,esi                 ;move "this" pointer into ecx
    cmp    dword ptr [ecx],ecx     ;compare and set flags
    call   dword ptr ds:[009552D8h];directly call Method1

The disassembly shows that the instance method that directly calls myclass does not use an offset. The JIT compiler directly writes the address of the method description to the code. Interface-based dispatch occurs through the interface virtual table, which requires some additional commands than direct dispatch. One command is used to obtain the address of the interface virtual table, and the other is used to obtain the interface implementation start slot in the method slot table. In addition, to convert an object instance to an interface, you only need to copy the this pointer to the target variable. In Figure 2, the statement "Mi1 = mc" uses a command to copy the object reference of MC to mi1.

Virtual dispatch

Now let's look at virtual dispatch and compare it with direct and interface-based dispatch. Below is the disassembly of the virtual call to MyClass.Method3 from Figure 3:

mc.Method3();
mov    ecx,esi               ;move "this" pointer into ecx
mov    eax,dword ptr [ecx]   ;acquire the MethodTable address
call   dword ptr [eax+44h]   ;dispatch to the method at offset 0x44

Virtual dispatch always occurs through a fixed slot number, regardless of which method table pointer in a given class (type) hierarchy is used. When laying out the method table, the class loader replaces the parent implementation with the overriding child implementation. As a result, method calls coded against the parent type are dispatched to the child object's implementation. The disassembly shows that the dispatch occurs through slot 8, which can be confirmed in the debugger memory window (see Figure 10) and in the DumpMT output.
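The effect of replacing the parent's slot with the child's override can be seen in a few lines of C#. This is an illustrative sketch: DerivedClass and VirtualDispatchDemo are hypothetical and do not appear in the article's sample, and Method3 returns a string here (an assumption made so that the result can be inspected).

```csharp
using System;

class MyClass
{
    public virtual string Method3() { return "MyClass.Method3"; }
}

// Hypothetical subclass that overrides the virtual method.
class DerivedClass : MyClass
{
    public override string Method3() { return "DerivedClass.Method3"; }
}

// Hypothetical helper class, introduced only for this illustration.
static class VirtualDispatchDemo
{
    // The call site is compiled against MyClass; the runtime type of
    // target decides which implementation actually runs.
    public static string CallThroughParent(MyClass target)
    {
        return target.Method3();
    }

    static void Main()
    {
        Console.WriteLine(CallThroughParent(new MyClass()));
        Console.WriteLine(CallThroughParent(new DerivedClass()));
    }
}
```

The single call site in CallThroughParent uses the same slot for both objects; only the method table reached through the this pointer differs.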

Static variables

Static variables are an important part of the method table data structure. They are allocated as part of the method table, right after the method table slot array. All primitive static types are stored inline. For static value objects such as structs, and for static reference types, the method table holds an object reference that points to a reference created in the handle table. In other words, the object reference in the method table refers to an object reference in the application domain handle table, which in turn refers to the object instance created on the heap. Once created, the object reference in the handle table keeps the object instance on the heap alive until the application domain is unloaded. In Figure 9, the static string variable str points to an object reference in the handle table, which points to MyString on the GC heap.
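The lifetime guarantee for objects referenced from static fields can be observed with a weak reference. The sketch below makes several assumptions: the Payload field and the StaticRootDemo helper are hypothetical, and a plain object is used instead of a string literal because literals are interned and would blur the result. It demonstrates only that a statically referenced object survives garbage collection; it does not inspect the handle table.

```csharp
using System;

class MyClass
{
    // Hypothetical static reference field; the runtime keeps its target rooted.
    public static object Payload = new object();

    // Primitive static, as in the article's sample.
    public static uint ui = 0xAAAAAAAA;
}

// Hypothetical helper class, introduced only for this illustration.
static class StaticRootDemo
{
    // Forces collections and reports whether the statically referenced
    // object is still alive and still the same instance.
    public static bool StaticKeepsObjectAlive()
    {
        WeakReference weak = new WeakReference(MyClass.Payload);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive && object.ReferenceEquals(weak.Target, MyClass.Payload);
    }

    static void Main()
    {
        Console.WriteLine("Alive after GC: " + StaticKeepsObjectAlive());
    }
}
```

The weak reference by itself would not keep the object alive, so the fact that it still resolves after the collections is due to the static field acting as a root.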

EEClass

The EEClass comes into existence before the method table is created and, combined with the method table, is the CLR version of a type declaration. In fact, the EEClass and the method table are logically one data structure (together they represent a single type), split into two according to frequency of use. Frequently used fields are placed in the method table, and less frequently used fields are in the EEClass. Thus, information needed to JIT-compile functions (such as names, fields, and offsets) ends up in the EEClass, whereas information needed at run time (such as vtable slots and GC information) lives in the method table.

There is one EEClass for each type loaded into an application domain, including interfaces, classes, abstract classes, arrays, and structs. Each EEClass is a node of a tree tracked by the execution engine. The CLR uses this network to navigate the EEClass structures for purposes that include class loading, method table layout, type verification, and type casting. The child-parent relationship between EEClass nodes is established from the inheritance hierarchy, whereas parent-child relationships are established from the combination of the interface hierarchy and the class loading sequence. As managed code executes, new EEClass nodes are added, node relationships are patched, and new relationships are established. Sibling EEClass nodes in the network are also linked horizontally. An EEClass has three fields for managing the node relationships between loaded types: the parent class, the sibling chain, and the children chain. For the semantics of these EEClass fields in the context of MyClass from Figure 4, see Figure 13.
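The three linking fields can be pictured with a small tree model. This is a hypothetical sketch, not the CLR's actual EEClass definition: TypeNode, its field names, and the choice to push each newly loaded child onto the head of the children chain are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the parent / sibling / children links.
class TypeNode
{
    public readonly string Name;
    public TypeNode Parent;         // child-to-parent link (inheritance)
    public TypeNode SiblingChain;   // next node under the same parent
    public TypeNode ChildrenChain;  // head of this node's list of children

    public TypeNode(string name) { Name = name; }

    // Links a newly loaded child in front of the existing children (assumed order).
    public void AddChild(TypeNode child)
    {
        child.Parent = this;
        child.SiblingChain = ChildrenChain;
        ChildrenChain = child;
    }

    // Walks the children chain through the sibling links.
    public List<string> ChildNames()
    {
        List<string> names = new List<string>();
        for (TypeNode n = ChildrenChain; n != null; n = n.SiblingChain)
            names.Add(n.Name);
        return names;
    }
}

// Hypothetical helper class, introduced only for this illustration.
static class TypeTreeDemo
{
    // Models Program being loaded before MyClass, both deriving from System.Object.
    public static TypeNode Build()
    {
        TypeNode root = new TypeNode("System.Object");
        root.AddChild(new TypeNode("Program"));
        root.AddChild(new TypeNode("MyClass"));
        return root;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Build().ChildNames().ToArray()));
    }
}
```

In this model the sibling chain of MyClass leads to Program simply because Program was linked first; the real loader's ordering rules may differ.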

Figure 13 shows only the fields relevant to this discussion. Because some fields are omitted from the layout, exact offsets are not shown in the figure. The EEClass holds an indirect reference to the method table. The EEClass also points to the method description chunks allocated on the high-frequency heap of the default application domain. A reference to the list of field description (FieldDesc) objects allocated on the process heap provides field layout information while the method table is being built. The EEClass itself is allocated on the low-frequency heap of the application domain, so that the operating system can manage memory pages better and thereby reduce the working set.

Figure 13 EEClass Layout

The meaning of the other fields in Figure 13 is self-explanatory in the context of MyClass (Figure 3). Now let's look at the actual physical memory by dumping the EEClass with SOS. Set a breakpoint on the mc.Method1 line and run the program from Figure 3. First, use the Name2EE command to obtain the EEClass address of MyClass.

!Name2EE C:\Working\test\ClrInternals\Sample1.exe MyClass
MethodTable: 009552a0
EEClass: 02ca3508
Name: MyClass

The first argument to Name2EE is the module name, which can be obtained from the DumpDomain command. Now that we have the EEClass address, we can dump the EEClass itself:

!DumpClass 02ca3508
Class Name : MyClass, mdToken : 02000004, Parent Class : 02c4c3e4
ClassLoader : 00163ad8, Method Table : 009552a0, Vtable Slots : 8
Total Method Slots : a, NumInstanceFields: 0,
NumStaticFields: 2,FieldDesc*: 00955224

      MT    Field   Offset  Type           Attr    Value    Name
009552a0  4000001   2c      CLASS          static 00a8198c  str
009552a0  4000002   30      System.UInt32  static aaaaaaaa  ui

Figure 13 and the DumpClass output are essentially the same. The metadata token (mdToken) is the index of MyClass in the metadata tables of the module PE file as mapped into memory. The parent class points to System.Object. The sibling chain (Figure 13) points to the EEClass named Program, which is there as a result of loading the Program class.

MyClass has eight vtable slots (methods that can be dispatched virtually). Even though Method1 and Method2 are not declared virtual, they are treated as virtual methods when dispatched through interfaces, and are therefore added to the list. Add .cctor and .ctor to the list and you get a total of 10 (0xa) methods. The two static fields of the class are listed at the end. MyClass has no instance fields. The remaining fields are self-explanatory.
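Part of this accounting is visible through reflection. The following sketch assumes a reconstruction of the MyClass sample with empty method bodies; the SlotFlagsDemo helper and its method names are hypothetical. It relies on the fact that the C# compiler emits a non-virtual method that implicitly implements an interface member as virtual and final (sealed) in metadata. It does not read the Vtable Slots or Total Method Slots counters reported by DumpClass.

```csharp
using System;
using System.Reflection;

interface MyInterface1 { void Method1(); void Method2(); }
interface MyInterface2 { void Method2(); void Method3(); }

// Reconstruction of the sample class discussed in the text (an assumption).
class MyClass : MyInterface1, MyInterface2
{
    public static string str = "MyString";
    public static uint ui = 0xAAAAAAAA;
    public void Method1() { }
    public void Method2() { }
    public virtual void Method3() { }
}

// Hypothetical helper class, introduced only for this illustration.
static class SlotFlagsDemo
{
    // Reports the virtual/final metadata flags of a public MyClass method.
    public static string Describe(string methodName)
    {
        MethodInfo m = typeof(MyClass).GetMethod(methodName);
        return methodName + ": virtual=" + m.IsVirtual + ", final=" + m.IsFinal;
    }

    // Counts the public static fields (str and ui).
    public static int StaticFieldCount()
    {
        return typeof(MyClass).GetFields(BindingFlags.Static | BindingFlags.Public).Length;
    }

    // True when the type has a static constructor (.cctor), which the
    // static field initializers cause the compiler to generate.
    public static bool HasTypeInitializer()
    {
        return typeof(MyClass).TypeInitializer != null;
    }

    static void Main()
    {
        Console.WriteLine(Describe("Method1"));
        Console.WriteLine(Describe("Method3"));
        Console.WriteLine("Static fields: " + StaticFieldCount());
        Console.WriteLine("Has .cctor: " + HasTypeInitializer());
    }
}
```

Method1 and Method2 report as virtual and final, which is consistent with their appearing among the vtable slots, while Method3 is virtual and can still be overridden.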

Conclusion

Our tour of some of the most important internals of the CLR is finally complete. Obviously, there is much more to cover, and in much more depth, but we hope this gives you a glimpse of how things work. Much of the information presented here may change in later versions of the .NET Framework and the CLR. However, although the CLR data structures described in this article may change, the concepts should remain the same.

Hanu Kommalapati is an architect in the Microsoft Gulf Coast region (Houston). His current role at Microsoft is to help customers build scalable component frameworks based on the .NET Framework. You can contact him at Hanuk@microsoft.com.

Tom Christian is a senior engineer in Microsoft developer support, working with ASP.NET and the .NET debugger extensions for WinDbg (SOS/PSSCOR). He is based in Charlotte, North Carolina. You can contact him at Tomchris@microsoft.com.

Luke is a software engineer at Microsoft who develops applications in C++ and C#. In his spare time he enjoys music, travel, and retro games, and he is happy to help MSDN translate more articles to share with other developers. You can contact him at ecaijw@msn.com.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
