What is Object?
.NET programmers deal with objects every day.
If you ask a .NET programmer what an object is, he may confidently tell you: "What's so hard about object? It's the base class of all types."
That answer is right, but it does not say what an object really is.
In this article we will read CoreCLR's source code to understand the structure of an object in memory, and then actually look at objects in memory.
The structure of object in memory
To make it easier to understand what's going on, I'll use a diagram to illustrate the structure of object in memory.
An object in .NET consists of these three parts:
- Object header (ObjHeader)
- Pointer to type information
- Field contents
Microsoft has a more complete picture (it illustrates the structure in .NET Framework, but it is essentially the same in .NET Core).
Source code analysis of Object
Definition of Object (excerpt)
Source code: https://github.com/dotnet/coreclr/blob/master/src/vm/object.h
class Object{ PTR_MethodTable m_pMethTab;}
Definition of PTR_MethodTable. DPTR is a pointer wrapper class; for now you can treat PTR_MethodTable as equivalent to MethodTable*.
Source code: https://github.com/dotnet/coreclr/blob/master/src/vm/common.h
typedef DPTR(class MethodTable) PTR_MethodTable;
In the definition of Object we see only one member, the pointer to the type information. What about the other two parts?
This is the function that gets the pointer to the header, and we can see that the header sits right in front of the object.
PTR_ObjHeader GetHeader(){ LIMITED_METHOD_DAC_CONTRACT; return dac_cast<PTR_ObjHeader>(this) - 1;}
This is the function that gets the field contents, and we can see that the field contents sit right behind the object.
PTR_BYTE GetData(void){ LIMITED_METHOD_CONTRACT; SUPPORTS_DAC; return dac_cast<PTR_BYTE>(this) + sizeof(Object);}
We can see that although Object defines only the pointer to the type information, at runtime it is preceded by the header and followed by the field contents.
An object has a variable length in memory, and its starting address is the allocated memory address plus the size of one pointer.
Because the structure of an object is special, creating one also needs special handling. I will cover object creation in a later article.
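The layout described above can be sketched in a few lines of C++. This is only an illustration under assumptions: SketchHeader, SketchObject and PlaceObject are hypothetical names, not CoreCLR types, and the header is assumed to be one 8-byte slot as on a 64-bit target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-ins for the CoreCLR types; only the layout idea matters here.
struct SketchHeader {
    std::uint64_t bits;   // plays the role of ObjHeader (one 8-byte slot)
};

struct SketchObject {
    void* methodTable;    // plays the role of m_pMethTab

    // The header sits immediately before the object.
    SketchHeader* GetHeader() {
        return reinterpret_cast<SketchHeader*>(this) - 1;
    }

    // The field contents start immediately after the object.
    std::uint8_t* GetData() {
        return reinterpret_cast<std::uint8_t*>(this) + sizeof(SketchObject);
    }
};

// Carves an object out of a raw allocation: [header][object][fieldBytes of data].
// The returned pointer is the allocation address plus the size of the header.
inline SketchObject* PlaceObject(std::vector<std::uint64_t>& storage,
                                 std::size_t fieldBytes) {
    const std::size_t total =
        sizeof(SketchHeader) + sizeof(SketchObject) + fieldBytes;
    storage.assign((total + 7) / 8, 0);   // zeroed, 8-byte aligned backing memory
    auto* base = reinterpret_cast<std::uint8_t*>(storage.data());
    assert(reinterpret_cast<std::uintptr_t>(base) % alignof(SketchHeader) == 0);
    new (base) SketchHeader{0};
    return new (base + sizeof(SketchHeader)) SketchObject{nullptr};
}
```

The point of the sketch is that the object pointer handed out is not the start of the allocation, so the header can always be found by stepping back one slot.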
The m_pMethTab field defined in Object also stores additional information. Because it is a pointer value, it is always aligned to 4 or 8 bytes, so its lowest two bits are always 0.
.NET uses these two idle bits to store the GC pinned and GC marking flags respectively. I'll cover this in a later article.
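The trick of reusing the low bits of an aligned pointer can be sketched as follows. The flag names and values here are assumptions made for the illustration and are not taken from the CoreCLR source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for this sketch; the real CoreCLR constants may differ.
constexpr std::uintptr_t kFlagA = 0x1;   // stands in for one GC flag
constexpr std::uintptr_t kFlagB = 0x2;   // stands in for the other GC flag
constexpr std::uintptr_t kFlagMask = kFlagA | kFlagB;

// Stores flags in the two low bits of a pointer that is at least 4-byte aligned.
inline std::uintptr_t PackPointer(const void* p, std::uintptr_t flags) {
    const auto raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & kFlagMask) == 0);      // alignment guarantees the low bits are free
    assert((flags & ~kFlagMask) == 0);   // only the two low bits may carry flags
    return raw | flags;
}

// Recovers the original pointer by masking the flag bits away.
inline void* UnpackPointer(std::uintptr_t packed) {
    return reinterpret_cast<void*>(packed & ~kFlagMask);
}

// Recovers the flags stored alongside the pointer.
inline std::uintptr_t UnpackFlags(std::uintptr_t packed) {
    return packed & kFlagMask;
}
```

Any code that dereferences such a value has to mask the flags off first, which is why a raw pointer field used this way is normally read through an accessor.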
Source code analysis of ObjHeader
Definition of ObjHeader (excerpt)
Source code: https://github.com/dotnet/coreclr/blob/master/src/vm/syncblk.h
class ObjHeader
{
    // !!! Notice: m_SyncBlockValue *MUST* be the last field in ObjHeader.
#ifdef _WIN64
    DWORD m_alignpad;
#endif // _WIN64
    Volatile<DWORD> m_SyncBlockValue; // the Index and the Bits
};
m_alignpad is used for alignment (so that m_SyncBlockValue occupies the last 4 bytes); its value should be 0.
In m_SyncBlockValue, the upper 6 bits are flags and the lower 26 bits are the index of the corresponding SyncBlock in the SyncBlockCache.
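Splitting a 32-bit value into 6 flag bits and a 26-bit index can be sketched like this. The helper names are hypothetical and are not the accessors CoreCLR uses; only the 6/26 split comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Bit widths taken from the text above: 6 flag bits on top, 26 index bits below.
constexpr std::uint32_t kIndexBits = 26;
constexpr std::uint32_t kIndexMask = (1u << kIndexBits) - 1;   // 0x03FFFFFF

// Hypothetical helper: the upper 6 bits of the value.
inline std::uint32_t FlagBits(std::uint32_t syncBlockValue) {
    return syncBlockValue >> kIndexBits;
}

// Hypothetical helper: the lower 26 bits of the value.
inline std::uint32_t SyncBlockIndex(std::uint32_t syncBlockValue) {
    return syncBlockValue & kIndexMask;
}

// Combines 6 flag bits and a 26-bit index into one 32-bit value.
inline std::uint32_t MakeSyncBlockValue(std::uint32_t flags, std::uint32_t index) {
    assert(flags < (1u << 6));       // flags must fit in 6 bits
    assert(index <= kIndexMask);     // index must fit in 26 bits
    return (flags << kIndexBits) | index;
}
```

A value of 0 decodes to no flags and index 0, which matches an object that has never needed a SyncBlock.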
SyncBlock is, simply put, used for thread synchronization. For example, the following code will use the SyncBlock:
var obj = new object();lock (obj) { }
ObjHeader contains only the SyncBlock value, which is why some articles explaining the object structure write SyncBlock in place of ObjHeader.
For a more detailed explanation of SyncBlock, you can also read this article.
Source code analysis of MethodTable
Definition of MethodTable (excerpt)
Source code: https://github.com/dotnet/coreclr/blob/master/src/vm/methodtable.h
class MethodTable
{
    // Low WORD is component size for array and string types (HasComponentSize() returns true).
    // Used for flags otherwise.
    DWORD m_dwFlags;

    // Base size of instance of this class when allocated on the heap
    DWORD m_BaseSize;

    WORD m_wFlags2;

    // Class token if it fits into 16-bits. If this is (WORD)-1, the class token is stored in the TokenOverflow optional member.
    WORD m_wToken;

    // <NICE> In the normal cases we shouldn't need a full word for each of these </NICE>
    WORD m_wNumVirtuals;
    WORD m_wNumInterfaces;

#ifdef _DEBUG
    LPCUTF8 debug_m_szClassName;
#endif //_DEBUG

    // Parent PTR_MethodTable if enum_flag_HasIndirectParent is not set. Pointer to indirection cell
    // if enum_flag_enum_flag_HasIndirectParent is set. The indirection is offset by offsetof(MethodTable, m_pParentMethodTable).
    // It allows casting helpers to go through parent chain naturally. Casting helpers do not need the explicit check
    // for enum_flag_HasIndirectParentMethodTable.
    TADDR m_pParentMethodTable;

    PTR_Module m_pLoaderModule; // LoaderModule. It is equal to the ZapModule in ngened images

    PTR_MethodTableWriteableData m_pWriteableData;

    union {
        EEClass * m_pEEClass;
        TADDR m_pCanonMT;
    };

    // m_pPerInstInfo and m_pInterfaceMap have to be at fixed offsets because of performance sensitive
    // JITed code and JIT helpers. However, they are frequently not present. The space is used by other
    // multipurpose slots on first come first served basis if the fixed ones are not present. The other
    // multipurpose are DispatchMapSlot, NonVirtualSlots, ModuleOverride (see enum_flag_MultipurposeSlotsMask).
    // The multipurpose slots that do not fit are stored after vtable slots.
    union
    {
        PTR_Dictionary * m_pPerInstInfo;
        TADDR m_ElementTypeHnd;
        TADDR m_pMultipurposeSlot1;
    };
    union
    {
        InterfaceInfo_t * m_pInterfaceMap;
        TADDR m_pMultipurposeSlot2;
    };

    // And then there's a bunch of optional_members, omitted here.
};
There are a lot of fields here, and I will explain them one by one in later articles. For now, here is the information a MethodTable stores:
- Type flags, such as StaticsMask_Dynamic, StaticsMask_Generics and so on (m_dwFlags)
- If the type is a string or an array, it also holds the size of each element (ComponentSize); for example, it is 2 for string and 4 for int[100]
- The amount of memory to allocate for an instance of the type (m_BaseSize)
- Type information, such as which members and interfaces the type has (m_pCanonMT)
As you can see, MethodTable is used to store type information, and reflection and dynamic casts all depend on it.
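How the component size and the base size might combine into the size of one instance can be sketched as below. SketchMethodTable is a hypothetical, heavily simplified stand-in; the hasComponentSize member replaces the flag check behind HasComponentSize(), and the base sizes used in any example are assumed values, not numbers from this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified stand-in for the two MethodTable fields discussed above.
struct SketchMethodTable {
    std::uint32_t flags;      // low 16 bits hold the component size for arrays and strings
    std::uint32_t baseSize;   // bytes needed by an instance that has no elements
    bool hasComponentSize;    // simplification: the real runtime derives this from flag bits
};

// Reads the low WORD of the flags as the component size, if the type has one.
inline std::uint32_t ComponentSize(const SketchMethodTable& mt) {
    return mt.hasComponentSize ? (mt.flags & 0xFFFFu) : 0u;
}

// Size of one instance: the base size plus one component per element.
// Any alignment padding the runtime may add is left out of this sketch.
inline std::size_t InstanceSize(const SketchMethodTable& mt, std::size_t elementCount) {
    assert(mt.hasComponentSize || elementCount == 0);
    return mt.baseSize + elementCount * ComponentSize(mt);
}
```

With a component size of 4 and an assumed base size of 24, a 100-element array would come out at 24 + 100 * 4 = 424 bytes in this model.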
Viewing the actual object in memory
The initial analysis of Object is done, but is it correct? Let's check what an object actually looks like in memory.
Visual Studio can show the disassembly and the memory, for example:
Here I define the MyClass and MyStruct types. First, look at Console.WriteLine(myClass).
Here the first argument is placed in rcx and the Console.WriteLine function is called. For why it is rcx, see the introduction to fastcall in the reference links.
rbp + 0x50 = 0x1fc8fde110
Jumping to that address in memory, we can see that the selected 8 bytes are a pointer to the object. Let's continue and jump to 0x1fcad88390.
Here we can see the true face of the MyClass instance. The selected 8 bytes are the pointer to the MethodTable.
After it come the pointer to StringMember and the contents of IntMember, respectively.
The ObjHeader in front of the object is zero here, which is normal; Microsoft's comment in the code says "This is often zero".
Here is what StringMember points to: the MethodTable pointer, the string length, and the string contents, respectively.
Here is the MethodTable of MyClass; its m_BaseSize is 32.
If you are interested, you can compare the memory against the members of MethodTable. I will not go further here.
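One plausible way to account for the value 32 is sketched below. It assumes a 64-bit target and assumes that the base size counts the header slot, the MethodTable pointer, and the fields, rounded up to pointer size; neither assumption is confirmed by this article, and AlignUp and SketchBaseSize are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size up to the next multiple of alignment.
inline std::size_t AlignUp(std::size_t size, std::size_t alignment) {
    assert(alignment != 0);
    return (size + alignment - 1) / alignment * alignment;
}

// Assumed accounting: [header slot][MethodTable pointer][fields], padded to pointer size.
inline std::size_t SketchBaseSize(std::size_t pointerSize, std::size_t fieldBytes) {
    const std::size_t headerSlot = pointerSize;           // ObjHeader in front of the object
    const std::size_t methodTablePointer = pointerSize;   // m_pMethTab
    return AlignUp(headerSlot + methodTablePointer + fieldBytes, pointerSize);
}
```

Under these assumptions, an 8-byte string reference plus a 4-byte int gives 8 + 8 + 12 = 28 bytes, which rounds up to 32.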
Let's see how the struct is handled.
You can see that the value is simply copied into stack space (rbp is the base address of the current stack frame).
Now let's look at how Console.WriteLine handles the struct. The handling here is quite interesting.
Because the value needs to be boxed, a box has to be created first; the box is stored at rbp+30h.
The value of MyStruct is then copied into the box. The 8 in rax+8 means the value is copied to just after the MethodTable pointer.
After the copy, the box is passed to Console.WriteLine, the same as with MyClass.
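The boxing steps above can be sketched in C++ as a copy of the value to the address right after a type pointer. SketchStruct, SketchBox, BoxValue and Unbox are hypothetical names, the fields of SketchStruct are assumptions (the real MyStruct is only shown in a screenshot), and the header slot in front of the object is left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical value type standing in for MyStruct; its fields are assumptions.
struct SketchStruct {
    std::int32_t a;
    std::int32_t b;
};

// A box laid out as [type pointer][copy of the value].
struct SketchBox {
    std::vector<std::uint8_t> bytes;

    std::uint8_t* ObjectStart() { return bytes.data(); }
    // The value lives one pointer past the start, i.e. "rax+8" on a 64-bit target.
    std::uint8_t* Value() { return bytes.data() + sizeof(void*); }
};

// Allocates the box, writes the type pointer, then copies the value behind it.
inline SketchBox BoxValue(const void* typePointer, const SketchStruct& value) {
    SketchBox box;
    box.bytes.resize(sizeof(void*) + sizeof(SketchStruct));
    std::memcpy(box.ObjectStart(), &typePointer, sizeof(void*));
    std::memcpy(box.Value(), &value, sizeof(SketchStruct));
    return box;
}

// Copies the value back out of the box.
inline SketchStruct Unbox(SketchBox& box) {
    assert(box.bytes.size() == sizeof(void*) + sizeof(SketchStruct));
    SketchStruct out;
    std::memcpy(&out, box.Value(), sizeof(SketchStruct));
    return out;
}
```

Because the box holds its own copy, later changes to the original struct on the stack do not show up in the boxed value.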
Also attached is a picture of actually viewing ComponentSize.
Easter egg
Having read the definition of object in .NET, let's look at the definition of object in Python.
Source code: https://github.com/python/cpython/blob/master/Include/object.h
#define PyObject_HEAD PyObject ob_base; // each subclass needs to put this at the very beginning

typedef struct _object {
#ifdef Py_TRACE_REFS
    struct _object *_ob_next; // the next object in the heap
    struct _object *_ob_prev; // the previous object in the heap
#endif
    Py_ssize_t ob_refcnt; // reference count
    struct _typeobject *ob_type; // points to type information
} PyObject;
The definitions differ, but they serve a similar purpose.
References
http://stackoverflow.com/questions/20033353/clr-implementation-of-virtual-method-calls-via-pointer-to-base-class
http://stackoverflow.com/questions/9808982/clr-implementation-of-virtual-method-calls-to-interface-members
http://stackoverflow.com/questions/1589669/overhead-of-a-net-array
https://en.wikipedia.org/wiki/X86_calling_conventions
https://github.com/dotnet/coreclr/blob/master/src/vm/object.inl
https://github.com/dotnet/coreclr/blob/master/src/vm/object.h
https://github.com/dotnet/coreclr/blob/master/src/vm/object.cpp
https://github.com/dotnet/coreclr/blob/master/src/vm/syncblk.h
https://github.com/dotnet/coreclr/blob/master/src/vm/syncblk.cpp
https://github.com/dotnet/coreclr/blob/master/src/vm/methodtable.inl
https://github.com/dotnet/coreclr/blob/master/src/vm/methodtable.h
https://github.com/dotnet/coreclr/blob/master/src/vm/methodtable.cpp
https://github.com/dotnet/coreclr/blob/master/src/vm/class.h
https://github.com/dotnet/coreclr/blob/master/src/inc/daccess.h
https://github.com/dotnet/coreclr/blob/master/src/debug/daccess/dacfn.cpp
Final words
I have only just started reading the CoreCLR code, so if anything here is wrong, please point it out in the comments.
Next, I will focus on reading and introducing the following:
- Creation and destruction of objects
- How object inheritance works (MethodTable)
- How object synchronization works (ObjHeader, SyncBlock)
- How the GC works
- DAccess
Stay tuned.