Analysis tools: UltraEdit is used to inspect the metadata, and the SOS debugging extension together with the VS2005 Memory, Registers, and Disassembly windows is used to inspect the memory layout.
Steps: Open any .NET DLL or EXE file in UltraEdit (UE) and analyze the static metadata.
Then start debugging and analyze how the CLR executes, based on the SOS and debugger output. (The MethodTable layout changed greatly between 1.1 and 2.0, and I could not make sense of it. Is there any relevant documentation?)
The internal implementation of .NET statements can be seen through the IL and the metadata, while the implementation of the IL itself can only be seen through the disassembly.
Metadata and metadata table
In general, the prefix "meta" is attached to something created to describe something else. Metadata is data that describes other data. In .NET, the "other data" means .NET objects, the objects they reference, and the relationships between them.
Meta-metadata is data that describes metadata: it describes how the metadata is laid out. With it, we can locate each record in a metadata table.
The logical structure of metadata resembles the tables of a database, hence the name metadata table. A metadata table has columns and rows. Each row is unique and has a unique RID (row index); the RID is like the primary key of a database table. A database has a schema that gives, for example, the type and size of each column, and metadata is similar: its schema is the "meta-metadata", which contains the record size of each table and the size and offset of each column.
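To illustrate how the record size from the meta-metadata lets a row be located by its RID, here is a minimal sketch. The function name row_offset and the sample numbers are hypothetical; in a real file the row size has to be derived from the column definitions of each table.

```python
def row_offset(table_start: int, row_size: int, rid: int) -> int:
    """Return the byte offset of row `rid` in a table of fixed-size rows.

    RIDs are 1-based; a RID of 0 means "no row".
    """
    if rid < 1:
        raise ValueError("RID must be >= 1")
    return table_start + (rid - 1) * row_size


# Hypothetical table that starts at offset 0x100 and has 14-byte rows.
third_row = row_offset(0x100, 14, 3)
print(hex(third_row))
```

Because every row of a table has the same size, the table start, the record size, and the RID are enough to reach any record.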
Relatively fixed part of PE (32-bit System)
IMAGE_DOS_HEADER occupies 40h bytes, from 00h to 3Fh. Its last four bytes, starting at 0x3C, are e_lfanew, the file pointer to the PE signature. The PE signature is the four bytes 50 45 00 00 ("PE\0\0", the DWORD 0x00004550). The real-mode stub program lies between e_lfanew and the PE signature, from 40h to 7Fh. The four-byte PE signature occupies 80h-83h.
It is followed by IMAGE_FILE_HEADER (the COFF header) at 84h-97h, which occupies 14h (20) bytes.
The PE header (optional header) is at 98h-177h and occupies E0h (224) bytes.
In the PE header, the IMAGE_DATA_DIRECTORY array begins 96 bytes (60h) past the start of the 32-bit PE header at 98h, that is, at F8h. It has 16 entries of eight bytes each.
It is followed by the section headers, each of which occupies 28h (40) bytes. There are three in total (the count is given by NumberOfSections in the COFF header): 178h-1A0h is .text,
1A0h-1C8h is .rsrc,
1C8h-1F0h is .reloc.
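The fixed offsets above can be recomputed from the file itself rather than read off by hand. The following is a minimal sketch that assumes a 32-bit (PE32) image; pe_layout is a hypothetical helper name, and only the standard struct module is used.

```python
import struct


def pe_layout(data: bytes) -> dict:
    """Return the file offsets of the main headers of a PE32 image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: DWORD at 0x3C, the file pointer to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4  # IMAGE_FILE_HEADER, 14h bytes
    (number_of_sections,) = struct.unpack_from("<H", data, coff + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", data, coff + 16)
    optional = coff + 0x14  # PE (optional) header
    return {
        "pe_signature": e_lfanew,
        "coff_header": coff,
        "optional_header": optional,
        "data_directory": optional + 0x60,  # 96 bytes in, PE32 only
        "section_headers": optional + size_of_optional_header,
        "number_of_sections": number_of_sections,
    }
```

With e_lfanew equal to 80h and a SizeOfOptionalHeader of E0h, as in the file analyzed here, this yields 84h, 98h, F8h, and 178h for the COFF header, the PE header, the data directory, and the first section header.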
File pointer and RVA
A file pointer is the offset of an item within the file, before the file is loaded into memory. The RVA and VA are, respectively, the relative address (offset from the load base) and the address of the item after the file is loaded into memory.
RID and Token
The RID is a row index into a metadata table and can only be used for references between metadata tables. For example, a row in the TypeDef table holds the RID, in the Field table, of the first field of that type.
The column type code (defined by the meta-metadata) determines, implicitly, which table a column refers to. A RID can therefore also identify its table, but it cannot be used from outside the metadata.
A token is the RID combined with the table index: the table index occupies the high byte and the RID the low three bytes. A token states explicitly which table its RID belongs to, so it can be referenced from outside the metadata. In IL, variables, constants, and so on are referenced through tokens; such references are compiled into tokens. There are 24 tables with a token type; the other 20 tables have none, so they cannot be referenced externally and can only be referenced from within the metadata.
There is one special token type, 0x70000000. The RID part of such a token is not a real RID, because the item lives in the #US stream, which holds user-defined data and no metadata; the metadata never references #US. The RID part is instead the offset of the user string within #US and is used to locate the user-defined string. For an ordinary token, you first take the RID and then locate the record using the column types and other schema information of the metadata table.
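The split between table index and RID can be shown in a few lines. This is a minimal sketch; split_token and make_token are hypothetical helper names, and the sample tokens are illustrative values rather than ones taken from a particular file.

```python
from typing import Tuple

STRING_TOKEN_TYPE = 0x70  # high byte of a user-string token (0x70000000)


def split_token(token: int) -> Tuple[int, int]:
    """Split a 4-byte token into (high byte, low three bytes)."""
    return token >> 24, token & 0x00FFFFFF


def make_token(table: int, rid: int) -> int:
    """Combine a table index and a RID into a token."""
    return (table << 24) | (rid & 0x00FFFFFF)


# 0x06 is the MethodDef table, so this is row 1 of MethodDef.
table, rid = split_token(0x06000001)

# For a string token the low three bytes are an offset into #US, not a RID.
kind, us_offset = split_token(0x70000001)
is_user_string = kind == STRING_TOKEN_TYPE
```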
Stream and heap
The heaps here have nothing to do with the heap data structure; they are two different concepts. Heaps come in three kinds: string, GUID, and blob.
Metadata provides six named streams:
#Strings: content referenced by the metadata, such as class names, method names, and variable names.
#US: the user-string heap, a blob heap (not a string heap) holding string constants, which the ldstr instruction addresses directly. The metadata cannot reference it directly, but IL and external APIs can. For example, in string s = "dd"; the constant dd is stored in #US and the name s is stored in #Strings.
#Blob: binary objects referenced by the metadata; it cannot contain user-defined objects.
#GUID: unique identifiers, such as the Mvid column of the Module metadata table.
#~ and #- (an image file can contain only one of them): the metadata stream, which includes the metadata table header and the metadata tables; it is the most complex of the streams. It references the #Strings, #Blob, and #GUID streams.
Flag and Signature
Flags encode visibility, layout, type semantics, implementation, string formatting, and similar attributes. From the flags we can determine these properties of a type.
Signature: (#Blob stream)
Locate Token
Take the User-Defined string as an example (# US ):
The user string is defined in the #US stream. Its position in the file is: the metadata header address (stream offsets are relative to it, because the stream headers are part of the metadata header) + the offset of the #US stream + the offset of the string within the #US stream (the RID part of the token).
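The calculation above can be sketched as follows. The function names are hypothetical, and the decoding of the entry itself (a compressed length prefix, UTF-16LE characters, and one trailing byte) goes beyond the text and follows the blob format in ECMA-335, so treat it as an assumption to check against a real file.

```python
def read_compressed_uint(data: bytes, pos: int):
    """Decode an ECMA-335 compressed unsigned integer; return (value, next_pos)."""
    first = data[pos]
    if (first & 0x80) == 0:  # 1-byte form
        return first, pos + 1
    if (first & 0xC0) == 0x80:  # 2-byte form
        return ((first & 0x3F) << 8) | data[pos + 1], pos + 2
    if (first & 0xE0) == 0xC0:  # 4-byte form
        value = ((first & 0x1F) << 24) | (data[pos + 1] << 16) \
            | (data[pos + 2] << 8) | data[pos + 3]
        return value, pos + 4
    raise ValueError("invalid compressed integer")


def read_user_string(data: bytes, metadata_header: int,
                     us_stream_offset: int, token: int) -> str:
    """Read the user string that a 0x70xxxxxx token refers to."""
    if token >> 24 != 0x70:
        raise ValueError("not a user-string token")
    # header address + offset of #US + offset of the string within #US
    pos = metadata_header + us_stream_offset + (token & 0x00FFFFFF)
    length, pos = read_compressed_uint(data, pos)
    if length == 0:
        return ""
    # The length counts the UTF-16LE bytes plus one trailing flag byte.
    return data[pos:pos + length - 1].decode("utf-16-le")
```

For the string s = "dd" example, the entry in #US would be a length byte of 5, the four bytes of "dd" in UTF-16LE, and the trailing byte.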
Locate the metadata table
First, find the metadata header (metaDataHeader). The method is the same as for Cor2.0Header, except that the RVA used is that of the MetaDataDir entry in Cor2.0Header. The metadata header consists of STORAGESIGNATURE, STORAGEHEADER, and the stream headers. Each stream header is a STORAGESTREAM; there is one for each of the #Strings, #Blob, #GUID, #US (user string), and #~ streams.
The metadata tables are stored in the #~ stream. Locate the metadata header, then add the offset of the #~ stream to reach the tables. For how to locate the header, see Locate Cor2.0Header.
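Walking the metadata header to find each stream can be sketched like this. The field layout used here (a "BSJB" signature, a length-prefixed version string, then a stream count followed by offset/size/name records) is taken from ECMA-335 rather than from the text above, and read_stream_headers is a hypothetical name.

```python
import struct


def read_stream_headers(data: bytes, header: int) -> dict:
    """Map each stream name to (file offset, size), given the metadata header offset."""
    (signature,) = struct.unpack_from("<I", data, header)
    if signature != 0x424A5342:  # "BSJB"
        raise ValueError("metadata signature not found")
    # STORAGESIGNATURE: signature, major, minor, reserved, version length, version
    (version_length,) = struct.unpack_from("<I", data, header + 12)
    pos = header + 16 + ((version_length + 3) & ~3)
    # STORAGEHEADER: flags and padding (2 bytes), number of streams (2 bytes)
    _flags, stream_count = struct.unpack_from("<HH", data, pos)
    pos += 4
    streams = {}
    for _ in range(stream_count):
        # STORAGESTREAM: offset from the metadata header, size, then the name
        offset, size = struct.unpack_from("<II", data, pos)
        pos += 8
        end = data.index(b"\x00", pos)
        name = data[pos:end].decode("ascii")
        # The name and its terminator are padded to a 4-byte boundary.
        pos += (end - pos + 1 + 3) & ~3
        streams[name] = (header + offset, size)
    return streams
```

The metadata tables then start at the offset returned for "#~" (or "#-"), and the same dictionary gives the #US offset needed to resolve string tokens.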
Section headers
There are three section headers, each of which occupies 28h (40) bytes (the count is given by NumberOfSections in the COFF header): 178h-1A0h is .text,
1A0h-1C8h is .rsrc,
1C8h-1F0h is .reloc.
Locate PE Header (data directory table)
The data directory table is located in the PE header. The IMAGE_DATA_DIRECTORY array begins 96 bytes (60h) past the start of the 32-bit PE header at 98h, that is, at F8h. It has 16 entries of eight bytes each.
typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;  /* RVA of the data */
    DWORD Size;            /* size of the data in bytes */
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
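Reading the sixteen entries is a simple loop over this structure. The sketch below assumes a PE32 image and takes the offset of the PE header as input; read_data_directories and CLI_HEADER_INDEX are hypothetical names.

```python
import struct

CLI_HEADER_INDEX = 14  # the 15th entry, counting from zero


def read_data_directories(data: bytes, pe_header: int, count: int = 16):
    """Return a list of (VirtualAddress, Size) pairs from the data directory."""
    first_entry = pe_header + 0x60  # 96 bytes into a 32-bit PE header
    return [
        struct.unpack_from("<II", data, first_entry + 8 * index)
        for index in range(count)
    ]
```

For the layout described here, pe_header would be 98h, and the entry at CLI_HEADER_INDEX gives the RVA and size of the CLI header.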
Among the three sections above, the one that holds the data a data directory entry points to can be determined from the entry's VirtualAddress (an RVA). The RVA falls into the section for which: section VirtualAddress <= rva < section VirtualAddress + SizeOfRawData.
Of the sixteen entries, the 15th, the CLI header entry, is the one most closely related to the CLI. The method above determines which section the CLI header lies in.
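The range test above translates directly into code. This is a minimal sketch; Section, read_sections, and find_section are hypothetical names, and the field offsets within each 40-byte section header follow the standard IMAGE_SECTION_HEADER layout.

```python
import struct
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    name: str
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int


def read_sections(data: bytes, first_header: int, count: int) -> List[Section]:
    """Read `count` 40-byte section headers starting at `first_header`."""
    sections = []
    for index in range(count):
        base = first_header + 40 * index
        name = data[base:base + 8].rstrip(b"\x00").decode("ascii", "replace")
        # VirtualAddress, SizeOfRawData, PointerToRawData start at offset 12.
        va, raw_size, raw_ptr = struct.unpack_from("<III", data, base + 12)
        sections.append(Section(name, va, raw_size, raw_ptr))
    return sections


def find_section(sections: List[Section], rva: int) -> Optional[Section]:
    """Return the section whose range contains `rva`, or None."""
    for section in sections:
        start = section.virtual_address
        if start <= rva < start + section.size_of_raw_data:
            return section
    return None
```

Passing the VirtualAddress of the CLI header entry to find_section identifies the section in which the CLI header lies.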
Locate Cor2.0Header
Once you know the section, you can locate the Cor2.0Header. Here the section works out to be .text, and the address can differ from one compilation to the next. From the CLI header RVA and the VirtualAddress and PointerToRawData of that section, the file address of the COR20 header is: (rva - VirtualAddress) + PointerToRawData
The Cor header contains seven tables; the last one is reserved (in 2.0 I could only find the first six, and 2.0 still does not use this table). The important one is the MetaDataDir table, whose address can be calculated in the same way.
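The address formula can be written as a one-line function. The name rva_to_file_offset is hypothetical, and the sample values are assumptions chosen for illustration, not values read from a particular file.

```python
def rva_to_file_offset(rva: int, section_virtual_address: int,
                       section_pointer_to_raw_data: int) -> int:
    """Convert an RVA inside a section to an offset in the file."""
    return (rva - section_virtual_address) + section_pointer_to_raw_data


# Assumed sample values: a CLI header RVA of 0x2008 in a .text section
# with VirtualAddress 0x2000 and PointerToRawData 0x200.
cli_header_offset = rva_to_file_offset(0x2008, 0x2000, 0x200)
print(hex(cli_header_offset))
```

The same function applies to the RVA stored in the MetaDataDir table, which gives the file address of the metadata header.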