MS.NET CLR Extended PE Structure Analysis (2)

Flier Lu <flier_lu@sina.com.cn>

Note: This series of articles was first published on the .NET board of the Shuimu Tsinghua BBS (smth.org).
When reprinting, please keep the above information and contact the author.

Metadata Part

Chapter 1 Metadata Overview

1.1 What is Metadata

Metadata, literally "meta data", can be understood as the "type of a type";
put bluntly, it is data that describes types. From RTTI, supported at the language level
("modern" programming languages such as C++ and Delphi generally provide adequate support,
some more "dated" languages provide simulated support in other forms such as extension libraries,
and newer languages such as Java and C# (in essence the CLR) provide powerful support),
to the later binary-level COM IDL and type libraries (a type library is the compiled binary form of IDL),
to today's metadata, the same design idea runs throughout. Different requirements simply led to
different designs and implementations, each with its own advantages and disadvantages. As languages have developed,
however, more and more requirements center on flexibility, so the trend is toward ever stronger language support for metadata.
Take the simplest example: an IDE's ability to dynamically list the methods and properties of the current object
(Microsoft calls it IntelliSense, Borland calls it CodeInsight) is an application of type information.
Implementing this in VC used to be troublesome, because a dedicated symbol library had to be generated. VB was somewhat
better off, since the information could be obtained dynamically through COM interfaces such as IDispatch, ITypeInfo
and ITypeLib, but programming against them was painful.
In the CLR, the class library provides direct support, with full control available through System.Reflection.
It even supports dynamic creation, which puts it a level above COM type libraries.
Users can learn everything about a program's interface: which modules, which classes,
which methods, and so on. This gives developers enormous room for creativity. For example, DUnit (an
xUnit-style unit-testing platform for .NET) makes heavy use of reflection; we will come back to it when we discuss usage.
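
As a rough illustration of the member listing described above, the following minimal sketch uses System.Reflection to collect the public method and property names that a type declares, much as an IDE completion list would. The class MemberLister and its helper are hypothetical names chosen for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class MemberLister
{
    // Returns the distinct names of the public instance methods and
    // properties declared directly on the given type.
    public static List<string> PublicMemberNames(Type type)
    {
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance |
                             BindingFlags.DeclaredOnly;
        var names = new List<string>();

        foreach (MethodInfo method in type.GetMethods(flags))
        {
            // Skip property accessors such as get_Length.
            if (method.IsSpecialName) continue;
            string entry = method.Name + "()";
            if (!names.Contains(entry)) names.Add(entry);
        }
        foreach (PropertyInfo property in type.GetProperties(flags))
        {
            if (!names.Contains(property.Name)) names.Add(property.Name);
        }
        return names;
    }

    public static void Main()
    {
        // StringBuilder is used only as a convenient, well-known sample type.
        foreach (string name in PublicMemberNames(typeof(StringBuilder)))
            Console.WriteLine(name);
    }
}
```

Overloaded methods are collapsed to a single entry here, because a completion list normally shows each name once.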

1.2 The role of Metadata in the CLR

In the CLR architecture, metadata can be regarded as a core object of operation: the great majority of functions
need to consult it. From static IL code (whose binary encoding uses metadata tokens directly)
to the dynamic JIT compiler (which uses metadata to locate IL code and its relationships); from simple code loading and execution
(the class loader locates the code entry point through metadata, then compiles and executes it) to complex interoperation between languages
(for example, when VB.NET inherits from a C# class, it actually inherits directly from the class described by metadata in the CLR),
metadata shows up almost everywhere.

Because the main purpose of this article is to introduce the underlying structure, I will not dwell on the benefits of metadata here.
We will keep running into it in later articles anyway, so you can discover its advantages for yourself :)

1.3 How to access and use Metadata

After all that advertising, you must be wondering how to actually use metadata. Let me explain step by step.
In the CLR, metadata can be used at three levels.
The easiest is to go directly through the classes that the class library provides in the System.Reflection namespace,
for example:

using System;
using System.Reflection;

public class Simple
{
    public static void Main()
    {
        // Take the first module of the currently executing assembly.
        Module mod = Assembly.GetExecutingAssembly().GetModules()[0];
        Console.WriteLine("Module Name is " + mod.Name);
        Console.WriteLine("Module FullyQualifiedName is " +
            mod.FullyQualifiedName);
        Console.WriteLine("Module ScopeName is " + mod.ScopeName);
    }
}

This kind of access is the easiest to use and powerful enough to meet most needs.
In particular, the System.Reflection.Emit namespace adds support for dynamic generation and modification.
(It is so powerful that I cannot think of anything to improve :) Writing a .NET virus would depend on it, ho ho.)
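
To give a feel for the dynamic generation mentioned above, here is a minimal sketch that emits a two-argument add method at run time. It assumes a runtime where System.Reflection.Emit.DynamicMethod is available, which is not the case for the earliest .NET releases; the class name EmitSketch and the method name "Add" are hypothetical.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical example class; assumes DynamicMethod is available.
public static class EmitSketch
{
    // Builds, at run time, a method equivalent to:
    //   static int Add(int a, int b) { return a + b; }
    public static Func<int, int, int> BuildAdd()
    {
        var method = new DynamicMethod(
            "Add", typeof(int), new Type[] { typeof(int), typeof(int) });

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // push the first argument
        il.Emit(OpCodes.Ldarg_1); // push the second argument
        il.Emit(OpCodes.Add);     // add the two values on the stack
        il.Emit(OpCodes.Ret);     // return the result

        return (Func<int, int, int>)method.CreateDelegate(
            typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Func<int, int, int> add = BuildAdd();
        Console.WriteLine(add(2, 3)); // prints 5
    }
}
```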
However, this approach requires the CLR environment and is limited by what the class library offers (later we will see a lot of
information that is not exposed at the reflection level :). So MS provides tool developers with another,
lower-level library: the Metadata Unmanaged API. This library, exposed through a series of COM interfaces,
gives powerful direct access to metadata; System.Reflection is presumably implemented on top of it.
Interested readers can refer to the "Metadata Unmanaged API.doc" document in the
FrameworkSDK\Tool Developers Guide\docs directory, which contains detailed instructions.
As its name indicates, it must be used from unmanaged code, such as traditional VC or Delphi.
It is fair to say that 99% of the work can be done with these two libraries, but there are always people like me
who like to get to the bottom of a technology and want to expose the underlying structure hidden beneath the pretty veil, hehe.
Hence the third level: reverse-engineering analysis at the binary level.
The good news is that MS, in order to standardize the CLI (a subset of the CLR), has published a large amount of documentation,
so at last I do not have to reach for a sledgehammer like SoftICE. The "Partition II Metadata.doc" document explains
the binary format of metadata in considerable detail, and GNOME's Mono project has also done a lot of work here,
so analyzing metadata at the binary level is not that difficult.
In the following articles I will peel away, layer by layer, the way metadata is organized in a PE file,
so that you can understand what this mysterious core of the CLR really is, what it hides, what we can
do with it, why it was designed this way, and so on ...

1.4 The organization of Metadata in PE

Enough digression. Back to the main topic: how metadata is organized in a PE file.

Note: In this chapter I only introduce the general structure of metadata; the next chapter is devoted to
a detailed analysis at the binary level. If you only want a general understanding of the structure, you can skip
the next chapter. Later articles will be organized the same way: first some structure and principles, followed by
some practical methods of data analysis.

Last time we mentioned that the CLR header has a field pointing to the metadata block.
That block actually starts with just a metadata header structure holding information about the metadata,
while the actual metadata is stored in several different heaps, or streams.
Here I consistently call them "streams", although many documents call them "heaps";
you can think of each one as a binary stream whose data is organized as a heap.
The five most common streams in metadata are #Strings, #Blob, #GUID,
#US (User String) and #~ ("#" is the prefix of a stream name).
The #Strings stream is a string heap: every string used inside the metadata, such as class or method
names, is stored there in UTF-8. User strings, that is, string constants in the program,
are stored in the #US (User String) heap in Unicode (UTF-16). Note that
the #US and #Strings streams differ in binary structure; we will cover this in detail in the later analysis.
The #GUID stream is an array holding the GUIDs used by the program, such as the MVID of a module in an assembly.
The #Blob stream is general-purpose storage: apart from GUIDs and strings, basically every
other odd piece of data lives there, hehe, such as public keys and constant values.
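
The heaps described above can be inspected without any hand-written binary parsing. The sketch below is not the byte-level analysis this series builds toward; it assumes the System.Reflection.Metadata library is available (it ships with modern .NET and is a separate package on older frameworks). It reads a module's name from the string heap and its MVID from the GUID heap, and prints the size of each heap. HeapSketch and ReadModuleInfo are hypothetical names.

```csharp
using System;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Reflection.PortableExecutable;

// Hypothetical helper; assumes System.Reflection.Metadata is referenced.
public static class HeapSketch
{
    // Reads the module name (string heap) and the MVID (GUID heap)
    // from the assembly file at 'path', printing the heap sizes on the way.
    public static void ReadModuleInfo(string path, out string name, out Guid mvid)
    {
        using (FileStream file = File.OpenRead(path))
        using (var pe = new PEReader(file))
        {
            MetadataReader md = pe.GetMetadataReader();
            ModuleDefinition module = md.GetModuleDefinition();
            name = md.GetString(module.Name); // handle into the string heap
            mvid = md.GetGuid(module.Mvid);   // handle into the GUID heap

            Console.WriteLine("string heap size:      " + md.GetHeapSize(HeapIndex.String));
            Console.WriteLine("user string heap size: " + md.GetHeapSize(HeapIndex.UserString));
            Console.WriteLine("blob heap size:        " + md.GetHeapSize(HeapIndex.Blob));
            Console.WriteLine("GUID heap size:        " + md.GetHeapSize(HeapIndex.Guid));
        }
    }

    public static void Main()
    {
        // The core library is used only because its file is always present.
        string name;
        Guid mvid;
        ReadModuleInfo(typeof(object).Assembly.Location, out name, out mvid);
        Console.WriteLine("module " + name + ", MVID " + mvid);
    }
}
```

The MVID read this way should match what reflection reports through Module.ModuleVersionId for the same module.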
The most important is the #~ stream, where the actual metadata information is stored. The #~ stream is organized
as a number of tables, each of which stores one aspect of the metadata;
for example, the MethodDef table stores information about all methods. Each table consists of a number of rows,
and each row has n columns, each column holding one kind of information. For example, each row of the MethodDef table
holds a method's RVA, type flags, name, signature, and so on. Tables refer to one another
through various indexes, so the overall organization closely resembles a relational database.
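
To make the table, row and column picture concrete, the following sketch prints a few rows of the MethodDef table of an assembly, showing the RVA, flags and name columns. It again assumes the System.Reflection.Metadata library rather than a hand-written parser; TableSketch and DumpMethodDefs are hypothetical names.

```csharp
using System;
using System.IO;
using System.Reflection.Metadata;
using System.Reflection.Metadata.Ecma335;
using System.Reflection.PortableExecutable;

// Hypothetical helper; assumes System.Reflection.Metadata is referenced.
public static class TableSketch
{
    // Prints up to 'limit' rows of the MethodDef table and returns the
    // total number of rows that the table reports.
    public static int DumpMethodDefs(string path, int limit)
    {
        using (FileStream file = File.OpenRead(path))
        using (var pe = new PEReader(file))
        {
            MetadataReader md = pe.GetMetadataReader();
            int printed = 0;
            foreach (MethodDefinitionHandle handle in md.MethodDefinitions)
            {
                if (printed++ >= limit) break;
                MethodDefinition row = md.GetMethodDefinition(handle);
                Console.WriteLine("row {0}: RVA=0x{1:X8} flags=[{2}] name={3}",
                    MetadataTokens.GetRowNumber(handle), // 1-based row index
                    row.RelativeVirtualAddress,          // RVA column
                    row.Attributes,                      // flags column
                    md.GetString(row.Name));             // name column (string heap)
            }
            return md.GetTableRowCount(TableIndex.MethodDef);
        }
    }

    public static void Main()
    {
        int rows = DumpMethodDefs(typeof(object).Assembly.Location, 5);
        Console.WriteLine("MethodDef rows in total: " + rows);
    }
}
```

The name column is not stored inline: it is an index into the string heap, which is one instance of the cross-references that make the layout resemble a relational database.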
More specifically, which tables are present is indicated by a 64-bit bitmap named Valid.
Every kind of table has a number; for example, the MethodDef table is number 6, so when it is present, bit 6 of the bitmap, (1 << 6), is set to 1.
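
The bitmap test can be sketched in a few lines. The example assumes the ECMA-335 convention that bit n of the 64-bit value corresponds to table number n; the sample bitmap value and the names ValidBitmapSketch, HasTable and PresentTables are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; assumes bit n of the bitmap marks table number n.
public static class ValidBitmapSketch
{
    // True when the table with the given number is marked as present.
    public static bool HasTable(ulong valid, int tableNumber)
    {
        return (valid & (1UL << tableNumber)) != 0;
    }

    // Lists the numbers of all tables marked as present.
    public static List<int> PresentTables(ulong valid)
    {
        var result = new List<int>();
        for (int n = 0; n < 64; n++)
        {
            if (HasTable(valid, n)) result.Add(n);
        }
        return result;
    }

    public static void Main()
    {
        // Made-up sample value, not taken from a real file.
        ulong valid = 0x0000000900001447UL;
        foreach (int n in PresentTables(valid))
            Console.Write("0x{0:X2} ", n);
        Console.WriteLine();
        // prints: 0x00 0x01 0x02 0x06 0x0A 0x0C 0x20 0x23
    }
}
```

In the sample value, bit 6 is set, so a MethodDef table (0x06) would be present, while bit 5 is clear.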
Each row of each table can therefore be identified by a single token. A token is a 32-bit
unsigned integer whose highest byte is the table number and whose low three bytes are the row index within that table.
For example, 0x06000003 refers to row 3 of table 0x06 (MethodDef), for example Myapp::add.
Tokens are used constantly in the CLR; for example, IL code uses tokens to call functions and to refer to variables.
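
The token layout is easy to check with a little bit arithmetic. The sketch below splits a token into table number and row index, then, assuming a runtime that exposes MemberInfo.MetadataToken and Module.ResolveMethod (these came after the earliest .NET releases), reads the token of a real method and resolves it back. TokenSketch and its members are hypothetical names.

```csharp
using System;
using System.Reflection;

// Hypothetical helper for illustration.
public static class TokenSketch
{
    // Highest byte of the token: the table number.
    public static int TableOf(int token)
    {
        return (int)((uint)token >> 24);
    }

    // Low three bytes of the token: the row index within the table.
    public static int RowOf(int token)
    {
        return token & 0x00FFFFFF;
    }

    // Sample method whose token is inspected below.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    public static void Main()
    {
        int sample = 0x06000003;
        Console.WriteLine("table 0x{0:X2}, row {1}", TableOf(sample), RowOf(sample));
        // prints: table 0x06, row 3

        // The row number of a real method depends on how the assembly was compiled.
        MethodInfo add = typeof(TokenSketch).GetMethod("Add");
        int token = add.MetadataToken;
        Console.WriteLine("Add has token 0x{0:X8}", token);

        // Resolving the token yields the same method again.
        MethodBase resolved = typeof(TokenSketch).Module.ResolveMethod(token);
        Console.WriteLine("resolved back to " + resolved.Name);
    }
}
```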
A similar concept is the coded index, which we will discuss next time together with the binary implementation.
