IL Bytecode Analysis

Source: Internet
Author: User

-1-

Code written in C# and VB.NET is eventually compiled into IL (intermediate language) and packaged as an assembly. As a result, code written in VB.NET can be modified in C# and then used in COBOL. This is why it is necessary to understand IL.

Once you are familiar with IL, nothing will stand in the way of understanding .NET technology, because all .NET languages compile into IL. IL is language-neutral: IL was designed first, and C#, VB.NET, and the other languages followed.

We will present IL through short, incisive programs. We also assume that the reader is familiar with at least one .NET language.

a.il

.method void Vijay()
{
}

We have now written a very short IL program, which obviously cannot work yet, and saved it as a.il. How do we compile it into an executable program? There is no need to worry: Microsoft provides a program called ilasm whose only task is to create an executable file from an IL file.

Before running this command, make sure that your PATH variable includes the bin subdirectory of the Framework SDK. If it does not, enter the following command:

set path=C:\progra~1\microsoft.net\frameworksdk\bin;%path%

Now we use the following command:

C:\il>ilasm /nologo /quiet a.il

This will generate the following error:

Source file is ANSI

Error: No entry point declared for executable

***** FAILURE *****

From now on, we will not display the first and last lines of the output generated by ilasm. We will also remove the blank lines between non-blank lines.

In IL, a line can begin with a period (.). Such a line is a directive: it asks the assembler to do something, such as create a function or a class. Any statement that starts with a period is an assembler directive.

The .method directive creates a function (or method) named Vijay that returns void, that is, it does not return any value. For lack of a naming convention, the name Vijay was chosen arbitrarily.

The assembler cannot make sense of this program and displays the message "No entry point". This error is generated because an IL file can contain any number of functions, and the assembler cannot tell which one should be executed first.

In IL, the first function to be executed is called the entry point function. In C#, this function is Main. Syntactically, a function name is followed by a pair of parentheses (), and the start and end of the function's code are marked by braces.

a.il

.method void Vijay()
{
    .entrypoint
}

C:\il>ilasm /nologo /quiet a.il

Source file is ANSI

Creating PE File

Emitting members:

Global methods: 1;

Writing PE File

Operation completed successfully

No errors are generated now. The .entrypoint directive indicates that program execution must start with this function. We have to use this directive here even though the program contains only one function. Running the dir command at the DOS prompt shows that three files have been created. a.exe is the executable file; you can run it to see the program's output.

C:\il>

Exception occurred: System.BadImageFormatException: Exception from HRESULT: 0x8007000B. Failed to load C:\il\a.exe.

When we try to run the program, we seem to be out of luck: it produces the runtime error shown above. One possible reason is that the function is incomplete. Every function must contain an instruction that marks the end of the function, and in our haste we did not notice this.

a.il

.method void Vijay()
{
    .entrypoint
    ret
}

The instruction that ends a function is called ret. Every function must end with this instruction.

Output

Exception occurred: System.BadImageFormatException: Exception from HRESULT: 0x8007000B. Failed to load C:\il\a.exe.

When we run this program, we get the same error again. Where is our problem this time?

a.il

.assembly mukhi {}
.method void Vijay()
{
    .entrypoint
    ret
}

The error is that we forgot an essential directive: .assembly followed by a name. We have added it in the code above, using the name mukhi followed by an empty pair of curly braces. The .assembly directive gives the program its name. An assembly is also called a unit of deployment.

The code above is the smallest program that compiles without any errors, although it does nothing useful when executed. It has no function named main; it only has a function Vijay containing the .entrypoint directive. This program compiles and runs without any errors.

In .NET, the concept of an assembly is extremely important and should be thoroughly understood. We will return to this directive in the second half of this chapter.

a.il

.assembly mukhi {}
.method void Vijay()
{
    .entrypoint
    ret
}
.method void vijay1()
{
    .entrypoint
    ret
}

Error

***** FAILURE *****

The cause of this error is that the program has two functions, Vijay and vijay1, and each of them contains the .entrypoint directive. As mentioned above, this directive specifies the function that will be executed first.

A function with the .entrypoint directive is therefore similar to the main function in C. When C# code is converted to IL, the code in the Main function becomes an IL function that contains the .entrypoint directive. Likewise, if the first function executed in a COBOL program is called ABC, the generated IL will have the .entrypoint directive inserted in that function.

In conventional programming languages, the function that starts execution must have a specific name, such as main. In IL, only the .entrypoint directive is required. Because a program can start from only one point, only one function in the IL code may contain the .entrypoint directive.

This error is very difficult to debug because no error number or description is generated.
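
The one-entry-point rule can be sketched as a small check. The Python below is a hypothetical model, not part of ilasm: it assumes each method is represented simply as a name mapped to the lines of its body. Unlike this sketch, ilasm itself prints no description for the two-entry-point case.

```python
from typing import Dict, List


def find_entry_point(methods: Dict[str, List[str]]) -> str:
    """Return the one method whose body contains the .entrypoint directive.

    `methods` maps a method name to the lines of its body (a hypothetical
    representation chosen for this sketch).
    """
    entry_points = [
        name
        for name, body in methods.items()
        if any(line.strip() == ".entrypoint" for line in body)
    ]
    if not entry_points:
        # Corresponds to ilasm's "No entry point declared for executable".
        raise ValueError("No entry point declared for executable")
    if len(entry_points) > 1:
        # A program can start from only one point.
        raise ValueError("more than one .entrypoint: " + ", ".join(entry_points))
    return entry_points[0]


# Two functions that both claim to be the entry point, as in the failing a.il.
try:
    find_entry_point({"Vijay": [".entrypoint", "ret"],
                      "vijay1": [".entrypoint", "ret"]})
except ValueError as err:
    print(err)  # more than one .entrypoint: Vijay, vijay1
```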

a.il

.assembly mukhi {}
.method void Vijay()
{
    ret
    .entrypoint
}

The .entrypoint directive can be placed as either the first or the last line of the function. It only has to appear inside the function body to declare that function as the first one to be executed. Directives are not assembler instructions, so .entrypoint can even be placed after the ret instruction. As a reminder, ret marks the end of the function's code.

a.il

.assembly mukhi {}
.method void Vijay()
{
    .entrypoint
    call void System.Console::WriteLine()
    ret
}

A function may be written in C# or VB.NET, but the mechanism for calling it in IL is the same, as follows:

We use the assembler instruction call. After the call instruction, the following details are given, in this order:

The return type of the function (void)
The namespace (System)
The class (Console)
The function name (WriteLine())

The function is called, but no visible output is generated, because we do not pass any parameter to the WriteLine function.
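
The order of those details can be made concrete with a tiny string builder. format_call is a hypothetical Python helper written only for this illustration: it assembles the text of a call line in the Namespace.Class::Method form that ilasm expects, and it does not check that the method exists.

```python
from typing import Sequence


def format_call(return_type: str, namespace: str, cls: str, method: str,
                param_types: Sequence[str] = ()) -> str:
    """Build an IL `call` line from its parts, in the order listed above."""
    params = ", ".join(param_types)
    return f"call {return_type} {namespace}.{cls}::{method}({params})"


print(format_call("void", "System", "Console", "WriteLine"))
# call void System.Console::WriteLine()
print(format_call("void", "System", "Console", "WriteLine", ["class System.String"]))
# call void System.Console::WriteLine(class System.String)
```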

a.il

.assembly mukhi {}
.method void Vijay()
{
    .entrypoint
    call void System.Console::WriteLine(class System.String)
    ret
}

The code above has a twist. When a function is called in IL, the data types of the parameters being passed must be specified in addition to its return type. Here we declare that WriteLine expects a parameter of type System.String, but since no string is actually passed to it, a runtime error is generated.

This is a significant difference between IL and other programming languages when it comes to calling a function. In IL, we must specify everything we know about the function being called, including its return type and the data types of its parameters. This allows the assembler to check that the code is syntactically valid, and the runtime to perform the proper checks during execution.

Now we will see how to pass parameters to a function.

a.il

.assembly mukhi {}
.method void Vijay()
{
    .entrypoint
    ldstr "hell"
    call void System.Console::WriteLine(class System.String)
    ret
}

Output

hell

The assembler instruction ldstr places a string on the stack. The name ldstr is an abbreviation of "load a string onto the stack". The stack is a memory area used to pass parameters to functions, and every function receives its parameters from the stack. That is why instructions like ldstr are indispensable.
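
The push-then-call pattern can be modeled in a few lines of Python. This is a hypothetical sketch: EvalStack is not a real .NET type, and a Python list's append stands in for Console.WriteLine. The underflow check mirrors the earlier example, where WriteLine was declared to take a System.String but nothing had been pushed.

```python
from typing import Callable, List


class EvalStack:
    """A toy model of the IL evaluation stack (not how the CLR is implemented)."""

    def __init__(self) -> None:
        self.items: List[object] = []

    def ldstr(self, value: str) -> None:
        # ldstr: push a string onto the stack.
        self.items.append(value)

    def call(self, func: Callable[..., None], arg_count: int) -> None:
        # call: pop the declared number of arguments and pass them to the function.
        if len(self.items) < arg_count:
            raise RuntimeError("not enough arguments on the stack for this call")
        split = len(self.items) - arg_count
        args = self.items[split:]
        del self.items[split:]
        func(*args)


printed: List[str] = []
stack = EvalStack()
stack.ldstr("hell")            # ldstr "hell"
stack.call(printed.append, 1)  # stands in for call void System.Console::WriteLine(class System.String)
print(printed)                 # ['hell']
print(stack.items)             # []
```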

a.il

.assembly mukhi {}
.method public hidebysig static void Vijay() il managed
{
    .entrypoint
    ldstr "hell"
    call void System.Console::WriteLine(class System.String)
    ret
}

Output

hell

We have added some attributes to the method Vijay. Let us explain them one by one.

public: an accessibility attribute that determines who can access the method. public means that the method can be accessed by any other part of the program.

hidebysig: a class can be derived from another class. With hidebysig, a method in a derived class hides a parent-class method only when both the name and the signature match, rather than hiding every parent method with the same name. In this example, it means that if a function Vijay with the same signature appeared in a base class, it would be hidden in the derived class. The runtime itself ignores this attribute; it is used by compilers and tools.

static: a method can be static or non-static. Static methods belong to a class rather than to an instance, so there is only one copy of a static method no matter how many objects are created. There is no restriction on where static methods can be created, but the function containing the .entrypoint directive must be static. Static methods must have an associated body (source code) and are referenced through the type name rather than through an instance.

il managed: because of its complex nature, we will defer explaining this attribute. Its purpose will be explained once the time is right.

The attributes above do not change the output of the function. Later, you will understand why we needed to explain them.
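
The difference between hiding by name and signature and hiding by name alone can be sketched as a lookup rule. This Python model is hypothetical: it represents a method as a (name, parameter types) pair and only illustrates how a compiler might interpret the flag, since the runtime ignores hidebysig.

```python
from typing import FrozenSet, List, Tuple

# A method is modeled as (name, parameter types); a hypothetical representation.
Method = Tuple[str, Tuple[str, ...]]


def visible_methods(base: List[Method], derived: List[Method],
                    hidebysig: bool) -> FrozenSet[Method]:
    """Methods visible through the derived class under one hiding rule."""
    derived_set = set(derived)
    derived_names = {name for name, _ in derived}
    visible = set(derived)
    for method in base:
        name, _ = method
        if hidebysig:
            hidden = method in derived_set  # name AND signature must match
        else:
            hidden = name in derived_names  # the name alone is enough
        if not hidden:
            visible.add(method)
    return frozenset(visible)


base = [("Vijay", ()), ("Vijay", ("string",))]
derived = [("Vijay", ())]
print(sorted(visible_methods(base, derived, hidebysig=True)))
# [('Vijay', ()), ('Vijay', ('string',))]
print(sorted(visible_methods(base, derived, hidebysig=False)))
# [('Vijay', ())]
```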

Whenever we write a program in C#, we first specify the keyword class followed by the class name, and then enclose the source code in a pair of curly braces, as follows:

a.cs

class zzz
{
}

Let us now introduce the IL directive called .class:

a.il

.assembly mukhi {}
.class zzz
{
    .method public hidebysig static void Vijay() il managed
    {
        .entrypoint
        ldstr "hell"
        call void System.Console::WriteLine(class System.String)
        ret
    }
}

Note that the assembler output now reports "Class 1 Methods: 1;".

Output

hell

The name of the class follows the .class directive. A class is optional in IL. Let us enhance this class by adding some class attributes.

a.il

.assembly mukhi {}
.class private auto ansi zzz
{
    .method public hidebysig static void Vijay() il managed
    {
        .entrypoint
        ldstr "hell"
        call void System.Console::WriteLine(class System.String)
        ret
    }
}

Output

hell

We have added three attributes to the .class directive.

private: the class is visible only within the assembly that defines it; it is not exported to other assemblies.
auto: the layout of the class in memory is determined solely by the runtime, not by our program.
ansi: source code is usually divided into two main categories, managed code and unmanaged code.
Code written in a language such as C is called unmanaged (or untrusted) code. We need an attribute that governs interoperability between unmanaged and managed code, for example when strings are passed between managed and unmanaged code.

When we cross the boundary of managed code into the realm of unmanaged code, a string, which is an array of 2-byte Unicode characters, is converted into an ANSI string, which is an array of 1-byte ANSI characters, and vice versa. The ansi modifier specifies that strings are marshaled as ANSI strings in these conversions between managed and unmanaged code.
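
The size difference between the two string representations is easy to see in Python. This only illustrates the byte layouts, not the marshaling code itself, and it assumes the ANSI code page is Windows-1252; the real code page depends on the system's settings.

```python
text = "hell"

# Managed side: 2 bytes per character (UTF-16, little-endian, no byte-order mark).
unicode_bytes = text.encode("utf-16-le")
# Unmanaged ANSI side: 1 byte per character; Windows code page 1252 is assumed.
ansi_bytes = text.encode("cp1252")

print(len(unicode_bytes), len(ansi_bytes))  # 8 4
print(unicode_bytes)                        # b'h\x00e\x00l\x00l\x00'
print(ansi_bytes)                           # b'hell'

# The conversion works in both directions.
assert ansi_bytes.decode("cp1252") == unicode_bytes.decode("utf-16-le") == text
```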

a.il

.assembly mukhi {}
.class private auto ansi zzz extends System.Object
{
    .method public hidebysig static void Vijay() il managed
    {
        .entrypoint
        ldstr "hell"
        call void System.Console::WriteLine(class System.String)
        ret
    }
}

Output

hell

Class zzz is derived from System.Object. In .NET, to give all types a uniform definition, every type ultimately derives from System.Object, so all objects share the common base class Object. In IL, classes derive from other classes just as they do in C++, C#, and Java.

a.il

.module aa.exe
.subsystem 3
.corflags 1

.assembly extern mscorlib
{
    .originator = (03 68 91 16 D3 A4 AE 33)
    .hash = (52 44 F8 C9 55 1F 54 3F 97 D7 AB AD E2 DF 1D E0
             F2 9D 4F BC)
    .ver 1:0:2204:21
}

.assembly a as ""
{
    .hash algorithm 0x00008004
    .ver 0:0:0:0
}

.class private auto ansi zzz extends System.Object
{
    .method public hidebysig static void Vijay() il managed
    {
        .entrypoint
        ldstr "hell"
        call void System.Console::WriteLine(class System.String)
        ret
    }

    .method public hidebysig specialname rtspecialname instance void .ctor() il managed
    {
        .maxstack 8
        ldstr "hell1"
        call void System.Console::WriteLine(class System.String)
        ldarg.0
        call instance void [mscorlib]System.Object::.ctor()
        ret
    }
}
Output

hell

You may well wonder why we have written such an ugly program. Be patient: the fog will lift and everything will begin to make sense. We will explain the newly introduced functions and attributes one by one.

.ctor: we have introduced a new function, .ctor, which calls WriteLine to display hell1. It is never called, which is why hell1 does not appear in the output. .ctor is the name given to constructors.

rtspecialname: this attribute tells the runtime that the function name is special and must be treated in a special way.

specialname: this attribute signals to compilers and tools that the function is special. The runtime may choose to ignore it.

instance: marks an instance method. Such a method is associated with an object, unlike a static method, which is associated with a class.

As we proceed, the reason for choosing this particular name for the function will become clearer.

ldarg.0: an assembler instruction that loads argument 0, which for an instance method is the this pointer, onto the evaluation stack. ldarg.0 will be explained in detail later.
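
How arguments are numbered for static and instance methods can be modeled as follows. The helper names are hypothetical, and where real IL's ldarg.0 pushes the value onto the evaluation stack, this sketch simply returns it.

```python
from typing import List, Sequence


def static_arguments(explicit_args: Sequence[object]) -> List[object]:
    # A static method sees only the arguments passed explicitly.
    return list(explicit_args)


def instance_arguments(this: object, explicit_args: Sequence[object]) -> List[object]:
    # An instance method receives the object itself as argument 0.
    return [this, *explicit_args]


def ldarg(arguments: List[object], index: int) -> object:
    # ldarg.<index>: fetch the argument at that position.
    return arguments[index]


obj = object()
ctor_args = instance_arguments(obj, [])  # .ctor() declares no parameters
print(ldarg(ctor_args, 0) is obj)        # True: ldarg.0 yields `this`
print(static_arguments(["hell"]))        # ['hell']
```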

mscorlib: in the program above, our .ctor calls the .ctor function of the base class System.Object. Generally, a function name is prefixed with the name of the library that contains its code, and the library name is placed in square brackets. In this example the prefix is optional, because mscorlib.dll is the default library and contains most of the classes required by .NET.

.maxstack: this directive specifies the maximum number of elements that can be on the evaluation stack at any point while the method executes.
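
For a method without branches, the value needed for .maxstack can be worked out by tracking how many items each instruction pops and pushes. The table below is a hypothetical Python sketch that covers only the instructions of the .ctor above; the listing declares 8, which is comfortably more than the body actually needs.

```python
from typing import Dict, List, Tuple

# (pops, pushes) for each instruction used in the .ctor above.
STACK_EFFECT: Dict[str, Tuple[int, int]] = {
    'ldstr "hell1"': (0, 1),
    "call void System.Console::WriteLine(class System.String)": (1, 0),
    "ldarg.0": (0, 1),
    "call instance void [mscorlib]System.Object::.ctor()": (1, 0),  # pops `this`
    "ret": (0, 0),
}


def max_stack_depth(body: List[str]) -> int:
    """Walk a straight-line method body and return the deepest stack reached."""
    depth = 0
    deepest = 0
    for instruction in body:
        pops, pushes = STACK_EFFECT[instruction]
        if pops > depth:
            raise ValueError(f"stack underflow at {instruction!r}")
        depth = depth - pops + pushes
        deepest = max(deepest, depth)
    return deepest


ctor_body = [
    'ldstr "hell1"',
    "call void System.Console::WriteLine(class System.String)",
    "ldarg.0",
    "call instance void [mscorlib]System.Object::.ctor()",
    "ret",
]
print(max_stack_depth(ctor_body))  # 1
```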

.module: every IL file must be part of a logical entity, or a combination of such entities, called a module. The .module directive names the module to which the file belongs. Here the module is named aa.exe, but the executable file that ilasm writes is still named a.exe, after the source file.

.subsystem: this directive specifies the operating-system subsystem on which the executable runs. It is another way of specifying the type of executable. Some values and their corresponding subsystems are:

2 - a Windows GUI subsystem.

3 - a Windows character (console) subsystem.

5 - older systems such as OS/2.

.corflags: this directive sets flags in the runtime (CLI) header of the file. The value 1 indicates that the image contains only IL code, that is, an executable file created from IL.
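
The .subsystem 3 and .corflags 1 values in the program above can be decoded with small lookup tables. This Python sketch takes its constants from the PE/COFF specification (IMAGE_SUBSYSTEM_* values) and from corhdr.h of the released runtime (COMIMAGE_FLAGS_* bits); it assumes the beta build used here defines them the same way, and it lists only a few of the defined flags.

```python
from typing import List

# IMAGE_SUBSYSTEM_* values from the PE/COFF specification.
SUBSYSTEMS = {
    2: "Windows GUI",
    3: "Windows character (console)",
    5: "OS/2 character",
}

# A few COMIMAGE_FLAGS_* bits as defined for the released runtime.
COR_FLAGS = {
    0x00000001: "ILONLY",
    0x00000002: "32BITREQUIRED",
    0x00000008: "STRONGNAMESIGNED",
}


def decode_corflags(value: int) -> List[str]:
    """List the names of the known flag bits set in a .corflags value."""
    return [name for bit, name in sorted(COR_FLAGS.items()) if value & bit]


print(SUBSYSTEMS[3])       # Windows character (console)
print(decode_corflags(1))  # ['ILONLY']
```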

.assembly: earlier we used a simple directive named .assembly. Let us now study it in more depth.

Everything we create is part of an entity called the manifest. The .assembly directive marks the start of the manifest. At the lowest level, a module is the smallest entity in the manifest. The .assembly directive specifies the assembly to which the module belongs, and a module can contain only a single .assembly directive.

For EXE files this directive is required, but for modules in a DLL it is optional. This is because the directive is what creates an assembly, and an assembly is a basic requirement of .NET. The .assembly directive contains other directives.

.hash: hashing is a common technique in computing, and many hash methods or algorithms are in use. The .hash algorithm directive specifies which algorithm is used for hash calculation, while a .hash = (...) entry inside .assembly extern holds a hash value for the referenced assembly.
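
The value 0x00008004 used in .hash algorithm is the Windows CryptoAPI identifier for SHA-1 (CALG_SHA1), which produces a 20-byte digest; this matches the 20 bytes listed in the .hash entry of .assembly extern mscorlib. The Python sketch below only shows the algorithm and its output size. Which bytes the assembler actually hashes is not covered here, so the input is a placeholder.

```python
import hashlib

# 0x00008004 is the CryptoAPI algorithm ID CALG_SHA1.
HASH_ALGORITHMS = {0x00008004: "sha1"}


def compute_hash(data: bytes, algorithm_id: int = 0x00008004) -> bytes:
    """Hash `data` with the algorithm named by a .hash algorithm value."""
    return hashlib.new(HASH_ALGORITHMS[algorithm_id], data).digest()


digest = compute_hash(b"placeholder module contents")  # not real assembly bytes
print(len(digest))                           # 20
print(" ".join(f"{b:02X}" for b in digest))  # same layout as the .hash = (...) bytes
```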

.ver: the .ver directive contains four numbers separated by colons. In order, they represent:

Major version number
Minor version number
Build number
Revision number
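
Reading a .ver value is a matter of splitting on the colons. The Python below is a hypothetical helper; storing the four numbers as a named tuple also gives field-by-field comparison, which is one natural way to order versions.

```python
from typing import NamedTuple


class AssemblyVersion(NamedTuple):
    major: int
    minor: int
    build: int
    revision: int


def parse_ver(text: str) -> AssemblyVersion:
    """Parse the operand of a .ver directive, e.g. '1:0:2204:21'."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) != 4:
        raise ValueError("a .ver directive needs exactly four numbers")
    return AssemblyVersion(*parts)


mscorlib = parse_ver("1:0:2204:21")
print(mscorlib.build)                   # 2204
print(mscorlib > parse_ver("1:0:0:0"))  # True
```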

extern: when we need to refer to another assembly, we use the extern directive. The code of the .NET core classes is located in mscorlib.dll. When our program needs to refer to other DLLs besides this one, the extern directive comes in handy.

originator: this is the last directive we study before moving on to explain the nature and meaning of the program above. This directive identifies the creator of the DLL: it contains 8 bytes derived from the public key of the DLL's owner. It is clearly a hash value rather than the key itself.
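
In the released .NET Framework, the 8-byte value that plays this role is written .publickeytoken and is computed as the last 8 bytes of the SHA-1 hash of the full public key, in reverse byte order. Whether the beta's .originator was derived in exactly the same way is an assumption. The Python sketch uses placeholder bytes rather than a real key.

```python
import hashlib


def public_key_token(public_key: bytes) -> bytes:
    """Last 8 bytes of SHA-1(public_key), in reverse order."""
    digest = hashlib.sha1(public_key).digest()  # 20 bytes
    return digest[-8:][::-1]


placeholder_key = bytes(range(160))  # stands in for a real public key blob
token = public_key_token(placeholder_key)
print(len(token))                           # 8
print(" ".join(f"{b:02X}" for b in token))  # printed like the .originator bytes
```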

Let us review, step by step and from a different angle, what we have done.

(A) We start with the simplest program we can write. The program is called a.cs and contains the following code:

a.cs

class zzz
{
    public static void Main()
    {
        System.Console.WriteLine("hi");
    }
}

(B) We then run the C# compiler with the following command:
>csc a.cs

This creates an executable file named a.exe.

(C) We run a program named ildasm, provided by Microsoft, on the executable:
>ildasm /out=a.txt a.exe

This creates a text file with the following content:

a.txt

// Microsoft .NET Framework IL Disassembler. Version 1.0.2204.21
// Copyright Microsoft Corp. 1998-2000

// VTableFixup Directory:
// No data.
.subsystem 0x00000003
.corflags 0x00000001
.assembly extern mscorlib
{
    .originator = (03 68 91 16 D3 A4 AE 33)
    .hash = (52 44 F8 C9 55 1F 54 3F 97 D7 AB AD E2 DF 1D E0
             F2 9D 4F BC)
    .ver 1:0:2204:21
}
.assembly a as ""
{
    .hash algorithm 0x00008004
    .ver 0:0:0:0
}
.module aa.exe
// MVID: {89cfad60-f5bd-11d4-a55a-96b5c7d61e7b}
.class private auto ansi zzz
       extends System.Object
{
    .method public hidebysig static void Vijay() il managed
    {
        .entrypoint
        // Code size 11 (0xb)
        .maxstack 8
        IL_0000: ldstr "hell"
        IL_0005: call void System.Console::WriteLine(class System.String)
        IL_000a: ret
    } // end of method zzz::Vijay

    .method public hidebysig specialname rtspecialname
            instance void .ctor() il managed
    {
        // Code size 17 (0x11)
        .maxstack 8
        IL_0000: ldstr "hell"
        IL_0005: call void System.Console::WriteLine(class System.String)
        IL_000a: ldarg.0
        IL_000b: call instance void [mscorlib]System.Object::.ctor()
        IL_0010: ret
    } // end of method zzz::.ctor

} // end of class zzz

***********************

When you read the file above, you will find that all of its content has already been explained. We started with a simple C# program and compiled it into an executable file. With a conventional compiler, the program would have been converted into the machine language or assembly language of the computer or microprocessor on which it runs. Once the executable was created, we used ildasm to disassemble it, and the output was saved to a new file, a.txt. If this file is renamed a.il, we can re-create the executable by running ilasm on it.

Let us look at the smallest VB.NET program. We name it one.vb; its source code is as follows:

one.vb

Public Module modmain
    Sub main()
        System.Console.WriteLine("hell")
    End Sub
End Module

After writing the code above, we run the Visual Basic .NET compiler, vbc, as follows:

>vbc one.vb

This produces the file one.exe.

We run ildasm as follows:

>ildasm /out=a.txt one.exe

This generates the following file, a.txt:

a.txt

// Microsoft .NET Framework IL Disassembler. Version 1.0.2204.21
// Copyright Microsoft Corp. 1998-2000

// VTableFixup Directory:
// No data.
.subsystem 0x00000003
.corflags 0x00000001
.assembly extern mscorlib
{
    .originator = (03 68 91 16 D3 A4 AE 33)
    .hash = (52 44 F8 C9 55 1F 54 3F 97 D7 AB AD E2 DF 1D E0
             F2 9D 4F BC)
    .ver 1:0:2204:21
}
.assembly extern Microsoft.VisualBasic
{
    .originator = (03 68 91 16 D3 A4 AE 33)
    .hash = (5B 42 1F D2 5E 1A 42 83 F5 90 B2 29 9F 35 A1 BE
             E5 5E 0D E4)
    .ver 1:0:0:0
}
.assembly one as "one"
{
    .hash algorithm 0x00008004
    .ver 1:0:0:0
}
.module one.exe
// MVID: {1ed19820-f5c2-11d4-a55a-96b5c7d61e7b}
.class public auto ansi modmain
       extends [mscorlib]System.Object
{
    .custom instance void [Microsoft.VisualBasic]Microsoft.VisualBasic.Globals/Globals$StandardModuleAttribute::.ctor() = (01 00 00)
    .method public static void main() il managed
    {
        // Code size 11 (0xb)
        .maxstack 1
        .locals init (class System.Object[] V_0)
        IL_0000: ldstr "hell"
        IL_0005: call void [mscorlib]System.Console::WriteLine(class System.String)
        IL_000a: ret
    } // end of method modmain::main

} // end of class modmain

.class private auto ansi _vbproject
       extends [mscorlib]System.Object
{
    .custom instance void [Microsoft.VisualBasic]Microsoft.VisualBasic.Globals/Globals$StandardModuleAttribute::.ctor() = (01 00 00)
    .method public static void _main(class System.String[] _s) il managed
    {
        .entrypoint
        // Code size 6 (0x6)
        .maxstack 8
        IL_0000: call void modmain::main()
        IL_0005: ret
    } // end of method _vbproject::_main
} // end of class _vbproject
***********************
You may be surprised to see that the output generated by two different compilers is almost the same. I have shown you this example to demonstrate language independence. In the end, all source code is converted to IL code, and whether we use VB.NET or C#, the same WriteLine function is called.

The differences between programming languages have therefore become a superficial matter, and the endless debate about which language is best is meaningless. IL thus allows programmers to freely use the language of their choice.

Let us unveil the secrets of the code above.

Every VB.NET program must be contained in a module, which we have called modmain. Every module in Visual Basic ends with the keyword End, so the source ends with End Module. This is where VB syntax differs from C#: C# has no notion of a module.

In VB.NET, a function is written as a subroutine (Sub). We need a subroutine to mark where program execution starts, and this subroutine is called main.

The VB.NET code refers not only to mscorlib.dll but also to the Microsoft.VisualBasic assembly.

In IL, a class named _vbproject is created, because VB does not require the programmer to write a class.

The function called _main is where execution begins, because it contains the .entrypoint directive. Its name is preceded by an underscore; these names are chosen by the VB compiler when it generates the IL code.

This function is passed an array of strings as its parameter. The class also uses the .custom directive, which deals with metadata.

Next comes the complete prototype of a function (the attribute's .ctor), followed by a series of optional bytes. These bytes are part of the metadata specification.

The module modmain is converted into a class with the same name. As before, this class has the same .custom directive, along with a function main. This function uses a directive named .locals to create a variable on the stack that can be used only within this method. The variable exists only while the method executes; when the method stops running, it "disappears".

Fields are also stored in memory, but allocating memory for them takes longer. The keyword init indicates that these variables should be initialized to their default values when they are created. The default value depends on the type of the variable: numeric values are always initialized to zero, and references to null. The keyword init is followed by the data types and names of the variables.
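
The effect of init can be sketched in Python as filling each declared local with its type's default. The type-to-default table is a hypothetical, partial mapping covering a few IL types, with None standing in for null; V_0 is the local from the VB listing.

```python
from typing import Dict

# Partial, hypothetical mapping from IL local types to their zero-initialized values.
DEFAULTS: Dict[str, object] = {
    "int32": 0,
    "float64": 0.0,
    "bool": False,
    "class System.Object[]": None,  # references start out as null
}


def init_locals(declared: Dict[str, str]) -> Dict[str, object]:
    """Model `.locals init (...)`: every local starts at its type's default."""
    return {name: DEFAULTS[il_type] for name, il_type in declared.items()}


print(init_locals({"V_0": "class System.Object[]"}))  # {'V_0': None}
```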
