[CLR via C#] CSC compiles source code into a managed module


This section demonstrates the process of compiling source code files. Source code files can be written in any language that targets the CLR; a corresponding compiler checks the syntax and analyzes the source code. Whichever compiler is used, the result is a managed module. A managed module is a standard 32-bit Microsoft Windows portable executable (PE32) file, or a standard 64-bit Windows portable executable (PE32+) file, and both require the CLR in order to execute. Incidentally, managed assemblies always take advantage of Windows Data Execution Prevention (DEP) and Address Space Layout Randomization (ASLR); these two features are designed to improve the security of the whole system.
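As a minimal sketch of this step (file and output names are hypothetical), a single C# source file is turned into a managed module/assembly with a command line such as:

rem Compile Program.cs into an EXE assembly containing IL and metadata
csc.exe /out:Program.exe /t:exe Program.cs

The /out and /t (target) switches are standard C# compiler options; the resulting Program.exe is the PE32(+) file described above.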


Components of a managed module

   PE32 or PE32+ header: The standard Windows PE file header, similar to the Common Object File Format (COFF) header. If the header uses the PE32 format, the file can run on 32-bit or 64-bit versions of Windows. If the header uses the PE32+ format, the file can run only on 64-bit versions of Windows. The header also identifies the file type (GUI, CUI, or DLL) and contains a timestamp indicating when the file was built. For modules that contain only IL code, most of the information in the PE32(+) header is ignored. For modules that contain native CPU code, the header contains information about that native code.

  CLR header: Contains the information that makes this module a managed module (interpreted by the CLR and some utilities). The header includes the version of the CLR required, some flags, the MethodDef metadata token of the managed module's entry-point method (the Main method), and the locations/sizes of the module's metadata, resources, strong name, some flags, and other less important data items.

  Metadata: Every managed module contains metadata tables. There are two main kinds of tables: tables that describe the types and members defined in the source code, and tables that describe the types and members referenced by the source code.

  IL (Intermediate Language) code: The code the compiler produces when it compiles the source code. At run time, the CLR compiles the IL into native CPU instructions.

Native code compilers generate code targeted to a specific CPU architecture (such as x86, x64, or IA64). In contrast, every CLR-oriented compiler generates IL (Intermediate Language) code. IL code is sometimes called managed code because the CLR manages its execution.

In addition to generating IL, every CLR-oriented compiler emits complete metadata into each managed module. In short, metadata is a set of data tables. Some tables describe what is defined in the module, such as the types and their members. Other tables describe what the managed module references, such as imported types and their members. Metadata is a superset of older technologies such as COM's Type Libraries and Interface Definition Language (IDL) files, but CLR metadata is far more complete. Also, unlike Type Libraries and IDL, metadata is always associated with the file that contains the IL code. In fact, the metadata is always embedded in the same EXE/DLL as the code, which makes the two inseparable. Because the compiler produces the metadata and the code at the same time, binds them together, and embeds them in the resulting managed module, the metadata and the IL code it describes can never get out of sync. Metadata has many uses; here are just a few (a small reflection sketch follows the list):

* During compilation, metadata removes the need for native C/C++ header and library files, because all of the information about the referenced types/members is contained in the file that holds the IL that implements them. Compilers can read metadata directly from managed modules.

* Microsoft Visual Studio uses metadata to help you write code. Its IntelliSense feature parses metadata to tell you what methods, properties, events, and fields a type offers, and, for a method, what parameters the method expects.

* The CLR's code verification process uses metadata to ensure that the code performs only "type-safe" operations. (Verification is discussed shortly.)

* Metadata allows an object's fields to be serialized into a memory block, sent to another machine, and then deserialized, re-creating the object's state on the remote machine.

* Metadata allows the garbage collector to track the lifetime of objects. For any object, the garbage collector can determine the object's type and, from the metadata, know which fields within that object refer to other objects.
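To make the role of metadata concrete, here is a minimal sketch that uses .NET reflection, which reads an assembly's metadata tables at run time, to list the public static members of a type (the choice of System.Console is purely illustrative):

using System;
using System.Reflection;

public static class MetadataDemo {
    public static void Main() {
        // Reflection reads the metadata of the assembly that defines System.Console.
        Type t = typeof(Console);
        Console.WriteLine("Members of {0}:", t.FullName);
        foreach (MemberInfo member in t.GetMembers(BindingFlags.Public | BindingFlags.Static)) {
            Console.WriteLine("  {0}: {1}", member.MemberType, member.Name);
        }
    }
}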

Merging managed modules into an assembly

The CLR does not actually work with modules; it works with assemblies. An assembly is an abstract concept that can be hard for beginners to grasp at first. First, an assembly is a logical grouping of one or more modules or resource files. Second, an assembly is the smallest unit of reuse, security, and versioning. Depending on the choices you make with your compilers or tools, you can produce a single-file assembly or a multi-file assembly. In the CLR's world, an assembly is what we would call a "component".

The following helps you understand assemblies: imagine a number of managed modules and resource (or data) files being processed by a tool. The tool produces a single PE32(+) file that represents the logical grouping of those files. This PE32(+) file contains a block of data called the manifest. The manifest is simply another set of metadata tables. These tables describe the files that make up the assembly, the publicly exported types implemented by the files in the assembly, and the resource or data files associated with the assembly.

By default, compilers actually turn the emitted managed module into an assembly. That is, the C# compiler emits a managed module that contains a manifest, and the manifest indicates that the assembly consists of just that one file.
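As a hedged sketch of the tooling involved (file names are hypothetical), the C# compiler can also emit a raw module without a manifest and then fold it into a multi-file assembly with its /t:module and /addmodule switches:

rem Compile a helper source file into a module that has no manifest of its own
csc.exe /t:module RarelyUsedTypes.cs

rem Compile the main source file and add the module to the resulting assembly
csc.exe /out:MyApp.exe /t:exe /addmodule:RarelyUsedTypes.netmodule App.cs

The resulting MyApp.exe contains the manifest, and that manifest records RarelyUsedTypes.netmodule as part of the assembly.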

Loading the Common Language Runtime

Every assembly you build can be either an executable application or a DLL (containing a set of types used by an executable). Of course, it is the CLR that manages the execution of the code in these assemblies. This means that the .NET Framework must be installed on the target machine.

An assembly produced by the C# compiler contains either a PE32 header or a PE32+ header. In addition, the compiler records the required CPU architecture in the header (the default, if not explicitly specified, is anycpu). Microsoft ships the SDK command-line utilities DumpBin.exe and CorFlags.exe, which you can use to examine the information embedded in the managed modules the compiler produces.
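For example (assembly and source file names are hypothetical), CorFlags.exe can be run against a compiled assembly to display its PE/CLR header flags, and the recorded CPU architecture can be controlled at compile time with the C# compiler's /platform switch:

rem Inspect the PE and CLR header flags of a compiled assembly
CorFlags.exe MyApp.exe

rem Record the CPU architecture explicitly at compile time (anycpu is the default)
csc.exe /platform:anycpu /out:MyApp.exe App.cs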

Executing your assembly's code

To execute a method, its IL must first be converted into native CPU instructions. This is the responsibility of the CLR's JIT (just-in-time) compiler. The following describes what happens the first time a method is called.

Just before the Main method executes, the CLR detects all of the types referenced by Main's code. This causes the CLR to allocate an internal data structure used to manage access to each referenced type. Suppose Main references the Console type; the CLR allocates one such internal structure for it. In this internal data structure there is an entry for each method defined by the Console type, and each entry holds an address at which the method's implementation can be found. When the structure is initialized, the CLR sets every entry to an internal, undocumented function contained inside the CLR itself, which I will call JITCompiler.

When the JITCompiler function is called, it knows which method is being called and which type defines that method. JITCompiler then searches the metadata of the defining assembly for the called method's IL. Next, JITCompiler verifies the IL and compiles it into native CPU instructions, which are saved in a dynamically allocated block of memory. JITCompiler then goes back to the internal data structure the CLR created for the type, finds the entry for the called method, and replaces the reference to JITCompiler with the address of the memory block containing the freshly compiled native instructions. Finally, JITCompiler jumps to the code in that memory block; this code is the actual implementation of the WriteLine method (the overload that takes a single String parameter). When that code returns, control comes back to the code in Main, which continues executing as usual. When Main calls WriteLine a second time, the WriteLine code has already been verified and compiled, so the code in the memory block executes directly and the JITCompiler function is skipped entirely. After the WriteLine method returns, control again comes back to Main.
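As a concrete sketch of the scenario described above, consider a Main method that calls Console.WriteLine twice:

using System;

public sealed class Program {
    public static void Main() {
        Console.WriteLine("Hello");    // First call: routed through JITCompiler; the IL is verified and compiled.
        Console.WriteLine("Goodbye");  // Second call: jumps straight to the already-compiled native code.
    }
}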

 

A method incurs a performance hit only the first time it is called. All subsequent calls run at full speed as native code, with no re-verification of the IL and no recompilation. The JIT compiler stores the native CPU instructions in dynamic memory, so the compiled code is discarded when the application terminates. If you run the application again later, or start two instances of it at the same time (in two different operating system processes), the JIT compiler has to compile the IL into native instructions again. For most applications, the performance hit of JIT compilation is not significant: most applications call the same methods over and over, so each method affects performance only once while the application runs. Moreover, the time spent inside a method is likely to be much greater than the time spent calling into the method. Note also that the CLR's JIT compiler optimizes the native code, much like the back end of an unmanaged C++ compiler. As with C++, producing optimized code takes more time, but the optimized code performs much better than unoptimized code would.

Two C# compiler switches affect code optimization: /optimize and /debug. The following describes how these switches affect the quality of the IL code generated by the C# compiler and the quality of the native code generated by the JIT compiler.
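As a hedged sketch (source file name hypothetical), these switches are passed to csc.exe on the command line; the precise combinations and their effects on IL and native code quality are what the original text goes on to tabulate:

rem Unoptimized IL with full debugging information (a typical "Debug" configuration)
csc.exe /optimize- /debug:full Program.cs

rem Optimized IL; a PDB is still produced for stack traces (a typical "Release" configuration)
csc.exe /optimize+ /debug:pdbonly Program.cs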

Although it may be hard to believe, many people (myself included) think that managed applications can actually outperform unmanaged applications, and there are many reasons to believe this. For example, when the JIT compiler compiles IL code at run time, it knows more about the execution environment than an unmanaged compiler possibly could. Here are some of the ways managed code can outperform unmanaged code:

1. The JIT compiler can detect that the application is running on an Intel Pentium 4 CPU and generate native code that takes advantage of any special instructions the Pentium 4 supports. In contrast, unmanaged applications are usually compiled for the lowest-common-denominator CPU and avoid special instructions that could improve performance.

2. The JIT compiler can determine that a particular test will always fail on the machine it is running on. For example, suppose a method contains the following code (a complete sketch appears below this list):

if (numberOfCPUs > 1) {
...
}

If the host machine has only one CPU, the JIT compiler emits no CPU instructions for the code above. In this case, the native code is tuned for the host machine: it is smaller and executes faster.

3. While the application runs, the CLR could profile the code's execution and recompile the IL into native code, rearranging the recompiled code to reduce incorrect branch predictions based on the observed execution patterns. Current versions of the CLR do not do this, but future versions might.

Beyond these reasons, there are others that lead us to believe future managed code will execute more efficiently than today's unmanaged code. The performance of most managed applications is already quite good, and it should only improve over time.
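Here is the complete sketch referred to in item 2 above. Environment.ProcessorCount is the standard way to query the CPU count; the body of the branch is illustrative only:

using System;

public static class CpuCheckDemo {
    public static void Main() {
        int numberOfCPUs = Environment.ProcessorCount;

        // On a single-CPU machine the JIT compiler can prove this test is always false
        // and emit no native instructions for the block at all.
        if (numberOfCPUs > 1) {
            Console.WriteLine("Running on a multi-processor machine.");
        }
    }
}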

IL and Verification

IL is stack-based: all of its instructions push operands onto an execution stack and pop results off of it. Because IL offers no instructions that manipulate registers, it is easy for people to create new languages and compilers that produce code targeting the CLR.

IL instructions are also typeless. For example, IL offers an add instruction that adds the last two operands pushed onto the stack; there are no separate 32-bit and 64-bit versions of add. When add executes, it determines the types of the operands on the stack and performs the appropriate operation.

In my opinion, the biggest benefit of IL is not that it abstracts away the underlying CPU; the biggest benefit is application robustness and security. While compiling IL into native CPU instructions, the CLR performs a process called verification. Verification examines the high-level IL code and ensures that everything the code does is safe. For example, it checks that every method is called with the correct number of parameters, that each parameter passed to a method is of the correct type, that every method's return value is used properly, that every method has a return statement, and so on. The managed module's metadata includes all of the method and type information the verification process needs.
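The .NET Framework SDK also ships a standalone tool, PEVerify.exe, that runs metadata and IL verification checks against an assembly on disk. A hedged example invocation (assembly name hypothetical):

rem Verify the metadata and IL of a compiled assembly
PEVerify.exe MyApp.exe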

Native Code Generator: NGen.exe

The NGen.exe tool that ships with the .NET Framework can compile IL code into native code when an application is installed on a user's machine. Because the code is compiled at install time, the CLR's JIT compiler does not have to compile the IL at run time, which can improve the application's performance. NGen.exe is valuable in two situations (a command-line sketch follows the list):

1. Improving application startup time. NGen.exe speeds up startup because the code has already been compiled into native code, so no compilation has to happen at run time.

2. Reducing the application's working set. If an assembly is loaded into multiple processes at the same time, running NGen.exe on that assembly reduces the application's working set. NGen.exe compiles the IL into native code and saves it in a separate file. That file can be memory-mapped into multiple process address spaces simultaneously, allowing the code to be shared instead of each process requiring its own copy.
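A hedged sketch of using the tool from an elevated command prompt (assembly name hypothetical):

rem Generate and install native images for an assembly and its dependencies
ngen.exe install MyApp.exe

rem List the native images currently in the native image cache
ngen.exe display

rem Remove the native images for the assembly
ngen.exe uninstall MyApp.exe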
