Deep understanding of CLR class loading mechanism

Last Update: 2018-12-07 Source: Internet

Author: User



1. The CLR Loader

The CLR loader is responsible for loading and initializing assemblies, modules, resources, and types. It loads and initializes as little as it can. Unlike the Win32 loader, the CLR loader does not resolve and automatically load subordinate modules or assemblies. Instead, subordinate pieces are loaded on demand, only when they are actually needed. This shortens program initialization time and reduces the resources a running program consumes.

In the CLR, loading is typically driven by types and triggered by the just-in-time (JIT) compiler. When the JIT compiler translates a method body from Common Intermediate Language (CIL) into machine code, it needs the type definition of the declaring type and the type definitions of that type's fields. It also needs the type definitions used by any local variables or parameters of the method being compiled. Loading a type means loading both the assembly and the module that contain the type definition.

This policy of loading types (and their assemblies and modules) on demand means that unused parts of a program are never brought into memory. It also means that a running application will often see new assemblies and modules loaded over time, as the types contained in those files are needed during execution. If this is not the behavior you want, you have two options. One is to declare hidden static fields of the types that you want to guarantee are loaded when your type is loaded. The other is to interact with the loader explicitly.

The loader typically does its work implicitly on your behalf. Developers can also interact with it explicitly through the assembly loader, which is exposed via the static LoadFrom method of the System.Reflection.Assembly class. This method accepts a codebase string, which can be either a file system path or a URL that identifies the module containing the assembly manifest. If the specified file cannot be found, the loader throws a System.IO.FileNotFoundException. If the file exists but is not a CLR module containing an assembly manifest, the loader throws a System.BadImageFormatException. Finally, if the codebase is a URL that uses a scheme other than "file:", the caller must have WebPermission access rights; otherwise a System.Security.SecurityException is thrown. In addition, assemblies at URLs that use a scheme other than "file:" are first downloaded to the local download cache before being loaded.
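The failure modes just described can be handled one by one. The following is a minimal sketch rather than code from the original article; the ExplicitLoader class, the TryLoadFrom helper, and the sample path are hypothetical names chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Security;

public static class ExplicitLoader
{
    // Returns the loaded assembly, or null when the load fails
    // for one of the three reasons described above.
    public static Assembly TryLoadFrom(string codeBase)
    {
        try
        {
            return Assembly.LoadFrom(codeBase);
        }
        catch (FileNotFoundException)
        {
            Console.WriteLine("No file found at " + codeBase);
        }
        catch (BadImageFormatException)
        {
            Console.WriteLine(codeBase + " is not a CLR module with an assembly manifest");
        }
        catch (SecurityException)
        {
            Console.WriteLine("Caller lacks the permission needed to load " + codeBase);
        }
        return null;
    }

    public static void Main()
    {
        // Hypothetical path that is assumed not to exist.
        Assembly a = TryLoadFrom(@"C:\no\such\directory\missing.dll");
        Console.WriteLine(a == null ? "load failed" : a.FullName);
    }
}
```

Returning null keeps the sketch short; production code would more likely log the exception and rethrow or surface a richer error.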

Table 2.2 shows a simple C# program that loads an assembly located at file://C:/usr/bin/xyzzy.dll and then creates an instance of the contained type AcmeCorp.LOB.Customer. The caller provides only the physical location of the assembly. When a program uses the assembly loader in this way, the CLR ignores the four-part name of the assembly, including its version number.

Table 2.2 Loading an assembly with an explicit codebase

    using System;
    using System.Reflection;

    public class Utilities
    {
        public static object LoadCustomerType()
        {
            // Load by physical location; the four-part name is ignored.
            Assembly a = Assembly.LoadFrom("file://C:/usr/bin/xyzzy.dll");
            return a.CreateInstance("AcmeCorp.LOB.Customer");
        }
    }

Although loading assemblies by location is somewhat interesting, most assemblies are loaded by name using the assembly resolver. The assembly resolver uses the four-part assembly name to determine which file the assembly loader should bring into memory. As shown in Figure 2.9, this name-to-location resolution takes a variety of factors into account, including the directory that hosts the application, version policies, and other configuration details.

Figure 2.9 Assembly resolution and loading

The assembly resolver is exposed to developers through the static Load method of the System.Reflection.Assembly class. As shown in Table 2.3, this method accepts a four-part assembly name (either as a string or as an AssemblyName reference) and superficially resembles the LoadFrom method exposed by the assembly loader. The similarity is only skin deep, because Load first uses the assembly resolver to find a suitable file through a fairly complex series of operations. The first of these operations is to apply version policy to determine exactly which version of the requested assembly should be loaded.

Table 2.3 Loading an assembly using the assembly resolver

    using System;
    using System.Reflection;

    public class Utilities
    {
        public static object LoadCustomerType()
        {
            // Load by four-part name; the resolver decides which file to use.
            Assembly a = Assembly.Load(
                "xyzzy, Version=1.2.3.4, " +
                "Culture=neutral, PublicKeyToken=9a33f2763167fcc");
            return a.CreateInstance("AcmeCorp.LOB.Customer");
        }
    }
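The text mentions that Load also accepts an AssemblyName reference. As a rough illustration of what the four parts look like when held in such an object, the sketch below parses a display name and prints its parts. The FourPartName class is a hypothetical helper, and the assembly identity is borrowed from the configuration examples later in this article purely as sample data; no assembly is actually loaded.

```csharp
using System;
using System.Reflection;

public static class FourPartName
{
    // Parses a display name into an AssemblyName without loading anything.
    public static AssemblyName Parse(string displayName)
    {
        return new AssemblyName(displayName);
    }

    public static void Main()
    {
        AssemblyName name = Parse(
            "Acme.HealthCare, Version=1.2.3.4, " +
            "Culture=neutral, PublicKeyToken=38218fe715288aac");

        Console.WriteLine(name.Name);     // simple name
        Console.WriteLine(name.Version);  // four-part version number
        Console.WriteLine(BitConverter.ToString(name.GetPublicKeyToken()));

        // Passing the object to Assembly.Load(name) would hand the name
        // to the assembly resolver; it is omitted here because the sample
        // assembly is not assumed to exist on the machine.
    }
}
```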

The assembly resolver begins by applying any version policies that are in effect. A version policy redirects the resolver to load a different version of the requested assembly. It can map one or more versions of a given assembly to another version, but it cannot redirect the resolver to an assembly with a different name. Note that version policies are applied only to assemblies that are fully specified by their four-part name. If the name is only partially specified (for example, the public key token, version, or culture is missing), no version policy is applied. Likewise, no version policy is applied when Assembly.LoadFrom is called directly, because that call bypasses the resolver and supplies only a physical path rather than an assembly name.

Version policies are specified in configuration files: a machine-wide configuration file and an application-specific configuration file. The machine-wide file is always named machine.config and is located in the %SystemRoot%\Microsoft.NET\Framework\v1.0.nnnn\CONFIG directory. The application-specific file is always located in the application's appbase directory. For CLR-based .exe programs, the appbase is the base URI (or directory) from which the main executable was loaded. For ASP.NET applications, the appbase is the root of the Web application's virtual directory. The configuration file of a CLR-based .exe program is named after the executable with a ".config" suffix added. For example, if the running program is C:\MyApp\app.exe, the corresponding configuration file is C:\MyApp\app.exe.config. For ASP.NET applications, the configuration file is always named web.config.

Configuration files are XML documents and always have a root element named configuration. They are used by the assembly resolver, the remoting infrastructure, and ASP.NET. Figure 2.10 shows the basic structure of the elements used to configure the assembly resolver. All relevant elements sit under the assemblyBinding element in the urn:schemas-microsoft-com:asm.v1 namespace. Application-wide settings control the probe path and the publisher version policy mode. In addition, dependentAssembly elements specify version and location settings for each dependent assembly.

Figure 2.10 Assembly resolver configuration elements

Table 2.4 shows a simple configuration file that contains two version policies for one assembly. The first policy redirects version 1.2.3.4 of the Acme.HealthCare assembly to version 1.3.0.0. The second policy redirects versions 1.0.0.0 through 1.2.3.399 to version 1.2.3.7.

Table 2.4 Setting version policies

    <?xml version="1.0"?>
    <configuration xmlns:asm="urn:schemas-microsoft-com:asm.v1">
      <runtime>
        <asm:assemblyBinding>
          <!-- one dependentAssembly per unique assembly name -->
          <asm:dependentAssembly>
            <asm:assemblyIdentity name="Acme.HealthCare"
                                  publicKeyToken="38218fe715288aac"/>
            <!-- one bindingRedirect per redirection -->
            <asm:bindingRedirect oldVersion="1.2.3.4" newVersion="1.3.0.0"/>
            <asm:bindingRedirect oldVersion="1.0.0.0-1.2.3.399" newVersion="1.2.3.7"/>
          </asm:dependentAssembly>
        </asm:assemblyBinding>
      </runtime>
    </configuration>

Version policies can be specified at three levels: per application, per component, and per machine. Each level gets a chance to process the version number, and the result of one level is the input to the next, as illustrated in Figure 2.11. Note that if both the application and the machine configuration files contain a version policy for a given assembly, the application policy runs first, and the resulting version number is then run through the machine-wide policy to produce the version actually used to locate the assembly. In this example, if the machine configuration file redirected version 1.3.0.0 of Acme.HealthCare to version 2.0.0.0, the assembly resolver would use version 2.0.0.0 when version 1.2.3.4 was requested, because the application's version policy maps 1.2.3.4 to 1.3.0.0.

Figure 2.11 Version Policy
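The chaining of levels can be sketched in a few lines of C#. This is a simplified model rather than the CLR's actual implementation: the Redirect and VersionPolicy types are hypothetical, each level is just a list of version ranges, and the rule that the first matching redirect wins within a level is an assumption made to keep the example small.

```csharp
using System;
using System.Collections.Generic;

// Models one bindingRedirect: versions in [Low, High] map to Target.
public sealed class Redirect
{
    public readonly Version Low, High, Target;

    public Redirect(string low, string high, string target)
    {
        Low = new Version(low);
        High = new Version(high);
        Target = new Version(target);
    }

    public bool Matches(Version v) { return v >= Low && v <= High; }
}

public static class VersionPolicy
{
    // Applies the first matching redirect of one level (assumption),
    // or returns the requested version unchanged.
    public static Version ApplyLevel(IEnumerable<Redirect> level, Version requested)
    {
        foreach (Redirect r in level)
            if (r.Matches(requested)) return r.Target;
        return requested;
    }

    // Feeds the output of each level into the next one, in the order given.
    public static Version Resolve(Version requested, params IEnumerable<Redirect>[] levels)
    {
        Version current = requested;
        foreach (IEnumerable<Redirect> level in levels)
            current = ApplyLevel(level, current);
        return current;
    }

    public static void Main()
    {
        // Application policy: 1.2.3.4 -> 1.3.0.0
        var app = new[] { new Redirect("1.2.3.4", "1.2.3.4", "1.3.0.0") };
        // Machine policy: 1.3.0.0 -> 2.0.0.0
        var machine = new[] { new Redirect("1.3.0.0", "1.3.0.0", "2.0.0.0") };

        Console.WriteLine(Resolve(new Version("1.2.3.4"), app, machine)); // prints 2.0.0.0
    }
}
```

Because Resolve takes the levels as an ordered list, an additional level can be slotted between the application and machine levels without changing the helper.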

In addition to application-specific and machine-wide configuration settings, a given assembly can also have a publisher policy. A publisher policy is a statement from the component developer indicating which versions of the component are compatible with one another.

Publisher policies are stored as configuration files in the machine-wide global assembly cache (GAC). The structure of these files is identical to that of the application and machine configuration files. However, to be installed on the user's machine, a publisher policy configuration file must be wrapped in an assembly DLL as a custom resource. Assuming that the file foo.config contains the publisher's configuration policy, the following command line invokes the assembly linker (al.exe) and creates a suitable publisher policy assembly for AcmeCorp.Code version 2.0:

 al.exe /link:foo.config /out:policy.2.0.AcmeCorp.Code.dll /keyf:pubpriv.snk /v:2.0.0.0

The name of a publisher policy file follows the form policy.major.minor.assmname.dll. Because of this naming convention, a given assembly can have only one publisher policy file per major.minor version. In this example, all requests for AcmeCorp.Code with a major.minor version of 2.0 are routed through the policy file linked into policy.2.0.AcmeCorp.Code.dll. If no such assembly exists in the GAC, there is no publisher policy. As shown in Figure 2.11, publisher policies are applied after the application-specific version policy but before the machine-wide version policy.

Given the fragility inherent in versioning component software, the CLR lets developers turn off publisher version policy on an application-wide basis. To do so, developers use the publisherPolicy element in the application's configuration file. Table 2.5 shows this element in a simple configuration file. When the element has the attribute apply="no", publisher policies are ignored for the application. When the attribute is set to apply="yes" or is not specified at all, publisher policies are used as just described. As shown in Figure 2.10, the publisherPolicy element can enable or disable publisher policy either application-wide or assembly by assembly.

Table 2.5 Setting the application to safe mode

    <?xml version="1.0"?>
    <configuration xmlns:rt="urn:schemas-microsoft-com:asm.v1">
      <runtime>
        <rt:assemblyBinding>
          <rt:publisherPolicy apply="no"/>
        </rt:assemblyBinding>
      </runtime>
    </configuration>
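The listing above shows only the application-wide form. For the assembly-by-assembly form, a configuration fragment along the following lines may help. It is a sketch under the assumption that publisherPolicy is placed inside the dependentAssembly element of the assembly it should affect; the assembly identity is reused from Table 2.4 purely as sample data.

```xml
<?xml version="1.0"?>
<configuration xmlns:rt="urn:schemas-microsoft-com:asm.v1">
  <runtime>
    <rt:assemblyBinding>
      <rt:dependentAssembly>
        <rt:assemblyIdentity name="Acme.HealthCare"
                             publicKeyToken="38218fe715288aac"/>
        <!-- ignore publisher policy for this one assembly only -->
        <rt:publisherPolicy apply="no"/>
      </rt:dependentAssembly>
    </rt:assemblyBinding>
  </runtime>
</configuration>
```

With this fragment, publisher policy for other assemblies would still be applied as usual.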

2. Resolving Names to Locations

After the assembly resolver decides which version of the assembly to load, it must locate a suitable file to pass to the underlying assembly loader. The CLR looks first in the directory specified by the DEVPATH operating system environment variable. This variable is typically not set on deployment machines. It is intended for developer use only and exists largely to allow delay-signed assemblies to be loaded from a shared file system directory. Moreover, the DEVPATH environment variable is considered only when the following XML element is present in the machine.config file:

    <configuration>
      <runtime>
        <developmentMode developerInstallation="true"/>
      </runtime>
    </configuration>

Because the DEVPATH environment variable is not intended for deployment, the rest of this section ignores it.

Figure 2.12 shows the entire process the assembly resolver goes through to find an appropriate assembly file. In normal deployment scenarios, the first place the resolver looks is the GAC. The GAC is a machine-wide code cache that contains assemblies installed for machine-wide use. It lets administrators install an assembly once per machine for all applications to use. To avoid system corruption, the GAC accepts only assemblies that have valid digital signatures and public keys. In addition, GAC entries can be deleted only by administrators, which prevents non-administrator users from deleting or moving critical system-level components.

Figure 2.12 Assembly resolution

To avoid ambiguity, the assembly resolver consults the GAC only if the requested assembly name contains a public key. This prevents requests for generic names such as utilities from being satisfied by the wrong implementation. The public key can be provided explicitly, as part of an assembly reference or an Assembly.Load argument, or implicitly through the qualifyAssembly element of the configuration file.
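As an illustration of the implicit route, a qualifyAssembly entry maps a partial name to a full four-part name. The fragment below is a sketch, not taken from the original article; the assembly identity is reused from Table 2.4 only as sample data.

```xml
<?xml version="1.0"?>
<configuration xmlns:asm="urn:schemas-microsoft-com:asm.v1">
  <runtime>
    <asm:assemblyBinding>
      <!-- a request for the partial name is treated as a request for the full name -->
      <asm:qualifyAssembly
          partialName="Acme.HealthCare"
          fullName="Acme.HealthCare, Version=1.2.3.4, Culture=neutral, PublicKeyToken=38218fe715288aac"/>
    </asm:assemblyBinding>
  </runtime>
</configuration>
```

With such an entry in the application configuration file, a call like Assembly.Load("Acme.HealthCare") carries a public key token and is therefore eligible for a GAC lookup.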

The GAC is controlled by a system-level component (fusion.dll) that keeps a cache of DLLs under the %winnt%\Assembly directory. Fusion.dll manages this directory hierarchy and provides access to the stored files based on the four-part assembly name, as shown in Table 2.4. Although it is possible to traverse the underlying directories, the scheme fusion uses to store cached DLLs is an implementation detail that is expected to change as the CLR evolves. Instead, interact with the GAC through the gacutil.exe tool or another tool based on the fusion API. One such tool is shfusion.dll, a Windows Explorer shell extension that provides a friendly interface to the GAC.

Table 2.4 Global Assembly Cache

Name	Version	Culture	Public Key Token	Mangled Path
yourcode	1.0.1.3	de	89abcde...	t3s\e4\yourcode.dll
yourcode	1.0.1.3	en	89abcde...	a1x\bb\yourcode.dll
yourcode	1.0.1.8	en	89abcde...	vv\a0\yourcode.dll
libzero	1.1.0.0	en	89abcde...	ig\u\libzero.dll

If the assembly resolver cannot find the requested assembly in the GAC, it next tries to use a codebase hint. A codebase hint maps an assembly name to a file name or URL where the module containing the assembly manifest is located. Like version policies, codebase hints can appear in both application and machine configuration files. Table 2.6 shows a configuration file that contains two codebase hints. The first hint maps version 1.2.3.4 of the Acme.HealthCare assembly to the file C:\acmestuff\Acme.HealthCare.dll. The second hint maps version 1.3.0.0 of the same assembly to the file located at http://www.acme.com/bin/acme.healthcare.dll.

If a codebase hint is provided, the assembly resolver simply loads the corresponding assembly file, and loading proceeds as if the assembly had been loaded with an explicit codebase through Assembly.LoadFrom. If no codebase hint is provided, the resolver must begin a potentially expensive procedure to find an assembly file that matches the request.

Table 2.6 Specifying codebase hints

    <?xml version="1.0"?>
    <configuration xmlns:asm="urn:schemas-microsoft-com:asm.v1">
      <runtime>
        <asm:assemblyBinding>
          <!-- one dependentAssembly per unique assembly name -->
          <asm:dependentAssembly>
            <asm:assemblyIdentity name="Acme.HealthCare"
                                  publicKeyToken="38218fe715288aac"/>
            <!-- one codeBase per version -->
            <asm:codeBase version="1.2.3.4" href="file://C:/acmestuff/Acme.HealthCare.DLL"/>
            <asm:codeBase version="1.3.0.0" href="http://www.acme.com/Acme.HealthCare.DLL"/>
          </asm:dependentAssembly>
        </asm:assemblyBinding>
      </runtime>
    </configuration>

If the assembly resolver cannot locate the assembly using the GAC or a codebase hint, it searches a series of directories relative to the application's root directory. This search is known as probing. Probing looks only in directories at or below the appbase directory (the directory that contains the application's configuration file). For example, given the directory hierarchy shown in Figure 2.13, only the directories m, common, shared, and q are eligible for probing. Even so, the resolver probes only subdirectories that are explicitly listed in the configuration file. Table 2.7 shows a sample configuration file that sets the relative search path to the directories shared and common. All appbase subdirectories that are not listed in the configuration file are excluded from probing.

Figure 2.13 appbase and the relative search path

Table 2.7 Setting the relative search path

    <?xml version="1.0"?>
    <configuration xmlns:asm="urn:schemas-microsoft-com:asm.v1">
      <runtime>
        <asm:assemblyBinding>
          <asm:probing privatePath="shared;common"/>
        </asm:assemblyBinding>
      </runtime>
    </configuration>

When probing for an assembly, the assembly resolver constructs codebase URLs from the simple name of the assembly, the relative search path just described, and the requested culture of the assembly (if any). Figure 2.14 shows the codebase URLs used to resolve an assembly reference that specifies no culture. In this example, the simple name of the assembly is yourcode and the relative search path consists of the shared and common directories. The resolver first looks for a file named yourcode.dll in the appbase directory. If there is no such file, it assumes that the assembly is in a directory of the same name and looks for yourcode.dll under the yourcode directory. If the file is still not found, the process is repeated for each entry in the relative search path until a file named yourcode.dll is found. If the file is found, probing stops. Otherwise, the whole process is repeated, this time looking for a file named yourcode.exe in the same locations. Once a file is found, the resolver verifies that it matches every property of the assembly name given in the assembly reference and then loads the assembly. If any property of the file's assembly name does not match the reference, the Assembly.Load call fails. Otherwise, the assembly is loaded and ready for use.

Figure 2.14 Culture-neutral probing
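The probing order described above can be written down as a small candidate-list generator. This is a sketch of the description in the text, not the resolver's real implementation; the Probing class, its Candidates method, and the file:///C:/myapp appbase are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class Probing
{
    // Lists candidate URLs in the order described above
    // for an assembly reference that specifies no culture.
    public static List<string> Candidates(
        string appBase, string simpleName, string[] privatePath)
    {
        // The appbase itself is probed first, then each private path entry.
        var roots = new List<string> { appBase };
        foreach (string dir in privatePath)
            roots.Add(appBase + "/" + dir);

        var result = new List<string>();
        // The whole pass runs for .dll first and is repeated for .exe.
        foreach (string ext in new[] { ".dll", ".exe" })
        {
            foreach (string root in roots)
            {
                result.Add(root + "/" + simpleName + ext);
                result.Add(root + "/" + simpleName + "/" + simpleName + ext);
            }
        }
        return result;
    }

    public static void Main()
    {
        foreach (string url in Candidates(
                     "file:///C:/myapp", "yourcode", new[] { "shared", "common" }))
            Console.WriteLine(url);
    }
}
```

With two private path entries the sketch yields twelve candidates, which gives a feel for why long search paths slow down loading.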

Probing is somewhat more complicated when the assembly reference contains a culture identifier. As shown in Figure 2.15, the preceding algorithm is extended to look in subdirectories whose names match the requested culture. In general, applications should keep the relative search path as small as possible to avoid excessive load-time delays.

Figure 2.15 Culture-dependent probing
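For a reference that names a culture, the candidate list can be sketched in the same style. The exact ordering here is an assumption drawn from the description above (a culture-named subdirectory under the appbase and under each search path entry); the CultureProbing class and the sample appbase are hypothetical, and the figure remains the authoritative description.

```csharp
using System;
using System.Collections.Generic;

public static class CultureProbing
{
    // Lists candidate URLs for a reference with a culture identifier,
    // assuming the culture subdirectory sits directly under each probed root.
    public static List<string> Candidates(
        string appBase, string simpleName, string culture, string[] privatePath)
    {
        var roots = new List<string> { appBase };
        foreach (string dir in privatePath)
            roots.Add(appBase + "/" + dir);

        var result = new List<string>();
        foreach (string ext in new[] { ".dll", ".exe" })
        {
            foreach (string root in roots)
            {
                string cultureDir = root + "/" + culture;   // for example ".../de"
                result.Add(cultureDir + "/" + simpleName + ext);
                result.Add(cultureDir + "/" + simpleName + "/" + simpleName + ext);
            }
        }
        return result;
    }

    public static void Main()
    {
        foreach (string url in Candidates(
                     "file:///C:/myapp", "yourcode", "de", new[] { "shared", "common" }))
            Console.WriteLine(url);
    }
}
```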

3. Versioning Hazards

The previous section described how the assembly resolver determines which version of an assembly to load, focusing on the mechanism the CLR uses. What it did not discuss is the policy a developer should follow to decide when, how, and why to version an assembly. Because the platform had not yet shipped when this was written, it is difficult to list best practices validated by hard-won experience. It is reasonable, however, to look at what is known about the CLR and infer a set of guidelines.

It is important to note that assemblies are versioned as a unit. Replacing a subset of an assembly's files without changing the version number can lead to unpredictable problems. For that reason, the rest of this section considers versioning of an assembly as a whole rather than versioning of the individual files in the assembly.

When to change the version number is an interesting question. Clearly, if the public contract of a type changes, the type's assembly must be given a new version number. Otherwise, programs that depend on one version of the type's signature will get runtime errors when a type with a different signature is loaded. This means that if you add or remove a public or protected member of a public type, you must change the version number of the type's assembly. If you change the signature of a public or protected member of a public type (for example, by adding a method parameter or changing a field's type), you also need a new assembly version number. These are absolute rules. Violating them leads to unpredictable results.

The harder question concerns modifications that do not affect the public signatures of the assembly's types. For example, changing or removing a member marked private or internal is considered a nonbreaking change, at least as far as signature matching is concerned. Because no code outside your assembly can depend on private or internal members, signature mismatches at runtime are not an issue: they cannot happen. However, signature mismatches are only the tip of the iceberg.

There is a reasonable argument for changing the version number on every build of an assembly, even if no publicly visible signature has changed. The argument rests on the fact that even a seemingly harmless change to a single method body may have a subtle but real ripple effect on the behavior of programs that use the assembly. If a developer uses a unique version number for every build of an assembly, code that was tested against a particular build will not be surprised at deployment time.

The argument against giving every build a unique version number is that "safe" fixes will not be picked up by programs that are not rebuilt against the new version. This argument does not hold up once publisher policy files are taken into account. Developers who use a unique version number for every build are expected to provide publisher policy files that state which versions of their assembly are backward compatible. By default, this automatically upgrades consumers of an older version to the newer assembly. When the assembly's developer guesses wrong, each application can use the publisherPolicy element in its configuration file to disable the automatic upgrade, in effect running the application in safe mode.

As discussed earlier, the CLR assembly resolver supports side-by-side installation of multiple versions of an assembly through codebase hints, private probe paths, and the GAC. This allows several versions of an assembly to coexist in the file system. However, things become somewhat unpredictable if more than one of these versions is loaded into memory at the same time, whether by independent programs or by a single program. Side-by-side execution is much harder to deal with than side-by-side installation.

The main problem with having multiple versions of an assembly in memory at once is that, to the runtime, the types contained in those assemblies are distinct. That is, if an assembly contains a type named Customer, then when two versions of the assembly are loaded there are two distinct types in memory, each with its own identity. This has several serious downsides. For one, each type has its own copy of any static fields. If the type needs to track shared state regardless of how many versions of the type have been loaded, it cannot use the obvious solution of a static field. Instead, developers must write the code with versioning in mind and store the shared state in a location that is not version sensitive. One approach is to store the shared state in a place provided by the runtime, such as the ASP.NET application object. Another is to define a separate type that contains only the shared state as static fields. Developers can deploy this type in a separate assembly that is never versioned, which ensures that only one copy of the static fields exists for a given application.
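The second approach might look roughly like the following. This is a single-file sketch, so the separation into assemblies is only indicated in comments; CustomerSharedState and its members are hypothetical names, and the assumption is that the state holder would ship in its own assembly whose version never changes.

```csharp
using System;
using System.Threading;

// Assumed to live in its own assembly that is never versioned, so every
// loaded version of Customer refers to the same static field.
public static class CustomerSharedState
{
    private static int liveCustomers;

    public static int Increment()
    {
        return Interlocked.Increment(ref liveCustomers);
    }

    public static int Current
    {
        // CompareExchange with equal values is used here as an atomic read.
        get { return Interlocked.CompareExchange(ref liveCustomers, 0, 0); }
    }
}

// Stands in for a versioned type; it deliberately keeps no static state itself.
public class Customer
{
    public Customer()
    {
        CustomerSharedState.Increment();
    }
}

public static class SharedStateDemo
{
    public static void Main()
    {
        new Customer();
        new Customer();
        Console.WriteLine(CustomerSharedState.Current);
    }
}
```

If Customer kept the counter in its own static field, each loaded version of the Customer assembly would count separately.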

Another problem related to side-by-side execution arises when versioned types are passed as method parameters. If the caller and the callee of a method disagree about which version of an assembly to load, the caller will pass a parameter whose type is unknown to the callee. Developers can avoid this problem by defining the parameter types of all public methods in terms of version-invariant types. More importantly, these shared types must be deployed in a separate assembly that is itself never versioned.
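One way to read this advice is to put an interface in the never-versioned assembly and let public methods accept only that interface. The sketch below is illustrative only: ICustomer, CustomerV2, and Billing are hypothetical names, and the split into assemblies is indicated in comments because the example is a single file.

```csharp
using System;

// Assumed to live in a separate contracts assembly that is never versioned.
public interface ICustomer
{
    string Name { get; }
}

// Lives in a versioned assembly; callers and callees may load different
// versions of it, but both agree on ICustomer.
public class CustomerV2 : ICustomer
{
    private readonly string name;

    public CustomerV2(string name) { this.name = name; }

    public string Name { get { return name; } }
}

public static class Billing
{
    // The public method signature mentions only the version-invariant type.
    public static string Describe(ICustomer customer)
    {
        return "Invoice for " + customer.Name;
    }
}

public static class ContractDemo
{
    public static void Main()
    {
        Console.WriteLine(Billing.Describe(new CustomerV2("Ada"))); // prints Invoice for Ada
    }
}
```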

Appendix: The metadata of an assembly has three distinct attributes that let developers specify whether multiple versions of the assembly can be loaded at the same time. If none of these attributes is present, the assembly is assumed to be side-by-side compatible in all scenarios. The nonsidebysideappdomain attribute specifies that only one version of the assembly can be loaded per application domain. The nonsidebysideprocess attribute specifies that only one version of the assembly can be loaded per process. The nonsidebysidemachine attribute specifies that only one version of the assembly can be loaded at a time on the entire machine.

4. Going Deeper into the CLR

Building the OSGi.NET plug-in framework (http://www.iopenworks.com/Products/SDKDownload) requires a deep understanding of the CLR, and understanding the CLR in turn helps explain OSGi.NET's plug-in loading mechanism. We learned about the CLR from the book "Essential .NET, Volume I" and translated its key parts into Chinese for the framework designers to read. If you want to understand the CLR better, read this book, preferably the English edition.

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
