Introduction to Generics in the CLR [reprint]

Source: Internet
Author: User

Generics are an extension of the CLR type system that lets developers define types while leaving certain details unspecified. Those details are supplied later, when consumer code references the generic code: the referencing code fills in the missing details and tailors the type to its particular needs. The name reflects the goal of the feature, which is to let you write code without specifying details that would limit its usefulness. The code itself is generic. I will go into more detail shortly.

How soon will generics be available? Microsoft plans to ship generics with the release of the CLR code-named "Whidbey", and a beta of the Whidbey CLR should be released some time after this column is published. The Whidbey release is also expected to include language and compiler updates that take full advantage of generics. Finally, the research group at Microsoft has modified the Shared Source Common Language Infrastructure (CLI) implementation, code-named "Rotor", to include generics support. The modified runtime, code-named "Gyro", is available at http://research.microsoft.com/projects/clrgen.

A Preview of Generics

As with any new technology, it helps to understand what the benefits are. Readers familiar with C++ templates will find that generics serve a similar purpose in managed code. However, I don't want to push the comparison between CLR generics and C++ templates too far, because generics have some additional benefits and avoid two common template problems: code bloat and developer confusion.

CLR generics offer several benefits, including compile-time type safety, binary code reuse, performance, and clarity. I will describe each briefly here; you will learn more about them in later installments of this column. As an example, imagine two collection classes: SortedList, a collection of Object references, and GenericSortedList<T>, a collection of any type.

Type safety. When you add a String to a collection of type SortedList, the String is implicitly converted to Object. Likewise, when a String is retrieved from the list, it must be cast at run time from an Object reference to a String reference. This lack of compile-time type safety is both tedious for the developer and error-prone. By contrast, with GenericSortedList<String> (where T is set to String), all of the add and lookup methods work with String references. The element type is therefore specified and checked at compile time rather than at run time.
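To make the contrast concrete, here is a minimal sketch. The two collection classes named in the text are illustrative, so this sketch assumes two stand-ins from the class library as it later shipped: System.Collections.ArrayList for the Object-based collection and System.Collections.Generic.List<T> for the generic one. The TypeSafetyDemo class and its method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical demo class; ArrayList and List<T> are stand-ins for the
// SortedList and GenericSortedList<T> examples in the text.
public static class TypeSafetyDemo
{
    // Object-based collection: a wrong element type is only detected at run time.
    public static bool ObjectListFailsAtRunTime()
    {
        ArrayList items = new ArrayList();
        items.Add("alpha");
        items.Add(42);                      // compiles: the int is boxed and stored as Object
        try
        {
            string s = (string)items[1];    // the cast is checked at run time and throws
            return s == null;               // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Generic collection: the element type is fixed at compile time.
    public static string GenericListNeedsNoCast()
    {
        List<string> items = new List<string>();
        items.Add("alpha");
        // items.Add(42);                   // would be rejected by the compiler
        return items[0];                    // already typed as string, no cast needed
    }
}
```

The commented-out line is the point of the exercise: with the generic collection, the mistake never reaches a running program.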

Binary code reuse. For maintenance purposes, a developer might choose to get compile-time type safety from SortedList by deriving a SortedListOfString from it. The problem with this approach is that new code has to be written for every type that needs a type-safe list, which quickly becomes laborious. With GenericSortedList<T>, all you need to do is instantiate the type with the desired element type as T. As an added benefit, generic code is generated at run time, so two expansions over unrelated element types (such as GenericSortedList<String> and GenericSortedList<FileStream>) can reuse most of the same just-in-time (JIT) compiled code. The CLR takes care of the details, and the code bloat is gone.

Performance. The bottom line is this: if type checking is done at compile time rather than at run time, performance improves. In managed code, casts between references and values incur boxing and unboxing, and avoiding such casts removes that overhead. In a recent benchmark, a quick sort of an array of one million integers ran three times faster with the generic method than with the non-generic equivalent, because boxing of the values was avoided entirely. The same sort over an array of String references ran 20 percent faster with the generic method, because no type checks were needed at run time.

Clarity. Generics improve clarity in several ways. Constraints are a feature of generics that make incompatible expansions of generic code impossible, so you are never faced with the cryptic compiler errors that plague C++ template users. In the GenericSortedList<T> example, the collection class would have a constraint so that it only works with types for T that can be compared and therefore sorted. In addition, generic methods can often be called without any special syntax by means of a feature called type inference. And, of course, compile-time type safety makes application code clearer. I will cover constraints, type inference, and type safety in detail over the course of this column.
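As a rough illustration of such a constraint, the sketch below assumes that "can be compared" is expressed as T implementing System.IComparable<T>. GenericSortedList<T> is the hypothetical class from the text; its members here (Add, Count, the indexer) and the insertion logic are my own guesses, not a class library implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the GenericSortedList<T> named in the text.
// The "where" clause is the constraint: T must be comparable to itself.
public class GenericSortedList<T> where T : IComparable<T>
{
    private readonly List<T> m_items = new List<T>();

    public int Count { get { return m_items.Count; } }

    public T this[int index] { get { return m_items[index]; } }

    // Inserts the item so that the list stays in ascending order.
    public void Add(T item)
    {
        int i = 0;
        while (i < m_items.Count && m_items[i].CompareTo(item) <= 0)
        {
            i++;
        }
        m_items.Insert(i, item);
    }
}
```

Because of the constraint, the call to CompareTo compiles without a cast, and an expansion over a type that does not implement IComparable<T> (System.Object, for example) is rejected at the point where the referencing code names it.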

A Simple Example

The Whidbey CLR release will provide these benefits out of the box through a suite of generic collection classes in the class library. Applications can benefit further, however, by defining their own generic code. To show how this is done, I will start by modifying a simple linked-list node class to make it a generic class type.

The Node class in Figure 1 contains little more than the basics. It has two fields: m_data, which refers to the node's data, and m_next, which refers to the next item in the linked list. Both fields are set by the constructor. There are really only two extra features. The first is that the m_data and m_next fields are exposed through read-only properties named Data and Next. The second is an override of the ToString virtual method of System.Object.

Figure 1 also shows code that uses the Node class. This referencing code suffers from some limitations. The problem is that, to be usable in many contexts, the data must be of the most basic type, System.Object. This means that when you use Node you lose any form of compile-time type safety. Using Object to mean "any type" in an algorithm or data structure forces the consuming code to cast between Object references and the actual data type. Type-mismatch bugs in the application are not caught until run time, when the attempted cast fails with an InvalidCastException.

In addition, assigning a primitive value (such as an Int32) to an Object reference requires boxing the instance. Boxing involves a memory allocation and a memory copy, and eventually a garbage collection of the boxed value. Finally, as Figure 1 shows, a cast from an Object reference to a value type such as Int32 incurs unboxing, which also includes a type check. Because boxing and unboxing hurt the overall performance of the algorithm, you can see why using Object to mean "any type" has its drawbacks.
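A minimal sketch of an Object-based node along the lines described for Figure 1 might look like this. The field and property names follow the text; the ToString format and the NodeDemo.Sum helper are hypothetical additions for illustration.

```csharp
using System;

// Object-based linked-list node, following the description in the text.
public class Node
{
    private readonly object m_data;
    private readonly Node m_next;

    public Node(object data, Node next)
    {
        m_data = data;
        m_next = next;
    }

    public object Data { get { return m_data; } }
    public Node Next { get { return m_next; } }

    // Hypothetical format: this node's data followed by the rest of the list.
    public override string ToString()
    {
        return m_data.ToString() + (m_next != null ? " -> " + m_next.ToString() : "");
    }
}

// Hypothetical referencing code.
public static class NodeDemo
{
    // Sums a list that is expected to hold only Int32 values.
    public static int Sum(Node head)
    {
        int total = 0;
        for (Node n = head; n != null; n = n.Next)
        {
            total += (int)n.Data;   // unboxing cast, with a type check at run time
        }
        return total;
    }
}
```

Building the list with new Node(5, ...) boxes each integer, and nothing stops a caller from placing a String at the head of the same list. In that case Sum compiles as before and fails with an InvalidCastException only when it runs.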

Rewriting Node with generics is an elegant way to address these problems. Take a look at Figure 2 and you will see the Node type rewritten as the Node<T> type. A type with generic behavior, such as Node<T>, is a parameterized type and can be referred to as Parameterized Node, Node of T, or generic Node. I will cover the new C# syntax shortly; first, let's look more closely at how Node<T> differs from Node.

The Node<T> type is functionally and structurally similar to the Node type. Both support building linked lists of any given type of data. However, where Node uses System.Object to represent "any type", Node<T> leaves the type unspecified. Instead, Node<T> uses a type parameter named T that acts as a placeholder for a type. The type parameter T is eventually specified by an argument to Node<T> when consumer code uses Node<T>.

The code in Figure 3 uses Node<T> with 32-bit signed integers by constructing a type name like this: Node<Int32>. In this example, Int32 is the type argument for the type parameter T. (Incidentally, C# also accepts Node<int> to indicate Int32 for T.) If the code needed a linked list of some other type, such as String references, it could specify that type as the type argument for T, like this: Node<String>.

The beauty of Node<T> is that its algorithmic behavior is well defined while the type of data it operates on remains unspecified. The Node<T> type is therefore specific about how it works but generic about what it works with. Details such as the type of data a linked list should hold are best left to the code that uses Node<T> anyway.
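The generic rewrite described for Figures 2 and 3 can be sketched roughly as follows. This assumes the same members as the Object-based node (m_data, m_next, Data, Next, ToString); the GenericNodeDemo helper and the ToString format are hypothetical.

```csharp
using System;

// Defining code: T is a placeholder for the element type.
public class Node<T>
{
    private readonly T m_data;
    private readonly Node<T> m_next;

    public Node(T data, Node<T> next)
    {
        m_data = data;
        m_next = next;
    }

    public T Data { get { return m_data; } }
    public Node<T> Next { get { return m_next; } }

    // Hypothetical format: this node's data followed by the rest of the list.
    public override string ToString()
    {
        return m_data.ToString() + (m_next != null ? " -> " + m_next.ToString() : "");
    }
}

// Hypothetical referencing code: Node<int> supplies Int32 as the argument for T.
public static class GenericNodeDemo
{
    public static int Sum(Node<int> head)
    {
        int total = 0;
        for (Node<int> n = head; n != null; n = n.Next)
        {
            total += n.Data;    // Data is already an int: no cast and no unboxing
        }
        return total;
    }
}
```

Compared with the Object-based version, the only change in the defining code is that Object has been replaced by T. The referencing code loses its cast, and passing a Node<string> to Sum is a compile-time error rather than a run-time exception.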

When discussing generics, it helps to be explicit about two roles: defining code and referencing code. Defining code is the code that both declares the existence of the generic code and defines the type's members, such as methods and fields. Figure 2 shows the defining code for the type Node<T>. Referencing code is consumer code that uses predefined generic code, and it may be built into a different assembly. Figure 3 is an example of referencing code for Node<T>.

Thinking in terms of defining code and referencing code is useful because both roles play a part in the actual construction of usable generic code. The referencing code in Figure 3 uses Node<T> to construct a new type named Node<Int32>. Node<Int32> is a distinct type built from two key ingredients: Node<T>, created by the defining code, and the type argument Int32 for the parameter T, specified by the referencing code. Only with both ingredients is the generic code complete.

Note that, in terms of object-oriented derivation, a generic type (such as Node<T>) and the types constructed from it (such as Node<Int32> or Node<String>) are not related types. Node<Int32>, Node<String>, and Node<T> are siblings: each derives directly from System.Object.
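The sibling relationship can be checked with reflection. The sketch below is an illustration only: it assumes a pared-down Node<T> with public fields, uses typeof(Node<>) as the way reflection refers to the generic type definition, and the SiblingTypesSketch class is hypothetical.

```csharp
using System;

// Minimal generic node, repeated here so the sketch stands alone.
public class Node<T>
{
    public T Data;
    public Node<T> Next;
}

// Hypothetical helper class for the checks.
public static class SiblingTypesSketch
{
    // Node<int> and Node<string> are two different constructed types.
    public static bool AreDistinctTypes()
    {
        return typeof(Node<int>) != typeof(Node<string>);
    }

    // Each type derives directly from System.Object,
    // not from Node<T> and not from another constructed type.
    public static bool AllDeriveFromObject()
    {
        return typeof(Node<int>).BaseType == typeof(object)
            && typeof(Node<string>).BaseType == typeof(object)
            && typeof(Node<>).BaseType == typeof(object);
    }
}
```

One practical consequence is that a Node<int> reference cannot be assigned to a Node<string> variable; the compiler treats them as unrelated types.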

C# Generic Syntax

The CLR supports multiple programming languages, so there will be several syntaxes for CLR generics. Regardless of syntax, however, generic code written in one CLR-targeting language can be used by programs written in other CLR-targeting languages. I cover the C# syntax here because, at the time of writing, the C# syntax for generics is fairly stable among the three major managed languages. Visual Basic .NET and Managed C++ are also expected to support generics in their Whidbey releases.

Figure 4 shows the basic C# syntax for generic defining code and generic referencing code. The differences in syntax between the two reflect the different responsibilities of the two parties involved in generic code.

The current plan is for the CLR, and therefore C#, to support generic classes, structures, methods, interfaces, and delegates. Figure 4 shows an example of the C# syntax for each of these kinds of defining code. Note that angle brackets denote a type-parameter list, and that they immediately follow the name of the generic type or member. The type-parameter list contains one or more type parameters. These parameters also appear throughout the definition of the generic code, either in place of a specific CLR type or as an argument to a type constructor. The right side of Figure 4 shows the matching C# syntax for referencing code. Note that here the angle brackets enclose type arguments, and that the generic identifier together with the brackets forms a distinct new identifier. Note also that the type arguments specify which types to use when constructing a type or method from the generic.

Let's spend a moment on the syntax of defining code. The compiler recognizes that you are defining a generic type or method when it encounters a type-parameter list delimited by angle brackets. In a generic definition, the angle brackets immediately follow the name of the type or method being defined.

A type-parameter list specifies one or more types that are to remain unspecified in the definition of the generic code. Type parameter names can be any valid C# identifiers and are separated by commas. Note the following about the type parameters in the "Defining code" section of Figure 4:

In each code example, the type parameter T or U is used throughout the definition in places where a type name would normally appear.

The IComparable<T> interface example uses both the type parameter T and the ordinary type Int32. In the definition of generic code you can mix unspecified types (through type parameters) and specified types (using CLR type names).

The Node<T> example shows that the type parameter T can be used on its own, as in the definition of m_data, and can also be used as part of another type construction, as in m_next. A type in which a type parameter is used as the argument to another generic type definition (such as Node<T>) is called an open generic type. A type in which a concrete type is used as the type argument (such as Node<System.Byte>) is called a closed generic type.

Like any generic method, the example generic method Swap<T> shown in Figure 4 can be part of a generic or non-generic type, and can be an instance, virtual, or static method.
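The points above can be seen together in a short sketch of defining and referencing code for each of the five kinds of generic. The names Node<T>, Pair<T, U>, IComparable<T>, and Swap<T> come from the text; the member names, the EnumerateItem<T> delegate, and the SyntaxSketch class are assumptions made for illustration and are not a reproduction of Figure 4.

```csharp
// Defining code: the type-parameter list follows the name being defined.
public class Node<T>
{
    public T m_data;          // T used on its own
    public Node<T> m_next;    // T used as an argument to another generic type
}

public struct Pair<T, U>
{
    public T m_element1;
    public U m_element2;
}

// Declared here for illustration; mixes the parameter T with the concrete type Int32.
public interface IComparable<T>
{
    System.Int32 CompareTo(T other);
}

// Hypothetical delegate name.
public delegate void EnumerateItem<T>(T item);

public static class SyntaxSketch
{
    // A generic method inside a non-generic type.
    public static void Swap<T>(ref T item1, ref T item2)
    {
        T temp = item1;
        item1 = item2;
        item2 = temp;
    }

    // Referencing code: type arguments in angle brackets construct closed types and methods.
    public static string Run()
    {
        Node<byte> node = new Node<byte>();
        node.m_data = 7;

        Pair<byte, string> pair = new Pair<byte, string>();
        pair.m_element1 = node.m_data;
        pair.m_element2 = "seven";

        decimal d1 = 0, d2 = 2;
        Swap<decimal>(ref d1, ref d2);      // fully qualified call syntax

        string seen = "";
        EnumerateItem<int> callback = delegate(int item) { seen += item; };
        callback(42);

        return pair.m_element1 + ":" + pair.m_element2 + ":" + d1 + ":" + d2 + ":" + seen;
    }
}
```

Inside the definition of Node<T>, the field type Node<T> is open because T is still unspecified. In Run, Node<byte> and Pair<byte, string> are closed, so they can be instantiated with new.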

In this column I use single-character names such as T and U for type parameters, mainly to keep things simple. Descriptive names can be used as well. For example, in production code the Node<T> type could equivalently be defined as Node<ItemType> or Node<dataType>.

At the time of writing, Microsoft has standardized on single-character type parameter names in library code to help distinguish them from ordinary type names. I personally prefer camelCased type parameters in production code, because they are distinguishable from simple type names while also being descriptive.

Generic referencing code is where the unspecified becomes specified, which is necessary if the referencing code is actually going to use the generic code. If you look at the "Referencing code" section of Figure 4, you will see that in every case a new type or method is constructed from a generic by specifying CLR types as its type arguments. In generic syntax, code such as Node<Byte> and Pair<Byte,String> denotes the type names of new types constructed from a generic type definition.

Before digging deeper into the technology itself, I have one more syntax detail to cover. When code calls a generic method, such as the Swap<T> method in Figure 4, the fully qualified call syntax includes the type arguments. However, it is sometimes possible to omit the type arguments from the call syntax, as in these two lines of code:

    Decimal d1 = 0, d2 = 2;
    Swap(ref d1, ref d2);

This simplified call syntax relies on a C# compiler feature called type inference, in which the compiler uses the types of the arguments passed to the method to infer the type arguments. In this example, the compiler infers from the data types of d1 and d2 that the type argument for the type parameter T should be System.Decimal. If there is any ambiguity, type inference does not work for the caller, and the C# compiler produces an error suggesting that you use the full call syntax, complete with angle brackets and type arguments.
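The two call styles can be compared side by side in a small sketch. It assumes a Swap<T> method with the obvious body; the InferenceSketch class and its Demo method are hypothetical.

```csharp
// Hypothetical container for the Swap<T> method discussed in the text.
public static class InferenceSketch
{
    public static void Swap<T>(ref T item1, ref T item2)
    {
        T temp = item1;
        item1 = item2;
        item2 = temp;
    }

    public static string Demo()
    {
        decimal d1 = 0, d2 = 2;
        Swap(ref d1, ref d2);               // T is inferred as System.Decimal

        string s1 = "left", s2 = "right";
        Swap<string>(ref s1, ref s2);       // full call syntax: T is given explicitly

        // int i = 1; long l = 2;
        // Swap(ref i, ref l);              // would not compile: T cannot be both Int32 and Int64

        return d1 + "," + d2 + "," + s1 + "," + s2;
    }
}
```

Both calls construct a method instance from the same generic definition; the only difference is whether the compiler or the caller names the type argument.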

Indirection

A good friend of mine likes to point out that most elegant programming solutions revolve around adding another level of indirection. Pointers and references allow a single function to affect more than one instance of a data structure. Virtual functions allow a single call site to route calls to a set of similar methods, some of which may be defined later. These two examples of indirection are so common that programmers often do not notice the indirection itself.

A primary purpose of indirection is to increase the flexibility of code. Generics are a form of indirection in which a definition does not produce code that can be used directly. Instead, defining generic code creates a "code factory". Referencing code then uses the code factory to construct code that can be used directly.

Let's look at this idea first with a generic method. The code in Figure 5 defines and references a generic method named CompareHashCodes<T>. The defining code creates the generic method CompareHashCodes<T>, but none of the code in Figure 5 ever calls CompareHashCodes<T> directly. Instead, in Main, the referencing code uses CompareHashCodes<T> to construct two different methods: CompareHashCodes<Int32> and CompareHashCodes<String>. These constructed methods are instances of CompareHashCodes<T>, and they are what the referencing code calls.

Normally, the definition of a method directly defines what the method does. A generic method definition, by contrast, defines what its constructed method instances will do. The generic method itself does nothing except serve as a model for how specific instances are constructed. CompareHashCodes<T> is a generic method from which method instances that compare hash codes are constructed. A constructed instance, such as CompareHashCodes<Int32>, does the actual work: it compares the hash codes of integers. CompareHashCodes<T> itself is one level of indirection removed from being callable.
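A minimal sketch of what such a method could look like follows. The method name comes from the text; the body (comparing the results of GetHashCode) is an assumption based on that name, and the HashCodeSketch class and its Run method are hypothetical.

```csharp
// Hypothetical container class.
public static class HashCodeSketch
{
    // Defining code: a model from which callable method instances are constructed.
    public static bool CompareHashCodes<T>(T item1, T item2)
    {
        return item1.GetHashCode() == item2.GetHashCode();
    }

    // Referencing code: each call names a constructed instance of the generic method.
    public static bool[] Run()
    {
        bool ints = CompareHashCodes<int>(5, 10);
        bool strings = CompareHashCodes<string>("Hi there", "Hi there");
        return new bool[] { ints, strings };
    }
}
```

The referencing code never calls CompareHashCodes<T> as such. Each call site supplies a type argument, and the runtime constructs and invokes the matching method instance.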

Generic types are likewise one level of indirection removed from their simple counterparts. A simple type definition, such as a class or structure, is used by the system to create objects in memory. For example, the System.Collections.Stack type in the class library is used to create stack objects in memory. In a sense, you can think of the new keyword in C#, or the newobj instruction in intermediate language code, as an object factory that creates object instances using a managed type as the blueprint for each object.

Generic types, on the other hand, are used to instantiate closed types rather than object instances. A type constructed from a generic type can then be used to create objects. Let's revisit the Node<T> type defined in Figure 2, along with its referencing code in Figure 3.

A managed application can never create an object of type Node<T>, even though Node<T> is a managed type. This is because Node<T> lacks sufficient definition to be instantiated as an object in memory. However, Node<T> can be used to instantiate another type during the execution of an application.

Node<T> is an open generic type and is used only to create other constructed types. If a constructed type created from Node<T> is closed, such as Node<Int32>, it can be used to create objects. The referencing code in Figure 3 uses Node<Int32> in much the same way as a simple type: it creates objects of type Node<Int32>, calls methods on those objects, and so on.
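The two-stage factory can be observed through reflection. The sketch below is an illustration under a few assumptions: it uses a pared-down Node<T>, it treats typeof(Node<>) as the reflection view of the open type, and it relies on the reflection APIs Type.MakeGenericType and Activator.CreateInstance from the class library as it later shipped. The OpenClosedSketch class is hypothetical.

```csharp
using System;

// Minimal generic node, repeated here so the sketch stands alone.
public class Node<T>
{
    public T Data;
    public Node<T> Next;
}

// Hypothetical helper class.
public static class OpenClosedSketch
{
    // Stage 0: the open type cannot be instantiated as an object.
    public static bool OpenTypeCannotBeInstantiated()
    {
        Type open = typeof(Node<>);
        try
        {
            Activator.CreateInstance(open);
            return false;
        }
        catch (ArgumentException)
        {
            return true;    // the type still contains an unspecified parameter
        }
    }

    // Stage 1: the open type plus a type argument yields a closed type.
    // Stage 2: the closed type is used as the blueprint for an object.
    public static object CreateFromClosedType()
    {
        Type closed = typeof(Node<>).MakeGenericType(typeof(int));
        return Activator.CreateInstance(closed);
    }
}
```

In ordinary code both stages are hidden behind a single expression such as new Node<int>(); the reflection calls only make the intermediate step visible.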

The extra level of indirection that generic types provide is a very powerful feature. Referencing code that uses generic types results in tailor-made managed types. Thinking of generic code as one level of indirection removed from its simple counterpart will help you intuit many of the behaviors, rules, and uses of generics in the CLR.

Summary

This article described the benefits of generic types: how they can be used to improve type safety, code reuse, and performance. It also introduced the C# syntax and showed how generics add another level of indirection, which increases flexibility. Stay tuned: next time I will dig deeper into generics.
