Microsoft .NET Platform: Basic Principles of Type Usage


Microsoft .NET Platform Series, Article 2

Translated by Zhao Ning

In my last column, I introduced many of the basic type-related concepts of the Microsoft .NET platform's common language runtime (CLR). I focused on how all types derive from System.Object and on the various casting mechanisms available to programmers, such as the casting operators C# provides. Finally, I described how compilers use namespaces and how the CLR ignores them.
In this column, I continue the discussion of type fundamentals. I'll start with primitive types and then move quickly to reference types and value types. Every developer needs a solid grasp of how reference types and value types differ, because using them incorrectly can introduce bugs and cause performance problems.

Primitive Types
Some data types are so commonly used that many compilers let you manipulate them with simplified syntax. For example, in C# you could create an integer variable with the following syntax:

int a = new int();    // a is initialized to 0


But I'm sure you would find it awkward to declare and initialize an integer this way. Fortunately, many compilers, including the C# compiler, let you write the following instead:


int a = 0;


This makes the code more readable. Whichever syntax you use, the resulting intermediate language (IL) is the same.
Data types that the compiler supports directly are called primitive types. Primitive types map directly to types that exist in the Base Class Library. For example, the C# int type maps directly to System.Int32, so either of the following two lines is equivalent to the lines shown earlier:


System.Int32 a = new System.Int32();
System.Int32 a = 0;


Figure 1 lists the C# primitive types and the Base Class Library types they correspond to. Other languages provide similar primitive types.
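Figure 1 itself is not reproduced here. As a small illustration, the following sketch checks a few of those keyword-to-type mappings by comparing `Type` objects. The class `PrimitiveAliases` and its method names are hypothetical, and only a handful of pairs are tested.

```csharp
using System;

// Hypothetical helper for illustration: each check compares the Type object
// for a C# keyword with the Type object for the BCL type it maps to.
public static class PrimitiveAliases
{
    public static bool KeywordsMapToBclTypes()
    {
        return typeof(int) == typeof(Int32)
            && typeof(long) == typeof(Int64)
            && typeof(bool) == typeof(Boolean)
            && typeof(char) == typeof(Char)
            && typeof(double) == typeof(Double)
            && typeof(string) == typeof(String)
            && typeof(object) == typeof(Object);
    }

    public static bool LongAndShortFormsAgree()
    {
        int a = new int();       // long-hand form: a zero-initialized Int32
        System.Int32 b = 0;      // short-hand form, spelled with the BCL name
        return a == b && a.GetType() == typeof(System.Int32);
    }
}
```

Both methods should return true. The keyword and the BCL name are interchangeable everywhere because they denote the same type.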

Reference Types and Value Types
When an object is allocated from the managed heap, the new operator returns the memory address of the object. This address is usually stored in a variable. Such a variable is called a reference type variable, because it does not contain the bits of the actual object; instead, it refers to the object's bits.
Working with reference types raises some performance issues. First, the memory must be allocated from the managed heap, which can force a garbage collection. Second, reference types are always accessed through pointers. Every time you refer to a member of an object in the heap, code must be generated and executed to dereference the pointer before the desired operation can be performed. This affects both the size of the program and its execution speed.
In addition to reference types, the CLR type system supports lightweight value types. An unboxed value type object is not allocated in the garbage-collected heap. The variable that represents it does not contain a pointer to the object; it contains the object itself. Because the variable contains the object, no pointer has to be dereferenced to manipulate it, which improves performance.
The code in Figure 2 illustrates the difference between reference types and value types. The Rectangle type is declared with struct instead of the more common class. In C#, a type declared with struct is a value type, and a type declared with class is a reference type. Other languages may use different syntax to distinguish value types from reference types. Managed Extensions for C++, for example, use the __value modifier.
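Figure 2 is not reproduced in this text, so here is a minimal sketch in the same spirit. The type names `RectRef`, `RectVal`, and `RectKinds` are hypothetical. The checks use reflection to confirm which declaration produces a value type.

```csharp
using System;

// Hypothetical types for illustration: the same four fields, declared once
// as a reference type (class) and once as a value type (struct).
public class RectRef { public int X, Y, Width, Height; }
public struct RectVal { public int X, Y, Width, Height; }

public static class RectKinds
{
    // A class declaration produces a reference type.
    public static bool ClassIsReferenceType() => !typeof(RectRef).IsValueType;

    // A struct declaration produces a value type derived from System.ValueType.
    public static bool StructIsValueType() =>
        typeof(RectVal).IsValueType && typeof(RectVal).BaseType == typeof(ValueType);
}
```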
Consider again the long-hand form from the discussion of primitive types:


System.Int32 a = new System.Int32();


When compiling this statement, the compiler sees that System.Int32 is a value type. It optimizes the resulting IL code so that this "object" is not allocated from the heap. Instead, the object is placed on the thread's stack in the local variable a.
Whenever possible, use value types instead of reference types to improve your application's performance. In particular, consider declaring a type as a value type when the following conditions hold:

* The type behaves like a primitive data type.
* The type does not need to inherit from any other type.
* No other types will derive from the type.
* Instances of the type are not frequently passed as method parameters. Passing them by value causes frequent memory copies, which hurt performance. The discussion of boxing and unboxing below explains this in more detail.

The main advantage of value types is that their unboxed instances are not allocated in the managed heap. Compared with reference types, however, value types have several limitations. The following points compare the two kinds of types.
A value type object has two representations: an unboxed form and a boxed form. A reference type object is always in boxed form.
Value types are implicitly derived from the System.ValueType type. This type offers the same methods as System.Object. However, System.ValueType overrides the Equals method so that it returns true when the instance fields of two objects match. It also overrides the GetHashCode method to produce a hash code using an algorithm that takes the values of the instance fields into account. When defining your own value types, it is highly recommended that you override Equals and GetHashCode and provide explicit implementations.
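The sketch below contrasts two value types. The first relies on the inherited field-by-field Equals. The second overrides Equals and GetHashCode explicitly. `PlainPoint` and `FastPoint` are hypothetical names, and the hash formula is only one simple choice. The inherited System.ValueType.Equals does work, but in general it may have to inspect fields through reflection. That cost is one reason the explicit overrides are recommended.

```csharp
using System;

// Hypothetical value type that relies on the inherited System.ValueType.Equals,
// which compares instance fields.
public struct PlainPoint
{
    public int X, Y;
    public PlainPoint(int x, int y) { X = x; Y = y; }
}

// Hypothetical value type with explicit Equals/GetHashCode overrides.
public struct FastPoint : IEquatable<FastPoint>
{
    public readonly int X, Y;
    public FastPoint(int x, int y) { X = x; Y = y; }

    // Strongly typed comparison: no boxing, no reflection.
    public bool Equals(FastPoint other) => X == other.X && Y == other.Y;

    // Object overload: only equal to another FastPoint with the same fields.
    public override bool Equals(object obj) => obj is FastPoint other && Equals(other);

    // Equal instances must produce equal hash codes.
    public override int GetHashCode() => unchecked(X * 397 ^ Y);
}
```

Implementing IEquatable<T> as well lets generic collections compare instances without boxing them.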
A value type cannot serve as the base class for a new value type or reference type. As a result, value types should not introduce virtual methods, cannot be abstract, and are implicitly sealed (a sealed type cannot be used as a base class).
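A quick reflection check confirms the implicit sealing described above. `Measurement` and `ValueTypeRules` are hypothetical names used only for this sketch.

```csharp
using System;

// Hypothetical value type used only to inspect its metadata.
public struct Measurement { public double Value; }

public static class ValueTypeRules
{
    // Value types are implicitly sealed and can never be abstract.
    public static bool IsImplicitlySealed() => typeof(Measurement).IsSealed;
    public static bool IsAbstract() => typeof(Measurement).IsAbstract;

    // Built-in value types such as Int32 follow the same rule.
    public static bool Int32IsSealed() => typeof(int).IsSealed;
}

// Uncommenting the next line does not compile: a struct cannot be used as a base type.
// public class Derived : Measurement { }
```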
A reference type variable contains the address of an object in the heap. A reference type variable created without an initializer (a field, for example) is null by default, meaning that it does not currently point to a valid object. Attempting to use a reference type variable whose value is null throws a NullReferenceException. A value type variable, by contrast, always contains a value of its underlying type, and by default all members of the value type are initialized to 0 (zero). Accessing a value type can never produce a NullReferenceException.
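The following sketch shows these default values for the fields of a class. `Holder` and `DefaultValues` are hypothetical names. It uses fields rather than local variables because C# requires locals to be definitely assigned before use.

```csharp
using System;

// Hypothetical container whose fields are left at their default values.
public class Holder
{
    public string Name;      // reference type field: defaults to null
    public int Count;        // value type field: defaults to 0
    public DateTime When;    // value type field: every member zeroed
}

public static class DefaultValues
{
    public static bool ReferenceFieldIsNull() => new Holder().Name == null;

    public static bool ValueFieldsAreZero()
    {
        var h = new Holder();
        return h.Count == 0 && h.When == default(DateTime) && h.When.Ticks == 0;
    }

    // Dereferencing the null reference throws NullReferenceException.
    public static bool NullAccessThrows()
    {
        var h = new Holder();
        try { _ = h.Name.Length; return false; }
        catch (NullReferenceException) { return true; }
    }
}
```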
When you assign one value type variable to another, the value is copied. When you assign one reference type variable to another, only the memory address is copied.
It follows that two or more reference type variables can refer to a single object in the heap, so an operation performed through one variable can affect the object seen through another. Each value type variable, on the other hand, has its own copy of the object's data, so operations on one value type variable cannot affect any other value type variable.
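This sketch shows the two copying rules side by side. It assigns one variable to another and then modifies the copy. `CounterRef`, `CounterVal`, and `CopySemantics` are hypothetical names.

```csharp
using System;

// Hypothetical pair of otherwise identical types.
public class CounterRef { public int Value; }
public struct CounterVal { public int Value; }

public static class CopySemantics
{
    public static (int RefResult, int ValResult) Run()
    {
        var r1 = new CounterRef { Value = 1 };
        var r2 = r1;          // copies the reference: r1 and r2 share one heap object
        r2.Value = 2;         // visible through r1 as well

        var v1 = new CounterVal { Value = 1 };
        var v2 = v1;          // copies the fields: v2 is an independent value
        v2.Value = 2;         // v1 is unaffected

        return (r1.Value, v1.Value);   // (2, 1)
    }
}
```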
In rare situations, the runtime must initialize a value type without being able to call its default constructor. For example, this can happen when a thread-local value type must be allocated and initialized the first time unmanaged code calls into managed code on a thread. In such a case, the runtime cannot invoke the type's constructor, but it still guarantees that all members are initialized to zero or null. For this reason, it is recommended that you do not define a parameterless constructor for a value type. In fact, the C# compiler (like other compilers) treats this as an error and will not compile the code. (C# 10 and later accept explicit parameterless struct constructors, but those constructors still do not run in cases such as default(T) or newly allocated arrays.) This issue is rare and never arises for reference types. Parameterized constructors, whether on value types or reference types, are not subject to these restrictions.
Because unboxed value types are not allocated in the heap, their storage can be reclaimed as soon as the method that defines an instance of the type is no longer active. This means an unboxed value type object receives no notification when its memory is reclaimed. A boxed value type, however, would have its Finalize method called when it is garbage-collected. You must never implement a value type with a Finalize method. As with a parameterless constructor, C# considers this an error and will not compile the source code.

Boxing and Unboxing
In many situations, it is convenient to treat a value type as a reference type. Suppose you want to create an ArrayList object (ArrayList is a type defined in the System.Collections namespace) to hold a set of Points. See Figure 3.
Each time through the loop in that code, a Point value type is initialized and then stored in the ArrayList. But what is actually stored in the ArrayList? Is it the Point structure, the address of the Point structure, or something else entirely? To find out, look up ArrayList's Add method and see what type its parameter is declared as. In this case, the Add method is prototyped as follows:


public virtual int Add(Object value);


The Add method clearly takes an Object parameter, and Object is always a reference type. But the code passes p, which is a Point value type. For this code to work, the Point value type must be converted into a true heap-managed object, and a reference to that object must be obtained.
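Figure 3 is not reproduced here. The following is a minimal sketch of the kind of loop it describes, assuming Point is a simple struct with two integer fields. `Point` and `PointList` are hypothetical names in this sketch.

```csharp
using System;
using System.Collections;

// Hypothetical Point value type standing in for the one used in Figure 3.
public struct Point
{
    public int X, Y;
    public Point(int x, int y) { X = x; Y = y; }
}

public static class PointList
{
    public static ArrayList Build(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            Point p = new Point(i, i);   // unboxed value in a local variable
            list.Add(p);                 // Add(Object): p is boxed, the list stores a reference
        }
        return list;
    }
}
```

Each element read back from the list is an Object whose runtime type is Point. Getting the fields back requires a cast such as `(Point)list[0]`.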
Converting a value type to a reference type is called boxing. Internally, boxing works like this:
1. Memory is allocated from the heap. The amount allocated is the size of the value type plus the overhead of an object, which consists of the virtual table pointer and the sync block pointer.
2. The value type's bits are copied into the newly allocated heap memory.
3. The address of the object is returned. This address is now a reference type.
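Each boxing operation allocates a new heap object and copies the value into it. Boxing the same value twice therefore yields two distinct objects with equal contents. The class `BoxingIdentity` in this sketch is hypothetical.

```csharp
using System;

public static class BoxingIdentity
{
    // Each conversion to Object allocates a separate box on the heap.
    public static bool TwoBoxesAreDistinct()
    {
        int v = 5;
        object a = v;     // box #1
        object b = v;     // box #2
        return !ReferenceEquals(a, b) && a.Equals(b);   // different objects, equal values
    }

    // The box records the value type it was created from.
    public static Type BoxedType()
    {
        object o = 5;
        return o.GetType();   // typeof(Int32)
    }
}
```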
Compilers for some languages, such as C#, automatically generate the IL code needed to box a value type. Even so, you need to understand what happens internally during boxing in order to appreciate its effect on code size and performance.
When the Add method is called, memory is allocated in the heap for a Point object. The members currently residing in the Point value type (p) are copied into the newly allocated Point object. The address of the Point object (a reference) is returned and then passed to the Add method. This Point object remains in the heap until it is garbage-collected. The Point value type variable (p) can be reused or freed, because the ArrayList never knows anything about it. Boxing provides a unified view of the type system, in which a value of any type can be treated as an object.
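You can check directly that the ArrayList holds its own boxed copy, independent of p. In this sketch `MutablePoint` and `BoxedCopy` are hypothetical names. The struct is deliberately mutable so that the copy becomes visible.

```csharp
using System;
using System.Collections;

// Hypothetical mutable value type used to show that ArrayList.Add stores a copy.
public struct MutablePoint { public int X; }

public static class BoxedCopy
{
    public static int StoredXAfterChange()
    {
        ArrayList list = new ArrayList();
        MutablePoint p = new MutablePoint { X = 1 };
        list.Add(p);                          // boxes a copy of p
        p.X = 99;                             // changes only the local, unboxed p
        return ((MutablePoint)list[0]).X;     // 1: the boxed copy is unaffected
    }
}
```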
Unboxing is the opposite of boxing. It obtains a reference (pointer) to the value type (the data fields) contained within an object. Internally, unboxing works like this:
1. The CLR first ensures that the reference type variable is not null and that it refers to a boxed value of the desired value type. If the reference is null, a NullReferenceException is thrown. If it refers to anything other than a boxed instance of that exact value type, an InvalidCastException is thrown.
2. If the type matches, a pointer to the value type contained inside the object is returned. The value type that this pointer refers to does not include the overhead normally associated with a true object: the virtual table pointer and the sync block pointer.
Note that boxing always creates a new object and copies the unboxed value's bits into it. Unboxing, on the other hand, simply returns a pointer to the data inside a boxed object; no memory copy occurs. However, your code will commonly cause the data pointed to by the unboxed reference to be copied anyway.
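The two checks made during unboxing can be observed from C#. In this sketch (the class `UnboxingChecks` is hypothetical), a boxed Int32 unboxes only as an Int32. Asking for an Int64 fails even though the value would fit, and a null reference fails as well.

```csharp
using System;

public static class UnboxingChecks
{
    // Unboxing to the exact value type succeeds.
    public static int UnboxInt32(object o) => (int)o;

    // Returns the name of the exception thrown when unboxing o as Int64, or "none".
    public static string TryUnboxAsInt64(object o)
    {
        try { _ = (long)o; return "none"; }
        catch (NullReferenceException) { return "NullReferenceException"; }
        catch (InvalidCastException) { return "InvalidCastException"; }
    }
}
```

To get an Int64 from a boxed Int32, unbox to the exact type first and then convert: `(long)(int)o`.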
The following code demonstrates boxing and unboxing:

using System;

class App {
  public static void Main() {
    Int32 v = 5;         // Create an unboxed value type variable
    Object o = v;        // o refers to a boxed version of v
    v = 123;             // Change the unboxed value to 123

    Console.WriteLine(v + ", " + (Int32) o);   // Displays "123, 5"
  }
}

How many boxing operations do you think occur in this code? You might be surprised to learn that the answer is 3. Let's look closely at the code to understand what's going on.
First, an unboxed Int32 value type, v, is created and initialized to 5. Then an Object reference type, o, is created, and it is meant to point to v. But a reference type must always point to an object in the heap, so C# generates the IL code needed to box v and stores the address of the boxed copy in o. Next, 123 is assigned to v. This changes only the unboxed value type and has no effect on the boxed copy, so the boxed version keeps its value of 5. Later, the cast (Int32) o in the WriteLine call unboxes o (obtaining a pointer to the data inside o), and the data in o is then copied into a temporary unboxed value.
Now WriteLine is called. It wants a String object, but you don't have one. Instead you have three items: the unboxed Int32 value type v, a String (", "), and an Int32 reference (that is, a boxed value) o. These must be combined to create a String.
To construct a String object, the C# compiler generates code that calls String's static Concat method. Concat has several overloads that all perform the same function and differ only in the number of parameters they take. Because a string is being built from three items, the compiler chooses the following Concat method:


public static String Concat(Object arg0, Object arg1, Object arg2);


The first parameter, arg0, is used to pass v. But v is an unboxed value and arg0 is an Object, so v must be boxed, and the address of the boxed v is passed for arg0. The second parameter, arg1, receives the address of the String ", ", which is already an object. For the last parameter, arg2, o (an Object reference) is cast to an Int32. This creates a temporary Int32 value type that receives the unboxed version of the value o currently refers to. This temporary value type must then be boxed again, and the address of the new box is passed for arg2.
When Concat is called, it calls the ToString method of each specified object and concatenates their string representations. The String object returned from Concat is then passed to WriteLine to display the final result.
Note that the generated IL code is more efficient if WriteLine is called as follows:


Console.WriteLine(v + ", " + o);   // Displays "123, 5"


This line is identical to the previous version except that the (Int32) cast in front of o has been removed. It is more efficient because o is already a reference to an object, so its address can be passed directly to Concat. Removing the cast saves one unboxing operation and one boxing operation.
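The following sketch makes the boxing in these two WriteLine arguments explicit by writing out, by hand, the String.Concat calls that the text describes. It illustrates the analysis above and is not a listing of actual compiler output. `ConcatBoxing` is a hypothetical name, and newer C# compilers may instead call ToString on the value types and avoid some of these boxes.

```csharp
using System;

public static class ConcatBoxing
{
    // Hand-written equivalent of  v + ", " + (Int32) o
    public static string WithCast(int v, object o)
    {
        object arg0 = v;          // box v
        object arg2 = (int)o;     // unbox o, copy into a temporary, box the temporary
        return String.Concat(arg0, ", ", arg2);   // Concat(Object, Object, Object)
    }

    // Hand-written equivalent of  v + ", " + o : o is already a reference.
    public static string WithoutCast(int v, object o)
    {
        object arg0 = v;          // box v
        return String.Concat(arg0, ", ", o);      // o passed as-is: no unbox, no extra box
    }
}
```

Called with v = 123 and a boxed 5, both methods produce "123, 5". The second method does one fewer unbox and one fewer box.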


Here is another example of boxing and unboxing:
using System;

class App {
  public static void Main() {
    Int32 v = 5;              // Create an unboxed value type variable
    Object o = v;             // o refers to a boxed version of v

    v = 123;                  // Change the unboxed value to 123
    Console.WriteLine(v);     // Displays "123"

    v = (Int32) o;            // Unbox o and copy its value into v
    Console.WriteLine(v);     // Displays "5"
  }
}

How many boxing operations do you count in this code? The answer is one: the assignment o = v. The WriteLine calls cause no boxing because there is a WriteLine method that accepts an Int32 as its parameter:

public static void WriteLine (Int32 value);

In both WriteLine calls, the variable v (an unboxed Int32 value type) is passed by value. WriteLine may box it internally, but you have no control over that. The important thing is that you've done your best and eliminated boxing from your own code.
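Overload resolution is what keeps this second example free of extra boxing. When an overload that takes Int32 exists, the compiler prefers it over one that takes Object. The hypothetical `Show` overloads below mirror that situation.

```csharp
using System;

// Hypothetical overloads mirroring WriteLine(Int32) and WriteLine(Object).
public static class OverloadPick
{
    public static string Show(int value) => "Show(Int32)";      // argument passed unboxed
    public static string Show(object value) => "Show(Object)";  // a value type argument must be boxed

    public static string[] Calls()
    {
        int v = 5;
        object o = v;                          // the single explicit box
        return new[] { Show(v), Show(o) };     // exact-type overload chosen for each
    }
}
```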
If you know that the code you're writing will cause the compiler to generate a lot of boxing code, you will get smaller and faster code by boxing the value types manually, as shown in Figure 4.
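Figure 4 is not reproduced here. The following sketch illustrates the technique, assuming the value is passed several times to a method that takes Object parameters. `ManualBoxing` is a hypothetical name. String.Format(String, Object, Object, Object) is a real overload, so each Int32 argument is boxed separately unless you box once yourself.

```csharp
using System;

public static class ManualBoxing
{
    // Each int argument is converted to Object separately: v is boxed three times.
    public static string BoxThreeTimes(int v) => String.Format("{0}, {1}, {2}", v, v, v);

    // Box v once by hand and pass the same reference three times.
    public static string BoxOnce(int v)
    {
        object o = v;    // one box
        return String.Format("{0}, {1}, {2}", o, o, o);
    }
}
```

Both methods return the same string. The second allocates one box instead of three.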
The C# compiler automatically generates boxing and unboxing code. This makes programming easier, but it hides the overhead from programmers who care about performance. Like C#, other languages may also hide the details of boxing and unboxing. Some languages, however, force programmers to write boxing and unboxing code explicitly. For example, Managed Extensions for C++ require the programmer to box a value type explicitly with the __box operator. Unboxing is done with dynamic_cast, which casts the boxed type to its unboxed equivalent.
Finally, note that if a value type does not override a virtual method defined by System.ValueType, that method can be called only on the boxed form of the value type. This is because only the boxed form of the object has a virtual table pointer. Methods defined directly by the value type can be called on both the boxed and unboxed forms of the value.


Conclusion
The concepts discussed in this column are critical for .NET developers. You should really understand the difference between reference types and value types. You must also understand which operations cause boxing, and whether the compiler you are using boxes value types automatically (as C# and Visual Basic do). If it does, you should understand when the compiler boxes and how that affects your code. These concepts cannot be stressed enough: any misunderstanding can easily lead to poorer program performance or even subtle bugs.


