An analysis of reference types and value types in .NET (Part 2)

Source: Internet
Author: User

The previous article briefly covered the differences between value types and reference types in .NET and analyzed the memory layout and implementation of reference types. Its opening example also gave a brief look at the advantages of value types over reference types. In everyday development, many people reach for class immediately and seldom consider whether class or struct is the better choice. This article describes value types in .NET in detail, along with the issues to watch for when using them. In some cases, using a value type instead of a reference type can significantly reduce memory usage and GC pressure and improve execution efficiency. For more information, see Pro .NET Performance, CLR via C#, and Advanced .NET Debugging.

Internal Implementation of value types

Compared with reference types, value types have a relatively simple memory layout, but this simplicity also introduces some restrictions. In particular, when you want to use a value type where a reference type is expected, it must be boxed.

As mentioned in the previous article, the main reason for using value types is that they offer good memory density and carry no overhead structures. When you create your own value type, every byte is actually used for your data.

For the convenience of discussion, the Point2D type is used below:

public struct Point2D
{
    public int X;
    public int Y;
}

When we create an instance with X = 5 and Y = 7, its memory holds only the two integer fields; there are no additional fields like those of a reference type.
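
As a rough check of the claim that every byte is used, the following sketch asks the runtime for the size of Point2D. The LayoutDemo class and its method name are made up for this illustration. Note that Marshal.SizeOf reports the marshaled size, which is assumed here to match the managed size because the struct contains only int fields.

```csharp
using System;
using System.Runtime.InteropServices;

public struct Point2D
{
    public int X;
    public int Y;
}

// Hypothetical helper class, not part of the original article.
public static class LayoutDemo
{
    // Size of one Point2D in bytes: two 4-byte ints and nothing else.
    public static int SizeOfPoint2D()
    {
        return Marshal.SizeOf(typeof(Point2D));
    }

    public static void Main()
    {
        Console.WriteLine(SizeOfPoint2D()); // prints 8
    }
}
```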

In a few cases, we may need to customize the layout of a value type's fields in memory. The most typical example is interoperability, where fields must be passed to unmanaged code in exactly the order the programmer defined. To give the CLR such instructions, we can use the System.Runtime.InteropServices.StructLayoutAttribute attribute, which specifies how a type's fields are laid out in memory. Its constructor accepts a LayoutKind value: LayoutKind.Auto lets the CLR arrange the fields automatically, LayoutKind.Sequential makes the CLR preserve the declared field order, and LayoutKind.Explicit lets us set each field's position with the FieldOffset attribute. If no layout is specified, a default is chosen: generally LayoutKind.Auto for reference types and LayoutKind.Sequential for value types. Explicit layout with FieldOffset allows us to create a C-style "union" whose fields overlap in memory. The following example uses such a structure to view a floating-point number as its four-byte representation.

[StructLayout(LayoutKind.Explicit)]
public struct FloatingPointExplorer
{
    [FieldOffset(0)]
    public float F;
    [FieldOffset(0)]
    public byte B1;
    [FieldOffset(1)]
    public byte B2;
    [FieldOffset(2)]
    public byte B3;
    [FieldOffset(3)]
    public byte B4;
}

When a float is assigned to the F field, the B1 through B4 fields change at the same time, and vice versa, because the F field and the B1-B4 fields overlap in memory.
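
A short usage sketch of the overlap: write a float through F, read the bytes back through B1-B4, and compare them with what BitConverter.GetBytes reports for the same value. The UnionDemo class and BytesOf method are names invented for this example. The byte order depends on the machine; on a little-endian machine both lines should print 00-00-80-3F for 1.0f.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct FloatingPointExplorer
{
    [FieldOffset(0)] public float F;
    [FieldOffset(0)] public byte B1;
    [FieldOffset(1)] public byte B2;
    [FieldOffset(2)] public byte B3;
    [FieldOffset(3)] public byte B4;
}

// Hypothetical helper class, not part of the original article.
public static class UnionDemo
{
    // Writes the float through F and reads it back through the overlapping bytes.
    public static byte[] BytesOf(float value)
    {
        FloatingPointExplorer explorer = new FloatingPointExplorer();
        explorer.F = value;
        return new[] { explorer.B1, explorer.B2, explorer.B3, explorer.B4 };
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(BytesOf(1.0f)));
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(1.0f)));
    }
}
```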

Because value type instances have no object header word and no method table pointer, they cannot offer semantics as rich as those of reference types. Next, let's look at the limitations this simple layout imposes and at what happens when a value type is used in places where a reference type is expected.

Limitations of value types

First, consider the object header word. If a program tries to use a value type instance for synchronization, this is usually a bug, but should the runtime treat it as illegal and throw an exception? In the following code, what happens if two threads call the Increment method of the same Counter instance at the same time?

class Counter
{
    private int _i;
    public int Increment()
    {
        lock (_i)
        {
            return ++_i;
        }
    }
}

In Visual Studio, the C# compiler does not allow the lock keyword to be used on a value type. However, lock is syntactic sugar provided by C# that is translated into calls to the Monitor methods, so we can rewrite the code above:

using System.Threading;

class Counter
{
    private int _i;
    public int Increment()
    {
        bool acquired = false;
        try
        {
            Monitor.Enter(_i, ref acquired); // _i is boxed here
            return ++_i;
        }
        finally
        {
            if (acquired) Monitor.Exit(_i);  // _i is boxed again, into a different object
        }
    }
}

This version compiles, but it introduces a bug: multiple threads can enter the lock at the same time and modify _i, and the Monitor.Exit call throws an exception. The problem is that Monitor.Enter accepts a System.Object parameter, which is a reference type, but a value type is passed in. Even though the value is converted to a reference as required, the object passed to Monitor.Enter is not the same object that is passed to Monitor.Exit. Likewise, the object passed to Monitor.Enter on one thread is not the same as the object passed to Monitor.Enter on another thread. If we pass in a value type, there is no way to obtain correct locking semantics.
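
The failure can be reproduced on a single thread. The sketch below (BoxedLockDemo is a name made up for this example) enters a monitor on one box of an int and then tries to exit on a second box of the same int; the runtime is expected to reject the exit with a SynchronizationLockException because the second box was never locked. The lock on the first box is deliberately never released, which is harmless in a throwaway demo but is exactly the bug being illustrated.

```csharp
using System;
using System.Threading;

// Hypothetical helper class, not part of the original article.
public static class BoxedLockDemo
{
    // Returns true if Monitor.Exit rejects the second box of the same int.
    public static bool ExitThrowsOnSecondBox()
    {
        int i = 0;
        bool acquired = false;
        Monitor.Enter(i, ref acquired);   // boxes i: the lock is taken on box #1
        try
        {
            Monitor.Exit(i);              // boxes i again: box #2 was never locked
            return false;
        }
        catch (SynchronizationLockException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ExitThrowsOnSecondBox()); // prints True
    }
}
```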

Another example of why value semantics are unsuitable where an object reference is expected is returning a value type from a method. See the following code:

object GetInt()
{
    int i = 42;
    return i;
}

object obj = GetInt();

The GetInt method returns a value type, but the caller expects an object reference. The method could directly return a reference to the stack location where i is stored. Unfortunately, that would produce a reference to an invalid memory location, because the method's stack frame is reclaimed when it returns. This shows that the copy-by-value semantics of value types are not a good fit where an object reference is expected.

Virtual methods of value types

We have not even considered the method table pointer yet, and we already face insurmountable problems when trying to treat value types as first-class citizens. Now let's look at how value types deal with virtual methods and interface methods. The CLR prohibits inheritance between value types, which makes it impossible to define new virtual methods on a value type. This is fortunate: if new virtual methods could be defined on value types, calling them would require a method table pointer, which a value type instance does not have. It is not a major limitation either, because polymorphism requires object references, and the copy semantics of value types make them poorly suited to it, whereas reference types are a natural fit.

However, value types do inherit virtual methods from System.Object: Equals, GetHashCode, ToString, and Finalize. We discuss the first two here; much of the discussion applies to the other virtual methods as well. Let's look at their signatures:

public class Object
{
    public virtual bool Equals(object obj) ...
    public virtual int GetHashCode() ...
}

Every type in .NET implements these virtual methods, including value types. This means that, given a value type instance, we should be able to call these virtual methods successfully, even though the instance has no method table pointer.

This third example shows how the memory layout of a value type affects even simple operations on it, and why a value type sometimes has to be converted into something that can act as a full object.

Boxing of value types

When the language compiler detects that a value type must be treated as a reference type, it emits the box IL instruction. The JIT compiler then translates this instruction into a call to a method that allocates space on the managed heap, copies the contents of the value type instance to the heap, and wraps it with an object header (the object header word and the method table pointer). A boxing operation occurs wherever a value type needs to be used as a reference type. Note that the boxed object is detached from the original value type instance: changing one has no effect on the other.
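
A minimal sketch of that last point, with a class and method name (BoxingCopyDemo, BoxedValueAfterChange) chosen only for this illustration: the box receives a copy of the value, so later changes to the original variable do not reach it.

```csharp
using System;

// Hypothetical helper class, not part of the original article.
public static class BoxingCopyDemo
{
    public static int BoxedValueAfterChange()
    {
        int i = 42;
        object boxed = i;   // box: copies 42 into a new heap object
        i = 43;             // changes only the local variable
        return (int)boxed;  // unbox: the heap copy still holds 42
    }

    public static void Main()
    {
        Console.WriteLine(BoxedValueAfterChange()); // prints 42
    }
}
```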

.method private hidebysig static object GetInt() cil managed
{
    .maxstack 8
    L_0000: ldc.i4.s 0x2a
    L_0002: box int32
    L_0007: ret
}

Boxing is an expensive operation: it involves a memory allocation and a memory copy, and it later puts pressure on the GC, which has to reclaim the temporary boxed objects. Generics, introduced in CLR 2.0, make it possible to avoid boxing in most cases, apart from reflection and a few other rare scenarios. Even so, boxing remains a significant performance problem in many applications. As the discussion of how to use value types correctly shows, it is difficult to avoid boxing without fully understanding how method calls on value types work.
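
To make the role of generics concrete, here is a small comparison sketch (GenericsDemo and its method names are invented for this example). The non-generic ArrayList stores object references, so every int added to it is boxed and every read unboxes; List<int> stores the ints directly. Both methods compute the same sum; only the allocation behavior differs.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper class, not part of the original article.
public static class GenericsDemo
{
    public static int SumBoxed(int count)
    {
        ArrayList list = new ArrayList();
        for (int i = 0; i < count; i++) list.Add(i);   // Add(object): each int is boxed
        int sum = 0;
        foreach (object item in list) sum += (int)item; // each read unboxes
        return sum;
    }

    public static int SumGeneric(int count)
    {
        List<int> list = new List<int>();
        for (int i = 0; i < count; i++) list.Add(i);   // Add(int): no boxing
        int sum = 0;
        foreach (int item in list) sum += item;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumBoxed(100));   // prints 4950
        Console.WriteLine(SumGeneric(100)); // prints 4950
    }
}
```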

Performance aside, boxing solves some of the problems we encountered earlier. For example, the GetInt method returns a reference to a boxed object containing the value 42. This boxed object lives as long as a reference to it exists and is unaffected by the lifetime of local variables on the method's stack. Similarly, when Monitor.Enter expects an object reference, the value type is boxed at run time and the boxed object is used for synchronization. Unfortunately, boxed objects created from the same value type instance at different points in the code are not the same object. The box passed to Monitor.Exit is therefore not the box that was passed to Monitor.Enter, and the box passed to Monitor.Enter on one thread is not the box passed to Monitor.Enter on another thread. This means that using a value type for monitor-based synchronization is inherently wrong, even though the value can be boxed into a reference type.

Another key problem that remains is the virtual methods inherited from System.Object. In fact, value types do not derive directly from System.Object; instead, all value types derive from System.ValueType, which in turn derives from System.Object.

System.ValueType overrides the Equals and GetHashCode virtual methods it inherits from System.Object. This makes sense: equality of value types has different semantics from equality of reference types, and those semantics must be implemented somewhere. For example, the override of Equals in System.ValueType ensures that value types are compared by their content, whereas the Equals method in System.Object compares only whether two object references are the same.

Regardless of how System.ValueType implements these overrides, consider the following scenario. You store ten million Point2D objects in a List<Point2D> and then call the collection's Contains method to find out whether a specific Point2D exists. Contains has no choice but to perform a linear search over the ten million items, comparing each one with the object provided.

List<Point2D> polygon = new List<Point2D>();
// insert ten million points into the list
Point2D point = new Point2D { X = 5, Y = 7 };
bool contains = polygon.Contains(point);

Traversing ten million objects and comparing them one by one takes some time, but it is still a relatively fast operation. The number of bytes accessed is approximately 80,000,000 (each Point2D occupies 8 bytes), and the comparison itself is fast. However, comparing two Point2D objects requires a call to the virtual Equals method:

Point2D a = ..., b = ...;
a.Equals(b);

There are two problems. First, Equals, even the override in System.ValueType, accepts a System.Object reference as its parameter. As we have seen, treating a Point2D as an object reference requires boxing, so b must be boxed. Second, dispatching the virtual Equals method requires a to be boxed in order to obtain its method table pointer.

Note: The JIT compiler can in fact emit a direct call to Equals, because value types are sealed and the call target is determined at compile time by whether or not Point2D overrides Equals. However, System.ValueType is a reference type, so its Equals method treats its implicit this parameter as a reference as well. When the inherited Equals is called on a value type instance (a), that instance therefore still has to be boxed.

In short, leaving JIT compiler optimizations aside, every Equals call on a Point2D instance requires two boxing operations. The 10 million comparisons above therefore produce 20 million boxing operations. On a 32-bit machine each boxed object takes 16 bytes, so a total of 320,000,000 bytes is allocated and 160,000,000 bytes are copied to the managed heap. These allocations take far longer than simply comparing the two fields of a Point2D.
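
The figures follow from simple arithmetic, spelled out in the sketch below. It assumes a 32-bit process in which each box carries 8 bytes of overhead (object header word plus method table pointer) on top of the 8-byte Point2D payload, and two boxes per Equals call. The BoxingCost class and its constant names are made up for this illustration.

```csharp
using System;

// Hypothetical helper class, not part of the original article.
public static class BoxingCost
{
    public const long Comparisons = 10000000;   // Equals calls made by Contains
    public const long BoxesPerComparison = 2;   // one box for a, one for b
    public const long PayloadBytes = 8;         // two 4-byte int fields
    public const long OverheadBytes32 = 8;      // header word + method table pointer (32-bit assumption)

    public static long BoxingOperations()
    {
        return Comparisons * BoxesPerComparison;
    }

    public static long BytesAllocated()
    {
        return BoxingOperations() * (OverheadBytes32 + PayloadBytes);
    }

    public static long BytesCopied()
    {
        return BoxingOperations() * PayloadBytes;
    }

    public static void Main()
    {
        Console.WriteLine(BoxingOperations()); // prints 20000000
        Console.WriteLine(BytesAllocated());   // prints 320000000
        Console.WriteLine(BytesCopied());      // prints 160000000
    }
}
```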

Avoiding boxing when calling Equals on a value type

So how can we eliminate these boxing operations entirely? One step is to override the Equals method inherited from System.ValueType and provide equality logic suited to our own value type.

public struct Point2D
{
    public int X;
    public int Y;
    public override bool Equals(object obj)
    {
        if (!(obj is Point2D)) return false;
        Point2D other = (Point2D)obj;
        return X == other.X && Y == other.Y;
    }
}

Thanks to the JIT optimization discussed above, a.Equals(b) no longer needs to box a, but it still has to box b, because the overridden method accepts a System.Object reference. To remove this second boxing operation, we need to think outside the box and provide an overload of the Equals method:

public struct Point2D
{
    public int X;
    public int Y;
    public override bool Equals(object obj) ... // as before
    public bool Equals(Point2D other)
    {
        return X == other.X && Y == other.Y;
    }
}

Now, when the compiler encounters a.Equals(b), it prefers the second overload because its parameter type is more specific. While we are at it, there are a few more members to overload: objects are often compared with the == and != operators, so these two operators should be overloaded as well.

public struct Point2D
{
    public int X;
    public int Y;
    public override bool Equals(object obj) ... // as before
    public bool Equals(Point2D other) ... // as before
    public static bool operator ==(Point2D a, Point2D b)
    {
        return a.Equals(b);
    }
    public static bool operator !=(Point2D a, Point2D b)
    {
        return !(a == b);
    }
}

This is almost enough. One edge case has to do with the way the CLR implements generics: when List<Point2D> calls Equals on two Point2D instances, with Point2D as a realization of its generic type parameter (T), boxing still occurs. Point2D therefore also needs to implement the IEquatable<Point2D> interface, which lets List<T> and EqualityComparer<T> dispatch the call to the overloaded Equals method through the interface (at the small cost of a virtual call to the abstract EqualityComparer<T>.Equals method). The result is about 10 times faster than before, and the memory allocations caused by boxing when searching for a specific object among the 10,000,000 Point2D objects are eliminated entirely.

public struct Point2D : IEquatable<Point2D>
{
    public int X;
    public int Y;
    public bool Equals(Point2D other) ... // as before
}
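
Putting the pieces together, a complete version of the struct might look like the sketch below. The GetHashCode override is included only because overriding Equals without it draws a compiler warning; its multiplier (397) is an arbitrary choice for this example, not something the source prescribes. ContainsDemo and Search are hypothetical names used to exercise List<T>.Contains on a smaller list.

```csharp
using System;
using System.Collections.Generic;

public struct Point2D : IEquatable<Point2D>
{
    public int X;
    public int Y;

    // Strongly typed overload: called without boxing either operand.
    public bool Equals(Point2D other)
    {
        return X == other.X && Y == other.Y;
    }

    // Override for callers that only have an object reference.
    public override bool Equals(object obj)
    {
        return obj is Point2D && Equals((Point2D)obj);
    }

    // Kept consistent with Equals; the constant is an arbitrary odd multiplier.
    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }

    public static bool operator ==(Point2D a, Point2D b) { return a.Equals(b); }
    public static bool operator !=(Point2D a, Point2D b) { return !a.Equals(b); }
}

// Hypothetical helper class, not part of the original article.
public static class ContainsDemo
{
    // Fills a list with the points (0,0), (1,1), ... and searches it linearly.
    public static bool Search(int count, Point2D target)
    {
        List<Point2D> polygon = new List<Point2D>(count);
        for (int i = 0; i < count; i++) polygon.Add(new Point2D { X = i, Y = i });
        return polygon.Contains(target); // uses IEquatable<Point2D>.Equals
    }

    public static void Main()
    {
        Console.WriteLine(Search(1000, new Point2D { X = 5, Y = 5 })); // prints True
        Console.WriteLine(Search(1000, new Point2D { X = 5, Y = 7 })); // prints False
    }
}
```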

This is a good point to consider interface implementations on value types. As we saw in the previous article, a typical interface method call requires the object's method table pointer, which for a value type would require boxing. Indeed, converting a value type instance to an interface-typed variable requires boxing, because interface references are treated as reference types for all intents and purposes.

Point2D point = ...;
IEquatable<Point2D> equatable = point; // requires boxing

However, calling an interface method through a variable that is statically typed as the value type does not require boxing. As discussed above, this is a small optimization that the JIT compiler performs for us.

Point2D point = ..., anotherPoint = ...;
point.Equals(anotherPoint); // calls Point2D.Equals(Point2D), no boxing

Using value types through interfaces raises a potential problem when the value type is mutable, as Point2D is. Modifying the original does not affect the boxed copy, and vice versa, which can lead to unexpected behavior.

Point2D point = new Point2D { X = 5, Y = 7 };
Point2D anotherPoint = new Point2D { X = 6, Y = 7 };
IEquatable<Point2D> equatable = point; // boxing occurs here
equatable.Equals(anotherPoint); // false
point.X = 6;
point.Equals(anotherPoint); // true
equatable.Equals(anotherPoint); // false, the boxed copy did not change

For this reason, we strongly recommend making value types immutable and creating a new copy whenever a change is needed. System.DateTime is a typical example of an immutable value type.
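
One possible shape for such an immutable value type is sketched below. ImmutablePoint2D, WithX, and ImmutableDemo are names invented for this example. The fields are readonly and the only way to "change" a point is to build a new one, so a boxed copy can never drift out of sync with the value it was created from.

```csharp
using System;

public struct ImmutablePoint2D : IEquatable<ImmutablePoint2D>
{
    private readonly int _x;
    private readonly int _y;

    public ImmutablePoint2D(int x, int y) { _x = x; _y = y; }

    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    // "Modification" returns a new value; the current instance is untouched.
    public ImmutablePoint2D WithX(int x) { return new ImmutablePoint2D(x, _y); }

    public bool Equals(ImmutablePoint2D other) { return _x == other._x && _y == other._y; }
    public override bool Equals(object obj) { return obj is ImmutablePoint2D && Equals((ImmutablePoint2D)obj); }
    public override int GetHashCode() { unchecked { return (_x * 397) ^ _y; } }
}

// Hypothetical helper class, not part of the original article.
public static class ImmutableDemo
{
    public static bool BoxedCopyStaysConsistent()
    {
        ImmutablePoint2D point = new ImmutablePoint2D(5, 7);
        IEquatable<ImmutablePoint2D> equatable = point; // boxed copy
        ImmutablePoint2D moved = point.WithX(6);        // new value; point is unchanged
        return equatable.Equals(point)
            && !equatable.Equals(moved)
            && point.X == 5
            && moved.X == 6;
    }

    public static void Main()
    {
        Console.WriteLine(BoxedCopyStaysConsistent()); // prints True
    }
}
```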

The last issue is the actual implementation of ValueType.Equals. Comparing two arbitrary value types for equality based on their content is not trivial. The following is the Equals method of System.ValueType as shown by Reflector:

public override bool Equals(object obj)
{
    if (obj == null) return false;
    RuntimeType type = (RuntimeType) base.GetType();
    RuntimeType type2 = (RuntimeType) obj.GetType();
    if (type2 != type) return false;
    object a = this;
    if (CanCompareBits(this))
    {
        return FastEqualsCheck(a, obj);
    }
    FieldInfo[] fields = type.GetFields(BindingFlags.NonPublic |
        BindingFlags.Public | BindingFlags.Instance);
    for (int i = 0; i < fields.Length; i++)
    {
        object obj3 = ((RtFieldInfo) fields[i]).InternalGetValue(a, false);
        object obj4 = ((RtFieldInfo) fields[i]).InternalGetValue(obj, false);
        if (obj3 == null && obj4 != null)
            return false;
        else if (!obj3.Equals(obj4))
            return false;
    }
    return true;
}

In brief, if CanCompareBits returns true, the FastEqualsCheck method performs the equality comparison. Otherwise, the method uses reflection to retrieve all fields and compares them by calling Equals on each pair of field values. Needless to say, the reflection-based loop is the performance bottleneck; reflection is an extremely expensive mechanism. CanCompareBits and FastEqualsCheck are internal calls implemented inside the CLR rather than in IL, so we cannot easily inspect them, but we can infer their behavior from the results. CanCompareBits returns true if the value type's layout is tightly packed and it contains no references to other objects.

FastEqualsCheck sounds impressive, but it really just performs a memcmp, a byte-by-byte comparison of the memory occupied by the two value type instances. Both methods are internal implementation details, and given the strict conditions above, relying on them as a fast way to compare value types is not a good idea.

The GetHashCode method

The last important method to override is GetHashCode. Before showing a suitable implementation, let's briefly review what it is used for. Hash codes are most often used with hash tables, a data structure that supports insertion, lookup, and deletion in constant time on average. The most common hash table classes in the .NET Framework are Dictionary<TKey, TValue>, Hashtable, and HashSet<T>. A typical hash table consists of a dynamically sized array of buckets, each of which contains a linked list of items. To place an item in the hash table, it first calls GetHashCode to compute a numeric value, then applies a hash function to that value to determine which bucket the item maps to, and finally inserts the item into that bucket's linked list.
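
The bucket mechanism can be sketched in a few lines. The TinyHashSet class below is a toy written for this illustration, not how the framework collections are implemented: it uses a fixed number of buckets, a List<T> per bucket instead of a linked list, a simple modulo as the mapping function, and it assumes items are never null.

```csharp
using System;
using System.Collections.Generic;

// Toy hash set for illustration only; all names here are hypothetical.
public sealed class TinyHashSet<T>
{
    private readonly List<T>[] _buckets;

    public TinyHashSet(int bucketCount)
    {
        _buckets = new List<T>[bucketCount];
    }

    // Step 1: GetHashCode produces a number. Step 2: map it to a bucket index.
    private int BucketIndex(T item)
    {
        int hash = item.GetHashCode() & 0x7FFFFFFF; // clear the sign bit
        return hash % _buckets.Length;
    }

    // Step 3: insert into that bucket's chain unless an equal item is already there.
    public bool Add(T item)
    {
        int index = BucketIndex(item);
        if (_buckets[index] == null) _buckets[index] = new List<T>();
        if (_buckets[index].Contains(item)) return false;
        _buckets[index].Add(item);
        return true;
    }

    // Lookup only scans the one bucket the hash code points to.
    public bool Contains(T item)
    {
        List<T> bucket = _buckets[BucketIndex(item)];
        return bucket != null && bucket.Contains(item);
    }
}

public static class TinyHashSetDemo
{
    public static void Main()
    {
        TinyHashSet<string> names = new TinyHashSet<string>(4);
        Console.WriteLine(names.Add("a"));      // prints True
        Console.WriteLine(names.Add("a"));      // prints False
        Console.WriteLine(names.Contains("b")); // prints False
    }
}
```

Because lookup inspects a single bucket, an item whose hash code changes after insertion ends up being searched for in the wrong bucket, which is why the stability requirement matters.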

The performance of a hash table depends heavily on the hash function used by its implementation, which in turn requires GetHashCode to satisfy the following properties:

  1. If two objects are equal, their hash codes must be equal.
  2. If two objects are not equal, their hash codes should be unlikely to be equal.
  3. GetHashCode must be fast, although its cost is often linear in the size of the object.
  4. An object's hash code should not change.

A typical implementation of GetHashCode relies on the object's fields. For example, a good implementation of GetHashCode for int is simply to return the int's value. For a Point2D object, we might use a linear combination of the two coordinates, or take some bits from each coordinate and combine them. Designing a good general-purpose hash code is difficult.
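
As a small illustration of why the combination matters, the sketch below compares two candidate functions for a pair of int coordinates. Both are assumptions made for this example rather than the approach the source recommends: a plain XOR, which gives (5, 7) and (7, 5) the same hash code, and a version that multiplies the first coordinate by an arbitrary odd constant (397) before the XOR, which separates them.

```csharp
using System;

// Hypothetical helper class, not part of the original article.
public static class HashSketch
{
    // Symmetric: swapping x and y always produces the same hash code.
    public static int NaiveHash(int x, int y)
    {
        return x ^ y;
    }

    // Breaks the symmetry by scaling x first; unchecked lets the multiply wrap.
    public static int CombinedHash(int x, int y)
    {
        unchecked { return (x * 397) ^ y; }
    }

    public static void Main()
    {
        Console.WriteLine(NaiveHash(5, 7) == NaiveHash(7, 5));       // prints True
        Console.WriteLine(CombinedHash(5, 7) == CombinedHash(7, 5)); // prints False
    }
}
```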

The hash code should not change. Suppose a point (5, 5) is stored in a hash table and its hash code is 10. If you change the point to (6, 6), its hash code becomes 12, and you can no longer find the previously inserted point because the hash code has changed. This is not a problem for value types, however, because we cannot modify an object that has already been inserted into the hash table: the hash table stores a copy that our code cannot access.
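
The copy behavior can be observed directly. In this sketch (CopyInSetDemo is a made-up name, and Point2D is the plain two-field struct relying on the default ValueType equality), mutating the local variable after insertion leaves the stored copy untouched, so the original coordinates are still found.

```csharp
using System;
using System.Collections.Generic;

public struct Point2D
{
    public int X;
    public int Y;
}

// Hypothetical helper class, not part of the original article.
public static class CopyInSetDemo
{
    public static bool StoredCopyIsUnaffected()
    {
        HashSet<Point2D> points = new HashSet<Point2D>();
        Point2D point = new Point2D { X = 5, Y = 5 };
        points.Add(point);   // the set stores its own copy of the value
        point.X = 6;         // changes only the local variable
        point.Y = 6;
        return points.Contains(new Point2D { X = 5, Y = 5 })
            && !points.Contains(point);
    }

    public static void Main()
    {
        Console.WriteLine(StoredCopyIsUnaffected()); // prints True
    }
}
```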

What about reference types? For a reference type, equality is often based on content. Consider the following implementation of GetHashCode:

public class Employee
{
    public string Name { get; set; }
    public override int GetHashCode()
    {
        return Name.GetHashCode();
    }
}

This seems like a good idea: the hash code is based on the object's content, and we use String.GetHashCode so that we do not have to implement a good hash function for strings ourselves. But consider what happens if we change this field after inserting the object into a hash table:

HashSet<Employee> employees = new HashSet<Employee>();
Employee kate = new Employee { Name = "Kate Jones" };
employees.Add(kate);
kate.Name = "Kate Jones-Smith";
employees.Contains(kate); // false!

The object's hash code has changed because its content changed, and we can no longer find the object in the hash table. This is perhaps to be expected, but the real problem is that we can no longer remove Kate from the hash table at all, even though we still hold the original object.

The CLR provides a default GetHashCode implementation for reference types that is based on object identity as the equality criterion. If two object references are equal if and only if they refer to the same object, it makes sense to store the hash code in the object itself, where it can never be modified and is easily accessible. Indeed, when a reference type instance is created, the CLR can embed its hash code in the object header word (as an optimization, this is done only when the hash code is first requested; after all, most objects are never used as hash table keys). Computing the hash code requires neither random number generation nor inspection of the object's content; a simple counter would do.
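
A quick way to see the identity-based behavior is to mutate an object that does not override GetHashCode and check that its hash code stays the same. Node and IdentityHashDemo are hypothetical names for this sketch; RuntimeHelpers.GetHashCode is the framework method that returns the identity-based hash code regardless of overrides.

```csharp
using System;
using System.Runtime.CompilerServices;

// A class that does not override Equals or GetHashCode.
public class Node
{
    public int Value;
}

// Hypothetical helper class, not part of the original article.
public static class IdentityHashDemo
{
    public static bool HashSurvivesMutation()
    {
        Node node = new Node();
        int before = node.GetHashCode();   // default, identity-based hash code
        node.Value = 5;                    // content changes...
        return before == node.GetHashCode()                 // ...the hash code does not
            && before == RuntimeHelpers.GetHashCode(node);  // same identity-based value
    }

    public static void Main()
    {
        Console.WriteLine(HashSurvivesMutation()); // prints True
    }
}
```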

Note: How does the hash code coexist with the sync block index in the object header word? As we saw earlier, most objects never use their header word to store a sync block index, because they are never used for synchronization. In the rare case that an object is used for synchronization and its header word must store a sync block index, the hash code is copied into the sync block and stays there until the sync block is detached from the object. To determine whether the header word currently holds the hash code or the sync block index, one of its bits is used as a marker.

Reference types that use the default Equals and GetHashCode implementations do not need to worry about the four properties above; they get them for free. However, if a reference type overrides the default equality behavior and is to be used as a key in a hash table, it should be immutable.
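
One way to follow that advice for the Employee example is sketched below. This is an assumed redesign, not code from the source: the name is set once in the constructor, content-based Equals and GetHashCode are kept consistent, and a hypothetical WithName method returns a new object instead of mutating the one that may already be a hash table key.

```csharp
using System;
using System.Collections.Generic;

public sealed class Employee : IEquatable<Employee>
{
    private readonly string _name;

    public Employee(string name)
    {
        if (name == null) throw new ArgumentNullException("name");
        _name = name;
    }

    public string Name { get { return _name; } }

    // A "rename" produces a new object; existing keys keep their hash code.
    public Employee WithName(string name) { return new Employee(name); }

    public bool Equals(Employee other) { return other != null && _name == other._name; }
    public override bool Equals(object obj) { return Equals(obj as Employee); }
    public override int GetHashCode() { return _name.GetHashCode(); }
}

// Hypothetical helper class, not part of the original article.
public static class EmployeeDemo
{
    public static bool KeyRemainsUsable()
    {
        HashSet<Employee> employees = new HashSet<Employee>();
        Employee kate = new Employee("Kate Jones");
        employees.Add(kate);
        Employee renamed = kate.WithName("Kate Jones-Smith");
        return employees.Contains(kate)       // the original key is still found
            && !employees.Contains(renamed)   // the renamed object is a different key
            && employees.Remove(kate);        // and the original can be removed
    }

    public static void Main()
    {
        Console.WriteLine(KeyRemainsUsable()); // prints True
    }
}
```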

 

Precautions for using value types

Following the discussion above, CLR via C# suggests using a value type only if all of the following conditions are met:

  • The type behaves like a primitive type: it is fairly simple, no member modifies its instance fields, and it offers no way to change them, so the type is immutable.
  • The type does not need to inherit from any other type, and no other type needs to derive from it.

In addition, because value type instances are copied when they are passed around, one of the following conditions must also be met once the two above are satisfied:

  • Instances are small (16 bytes or fewer).
  • Instances are large (more than 16 bytes) but are not passed as method parameters or returned from methods.

Based on the analysis in this article, value types are also worth considering in the following situations:

  • If the objects are small and a large number of them will be created, a value type should be used.
  • If a collection with high memory density is required, a value type should be used.

If you use a value type, keep the following points in mind:

  • A custom value type should override Equals, overload Equals, implement the IEquatable<T> interface, and overload the == and != operators.
  • A custom value type should override GetHashCode.
  • A value type should be immutable; to change it, create a new copy.

Conclusion

We analyzed the memory layout of value types and reference types and how these details affect program performance. Value types offer excellent memory density, which makes them a good fit for large collections, but they lack the polymorphism and synchronization support of reference types. The CLR provides both kinds of types so that we can improve application performance as needed, but implementing a value type correctly still takes care.
