Value types, reference types, the stack, the heap, ref, and out in C#, illustrated

Last Update:2018-12-06 Source: Internet

Author: User



The C# type system is divided into value types and reference types, as every C# programmer knows. The managed heap, the stack, ref, and out are likewise concepts that every C# programmer runs into, and they come up constantly in interviews; a casual search turns up countless questions about them. I had not yet written an article on value types and reference types, and a C# programmer who has never blogged about them hardly counts as a good one, so here I try to explain the related concepts thoroughly.

Principle of Program Execution

It is not easy to fully understand these concepts and how they relate, because most C# programmers either do not understand the managed heap ("heap" for short) and the thread stack ("stack" for short), or know of them only superficially: they know that reference types are stored in the managed heap and that value types are "usually" stored in the stack. To see how the concepts relate, I think we must first understand the basic principle of program execution, and from it the roles of the stack and the managed heap. Consider the following code, in which Main calls Method1 and Method1 calls Method2:

    using System;

    class Program
    {
        static void Main(string[] args)
        {
            var num = 120;
            Method1(num);
        }

        static void Method1(int num)
        {
            var num2 = num + 250;
            Method2(num2);
            Console.WriteLine(num);
        }

        static void Method2(int i)
        {
            Console.WriteLine(i);
        }
    }

As we all know, Windows programs usually have multiple threads; multithreading is not considered here. The program starts executing at the Main method, and the main thread is allocated a 1 MB thread stack that belongs to it alone. This 1 MB of stack space is used to pass parameters to methods and to hold local variables. So just before Main enters Method1, the "memory diagram" shows a single entry: num has been pushed onto the thread stack.

Next, num is passed as a parameter to Method1, which defines a local variable num2 and computes its value with an addition. So just before Method2 is entered, the "memory diagram" has two more entries on top of Main's num: num, which is a parameter, and num2, which is a local variable.

Calling Method2 works the same way. When Method2 exits, the stack returns to the state just described; when Method1 exits, it returns to the first state; then the program exits. Over the whole process the stack grows with each call and shrinks with each return.

So if we set aside concepts such as if, for, and multithreading, and keep only what relates to object memory allocation, program execution can be summarized as follows:

The program starts at the Main method, repeats "define local variables, call methods (possibly passing parameters), return from methods", and finally exits from Main. During execution, parameters and local variables are continually pushed onto the thread stack and popped off it.

Note: Other things are pushed onto the stack as well, such as a method's return address; they are ignored here.
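The push and pop sequence described above can be sketched with an ordinary Stack<string> standing in for the thread stack. This is only a model under stated assumptions: the names StackSketch, Push, Pop, Run, and MaxDepth are hypothetical, each entry represents one parameter or local variable, and return addresses are ignored just as in the text.

```csharp
using System;
using System.Collections.Generic;

static class StackSketch
{
    // Hypothetical model of the thread stack: one entry per parameter or local variable.
    static readonly Stack<string> ThreadStack = new Stack<string>();

    // Deepest point the model stack reached during the last Run.
    public static int MaxDepth;

    static void Push(string slot)
    {
        ThreadStack.Push(slot);
        MaxDepth = Math.Max(MaxDepth, ThreadStack.Count);
        Console.WriteLine($"push {slot}, depth = {ThreadStack.Count}");
    }

    static void Pop(int count)
    {
        for (var i = 0; i < count; i++)
        {
            var slot = ThreadStack.Pop();
            Console.WriteLine($"pop  {slot}, depth = {ThreadStack.Count}");
        }
    }

    // Mirrors Main -> Method1 -> Method2 from the article; returns the final depth.
    public static int Run()
    {
        ThreadStack.Clear();
        MaxDepth = 0;
        Push("Main.num = 120");          // local variable of Main
        Method1(120);
        Pop(1);                          // Main returns
        return ThreadStack.Count;
    }

    static void Method1(int num)
    {
        Push($"Method1.num = {num}");    // parameter: a copy of Main's num
        var num2 = num + 250;
        Push($"Method1.num2 = {num2}");  // local variable
        Method2(num2);
        Pop(2);                          // Method1 returns
    }

    static void Method2(int i)
    {
        Push($"Method2.i = {i}");        // parameter: a copy of num2
        Pop(1);                          // Method2 returns
    }

    static void Main() => Run();
}
```

The model stack is deepest while Method2 runs and is empty again once Main has returned, which is the growth and shrinkage the section describes.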

Reference Type and heap

The example above uses only the simple value type int, so as to focus on how the thread stack grows (push) and shrinks (pop). C# obviously has reference types as well, so let us bring one in, keep the same question in mind, and look at the following code:

    static void Main(string[] args)
    {
        var user = new User { age = 15 };
        var num = 23;
        Console.WriteLine(user.age);
        Console.WriteLine(num);
    }

    class User
    {
        public int age;
    }

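Many people will say that the managed heap should be introduced at this point, but I want to keep looking at the problem from the stack's perspective, as above. Just before the WriteLine calls, the "memory diagram" looks like this (the address is made up): on the stack, the slot for user holds an address such as 0x0612ecb4 and the slot for num holds 23, while the User instance with age = 15 sits in the managed heap at that address.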

This is the familiar statement: for a reference type, the stack stores the address (pointer, reference) of the instance object in the heap. Since it is only an address, getting at the object's data takes one extra step: following the address to find the object. Console.WriteLine(num) simply takes the value of num from the stack and hands it to WriteLine. Console.WriteLine(user.age) takes two steps at run time: read the address from the stack, then look up the field (or method) of the instance object in the managed heap at that address. Decompiling the Main method above to IL and deleting the irrelevant code gives:

    // Load local 0 => get local variable 0 (an address)
    IL_0012: ldloc.0
    // Load field => push the value of the field in the specified object onto the stack
    IL_0013: ldfld int32 CILDemo.Program/User::age
    IL_0018: call void [mscorlib]System.Console::WriteLine(int32)

    // Load local 1 => get local variable 1 (a value)
    IL_001e: ldloc.1
    IL_001f: call void [mscorlib]System.Console::WriteLine(int32)

Before the second WriteLine call, a single ldloc.1 (load local 1) instruction reads local variable 1 and gets the value for WriteLine. Before the first WriteLine call, two instructions are needed; that is, it takes two steps.

Of course, all of this is transparent to us, so many people like to draw a simplified picture that hides the address. After all, we never perceive that the address 0x0612ecb4 exists.

Another way to put it: a reference type is stored in two parts. One is its value (the instance object) in the managed heap; the other is the variable that holds the reference to it. For a local variable (or parameter) the reference is in the stack; for a field of a type, the reference lives with the object that contains the field.
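The two places a reference can live can be sketched as follows. The Team type and the method name SameInstance are hypothetical, introduced only for illustration; the comments describe the article's memory model, not a guarantee about what the JIT does with a particular local.

```csharp
using System;

class User
{
    public int Age;
}

// Hypothetical type whose field holds a reference to a User.
class Team
{
    public User Leader;
}

static class ReferenceSketch
{
    public static bool SameInstance()
    {
        // Local variable: in the article's model, this reference sits in the stack.
        var user = new User { Age = 15 };

        // Field: a copy of the same reference is stored inside the Team object.
        var team = new Team { Leader = user };

        // Writing through the field's reference changes the one shared instance.
        team.Leader.Age = 16;

        return ReferenceEquals(user, team.Leader) && user.Age == 16;
    }

    static void Main()
    {
        Console.WriteLine(SameInstance());
    }
}
```

Both the local and the field hold only a reference; there is a single User instance, so a write through either one is visible through the other.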

Fields and local variables (parameters)

As the memory diagram above shows, the value of age is stored in the managed heap, not in the stack. This exposes a mistake many C# beginners make: believing that values of value types are always stored in the stack.

They do not realize that this conclusion holds only in the specific scenario we used when discussing how a program runs: local variables (parameters) being pushed onto and popped off the stack. To be clear: just as the code above defines an int local variable num to store the value 23, we can also define an int field age in a type to store an integer. That age is obviously not stored in the stack. So the conclusion should be: a value of a value type is stored where it is declared. As a local variable (parameter) it is in the stack; as a member of a type it lives with the object.

Of course, the value of a reference type (the instance object) is always in the managed heap. That conclusion is correct.
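A small sketch of "a value is stored where it is declared", assuming a hypothetical struct Point and a hypothetical class Holder that has a Point field. The field's value travels with the Holder object, while copying it into a local produces an independent value.

```csharp
using System;

// Hypothetical value type.
struct Point
{
    public int X;
}

// Hypothetical reference type with a value-type field.
class Holder
{
    public Point P;
}

static class FieldSketch
{
    public static (int InObject, int InLocalCopy) Run()
    {
        var holder = new Holder();
        holder.P.X = 1;         // the Point is stored inside the Holder object

        var alias = holder;     // copies the reference: still one Holder object
        alias.P.X = 5;          // changes the Point stored inside that object

        var copy = holder.P;    // copies the value out of the object into a local
        copy.X = 99;            // changes only the local copy

        return (holder.P.X, copy.X);
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine($"{result.InObject} {result.InLocalCopy}");
    }
}
```

The write through alias reaches the field because the field lives with the object; the write to copy does not, because the local holds its own value.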

Ref and out

C# distinguishes value types from reference types, and the ref and out keywords for passing parameters blur people's understanding further. To sort this out, we again look at it from the stack's perspective, in four cases: passing a value type normally, passing a reference type normally, passing a value type with ref (out), and passing a reference type with ref (out).

Note: To the runtime, ref and out are the same. They differ only in what the C# compiler requires: a ref argument must be initialized before the call, while an out argument need not be. Because an out argument may be uninitialized, the called method cannot read the out parameter before assigning it, and it must assign a value before returning.
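The compiler rules in the note can be seen in a minimal sketch. The method names AddTen and TryHalve are hypothetical; the commented-out line marks the read that the compiler would reject.

```csharp
using System;

static class RefOutSketch
{
    // ref: the caller must initialize the argument; the method may read and write it.
    public static void AddTen(ref int value)
    {
        value = value + 10;
    }

    // out: the method must assign the parameter before returning,
    // and may not read it before that assignment.
    public static bool TryHalve(int input, out int half)
    {
        // Console.WriteLine(half);  // would not compile: half is not yet assigned
        half = input / 2;
        return input % 2 == 0;
    }

    static void Main()
    {
        int a = 5;                      // must be assigned before being passed by ref
        AddTen(ref a);

        int b;                          // may be left unassigned before being passed as out
        bool even = TryHalve(8, out b);

        Console.WriteLine($"{a} {b} {even}");
    }
}
```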

Passing a value type normally

    static void Main(string[] args)
    {
        var num = 120;
        Method1(num);
        Console.WriteLine(num); // output => 120
    }

    static void Method1(int num)
    {
        Console.WriteLine(num);
        num = 180;
    }

Everyone knows this scenario: the assignment in Method1 has no effect on Main. Drawn as a picture, it resembles the second memory diagram above, with Main's num and Method1's parameter num in separate stack slots.

That is, passing the argument copies the value in the stack into Method1's num parameter. Method1 operates on its own parameter and has no effect on Main's local variable; it does not touch the data in Main's part of the stack.

Passing a reference type normally

    static void Main(string[] args)
    {
        var user = new User();
        user.Age = 15;
        Method2(user);
        Debug.Assert(user != null); // Debug lives in System.Diagnostics
        Console.WriteLine(user.Age); // output => 18
    }

    static void Method2(User user)
    {
        user.Age = 18;
        user = null;
    }

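Note the Method2 code: setting Age to 18 affects Main's user, but setting user to null does not. To analyze this we again look at the stack. The stack diagrams go through three stages (the address is made up). First, before the call, Main's user holds the address of the User object. Second, on entering Method2, its parameter user holds a copy of that same address. Third, after user = null, Method2's user holds null while Main's user still holds the address.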

The second diagram makes the point: whether the type is a value type or a reference type, passing a parameter normally copies the value in the stack into the parameter. From the stack's perspective, C# passes parameters by value by default.

Since both are "pass by value", why can a reference type parameter affect the caller's local variable from inside the called method when a value type parameter cannot? On reflection, the difference is not caused by how the parameter is passed but by what value types and reference types store in a local variable (parameter). Main's local variable user and Method2's parameter user are stored separately in the stack, and the data in those two slots (addresses, pointers, references) do not affect each other, but both point to the same instance object in the managed heap. user.Age = 18 operates on the instance object in the managed heap, not on the data in the stack. In the value type example, num = 180 operates on data in the stack, whereas user.Age = 18 operates on the managed heap; that is why the two behave differently.

That user = null has no effect on Main's local variable is easy to see from the third diagram. Unlike user.Age = 18, user = null sets the data (address, pointer, reference) in Method2's own stack slot to null, so it does not affect Main's user.

For reference types, user = null, user = new User(), and user1 = user2 all change the data (address, pointer, reference) in the stack: the first sets it to null, the second stores a new address, and the third copies stack data, just like the argument passing above.
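The three kinds of assignment can be traced in a short sketch. The class name AssignmentSketch and the method Run are hypothetical; the comments follow the article's stack model.

```csharp
using System;

class User
{
    public int Age;
}

static class AssignmentSketch
{
    public static (bool Shared, int Age, bool User1StillSet) Run()
    {
        User user1 = new User { Age = 15 };  // the variable receives a new reference
        User user2 = user1;                  // the reference is copied: still one object

        user2.Age = 18;                      // goes through the reference to the shared object
        bool shared = ReferenceEquals(user1, user2);

        user2 = null;                        // clears only user2's copy of the reference

        return (shared, user1.Age, user1 != null);
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine($"{result.Shared} {result.Age} {result.User1StillSet}");
    }
}
```

Assigning to the variable itself changes only that variable's slot, while assigning through the variable to a field changes the object both variables refer to.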

Passing a value type with ref (out)

    static void Main(string[] args)
    {
        var num = 10;
        Method1(num);
        Console.WriteLine(num); // output => 10
        Method3(ref num);
        Console.WriteLine(num); // output => 28
    }

    static void Method1(int num)
    {
        Console.WriteLine(num);
        num = 18;
    }

    static void Method3(ref int num)
    {
        Console.WriteLine(num);
        num = 28;
    }

The code is simple and the output should be no surprise. Using ref looks simple and ordinary, but C# is doing a good deal of work for us. Drawn as a picture, the "stack diagram" is as follows (the address is made up): Main's num holds the value, and Method3's num holds the address of Main's num in the stack.

Many people will be puzzled by this picture. Method3's parameter is clearly written as an int named num, so how can the stack hold a pointer (address, reference)? This is a bit of "deception" by C#. Let's look at the decompiled IL:

We can see that Method3's parameter, declared with ref (out), is compiled differently: in IL its type is int32& (a managed pointer to an int32) rather than int32. Now look at the IL that reads the parameter's value inside each method:

    // Method1
    // Load arg 0 => read the parameter at index 0, which is directly a value
    IL_0001: ldarg.0

    // Method3
    // Load arg 0 => read the parameter at index 0, which is an address
    IL_0001: ldarg.0
    // Load indirect => load the int32 at that address onto the stack as an int32
    IL_0002: ldind.i4

As you can see, both methods fetch the parameter's value for WriteLine, but Method1 needs only one instruction while Method3 needs two: an extra step to find the value at the address. It is easy to guess that assignment shows the same difference:

    // Method1
    // Push 18 onto the stack
    IL_0008: ldc.i4.s 18
    // Store arg => assign the value to the parameter variable num
    IL_000a: starg.s num

    // Method3
    // Load arg 0 => read the parameter at index 0, which is an address
    IL_0009: ldarg.0
    // Push 28 onto the stack
    IL_000a: ldc.i4.s 28
    // Store indirect => store the int32 value at the given address
    IL_000c: stind.i4

So although both are plain assignment statements of the form num = ..., with no ref (out) keyword at the assignment itself, what happens at run time differs. As with reading the value, the method whose parameter has ref (out) uses an instruction that goes to the given address and then operates there (here, an assignment).

The point to take away: once ref (out) is added to a parameter, the parameter is passed by reference, and what is passed is the address (pointer, reference) of a stack slot. Otherwise, passing is the normal copying of stack data.
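A classic way to see the difference is a swap. The sketch below is an illustration with hypothetical names (SwapSketch, SwapByValue, SwapByRef), not code from the original article.

```csharp
using System;

static class SwapSketch
{
    // Parameters are copies of the caller's values: swapping them changes nothing outside.
    public static void SwapByValue(int a, int b)
    {
        var temp = a;
        a = b;
        b = temp;
    }

    // Parameters are addresses of the caller's variables: the swap is visible to the caller.
    public static void SwapByRef(ref int a, ref int b)
    {
        var temp = a;
        a = b;
        b = temp;
    }

    static void Main()
    {
        int x = 1, y = 2;

        SwapByValue(x, y);
        Console.WriteLine($"{x} {y}");  // x and y are unchanged here

        SwapByRef(ref x, ref y);
        Console.WriteLine($"{x} {y}");  // x and y have traded values
    }
}
```

The two method bodies are textually identical; only the parameter declarations differ, which matches the observation that the compiler emits different IL for the same-looking assignment.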

Passing a reference type with ref (out)

What is special about a reference type parameter with ref (out)? From the stack's perspective, nothing: as with a value type, what is passed is the address of the caller's stack slot.

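In my opinion, adding ref (out) to a reference type parameter is rarely needed. It only matters when the called method has to replace the caller's reference itself, for example by assigning a new object or null to the parameter.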
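Following the stack view used throughout, a ref parameter of a reference type carries the address of the caller's variable, so assigning to the parameter itself is visible to the caller. A minimal sketch, with hypothetical method names ReplaceByValue and ReplaceByRef:

```csharp
using System;

class User
{
    public int Age;
}

static class RefReferenceSketch
{
    // The parameter is a copy of the caller's reference: reassigning it is local.
    public static void ReplaceByValue(User user)
    {
        user = new User { Age = 30 };
    }

    // The parameter is the address of the caller's variable: reassigning it
    // changes which object the caller's variable refers to.
    public static void ReplaceByRef(ref User user)
    {
        user = new User { Age = 30 };
    }

    static void Main()
    {
        var user = new User { Age = 15 };
        var original = user;

        ReplaceByValue(user);
        Console.WriteLine(ReferenceEquals(user, original));  // caller still has the original object

        ReplaceByRef(ref user);
        Console.WriteLine(ReferenceEquals(user, original));  // caller now has the new object
    }
}
```

This is the one situation where ref changes the outcome for a reference type: when the method assigns to the parameter itself rather than to a member of the object.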

Summary

To make sense of all these concepts, first understand the basic principle of program execution, which here is just the process of the stack growing and shrinking. Once that is clear, learn to think about problems from the stack's perspective, and many things fall into place. Why the names "value" type and "reference" type? The "value" and the "reference" are what you see from the stack: for a value type the data in the stack is the value itself, while for a reference type it is only an address (pointer, reference). Remember that a variable can be not only a local variable (parameter) but also a field of a type; once you know that, questions like "where are value type objects stored?" answer themselves. Finally, C# passes parameters by value by default, that is, it copies the data in the stack to the parameter, exactly as when one variable is assigned to another of the same type within a method. As for the apparent magic of ref (out), C# simply does more work behind the scenes and compiles the code to different IL.

Reference: CLR via C#

This article is an English version of an article originally written in Chinese on aliyun.com.
