A deep dive into C# memory management: value types vs. reference types, boxing and unboxing, and the stack and the heap
C# beginners are often asked about a handful of related concepts: value types and reference types, boxing and unboxing, the stack and the heap, and how they differ. Reading this article should clear up those doubts.
As the saying goes, those who program with ideas are artistic coders, those who program from experience are ordinary coders, and those who program by copy and paste are hopeless coders. Just kidding ^_^.
Anyone who has been through a C# interview will recognize the following sentence:
A value type stores its value directly, while a reference type stores a reference to the value; value types live on the stack, while reference types are stored on the managed heap; converting a value type to a reference type is called boxing, and converting a reference type back to a value type is called unboxing.
But it is not enough just to recite this sentence.
C# programmers don't have to manage memory manually, but to write efficient code they still need to understand what is happening behind the scenes.
At school, the phrase my teachers used most often was "the concepts are unclear". The simplest example: I had memorized every calculus formula and plugged problems into them, yet there were always problems I could not solve, because I did not know how the formulas were derived and the underlying principles were not clear to me.
(Some people have died, yet they still make life hard for the living: Newton and Leibniz =.=)
That was a bit of a digression. Now let's talk about how the C# stack and the managed heap work, and look inside memory to understand these basic C# concepts.
1. Stack and heap in different contexts
In C++:
The stack is the stack area, which is allocated and released automatically by the compiler. It stores function parameter values, local variable values, and so on.
The heap is the heap area, which is allocated and released by the programmer. If the programmer does not release it, the OS may reclaim it when the program ends.
In C#:
"Stack" refers to the stack and "heap" refers to the managed heap. The concepts differ slightly from language to language. (If I have this wrong, please correct me.)
The most important thing to understand here is that, when talking about a language, stack and heap refer to regions of memory. They are different from the stack data structure (a last-in, first-out linear list) and the heap data structure (a kind of ordered binary tree).
Before discussing a concept, you must first state the context it belongs to.
Unless stated otherwise, "stack" in this article means the memory stack, and "heap" means the managed heap.
2. How the C# stack works
Windows uses a virtual addressing system that maps the memory addresses visible to a program onto actual addresses in hardware memory. As a result, each process on a 32-bit processor can use 4GB of memory, no matter how much physical memory the computer has (the number is larger on a 64-bit processor). This 4GB contains every part of the program: executable code, loaded DLLs, and all variables. This 4GB of memory is called virtual memory.
Each storage location in the 4GB is numbered starting from 0. To access a value stored at some location in memory, you supply the number of that location. In high-level languages, the compiler translates names that we can understand into memory addresses that the processor can understand.
Within a process's virtual memory there is a region called the stack, which stores value-type data that is not a member of an object. In addition, when a method is called, the stack holds a copy of every parameter passed to the method.
Consider variable scope in C#: if variable a comes into scope before variable b, then b goes out of scope first. Look at the following example:
    {
        int a;
        // do something
        {
            int b;
            // do something
        }
    }
After a is declared, b is declared in the inner code block. Then the inner block ends and b goes out of scope, and after that a goes out of scope. Variables are always released in the reverse of the order in which memory was allocated to them. Doesn't that remind you of the stack data structure (LIFO, last in, first out)? That is how the stack works.
We don't know exactly where the stack sits in the address space, and C# development does not require us to know.
The stack pointer is a variable maintained by the operating system that holds the address of the next free location on the stack. When the program first starts running, the stack pointer points to the end of the block of memory reserved for the stack.
The stack fills downward, that is, from high addresses to low addresses. When data is pushed onto the stack, the stack pointer is adjusted so that it still points to the next free location. Here is an example.
Suppose the stack pointer is 800000 and the next free location is 799999. The following code tells the compiler that storage is needed for an integer and a double-precision floating-point number.
    {
        int a = 1;
        double b = 1.1;
        // do something
    }
Both are value types, so they are stored on the stack. Once a is declared and assigned the value 1, it comes into scope. An int needs 4 bytes, so a is stored at 799996~799999. The stack pointer is then decreased by 4 and points to 799996, the end of the newly used space, and the next free location is 799995. The next line declares b and assigns it 1.1. A double needs 8 bytes, so it is stored at 799988~799995 and the stack pointer is decreased by 8.
When b goes out of scope, the computer knows that the variable is no longer needed. Variable lifetimes are always nested, so whatever happens while b is in scope, the stack pointer is guaranteed to point to the space where b is stored by the time b goes out of scope.
When b is removed, the stack pointer is increased by 8, so it points back to where it was before b was allocated. At the closing curly brace a also goes out of scope, and the stack pointer is increased by another 4.
At this point, if a new variable is placed on the stack, the storage locations from 799999 downward will be overwritten.
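The arithmetic in this walkthrough can be replayed with a small toy model. This is only a sketch for illustration: the names StackPointerSketch, Push and Pop are made up here, the starting value 800000 is the assumed figure from the example, and the real runtime does not expose its stack pointer like this.

```csharp
using System;

// Toy model of the walkthrough above; this is not how the CLR is implemented.
public static class StackPointerSketch
{
    // Hypothetical starting value taken from the example in the text.
    public static int StackPointer = 800000;

    // Reserve `size` bytes by moving the pointer toward lower addresses.
    public static int Push(int size)
    {
        StackPointer -= size;
        return StackPointer; // lowest address of the newly reserved block
    }

    // Release `size` bytes by moving the pointer back up.
    public static void Pop(int size)
    {
        StackPointer += size;
    }

    public static void Main()
    {
        int a = Push(sizeof(int));      // a occupies 799996..799999
        int b = Push(sizeof(double));   // b occupies 799988..799995
        Console.WriteLine($"a at {a}, b at {b}, pointer = {StackPointer}");

        Pop(sizeof(double));            // b goes out of scope first
        Pop(sizeof(int));               // then a
        Console.WriteLine($"pointer = {StackPointer}");
    }
}
```

Releasing in the reverse order of allocation is what brings the pointer back to 800000, which is why nested lifetimes matter.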
3. How the managed heap works
The stack performs very well, but it requires variable lifetimes to be nested (last in, first out), and in many cases that requirement is too strict. Often we want a method to allocate memory for some data and have that data remain available long after the method has exited. This possibility exists whenever storage is requested with the new operator, as is the case for all reference types. This is where the managed heap comes in.
Readers who have written C++ code that manages memory at a low level will be familiar with the heap. The managed heap is different from the heap used by C++: it works under the control of the garbage collector, and it can offer significant performance benefits compared with a traditional heap.
The managed heap is another region within the 4GB available to a process. We will use an example to see how the managed heap works and how memory is allocated for reference data types. Suppose we have a Customer class.
    1 void DoSomething()
    2 {
    3     Customer john;
    4     john = new Customer();
    5 }
Line 3 declares a Customer reference named john, and storage for the reference is allocated on the stack. But this is only a reference, not the actual Customer object. The john reference holds the address where a Customer object is stored. On a 32-bit system it takes 4 bytes to store an address between 0 and 4GB as an integer, so the john reference takes up 4 bytes.
Line 4 first allocates memory on the managed heap to store the Customer instance, and then sets the value of the variable john to the memory address assigned to the Customer object.
Customer is a reference type, so the instance is placed on the managed heap. For ease of discussion, assume the Customer object occupies 32 bytes, which includes its instance fields and some information that .NET uses to identify and manage its class instances. To find a place on the managed heap to store the new Customer object, the .NET runtime looks through the heap for a contiguous, unused 32-byte block. Assume its starting address is 200000.
The john reference occupies locations 799996~799999 on the stack. Before the Customer object is instantiated, that reference is the only storage this code has used.
After space has been allocated for the Customer object, the object occupies 32 bytes on the heap starting at 200000, and john holds that address. Unlike the stack, memory on the heap is allocated upward, so all free space lies above the used space.
As the example shows, setting up a reference variable is considerably more complicated than setting up a value variable, and some performance overhead cannot be avoided. The .NET runtime has to maintain information about the state of the heap, and that information must be updated whenever new data is added to the heap (this comes up again in the discussion of heap garbage collection). Despite the overhead, this gives us a mechanism for allocating memory to a variable without being constrained by the stack:
When the value of a reference variable a is assigned to another variable b of the same type, both reference variables refer to the same object. When b goes out of scope, it is removed from the stack, but the object it refers to stays on the heap because variable a still references it. The object's data is removed only when no variable references it any longer.
This is the power of reference data types: we can control the lifetime of the data ourselves. As long as a reference to the data exists, the data is guaranteed to remain on the heap.
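A minimal sketch of this behavior follows. The Name field on Customer and the helper method Create are assumptions added for illustration; the article only says that a Customer class exists.

```csharp
using System;

public class Customer
{
    public string Name = "";   // hypothetical field, not from the article
}

public static class ReferenceLifetimeSketch
{
    // The local variable `inner` disappears when the method returns,
    // but the object survives because the caller still holds a reference.
    public static Customer Create()
    {
        Customer inner = new Customer { Name = "John" };
        return inner;
    }

    public static void Main()
    {
        Customer a = Create();
        Customer b = a;                 // copies the reference, not the object
        b.Name = "Mary";                // changes the one shared object
        Console.WriteLine(a.Name);                          // Mary
        Console.WriteLine(object.ReferenceEquals(a, b));    // True
    }
}
```

Here the object created inside Create outlives the method's stack frame, which is exactly what a stack-only scheme could not express.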
4. Garbage collection on the managed heap
Suppose objects on the heap were simply deleted in place as soon as nothing referenced them. Free space would then become scattered across the heap over time, and allocating memory for a new object would become hard to manage: the .NET runtime would have to search the whole heap to find a block of memory large enough to hold the entire new object.
Instead, when the garbage collector of the managed heap runs, it releases every object that can be freed and then compacts the remaining objects, moving them together at one end of the heap so that they form a single contiguous block. Moving an object means updating the address in every reference to it, which costs some performance. In return, to place new data the managed heap only needs to read the value of the heap pointer, rather than walk a linked list of addresses looking for a suitable spot.
So instantiating objects under .NET is much faster. Accessing them tends to be faster too, because the objects are compacted into the same area of the heap and fewer page swaps are needed. Microsoft believes that, although the garbage collector has to do extra work to update every reference to the objects it moves, these gains make up for that cost.
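The idea of "allocate by reading the heap pointer, then compact on collection" can be sketched with a toy model. Everything here (Block, CompactingHeapSketch, the Live flag, the base address 200000 borrowed from the earlier example) is hypothetical and hugely simplified; the real .NET garbage collector works very differently in detail.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: the real .NET garbage collector is far more elaborate.
public sealed class Block
{
    public int Address;
    public int Size;
    public bool Live;
}

public static class CompactingHeapSketch
{
    public const int HeapStart = 200000;   // assumed base address from the text
    public static int HeapPointer = HeapStart;
    public static readonly List<Block> Blocks = new List<Block>();

    // Allocation just hands out the current heap pointer and advances it.
    public static Block Allocate(int size)
    {
        var block = new Block { Address = HeapPointer, Size = size, Live = true };
        HeapPointer += size;
        Blocks.Add(block);
        return block;
    }

    // Drop dead blocks and slide the survivors together, updating addresses.
    public static void Collect()
    {
        Blocks.RemoveAll(b => !b.Live);
        int next = HeapStart;
        foreach (var b in Blocks)
        {
            b.Address = next;
            next += b.Size;
        }
        HeapPointer = next;
    }

    public static void Main()
    {
        Block first = Allocate(32);
        Block second = Allocate(16);
        Block third = Allocate(32);
        second.Live = false;            // pretend nothing references it any more
        Collect();
        Console.WriteLine($"{first.Address} {third.Address} {HeapPointer}");
    }
}
```

In this sketch the third block moves down into the gap left by the second one, so the next allocation again starts at a single known address.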
5. Boxing and unboxing
With that groundwork in place, look at the following code:
    int i = 1;
    object o = i;      // boxing
    int j = (int)o;    // unboxing
int i = 1; allocates 4 bytes on the stack to store the variable i.
object o = i;
Boxing process: first, 4 bytes are allocated on the stack to store the reference variable o.
Then space is allocated on the managed heap to store a copy of i. This space is slightly larger than i itself, because it also holds a method table pointer and a SyncBlockIndex. The memory address of this space is returned.
Finally, this address is assigned to the variable o, so o is a reference to the boxed object. However the value of o changes, the value of i does not change, and however the value of i changes, o does not change, because they are stored in different places.
int j = (int)o;
Unboxing process: 4 bytes are allocated on the stack for the variable j, and the value held in the instance that o refers to is copied into j's memory, that is, assigned to j.
Note that only a boxed object can be unboxed. If o is not a boxed int, executing the code above throws an exception.
A word of warning: unboxing must be done carefully. The target type has to be exactly the type that was boxed, so a boxed value cannot be unboxed straight into a different numeric type.
    long a = 999999999;
    object b = a;
    int c = (int)b;    // throws InvalidCastException
A C# int has only 32 bits, and an InvalidCastException is thrown if a boxed 64-bit long is unboxed to int.
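The boxing and unboxing behavior described in this section can be checked with a short program. This is a sketch; the class name BoxingSketch and the helper UnboxToIntThrows are made up for illustration.

```csharp
using System;

public static class BoxingSketch
{
    // Returns true when unboxing `boxed` straight to int fails.
    public static bool UnboxToIntThrows(object boxed)
    {
        try
        {
            _ = (int)boxed;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int i = 1;
        object o = i;       // boxing: the value of i is copied to the heap
        i = 5;              // changes only the stack copy
        Console.WriteLine((int)o);                    // 1

        long a = 999999999;
        object b = a;
        Console.WriteLine(UnboxToIntThrows(b));       // True
        int c = (int)(long)b;  // unbox to the exact type first, then convert
        Console.WriteLine(c);                         // 999999999
    }
}
```

Unboxing to the exact boxed type and then converting, as in (int)(long)b, is one way to get an int out of a boxed long when the value is known to fit.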
------------------------------ I'm a dividing line ------------------------------
The above is my personal understanding. If anything is wrong, please correct me. I hope it helps readers understand some of the basic concepts.
Following a tip from the reader Dragon Cat, I came across an interesting phenomenon. Look at the following code, assuming we have a Member class with the fields Name and Num:
    Member member1 = new Member { Name = "Marry", Num = "001" };
    Member member2 = member1;
    member1.Name = "John";
    Console.WriteLine("member1.Name={0} member2.Name={1}", member1.Name, member2.Name);

    int i = 1;
    object o = i;
    object o2 = o;
    o = 2;
    Console.WriteLine("o={0} o2={1}", o, o2);

    string str1 = "hello";
    string str2 = str1;
    str1 = "hello,world!";
    Console.WriteLine("str1={0} str2={1}", str1, str2);
    Console.ReadKey();
According to the theory so far, member1 and member2 refer to the same object on the heap, so modifying it through one is bound to be visible through the other.
So the first line of output should be member1.Name=John member2.Name=John. No doubt about that.
What about object and string, the only two reference types predefined by C#?
By the same reasoning, the expected results would be o=2 o2=2 and str1=hello,world! str2=hello,world!. Run it and, OMG, that is wrong.
The result is o=2 o2=1 and str1=hello,world! str2=hello.
The explanation for this (as given in the link Dragon Cat provided) is that the string type is special: the amount of space a string occupies on the heap is fixed at the moment it is created.
To change a string variable, for example str1 = "hello,world!", suitable space must be allocated afresh to hold the larger (or smaller) data. A new object is created, and the address stored in str1 is updated to point to the new object.
So str2 still points to the previous object, while str1 points to the newly created one. They are now references to different objects.
As for why object behaves the same way, I don't know what to say. Maybe the two are just alike because they are the two predefined reference types ^_^
Thanks to Dragon Cat. Otherwise I would not have noticed this.
Update: coming back to this, object and string really are alike. object is the base class, and a value of any type can be bound to it. Let's start with an example.
    int i = 1;
    object o = i;
Clearly the object referenced by o takes more than 4 bytes on the heap, since it holds the value plus some information that .NET uses to identify and manage class instances: a method table pointer and a SyncBlockIndex. For the sake of argument, assume 6 bytes.
What if I now bind a long to o?
    o = (long)100000000;
If the data were simply written into the original memory space, that small 6-byte temple could not house a Buddha of 8 bytes or more.
The only option is to allocate new space to hold the new object.
A string and a boxed object are both immutable once initialized (see the book C# Advanced Programming). Immutable here includes the size in memory. Once the size is fixed, methods and operators that appear to modify the contents actually create a new object and allocate new memory for it, because the previous size might no longer fit. Note that C# does not allow the = operator to be overloaded: assigning to o or str1 simply makes that variable refer to a different object, while o2 and str2 continue to refer to the original one.
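The difference between changing an object and pointing a variable at a new object can be observed directly with object.ReferenceEquals. A small sketch, with the class name RebindSketch and the method Run made up for illustration:

```csharp
using System;

public static class RebindSketch
{
    // Repeats the object and string experiment and returns the printed line.
    public static string Run()
    {
        object o = 1;
        object o2 = o;
        bool sameBefore = object.ReferenceEquals(o, o2);   // true: one box, two references
        o = 2;   // boxes a new value; o2 still refers to the old box
        bool sameAfter = object.ReferenceEquals(o, o2);    // false: two different boxes

        string str1 = "hello";
        string str2 = str1;
        str1 = "hello,world!";   // str1 now refers to another string object

        Console.WriteLine($"same before: {sameBefore}, same after: {sameAfter}");
        return $"o={o} o2={o2} str1={str1} str2={str2}";
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // o=2 o2=1 str1=hello,world! str2=hello
    }
}
```

By contrast, member1.Name = "John" in the earlier example changes a field inside the one shared object, which is why both member1 and member2 see the change.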