The Past, Present, and Future of the .NET Garbage Collector
Patrick Dussud:
Patrick Dussud has been working at Microsoft for 11 years and is currently responsible for the design of the .NET CLR garbage collector. He is an architect of the .NET CLR, chief architect of WinFX, and a member of the Windows architect group.
Prior to Microsoft, Patrick was the principal designer of the Texas Instruments (TI) Explorer workstation system and chief architect of the Lucid Energize products.
CHARLES: Okay. Today we are back in Building 42, interviewing Patrick Dussud, the creator of the garbage collector. Patrick, how have you been?
PATRICK: Good.
CHARLES: You have not been on Channel 9 yet. We have been trying to get hold of you for some time. What is the garbage collector? Starting from the basics, what is the garbage collector responsible for?
PATRICK: The garbage collector automates memory management for the user. Previously, in C++, you had to allocate memory with "malloc" or "new" and then release it at the appropriate time. Before releasing memory you must make sure nobody else is still using it. Once you hand memory to someone else, you often cannot tell when it is safe to release it. If you release it while someone else is still using it, the program crashes. So with explicit "new" and "delete", memory management is a hard problem, and your code does not compose. You have two options. Either you make sure you keep full control over your memory, which means that to achieve this isolation you make a full copy whenever you pass memory to another module, and the other module is responsible only for its copy. Or you manage the entire memory pool in one unified place, which is automatic memory management, and that is the job of the garbage collector.
In essence, the garbage collector tracks every place where an object is referenced, notices when an object is no longer referenced, and reclaims the corresponding memory. It does this efficiently, even more efficiently than traditional "new" and "delete". In fact, we try to beat "new" and "delete", because the garbage collector gives us new opportunities, and you do not want to put limits on those opportunities. For example, you have to know where every object is referenced, and you have to determine whether each object is actually still referenced. Once you do that, you find that you can move objects: you can compact the memory they occupy and relocate them anywhere in memory, because you know every reference to an object and can therefore update all of them. That is impossible in C++. If all we did was automate "delete" while still managing memory the way "new" and "delete" do, we would certainly be slower than "new" and "delete", because we would only be adding overhead. With smart compaction, however, we found that we can be faster than "new" and "delete", because we keep the heap extremely compact, which gives us cache locality, page locality, and other advantages. The results are very good, especially for server memory, which is very hard to manage. For problems such as server heap fragmentation, we actually do better than any previous attempt. Performance does not degrade over time, and we get stable memory management speed.
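The point about moving objects can be illustrated with a toy model. The following Python sketch is not the CLR's algorithm; it assumes a heap modelled as a dictionary from address to object, and the names `Obj` and `compact` are hypothetical. It shows only the idea Patrick describes: because every reference is known, live objects can be slid together and every reference rewritten to the new address.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    name: str
    size: int
    refs: list = field(default_factory=list)  # addresses of referenced objects


def compact(heap, roots):
    """Slide live objects to the start of the heap and patch every reference.

    heap  : dict mapping address -> Obj
    roots : list of addresses held by the program (locals, statics)
    Returns (new_heap, new_roots, next_free_address).
    """
    # Find the live objects with a worklist (no recursion).
    live, pending = set(), list(roots)
    while pending:
        addr = pending.pop()
        if addr not in live:
            live.add(addr)
            pending.extend(heap[addr].refs)

    # Assign new, contiguous addresses, keeping the old address order.
    forwarding, next_free = {}, 0
    for addr in sorted(live):
        forwarding[addr] = next_free
        next_free += heap[addr].size

    # Move the objects and rewrite every reference to the new addresses.
    new_heap = {
        forwarding[addr]: Obj(heap[addr].name, heap[addr].size,
                              [forwarding[r] for r in heap[addr].refs])
        for addr in live
    }
    new_roots = [forwarding[r] for r in roots]
    return new_heap, new_roots, next_free


# "a" references "b"; the 48-byte object between them is garbage.
heap = {0: Obj("a", 16, [64]), 16: Obj("dead", 48), 64: Obj("b", 32)}
new_heap, new_roots, next_free = compact(heap, [0])
```

After compaction the two survivors sit next to each other, and allocation can continue by simply bumping `next_free`, which is where the locality benefits mentioned above come from.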
CHARLES: Interesting. We often hear people say, "I want to write unmanaged code, I don't want to write managed code, and I don't want my objects to be controlled by someone else." Many C++ programmers think this way.
PATRICK: Yes, indeed. That is the "micromanagement" of objects. It comes down to experience and belief. When it comes to simply reclaiming memory, the garbage collector is the least of your worries. Where it matters is finalizers and destructors. In C++, "the destructor is called when you perform delete" is completely deterministic. For us, the garbage collector notices when an object dies, so the destructor is really a finalizer whose call time is decided by the garbage collector. Many people are surprised by this. In particular, you have to be careful about which objects you use, because when several objects are finalized, the order in which their finalizers are called cannot be determined in advance. It is possible that a low-level object is finalized before a high-level one. If the high-level object needs to do some extra work on the low-level object when it is finalized, that work will fail because the low-level object has already been finalized. Of course, the memory of the low-level object is still there. We guarantee consistency in memory management: when a high-level object runs its finalization code it can reach every object it wants to reach, but there is no guarantee about the state of those objects once their finalizers have run. You have to be very careful here.
For example, suppose you implement a file system with a class hierarchy. The lowest-level class wraps the operating system's file handle. When the file-handle object is no longer referenced, you want its finalizer to close the operating system handle to avoid leaking the resource. Then you build higher levels on top. In a word processor there are usually several layers of objects, and you would like all the finalization to proceed layer by layer, because you usually want cached content to be saved first, so that when the file is closed, all cached content is written out automatically. That is not a good way to write a word processor. The right way is to close the file explicitly. For applications, the first step is to separate an object's side effects from its lifetime. If other things need to end when the object dies, you should provide an explicit method, for example Close. If you call that explicit Close method, everything is fine. But what if you forget to call Close and your object is no longer referenced? In essence, your program cannot guarantee that the higher-level object gets to flush its cache down to the file handle. If the file handle is closed first, the problem is obvious. We faced this problem and solved it in a simple way. In Whidbey (.NET 2.0) we have what is called a critical finalizer object, which wraps the OS handle class. When a batch of objects needs to be finalized, the critical finalizer objects are finalized last, so the file handle remains usable until the higher levels have finished their work. Beyond that, we offer no general guarantee, because we do not want to introduce complex object graphs just to order finalizer calls. In general, finalization code has no call order. Our simple solution is only a safety net for cases where the programmer has not handled the final side effects correctly when the object dies. In fact, in debug mode, much of our finalization code contains a check: if garbage collection has already started and the program ends up in the finalization code, that is an error. We raise the error and the developer is responsible for fixing it.
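The advice to separate an object's side effects from its lifetime can be sketched as follows. This is a minimal Python illustration, not .NET code; `FileHandle` and `BufferedDocument` are hypothetical classes, and no real file is opened. The higher-level object exposes an explicit `close` that flushes its cache before closing the handle, so the ordering never depends on when a collector runs finalizers.

```python
class FileHandle:
    """Stands in for an OS file handle (hypothetical; nothing is really opened)."""

    def __init__(self):
        self.written = []
        self.closed = False

    def write(self, data):
        if self.closed:
            raise ValueError("write to a closed handle")
        self.written.append(data)

    def close(self):
        self.closed = True


class BufferedDocument:
    """Higher-level object that caches text and owns a FileHandle."""

    def __init__(self, handle):
        self._handle = handle
        self._cache = []

    def append(self, text):
        self._cache.append(text)

    def close(self):
        # Explicit, ordered shutdown: flush the cache first, then close the handle.
        if self._handle.closed:
            return
        for text in self._cache:
            self._handle.write(text)
        self._cache.clear()
        self._handle.close()

    # Context-manager support so that `with` calls close() deterministically.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


handle = FileHandle()
with BufferedDocument(handle) as doc:
    doc.append("hello ")
    doc.append("world")
```

The `with` block plays the role of the explicit Close call: the cached text reaches the handle before the handle is closed, regardless of when either object is later reclaimed.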
CHARLES: Very interesting. What are the semantics of a critical finalizer object? How do you define one?
PATRICK: You derive from the critical handle. These things are built into the CLR. You can derive from the critical handle yourself, but only system-level code needs to.
CHARLES: Let's talk about the history of the CLR garbage collector, for example, the first challenges you faced...
PATRICK: The history of the garbage collector is that I wrote the vast majority of the garbage collectors at Microsoft. The first production-quality garbage collector we wrote is still in use: it is the garbage collector in JScript and VBScript. At the time, four of us got together and decided to spend some weekends building JScript, because we thought it would be cool to program web pages with JScript. A long time before that, I had used Perl tools. There was a debate about managing memory very explicitly, with the interpreter generating new and delete as required. I thought, "No, we have to introduce a garbage collector, because micromanagement is too costly." One of my friends said, "Okay, I'll write the explicit management, you write the garbage collector, and we'll see who does better." I did not finish on time. My friend finished sooner than I did, because handling delete explicitly is easier to implement. Then we ran the code he had written and found that it was too slow. He said, "Okay, I give up. I don't think your code can be as slow as mine." So I finished the garbage collector and it eventually shipped in the product. This garbage collector is very simple and conservative. We do not know all the references into memory. If an integer looks like the address of an object, we assume the object is still alive. Because we are conservative, we do not reclaim every object that could be reclaimed, and we do not move objects around: if an integer appears to point to an object, we cannot be sure whether it is a pointer or just an integer, so we do not dare change its contents, because it might be a price or something. This garbage collector is very limited and not complex.
Then the same group started developing the Java Virtual Machine (JVM), the Microsoft Java Virtual Machine, and I wrote another garbage collector for it. That collector followed on from the JScript one and was also conservative. At that time all JVMs were conservative. Then I consulted another friend outside Microsoft and we discussed, "What would we have to do to build the best garbage collector on Windows?" So we worked together and wrote some specification documents, and then I started to implement it. Interestingly, I implemented it in Lisp, because at that time Lisp had the best debugging tools and strong protection. For example, all arrays were bounds-checked, and we had a very good debugger. I wrote the collector in Lisp, then wrote a JVM simulator in Lisp, debugged it, and wrote a converter that automatically translated the Lisp code into C++ code, which became the basis of the new JVM garbage collector.
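The conservative approach Patrick describes for the JScript collector can be shown with a small simulation. This Python sketch is only an illustration under stated assumptions: the heap is a dictionary from address to the words stored in the object, and `conservative_live_set` is a hypothetical name. Any word whose value matches an object address is treated as a reference, which is why some garbage is retained and why objects cannot safely be moved.

```python
def conservative_live_set(heap, root_words):
    """Return addresses kept alive when every word is treated as a possible pointer.

    heap       : dict mapping object address -> list of words stored in the object
    root_words : raw words found on the stack and in registers (ints)
    """
    live, pending = set(), list(root_words)
    while pending:
        word = pending.pop()
        # Any word equal to a valid object address is assumed to be a reference,
        # even if the program actually meant it as a plain integer.
        if word in heap and word not in live:
            live.add(word)
            pending.extend(heap[word])
    return live


heap = {
    1000: [2000, 7],   # really points at 2000; 7 is plain data
    2000: [],
    3000: [],          # unreachable, but see the root words below
    4000: [],          # unreachable and nothing resembles its address
}
# 1000 is a real reference; 3000 is an integer (say, a price) that happens
# to match an object address, so the object at 3000 is retained.
root_words = [1000, 3000, 42]
live = conservative_live_set(heap, root_words)
```

Here the object at 3000 survives only because an unrelated integer looks like its address, and rewriting that integer to move the object would corrupt the program's data if it really was a price.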
CHARLES: Was it a challenge for you to write a converter from Lisp to C++?
PATRICK: No, because I used to work in Lisp. I worked at Texas Instruments on the TI Explorer, where I wrote a converter that translated one dialect of Lisp into standard Lisp. We converted all 3 million lines of system code, all automatically, and then we abandoned the old dialect. So I knew how to do this job. When I wrote the Lisp code, I was careful to use only features that were easy to convert to C++. So the conversion was very straightforward, because I had experience writing Lisp converters.
CHARLES: And the CLR garbage collector is written in C++, of course?
PATRICK: Yes. When we moved from the JVM to the CLR, I used parts of the JVM garbage collector as the basis and then optimized it significantly. From my point of view, writing a good garbage collector essentially means building a solid foundation of sound mechanisms. Once you find mechanisms that work, you do not want to change them much, and the mechanisms must be sufficiently orthogonal. If the architecture is good, you can add mechanisms incrementally. On top of that you introduce what I call "policy". Policy decides under which circumstances each mechanism is used. Most of the speed and efficiency of the garbage collector comes from tuning policy. While the application runs on the general mechanisms, the garbage collector automatically detects changes in the workload and adjusts. Basically, we move the application from a very inefficient collection mode to a more efficient one. Year after year we study workloads. If a workload performs badly, we ask, "Where is the problem? How can we improve it?" When we find the answer, we know that when this situation occurs, these mechanisms should be used to make the workload run much better. So the policy tries to detect these situations by observing correlated factors. We observe the collection frequency of every generation, fragmentation within memory, and memory usage, and we look at internal records to study what inside the garbage collector should have been cheap but takes a lot of time under certain conditions. We observe all these costs and frequencies, and from them we can draw a conclusion: oh, this mechanism is actually not very useful here. Say we tried to reclaim as much memory as possible with a full garbage collection, but the full collection found nothing. Then the next time the OS tells us that memory is still low, we had better not perform a full garbage collection on the application again, because nothing has happened between last time and this time and we still would not benefit from it. That is an example of dynamic tuning. In fact, the garbage collector embodies the experience gained from in-depth observation by our department and its partners. We try to find the correlated factors: those that make an application perform well, which we try to reproduce, and those that make an application perform poorly, in which case we try to move the application to a more effective state.
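The dynamic-tuning example at the end of this answer, skipping a full collection that was futile last time, can be sketched as a small policy object. This is a hypothetical Python illustration: the class `FullGcPolicy`, its method names, and the 1024-byte threshold are assumptions made for the sketch and do not correspond to actual CLR settings.

```python
class FullGcPolicy:
    """Decides whether a low-memory notification should trigger a full collection.

    The names and the threshold are illustrative, not CLR settings.
    """

    def __init__(self, min_useful_bytes=1024):
        self.min_useful_bytes = min_useful_bytes
        self.last_reclaimed = None        # bytes freed by the previous full GC
        self.allocated_since_last = 0     # bytes allocated since that GC

    def record_allocation(self, nbytes):
        self.allocated_since_last += nbytes

    def record_full_gc(self, reclaimed_bytes):
        self.last_reclaimed = reclaimed_bytes
        self.allocated_since_last = 0

    def should_run_full_gc(self):
        if self.last_reclaimed is None:
            return True  # no history yet, so try once
        futile_last_time = self.last_reclaimed < self.min_useful_bytes
        nothing_changed = self.allocated_since_last < self.min_useful_bytes
        # Skip only when the last attempt was futile and little has happened since.
        return not (futile_last_time and nothing_changed)


policy = FullGcPolicy()
first = policy.should_run_full_gc()      # no history: run it
policy.record_full_gc(reclaimed_bytes=0)
second = policy.should_run_full_gc()     # last run found nothing, nothing changed
policy.record_allocation(1_000_000)
third = policy.should_run_full_gc()      # the workload has moved on: worth trying
```

The mechanism (a full collection) stays the same throughout. Only the decision about when to use it changes with what was observed, which is the distinction between mechanism and policy drawn above.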
CHARLES: Interesting. I want to ask a question: what determines whether an object is still alive? Let's talk about object lifetime, and why developers do not need to state explicitly when an object ends in an environment like the garbage collector's. That is exactly why the code you described earlier could not be composed.
PATRICK: Let's start from the beginning. How do you express that you hold an object? We have local variables. When we say "object I = new Object", "I" refers to an object. That is one source of objects. Another source is static variables, which are more complex and less interesting. There are also handles, and you can create your own handles. These are the main ways the execution engine (EE) holds objects. Obviously, objects hold other objects, and that is where the graph begins. Essentially, we can think of a set of objects as a graph, or a series of graphs, whose roots are either the variables on your stack or the static variables owned by your program. We call this the root set. During a collection, there are clearly defined protocols.
CHARLES: Is EE an execution engine?
PATRICK: Yes, it is the CLR. When the garbage collection module decides to start a collection, it calls into the EE and asks it to stop all threads so that the thread stacks can be examined. The EE freezes all the stacks accordingly. The garbage collector then tells the EE: now walk all the stacks and static variables and give me back the root set. One of the EE's stack-walking modules is responsible for this. The CLR then calls the garbage collector module with one root at a time. On receiving a root, the garbage collector consults the static data generated by the compiler, which tells us which offsets within an object instance hold references. We examine the reference locations one by one and examine each referenced object recursively. When the recursion finishes, everything in the graph that can be reached from that root has been marked. We use several techniques for marking, and that part is not very interesting. In the end, the mark tells us whether an object is reachable. The basic idea is to keep some trace that lets you take an object and say it has been marked. We can write something inside the object in a place that others do not normally look at, or we can keep an external table. Which of the two we use depends on which is more efficient in the specific circumstances. By the way, the working code is not written recursively, because you may have a very long chain to follow, which could exhaust the stack space. We use a data stack to record references to the objects still to be examined: pop an item from the stack, examine it, and push all of the references it contains, until the stack is empty. An empty stack means we have marked every object reachable from this root. We repeat this for all local variables, registers that hold references, and static variables. Once that is done, we have marked every object the program can reach, without exception. At that point we can walk through memory object by object: marked? Okay, keep it. Not marked? Oh, we have garbage. At certain times we decide whether to compact away all the garbage. That is the basic idea. What I have described is what we call a "full garbage collection", because we examine every object reachable from every root. We also have a way to collect only the recently allocated objects, which we call a "generation 0" collection. In that case the garbage collector examines only the recently allocated objects. So we also need a way to know when older objects reference these new objects. We have a way to find these particular reference locations quickly, without traversing and searching all objects.
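The marking procedure described in this answer, with an explicit data stack instead of recursion, followed by a pass over memory that keeps marked objects, can be sketched in Python. This is a simplified model and not the CLR implementation: `Node`, `mark`, and `sweep` are hypothetical names, the mark is stored as a flag inside the object (one of the two options mentioned), and compaction is left out.

```python
class Node:
    """A heap object; `refs` plays the role of the reference fields."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


def mark(roots):
    """Mark everything reachable from the roots using an explicit stack."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        stack.extend(obj.refs)   # push the object's references for later


def sweep(heap):
    """Return the survivors and clear their marks for the next cycle."""
    survivors = [obj for obj in heap if obj.marked]
    for obj in survivors:
        obj.marked = False
    return survivors


# A chain long enough to overflow a naive recursive implementation.
chain = [Node(f"n{i}") for i in range(50_000)]
for a, b in zip(chain, chain[1:]):
    a.refs.append(b)
chain[-1].refs.append(chain[0])        # a cycle is handled by the mark flag
garbage = Node("garbage")

heap = chain + [garbage]
mark([chain[0]])
survivors = sweep(heap)
```

A recursive version of `mark` would need one call frame per link of the 50,000-object chain, which is exactly the stack-exhaustion risk that motivates the explicit stack.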
CHARLES: Now is a good time to explain what "generation" means. For the garbage collector, is this the most recent garbage it has found?
PATRICK: Yes, it is the youngest generation. We call it "generation 0". You usually find a lot of garbage there. Its locality is also good, and if you are lucky, most of the newly created objects are still in the cache, so processing is fast and compaction is efficient. The best case is when you process recently created objects that are still in the cache and have already reached the end of their lifetime. Real situations are rarely this ideal, but that is the motivation for processing new objects first. The policy engine works to keep this process efficient. For example, if we find no garbage in generation 0, we say, "Well, maybe we should not collect so often, because this time we found nothing and it was a waste of time." If we find a lot of garbage, we say, "Hey, that's great. Let's do that again." This is one of the ways the policy engine works to keep things running efficiently.
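A generation 0 collection needs to know which older objects reference new ones without scanning the whole heap. The interview does not say how the CLR records this, so the following Python sketch uses a remembered set filled in by a write barrier, which is one common technique and is an assumption here. The names `Heap`, `write_ref`, and `collect_gen0` are hypothetical.

```python
class Obj:
    def __init__(self, name, generation=0):
        self.name = name
        self.generation = generation
        self.refs = []


class Heap:
    def __init__(self):
        self.gen0 = []            # recently allocated objects
        self.older = []           # objects that survived a collection
        self.remembered = set()   # older objects that may reference gen0

    def allocate(self, name):
        obj = Obj(name)
        self.gen0.append(obj)
        return obj

    def write_ref(self, source, target):
        """Store a reference; record old-to-young stores (a write barrier)."""
        source.refs.append(target)
        if source.generation > 0 and target.generation == 0:
            self.remembered.add(source)

    def collect_gen0(self, roots):
        """Collect only generation 0; older objects are not traversed."""
        stack = [r for r in roots if r.generation == 0]
        # Remembered older objects act as extra roots into generation 0.
        for old in self.remembered:
            stack.extend(t for t in old.refs if t.generation == 0)
        live = set()
        while stack:
            obj = stack.pop()
            if obj in live:
                continue
            live.add(obj)
            stack.extend(t for t in obj.refs if t.generation == 0)
        freed = [o for o in self.gen0 if o not in live]
        for obj in self.gen0:
            if obj in live:
                obj.generation = 1      # promote survivors
                self.older.append(obj)
        self.gen0 = []
        self.remembered.clear()         # no old-to-young references remain
        return freed


heap = Heap()
cache = heap.allocate("cache")
heap.collect_gen0([cache])              # "cache" survives and is promoted
entry = heap.allocate("entry")
temp = heap.allocate("temp")
heap.write_ref(cache, entry)            # old -> young store is remembered
freed = heap.collect_gen0(roots=[])     # only generation 0 is examined
```

In the second collection `entry` survives purely because the older `cache` object was remembered when the reference was stored, while `temp` is reclaimed without any older object being traversed.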
CHARLES: A few years ago there was a big debate about "deterministic finalization". I once talked to a programmer on the C++ team about it, and Managed C++ now has a form of "deterministic finalization". Is that right? After all, C++ has destructors.
PATRICK: C++ basically lives in a hybrid world. If objects are explicitly created and destroyed, they are not managed by the garbage collector, so they need "deterministic finalization". These objects live in a world of their own. Even if they are prefixed with "__gc" to indicate that they are managed objects, the garbage collector cannot help much. I spent nearly six months on this problem trying to provide an integrated solution. In the end we spent some money and asked Chris Sells to help us solve it. He used a very clever approach. However, his measurements showed an efficiency loss of at least a factor of two on workloads with moderately intensive object allocation. So when the garbage collector has a great impact on the application, you pay for it in efficiency. On this point, though, we cannot force programmers. Our advice is: do not micromanage. In the end, we will call the finalizer and take care of it. The garbage collector considers the problem from the perspective of the whole of memory and tries to make the entire process efficient, rather than focusing on one specific part.
CHARLES: I understand. In a sense, this is a general management platform. But it is interesting: since this is a general platform, why can't I mark an object in managed code and say, I want to manage this object myself, I will tell the garbage collector when the object is finished, and then the garbage collector can collect it? You are saying that the garbage collector scans everything and collects objects on its own.
PATRICK: Yes. If you tell the garbage collector yourself, it is not safe, because you may have passed the object to other code without knowing it. That way you may introduce a bug that crashes the program.