Garbage collection (hereinafter GC) is a core component of many high-level languages. Although these languages try to spare developers from having to think about it, understanding the GC is essential for writing efficient applications. If you already know the basics of GC, this article will walk through some GC-related topics that come up when performance tuning .NET applications on Windows.
When you decide to tune a program for performance, it is usually for one of two reasons:
You have hit a serious problem, such as an OutOfMemoryException or high CPU usage.
The program's responsiveness has degraded.
In either case, you can inspect the values of certain counters in Performance Monitor (perfmon) to decide whether tuning is needed and which parts of the system to tune.
Either way, the first task is to find the symptoms; only then can you prescribe the remedy. As the saying goes, to do a good job one must first sharpen one's tools, so I recommend the two most commonly used: performance counters and WinDbg. These tools can help us locate the vast majority of problems. The third weapon is our brain: think enough before starting to solve a problem. This article focuses on exposing some of the GC's "bad habits", starting from performance counters.
Before we begin, let's review the most basic concepts of the GC:
The GC works on the managed heap; objects on the stack are outside its scope. The GC is responsible for the life and death of objects on the managed heap. As a core module, its execution efficiency cannot be overlooked, so the GC treats objects of 85,000 bytes or more separately, placing them on the large object heap (hereinafter LOH). The GC manages objects by generation: the .NET GC has three generations, Gen0, Gen1, and Gen2, and the heap these three generations occupy is compacted during collection. The longer an object survives, the further it is promoted from a lower generation to a higher one. A Gen2 collection is also known as a full collection and is the most expensive. The LOH is never compacted and is not collected on its own; it is only collected as part of a Gen2 collection.
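As a quick illustration of these basics, here is a minimal sketch (assuming the .NET Framework CLR on Windows; exact behavior can vary across runtime versions) showing a small object starting in Gen0 and an 85,000-byte array landing on the LOH, which GC.GetGeneration reports as generation 2:

    using System;

    class GenerationDemo
    {
        static void Main()
        {
            byte[] small = new byte[1024];  // an ordinary small object
            byte[] large = new byte[85000]; // at the LOH threshold

            Console.WriteLine(GC.GetGeneration(small)); // 0: freshly allocated
            Console.WriteLine(GC.GetGeneration(large)); // 2: the LOH is collected with Gen2

            GC.Collect(); // for illustration only; see the discussion of GC.Collect() below
            Console.WriteLine(GC.GetGeneration(small)); // 1: survived one collection, promoted
        }
    }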
The following three scenarios trigger a GC:
Allocation exceeds the Gen0 or LOH budget.
The System.GC.Collect() method is called.
The system is low on memory.
The first case is the most common: each time a GC completes, Gen0 is empty and accepts new allocations until it fills up again, at which point another GC runs. The second case is something our own code should usually avoid; by convention, only the BCL should invoke it. The third case arises when other processes are consuming too much memory.
We now know when a GC happens, but what does its execution look like? What are its results? Are those results what we expect? Where do we find the answers to these questions? This is where performance counters come in. Let's briefly analyze the roles of several GC-related counters:
% Time in GC
This value is the percentage of elapsed time spent in GC, measured since the end of the previous GC. For example, if 100 units of time elapsed between the end of the last GC and the end of the current one, and the current GC itself took 50 of them, the counter reads 50/100 = 50%. When using performance counters to diagnose a problem, there are two broad approaches: the first looks at the trend of a counter over time, the second looks at its absolute value. For the second approach, the notion of a "healthy value" applies. Generally speaking, if this value exceeds 50%, we should investigate the managed heap; if it is below 20%, the program usually does not need optimization. A high system load or heavy memory pressure can also push this value up.
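For reference, this counter can also be sampled from code. A minimal sketch, assuming the .NET Framework on Windows where the ".NET CLR Memory" counter category is available (note the instance name can carry a #N suffix when several processes share a name):

    using System;
    using System.Diagnostics;
    using System.Threading;

    class GcTimeSample
    {
        static void Main()
        {
            // Counter instances are named after the process.
            string instance = Process.GetCurrentProcess().ProcessName;

            using (var timeInGc = new PerformanceCounter(
                ".NET CLR Memory", "% Time in GC", instance, true /* read-only */))
            {
                timeInGc.NextValue();   // the first read only primes the counter
                Thread.Sleep(1000);     // match perfmon's default 1-second interval
                Console.WriteLine("% Time in GC: {0:F1}%", timeInGc.NextValue());
            }
        }
    }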
Allocated Bytes/sec
If you find that too much time is being spent in GC, you should look at the Allocated Bytes/sec counter. It shows the rate at which the managed heap is being allocated. Note that this counter is inaccurate at very low allocation rates: it is only updated at the start of each GC, so if the performance counter's sampling interval (1 second by default) is shorter than the interval between GCs, the readings are hard to interpret.
When a GC starts, it updates this counter: it adds the Gen0 and LOH allocation figures together, subtracts the value recorded at the last update, and divides by the elapsed time. That is the allocation rate.
For example: by default, performance counters refresh once per second. In the first second, a Gen0 GC is triggered after 100 KB has been allocated, so at the end of the first second the value is (100 KB - 0 KB)/1 sec = 100 KB/sec. In the second second no GC occurs, the recorded value remains 100 KB, so at the end of the second second the value is (100 KB - 100 KB)/1 sec = 0 KB/sec. In the third second a Gen0 GC is triggered with a cumulative 200 KB allocated, so at the end of the third second the value is (200 KB - 100 KB)/1 sec = 100 KB/sec.
As the example shows, if GCs are not occurring very frequently, this value will often read 0 KB/sec.
(Figure: monitoring the values of the selected counters)
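The arithmetic behind the counter can be summarized in a few lines; the names below are purely illustrative, not actual CLR internals:

    static class AllocationRate
    {
        // Illustrative only: mirrors how Allocated Bytes/sec is derived at each update.
        public static double BytesPerSec(
            long gen0PlusLohAllocatedNow,       // cumulative Gen0 + LOH bytes at this update
            long gen0PlusLohAllocatedLastTime,  // cumulative bytes at the previous update
            double elapsedSeconds)              // time since the previous update
        {
            return (gen0PlusLohAllocatedNow - gen0PlusLohAllocatedLastTime) / elapsedSeconds;
        }
    }
    // Applying it to the example above:
    //   end of second 1: (102400 - 0)      / 1  ->  ~100 KB/sec
    //   end of second 2: (102400 - 102400) / 1  ->  0 KB/sec (no GC, so no new data)
    //   end of second 3: (204800 - 102400) / 1  ->  ~100 KB/sec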
Large Object Heap Size
This value records the current size of the LOH.
# Gen X Collections
Here X ranges over 0, 1, and 2. These counters show the number of times each generation has been collected since the process started. Note that, in the GC's implementation, a collection never targets a single generation in isolation: one of the most important characteristics of generational collection is that collecting a higher generation also collects every generation below it.
The GC picks on the weak and fears the strong. For small objects, it prefers to settle matters in the lower generations; survivors that will not go quietly get promoted to a higher generation, and by the time they reach Gen2 the GC becomes truly ferocious, at which point, of course, the program pays a correspondingly higher performance cost. When is the GC most timid? When facing large objects: it neither compacts them nor promotes them, but simply keeps them on a list, waiting for a chance to deal with them. In most cases this behavior is a performance win. But whenever Gen2 fires, it drags the LOH into the collection as well, and this characteristic is sometimes the root of trouble.
Given this behavior, we can see that frequent Gen0 and Gen1 collections do not hurt performance much, but frequent Gen2 collections are clearly felt; if Gen2 runs very often, the program is on pins and needles. In general, a ratio of Gen0 to Gen1 to Gen2 collection counts of around 100:10:1 is healthy, as the sketch below checks.
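A rough way to check this guideline in code is GC.CollectionCount, which returns the cumulative number of collections per generation; a minimal sketch:

    using System;

    static class CollectionRatio
    {
        // Compare cumulative per-generation collection counts against the
        // rough 100:10:1 guideline after the workload has run for a while.
        public static void Print()
        {
            int gen0 = GC.CollectionCount(0);
            int gen1 = GC.CollectionCount(1);
            int gen2 = GC.CollectionCount(2);

            Console.WriteLine("Gen0:Gen1:Gen2 = {0}:{1}:{2}", gen0, gen1, gen2);

            if (gen2 > 0 && gen0 < 100 * gen2)
                Console.WriteLine("Gen2 collections look frequent relative to Gen0.");
        }
    }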
Small objects on the managed heap tend to follow one of two patterns. In one, the object is disposed of during a Gen0 collection; we call this "dying young" (die young), and it is a good thing, because the performance impact of Gen0 is negligible. In the other, the object survives tenaciously only to die as soon as it reaches Gen2; we call this a "mid-life crisis". A side effect of frequent Gen2 collections is that the LOH is collected along with Gen2 even when the LOH still has plenty of free space to allocate from, so it suffers collateral damage.
Another scenario is that we see no noticeable change in Gen2's size after a Gen2 collection, which suggests the collection was essentially done for the sake of the LOH. Note that not everything on the LOH is an object larger than 85 KB; it also holds some objects created by the .NET runtime itself.
So if we see the GC consuming a lot of time while the allocation rate is not high, the most likely cause is that many objects keep being promoted from Gen0 all the way to Gen2. This scenario can usually be confirmed with the value of the following counter:
Promoted Memory from Gen X
Here X ranges over 0 and 1; the counter reflects how much memory is promoted from a lower generation to a higher one. If many Gen2 collections occur, the value of Promoted Memory from Gen 1 will be high. Note, however, that promotions caused by finalization are reported by the Promoted Finalization-Memory from Gen 0 counter, and pay attention to its name: although it says Gen 0, it covers both Gen0 and Gen1. If a finalizable object survives, every object it references survives as well, and all of them are included in the Promoted Finalization-Memory from Gen 0 counter.
Finalizable objects are added to a list that the GC monitors to decide when and how to handle them. Finalizers (finalize) should therefore do their work as quickly as possible; if one needs to run for hours, that is clearly a bad sign, and we should change our code to avoid it. Objects awaiting finalization cannot have their memory reclaimed until their finalizers have run, so each GC is held back by slow finalizers.
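One way to keep objects off that list is the standard Dispose pattern: releasing the resource deterministically and calling GC.SuppressFinalize means the finalizer never has to run at all. A minimal sketch with a hypothetical unmanaged handle:

    using System;

    class NativeResourceHolder : IDisposable
    {
        IntPtr _handle; // a hypothetical unmanaged resource

        public void Dispose()
        {
            ReleaseHandle();
            GC.SuppressFinalize(this); // the object no longer needs finalization
        }

        ~NativeResourceHolder() // safety net only; keep finalizers short and fast
        {
            ReleaseHandle();
        }

        void ReleaseHandle()
        {
            if (_handle != IntPtr.Zero)
            {
                // the actual release call (e.g. CloseHandle) would go here
                _handle = IntPtr.Zero;
            }
        }
    }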
Gen X Heap Size
Here X again ranges over 0 and 1. When the promotion-related counters are high, you should look at these. They represent the sizes of Gen0 and Gen1. Be aware, however, that the value of Gen 0 Heap Size is misleading: it merely represents a budget. Gen0 and Gen1 are both small, ranging from 256 KB to a few megabytes.
# Total Committed Bytes and # Total Reserved Bytes
For memory-related data, from Task Manager to performance counters, there are quite a few figures, from working set to commit size to total reserved bytes; MSDN explains each of these names. Here, let's focus on one formula:
# Total Committed Bytes = Gen 0 Heap Size + Gen 1 Heap Size + Gen 2 Heap Size + Large Object Heap Size
The latter counter, # Total Reserved Bytes, is always larger than the former.
# Induced GC
The higher this value, the worse. If you see it climbing, check whether the code calls GC.Collect() too often. As mentioned earlier, we should generally not call it directly.
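The pattern to search for when this counter climbs looks something like the sketch below; every explicit call forces a full collection (Gen2 plus the LOH):

    using System;

    class InducedGcExample
    {
        static void Main()
        {
            // The anti-pattern: each explicit call forces a full Gen2 + LOH
            // collection and increments the "# Induced GC" counter.
            GC.Collect();
            GC.WaitForPendingFinalizers(); // often paired with it, compounding the cost
            GC.Collect();

            Console.WriteLine("Gen2 collections so far: {0}", GC.CollectionCount(2));
        }
    }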
When it comes to memory, fragmentation is a topic that cannot be avoided. We sometimes encounter an OutOfMemoryException (the OOM problem) even though plenty of memory appears to be available; this is usually the ghost of memory fragmentation. In .NET 2.0, a major improvement over the 1.1-era GC was better handling of fragmentation. The GC used to either collect an object or keep promoting it, but since 2.0 it can also demote (demotion), which effectively prevents certain special objects from being promoted to higher generations; at the same time, it reduces fragmentation by increasing the reuse of existing Gen2 segments (segment).
Most closely tied to memory fragmentation are pinned objects (objects fixed to a location in memory so that the GC cannot move them). For example, when our program performs asynchronous network I/O, the buffer is pinned until the entire operation completes. This takes a piece of the GC heap that cannot be moved, and because other objects surround it, the free memory in front of it may be too small to serve other allocations, resulting in fragmentation. Why does something as inconvenient as a pinned object exist? It is in fact a deliberate design by Microsoft: with its help, managed code and unmanaged code can interoperate. So when dealing with pinned objects, be sure to unpin them as soon as possible; alternatively, pin an object on the LOH (which, as noted, is not "harassed" often), or, if you are sure Gen2 collections will not be frequent, pin an object that already lives in Gen2. In C#, the fixed statement and GCHandle.Alloc with GCHandleType.Pinned both produce pinned objects (GC.KeepAlive, despite appearances, only extends an object's lifetime and does not pin it). None of these workarounds is absolute; they are simply serviceable approaches, and Microsoft is busy preparing further GC optimizations for this problem.
(Figure: monitoring the # of Pinned Objects counter)
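A minimal sketch of explicit pinning via GCHandle; the buffer cannot move between Alloc and Free, so the pinned window should be kept as short as possible:

    using System;
    using System.Runtime.InteropServices;

    class PinningDemo
    {
        static void Main()
        {
            byte[] buffer = new byte[4096];

            // Pin the buffer so the GC cannot move it while unmanaged code uses it.
            GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
            try
            {
                IntPtr address = handle.AddrOfPinnedObject();
                // ... hand "address" to the unmanaged API here ...
                Console.WriteLine("Pinned at 0x{0:X}", address.ToInt64());
            }
            finally
            {
                handle.Free(); // unpin as soon as the unmanaged call completes
            }
        }
    }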
Once you run into a problem and have read the performance counters, you must spend time thinking. For example, if our program shows high CPU as load gradually increases, do not rush to grab a dump with some tool and start debugging. The more you think, the more you narrow the scope of the problem, and the more useful your debugging becomes.
An ASP.NET application my team was once responsible for hit high CPU at around 650-800 requests/sec. After carefully observing some counter values, we found that the program was throwing a large number of exceptions per second, around 200-300. Yet our project had an exception-logging component, and it recorded no trace of so many exceptions. We then confirmed that some methods called by the program throw an exception internally and catch it themselves; for example, Response.End() throws a ThreadAbortException that is already caught inside the method. The # of Exceps Thrown / sec counter faithfully records such exceptions as well. Recalling what we know about exceptions, raising and handling them is expensive, and this volume of exceptions caused large CPU fluctuations. We then examined the code, found a call with exactly this characteristic, replaced it with an alternative, and the problem was resolved.
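For reference, a commonly suggested alternative in ASP.NET Web Forms (an assumption for illustration, not necessarily the fix our project used): HttpApplication.CompleteRequest ends the request pipeline without the ThreadAbortException that Response.End() raises, though unlike Response.End() the code after the call still runs.

    using System.Web;

    static class ResponseHelper
    {
        // Ends the request without the ThreadAbortException that Response.End()
        // throws internally (and that shows up in "# of Exceps Thrown / sec").
        public static void EndQuietly(HttpContext context)
        {
            context.Response.Flush();
            context.ApplicationInstance.CompleteRequest();
        }
    }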
Garbage collection is a subject with a long history and great depth; even a thick book devoted solely to the GC in .NET would not exhaust it. This article has shared some of the author's knowledge and experience of the GC and related performance tuning. Much more remains, such as strong and weak references, the different modes the GC provides for different CPUs, what a root is, what the hoarding (hoard) feature is, and so on. Friends who want to pursue .NET development can dig deeper into its nature and unleash the full power of this sharp .NET weapon.