A major feature of the Java language is automatic garbage collection: developers do not need to pay much attention to releasing system resources such as memory. While automatic garbage collection greatly reduces the developer's workload, it also adds to the burden on the running system.
Having a garbage collector is a significant difference between Java and C++. In C++, programmers must handle every memory allocation with care and must manually release memory once it is no longer in use. A memory block that is allocated but never freed causes a memory leak and, in severe cases, can crash the program.
The following lists the common garbage collection algorithms and the principles behind them:
Reference counting method (Reference counting)
Reference counting is used in Microsoft's COM component technology and Adobe's ActionScript 3, among others.
The idea behind reference counting is simple: for an object A, whenever any object holds a reference to A, A's reference counter is incremented by 1, and when a reference becomes invalid, the counter is decremented by 1. Once the reference counter of object A reaches 0, object A can no longer be used.
The implementation is also very simple: each object only needs an integer counter. However, reference counting has a serious problem: it cannot handle circular references. Therefore, this algorithm is not used in the Java garbage collector.
A simple circular reference looks like this: there are two objects, A and B; object A contains a reference to object B, and object B contains a reference to object A. At this point, the reference counters of A and B are both non-zero. However, no third object in the system references A or B. In other words, A and B are garbage objects that should be reclaimed, but because they reference each other, the garbage collector cannot recognize them as garbage, which causes a memory leak.
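The circular reference problem can be reproduced with a toy model. The following is a minimal sketch, not how any real runtime implements reference counting; the class names `RefCounted` and `RefCountDemo` and the explicit "root reference" bookkeeping are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reference counting (hypothetical classes, not JVM internals).
class RefCounted {
    final String name;
    int refCount = 0;                                   // number of incoming references
    final List<RefCounted> fields = new ArrayList<>();  // outgoing references

    RefCounted(String name) { this.name = name; }

    // Storing a reference to target in this object increments target's counter.
    void addReference(RefCounted target) {
        fields.add(target);
        target.refCount++;
    }

    // A pure reference counter reclaims an object only when its counter is 0.
    boolean isCollectable() { return refCount == 0; }
}

class RefCountDemo {
    // Builds A <-> B, drops the external references, and reports whether
    // a pure reference counter would reclaim either object.
    static boolean cycleIsReclaimed() {
        RefCounted a = new RefCounted("A");
        RefCounted b = new RefCounted("B");
        a.refCount++;          // simulated root reference to A
        b.refCount++;          // simulated root reference to B
        a.addReference(b);     // A -> B
        b.addReference(a);     // B -> A
        a.refCount--;          // the root no longer references A
        b.refCount--;          // the root no longer references B
        // Both counters are still 1 because of the cycle.
        return a.isCollectable() || b.isCollectable();
    }

    public static void main(String[] args) {
        System.out.println("cycle reclaimed: " + cycleIsReclaimed());
    }
}
```

In this model neither counter ever reaches 0, so both objects stay allocated even though nothing outside the cycle can reach them.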
Mark-sweep algorithm (Mark-Sweep)
The mark-sweep algorithm divides garbage collection into two phases: the mark phase and the sweep phase. A feasible implementation is to start from the root nodes in the mark phase and mark every object reachable from them. Any object left unmarked is therefore an unreferenced garbage object. Then, in the sweep phase, all unmarked objects are cleared. The biggest problem with this algorithm is heavy space fragmentation, because the reclaimed space is not contiguous. During heap allocation, especially for large objects, discontiguous memory is less efficient to work with than contiguous memory.
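The two phases can be sketched over a toy object graph. This is an illustrative model under stated assumptions, not the HotSpot implementation: the heap is a plain list, `Node` and `MarkSweepDemo` are hypothetical names, and marking is recursive for brevity (a real collector would avoid deep recursion).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class MarkSweepDemo {
    // Hypothetical heap object with a mark bit and outgoing references.
    static class Node {
        final String name;
        boolean marked = false;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark phase: flag every object reachable from the given node.
    static void mark(Node node) {
        if (node.marked) return;        // already visited, which also stops on cycles
        node.marked = true;
        for (Node child : node.refs) mark(child);
    }

    // Sweep phase: remove unmarked objects and reset the mark on survivors.
    static List<String> sweep(List<Node> heap) {
        List<String> freed = new ArrayList<>();
        Iterator<Node> it = heap.iterator();
        while (it.hasNext()) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;       // ready for the next collection
            } else {
                freed.add(n.name);
                it.remove();
            }
        }
        return freed;
    }

    static List<String> collect(List<Node> heap, List<Node> roots) {
        for (Node root : roots) mark(root);
        return sweep(heap);
    }

    // root -> a is reachable; b <-> c is an unreachable cycle.
    static List<String> demo() {
        Node root = new Node("root"), a = new Node("a");
        Node b = new Node("b"), c = new Node("c");
        root.refs.add(a);
        b.refs.add(c);
        c.refs.add(b);
        List<Node> heap = new ArrayList<>(List.of(root, a, b, c));
        return collect(heap, List.of(root));
    }

    public static void main(String[] args) {
        System.out.println("freed: " + demo());
    }
}
```

Because reachability is traced from the roots, the unreachable cycle between `b` and `c` is reclaimed, which is exactly the case a reference counter misses. The sketch does not model addresses, so the fragmentation problem itself is not visible here.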
Copying algorithm (Copying)
Divide the available memory into two halves and use only one of them at a time. At garbage collection, copy the surviving objects from the half in use into the unused half, then clear all objects in the half that was in use and swap the roles of the two halves, which completes the collection.
If the system contains many garbage objects, the number of surviving objects that the copying algorithm has to copy is relatively small. As a result, the copying algorithm is highly efficient precisely when garbage collection is really needed. And because objects are copied together into the new memory space during collection, the reclaimed memory is guaranteed to be free of fragmentation. The disadvantage of this algorithm is that it cuts the usable system memory in half.
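A rough sketch of the two-space scheme follows. It is a simplified model, not a real collector: the two halves are plain lists named `from` and `to`, `Obj` and `CopyingDemo` are hypothetical names, and "copying" an object means creating a fresh `Obj` and recording it in a forwarding map so that shared and cyclic references are copied only once.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CopyingDemo {
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    List<Obj> from = new ArrayList<>();   // half currently in use
    List<Obj> to = new ArrayList<>();     // idle half

    Obj allocate(String name) {
        Obj o = new Obj(name);
        from.add(o);
        return o;
    }

    // Copies everything reachable from the roots into the idle half,
    // clears the half that was in use, then swaps the two roles.
    void collect(List<Obj> roots) {
        Map<Obj, Obj> forwarded = new IdentityHashMap<>();
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), forwarded));  // roots now point at the copies
        }
        from.clear();
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }

    private Obj copy(Obj old, Map<Obj, Obj> forwarded) {
        Obj existing = forwarded.get(old);
        if (existing != null) return existing;            // already copied
        Obj fresh = new Obj(old.name);
        forwarded.put(old, fresh);                        // record before recursing (cycles)
        to.add(fresh);
        for (Obj child : old.refs) fresh.refs.add(copy(child, forwarded));
        return fresh;
    }

    // a -> b is live; "garbage" is unreachable and is never copied.
    static List<String> demo() {
        CopyingDemo heap = new CopyingDemo();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        heap.allocate("garbage");
        a.refs.add(b);
        List<Obj> roots = new ArrayList<>(List.of(a));
        heap.collect(roots);
        List<String> survivors = new ArrayList<>();
        for (Obj o : heap.from) survivors.add(o.name);
        return survivors;
    }

    public static void main(String[] args) {
        System.out.println("survivors: " + demo());
    }
}
```

Note that the work done is proportional to the number of live objects only; the garbage object is never touched, which is why the algorithm is efficient when most objects are dead.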
The serial garbage collector for Java's young generation uses the idea of the copying algorithm. The young generation is divided into three parts: the eden space, the from space, and the to space. The from and to spaces can be regarded as two blocks of equal size and equal status whose roles can be swapped. The from and to spaces are also known as survivor spaces; they hold objects that have survived a collection.
During garbage collection, the surviving objects in the eden space are copied into the unused survivor space (assume it is to), and the young objects in the survivor space in use (assume it is from) are also copied into the to space. Large objects and old objects go directly into the old generation, and if the to space is full, objects also go directly into the old generation. At this point, the objects remaining in the eden and from spaces are garbage, so both spaces can be emptied directly, and the to space holds the surviving objects after this collection. This improved copying algorithm not only guarantees contiguous space but also avoids wasting a large amount of memory.
Mark-compact algorithm (Mark-Compact)
The copying algorithm is efficient on the premise that there are few surviving objects and many garbage objects. This is often the case in the young generation, but in the old generation it is more common for most objects to be alive. If the copying algorithm were still used there, the cost of copying would be high because of the large number of surviving objects.
The mark-compact algorithm is a collection algorithm for the old generation that builds on the mark-sweep algorithm with some optimizations. It also first marks all objects reachable from the root nodes. After that, however, it does not simply sweep the unmarked objects; instead it moves (compacts) all surviving objects toward one end of memory and then clears all the space beyond the boundary. This method avoids fragmentation and does not need two memory blocks of the same size, so it is more cost-effective.
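The compaction step can be illustrated on an array that stands in for memory. This is a deliberately small sketch: it assumes the mark phase has already filled in the `marked` array, each slot holds one object, and the hypothetical `compact` method only slides objects. A real collector would also have to update every reference that points at a moved object.

```java
import java.util.Arrays;

class MarkCompactDemo {
    // Slides every marked object toward index 0, clears everything past the
    // boundary, and returns the boundary (the first free slot).
    static int compact(String[] heap, boolean[] marked) {
        int free = 0;                               // next destination slot
        for (int i = 0; i < heap.length; i++) {
            if (marked[i]) {
                heap[free] = heap[i];               // move the survivor down
                free++;
            }
        }
        Arrays.fill(heap, free, heap.length, null); // clear space beyond the boundary
        Arrays.fill(marked, false);                 // reset marks for the next cycle
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "x", "b", "y", "c"};
        boolean[] marked = {true, false, true, false, true};
        int boundary = compact(heap, marked);
        System.out.println(Arrays.toString(heap) + " boundary=" + boundary);
    }
}
```

After compaction the free space is a single contiguous block starting at the boundary, so a new allocation only needs to advance that boundary.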
Incremental Algorithm (Incremental collecting)
During garbage collection, the application is in a state of high CPU consumption. In this state, all application threads are suspended, pausing all normal work while they wait for the collection to finish. If the collection takes too long, the application is suspended for a long time, which severely affects the user experience or the stability of the system.
The basic idea of the incremental algorithm is this: if processing all the garbage at once requires a long pause, then the garbage collection thread and the application threads can be run alternately. Each time, the garbage collection thread collects only a small region of memory and then switches to the application threads, and this repeats until the collection is complete. Because application code runs intermittently during the collection, the system's pause time is reduced. However, because of the cost of thread switching and context switching, the overall cost of garbage collection rises and system throughput falls.
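The alternation between collector and application can be pictured as a schedule. The sketch below only models the interleaving, not real threads or real memory: `IncrementalDemo`, the notion of numbered "regions", and the `perSlice` budget are all assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalDemo {
    // Returns the interleaved schedule for collecting `regions` memory regions
    // when the collector may process at most `perSlice` regions per turn.
    static List<String> schedule(int regions, int perSlice) {
        if (perSlice <= 0) throw new IllegalArgumentException("perSlice must be positive");
        List<String> log = new ArrayList<>();
        int next = 0;
        while (next < regions) {
            int end = Math.min(next + perSlice, regions);
            log.add("gc[" + next + "," + end + ")");   // short collector slice
            next = end;
            log.add("app");                            // hand control back to the application
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(schedule(5, 2));
    }
}
```

Each `gc` entry is one short pause instead of a single long one; the extra `app`/`gc` transitions stand for the switching cost that lowers overall throughput.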
Generational algorithm (Generational Collecting)
Different kinds of objects are best collected with the algorithm that suits them, and the generational algorithm is based on this idea: it divides memory into regions according to the characteristics of objects and uses a different collection algorithm for each region to improve efficiency. Take the HotSpot virtual machine as an example. It puts all newly created objects into a memory area called the young generation, whose objects typically die quickly, so the young generation uses the more efficient copying algorithm. When an object is still alive after several collections, it is moved into a memory space called the old generation. In the old generation, almost all objects have survived several garbage collections, so they can be regarded as resident in memory for some time, possibly for the whole life of the application. If the old generation were still collected with the copying algorithm, a large number of objects would have to be copied. In addition, collecting the old generation pays off less than collecting the young generation, so this approach is undesirable. Following the generational idea, the old generation can instead be collected with the mark-compact algorithm to improve the efficiency of garbage collection.
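Promotion from the young to the old generation can be sketched with an age counter per object. This is a toy model, not HotSpot's actual policy: the threshold value `TENURE_THRESHOLD = 3`, the class `GenerationalDemo`, and passing liveness in as a set of names are all simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

class GenerationalDemo {
    static class Obj {
        final String name;
        int age = 0;                       // number of collections survived
        Obj(String name) { this.name = name; }
    }

    static final int TENURE_THRESHOLD = 3; // hypothetical value for illustration

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    void allocate(String name) {
        young.add(new Obj(name));          // every new object starts in the young generation
    }

    // Young-generation collection: dead objects disappear, survivors age,
    // and survivors that reach the threshold are promoted to the old generation.
    void minorGc(Set<String> liveNames) {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!liveNames.contains(o.name)) {
                it.remove();               // garbage
            } else if (++o.age >= TENURE_THRESHOLD) {
                it.remove();
                old.add(o);                // promoted
            }
        }
    }

    public static void main(String[] args) {
        GenerationalDemo heap = new GenerationalDemo();
        heap.allocate("cache");
        heap.allocate("temp");
        for (int i = 0; i < 3; i++) heap.minorGc(Set.of("cache"));
        System.out.println("young=" + heap.young.size() + " old=" + heap.old.size());
    }
}
```

In this model the short-lived object is gone after the first collection, while the long-lived one moves to the old generation after surviving three collections and is no longer examined by later young-generation collections.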
Garbage collectors can be classified in different ways, depending on the perspective.
1. By number of threads, collectors are either serial or parallel. A serial garbage collector uses only one thread at a time for collection, whereas a parallel garbage collector runs multiple threads at the same time. Using a parallel collector on a CPU with strong parallel capability can shorten GC pause time.
2. By mode of operation, collectors are either concurrent or exclusive. A concurrent garbage collector works alternately with the application threads to keep application pauses as short as possible. Once an exclusive (Stop the World) garbage collector runs, it stops all other threads in the application until the collection has completely finished.
3. By how fragmentation is handled, collectors are either compacting or non-compacting. A compacting garbage collector compacts the surviving objects after collection to eliminate the fragments left behind, while a non-compacting collector skips this step.
4. By the memory range they work on, collectors are either young-generation collectors or old-generation collectors.
You can use the following indicators to evaluate the quality of a garbage collector.
Throughput: the ratio of the time spent running the application to the total running time of the system over the application's lifetime. Total system running time = application time + GC time. If the system has run for 100 min and GC took 1 min, then the throughput is (100-1)/100 = 99%.
Garbage collector overhead: the counterpart of throughput, that is, the ratio of the time spent by the garbage collector to the total system running time.
Pause time: how long the application is paused while the garbage collector runs. For an exclusive collector, the pause time may be longer. With a concurrent collector, the pause time is shorter because the garbage collector and the application run alternately, but system throughput may be lower because a concurrent collector is probably less efficient than an exclusive one.
Garbage collection frequency: how often the garbage collector runs. In general, for a given application, the collector should run as infrequently as possible. Increasing the heap size usually reduces the frequency of garbage collection effectively, but it may lengthen the pause caused by each collection.
Reaction time: how long it takes, after an object becomes garbage, for the memory it occupies to be released.
Heap allocation: different garbage collectors may allocate heap memory differently. A good garbage collector should divide the heap into regions sensibly.
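The throughput and overhead indicators above are simple ratios, and the 100 min / 1 min example can be checked in code. The sketch below is one possible way to compute them; the class and method names (`GcMetrics`, `throughput`, `overhead`, `observedGcMillis`) are made up for illustration. The last method reads accumulated collection time from the standard `java.lang.management` API, with the caveat that for concurrent collectors this reported time is not necessarily the time the application was paused.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcMetrics {
    // Throughput = application time / total running time,
    // where total running time = application time + GC time.
    static double throughput(long totalMillis, long gcMillis) {
        return (double) (totalMillis - gcMillis) / totalMillis;
    }

    // Overhead = GC time / total running time (the complement of throughput).
    static double overhead(long totalMillis, long gcMillis) {
        return (double) gcMillis / totalMillis;
    }

    // Sums the accumulated collection time reported by the running JVM.
    // getCollectionTime() returns -1 when the value is undefined, so only
    // positive values are added.
    static long observedGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime();
            if (t > 0) sum += t;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 100 * 60_000L;   // 100 min in milliseconds
        long gc = 60_000L;            // 1 min in milliseconds
        System.out.println("throughput = " + throughput(total, gc));
        System.out.println("overhead   = " + overhead(total, gc));

        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.println("this JVM: gc=" + observedGcMillis() + " ms of " + uptime + " ms uptime");
    }
}
```

For the example figures the two ratios come to 99% and 1%, and they always sum to 100% because both are taken over the same total running time.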
How the Java garbage collector works