Terminology:
1. Garbage
So-called garbage means the objects that need to be reclaimed. A human programmer can judge that "this variable is no longer needed," but the computer cannot. Therefore, an object that the program references, directly or indirectly, is considered "alive," while an object that is no longer referenced is considered "dead." The essence of GC is to find these "dead" objects and reclaim them as garbage.
2. Root
A root is a starting point for deciding whether an object is referenced. Exactly what counts as a root differs between languages and implementations, but variables and the runtime stack are generally treated as roots.
The main GC algorithms are: mark-and-sweep, copying collection, and reference counting.
1. Mark-and-sweep (Mark and Sweep) was the first GC algorithm to be developed (1960). The principle is simple: starting from the roots, recursively mark every object that might still be referenced, then reclaim the unmarked objects as garbage.
In the initial state, objects may reference other objects.
In the mark phase, each object carries an internal flag recording whether it has been marked; we paint marked objects black. A marked object is treated as "alive." Marking starts from the roots of the object graph.
In the sweep phase, all objects in the heap are scanned sequentially and the unmarked ones are reclaimed.
The processing time of mark-and-sweep is related to both the number of surviving objects (for the mark phase) and the total number of objects (for the sweep phase).
A variant of mark-and-sweep is the mark-and-compact (Mark and Compact) algorithm: instead of sweeping the unmarked objects away in place, it moves the surviving objects together so that the heap stays compact.
A disadvantage of mark-and-sweep is that when a large number of objects have been allocated and only a small portion of them survives, it consumes far more time than necessary, because the sweep phase scans a large number of dead objects.
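The mark and sweep phases described above can be sketched in a few lines of Python. This is a minimal sketch: the `Obj` class, its `marked` flag, and the explicit heap list are illustrative assumptions, not any real runtime's layout.

```python
class Obj:
    def __init__(self):
        self.refs = []       # objects this object references
        self.marked = False  # the mark flag ("painted black")

def mark(obj):
    """Recursively mark every object reachable from obj."""
    if obj.marked:
        return
    obj.marked = True
    for ref in obj.refs:
        mark(ref)

def collect(roots, heap):
    """Mark from the roots, then sweep the whole heap."""
    for root in roots:
        mark(root)
    live = [o for o in heap if o.marked]   # sweep: keep marked objects
    for o in live:
        o.marked = False                   # reset flags for the next cycle
    return live                            # unmarked objects are reclaimed

# a -> b is reachable from the root; c is garbage
a, b, c = Obj(), Obj(), Obj()
a.refs.append(b)
heap = collect(roots=[a], heap=[a, b, c])
print(len(heap))  # 2: a and b survive, c is collected
```

Note that the sweep loop touches every object in the heap, which is exactly the weakness described above.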
2. Copying collection (Copy and Collection)
The copying-collection algorithm tries to overcome this defect of mark-and-sweep. In this algorithm, the objects referenced from the roots are copied into another space, and then the objects those copies reference are copied recursively in the same way.
By copying all live objects into the new space and then discarding the old space wholesale, all the space occupied by dead objects is released without scanning every object again. At the next GC, the new space becomes the old space.
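A minimal sketch of copying collection, under the assumption that the two semispaces are plain Python lists and that each object carries a forwarding pointer to its copy:

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.forward = None  # forwarding pointer to this object's copy

def copy(obj, to_space):
    """Copy obj into to-space once; repeat visits reuse the forwarding pointer."""
    if obj.forward is None:
        obj.forward = Obj(obj.name)
        to_space.append(obj.forward)
        # recursively copy referenced objects, rewriting the references
        obj.forward.refs = [copy(r, to_space) for r in obj.refs]
    return obj.forward

def collect(roots):
    """Copy everything reachable from the roots; the old space is dropped whole."""
    to_space = []
    new_roots = [copy(r, to_space) for r in roots]
    return new_roots, to_space  # to_space becomes the next collection's old space

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)               # c is garbage: it is never copied or even scanned
roots, heap = collect([a])
print([o.name for o in heap])  # ['a', 'b']
```

The dead object `c` is never visited at all, which is the advantage over sweeping the whole heap.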
In my experience, the Java GC combines mark-and-sweep and copying collection, while the Objective-C GC uses another algorithm:
3. Reference counting
Reference counting (reference count) is the simplest GC algorithm and the easiest to implement; it was invented at almost the same time as mark-and-sweep.
The basic idea is to keep a reference count inside each object and to update it whenever a reference is created or removed.
Reference counts increase or decrease at points such as variable assignment, object field updates, and function exit (when local variables stop being referenced). When an object's reference count drops to 0, the object can never be referenced again, so its memory can be freed.
    1        1        1
    A -----> B -----> D
              \      / \
               \    /   \
                v  v     v
                 C        E
                 2        1
Suppose every object in the diagram above stores the number of references pointing at it (its reference count); the number above each object is that count.
When a reference to an object changes, the reference counts change as well. Suppose object B's reference to object D is removed: D's reference count becomes 0, so D is freed, and the counts of the objects D references (C and E) are decremented accordingly. As a result, object E's count also drops to 0, so E is freed too.
    1        1  (removed)  0
    A -----> B - - - - - > D
              \           / \
               \         /   \
                v       v     v
                 C             E
                 1             0
Objects whose reference count has dropped to 0 are freed, while the "live" objects are preserved:
    1        1
    A -----> B
              \
               \
                v
                 C
                 1
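The scenario in the diagrams above can be reproduced with a manual reference-counting sketch. The `incref`/`decref` helpers and the `freed` list are illustrative assumptions, not a real allocator:

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0     # reference count stored inside the object
        self.refs = []

freed = []

def incref(obj):
    obj.rc += 1

def decref(obj):
    obj.rc -= 1
    if obj.rc == 0:           # no references left: free immediately
        freed.append(obj.name)
        for r in obj.refs:    # and release the references it held
            decref(r)

# build the structure from the diagram: B -> D, B -> C, D -> C, D -> E
b, c, d, e = Obj("b"), Obj("c"), Obj("d"), Obj("e")
b.refs = [d, c]; incref(d); incref(c)
d.refs = [c, e]; incref(c); incref(e)

# remove B's reference to D, as in the text
b.refs.remove(d)
decref(d)      # D hits 0 and is freed, which in turn frees E; C survives
print(freed)   # ['d', 'e']
print(c.rc)    # 1
```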
Ease of implementation is the biggest advantage of reference counting. GC mechanisms such as mark-and-sweep and copying collection are difficult to implement, whereas many C++ programmers (myself included) have at some point implemented something resembling reference counting; it is fair to say the algorithm is quite widespread.
Another advantage is that an object is freed the instant it stops being referenced. In other GC mechanisms it is hard to predict when an object will be reclaimed, but with reference counting it is released immediately. Also, because the free operation happens per object, the pause times caused by GC are shorter than with other algorithms.
Disadvantages of reference counting:
The biggest disadvantage of reference counting is that it cannot free objects that reference each other in a cycle.
         1
         A
        / ^
       v   \
    1 /     \ 1
     B ----> C
In the illustration above, none of the three objects A, B, and C is referenced from anywhere else, but they reference each other in a cycle, so their reference counts will never reach 0 and the objects will never be freed.
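The cycle above can be demonstrated with the same manual-counting idea; the counts stay at 1 forever even though nothing outside the cycle references A, B, or C. (CPython, which uses reference counting, supplements it with a separate cycle detector in its `gc` module for exactly this reason.)

```python
class Obj:
    def __init__(self):
        self.rc = 0      # manual reference count
        self.ref = None

a, b, c = Obj(), Obj(), Obj()
a.ref = b; b.rc += 1   # A -> B
b.ref = c; c.rc += 1   # B -> C
c.ref = a; a.rc += 1   # C -> A: the cycle is closed

# No external reference points at A, B, or C, yet every count is still 1,
# so pure reference counting will never free any of them.
print(a.rc, b.rc, c.rc)  # 1 1 1
```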
The second disadvantage is that the reference count must be incremented and decremented correctly every time a reference changes; if a single increment or decrement is omitted, the result is a memory error that is hard to track down. Forgetting an increment frees an object prematurely; forgetting a decrement leaves the object in memory forever, causing a memory leak. If the language runtime manages the reference counts itself this is manageable, but managing them by hand is a breeding ground for bugs.
The last drawback is that reference counting is poorly suited to parallel processing. To prevent multiple threads in a multithreaded environment from updating the same reference count simultaneously, the updates must be made mutually exclusive; if reference-count operations are frequent, concurrency-control mechanisms such as locks are required, and their overhead is not to be underestimated.
Broadly speaking, the basic GC algorithms do not go beyond the three methods above and their derivatives. By combining these three approaches, more advanced methods have since emerged; the most representative are generational, incremental, and parallel collection.
1. Generational collection (generational GC)
Since GC work is unrelated to the program's own processing, the less time it consumes the better. The goal of generational collection is to shorten the total time the program spends in GC.
The basic idea of generational collection exploits a property of typical programs: most objects become garbage shortly after they are created, while objects that survive past a certain age tend to live even longer. Given that short-lived objects are discarded quickly, how can this make GC more efficient? If only the "young" objects created recently are scanned, most of the garbage can be reclaimed with little work.
In generational collection, objects are classified by age: newly created objects belong to the new (young) generation, while objects that have survived longer belong to the old generation. Depending on the implementation, more generations may be used. Then, by the property above, scanning only the new generation is enough to reclaim most of the discarded objects.
This approach is called a minor collection (Minor GC). For reference, see this description of the Java garbage-collection mechanism: http://www.blogjava.net/ldwblog/archive/2013/07/24/401919.html
A minor collection starts with a regular scan from the roots to find the live objects. This step usually uses a mark-and-sweep or copying algorithm, and most generational collectors use copying collection. If an object belonging to the old generation is encountered during the scan, the scan does not recurse into that object. As a result, the number of objects that must be scanned is drastically reduced.
Then the objects that survive this first scan are promoted to the old generation. Concretely, with a copying algorithm it suffices to make the old-generation space the copy destination; with mark-and-sweep, promotion is usually implemented by setting some kind of flag on the object.
Recording references from the old generation:
Given the procedure above, what should be done about references from old-generation objects to new-generation objects? Since the scan does not recurse into the old generation, a reference from the old generation into the new generation would go undetected, and a young object referenced only from the old generation would be mistaken for "dead." Therefore, generational collectors monitor updates to objects and record references from the old generation to the new generation in a table called the remembered set. During a minor collection, the remembered set is treated as part of the roots.
For generational collection to work correctly, the contents of the remembered set must be kept up to date. The reference must therefore be recorded at the instant a reference into the new generation is created, so a subroutine responsible for this recording is embedded in every operation that updates an object.
This recording subroutine works as follows: given two objects A and B, when A's content is rewritten so that a reference to B is added, the reference is recorded in the remembered set if A belongs to the old generation and B to the new generation.
Because this check must guard every operation that modifies an object's contents, it is called a write barrier. Write barriers are used not only in generational collection but in many other GC algorithms as well.
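The write barrier and remembered set described above can be sketched as follows; the class layout and the `write_ref` helper are illustrative assumptions, not any particular VM's implementation.

```python
class Obj:
    def __init__(self, old=False):
        self.old = old   # which generation the object belongs to
        self.refs = []

remembered_set = set()

def write_ref(a, b):
    """Store a reference to b inside a, guarded by the write barrier."""
    a.refs.append(b)
    if a.old and not b.old:       # old -> new reference: the minor GC must see it
        remembered_set.add(a)

old_obj = Obj(old=True)
young_obj = Obj()
write_ref(old_obj, young_obj)     # barrier fires: old_obj is remembered
write_ref(young_obj, Obj())       # young -> young: barrier records nothing
print(old_obj in remembered_set)  # True

# During a minor collection, the entries of remembered_set are scanned
# as additional roots, so young_obj is correctly treated as live.
```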
As the program runs, objects in the old generation die as well. To avoid accumulating "dead" objects in the old-generation area, the whole heap, including the old generation, must occasionally be collected. A GC pass that covers all areas is called a full collection (full GC) or a major collection (Major GC).
The performance of generational collection depends heavily on the program's behavior, the number of generations, the conditions that trigger a major collection, and so on.
2. Incremental collection
In programs with strict real-time requirements, shortening the maximum GC pause is often more important than shortening the average pause. For example, in a robot's posture-control program, the robot may fall over if control is interrupted by GC for 0.1 seconds; and if a vehicle-control program responds late because of a GC pause, the consequences could be disastrous.
In such highly real-time programs, the pauses the GC produces must be predictable and bounded; for example, each pause might be required to stay within 10 milliseconds.
In ordinary GC algorithms, the pause time depends on the number and state of the objects. To preserve the program's real-time behavior, therefore, instead of waiting for the GC to run to completion, the GC work is divided into several parts that are executed separately. This approach is called incremental GC.
In incremental collection, the GC proceeds step by step while the program itself keeps running, so the reference relationships between objects may change in the middle of a collection. If an object that has already been scanned and marked is modified so that it gains a reference to a new object, that new object will never be marked, and a "live" object will be reclaimed by mistake.
To avoid this problem, incremental collection uses write barriers, just like generational collection. When the reference relationships of an already-marked object change, the write barrier records the newly referenced object as a new starting point for the scan.
Because the GC work is spread out incrementally, pause times can be kept within a fixed bound. On the other hand, because the barrier operations themselves consume time, the total time spent on GC increases correspondingly.
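One common way to realize this is tri-color incremental marking, sketched below under simplifying assumptions (a single mutator, an explicit grey queue): the mark phase runs in bounded steps, and the write barrier re-greys references stored into already-scanned (black) objects.

```python
WHITE, GREY, BLACK = 0, 1, 2   # unseen / queued for scan / fully scanned

class Obj:
    def __init__(self):
        self.color = WHITE
        self.refs = []

grey = []   # pending-scan queue shared by all mark steps

def start_gc(roots):
    for r in roots:
        r.color = GREY
        grey.append(r)

def mark_step(budget):
    """Scan at most `budget` objects, then hand control back to the program."""
    while grey and budget > 0:
        obj = grey.pop()
        obj.color = BLACK
        for r in obj.refs:
            if r.color == WHITE:
                r.color = GREY
                grey.append(r)
        budget -= 1
    return not grey   # True once marking is complete

def write_ref(a, b):
    """Write barrier: a reference stored into a black object must be rescanned."""
    a.refs.append(b)
    if a.color == BLACK and b.color == WHITE:
        b.color = GREY
        grey.append(b)

a, b, new = Obj(), Obj(), Obj()
a.refs = [b]
start_gc([a])
mark_step(budget=2)    # a and b are now black; marking looks finished
write_ref(a, new)      # the program adds a reference mid-collection
done = mark_step(budget=2)
print(done, new.color == BLACK)  # True True
```

Without the barrier, `new` would have stayed white and been swept despite being reachable.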
3. Parallel collection
On recent computers, multi-core processors with several CPU cores on a single chip have become common; a Core i7, for example, has 6 cores and 12 hardware threads.
In such an environment, threads must be exploited to get the most out of the CPUs. Parallel collection is a way of performing GC work using the full processing power of multiple CPUs.
The basic principle of parallel collection is to run the GC at the same time as the original program, which is similar to incremental collection. However, whereas incremental collection slices the GC work into pieces on a single CPU, parallel collection exploits the performance of multiple CPUs and carries out these GC tasks in parallel as far as possible.
Because the program and the GC run simultaneously, the same problem arises as in incremental collection, so parallel collection also uses write barriers to keep the collector's view of object states up to date. Making the GC run fully in parallel without ever affecting the program is not possible, however; at certain stages of the GC, the program must still be paused briefly.
GC unification theory: David F. Bacon of IBM's Watson Research Center has argued that every GC algorithm is a combination of two basic approaches: tracing collection and reference counting. For example, the write-barrier mechanism used to improve tracing collectors, as in generational and incremental collection, can be seen, from the standpoint of tracking changes in reference state, as borrowing the idea of reference counting.