How to implement automatic garbage collection

Explicit memory management is complex and error-prone, so we need an automatic memory management policy: the automatic garbage collection mechanism. Once automatic garbage collection is used, the platform must discover and reclaim garbage in some way; this is the concern of the garbage collection algorithm. The task of a garbage collection algorithm is to distinguish live objects from dead objects and then reclaim the memory of the dead ones. To make better use of memory, some algorithms also compact memory fragments. The following describes the common garbage collection algorithms.

Reference Counting

Reference counting, as the name implies, gives each object a counter. When a new reference to the object is created, its counter is incremented by 1; when a reference is dropped, its counter is decremented by 1. When the counter reaches 0, the object is considered garbage and can be reclaimed. When an object is reclaimed, the counters of all objects it references are decremented in turn, which may cause many other counters to drop to 0 as well.
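
The following is a minimal sketch of this idea, not any particular runtime's implementation; the class name `RefCounted` and the `retain`/`release` methods are illustrative assumptions.

```java
// Minimal reference-counting sketch: each object carries a counter,
// retain() increments it, release() decrements it, and when the counter
// hits 0 the object is "reclaimed" and releases everything it references.
import java.util.ArrayList;
import java.util.List;

class RefCounted {
    private final String name;
    private final List<RefCounted> children = new ArrayList<>();
    private int refCount = 0;

    RefCounted(String name) { this.name = name; }

    void retain() { refCount++; }

    void addReferenceTo(RefCounted target) {
        target.retain();          // holding a reference bumps the target's count
        children.add(target);
    }

    void release() {
        refCount--;
        if (refCount == 0) {
            System.out.println(name + " reclaimed");
            // Reclaiming this object drops its references to its children,
            // which may cascade and reclaim them as well.
            for (RefCounted child : children) child.release();
            children.clear();
        }
    }
}

public class RefCountDemo {
    public static void main(String[] args) {
        RefCounted a = new RefCounted("A");
        RefCounted b = new RefCounted("B");
        a.retain();               // the local variable counts as one reference to A
        a.addReferenceTo(b);      // A -> B, so B's count becomes 1
        a.release();              // drop A; A's count hits 0, so B is released too
    }
}
```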

One advantage of reference counting over other garbage collection algorithms is that it is simple to implement, and the collection process does not require pausing the program (this will be discussed later). Because counts are incremented and decremented while the program runs, an object whose count drops to 0 can be reclaimed immediately.

However, reference counting has a well-known difficulty: circular references. For example, suppose object A references object B, so B's counter becomes 1; B references object C, so C's counter becomes 1; then C references B back, raising B's counter to 2. If A later stops referencing B, B's counter falls only to 1. B and C now reference each other and form an isolated island that is unreachable from the rest of the program, yet neither counter is 0, so neither object can be reclaimed. This problem is especially serious in object-oriented languages.
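
A hedged sketch of that island, using illustrative fields rather than a real collector's bookkeeping: after the outside reference is dropped, both counters stay at 1 forever.

```java
// Two objects that reference each other form an island: once the outside
// reference to B is dropped, B's and C's counters stay at 1 forever and
// a pure reference-counting collector can never reclaim them.
class Node {
    int refCount = 0;
    Node next;                    // the reference this node holds
}

public class CycleDemo {
    public static void main(String[] args) {
        Node b = new Node();
        Node c = new Node();
        b.refCount++;             // A (the outside world) references B
        b.next = c; c.refCount++; // B references C
        c.next = b; b.refCount++; // C references B: B's count is now 2
        b.refCount--;             // A drops its reference to B
        System.out.println("B=" + b.refCount + ", C=" + c.refCount); // B=1, C=1
    }
}
```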

In fact, reference counting is a simple and effective resource management technique that can be applied in many scenarios. For example, to use a resource efficiently, instead of copying it for every consumer you can share a single instance and simply increment its reference count.

I once built a real-time monitoring system. An agent process installed on each client continuously sent monitoring data (such as screen captures) to the server. The server was a web application, and multiple administrators could monitor the same client. When an administrator started monitoring a client, I incremented a counter on the corresponding data; when an administrator stopped monitoring or logged out, I decremented it. When the counter reached 0, the server stopped accepting monitoring data from that client and sent it a command to stop monitoring. This avoided copying the data for every watching administrator and also controlled the data's lifecycle.

Tracing

Tracing collectors walk the graph of object references to find the live objects and then reclaim everything else. The following garbage collection algorithms are all based on tracing:

Mark-Sweep

Garbage collection is typically triggered when an allocation would exceed some threshold (the details vary by platform). First, the garbage collector determines a set of roots, such as the local variables and parameters of the methods each thread is currently executing, and all static variables. Starting from these roots, the collector follows all references and marks every object it reaches; when it encounters an object that is already marked, it does not follow it again (this avoids looping forever on reference cycles). This stage is called the Mark Phase. When marking ends, all marked objects are called reachable objects or live objects, and all unmarked objects are considered garbage and can be reclaimed. The collector then enters the Sweep Phase, in which it traverses all objects and reclaims the memory occupied by those that were not marked.
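
Here is a minimal mark-sweep sketch over a toy object graph; the `Obj` class, the explicit `heap` list, and the root set are illustrative assumptions, not a real VM's data structures.

```java
// Toy mark-sweep: mark everything reachable from the roots, then sweep
// the heap and drop every object that was not marked.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class Obj {
    boolean marked;
    final List<Obj> refs = new ArrayList<>();    // outgoing references
}

public class MarkSweep {
    static void mark(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;              // already visited: avoids cycles
            o.marked = true;
            stack.addAll(o.refs);
        }
    }

    static void sweep(List<Obj> heap) {
        // Keep marked objects (clearing the mark for the next cycle) and
        // discard the rest: removing them from the heap list "frees" them.
        heap.removeIf(o -> !o.marked);
        for (Obj o : heap) o.marked = false;
    }

    public static void main(String[] args) {
        Obj a = new Obj(), b = new Obj(), garbage = new Obj();
        a.refs.add(b);                           // a -> b is reachable
        List<Obj> heap = new ArrayList<>(List.of(a, b, garbage));
        mark(List.of(a));                        // a is the only root
        sweep(heap);
        System.out.println("live objects: " + heap.size()); // 2
    }
}
```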

One problem with this algorithm is memory fragmentation. At first, memory may be allocated sequentially, but after several collections some objects in that contiguous space have become garbage and been reclaimed while others are still alive, so many "holes" appear in memory. Fragmentation is very harmful: there may be plenty of free memory in total, yet no single hole is large enough for the next allocation, so an out-of-memory (OOM) error is thrown. This is why another algorithm appeared.

Mark-Compact

The mark phase of mark-compact is the same as that of mark-sweep. After marking reachable objects, instead of traversing all objects to sweep away the garbage, the collector slides all surviving objects toward one end of the heap (say, to the left), so that the discontinuous free space becomes contiguous and no fragmentation remains. Moreover, because the free space is contiguous, memory allocation becomes faster.
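
A hedged sketch of the compaction step on a simplified "heap" represented as an array of slots: live slots slide to the left and the free space becomes one contiguous block. Real collectors must also fix up every reference to a moved object, which this sketch omits.

```java
// Toy compaction: slide live slots to the front of the heap array so
// that the free space (nulls) becomes one contiguous region at the end.
public class Compact {
    public static void main(String[] args) {
        // null = garbage / free slot, non-null = surviving object
        String[] heap = { "A", null, "B", null, null, "C" };

        int next = 0;                       // where the next survivor goes
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null) {
                heap[next++] = heap[i];     // move the survivor left
            }
        }
        for (int i = next; i < heap.length; i++) {
            heap[i] = null;                 // everything past 'next' is free
        }
        System.out.println(java.util.Arrays.toString(heap)); // [A, B, C, null, null, null]
    }
}
```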

With mark-sweep, free memory is no longer contiguous because of the holes, so to allocate memory the system typically maintains a linked list of free memory blocks. When memory needs to be allocated, the allocator traverses this list, finds a block that is large enough, and splits it in two: one part for the current allocation, the other put back on the list (which produces even more fragments). Some strategies do not simply take the first block that is large enough; they may search for a better-fitting free block instead.
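
A minimal first-fit free-list sketch of that allocation path; the block sizes, the `FreeList` class, and the splitting policy are illustrative assumptions.

```java
// Toy first-fit allocator over a free list: walk the list, take the first
// block that is big enough, split it, and put the remainder back.
import java.util.Iterator;
import java.util.LinkedList;

public class FreeList {
    static class Block { int start, size; Block(int s, int n) { start = s; size = n; } }

    private final LinkedList<Block> free = new LinkedList<>();

    FreeList(int heapSize) { free.add(new Block(0, heapSize)); }

    /** Returns the start address of the allocation, or -1 if no block fits. */
    int allocate(int size) {
        Iterator<Block> it = free.iterator();
        while (it.hasNext()) {
            Block b = it.next();
            if (b.size >= size) {
                int addr = b.start;
                b.start += size;          // split: shrink the block in place
                b.size -= size;
                if (b.size == 0) it.remove();
                return addr;
            }
        }
        return -1;                        // fragmentation: no single block fits
    }

    public static void main(String[] args) {
        FreeList fl = new FreeList(100);
        System.out.println(fl.allocate(30)); // 0
        System.out.println(fl.allocate(50)); // 30
        System.out.println(fl.allocate(40)); // -1, only 20 units left
    }
}
```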

With mark-compact, the free space is contiguous, so allocation only needs a pointer that marks where the next allocation starts; after each allocation, the pointer is advanced by the size of the allocated object. This is very fast and requires no free list.
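
By contrast, allocation in a compacted heap can be a single pointer bump; here is a minimal sketch with an illustrative `BumpAllocator` class.

```java
// Toy bump-pointer allocator: the free space is contiguous, so allocation
// just advances one pointer by the requested size.
public class BumpAllocator {
    private int next = 0;           // start of the free region
    private final int limit;        // end of the heap

    BumpAllocator(int heapSize) { this.limit = heapSize; }

    /** Returns the start address of the allocation, or -1 if the heap is full. */
    int allocate(int size) {
        if (next + size > limit) return -1;   // would overflow: a GC would run here
        int addr = next;
        next += size;                         // bump the pointer
        return addr;
    }

    public static void main(String[] args) {
        BumpAllocator heap = new BumpAllocator(100);
        System.out.println(heap.allocate(30)); // 0
        System.out.println(heap.allocate(50)); // 30
        System.out.println(heap.allocate(40)); // -1
    }
}
```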

At this point mark-compact seems strictly better than mark-sweep, so why keep mark-sweep at all? Remember that compaction must move objects, which means every reference to a moved object has to be updated. Each approach therefore has its own advantages and disadvantages.

Mark-Copy

The mark stage is the same as before. In the copy stage, all surviving objects are copied into another free region of memory, and the original region then becomes entirely free. This algorithm is simple to implement and efficient (the Sun JVM uses it for young-generation collection).
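
A hedged sketch of the copy stage using two toy semi-spaces ("from" and "to"): survivors are copied to the free space and the roles of the two spaces are swapped. Reference fix-up and forwarding pointers are omitted, and the `live` set stands in for the mark results.

```java
// Toy copying collection: copy every live object from the "from" space
// into the empty "to" space, then swap the two spaces. The old space is
// now entirely free and the survivors sit contiguously in the new space.
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CopyingGC {
    public static void main(String[] args) {
        List<String> fromSpace = new ArrayList<>(List.of("A", "dead1", "B", "dead2"));
        List<String> toSpace = new ArrayList<>();
        Set<String> live = Set.of("A", "B");           // stand-in for the mark results

        for (String obj : fromSpace) {
            if (live.contains(obj)) toSpace.add(obj);  // copy each survivor
        }
        fromSpace.clear();                             // the whole old space is free again

        // Swap roles: the "to" space becomes the space we allocate from next.
        List<String> tmp = fromSpace; fromSpace = toSpace; toSpace = tmp;
        System.out.println("allocation space now holds: " + fromSpace); // [A, B]
    }
}
```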

Other issues with tracing

There are some other issues with using tracing for garbage collection:

  1. The running program must be paused. If the program kept running during collection, it would continue to allocate and use memory, which creates much more complicated problems. To avoid them, most garbage collector implementations suspend all threads to some extent (only threads executing managed code are suspended). For applications with strict real-time requirements this may be unacceptable.
  2. In some phases the garbage collection thread can run concurrently with the application threads, but it still consumes system resources, which reduces application performance.
  3. If all threads allocate from the same heap region, data races are possible, so allocations must be synchronized with locks, which reduces allocation efficiency. Instead, memory can be divided into many regions, with each thread assigned its own region, so that no synchronization is needed (see the sketch below).
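
As mentioned in item 3 above, giving each thread its own allocation region avoids locking the shared heap; here is a minimal sketch of that idea. The region sizes and class names are illustrative, loosely analogous to the JVM's thread-local allocation buffers.

```java
// Toy thread-local allocation regions: each thread bumps a pointer inside
// its own region, so no lock is needed on the common allocation path. Only
// handing out a new region would require synchronization (not shown).
public class ThreadLocalRegions {
    static class Region {
        final int base, size;
        int next = 0;
        Region(int base, int size) { this.base = base; this.size = size; }

        /** Lock-free for its owner thread: only that thread touches 'next'. */
        int allocate(int bytes) {
            if (next + bytes > size) return -1;   // region exhausted: ask for a new one
            int addr = base + next;
            next += bytes;
            return addr;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Region r1 = new Region(0, 1024);      // region owned by thread 1
        Region r2 = new Region(1024, 1024);   // region owned by thread 2

        Thread t1 = new Thread(() -> System.out.println("t1 got " + r1.allocate(64)));
        Thread t2 = new Thread(() -> System.out.println("t2 got " + r2.allocate(64)));
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```
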
Generational Collection

The algorithms described above are all tracing algorithms, and they share one step: marking reachable objects. If the space to be scanned is very large and contains many objects, this process becomes slow. To speed it up, most modern garbage collectors divide memory into generations, for example generations 0, 1, and 2 in the CLR, or the young, old, and permanent generations of the JVM. A collection can then traverse and mark a relatively small space.
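
A hedged sketch of one generational policy: new objects are allocated in a small young generation, collected frequently, and objects that survive a few collections are promoted to the old generation, which is collected far less often. The survival threshold and names below are illustrative assumptions, not any specific collector's policy.

```java
// Toy generational policy: allocate into the young generation, count how
// many collections each object survives, and promote long-lived objects to
// the old generation so young collections only scan a small space.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Generations {
    static class Obj { int age = 0; boolean live = true; }

    static final int PROMOTION_AGE = 2;          // illustrative threshold
    static final List<Obj> young = new ArrayList<>();
    static final List<Obj> old = new ArrayList<>();

    /** A young ("minor") collection only traverses the small young generation. */
    static void collectYoung() {
        Iterator<Obj> it = young.iterator();
        while (it.hasNext()) {
            Obj o = it.next();
            if (!o.live) { it.remove(); continue; }   // dead young object: reclaim
            if (++o.age >= PROMOTION_AGE) {           // long-lived survivor: promote
                it.remove();
                old.add(o);
            }
        }
    }

    public static void main(String[] args) {
        Obj shortLived = new Obj(), longLived = new Obj();
        young.add(shortLived); young.add(longLived);

        shortLived.live = false;     // dies before the first collection
        collectYoung();
        collectYoung();              // longLived survives twice and is promoted
        System.out.println("young=" + young.size() + ", old=" + old.size()); // young=0, old=1
    }
}
```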

However, this generational strategy also relies on some empirical assumptions:

  1. Newly allocated objects tend to have short lifetimes.
  2. Old objects tend to have long lifetimes.
  3. Old objects rarely reference newly allocated objects.
  4. Small objects tend to have short lifetimes.
Summary

This article has focused on the garbage collection algorithms themselves; when these algorithms are applied on real platforms, various additional problems and implementation strategies arise.

Some may ask: why should application programmers care how automatic garbage collection works? Isn't it a black box? To some extent it is; you really cannot control the collector's behavior directly. But once we know these details, we can tune the collector to better suit our application: is it a client application, a server application, a real-time application? (The JVM provides many configurable options that can be adjusted for the actual scenario, while the CLR exposes only a few configuration settings.)

Furthermore, understanding these characteristics helps us write code that is friendlier to the garbage collector. For example, after learning about mark-compact, we realize that what costs the collector time is not the amount of garbage but the number of surviving objects in the system: if there are many small surviving objects, marking takes longer.

This article is available at http://www.nowamagic.net/librarys/veda/detail/1496.
