Some words about garbage collection

Last Update:2017-02-27 Source: Internet

Author: User

"It's hard to believe that Java can be as fast as C++, or even faster."

In my own experience, this statement does hold up. However, I have also found that many of the doubts about Java's speed come from early implementations. Because those implementations were not particularly efficient, there was no model to point to that could explain why Java can be fast.

Part of the way I think about speed comes from the C++ model. C++ focuses on having everything happen statically, at compile time, so that the run-time image of the program is small and fast. C++ is also based directly on the C model (mainly for backwards compatibility), and sometimes things are done a certain way in C++ simply because they already worked that way in C, which made that the most convenient approach. One of the most important cases is the way C and C++ manage memory, which is a major reason some people assume Java must be slow: in Java, all objects must be created on the heap.

In C++, objects can be created on the stack. This is fast because when you enter a particular scope, the stack pointer is moved down once to allocate storage for all the stack-based objects created in that scope. When you leave the scope (after all the local destructors have been called), the stack pointer is moved back up once. Creating a heap object in C++, however, is often much slower because it is based on the C heap. That heap is essentially a large pool of memory that must be recycled. When you call delete in C++, the freed memory leaves a hole in the heap, so when you call new, the storage allocator must perform some kind of search to fit the object's storage into an existing hole; otherwise the heap would quickly run out of space. This search for available memory is an important reason why heap allocation has such a performance impact in C++, and why creating stack-based objects is so much faster.
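To make the cost of searching for a hole concrete, here is a minimal sketch of a first-fit free-list allocator, written in Java purely for illustration. It is not how any particular C or C++ runtime is implemented; the class `FirstFitHeap`, its `searchSteps` counter, and the lack of hole merging are all assumptions made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a C-style heap: a list of free holes searched first-fit.
final class FirstFitHeap {
    static final class Hole {
        int start;
        int size;
        Hole(int start, int size) { this.start = start; this.size = size; }
    }

    private final List<Hole> holes = new ArrayList<>();
    int searchSteps = 0; // how many holes were inspected across all allocations

    FirstFitHeap(int capacity) {
        holes.add(new Hole(0, capacity));
    }

    /** Returns the start address of the block, or -1 if no hole is large enough. */
    int allocate(int size) {
        for (int i = 0; i < holes.size(); i++) {
            searchSteps++;
            Hole h = holes.get(i);
            if (h.size >= size) {
                int address = h.start;
                h.start += size;
                h.size -= size;
                if (h.size == 0) {
                    holes.remove(i); // hole fully consumed
                }
                return address;
            }
        }
        return -1;
    }

    /** Returns a block to the heap; this sketch does not merge adjacent holes. */
    void free(int start, int size) {
        holes.add(0, new Hole(start, size));
    }

    public static void main(String[] args) {
        FirstFitHeap heap = new FirstFitHeap(100);
        int a = heap.allocate(10);
        heap.allocate(10);
        heap.allocate(10);
        heap.free(a, 10);                      // leaves a 10-byte hole at address 0
        System.out.println(heap.allocate(20)); // 30: the hole at 0 is too small
        System.out.println(heap.allocate(10)); // 0: the hole is reused
        System.out.println(heap.searchSteps);  // 6
    }
}
```

Every allocation walks the hole list until something fits, so the cost grows with fragmentation. That walk is the search the paragraph above refers to.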

Again, because so much of C++ is based on doing the work at compile time, this makes sense. But in Java some things happen more dynamically, and that changes the model. When it comes to creating objects, the garbage collector turns out to have a significant impact on the speed of object creation. This sounds odd at first, since it means that releasing storage affects allocating storage, but it is one of the important techniques some JVMs use, and it means that allocating storage for heap objects in Java can be almost as fast as creating storage on the stack in C++.

You can think of the C++ heap (and a slow implementation of a Java heap) as a yard in which each object stakes out its own piece of land. At some later time this "real estate" is abandoned and must be reused. In some JVMs, however, the Java heap works quite differently. It is more like a conveyor belt that moves forward every time a new object is allocated, which makes allocating storage for an object remarkably fast. The heap pointer is simply moved forward into virgin territory, so allocation is effectively the same as C++ stack allocation (there is, of course, a little extra bookkeeping overhead, but nothing like searching for storage).
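The conveyor-belt idea is usually called bump-pointer allocation. The following is a minimal sketch, assuming the heap can be modelled as a byte array and addresses as offsets into it; `BumpHeap` and its methods are hypothetical names, not part of any JVM.

```java
// Hypothetical sketch: a "conveyor belt" heap modelled as a byte array.
final class BumpHeap {
    private final byte[] memory;
    private int top = 0; // the "heap pointer": offset of the next free byte

    BumpHeap(int capacity) {
        memory = new byte[capacity];
    }

    /** Returns the start offset of the new block, or -1 if the belt has run out. */
    int allocate(int size) {
        if (size < 0 || size > memory.length - top) {
            return -1; // a real collector would be triggered at this point
        }
        int address = top;
        top += size; // the whole allocation is a single pointer bump
        return address;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpHeap heap = new BumpHeap(64);
        System.out.println(heap.allocate(16)); // 0
        System.out.println(heap.allocate(24)); // 16
        System.out.println(heap.allocate(32)); // -1, only 24 bytes are left
    }
}
```

There is no search at all in `allocate`: one bounds check and one addition, which is why this style of allocation is comparable to moving a stack pointer.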

Now, you may notice that the heap is not in fact a conveyor belt. If you treat it that way, you will eventually start paging memory heavily (a big performance hit) and later run out of memory altogether. The trick is the garbage collector: while it collects the garbage, it also compacts all the objects in the heap, effectively moving the "heap pointer" closer to the beginning of the conveyor belt and further away from a page fault. The garbage collector rearranges things so that the high-speed, infinite-free-heap model can be used when allocating storage.

To really grasp how this works, we first need to understand how different garbage collector (GC) schemes operate. A simple but slow GC technique is reference counting. Each object contains a reference counter, which is incremented every time a handle (reference) is attached to the object and decremented every time a handle goes out of scope or is set to null. Managing reference counts is therefore a small overhead, but one that is paid continuously for as long as the program runs. The garbage collector moves through the entire list of objects, and when it finds one whose reference count is 0 it releases that object's storage. The drawback is that if objects refer to each other circularly, they can have nonzero reference counts and still be garbage that should be collected. Locating such self-referential groups requires significant extra work from the garbage collector. Reference counting is commonly used to explain one kind of garbage collection, but it does not appear to be used in JVM implementations.
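The cycle problem is easy to reproduce in a small simulation. The sketch below assumes a toy heap in which handles are attached and detached explicitly; `RefCountHeap`, `attach`, `detach`, and `link` are hypothetical names chosen for the example, and real reference-counting systems update counts automatically.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical toy heap that frees objects whose reference count has dropped to 0.
final class RefCountHeap {
    static final class Obj {
        final String name;
        int refCount = 0;
        final List<Obj> fields = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    private final List<Obj> objects = new ArrayList<>();

    Obj create(String name) {
        Obj o = new Obj(name);
        objects.add(o);
        return o;
    }

    void attach(Obj target) { target.refCount++; } // a handle now points at target
    void detach(Obj target) { target.refCount--; } // a handle went away

    /** Stores a handle to 'to' inside 'from'. */
    void link(Obj from, Obj to) {
        from.fields.add(to);
        attach(to);
    }

    /** Walks the object list, freeing zero-count objects; returns the names freed. */
    List<String> collect() {
        List<String> freed = new ArrayList<>();
        boolean progress = true;
        while (progress) {
            progress = false;
            Iterator<Obj> it = objects.iterator();
            while (it.hasNext()) {
                Obj o = it.next();
                if (o.refCount == 0) {
                    it.remove();
                    for (Obj child : o.fields) {
                        detach(child); // handles held by a dead object die with it
                    }
                    freed.add(o.name);
                    progress = true;
                }
            }
        }
        return freed;
    }

    int liveCount() { return objects.size(); }

    public static void main(String[] args) {
        RefCountHeap heap = new RefCountHeap();
        Obj a = heap.create("a");
        Obj b = heap.create("b");
        heap.attach(a);       // a local handle to a
        heap.link(a, b);      // a holds b
        heap.detach(a);       // the local handle goes out of scope
        System.out.println(heap.collect()); // [a, b]

        Obj x = heap.create("x");
        Obj y = heap.create("y");
        heap.attach(x);       // a local handle to x
        heap.link(x, y);
        heap.link(y, x);      // x and y now refer to each other
        heap.detach(x);       // no outside handle remains
        System.out.println(heap.collect()); // []
        System.out.println(heap.liveCount()); // 2: the cycle is never reclaimed
    }
}
```

After the last `detach`, nothing in the program can reach `x` or `y`, yet each still has a count of 1 because of the other, so a pure reference-counting pass never frees them.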

In faster schemes, garbage collection is not based on reference counting. Instead, it is based on the idea that any object that is not dead must ultimately be traceable back to a handle that lives either on the stack or in static storage. The chain may go through several layers of objects. So if you start from the stack and the static storage area and walk through all the handles, you will find all the live objects. For each handle you find, you must trace into the object it points to and then follow all the handles in that object, tracing into the objects they point to, and so on, until you have moved through the entire web that originated with the handle on the stack or in static storage. Each object that you move through must still be alive. Note that self-referential groups are not a problem here: they are simply never found, so they are automatically treated as garbage.
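The tracing step can be sketched as an ordinary graph traversal. This is an illustration under simple assumptions (objects are `Node` instances, roots are passed in as a collection); `Tracer` and `Node` are hypothetical names, and a real collector works on raw memory rather than Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: find every object reachable from the root handles.
final class Tracer {
    static final class Node {
        final String name;
        final List<Node> handles = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Roots stand for handles on the stack and in static storage. */
    static Set<Node> reachable(Collection<Node> roots) {
        // Identity-based set: two distinct objects are never treated as the same one.
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (live.add(n)) {          // first visit only
                pending.addAll(n.handles);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        root.handles.add(a);
        a.handles.add(b);

        Node x = new Node("x");
        Node y = new Node("y");
        x.handles.add(y);
        y.handles.add(x);               // a cycle that no root can reach

        Set<Node> live = reachable(Arrays.asList(root));
        System.out.println(live.size());      // 3
        System.out.println(live.contains(x)); // false
    }
}
```

The cycle `x`/`y` that defeats reference counting is never visited here, so it falls out as garbage without any special handling.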

In the approach described here, the JVM uses an "adaptive" garbage collection scheme, and what it does with the live objects it finds depends on the variant currently in use. One of these variants is "stop and copy." This means that, for reasons that will soon become apparent, the program is first stopped (this is not a background collection scheme). Each live object that is found is then copied from one heap to another, leaving all the garbage behind. In addition, as objects are copied into the new heap they are packed one after another, which makes the new heap compact (and allows new storage to simply be taken from the end, as described earlier).

Of course, when you move an object from one place to another, all the handles (references) that point to that object must be changed. The handle that led you to the object while tracing, whether it came from the heap or from the static storage area, can be changed right away. However, other handles pointing to the same object may turn up later in the traversal. These are fixed as they are found (imagine a hash table that maps old addresses to new addresses).
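The copying step and the old-to-new address table can be sketched together. The example below is an assumption-laden model: object identity stands in for an address, the destination heap is a plain list, and the traversal is recursive for brevity, so very deep object graphs would overflow the call stack. `CopyingCollector`, `Obj`, and `forwarding` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of stop-and-copy with a forwarding table.
final class CopyingCollector {
    static final class Obj {
        final String name;
        final List<Obj> handles = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Copies everything reachable from roots into toSpace; returns the relocated roots. */
    static List<Obj> collect(List<Obj> roots, List<Obj> toSpace) {
        Map<Obj, Obj> forwarding = new IdentityHashMap<>(); // old address -> new address
        List<Obj> newRoots = new ArrayList<>();
        for (Obj root : roots) {
            newRoots.add(copy(root, forwarding, toSpace));
        }
        return newRoots;
    }

    private static Obj copy(Obj old, Map<Obj, Obj> forwarding, List<Obj> toSpace) {
        Obj moved = forwarding.get(old);
        if (moved != null) {
            return moved; // already relocated: only the handle needs fixing
        }
        moved = new Obj(old.name);
        forwarding.put(old, moved); // record before recursing so cycles terminate
        toSpace.add(moved);         // live objects end up packed one after another
        for (Obj child : old.handles) {
            moved.handles.add(copy(child, forwarding, toSpace));
        }
        return moved;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        Obj garbage = new Obj("garbage"); // never reachable from a root
        a.handles.add(b);
        a.handles.add(c);
        c.handles.add(b); // b is shared by a and c

        List<Obj> toSpace = new ArrayList<>();
        Obj newA = collect(Arrays.asList(a), toSpace).get(0);

        StringBuilder names = new StringBuilder();
        for (Obj o : toSpace) names.append(o.name);
        System.out.println(names); // abc
        Obj newC = newA.handles.get(1);
        System.out.println(newA.handles.get(0) == newC.handles.get(0)); // true
    }
}
```

The second handle to `b` is found after `b` has already moved; the forwarding table is what lets it be redirected to the single new copy instead of producing a duplicate.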

There are two issues that make copying collectors inefficient. The first is that you have two heaps and all the memory is shuffled back and forth between these two separate heaps, so you must maintain twice as much memory as you actually need. Some JVMs deal with this by allocating the heap in chunks as needed and simply copying from one chunk to another.

The second issue is the copying itself. Once a program becomes stable, it may generate little or no garbage. Even so, a copying collector will still copy all the memory from one place to another, which is wasteful. To avoid this, some JVMs detect that no new garbage is being generated and switch to a different scheme (this is the "adaptive" part). That other scheme is called "mark and sweep," which is what Sun's JVM has always used. For general use mark and sweep is fairly slow, but when you know that you are generating little or no garbage, it is fast.

Mark and sweep follows the same logic: start from the stack and static storage and trace through all the handles to find the live objects. Each time a live object is found, it is marked by setting a flag in it, but nothing is collected yet. The sweep begins only when the marking process has finished. During the sweep, the dead objects are released. No copying takes place, however, so if the collector decides to compact a fragmented heap, it does so by moving objects around.

The name "stop and copy" reflects the fact that this type of garbage collection is not done in the background; instead, the program is stopped while the collection takes place. Sun's documentation contains many references to garbage collection as a low-priority background process, but that was a theoretical experiment that did not work out in practice. In practice, Sun's garbage collector runs when memory gets low. Mark and sweep also requires the program to be stopped.
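The mark phase and the sweep (purge) phase described above can be sketched as two loops over a toy heap. In this illustration the heap is a fixed array of slots, with `null` standing for a free hole; `MarkSweepHeap`, `layout`, and the slot model are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a mark-and-sweep collector over fixed slots.
final class MarkSweepHeap {
    static final class Obj {
        final String name;
        final List<Obj> handles = new ArrayList<>();
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    private final Obj[] slots; // slot index stands in for an address

    MarkSweepHeap(int size) {
        slots = new Obj[size];
    }

    Obj allocate(String name) {
        for (int i = 0; i < slots.length; i++) {
            if (slots[i] == null) {
                slots[i] = new Obj(name);
                return slots[i];
            }
        }
        throw new IllegalStateException("heap full");
    }

    /** Runs one collection; returns the number of objects released. */
    int collect(Collection<Obj> roots) {
        // Mark phase: flag everything reachable, release nothing yet.
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (!o.marked) {
                o.marked = true;
                pending.addAll(o.handles);
            }
        }
        // Sweep phase: release unmarked objects, clear marks on survivors.
        int freed = 0;
        for (int i = 0; i < slots.length; i++) {
            Obj o = slots[i];
            if (o == null) continue;
            if (o.marked) {
                o.marked = false; // ready for the next collection
            } else {
                slots[i] = null;  // leaves a hole; survivors do not move
                freed++;
            }
        }
        return freed;
    }

    /** Shows the heap as names and holes, e.g. "a _ c _". */
    String layout() {
        StringBuilder sb = new StringBuilder();
        for (Obj o : slots) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(o == null ? "_" : o.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        MarkSweepHeap heap = new MarkSweepHeap(4);
        Obj a = heap.allocate("a");
        heap.allocate("b");
        Obj c = heap.allocate("c");
        heap.allocate("d");
        a.handles.add(c);

        System.out.println(heap.collect(Arrays.asList(a))); // 2
        System.out.println(heap.layout());                  // a _ c _
    }
}
```

The survivors stay where they were, so the heap ends up with holes between them. That is the fragmentation that a separate compaction step, or a switch back to a copying scheme, has to deal with.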

As noted earlier, in the JVM described here memory is allocated in large blocks. If you allocate a large object, it gets its own block. Strict stop and copy requires copying every live object from the source heap to a new heap before the old one can be freed, which takes a great deal of memory. With blocks, the garbage collector can usually copy objects into dead blocks as it collects. Each block has a generation count that tracks whether it is still alive. Normally, only the blocks created since the last garbage collection are compacted; every other block has its generation count bumped if it has been referenced from somewhere. This handles the common case of many short-lived temporary objects. Periodically a full sweep is made: large objects are still not copied (their generation count is simply bumped), while blocks containing small objects are copied and compacted. The JVM monitors the efficiency of the garbage collector, and if collection becomes a waste of time because all objects are long-lived, it switches to mark and sweep. Similarly, the JVM tracks how successful mark and sweep is, and if the heap becomes too fragmented it switches back to stop and copy. This is where the "adaptive" part comes from, and it can be summed up as: "switch automatically between stop and copy and mark and sweep depending on the situation."
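The switching rule can be written down as a small decision function. This is only a sketch of the idea: the text gives no concrete criteria, so the two inputs (`liveFraction`, the share of objects that survived the last collection, and `fragmentation`, the share of the heap lost to holes) and both thresholds are invented for illustration.

```java
// Hypothetical sketch of the "adaptive" switch between the two schemes.
final class AdaptivePolicy {
    enum Mode { STOP_AND_COPY, MARK_AND_SWEEP }

    // Assumed thresholds, chosen only for illustration.
    static final double MOSTLY_LIVE = 0.90;
    static final double TOO_FRAGMENTED = 0.50;

    /** Picks the scheme for the next collection from the results of the last one. */
    static Mode next(Mode current, double liveFraction, double fragmentation) {
        if (current == Mode.STOP_AND_COPY && liveFraction >= MOSTLY_LIVE) {
            return Mode.MARK_AND_SWEEP; // copying long-lived objects is wasted work
        }
        if (current == Mode.MARK_AND_SWEEP && fragmentation >= TOO_FRAGMENTED) {
            return Mode.STOP_AND_COPY;  // copying compacts the heap again
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(next(Mode.STOP_AND_COPY, 0.95, 0.10));  // MARK_AND_SWEEP
        System.out.println(next(Mode.MARK_AND_SWEEP, 0.95, 0.60)); // STOP_AND_COPY
        System.out.println(next(Mode.STOP_AND_COPY, 0.30, 0.90));  // STOP_AND_COPY
    }
}
```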

A JVM can also apply a number of other speedups. An especially important one involves the operation of the loader and the just-in-time (JIT) compiler. When a class must be loaded (typically the first time you want to create an object of that class), the .class file is located and the bytecode for that class is brought into memory. At this point, one approach is simply to JIT-compile all the code, but this has two drawbacks. First, it takes a little more time, which adds up over the life of the program. Second, it increases the size of the executable (bytecode is significantly more compact than expanded JIT code), which can cause paging, and that definitely slows a program down. The alternative is not to JIT-compile code until it is necessary. That way, code that is never executed may never be JIT-compiled.
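The compile-only-when-needed approach amounts to caching a compiled form on first invocation. Here is a minimal sketch of that idea; it does not reflect how any real JIT compiler is structured. `LazyJit`, `load`, `invoke`, and `compileCount` are hypothetical names, and the "compile" step is a stand-in that just returns the method body.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: methods are compiled the first time they are invoked.
final class LazyJit {
    private final Map<String, IntUnaryOperator> bytecode = new HashMap<>();
    private final Map<String, IntUnaryOperator> compiled = new HashMap<>();
    int compileCount = 0;

    /** Loading a method only stores its "bytecode"; nothing is compiled yet. */
    void load(String method, IntUnaryOperator body) {
        bytecode.put(method, body);
    }

    int invoke(String method, int arg) {
        IntUnaryOperator code = compiled.get(method);
        if (code == null) {
            IntUnaryOperator source = bytecode.get(method);
            if (source == null) {
                throw new NoSuchElementException(method);
            }
            code = compile(source); // pay the compilation cost on first use only
            compiled.put(method, code);
        }
        return code.applyAsInt(arg);
    }

    private IntUnaryOperator compile(IntUnaryOperator source) {
        compileCount++;
        return source; // stand-in for real native code generation
    }

    public static void main(String[] args) {
        LazyJit jit = new LazyJit();
        jit.load("twice", x -> x * 2);
        jit.load("unused", x -> x + 1);
        System.out.println(jit.invoke("twice", 21)); // 42
        System.out.println(jit.invoke("twice", 5));  // 10
        System.out.println(jit.compileCount);        // 1: "unused" was never compiled
    }
}
```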

Because the JVM is external to the browser, you might expect to benefit from the speedups of a particular JVM whichever browser you use. Unfortunately, JVMs do not currently interoperate with different browsers. To get the benefit of a particular JVM, you must either use a browser that has that JVM built in or run stand-alone Java applications.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
