Modern garbage collection



I've recently read a lot of articles about the newest garbage collector in the Go language, and the way they promote it troubles me. Some of them come from the Go team's own official blog. They seem to imply that a huge breakthrough has taken place in the field of garbage collection.



Here is an excerpt from the announcement that first made this garbage collector public, in August 2015:


Go is building a garbage collector not only for 2015 but for 2025 and beyond. Go 1.5's garbage collector ushers in a future where stop-the-world pauses are no longer a barrier to moving to a safe and secure language. It is a future where applications scale effortlessly along with hardware, and as hardware becomes more powerful, the garbage collector will not be an impediment to better, more scalable software.


The Go team claims not only to have solved the stop-the-world problem in garbage collection, but also to have made life easier for programmers:


At a higher level, the usual approach to garbage collection performance problems is to add more tuning knobs. Programmers can then turn the knobs in search of settings that suit their particular application. The downside is that the knobs accumulate over time, until choosing among them becomes a burden in its own right. Go takes the opposite approach and provides a single knob, called GOGC.


Go developers who read such announcements about the new runtime will no doubt be delighted. But these are only excerpts from blog posts, so let's calm down and examine them carefully.



The reality is that Go's new garbage collector does not use any new concepts or research results. The Go team's own announcement acknowledges that the concurrent mark/sweep model behind it was proposed as early as 1978. The collector attracts attention only because it is designed to minimize garbage collection pause times, and it pays for that in every other important aspect of garbage collection. The Go team does not seem to have told its audience the price of these tradeoffs. The announcement sums up the design as follows:


To create a garbage collector for the next decade, we turned to an algorithm from decades ago. Go's new garbage collector is a concurrent, tri-color, mark/sweep collector, an idea first proposed by Dijkstra in 1978. It deliberately differs from almost all of today's "enterprise" garbage collectors, and we believe it is well suited to the properties of modern hardware and the low-latency requirements of modern software.


So did the past 40 years of research into "enterprise-class" garbage collectors produce nothing? Hardly.



A brief introduction to garbage collection theory



When designing a garbage collection algorithm, there are a number of different factors to consider:


  • Program throughput: how much does your algorithm slow the program down? This is usually measured as the percentage of CPU time spent on garbage collection versus useful work.
  • Garbage collection throughput: how much garbage can the collector clear in a given amount of CPU time?
  • Heap usage: how much extra heap memory does your garbage collector need?
  • Pause time: how long does your garbage collector stop the world at a time?
  • Pause frequency: how often does your garbage collector stop the world?
  • Pause distribution: are pauses usually very short with an occasional very long one, or are they more consistent?
  • Memory allocation efficiency: is allocating new memory fast, slow, or unpredictable?
  • Memory compaction: can your garbage collector report an out-of-memory error even though enough free memory exists, because that memory is scattered across the heap in small fragments?
  • Concurrency: how well does your garbage collector use multicore processors?
  • Scalability: how well does your garbage collector work as the heap grows larger?
  • Configurability: how complex is the configuration of your garbage collector?
  • Warm-up time: does your garbage collector adjust itself to the program's behavior? If so, how long does it take to reach its optimal state?
  • Memory release: does your algorithm release unused memory back to the operating system? If so, when?
  • Portability: does your garbage collector work on different CPU architectures?
  • Compatibility: which programming languages and compilers does your garbage collector work with? Can it be used with a language that was not designed for garbage collection, such as C++? Does changing the garbage collection algorithm require recompiling the entire program and its dependencies?


As you can see, there are many factors to consider when designing a garbage collector. Some of them even affect the design of the wider ecosystem around the platform.



Because the factors are so numerous and complex, garbage collection is a subfield of computer science in its own right. New algorithms are constantly being proposed and implemented in both academia and industry. Unfortunately, no single algorithm works perfectly in all situations.



Tradeoffs are everywhere.



Let's make this more concrete.



The first garbage collection algorithms were designed for programs with small heaps running on single-processor machines. CPU and memory were very expensive, and users did not have particularly high performance expectations, so visible pauses were acceptable. These algorithms were designed to minimize the collector's use of CPU time and heap memory. The idea was that no collection work happened until memory could no longer be allocated. At that point, the collector paused the program and performed a full mark/sweep of the heap.
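
The scheme just described can be sketched on a toy heap. This is a minimal illustration, not any real collector: the `Object` and `Heap` types, the fixed `capacity`, and the explicit `roots` slice are all assumptions made for the example. Collection runs only when an allocation finds the heap full, and it visits the whole heap while nothing else runs.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a toy heap cell: a name, outgoing references and a mark bit.
type Object struct {
	Name   string
	Refs   []*Object
	marked bool
}

// Heap is a hypothetical fixed-capacity heap used only for illustration.
type Heap struct {
	objects  []*Object
	capacity int
	roots    []*Object
}

// mark flags obj and everything reachable from it.
func mark(obj *Object) {
	if obj == nil || obj.marked {
		return
	}
	obj.marked = true
	for _, r := range obj.Refs {
		mark(r)
	}
}

// Collect performs a full stop-the-world mark/sweep and returns the
// number of objects it freed.
func (h *Heap) Collect() int {
	for _, r := range h.roots { // mark phase
		mark(r)
	}
	live := make([]*Object, 0, len(h.objects))
	for _, o := range h.objects { // sweep phase
		if o.marked {
			o.marked = false // reset the mark bit for the next cycle
			live = append(live, o)
		}
	}
	freed := len(h.objects) - len(live)
	h.objects = live
	return freed
}

// Alloc collects only when the heap is full, as the earliest collectors did.
func (h *Heap) Alloc(name string) (*Object, error) {
	if len(h.objects) >= h.capacity {
		h.Collect()
		if len(h.objects) >= h.capacity {
			return nil, errors.New("out of memory")
		}
	}
	o := &Object{Name: name}
	h.objects = append(h.objects, o)
	return o, nil
}

func main() {
	h := &Heap{capacity: 3}
	a, _ := h.Alloc("a")
	b, _ := h.Alloc("b")
	h.Alloc("c") // never referenced, so it becomes garbage
	a.Refs = append(a.Refs, b)
	h.roots = append(h.roots, a)

	// The heap is full, so this allocation triggers a collection first.
	if _, err := h.Alloc("d"); err != nil {
		fmt.Println("allocation failed:", err)
		return
	}
	for _, o := range h.objects {
		fmt.Println("in heap:", o.Name)
	}
}
```

The program does no collection work at all until the fourth allocation, which is exactly why this design costs nothing between collections and why every collection is a visible pause.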



Collectors of this kind are old, but they still have clear advantages: they are very simple to implement, they do not slow your program down when they are not collecting, and they consume no extra memory. Conservative collectors such as Boehm's do not even require changes to your compiler or programming language! This makes them a good fit for desktop applications that use a small amount of heap memory, such as AAA video games.



At the other extreme, suppose you have a machine with dozens of cores and hundreds of gigabytes of heap memory. Perhaps your server handles transactions in financial markets or runs a search engine, so short pauses are very important to you. In that case, you may be willing to accept an algorithm that slows the program down overall in exchange for collecting in the background with short pauses.



So no single algorithm works perfectly in all situations. Nor can a language runtime know whether your program is a batch job or a latency-sensitive interactive application. This is why garbage collection tuning knobs exist. They exist not because runtime engineers are stupid, but because they reflect the current limits of computer science.



The generational hypothesis



It has been known since 1984 that most allocated memory becomes garbage shortly after it is allocated. This observation is called the generational hypothesis, and it is one of the strongest empirical findings in the entire field of programming languages. It has held up through decades of change in software engineering and programming languages.



This finding matters because garbage collection algorithms can be designed to exploit it. Generational collectors improve on the old pure mark/sweep collectors in the following ways:


    • Garbage collection throughput: they can reclaim more garbage, faster.

    • Memory allocation efficiency: allocating memory no longer requires searching the heap for a free gap, so allocation becomes very cheap.

    • Program throughput: allocations are laid out neatly and compactly, which makes better use of the cache. A generational collector does require the program to do some extra work at run time, but the cache benefits outweigh that cost.

    • Pause time: most (but not all) pauses become shorter.
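
The allocation-efficiency point above is easiest to see in code. The sketch below shows bump-pointer allocation, the technique a compacting young generation makes possible. The `Nursery` type and its methods are hypothetical names, and a plain byte slice stands in for the young-generation memory region.

```go
package main

import "fmt"

// Nursery is a hypothetical young-generation region. Because its free space
// is always one contiguous block, allocation is a bounds check and an add.
type Nursery struct {
	buf  []byte
	next int // offset of the first free byte
}

// NewNursery creates a nursery of the given size in bytes.
func NewNursery(size int) *Nursery {
	return &Nursery{buf: make([]byte, size)}
}

// Alloc returns size bytes, or nil when the nursery is full and a
// young-generation collection would be needed.
func (n *Nursery) Alloc(size int) []byte {
	if size < 0 || n.next+size > len(n.buf) {
		return nil
	}
	p := n.buf[n.next : n.next+size : n.next+size]
	n.next += size // "bump" the pointer; no search for a free gap
	return p
}

// Reset models the end of a young-generation collection: survivors have
// been copied elsewhere, so the whole region becomes free again at once.
func (n *Nursery) Reset() {
	n.next = 0
}

func main() {
	n := NewNursery(16)
	a := n.Alloc(8)
	b := n.Alloc(8)
	c := n.Alloc(1) // nil: the nursery is full
	fmt.Println(len(a), len(b), c == nil)

	n.Reset()
	fmt.Println(n.Alloc(16) != nil)
}
```

A non-moving mark/sweep collector cannot allocate this way, because its free memory is scattered between live objects and has to be tracked in free lists or similar structures.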


Of course, they also have the following drawbacks:


    • Compatibility: implementing a generational garbage collector requires the ability to move objects in memory, and it requires extra work when the program writes to a pointer. This means that the collector must be tightly integrated with the compiler. This is why there is no generational garbage collector for C++.

    • Heap usage: a generational collector works by copying allocations between several "spaces". Because there must be room to copy into, heap usage increases.

    • Pause distribution: although many garbage collection pauses are now very short, a slower mark/sweep of the whole heap is still required from time to time.

    • Configurability: generational collectors introduce the concepts of a "young generation" and an "old generation", and program performance becomes sensitive to the sizes of these generations.

    • Warm-up time: to ease the configuration problem, some generational collectors adjust the size of the young generation dynamically by observing how the program runs. As a result, pause times also come to depend on how long the program has been running.


Despite these drawbacks, the benefits are large enough that almost all modern garbage collection algorithms are generational. Generational collection can also be combined with many other features, such as concurrency, parallelism, and compaction.



Go's concurrent collector



Since Go is a relatively ordinary imperative language with value types, its memory access patterns are probably comparable to those of C#. The generational hypothesis certainly holds for C#, and .NET accordingly uses a generational garbage collector.



In fact, most Go programs are request/response processors such as HTTP servers, which means that Go programs show strongly generational behavior. The Go team is exploring a way to exploit this in the future with what it calls a "request-oriented collector". It has been observed, however, that this is simply a generational collector with a tuned tenuring policy. The same behavior can be imitated in any other runtime that handles a request/response workload, simply by making sure that the young generation is large enough to hold all the garbage generated while handling a request.



Despite that, Go's current garbage collector is not generational. It simply runs a plain old mark/sweep in the background.
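
To make "mark/sweep in the background" concrete, here is a minimal sketch of tri-color marking with a Dijkstra-style insertion write barrier, the idea from 1978 that the announcement cites. It is not Go's actual implementation: the `Node`, `Marker` and `WritePointer` names are hypothetical, and marking is interleaved with the program by hand rather than run on a separate thread.

```go
package main

import "fmt"

// Color is the marking state of an object.
type Color int

const (
	White Color = iota // not seen yet; garbage if still white when marking ends
	Grey               // known reachable, references not scanned yet
	Black              // known reachable, references fully scanned
)

// Node is a toy heap object.
type Node struct {
	Name  string
	Refs  []*Node
	Color Color
}

// Marker holds the worklist of grey objects.
type Marker struct {
	grey []*Node
}

// shade turns a white object grey and queues it for scanning.
func (m *Marker) shade(n *Node) {
	if n != nil && n.Color == White {
		n.Color = Grey
		m.grey = append(m.grey, n)
	}
}

// Step scans one grey object and blackens it. It returns false once no
// grey objects remain, which means marking is complete.
func (m *Marker) Step() bool {
	if len(m.grey) == 0 {
		return false
	}
	n := m.grey[len(m.grey)-1]
	m.grey = m.grey[:len(m.grey)-1]
	for _, r := range n.Refs {
		m.shade(r)
	}
	n.Color = Black
	return true
}

// WritePointer is the program's pointer store. The insertion barrier shades
// the new target so that a black object never ends up hiding a white one.
func (m *Marker) WritePointer(from, to *Node) {
	m.shade(to)
	from.Refs = append(from.Refs, to)
}

func main() {
	root := &Node{Name: "root"}
	a := &Node{Name: "a"}
	b := &Node{Name: "b"}
	dead := &Node{Name: "dead"}
	root.Refs = []*Node{a}

	m := &Marker{}
	m.shade(root)
	m.Step() // root is now black, a is grey

	// The program runs in the middle of marking and stores a pointer to a
	// white object into the already-scanned root.
	m.WritePointer(root, b)

	for m.Step() {
	}
	for _, n := range []*Node{root, a, b, dead} {
		fmt.Println(n.Name, n.Color == Black)
	}
}
```

Without the barrier, `b` would remain white and be swept even though it is reachable from `root`. The barrier is the extra work on pointer writes that any concurrent collector imposes on the running program.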



This has one advantage: you get very low pause times. But you pay for it in almost every other respect:


    • Garbage collection throughput: the time needed to clear the heap of garbage grows with the size of the heap. That is, the more memory your program uses, the more slowly memory is freed and the higher the ratio of garbage collection work to useful work. The only exception is a program that does not parallelize at all, so that the collector can keep using the other cores without limit.

    • Memory compaction: the collector never compacts memory, so the heap can become fragmented over time.

    • Program throughput: because the collector has a lot of work to do in every cycle, it inevitably takes CPU time away from the program.

    • Pause distribution: any concurrent garbage collector can run into what the Java world calls a "concurrent mode failure": your application threads create garbage faster than the collector threads can clean it up. In that case, the collector has no choice but to stop your application threads entirely and wait for the cleanup to complete. So while the Go team claims that its garbage collection pauses are very short, this is true only if the collector has enough CPU time to stay ahead of the application. In addition, the Go compiler lacks the ability to suspend application threads reliably and quickly. So whether pause times are really short depends on what kind of code you are running (for example, decoding a large blob of base64 data can quickly increase the pause time).

    • Heap usage: because collecting the heap with mark/sweep is slow and inefficient, Go needs a lot of spare heap space to keep you from hitting a concurrent mode failure. By default, Go's heap overhead is 100%, which doubles the memory your program needs.


In short, Go buys its short pause times at the cost of making almost everything else slower.



Comparison with Java



The HotSpot JVM provides several garbage collection algorithms that you can select on the command line. None of them aims for pauses as short as Go's, because they all try to balance other tradeoffs as well. You can switch algorithms simply by restarting the program, so when tuning an application for a particular scenario you can try different ones.



On any modern computer, the default algorithm is the throughput collector. It is designed for batch jobs and makes no special effort to keep pause times short. This default is one reason people tend to think Java garbage collection does a poor job: out of the box, Java simply tries to make your program run as fast as possible with as little memory as possible, whatever that means for pause times.



If short pause times are more important to you, you can switch to the concurrent mark/sweep collector (CMS). This is the closest one to Go's garbage collector. It is also generational, however, which is why its pauses are longer than Go's: the young generation is compacted while the application is paused, because compaction involves moving objects. There are two kinds of pause in CMS. The first is faster and takes about 2-5 milliseconds. The second is slower and takes about 20 milliseconds. CMS is also adaptive: because it is concurrent, it must guess the right moment to start running (just like Go). But whereas Go relies on a large amount of spare heap configured in advance to avoid concurrent mode failures, CMS adapts itself at run time to try to avoid them.



The latest generation of Java garbage collector is called "G1", which stands for "Garbage First". It is not the default in Java 8, but it will be in Java 9. It is designed to be a general-purpose algorithm that suits most applications. It is mostly concurrent, generational, and compacts the entire heap. It is also largely self-tuning, but like every collector it cannot know what your program really needs, so it lets you state your preferred tradeoff through a few parameters: the maximum amount of memory it may use and a target pause time. It then adjusts everything else to meet your requirements. By default, G1 prefers making your program run faster to making pauses shorter, so the default target pause time is 200 milliseconds. Pause times are not constant, and most pauses are much shorter than the target. G1 reportedly performs well even with terabyte-sized heaps, so it also scales very well.



Finally, there is a new garbage collection algorithm called Shenandoah. It has been contributed to OpenJDK but will not appear in Java 9 unless you use a special Java build from Red Hat. It is designed to give very short pause times at any heap size while still compacting memory. The cost is more heap usage and a more complex implementation. It has to move objects while the application is still running, which requires both reads and writes of pointers to interact with the garbage collector.



Conclusion



The purpose of this article is not to persuade you to use a different programming language or tool. It is only to say that garbage collection is an unusually complex problem, so any claimed breakthrough in this field deserves skepticism. It is quite likely that you are simply not being told about the other side of the tradeoff.



However, if you don't mind paying the cost elsewhere and just want to minimize pause times, then Go's garbage collector is the one to use.



Original link



https://medium.com/@octskywar ...

