A deep contrast between Go and Java garbage collection algorithms, in the eyes of an expert


Introduction: GC is a built-in feature of most modern languages. The authors of Go claim GC pauses below 10ms; this article analyzes that claim in depth and contrasts Go's collector with Java's, asking whether Go's GC is really mature enough. Translated by a High Availability Architecture volunteer.


I've seen a bunch of recent articles about Go's latest garbage collector. Some of them are from the Go project itself. They claim that the GC technology represents a fundamental breakthrough.


The following is from the announcement of the new garbage collector in August 2015:


Go is building a garbage collector (GC) not just for 2015, but for 2025 and beyond ... The STW pause is no longer an obstacle to using the Go language. Future applications will easily scale with hardware, and as hardware becomes more powerful, the GC will not be a stumbling block to software scalability.


Not only did the Go team claim to have solved the problem of GC pauses, but the whole thing got a bit silly:


One way to solve performance problems is to add GC knobs, one for each performance issue. The programmer then searches for the right combination of settings for their application. The downside is that after ten years you end up with a huge number of knobs, and turning them becomes a black art. Go does not take that path. Instead, we provide a single knob, called GOGC.


In addition, freed from having to support dozens of knobs, the Go team can improve the runtime's performance based on feedback from real-world applications.


Many Go users are very happy with the new runtime. But to me these claims read as misleading, and since they keep being repeated across blogs, it is time to look at them in more depth.


The reality is that Go's GC does not implement any new ideas or research. As the announcement admits, it is a concurrent mark/sweep collector, based on an idea from the 1970s. It is designed to optimize pause times at the expense of every other factor in a GC. Go's technical talks don't seem to mention these tradeoffs:


To create a garbage collector for the next decade, we turned to an algorithm from decades ago. Go's new garbage collector is a concurrent, tri-color, mark/sweep collector, an idea first proposed by Dijkstra in 1978. Compared to most enterprise-grade garbage collectors today this is a deliberately unusual choice, and we believe the algorithm is well suited to the properties of modern hardware and the latency requirements of modern software.



Reading the above, you might come away thinking that forty years of "enterprise" GC research have produced nothing of value.


Fundamentals of GC theory


Here are the different factors you need to consider when designing a garbage collection algorithm:


    • Program throughput: How much does your algorithm slow the program down? This can be expressed as the percentage of time spent doing garbage collection versus useful work.

    • GC throughput: How much garbage can the collector clear in a given amount of CPU time?

    • Heap overhead: How much extra memory does your collector need? If your algorithm allocates temporary memory while collecting, does your program's memory usage spike?

    • Pause times: How long does your collector stop the world for?

    • Pause frequency: How often does your collector stop the program?

    • Pause distribution: Are pauses usually very short but occasionally very long?

    • Allocation performance: Is allocating new memory fast, slow, or unpredictable?

    • Compaction: Does your collector report an out-of-memory (OOM) error due to fragmentation, even when there is enough free space to satisfy the request?

    • Concurrency: How well does your collector use multi-core machines?

    • Scalability: How well does your collector work as the heap grows?

    • Tuning: How complex is your collector's configuration, both out of the box and when chasing optimal performance?

    • Warm-up time: Does your algorithm adapt based on measured behavior? If so, how long does it take to reach optimal performance?

    • Memory release: Does your algorithm release unused memory back to the operating system? If so, when?

    • Portability: Does your collector work on CPU architectures that provide weaker memory consistency guarantees than x86?

    • Compatibility: Which languages and compilers does your collector work with? Can it work with languages that were not designed with GC in mind, such as C++? Does it require compiler modifications? If so, does changing the GC algorithm require recompiling all programs and dependencies?


As you can see, designing a garbage collector involves many different factors, some of which affect the design of the broader ecosystem around your platform. I'm not even sure the list above is complete.

Because the design space is so complex, garbage collection is a sub-field of computer science in its own right. There are many research papers in this area, and new algorithms are proposed and implemented at a steady rate by academia and industry alike. However, no single algorithm has been found that is ideal in every respect.


The art of trade-offs (tradeoff)


Let's talk a little more specifically.

The first garbage collection algorithms were designed for single-processor machines with small heaps. CPU and RAM were very expensive at the time, and users had no stringent requirements about program pauses, so visible pauses were acceptable. These algorithms prioritized minimizing the collector's CPU and heap overhead. That meant the GC did nothing at all until you failed to allocate memory; it then paused the program, did a full mark/sweep of the heap, and reclaimed memory.

This type of collector, although old, still has real advantages: it is simple, it does not slow your program down when collection is not running, and it adds no memory overhead. With a conservative collector such as Boehm GC, you don't even need to modify the compiler or the programming language! This makes these collectors suitable for desktop applications, which typically have small heaps, including AAA video games (where most RAM is occupied by data files that never need scanning).

Stop-the-world (STW) mark/sweep is the GC algorithm most commonly taught in undergraduate computer science classes. When interviewing candidates I sometimes ask them about GC, and they almost always treat the collector as a black box and know little beyond this algorithm.


Simple STW mark/sweep has a very serious problem: it scales badly as you add processors or as the heap grows. But if your heap is small, it may already meet your pause-time requirements! In that case you should keep using it, to keep your GC overhead low.

At the other extreme, you might be using a heap of hundreds of gigabytes on a machine with dozens of cores. Perhaps your servers run financial-market trades or a search engine, so low pause times matter a great deal. In that case you may want an algorithm that slows the program down overall but runs concurrently.


Or you may run large batch jobs. Because they are non-interactive, pause times hardly matter at all. In that case you'd want an algorithm that puts throughput above everything else, maximizing the ratio of useful work to time spent collecting.

The problem is that no single algorithm is perfect in all respects. Nor can a language runtime know whether your program is a batch job or a latency-sensitive interactive program. This is why GC tuning exists; it reflects a fundamental limitation of computer science.


The generational hypothesis


Since 1984, we have known that most objects are "young": they become garbage soon after allocation. This observation, known as the generational hypothesis, is one of the strongest empirical findings in all of PL engineering. It has held across many kinds of programming languages and across decades of change in the software industry: functional languages, imperative languages, languages with value types and languages without them.

Discovering this fact is very useful because it means GC algorithms can exploit it in their design. These generational collectors improve on the old STW collectors in many ways:


    • GC throughput: They can collect more garbage, faster.

    • Allocation performance: Allocating new memory no longer requires searching the heap for free space, so the allocator becomes far more efficient.

    • Program throughput: Objects are packed neatly next to each other in memory, which greatly improves cache utilization. A generational collector does require the program to do some extra work at runtime, but that cost is more than offset by the improved cache behavior.

    • Pause times: Most (but not all) pauses become shorter.


Of course, they also introduce some drawbacks:


    • Compatibility: Implementing a generational collector requires the ability to move objects in memory and, in some cases, to do extra work when the program writes pointers. This means the GC must be tightly integrated with the compiler, which is why there are no generational collectors for C++.

    • Heap overhead: These collectors work by copying objects back and forth between various "spaces". Because there must be room to copy into, they add some heap overhead, and they must maintain various pointer maps, increasing memory overhead further.

    • Pause distribution: While many GC pauses are now very short, some still require a full mark/sweep over the entire heap.

    • Tuning: Generational collectors introduce the notion of a "young generation" or "eden space", and program performance is very sensitive to the size of that space.

    • Warm-up time: In response to the tuning problem, some collectors adjust the size of the young generation dynamically by observing the program as it runs, in which case pause times depend on how long the program has been running.


The advantages of generational collection are so compelling that basically all modern GC algorithms are generational. A generational collector can be combined with a variety of other features, and a typical modern GC integrates concurrency, parallelism, compaction, and generations.


Go's concurrent garbage collector


Go is an imperative language with value types, so its memory access patterns are comparable to those of C# (where the .NET runtime uses a generational collector).

In fact, Go programs typically handle request/response workloads such as HTTP servers, which means they exhibit strongly generational behavior. The Go team is exploring a potential algorithm to exploit this, which they call the "request-oriented collector". It is essentially a generational collector with a tuning insight: in a request/response pattern, making the young generation large enough to hold all the garbage produced by one request optimizes collection. (High Availability Architecture translator's note: this refers to Go's proposed next-generation, transaction-oriented collector.)

However, Go's current GC is not generational. It simply runs mark/sweep in the background. (High Availability Architecture translator's note: a concurrent mark/sweep algorithm.)


This makes pause times very short, but makes every other factor worse. From our basic theory above, we can see:


    • GC throughput: The time needed to clear the heap grows with the heap's size. Simply put, the more memory your program uses, the slower memory is freed and the more time your machine spends collecting rather than working. The only thing that offsets this is if your program is not parallelized at all, because then the GC work can run on otherwise idle cores.

    • Compaction: Because there is no compaction, the program's memory becomes fragmented over time. The program also does not benefit from having its objects packed tightly for the cache.

    • Program throughput: Because the GC has to do a lot of work on every cycle, it steals a lot of CPU time from the program, slowing it down.

    • Pause distribution: Any collector that runs concurrently with the program can hit what Java calls a "concurrent mode failure": your program creates garbage faster than the GC threads can clear it. In that case the runtime has no choice but to stop the program entirely and wait for the GC to finish. So when the Go team claims GC pauses are very low, the claim only holds when the GC has enough CPU time and headroom to keep up. Moreover, because the Go compiler lacks the machinery to guarantee that goroutines can be paused quickly and reliably, the achievable pause time depends on what kind of code you are running (for example, Base64-decoding a large blob in a single goroutine can cause pause times to rise).

    • Heap overhead: Because collecting the heap via mark/sweep is slow, you need plenty of spare space to avoid a "concurrent mode failure". Go defaults to 100% heap overhead, doubling the amount of memory your program needs.


We can see these tradeoffs:


Service 1 allocates more memory than service 2, so its STW pauses are longer. But on both services, STW pause duration dropped by an order of magnitude after switching collectors, while CPU time spent in the GC rose by roughly 20% on both.


In this particular case, Go traded a slower collector for an order-of-magnitude reduction in pause time. Is that a good tradeoff? Were the pause times already low enough?


Paying more in hardware to get lower pause times may not always make sense. If your server's pause times drop from 10 msec to 1 msec, will your users really notice? What if you had to double your machine count to get there?


Go optimizes pause time above everything else, to the point that it seems willing to slow a program down by almost any amount in exchange for even slightly shorter pauses.


Comparison with Java


The HotSpot JVM has several GC algorithms that you can select from the command line. Because each must balance all the other factors, none of them targets pause times as low as Go's. You can switch between collectors by restarting the program; because compilation happens while the program runs (High Availability Architecture translator's note: the JIT compiler), the different memory barriers each algorithm requires can be compiled and optimized into the code as needed.


The default algorithm is the throughput collector. It is designed for batch jobs and has no pause-time target by default. This default is one reason Java has a somewhat poor reputation for GC pauses: out of the box, it tries to make your application run as fast as possible, with as little memory overhead as possible, and pause time is not its first concern.


If pause time matters more to you, you can switch to the CMS (concurrent mark/sweep) collector. This is the closest of Java's collectors to the algorithm Go uses. But it is also a generational collector, which is why its pauses are longer than Go's: the young generation is compacted by moving objects, which requires pausing the application. CMS has two kinds of pauses: a shorter one lasting roughly 2-5 milliseconds, and a longer one that can last 20 milliseconds or more. CMS is adaptive: being concurrent, it must guess when to start running (like Go). It tunes itself at runtime and tries to avoid "concurrent mode failure". Because the bulk of the heap is managed by mark/sweep, fragmentation can be a problem. (High Availability Architecture translator's note: the bulk here is the old generation; under CMS, the young generation is collected by the copying collector ParNew, so strictly speaking CMS itself does not gain the benefits of compaction.)


The latest-generation Java GC is called G1 ("garbage first"). It will become the default in Java 9. It is designed as a general-purpose algorithm: concurrent, generational, and compacting across the whole heap. G1 is also largely adaptive; because (like all GC algorithms) it cannot know what you really want, it lets you specify your preferred tradeoff. Just tell it the maximum amount of RAM you allow and a pause-time target in milliseconds, and it tries to meet that target. The default pause-time target is about 100 milliseconds unless you specify otherwise, so G1 leans toward making your application run faster rather than pausing less. Pauses are not perfectly uniform: most are very fast (under a millisecond), while some are slower (around 50 milliseconds) when the heap is being compacted. G1 also scales very well; people have reportedly used it on terabyte-scale heaps. It has other features too, such as deduplicating strings in the heap.


A team sponsored by Red Hat is developing a new GC algorithm called Shenandoah. The code has been contributed to OpenJDK, but it will not appear in Java 9 (unless you use Red Hat's build of Java). It is designed to provide compaction while keeping pause times very low regardless of heap size. Its cost is additional heap overhead and more memory barriers (High Availability Architecture translator's note: it uses both read and write barriers, whereas the other algorithms use only write barriers). In this sense it is similar to Azul's "pauseless" collector (translator's note: the collector using the C4 algorithm; strictly speaking it is not entirely pause-free, but it keeps pauses under 10ms in all cases, and since OS jitter on a soft real-time system can itself exceed 10ms, it can reasonably be regarded as pauseless).


Conclusion


The point of this article is not to persuade you to use a different programming language or tool, but to give you a correct understanding of garbage collectors. Garbage collection is a very hard problem that many computer scientists have spent decades on, so it is unlikely that a breakthrough GC algorithm will appear overnight. It is far more likely that a "new" GC is simply an old algorithm repackaged with a different, less-advertised set of tradeoffs.

But if all you care about is reducing pause times, then by all means follow Go's GC.



This article was translated by a High Availability Architecture volunteer. The original English article:

https://medium.com/@octskyward/modern-garbage-collection-911ef4f8bd8e#.5j56cki9w

