Garbage collection in Go 1.5

2015-06-30

The Go 1.5 garbage collector is a non-generational, non-moving, concurrent, tri-color mark-and-sweep collector.

Briefly: non-generational means that Go 1.5 does not use a generational collection algorithm; non-moving means that it does not compact or relocate memory; concurrent means that user code can run while garbage collection is in progress. Tri-color mark-and-sweep is a classic garbage collection algorithm. For more background, see my other blog post; this article assumes that the reader has a basic understanding of garbage collection.

The implementation of Go 1.5 garbage collection is divided into five phases:

    • GCoff: garbage collection is off
    • GCscan: scan phase
    • GCmark: mark phase, write barrier active
    • GCmarktermination: mark termination phase, STW, new objects are allocated black
    • GCsweep: sweep phase
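
The five phases can be pictured as a small cyclic enumeration. The sketch below is only an illustration: the type `gcPhase`, its constant names, and the methods `next` and `writeBarrierEnabled` are hypothetical and are not the runtime's own declarations.

```go
package main

import "fmt"

// gcPhase is a hypothetical stand-in for the collector's phase variable.
type gcPhase int

const (
	phaseOff             gcPhase = iota // collector idle
	phaseScan                           // goroutine stacks are scanned
	phaseMark                           // concurrent marking, write barrier on
	phaseMarkTermination                // stop-the-world, allocate black
	phaseSweep                          // background sweep
)

var phaseNames = [...]string{"off", "scan", "mark", "marktermination", "sweep"}

func (p gcPhase) String() string { return phaseNames[p] }

// next returns the phase that follows p; sweep wraps around to off.
func (p gcPhase) next() gcPhase { return (p + 1) % gcPhase(len(phaseNames)) }

// writeBarrierEnabled reports whether the write barrier is active in phase p.
func (p gcPhase) writeBarrierEnabled() bool { return p == phaseMark }

func main() {
	p := phaseOff
	for i := 0; i < len(phaseNames); i++ {
		fmt.Printf("%-16s write barrier: %v\n", p, p.writeBarrierEnabled())
		p = p.next()
	}
}
```

Running it lists one full cycle and shows that only the mark phase has the write barrier turned on.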

Seen from a higher level, the Go 1.5 GC process runs as follows:

    1. Switch from GCoff to the GCscan phase.
    2. Wait for all Ps to acknowledge the change. Every goroutine passes through a GC safepoint and learns that the GCscan phase is now in effect.
    3. Scan each goroutine's stack, mark every pointer encountered, and insert it into the queue.
    4. Switch to the GCmark phase.
    5. Wait for all Ps to acknowledge the change.
    6. The write barrier is now in effect: any write that makes a black, gray, or white object point to a white object is intercepted, and the white object is marked and inserted into the queue. malloc allocates white objects.
    7. Meanwhile, the GC traverses and marks all reachable objects.
    8. After the GC finishes marking the heap, it handles each P in turn and takes the objects still queued on it.
    9. Once the GC has processed every object in the queue, it moves to the next phase: GCmarktermination.
    10. Wait for all Ps to acknowledge the change.
    11. malloc now allocates black objects, so the number of unmarked reachable objects keeps shrinking.
    12. Once again, take objects from the queue and mark every reachable object that is not yet marked.
    13. When all Ps have been processed and no new gray objects remain (meaning all reachable objects are marked), switch to the GCsweep phase.
    14. Wait for all Ps to acknowledge the change.
    15. From now on, malloc allocates white objects (a span must be swept before it is used). The write barrier is no longer needed.
    16. The GC sweeps in the background.
    17. When the sweep is complete, switch back to GCoff and wait for the next cycle.
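
The core of the steps above (grey the roots, drain the queue, leave garbage white) can be sketched in a few lines. This is a single-threaded toy model, not the runtime's code: the `object` type, `shade`, and `mark` are hypothetical names, and there is no concurrency, no Ps, and no write barrier here.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not reached yet
	gray               // reached, children not scanned yet
	black              // reached and fully scanned
)

// object is a hypothetical heap object with outgoing pointers.
type object struct {
	name string
	c    color
	refs []*object
}

// shade turns a white object gray and appends it to the work queue.
func shade(o *object, queue []*object) []*object {
	if o.c == white {
		o.c = gray
		queue = append(queue, o)
	}
	return queue
}

// mark greys the roots (scan phase), then drains the queue (mark phase).
func mark(roots []*object) {
	var queue []*object
	for _, r := range roots {
		queue = shade(r, queue)
	}
	for len(queue) > 0 {
		o := queue[0]
		queue = queue[1:]
		for _, child := range o.refs {
			queue = shade(child, queue)
		}
		o.c = black // all children are at least gray now
	}
}

func main() {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"}
	garbage := &object{name: "garbage"}
	a.refs = []*object{b}
	b.refs = []*object{c, a} // cycle back to a
	mark([]*object{a})
	for _, o := range []*object{a, b, c, garbage} {
		fmt.Println(o.name, "black:", o.c == black)
	}
}
```

Everything reachable from the root ends up black; the object that nothing references stays white and would be reclaimed by the sweep.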

The switch from the scan phase to the mark phase is triggered when no more pointers need to be scanned. The scan phase only marks objects gray; it does not mark them black. After the switch to the mark phase, the write barrier takes effect.

The design goal of Go 1.5 is to minimize STW (stop-the-world) time and to improve the real-time behavior of applications. Not stopping the world means that garbage collection runs concurrently with user code. The collector reorganizes memory while user code allocates and modifies it, so making the two work at the same time is a technical challenge.

The concurrent tri-color marking algorithm is a classic. Its correctness rests on a write barrier that maintains the invariant "a black object may not reference a white object". Go 1.5 turns the write barrier on during the mark phase. In this phase, if user code modifies a black object so that it references a white object, the write barrier shades the white object gray. Reading the source reveals a small detail: the original algorithm shades the white object only when a black object comes to reference it, whereas the Go 1.5 implementation shades the white object whenever it is referenced, whether the referencing object is black, gray, or white. The reason is that Go keeps the mark bitmap in a different place from the object's memory, so the two cannot be updated atomically together, and the thread synchronization that would be needed is expensive. The algorithm is therefore implemented as this variant.
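
A minimal sketch of that variant, assuming a toy heap: the `heap` and `object` types and the `writePointer` method are hypothetical. The point to notice is that the barrier never looks at the color of the object being written to; it only checks whether the new referent is white.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	gray
	black
)

// object is a hypothetical heap object with a single pointer field.
type object struct {
	c   color
	ref *object
}

// heap holds the toy collector state: barrier flag and gray work queue.
type heap struct {
	barrier bool
	queue   []*object
}

// writePointer stores dst into src.ref. With the barrier on, a white dst
// is shaded gray and queued, regardless of the color of src.
func (h *heap) writePointer(src, dst *object) {
	if h.barrier && dst != nil && dst.c == white {
		dst.c = gray
		h.queue = append(h.queue, dst)
	}
	src.ref = dst
}

func main() {
	h := &heap{barrier: true}
	holder := &object{c: black} // already fully scanned
	fresh := &object{c: white}  // not seen by the collector yet
	h.writePointer(holder, fresh)
	fmt.Println(fresh.c == gray, len(h.queue)) // true 1
}
```

Without the barrier, `fresh` would be reachable only through an object the collector has already finished with, and it would be swept while still in use.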

Only the mark phase uses the write barrier. The mark termination phase stops the world, so no user code runs at the same time and it is easy to see why no write barrier is needed there. But why does the scan phase not need one? In this phase the collector scans each goroutine in turn. While a goroutine is being scanned it must be paused, but the world is not stopped. Once it is paused, user code and the collector cannot modify its objects at the same time, so no write barrier is required. The scan phase does not mark recursively, so the cost of pausing a goroutine is small.

The scan phase places the objects found on each goroutine's stack in a queue. In the mark phase, objects are taken from the queue and the marking algorithm is run on them. Notice that there is also a mark termination phase. Why is it needed, and why does it stop the world? Technically, algorithms that never stop the world do exist; Go 1.5 does this only to keep the implementation simple. During marking, the write barrier keeps adding objects to the work queue while the collector keeps taking them out, so a "finished" state is hard to define. If the world is stopped, the producers stop, and once the consumer has drained the queue the "finished" state is reached.

The scan process gathers roots into the queue; the mark process keeps taking objects from the queue, marking them, and putting newly discovered objects back into the queue. In mark termination, after the world is stopped, the GC performs the scan and mark process once more. This is because the scan phase does not scan the entire root set, only the goroutine stacks; this time the remaining roots are scanned and marked.

As for the color of newly allocated objects: in the mark termination phase objects are allocated black, while in the other phases they are allocated white. A white allocation is safe because, as soon as a live object comes to reference it, the marking process will give it the correct color. In mark termination objects are allocated black directly, because no marking pass follows.
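
The allocation rule is small enough to write down directly. The following is only a sketch of the rule as stated in this article; the `phase` and `color` types and the `allocColor` function are hypothetical names.

```go
package main

import "fmt"

type phase int

const (
	off phase = iota
	scan
	mark
	markTermination
	sweep
)

type color int

const (
	white color = iota
	black
)

// allocColor returns the color a newly allocated object receives in
// phase p: black during mark termination, white in every other phase.
func allocColor(p phase) color {
	if p == markTermination {
		return black
	}
	return white
}

func main() {
	for _, p := range []phase{off, scan, mark, markTermination, sweep} {
		fmt.Println(p, allocColor(p) == black)
	}
}
```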

The garbage collection algorithm needs some extra information, such as the mark color of an object and whether a word contains a pointer (a reference to another object). Go 1.5 records this information in bitmaps, and every memory address has a corresponding bitmap entry.

There are two bitmap representations. The bitmaps for the stack, the data segment, and the BSS need only 1 bit per word: these are root-set objects and need no liveness mark, so the single bit only indicates whether the word is a pointer that the GC must follow. The other is the heap bitmap, which uses 2 bits per machine word. The low bit has the same meaning as before: 0 means skip, 1 means the word is a pointer that must be followed. The high bit has other meanings. If both bits are 0, the word belongs to a garbage object and the collector does not process it.

The meaning of the high bit depends on the position of the described word relative to the start of the allocated object:

    • If it is the first word, the high bit is the GC mark bit
    • If it is the second word, the high bit is the GC checkmarked bit (for debugging)
    • If it is the third or a later word, the high bit indicates that the object is still being described

Go 1.5 stores the 2-bit entries in groups: the 4 high bits are placed together first, followed by the 4 low bits.
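
To make the grouping concrete, here is a sketch that packs the entries for four words into one byte. It assumes that the four high bits occupy the upper nibble and the four low bits the lower nibble; that reading of the layout, and the helper names `set` and `get`, are assumptions made for illustration rather than the runtime's code.

```go
package main

import "fmt"

// wordsPerByte is the number of heap words described by one bitmap byte.
const wordsPerByte = 4

// set records the 2-bit entry for word i (0..3) in bitmap byte b.
// Assumed layout: bit i is the low (pointer) bit, bit i+4 is the high bit.
func set(b byte, i uint, high, low bool) byte {
	b &^= 1<<i | 1<<(i+wordsPerByte) // clear the old entry
	if low {
		b |= 1 << i
	}
	if high {
		b |= 1 << (i + wordsPerByte)
	}
	return b
}

// get returns the high and low bits of the entry for word i.
func get(b byte, i uint) (high, low bool) {
	return b>>(i+wordsPerByte)&1 == 1, b>>i&1 == 1
}

func main() {
	var b byte
	b = set(b, 0, true, true)  // word 0: pointer, high bit set
	b = set(b, 2, false, true) // word 2: pointer, high bit clear
	fmt.Printf("%08b\n", b)    // 00010101
	fmt.Println(get(b, 0))     // true true
}
```

Keeping all pointer bits in one nibble means the collector can test four words for pointers with a single mask.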

The mark bit is only 1 bit, so how are the three colors black, white, and gray distinguished? By convention in Go 1.5, an object that is marked and in the queue is gray, an object that is marked but not in the queue is black, and an unmarked object is white.
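
Expressed as code, the convention is a two-input decision. This is just a sketch of the rule stated above; `colorOf` and its parameters are hypothetical names.

```go
package main

import "fmt"

// colorOf derives the tri-color state from the single mark bit and from
// whether the object currently sits in the work queue.
func colorOf(marked, inQueue bool) string {
	switch {
	case !marked:
		return "white"
	case inQueue:
		return "gray"
	default:
		return "black"
	}
}

func main() {
	fmt.Println(colorOf(false, false)) // white
	fmt.Println(colorOf(true, true))   // gray
	fmt.Println(colorOf(true, false))  // black
}
```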

There are many more details here. For example, modifying an object is not a plain memory write, because the corresponding mark bits may be affected as well. Likewise, memmove goes through the write barrier and involves bitmap changes. What happens to an object that is moved by memmove? The objects that reference it would have to be updated, but how can the collector track which objects refer to it?

Go 1.5 aims to keep GC pauses under 10 ms, so that application code can run for at least 40 ms out of every 50 ms. It sets aside 25% of the computing power of GOMAXPROCS for the GC. How are the frequency and the CPU usage of the GC controlled? Go 1.5 has a GC controller and corresponding algorithms.

Before 1.5, garbage collection was STW, and the moment to trigger a GC was easy to determine: a GC is triggered when the heap reaches twice its size at the end of the previous GC. For example, if the heap was 4 MB when the last GC finished, the next GC is triggered when the heap grows to 8 MB. In Go 1.5 the accounting of allocations becomes blurred, because the application may allocate objects while the GC is running. An algorithm is therefore needed to estimate it. (The GC controller can be compared to an operating system's process scheduler, and its control algorithm to a process scheduling algorithm.)
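
The pre-1.5 trigger rule is simple arithmetic. The sketch below generalizes the doubling into a growth percentage so that 100 reproduces the 4 MB to 8 MB example; the function `nextTrigger` and its `growthPercent` parameter are hypothetical names.

```go
package main

import "fmt"

const mb = 1 << 20

// nextTrigger returns the heap size at which the next collection starts:
// the live heap after the previous GC, grown by growthPercent.
// A growthPercent of 100 gives the doubling rule described in the text.
func nextTrigger(liveAfterGC, growthPercent uint64) uint64 {
	return liveAfterGC + liveAfterGC*growthPercent/100
}

func main() {
	trigger := nextTrigger(4*mb, 100)
	fmt.Printf("next GC at %d MB\n", trigger/mb) // next GC at 8 MB
}
```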

The algorithm is fairly complex, so only the basic ideas are mentioned here; see the original for details. First, the amount of scan work has to be estimated; the CPU consumed by the mark phase is governed by the amount of scanning. Second, user code has to assist. If there were only one background collection thread and user code allocated faster than marking proceeds, then in the worst case the collection would never finish. To solve this, user code may have to help with scanning when it allocates. Third, CPU scheduling is needed: user assists can push the GC above or below its CPU budget, so the CPU used by assists and by background collection is monitored. If it is below 25%, the background scan is scheduled to raise GC CPU utilization; otherwise background collection can be left idle to lower it. Finally there is trigger control: together, user assists, CPU scheduling, and trigger control form a feedback loop that lets both CPU utilization and heap growth reach their targets.
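
The CPU scheduling decision can be sketched as a single comparison against the 25% budget. This is a deliberately simplified model and not the controller's real algorithm: the function `wantBackgroundWorker`, its parameters, and the idea of measuring over a fixed window are assumptions made for illustration.

```go
package main

import "fmt"

// gcCPUTarget is the share of total CPU time budgeted for the GC.
const gcCPUTarget = 0.25

// wantBackgroundWorker reports whether the background collector should
// run, given the CPU time spent on user assists, the CPU time spent on
// background collection, and the total CPU time in the same window.
func wantBackgroundWorker(assistTime, backgroundTime, totalTime float64) bool {
	if totalTime <= 0 {
		return false
	}
	return (assistTime+backgroundTime)/totalTime < gcCPUTarget
}

func main() {
	// Hypothetical 10 ms window on 4 Ps: 40 ms of CPU time in total.
	fmt.Println(wantBackgroundWorker(2, 5, 40)) // 17.5% used: true
	fmt.Println(wantBackgroundWorker(6, 6, 40)) // 30% used: false
}
```

When assists already consume the budget, the background worker stays idle; when they fall short, it is scheduled to make up the difference.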

This article was written ahead of time: the official Go 1.5 release is not out yet, but the code in git can already be tried out ^_^
