Golang Garbage Collection

Garbage collection (GC) is an important part of the Go runtime.

Introduction to GC Algorithms

The classic GC algorithms are: reference counting, mark-sweep (Mark & Sweep), node replication (copying garbage collection), and generational garbage collection.

A. Reference Counting

Objective-C used this method (with manual retain/release; it has since moved to ARC, which is still reference counting).

The idea of reference counting is very simple: each cell maintains a field holding the number of references that point to it (similar to in-degree in a directed graph). When the count drops to 0, the cell is reclaimed. Reference counting is incremental, so the cost of memory management is spread across the whole run of the program. C++'s shared_ptr uses reference counting.

Reference counting is usually implemented by keeping all cells in a cell pool, such as a free list, so that every cell is chained up and can be counted. A newly allocated cell has its count set to 1 (not 0, because the program typically writes ptr = new Object, which is itself a reference). Each time a pointer is set to point at a cell, that cell's count is incremented; each time a pointer to it is removed, the count is decremented. When the count reaches 0, the cell is reclaimed.

Although this sounds easy, many details arise in the implementation. For example, when a cell is deleted, the reference counts of all the cells it points to must be decremented. If one of those counts then reaches 0, should it be released recursively, or handled with a different strategy? Recursive processing can cause noticeable pauses.
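
A minimal sketch of this bookkeeping (hypothetical Cell type; Go's own runtime does not use reference counting), including the recursive release just discussed:

package main

import "fmt"

// Cell is a hypothetical reference-counted unit, used only to
// illustrate the counting mechanics described above.
type Cell struct {
    refs     int
    children []*Cell
}

// newCell starts with a count of 1, matching "ptr = new Object",
// and increments the count of every cell it references.
func newCell(children ...*Cell) *Cell {
    for _, c := range children {
        c.incRef()
    }
    return &Cell{refs: 1, children: children}
}

func (c *Cell) incRef() { c.refs++ }

// decRef decrements the count and, on reaching zero, recursively
// releases the cells it points to (the recursion discussed above).
func (c *Cell) decRef() {
    c.refs--
    if c.refs == 0 {
        for _, child := range c.children {
            child.decRef()
        }
        fmt.Println("cell reclaimed")
    }
}

func main() {
    leaf := newCell()
    root := newCell(leaf) // root references leaf: leaf.refs == 2
    leaf.decRef()         // drop our own reference to leaf
    root.decRef()         // root's count hits 0; leaf is reclaimed too
}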

Advantages

    1. Incremental. Memory management is interleaved with the execution of the user program, spreading the GC cost over the whole run, unlike mark-sweep, which requires STW (Stop The World: the user program is suspended while the GC runs).
    2. The algorithm is easy to implement.
    3. Memory is reclaimed promptly. With most other garbage collection algorithms, collection only happens once the heap is exhausted or a threshold is reached.

Disadvantages

    1. The naive reference count cannot handle circular references. This is probably its most criticized shortcoming, but there are many workarounds, such as strong and weak references.
    2. Maintaining the counts reduces runtime efficiency. Every update or deletion of a memory cell must also maintain the reference counts of the related cells, a cost that tracing garbage collectors do not pay.
    3. A cell pool implemented as a free list is not cache-friendly; the resulting frequent cache misses reduce program performance.

B. Mark-Sweep

Mark-sweep was the first automatic, tracing-based garbage collection algorithm; the idea was proposed in the 1960s and is very old. Memory cells are not reclaimed immediately: garbage cells simply remain unreachable until a threshold (or a fixed interval) is reached. At that point the system suspends the user program, which is STW, and runs the garbage collector instead.

The garbage collector does a global traversal of all live cells to determine which cells can be reclaimed. The algorithm has two parts: mark and sweep. The mark phase marks all live cells; the sweep phase reclaims the garbage cells.

The advantage of mark-sweep is that, being a tracing collector, it avoids the drawbacks of reference counting (it handles circular references and does not need to maintain counts). The obvious shortcoming is that it needs STW.
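
A minimal sketch of the two phases (toy object type and a slice standing in for the heap; not how any real runtime lays out objects):

package main

import "fmt"

// object is a hypothetical heap cell, for illustration only.
type object struct {
    name   string
    marked bool
    refs   []*object
}

// mark traverses from a root and marks every reachable object.
func mark(o *object) {
    if o == nil || o.marked {
        return
    }
    o.marked = true
    for _, r := range o.refs {
        mark(r)
    }
}

// sweep keeps marked objects (clearing the bit for the next cycle)
// and drops unmarked ones, i.e. the garbage.
func sweep(heap []*object) []*object {
    live := heap[:0]
    for _, o := range heap {
        if o.marked {
            o.marked = false
            live = append(live, o)
        } else {
            fmt.Println("reclaimed:", o.name)
        }
    }
    return live
}

func main() {
    a := &object{name: "a"}
    b := &object{name: "b", refs: []*object{a}}
    c := &object{name: "c"} // unreachable
    heap := []*object{a, b, c}

    mark(b) // b is the only root
    heap = sweep(heap)
    fmt.Println("live objects:", len(heap)) // 2
}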

Tri-Color Marking Algorithm

The tri-color marking algorithm is an improvement of the mark phase. The principle is as follows:

    • Initially all objects are white.
    • Scan all objects reachable from the roots, mark them grey, and put them in the pending queue.
    • Take a grey object from the queue, mark the objects it references grey and enqueue them, and mark the object itself black.
    • Repeat step 3 until the grey queue is empty. At that point, any object still white is garbage and is reclaimed.

One obvious benefit of tri-color marking is that it allows the user program and the mark phase to run concurrently.
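
A minimal worklist sketch of the steps above (toy node type; the runtime tracks colors with bitmaps and per-P work buffers rather than a field on each object):

package main

import "fmt"

type color int

const (
    white color = iota // not yet visited: candidate garbage
    grey               // visited, referents not yet scanned
    black              // visited, referents scanned
)

type node struct {
    name  string
    color color
    refs  []*node
}

// markTriColor greys the roots, then repeatedly takes a grey node,
// greys its white referents and blackens the node, until the grey
// queue is empty. Whatever is still white is garbage.
func markTriColor(roots []*node) {
    var queue []*node
    for _, r := range roots {
        r.color = grey
        queue = append(queue, r)
    }
    for len(queue) > 0 {
        n := queue[0]
        queue = queue[1:]
        for _, ref := range n.refs {
            if ref.color == white {
                ref.color = grey
                queue = append(queue, ref)
            }
        }
        n.color = black
    }
}

func main() {
    a := &node{name: "a"}
    b := &node{name: "b", refs: []*node{a}}
    c := &node{name: "c"} // unreachable, stays white
    markTriColor([]*node{b})
    for _, n := range []*node{a, b, c} {
        fmt.Println(n.name, "is garbage:", n.color == white)
    }
}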

C. Node Replication (Copying Collection)

Node replication is also a tracing-based algorithm. It divides the whole heap into two halves (semi-spaces): one holds the current data, the other holds data that has been discarded. Collection starts by flipping the roles of the two halves; the collector then traverses the live data structures in the old half (fromspace) and copies each cell into the new half (tospace) the first time it visits it. Once all live cells in fromspace have been visited, the collector has built a complete copy of the live data structures in tospace, and the user program can resume.
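
A toy sketch of the semispace idea (fixed-size integer cells, liveness supplied from outside, no pointer forwarding), just to show the flip and the bump-pointer allocation mentioned in the advantages below:

package main

import "fmt"

// semiSpaceHeap is a toy illustration: fixed-size cells, liveness is
// decided by the caller, and pointers are not updated.
type semiSpaceHeap struct {
    fromspace []int
    tospace   []int
    free      int // bump pointer into fromspace
}

func newHeap(cells int) *semiSpaceHeap {
    return &semiSpaceHeap{
        fromspace: make([]int, cells),
        tospace:   make([]int, cells),
    }
}

// alloc is just "increment the free-space pointer".
func (h *semiSpaceHeap) alloc(v int) (index int, ok bool) {
    if h.free == len(h.fromspace) {
        return 0, false // the caller would trigger a collection here
    }
    h.fromspace[h.free] = v
    h.free++
    return h.free - 1, true
}

// collect copies the surviving cells into tospace, then flips the
// roles of the two halves; everything not copied is implicitly freed.
func (h *semiSpaceHeap) collect(live func(index int) bool) {
    n := 0
    for i := 0; i < h.free; i++ {
        if live(i) {
            h.tospace[n] = h.fromspace[i]
            n++
        }
    }
    h.fromspace, h.tospace = h.tospace, h.fromspace
    h.free = n
}

func main() {
    h := newHeap(4)
    for v := 1; v <= 4; v++ {
        h.alloc(v * 10)
    }
    h.collect(func(i int) bool { return i%2 == 0 }) // keep cells 0 and 2
    fmt.Println(h.fromspace[:h.free])               // [10 30], compacted at the bottom
}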

Advantages

    1. All surviving data structures end up compacted at the bottom of tospace, so there is no memory fragmentation.
    2. Allocating new memory is as simple as incrementing the free-space pointer.

Disadvantages

    1. Memory is not fully utilized: half of the address space is always wasted.

D. Generational Collection

A major problem with tracing collectors (mark-sweep, node replication) is that they waste time on objects with long lifetimes: long-lived objects do not need to be scanned over and over again.

At the same time, memory allocation exhibits the fact that "most objects die young". Based on these two observations, generational garbage collection stores objects in two (or more) areas of the heap according to their lifetime; these areas are the generations. The young generation is collected much more frequently than the old generation.

Objects are allocated in the young generation; if an object later turns out to live long, it is moved to the old generation, a process called promotion. Because of continual promotion, the young generation stays relatively small compared with the whole heap. Collections focus on the young generation, where they are comparatively efficient, so the STW time is shorter.
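
A toy sketch of promotion (two slices standing in for the generations and an external liveness flag; real collectors track survival counts and use write barriers for old-to-young pointers):

package main

import "fmt"

type obj struct {
    name string
    live bool
}

type genHeap struct {
    young, old []*obj
}

// minorGC collects only the young generation: dead objects are
// dropped, survivors are promoted into the old generation.
func (h *genHeap) minorGC() {
    for _, o := range h.young {
        if o.live {
            h.old = append(h.old, o) // promote
        }
    }
    h.young = h.young[:0]
}

func main() {
    h := &genHeap{}
    h.young = append(h.young,
        &obj{name: "temp", live: false}, // most objects die young
        &obj{name: "cache", live: true}, // a long-lived object
    )
    h.minorGC()
    fmt.Println("old generation size:", len(h.old)) // 1
}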

Advantages

    1. Better performance.

Disadvantages

    1. Complex to implement.

Golang GC

Go provides an environment variable, GOGC, that controls the GC. It specifies, as a percentage of the live heap left after the most recent GC, how much the heap may grow before the next collection is triggered. GOGC=100 means that once the heap has grown by 100% of the live data since the last GC, that is, once the total heap is twice the size of the live data, a new collection starts.

The larger the value, the less often the GC runs, but the more memory the program uses. Conversely, a smaller value reclaims memory more aggressively, at the cost of spending more time in collection.
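
As a small illustration (the 100 MB figure in the comment is only an assumed example), GOGC can also be adjusted at runtime with the standard runtime/debug.SetGCPercent call, which returns the previous value:

package main

import (
    "fmt"
    "runtime/debug"
)

func main() {
    // Equivalent to running the program with GOGC=200: if the live heap
    // after the last GC was 100 MB, the next GC triggers around 300 MB.
    old := debug.SetGCPercent(200)
    fmt.Println("previous GOGC value:", old)

    // GOGC=off (or SetGCPercent(-1)) disables automatic collection.
    // debug.SetGCPercent(-1)
}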

When will the GC be triggered?

When an object larger than 32 KB is allocated on the heap, mallocgc checks whether the garbage collection trigger condition is met and, if so, starts a collection.

func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
    ...
    shouldhelpgc := false
    // The allocated object is smaller than 32 KB.
    if size <= maxSmallSize {
        ...
    } else {
        shouldhelpgc = true
        ...
    }
    ...
    // gcShouldStart() checks the trigger conditions.
    if shouldhelpgc && gcShouldStart(false) {
        // gcStart() performs the garbage collection.
        gcStart(gcBackgroundMode, false)
    }
}

The above is automatic garbage collection. There is also explicit garbage collection, triggered by calling runtime.GC(), which blocks.

// GC runs a garbage collection and blocks the caller until the
// garbage collection is complete. It may also block the entire
// program.
func GC() {
    gcStart(gcForceBlockMode, false)
}
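
A short usage sketch (the allocation sizes are arbitrary): force a blocking collection with runtime.GC() and observe the effect with runtime.ReadMemStats:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // Allocate ~100 MB of short-lived data, then drop all references to it.
    garbage := make([][]byte, 0, 100)
    for i := 0; i < 100; i++ {
        garbage = append(garbage, make([]byte, 1<<20))
    }
    garbage = nil

    var before, after runtime.MemStats
    runtime.ReadMemStats(&before)
    runtime.GC() // blocks until the collection completes
    runtime.ReadMemStats(&after)

    fmt.Printf("heap before: %d MB, after: %d MB, GC cycles: %d\n",
        before.HeapAlloc>>20, after.HeapAlloc>>20, after.NumGC)
}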

GC Trigger conditions

The trigger condition centers on the middle part of the code below: forceTrigger || memstats.heap_live >= memstats.gc_trigger. forceTrigger is the forced-GC flag; the second clause means that the live heap is larger than the GC trigger threshold set at initialization. heap_live is kept up to date during malloc and free, which we will not expand on here.

// gcShouldStart returns true if the exit condition for the _GCoff
// phase has been met. The exit condition should be tested when
// allocating.
//
// If forceTrigger is true, it ignores the current heap size, but
// checks all other conditions. In general this should be false.
func gcShouldStart(forceTrigger bool) bool {
    return gcphase == _GCoff && (forceTrigger || memstats.heap_live >= memstats.gc_trigger) && memstats.enablegc && panicking == 0 && gcpercent >= 0
}

// The GC trigger threshold is set during initialization.
func gcinit() {
    _ = setGCPercent(readgogc())
    memstats.gc_trigger = heapminimum
    ...
}

// The percentage x is passed via GOGC at startup.
// The trigger threshold is defaultHeapMinimum scaled by x/100 (defaultHeapMinimum defaults to 4 MB).
func readgogc() int32 {
    p := gogetenv("GOGC")
    if p == "off" {
        return -1
    }
    if n, ok := atoi32(p); ok {
        return n
    }
    return 100
}

The main process of garbage collection

Go uses the tri-color marking method; the main flow is as follows:

    1. All objects are initially white.
    2. Starting from the roots, find all reachable objects, mark them grey, and put them in the pending queue.
    3. Take grey objects from the queue, mark the objects they reference grey and enqueue them, and mark the object itself black.
    4. When the grey queue has been fully processed, sweeping is carried out.


      (Figure: go_gc.png)

A few notes:

    1. The roots in step 1 include global pointers and the pointers on goroutine stacks.

    2. The mark phase has two passes. The first traverses from the roots, marking objects grey and draining the grey queue. The second re-scans global pointers and stacks. Because marking runs concurrently with the user program, new objects may be allocated and pointers may change during the first pass; these changes are recorded by the write barrier, and the re-scan checks them again.

    3. Stop The World happens twice. The first pause is when the GC is about to start and is mostly preparation work, such as enabling the write barrier. The second pause is the re-scan mentioned above; without STW at that point, marking would never terminate.

In addition, the gcphase value corresponding to each stage is:

    • Off: _GCoff
    • Stack scan / mark: _GCmark
    • Mark termination: _GCmarktermination

Write Barrier

A write barrier in garbage collection is a piece of code that the compiler deliberately inserts around pointer writes; a read barrier is the analogous mechanism for reads.

Why is a write barrier required? Simply put: when the collector runs concurrently with the user program, the user program keeps modifying memory, and those modifications must be recorded.

Through Go 1.7, the write barrier was the classic Dijkstra-style insertion write barrier [Dijkstra '78], and stack re-scan was the main time-consuming part of STW. Since Go 1.8, a hybrid write barrier (a Yuasa-style deletion write barrier [Yuasa '90] combined with a Dijkstra-style insertion write barrier [Dijkstra '78]) avoids the stack re-scan.
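
To make the idea concrete, here is a toy sketch of a Dijkstra-style insertion barrier (hypothetical node type and greyQueue; the runtime's real barrier is emitted by the compiler and operates on heap metadata, not on structs like this):

package main

import "fmt"

type color int

const (
    white color = iota
    grey
    black
)

type node struct {
    color color
    child *node
}

var greyQueue []*node

// shade implements the insertion-barrier idea: any object that gains
// a new reference during marking is marked grey so the collector will
// still scan it.
func shade(n *node) {
    if n != nil && n.color == white {
        n.color = grey
        greyQueue = append(greyQueue, n)
    }
}

// writePointer stands in for the code the compiler emits around a
// pointer store ("slot.child = val") while marking is in progress.
func writePointer(slot *node, val *node) {
    shade(val) // barrier: never let a black object hide a white one
    slot.child = val
}

func main() {
    scanned := &node{color: black} // already scanned (black) object
    fresh := &node{color: white}   // allocated while marking is running
    writePointer(scanned, fresh)
    fmt.Println("grey queue length after barrier:", len(greyQueue)) // 1
}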

Mark

The garbage collection code is mainly concentrated in the function gcStart().

// gcStart is the entry point of the GC and acts according to gcMode.
// 1. gcMode == gcBackgroundMode (runs in the background, i.e. concurrently): _GCoff -> _GCmark
// 2. otherwise: _GCoff -> _GCmarktermination, which is the explicit (forced) GC case
func gcStart(mode gcMode, forceTrigger bool) {
    ...
}

* STW Phase 1

Preparation before the GC starts.

func gcStart(mode gcMode, forceTrigger bool) {
    ...
    // Start the mark workers in the background.
    if mode == gcBackgroundMode {
        gcBgMarkStartWorkers()
    }
    ...
    // Stop The World
    systemstack(stopTheWorldWithSema)
    ...
    if mode == gcBackgroundMode {
        // Preparation before the GC starts:
        // set the GC phase; setGCPhase also enables the write barrier.
        setGCPhase(_GCmark)

        gcBgMarkPrepare() // Must happen before assist enable.
        gcMarkRootPrepare()

        // Mark all active tinyalloc blocks. Since we're
        // allocating from these, they need to be black like
        // other allocations. The alternative is to blacken
        // the tiny block on every allocation from it, which
        // would slow down the tiny allocator.
        gcMarkTinyAllocs()

        // Start The World
        systemstack(startTheWorldWithSema)
    } else {
        ...
    }
}

* Mark

The mark phase runs concurrently with the user program and is implemented by mark workers running in the background.

func gcStart(mode gcMode, forceTrigger bool) {
    ...
    // Start the background mark workers.
    if mode == gcBackgroundMode {
        gcBgMarkStartWorkers()
    }
}

func gcBgMarkStartWorkers() {
    // Background marking is performed by per-P G's. Ensure that
    // each P has a background GC G.
    for _, p := range &allp {
        if p == nil || p.status == _Pdead {
            break
        }
        if p.gcBgMarkWorker == 0 {
            go gcBgMarkWorker(p)
            notetsleepg(&work.bgMarkReady, -1)
            noteclear(&work.bgMarkReady)
        }
    }
}

// gcBgMarkWorker stays in the background; most of the time it is
// parked, and it is scheduled by the gcController.
func gcBgMarkWorker(_p_ *p) {
    for {
        // Park the current goroutine until certain conditions are met.
        gopark(...)
        ...
        // The mark process.
        systemstack(func() {
            // Mark our goroutine preemptible so its stack
            // can be scanned. This lets two mark workers
            // scan each other (otherwise, they would
            // deadlock). We must not modify anything on
            // the G stack. However, stack shrinking is
            // disabled for mark workers, so it's safe to
            // read from the G stack.
            casgstatus(gp, _Grunning, _Gwaiting)
            switch _p_.gcMarkWorkerMode {
            default:
                throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
            case gcMarkWorkerDedicatedMode:
                gcDrain(&_p_.gcw, gcDrainNoBlock|gcDrainFlushBgCredit)
            case gcMarkWorkerFractionalMode:
                gcDrain(&_p_.gcw, gcDrainUntilPreempt|gcDrainFlushBgCredit)
            case gcMarkWorkerIdleMode:
                gcDrain(&_p_.gcw, gcDrainIdle|gcDrainUntilPreempt|gcDrainFlushBgCredit)
            }
            casgstatus(gp, _Gwaiting, _Grunning)
        })
        ...
    }
}

The marking work in the mark phase is mainly implemented in gcDrain().

// gcDrain scans roots and objects in work buffers, blackening grey
// objects until all roots and work buffers have been drained.
func gcDrain(gcw *gcWork, flags gcDrainFlags) {
    ...
    // Drain root marking jobs.
    if work.markrootNext < work.markrootJobs {
        for !(preemptible && gp.preempt) {
            job := atomic.Xadd(&work.markrootNext, +1) - 1
            if job >= work.markrootJobs {
                break
            }
            markroot(gcw, job)
            if idle && pollWork() {
                goto done
            }
        }
    }

    // Process heap marking.
    // Drain heap marking jobs.
    for !(preemptible && gp.preempt) {
        ...
        // Take an object off the grey queue.
        var b uintptr
        if blocking {
            b = gcw.get()
        } else {
            b = gcw.tryGetFast()
            if b == 0 {
                b = gcw.tryGet()
            }
        }
        if b == 0 {
            // work barrier reached or tryGet failed.
            break
        }
        // Scan the grey object's referents, mark them grey, and put
        // them on the grey queue; the object itself becomes black.
        scanobject(b, gcw)
    }
    ...
}

* Mark Termination (STW Phase 2)

The mark termination phase stops the world. It is implemented in gcMarkTermination().

func gcMarkTermination() {
    // World is stopped.
    // Run gc on the g0 stack. We do this so that the g stack
    // we're currently running on will no longer change. Cuts
    // the root set down a bit (g0 stacks are not scanned, and
    // we don't need to scan gc's internal state).  We also
    // need to switch to g0 so we can shrink the stack.
    systemstack(func() {
        gcMark(startTime)
        // Must return immediately.
        // The outer function's stack may have moved
        // during gcMark (it shrinks stacks, including the
        // outer function's stack), so we must not refer
        // to any of its variables. Return back to the
        // non-system stack to pick up the new addresses
        // before continuing.
    })
    ...
}

Sweep

func gcSweep(mode gcMode) {
    ...
    // Blocking sweep.
    if !_ConcurrentSweep || mode == gcForceBlockMode {
        // Special case synchronous sweep.
        ...
        // Sweep all spans eagerly.
        for sweepone() != ^uintptr(0) {
            sweep.npausesweep++
        }
        // Do an additional mProf_GC, because all 'free' events are now real as well.
        mProf_GC()
        mProf_GC()
        return
    }

    // Concurrent (background) sweep.
    lock(&sweep.lock)
    if sweep.parked {
        sweep.parked = false
        ready(sweep.g, 0, true)
    }
    unlock(&sweep.lock)
}

For concurrent sweeping, bgsweep() is started when the GC is initialized and then loops in the background.

func bgsweep(c chan int) {
    sweep.g = getg()

    lock(&sweep.lock)
    sweep.parked = true
    c <- 1
    goparkunlock(&sweep.lock, "GC sweep wait", traceEvGoBlock, 1)

    for {
        for gosweepone() != ^uintptr(0) {
            sweep.nbgsweep++
            Gosched()
        }
        lock(&sweep.lock)
        if !gosweepdone() {
            // This can happen if a GC runs between
            // gosweepone returning ^0 above
            // and the lock being acquired.
            unlock(&sweep.lock)
            continue
        }
        sweep.parked = true
        goparkunlock(&sweep.lock, "GC sweep wait", traceEvGoBlock, 1)
    }
}

func gosweepone() uintptr {
    var ret uintptr
    systemstack(func() {
        ret = sweepone()
    })
    return ret
}

Whether blocking or concurrent, the actual work is done by the sweepone() function. Go's memory management is span-based; mheap_ is a global variable, and all allocated objects are recorded in it. During marking, we only have to find the span corresponding to an object and mark it; during sweeping, the spans are scanned, and unmarked objects can be reclaimed.

// sweeps one span
// returns number of pages returned to heap, or ^uintptr(0) if there is nothing to sweep
func sweepone() uintptr {
    ...
    for {
        s := mheap_.sweepSpans[1-sg/2%2].pop()
        ...
        if !s.sweep(false) {
            // Span is still in-use, so this returned no
            // pages to the heap and the span needs to
            // move to the swept in-use list.
            npages = 0
        }
    }
}

// Sweep frees or collects finalizers for blocks not marked in the mark phase.
// It clears the mark bits in preparation for the next GC round.
// Returns true if the span was returned to heap.
// If preserve=true, don't return it to heap nor relink in MCentral lists;
// caller takes care of it.
func (s *mspan) sweep(preserve bool) bool {
    ...
}

Other

    • gcWork

Each P has a gcw field that manages grey objects (via get and put); its type is gcWork. The core of gcWork is the pair of buffers wbuf1 and wbuf2, which hold grey objects (also referred to as work).

type p struct {
    ...
    gcw gcWork
}

type gcWork struct {
    // wbuf1 and wbuf2 are the primary and secondary work buffers.
    wbuf1, wbuf2 wbufptr

    // Bytes marked (blackened) on this gcWork. This is aggregated
    // into work.bytesMarked by dispose.
    bytesMarked uint64

    // Scan work performed on this gcWork. This is aggregated into
    // gcController by dispose and may also be flushed by callers.
    scanWork int64
}

Since each P has its own work buffers, is there also a global work list? Yes. The benefit of binding work buffers to each P is the same as with any cache: no lock is needed.

var work struct {
    full  uint64                   // lock-free list of full blocks workbuf
    empty uint64                   // lock-free list of empty blocks workbuf
    pad0  [sys.CacheLineSize]uint8 // prevents false-sharing between full/empty and nproc/nwait
    ...
}

So why use two work buffers (wbuf1 and wbuf2)? When taking a work item, we first try wbuf1; if wbuf1 is empty, we swap it with wbuf2 and try again. At other times, full or empty buffers are moved between the per-P buffers and the global work lists.

The advantage is that a get rarely has to touch the global work lists, which multiple goroutines contend for. Interestingly, the global work lists are lock-free, implemented with atomic CAS operations. Below are a few gcWork functions.

    • Initialization
func (w *gcWork) init() {
    w.wbuf1 = wbufptrOf(getempty())
    wbuf2 := trygetfull()
    if wbuf2 == nil {
        wbuf2 = getempty()
    }
    w.wbuf2 = wbufptrOf(wbuf2)
}
    • Put
// put enqueues a pointer for the garbage collector to trace.
// obj must point to the beginning of a heap object or an oblet.
func (w *gcWork) put(obj uintptr) {
    wbuf := w.wbuf1.ptr()
    if wbuf == nil {
        w.init()
        wbuf = w.wbuf1.ptr()
        // wbuf is empty at this point.
    } else if wbuf.nobj == len(wbuf.obj) {
        w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
        wbuf = w.wbuf1.ptr()
        if wbuf.nobj == len(wbuf.obj) {
            putfull(wbuf)
            wbuf = getempty()
            w.wbuf1 = wbufptrOf(wbuf)
            flushed = true
        }
    }

    wbuf.obj[wbuf.nobj] = obj
    wbuf.nobj++
}
    • Get
// get dequeues a pointer for the garbage collector to trace, blocking
// if necessary to ensure all pointers from all queues and caches have
// been retrieved. get returns 0 if there are no pointers remaining.
//go:nowritebarrier
func (w *gcWork) get() uintptr {
    wbuf := w.wbuf1.ptr()
    if wbuf == nil {
        w.init()
        wbuf = w.wbuf1.ptr()
        // wbuf is empty at this point.
    }
    if wbuf.nobj == 0 {
        w.wbuf1, w.wbuf2 = w.wbuf2, w.wbuf1
        wbuf = w.wbuf1.ptr()
        if wbuf.nobj == 0 {
            owbuf := wbuf
            wbuf = getfull()
            if wbuf == nil {
                return 0
            }
            putempty(owbuf)
            w.wbuf1 = wbufptrOf(wbuf)
        }
    }

    // TODO: This might be a good place to add prefetch code

    wbuf.nobj--
    return wbuf.obj[wbuf.nobj]
}
    • ForceGC
      In addition to the two GC trigger paths above (automatic detection at allocation time and explicit user calls), the Go runtime itself monitors the running state: if no GC has run for more than two minutes, one is forced. The monitoring function is sysmon(), started from the main goroutine.
// The main goroutine.
func main() {
    ...
    systemstack(func() {
        newm(sysmon, nil)
    })
}

// Always runs without a P, so write barriers are not allowed.
func sysmon() {
    ...
    for {
        now := nanotime()
        unixnow := unixnanotime()

        lastgc := int64(atomic.Load64(&memstats.last_gc))
        if gcphase == _GCoff && lastgc != 0 && unixnow-lastgc > forcegcperiod && atomic.Load(&forcegc.idle) != 0 {
            lock(&forcegc.lock)
            forcegc.idle = 0
            forcegc.g.schedlink = 0
            injectglist(forcegc.g) // add the forcegc goroutine to the runnable queue
            unlock(&forcegc.lock)
        }
    }
}

var forcegcperiod int64 = 2 * 60 * 1e9 // two minutes