V8 JavaScript Engine Research (III): Implementation of the Garbage Collector

Source: Internet
Author: User


Introduction to the V8 garbage collection mechanism


The implementation of its garbage collector is an important reason for V8's efficiency.



At run time, V8 automatically reclaims the memory of objects that are no longer needed; this is garbage collection.



V8 uses a combination of stop-the-world, generational, and accurate garbage collection mechanisms to ensure fast object allocation, short pauses when collection is triggered, and no memory fragmentation.



V8 garbage collection has the following characteristics:


    • It suspends execution of the program while a garbage collection cycle runs.
    • In most collection cycles it processes only part of the heap at a time, minimizing the impact of the pause on the program.
    • It knows exactly which values in memory are objects and which are pointers, avoiding the memory leaks that arise when plain data is conservatively mistaken for a pointer.


In V8, the object heap is divided into two parts: the new generation, where newly created objects are allocated, and the old generation, to which objects that survive a garbage collection cycle are promoted. If an object is moved during a collection cycle, V8 updates all pointers to it.


A deep analysis of the V8 garbage collection mechanism


Like most dynamically typed languages, JavaScript manages memory with garbage collection, but ECMAScript exposes no interface for manipulating it; everything is done automatically by the JavaScript engine. This spares developers the many tricky issues of manually managing memory allocations.



V8 uses a variety of sophisticated strategies to ensure efficient memory usage and reclamation, which is one of the reasons Node.js chose V8 as its JavaScript engine. If V8 were used only in the browser, memory leaks and other memory problems would affect just a single end user, and would not live long anyway: once the page is closed, the memory is freed. But when V8 runs on the server side (with Node.js as the server environment), memory problems affect a large number of end users, so V8 clearly needs an efficient memory management mechanism.



V8 limits the amount of memory it uses, for two reasons: first, V8 was originally designed as a browser JavaScript engine; second, its garbage collection mechanism imposes a limit, because the larger the heap, the longer a collection cycle takes and the greater its impact on the program. The memory limit of V8 is configurable.


The composition of the V8 heap


The heap of V8 consists of a series of regions:


    • New generation (new-space): most objects are allocated here. This area is small but collected very frequently, independently of the other areas.
    • Old pointer space: contains most objects that may hold pointers to other objects. Most objects promoted from the new generation (after surviving for some time) are moved here.
    • Old data space: contains raw-data objects (with no pointers to other objects). Strings, boxed numbers, and unboxed arrays of doubles are moved here after promotion from the new generation.
    • Large object space: stores objects larger than the size limits of the other spaces. Each object gets its own mmap'd region of memory. Large objects are never moved.
    • Code space: Code objects (the objects holding JIT-compiled instructions) are stored here. This is the only space with execute permission (code too large for this space is stored in the large object space and can also be executed, but that does not mean the large object space as a whole has execute permission).
    • Cell space, property cell space, and map space: contain Cells, PropertyCells, and Maps respectively. Each of these spaces holds objects that are all the same size and have the same structure of pointers.


Each space consists of a set of memory pages. A memory page is a block of contiguous memory allocated with the operating system's mmap (or the Windows equivalent). Pages are 1MB in size and 1MB-aligned, except in the large object space, where they may be larger. Besides storing objects, a memory page contains a header (with flags and metadata) and a marking bitmap (indicating which objects are alive). Each page also has a slots buffer, allocated in separate memory, holding a list of objects that may point to objects on this page.


Identifying pointers


The first problem any garbage collector must solve is distinguishing pointers from data. The collector needs to follow pointers to find live objects. Most garbage collection algorithms also move objects in memory (to reduce fragmentation and keep memory compact), so the collector must be able to rewrite pointers to an object's new location without corrupting plain data.



There are currently three main ways to identify pointers:


    • Conservative method (Conservative). Can be implemented without compiler support. Essentially, every aligned word on the heap is treated as a potential pointer, which means some plain data is also treated as pointers. When an integer happens to look like a pointer, memory that should be reclaimed is kept alive because the collector believes something (actually an integer) still points to it, producing very strange memory leaks. Nor can any object be moved in memory, because rewriting an integer mistaken for a pointer would corrupt the data itself, so the compaction benefits of garbage collection are lost. (The benefits of contiguous memory are obvious: large allocations are easy, there is no repeated searching through memory fragments, and cache hit rates are higher.)
    • Compiler hints. In a statically typed language, the compiler can report the exact offset of every pointer within each class; knowing which class an object belongs to then yields all of its pointers. Java virtual machines use this approach, but it is impractical for dynamically typed languages such as JavaScript, where a property of an object can hold either a pointer or data.
    • Tagged pointers. This method reserves a bit at the bottom of each word to indicate whether it is a pointer or data. It requires some compiler support but is simple to implement. V8 uses this method, as do some statically typed languages such as OCaml. In V8, data is stored in 32-bit words with the lowest bit 0, while pointers have their lowest two bits set to 01.


V8 represents all small integers in the range -2^30 to 2^30-1 (called SMIs inside V8) as a 32-bit word with the lowest bit 0, and pointers as words with the lowest two bits 01. Since all objects are aligned to at least 4 bytes, this causes no problems. Most objects on the heap consist of a set of tagged words, so the garbage collector can scan them quickly, following pointers and ignoring integers. Some objects, such as strings, are known to contain only data (no pointers), so their contents need not be tagged.
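The tagging scheme described above can be sketched as follows. This is an illustrative model with hypothetical helper names, not V8's actual code; it only mirrors the bit layout the text describes (SMIs shifted left one bit with the low bit 0, pointers tagged 01 in their free low bits).

```python
# Illustrative model of V8's 32-bit tagging scheme (hypothetical names).
SMI_MIN, SMI_MAX = -(2 ** 30), 2 ** 30 - 1

def tag_smi(n):
    assert SMI_MIN <= n <= SMI_MAX, "outside the SMI range"
    return (n << 1) & 0xFFFFFFFF        # shift left; low bit 0 marks an SMI

def untag_smi(word):
    if word & 0x80000000:               # sign-extend the 32-bit word
        word -= 1 << 32
    return word >> 1

def tag_pointer(addr):
    assert addr % 4 == 0, "heap objects are at least 4-byte aligned"
    return addr | 0b01                  # low bits 01 mark a pointer

def is_smi(word):
    return word & 0b1 == 0

def is_pointer(word):
    return word & 0b11 == 0b01
```

Because pointers are at least 4-byte aligned, the two low bits carry no address information, which is what makes the 01 tag free to use.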


Generational collection


In most programs, most objects have short lifetimes; only a small number live longer.



With this in mind, V8 divides the heap into two generations: the new generation (new-space) and the old generation (old-space).



Objects are created in the new generation, which is small, only 1~8MB. Allocating memory there is easy: V8 holds an allocation pointer into the area and increments it whenever memory is needed for a new object. When the allocation pointer reaches the end of the new generation, a minor garbage collection cycle (the Scavenge algorithm) is triggered, quickly reclaiming dead objects from the new generation. Objects that survive two minor collection cycles are promoted to the old generation. The old generation reclaims memory in much rarer major collection cycles using mark-sweep or mark-compact. A major cycle is triggered once enough objects have been promoted to the old generation; the exact timing depends on the old generation's size and the program's behavior.
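The bump-pointer allocation just described can be sketched in a few lines. This is an illustrative model under stated assumptions (names and the reset-on-scavenge behavior are hypothetical simplifications; a real Scavenge copies survivors rather than discarding everything):

```python
# Minimal sketch of bump-pointer allocation in the new generation.
NEW_SPACE_SIZE = 8 * 1024 * 1024   # upper end of the 1~8MB range mentioned above

class NewSpace:
    def __init__(self, size=NEW_SPACE_SIZE):
        self.limit = size
        self.allocation_ptr = 0
        self.scavenge_count = 0

    def allocate(self, nbytes):
        # Allocation is just advancing a pointer; hitting the end of the
        # space triggers a minor collection.
        if self.allocation_ptr + nbytes > self.limit:
            self.scavenge()
        addr = self.allocation_ptr
        self.allocation_ptr += nbytes
        return addr

    def scavenge(self):
        # Stand-in for the real Scavenge cycle: here we simply model the
        # space becoming empty again after live objects are evacuated.
        self.scavenge_count += 1
        self.allocation_ptr = 0
```

The cheapness of allocation here is the whole point: in the common case it is one bounds check plus one addition.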


New generation garbage collection algorithm


Because collection in the new generation is very frequent, the algorithm must be fast. V8's new-generation collection uses the Scavenge algorithm (an implementation of Cheney's algorithm).



The new generation is divided into two equally sized sub-areas: the to-space and the from-space. Most object allocations happen in the to-space (certain object types, such as executable Code objects, are allocated in the old generation instead). When the to-space fills up, the to- and from-spaces are swapped, so all objects now sit in the from-space; live objects are then copied from the from-space into the to-space or promoted to the old generation. The copying also compacts the to-space: its occupied memory stays contiguous, keeping subsequent allocation fast and simple. After copying, the from-space is released.



Thus during new-generation collection, half of the memory is always empty; but since the new generation is small, the wasted space is small too, and because most new-generation objects are short-lived, very few live objects need to be copied, so the algorithm is highly time-efficient.



Characteristics of new-generation garbage collection:


    • Small memory area
    • Half the space is wasted
    • High proportion of garbage
    • Small amount of copying
    • Frequent, fast execution


The specific execution process of the algorithm:



Two pointers are maintained in the to-space: allocationPtr (pointing to the address where memory will be allocated for the next object) and scanPtr (pointing to the next object that needs to be scanned).



First, when the to-space fills up, the to- and from-spaces are swapped; now the to-space is empty and the from-space is full.



All objects reachable from the roots in the from-space are copied into the to-space. The to-space can be viewed as a queue: scanPtr points to its head and allocationPtr to its tail. The algorithm then loops, each iteration processing the object scanPtr points to and advancing scanPtr past it. For each object processed, all pointers inside it are examined in turn. A pointer that does not point into the from-space is ignored (it must point into the old generation, whose objects are not collected here). If a pointer targets an object in the from-space that has not yet been copied to the to-space (that is, it has no forwarding address, as described below), that object is copied to the tail of the to-space at the position allocationPtr indicates, and allocationPtr is incremented. A forwarding address, the object's new position in the to-space, is then written into the from-space copy, stored in the object's first word in place of its map pointer. The garbage collector distinguishes a forwarding address from a map pointer by checking the low bits of the first word: a map pointer is tagged, a forwarding address is not. Later, when a pointer is found targeting a from-space object that has already been copied (and thus has a forwarding address), the pointer is simply updated to the forwarding address.



When all objects in the to-space have been processed, that is, when scanPtr catches up with allocationPtr, the algorithm ends. Everything remaining in the from-space is garbage, and the whole space can be released.



In the abstract, the traversal is a BFS (breadth-first search): objects before scanPtr have had all their internal pointers processed, objects between scanPtr and allocationPtr still await processing, and the search is complete when scanPtr and allocationPtr coincide.



The pseudo code of the algorithm is as follows:


 
def scavenge():
  swap(fromSpace, toSpace)
  allocationPtr = toSpace.bottom
  scanPtr = toSpace.bottom

  for i = 0..len(roots):
    root = roots[i]
    if inFromSpace(root):
      rootCopy = copyObject(&allocationPtr, root)
      setForwardingAddress(root, rootCopy)
      roots[i] = rootCopy

  while scanPtr < allocationPtr:
    obj = object at scanPtr
    scanPtr += size(obj)
    n = sizeInWords(obj)
    for i = 0..n:
      if isPointer(obj[i]) and not inOldSpace(obj[i]):
        fromNeighbor = obj[i]
        if hasForwardingAddress(fromNeighbor):
          toNeighbor = getForwardingAddress(fromNeighbor)
        else:
          toNeighbor = copyObject(&allocationPtr, fromNeighbor)
          setForwardingAddress(fromNeighbor, toNeighbor)
        obj[i] = toNeighbor

def copyObject(*allocationPtr, object):
  copy = *allocationPtr
  *allocationPtr += size(object)
  memcpy(copy, object, size(object))
  return copy
Write barrier


The algorithm above ignores one problem: if an object in the new generation is referenced by only one pointer, and that pointer belongs to an object in the old generation, how do we discover that the new-generation object is not garbage? Scanning every object in the old generation would be far too expensive.



To solve this, V8 maintains a list in a store buffer, recording the cases where an old-generation object points to a new-generation object. A freshly created object has no pointers to it; whenever an old-generation object is later made to point to it, that fact is recorded in the list. Because every such write operation must execute this extra step, the mechanism is called a write barrier.



Every pointer store thus executes a few extra instructions; this is part of the cost of the garbage collection mechanism, but the cost is modest, since writes are much less frequent than reads. Some other JavaScript engines' collectors use a read barrier instead, but that requires hardware assistance to be cheap. V8 applies several optimizations to reduce the cost of the write barrier:


    • It is often possible to prove that an object being stored into is in the new generation, and such stores need no write barrier.
    • When all references to an object are local, the object is allocated on the stack, and stack objects obviously need no write barrier.
    • The old→new case is rare, so quickly detecting the two common cases, new→new and old→old, improves performance, since neither needs the write barrier. Pages are 1MB-aligned, so the page containing an object can be found by masking off the low 20 bits of its address; the page header records whether the page belongs to the new or the old generation, so the generations of both objects can be checked quickly.
    • When the store buffer holding the list fills up, V8 sorts it, merges duplicate entries, and removes entries whose pointers no longer point into the new generation.
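The fast generation check and the store-buffer condition can be sketched together. This is an illustrative model with hypothetical names (a dict standing in for real page headers), not V8 source; it only demonstrates the masking trick and the old→new filter described above:

```python
# Illustrative sketch of the write-barrier fast path (hypothetical names).
PAGE_SIZE = 1 << 20          # pages are 1MB and 1MB-aligned
PAGE_MASK = ~(PAGE_SIZE - 1)

pages = {}                   # page start address -> header (stand-in dict)

def add_page(start, in_new_space):
    pages[start] = {"in_new_space": in_new_space}

def page_of(addr):
    # Masking off the low 20 bits yields the start of the object's page.
    return addr & PAGE_MASK

def needs_store_buffer_entry(holder_addr, value_addr):
    # Only the rare old -> new store must be recorded; new -> new and
    # old -> old are filtered out with two cheap header lookups.
    holder_new = pages[page_of(holder_addr)]["in_new_space"]
    value_new = pages[page_of(value_addr)]["in_new_space"]
    return (not holder_new) and value_new
```

The design choice here is that alignment replaces bookkeeping: no per-object generation field is needed, because the page address itself encodes where to find the flag.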
Garbage collection algorithms for the old generation


The Scavenge algorithm reclaims and compacts a small amount of memory very quickly, but it has significant memory overhead, since it keeps both a to-space and a from-space. That is acceptable for a small memory area, but impractical for more than a few megabytes. For the old generation's hundreds of megabytes, V8 therefore uses two closely related algorithms: mark-sweep and mark-compact.



Both algorithms have two phases: a marking phase, followed by either a sweep phase or a compact phase.


Mark-sweep


The marking phase traverses all objects in the heap and marks every live object. The sweep phase then clears only the unmarked objects, that is, the garbage. Because the proportion of garbage in the old generation is small, this is very efficient.



The problem with mark-sweep is that after a sweep, memory tends to be fragmented and no longer contiguous. When a large object must be allocated and no free fragment is big enough to hold it, V8 cannot complete the allocation, and a garbage collection is triggered early.


Mark-compact


To solve the fragmentation left by mark-sweep, V8 introduced mark-compact. Its marking phase is the same as mark-sweep's, but the sweep phase becomes a compact phase: live objects are moved together to one side of the memory area, and everything beyond the boundary is then freed directly. Because it moves objects, compaction is slower, but it guarantees that no fragmentation remains.



Characteristics of old-generation garbage collection:


    • Large memory area
    • Stored objects have long lifetimes
    • Few objects to clear, small proportion reclaimed
    • Infrequent, slow execution


The specific execution process of the algorithm:



In the marking phase, all live objects on the heap are discovered and marked. Each page contains a marking bitmap in which each bit corresponds to one word of the page (a pointer is one word in size, and an object's start address is word-aligned); this granularity is necessary because an object can start at any word-aligned offset. The bitmap costs some memory (3.1% on 32-bit systems, 1.6% on 64-bit systems), but all memory management systems have similar overhead. Two bits together represent the state of one object; since an object occupies at least two words, these bit pairs never overlap.
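The overhead figures quoted above can be checked directly: the bitmap stores one bit per word of the page, so its relative size is 1 divided by the number of bits in a word. A tiny worked example (the function name is ours, for illustration):

```python
def bitmap_overhead(word_bits):
    # One marking bit per word of the page: overhead = 1 / (bits per word).
    return 1 / word_bits

# 32-bit word: 1/32  = 3.125%  -> the ~3.1% quoted in the text
# 64-bit word: 1/64  = 1.5625% -> the ~1.6% quoted in the text
```

And because every object spans at least two words, every object owns at least two bitmap bits, which is exactly enough to encode the three colors of tri-color marking described next.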



V8 uses tri-color marking; an object's state is one of three colors (hence the 2 bits):


    • White: the object has not yet been discovered by the garbage collector;
    • Grey: the object has been discovered, but its neighbors (the objects its pointers reference) have not yet been processed;
    • Black: the object has been discovered and all of its neighbors have been processed.


If the heap is viewed as a graph of objects connected by pointers, the marking algorithm is essentially a DFS (depth-first search). At the start of a marking cycle the marking bitmap is cleared, so all objects are white. Objects reachable from the roots are marked grey and pushed onto a separately allocated double-ended queue (deque). The collector then loops over the deque: each iteration it pops an object, marks it black, marks the object's neighbors (the objects its internal pointers reference) grey, and pushes them onto the deque. When the deque is empty, every discovered object has been marked black and the algorithm ends. Very large objects, such as long arrays, may be processed in pieces to reduce the chance of the deque overflowing. If the deque does overflow, objects are still marked grey but not pushed (so their neighbors remain undiscovered); when the deque empties, the collector scans the heap for the remaining grey objects, pushes them back onto the deque, and resumes marking.



The pseudo code is as follows:


markingDeque = []
overflow = false

def markHeap():
  for root in roots:
    mark(root)

  do:
    if overflow:
      overflow = false
      refillMarkingDeque()

    while !markingDeque.isEmpty():
      obj = markingDeque.pop()
      setMarkBits(obj, BLACK)
      for neighbor in neighbors(obj):
        mark(neighbor)
  while overflow
    

def mark(obj):
  if markBits(obj) == WHITE:
    setMarkBits(obj, GREY)
    if markingDeque.isFull():
      overflow = true
    else:
      markingDeque.push(obj)

def refillMarkingDeque():
  for each obj on heap:
    if markBits(obj) == GREY:
      markingDeque.push(obj)
      if markingDeque.isFull():
        overflow = true
        return


When the marking algorithm finishes, every live object is marked black and every dead object remains white. This information is used by the sweep or compact phase, whichever runs next; both reclaim memory at page granularity (a V8 page is 1MB of contiguous memory, unlike a virtual memory page).



The sweep algorithm scans for contiguous runs of dead objects, frees their space, and adds the freed ranges to free lists. Each page maintains separate free lists for small regions (<256 words), medium regions (<2048 words), large regions (<16384 words), and huge regions. The sweep algorithm itself is straightforward: it traverses the page's marking bitmap, finds the runs of white objects, and releases them. The free lists are mainly used by the Scavenge algorithm when promoting surviving objects into the old generation, and by the compact algorithm when moving objects. Some object types can only be allocated in the old generation, so the free lists serve those allocations as well.
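The per-page free lists with the size classes just listed can be sketched as follows. This is an illustrative model (class and method names are hypothetical, sizes in words as in the text), not V8's actual allocator:

```python
# Size classes from the text, as (name, exclusive upper bound in words).
SIZE_CLASSES = [("small", 256), ("medium", 2048),
                ("large", 16384), ("huge", float("inf"))]

class FreeLists:
    def __init__(self):
        self.lists = {name: [] for name, _ in SIZE_CLASSES}

    @staticmethod
    def class_for(size_in_words):
        # Pick the first class whose bound exceeds the chunk size.
        for name, limit in SIZE_CLASSES:
            if size_in_words < limit:
                return name

    def free(self, addr, size_in_words):
        # The sweeper adds a reclaimed range to the matching list.
        self.lists[FreeLists.class_for(size_in_words)].append((addr, size_in_words))

    def allocate(self, size_in_words):
        # First fit: scan the lists from small to huge for a chunk big enough.
        for name, _ in SIZE_CLASSES:
            chunks = self.lists[name]
            for i, (addr, size) in enumerate(chunks):
                if size >= size_in_words:
                    del chunks[i]
                    return addr, size
        return None   # nothing fits: would trigger a collection or a new page
```

Bucketing by size class keeps the common lookups cheap: a promotion of a known size only has to probe lists that could possibly satisfy it.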



The compact algorithm tries to move objects out of fragmented pages (pages containing many small free regions) and consolidate them elsewhere. The objects are migrated to other pages, which may require allocating new pages; the evacuated pages can then be returned to the operating system. The process is intricate; roughly, each live object on a fragmented page is copied into memory allocated from a free list, a forwarding address is written into the first word of the original object on the fragmented page, and the object's old address is recorded during the migration. After migration, V8 iterates over all the recorded pointers to old addresses and updates them to the new forwarding addresses. If a page is very active, with many pointers from other pages pointing into it, it is kept until a later garbage collection cycle.



V8's old-generation collection combines the two algorithms: it uses mark-sweep in most cases, and mark-compact when there is not enough space to accommodate the objects being promoted from the new generation.


Optimizing mark-sweep and mark-compact: incremental marking


Because V8 pauses all running application code for the duration of mark-sweep or mark-compact, the application can stutter, and as the heap grows the pause time grows rapidly, which is clearly unacceptable. So during marking, V8 uses incremental marking, splitting the marking work into many small steps: it marks a small portion, resumes the application code, marks some more, and so on, alternating. The single long pause is thereby broken into many small time slices, greatly improving the application's responsiveness.



The biggest difference between incremental marking and plain marking is that during incremental marking the object graph can change (because application code runs in between steps). The case that must be handled is a black object acquiring a pointer to a white object: black objects are considered fully processed and will not be visited again, so without intervention the white object (which is now live) would still be collected as garbage. V8's solution is once again the write barrier: in addition to the old-generation-to-new-generation barrier, storing a pointer from a black object to a white object also triggers the barrier, which re-colors the black object grey and pushes it back onto the deque. When the marking algorithm later processes it again, all the objects it points to are marked grey, the formerly white object among them, and the goal is reached.
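The incremental-marking barrier just described can be sketched in a few lines. This is an illustrative model with hypothetical names (a dict of named objects, not real heap words), showing only the re-greying rule:

```python
# Illustrative sketch of the incremental-marking write barrier.
from collections import deque

WHITE, GREY, BLACK = 0, 1, 2

objects = {}            # object name -> list of outgoing pointers (names)
colour = {}             # object name -> current colour
marking_deque = deque() # grey objects awaiting processing

def new_object(name):
    objects[name] = []
    colour[name] = WHITE
    return name

def write_field(holder, value):
    objects[holder].append(value)   # the store itself
    # The barrier: a black holder gaining a pointer to a white object is
    # re-coloured grey and pushed back, so the white object will be
    # discovered when the collector resumes marking.
    if colour[holder] == BLACK and colour[value] == WHITE:
        colour[holder] = GREY
        marking_deque.append(holder)
```

The invariant being defended is that no black object ever points to a white one; demoting the holder to grey is one standard way to restore it after a mutation.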


Lazy sweeping


Lazy sweeping begins when incremental marking completes. At that point every object is known to be either alive or dead, and the heap knows exactly how much memory can be reclaimed, but it does not have to reclaim all dead objects at once: the garbage collector can sweep portions of memory as needed. Only when all garbage objects have been reclaimed does the whole incremental-marking-plus-lazy-sweeping cycle end.


More Optimizations


V8 has added parallel sweeping: the main thread no longer touches dead objects, and dedicated threads reclaim their memory, with only a small amount of synchronization required.



V8 is also experimenting with parallel marking and will introduce it in the future.


Summary


Garbage collection is a very complex technology; an efficient and fast collector requires a combination of algorithms and many optimizations. Fortunately the engine does all of this work, leaving developers free to focus on the business logic that matters.


