Garbage collection algorithms (5): generational collection

Generational garbage collection takes an empirical observation as its design starting point: most objects become garbage very soon after they are created. Based on this fact, this article discusses the optimization ideas it layers on top of the garbage collection algorithms covered earlier. The key points of the generational design are:
- Each object records its age, and the age is incremented every time the object survives a garbage collection (a small sketch of this bookkeeping follows the list);
- Objects of different ages are placed in different memory spaces; each such space is called a generation;
- Each generation is collected with a garbage collection algorithm suited to its characteristics;
- Collecting each generation separately usually requires one extra piece of information: which objects in a generation are referenced by objects in other generations. For the generation being collected, this reference information plays the same role as the roots and serves as a starting point for garbage collection.
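As a minimal sketch of the age bookkeeping described in the list above (the field name, the AGE_MAX threshold and the helper function are assumptions made for illustration, not part of any particular collector):

```c
#include <stdbool.h>

#define AGE_MAX 7                      /* promotion threshold; an assumed value */

typedef struct object {
    int age;                           /* how many collections this object has survived */
    /* ... payload ... */
} object;

/* Called for every object that survives a new-generation collection:
   the age grows each time, and once it reaches AGE_MAX the object is
   "promoted", i.e. moved to the old generation, which is collected
   with its own algorithm. */
static bool age_and_check_promotion(object *obj) {
    obj->age++;
    return obj->age >= AGE_MAX;
}
```

The Ungar-style pseudocode below performs essentially this check inside its copy routine.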
The representative algorithm is Ungar's generational garbage collection. It divides the heap into a new generation and an old generation. The new generation is further split into a creation (eden) space and survivor space. When the creation space fills up, it is collected with the copying algorithm and the live objects are copied into the survivor space. To pair with the copying algorithm discussed earlier, the survivor space itself is split in two, a from space and a to space; every new-generation collection copies survivors between them. The old generation is collected directly with mark_sweep. The remembered set (record set, $rs in the pseudocode) is an array recording references across generations. It cannot record only the referenced object: when that object is copied into the to space, the referencing pointer itself must be updated, and recording only the new-generation target would not let us find that pointer. Therefore the remembered set records the old-generation objects that hold the references. The remembered set is updated by a write barrier, which runs whenever a field of an old-generation object is assigned a reference to a new object:
```
write_barrier(obj, field, new_obj) {
    if (obj >= $old_start && new_obj < $old_start && obj.remembered == false) {
        // the condition is the obvious one: an old-generation object gains a
        // reference to a new-generation object and is not yet recorded
        $rs[$rs_idx++] = obj        // update the remembered set
        obj.remembered = true       // this flag prevents duplicate entries
    }
    *field = new_obj                // finally, perform the actual store
}
```
As shown above, old-generation objects get an extra header field, remembered, which marks whether the object is already in the new generation's remembered set, so that it is not recorded twice.
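To make the $-prefixed globals in the pseudocode easier to follow, here is a minimal C-style sketch of the state they correspond to. The names mirror the pseudocode; the fixed array size and the "new generation below the old generation" address layout (which is what checks like obj < $old_start imply) are assumptions made for illustration.

```c
#include <stddef.h>

typedef struct object object;      /* header fields are described in the text */

/* Heap layout, from low to high addresses:
   [ eden | survivor (from) | survivor (to) | old generation ]
   so "ptr < old_start" means "ptr points into the new generation". */
static char *new_free;             /* bump-allocation pointer in eden ($new_free)   */
static char *survivor1_start;      /* end of eden ($survivor1_start)                */
static char *old_start;            /* start of the old generation ($old_start)      */

/* Remembered set: old-generation objects known to hold references
   into the new generation. */
#define RS_MAX 1024                /* assumed capacity */
static object *rs[RS_MAX];         /* $rs     */
static size_t  rs_idx;             /* $rs_idx */
```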
For generational garbage collection, the object header of new-generation objects also needs two more fields:
age: as mentioned above, it decides which generation the object belongs to (i.e. when it gets promoted);
forwarded: used by the copying algorithm to determine whether the object has already been copied. The forwarding pointer is also needed; because this is a plain copying algorithm, it does not have to live in the header and can simply reuse one of the object's fields. Now for the concrete algorithms. Allocation first: the flow is very simple as long as it does not trigger a new-generation collection. Note the initialization of the header fields described above:
```
new_obj(size) {
    if ($new_free + size >= $survivor1_start)
        minor_gc()                   // eden is full: run the new-generation collection introduced below
    if ($new_free + size >= $survivor1_start)
        fail()                       // still no room even after the collection
    obj = $new_free
    $new_free += size
    // initialize the header fields described above
    obj.age = 0
    obj.forwarded = false
    obj.remembered = false
    obj.size = size
    return obj
}
```
The code above ignores one error case: allocating an object larger than the new generation itself. Before looking at the new-generation collection logic, let's interrupt the flow for a moment and first look at the concrete copy routine:
```
copy(obj) {
    if (obj.forwarded == false) {            // objects already forwarded have been copied and need no further handling
        if (obj.age < AGE_MAX) {             // decide which generation to copy into: still young, stay in the new generation
            copy_data($to_survivor_free, obj, obj.size)
            obj.forwarded = true
            obj.forwarding = $to_survivor_free
            $to_survivor_free.age++
            $to_survivor_free += obj.size
            for (child : children(obj.forwarding))  // note: iterate over forwarding (the copy);
                                                    // the original book appears to get this wrong
                child = copy(child)                 // copy and update every child
        } else {
            promote(obj)                     // old enough: promote to the old generation
        }
    }
    return obj.forwarding
}

promote(obj) {
    new_obj = allocate_in_old(obj)
    if (new_obj == NULL) {
        major_gc()                           // plain mark_sweep, nothing special
        new_obj = allocate_in_old(obj)
        if (new_obj == NULL)
            fail()
    }
    // the usual forwarding update for the object that has just left the new generation
    obj.forwarding = new_obj
    obj.forwarded = true
    // the key point: after copying across generations, the remembered set must be updated
    for (child : children(new_obj))
        if (child < $old_start) {            // the promoted object still references a new-generation child:
            $rs[$rs_idx++] = new_obj         // record it (as stated earlier, remembered-set entries
            new_obj.remembered = true        // are always old-generation objects)
            return                           // one hit is enough; stop scanning
        }
}
```
With the copy routine in hand, it's time to look back at the new-generation GC:
```
minor_gc() {
    $to_survivor_free = $to_survivor_start
    // as in any copying collector, start from the roots and copy away what is reachable
    for (r : $root)
        if (r < $old_start)                  // unlike the plain copying algorithm, check the generation first
            r = copy(r)

    // the other difference from the plain copying algorithm: old-generation objects
    // in the remembered set are also treated as roots
    i = 0
    while (i < $rs_idx) {
        has_new_obj = false
        for (child : children($rs[i]))
            if (child < $old_start) {        // found a child in the new generation: copy it away
                child = copy(child)
                if (child < $old_start)      // the copy may still be in the new generation
                                             // (or it may have aged out during the copy and been promoted!)
                    has_new_obj = true       // remember that this entry must stay in the remembered set
            }
        if (has_new_obj == false) {          // no new-generation children remain: drop the entry
            $rs[i].remembered = false
            $rs_idx--
            swap($rs[i], $rs[$rs_idx])
        } else {
            i++
        }
    }
    swap($from_survivor_start, $to_survivor_start)
}
```
That is the whole new-generation collection. Some extra logic is omitted, for example what to do when the to space fills up; copying the overflow straight into the old generation is an acceptable answer (a sketch of that fallback follows).
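A minimal sketch of that fallback policy (the function and variable names are assumptions for illustration; the point is only the decision "copy into the to space if it still fits, otherwise send the object straight to the old generation"):

```c
#include <stdbool.h>
#include <stddef.h>

enum destination { TO_SPACE, OLD_GENERATION };

static bool fits_in_to_space(const char *to_free, const char *to_end, size_t size) {
    return to_free + size <= to_end;   /* enough room left in the to-survivor space? */
}

/* Where should a surviving new-generation object be copied?
   Age-based promotion still applies; this only adds the overflow fallback. */
static enum destination choose_destination(const char *to_free, const char *to_end,
                                            size_t size, int age, int age_max) {
    if (age >= age_max || !fits_in_to_space(to_free, to_end, size))
        return OLD_GENERATION;         /* either old enough, or simply no room left */
    return TO_SPACE;
}
```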
As for old-generation collection, there is nothing more to say: it is plain mark_sweep. Now for the evaluation:
Advantages:
- Very good throughput
Disadvantages:
- It rests on the empirical assumption that "most objects die young". Once that assumption fails, the whole generational foundation becomes shaky: if the garbage ratio among new objects is not as high as imagined, the new generation does a large amount of copying, while garbage accumulates in the old generation and mark_sweep has to run frequently.
- The write barrier (the operation that updates the remembered set) is extra overhead: each recorded object consumes memory in the remembered set, and if the runtime cost of the barrier exceeds the benefit of new-generation collection, generational collection loses its point.
- The mark_sweep collection of the old generation inevitably hurts the maximum pause time.
- Cross-generational cyclic garbage cannot be reclaimed right away: the minor GC conservatively keeps anything referenced from the remembered set alive, so the cycle survives until its new-generation member ages enough to be promoted and a later old-generation collection can see that the whole cycle is unreachable.
The write_barrier overhead can be optimized: for example, instead of recording cross-generational references per object, record them per block of memory. The remembered set can then be kept as a bitmap, which significantly reduces the memory overhead, at the price of extra work scanning whole memory blocks during collection; it is not a perfect fix either. As for the generational split itself, it can also be taken further: dividing the heap into more than two generations instead of just two is another optimization direction.
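A minimal sketch of that per-block recording idea, in the style of card marking (all names, sizes and the one-byte-per-card table are assumptions for illustration; the bitmap variant mentioned above would simply pack the dirty flags more tightly):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The old generation is divided into fixed-size "cards". Instead of appending
   the referencing object to a remembered set, the write barrier only marks the
   card that contains it; a later minor GC scans the objects in every dirty card. */
#define CARD_SIZE  512                     /* bytes covered by one card (assumed)   */
#define CARD_COUNT 4096                    /* number of cards in the old generation */

static uint8_t card_table[CARD_COUNT];     /* one dirty flag per card               */
static char   *old_gen_start;              /* start address of the old generation   */

static void mark_card(const void *old_obj) {
    size_t index = (size_t)((const char *)old_obj - old_gen_start) / CARD_SIZE;
    card_table[index] = 1;                 /* constant time, and duplicates cost nothing */
}

/* The write barrier: when an old-generation object gains a reference to a
   new-generation object, mark its card; the store itself is unchanged. */
static void write_barrier(void *old_obj, void **field, void *new_obj, bool ref_into_young) {
    if (ref_into_young)
        mark_card(old_obj);
    *field = new_obj;
}
```

The memory cost drops from one remembered-set entry per referencing object to one flag per card, at the price of re-scanning whole cards during the minor collection.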
Here is a slightly more complex algorithm aimed at the third and fourth drawbacks: the Train GC. The algorithm itself is somewhat involved, and the train metaphor exists to make it easier to grasp. Essentially it partitions old-generation memory along two dimensions, shrinking the scope of a single collection to one cell of that grid, and it gathers objects that reference each other into the same "row". A row is abstracted as a train, and each fixed-size chunk of memory within a row is abstracted as a car (carriage).
References across cars and across trains are tracked with remembered sets. But because collection always reclaims only the "first" train, and within it reclaims cars from front to back, the remembered sets can be greatly simplified: references pointing from front to back can be ignored.
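As a rough sketch of the bookkeeping this implies (every name and field here is an assumption made for illustration; the pseudocode below refers to this state through helpers such as obj_to_car, get_last_car, train_rs and car_rs):

```c
#include <stddef.h>

typedef struct object object;

/* Cars are ordered by (train_num, car_num); collection always works on the car
   with the lowest coordinates, so only references coming from "behind" (from
   higher coordinates) ever need to be remembered. */
typedef struct remembered_set {
    object **entries;              /* referencing objects that live behind the target */
    size_t   count;
} remembered_set;

typedef struct car {
    char          *start, *free;   /* bounds and allocation pointer of this car       */
    size_t         train_num;      /* which train (row) the car belongs to            */
    size_t         car_num;        /* position of the car within its train            */
    remembered_set rs;             /* references into this car from later cars        */
    struct car    *next;           /* the next car of the same train                  */
} car;

typedef struct train {
    car           *first_car;
    remembered_set rs;             /* references into this train from later trains    */
    struct train  *next;           /* the next (younger) train                        */
} train;
```

Under such a layout, obj_to_car would map an object's address to the car containing it, and get_last_car would follow next pointers to the tail car of that car's train.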
Although the train algorithm is first and foremost an optimization of old-generation collection, the new-generation side changes too: when the new generation is collected, survivors are not copied within the new generation but are copied directly into the old generation (the algorithm exists precisely to make old-generation collection efficient, so there is little point in shuffling objects around inside the new generation first). Copying into the old generation means the two-dimensional coordinates above, the train and the car, have to be chosen. The concrete copy logic:
```
// copy obj into a given car of the old generation; the logic is not complicated
copy(obj, to_car) {
    if (obj.forwarded == false) {
        if (to_car.free + obj.size >= to_car.start + CAR_SZ)
            to_car = new_car(to_car)         // the car is full: append a new car to the same train
        copy_data(to_car.free, obj, obj.size)
        obj.forwarding = to_car.free
        obj.forwarded = true
        to_car.free += obj.size
        for (child : children(obj.forwarding))
            child = copy(child, to_car)      // objects with reference relationships are copied into the same car;
                                             // as we will see, even when a new car has to be allocated,
                                             // they still end up in the same train
    }
    return obj.forwarding
}
```
Now let's see how it is called, i.e. how objects with reference relationships end up together:
```
minor_gc() {
    // objects referenced from the roots go to a brand-new car, no questions asked
    // (the old-generation collection will sort them out later)
    to_car = new_car(NULL)                   // note: new_car(NULL) creates not just a new car but a new train!
    for (r : $root)
        if (r < $old_start)
            r = copy(r, to_car)

    // objects referenced from the remembered set are, by definition, referenced by some
    // old-generation object, so we can go straight to that object's car
    for (remembered_obj : $young_rs)
        for (child : children(remembered_obj))
            if (child < $old_start) {
                // the key point, two steps: 1. find the car of the old object that references the child,
                // 2. take the last car of that car's train
                to_car = get_last_car(obj_to_car(remembered_obj))
                child = copy(child, to_car)
            }
}
```
One piece of logic deserves attention: objects referenced from the roots are copied into a new train, while objects referenced from the remembered set are copied into the train of the object that references them.
Old-generation collection proceeds from the first train backwards, always giving priority to the first car:
```
major_gc() {
    has_root_reference = false
    to_car = new_car(NULL)                   // note: a new train again

    // start with the root objects
    for (r : $roots)
        if (is_in_from_train(r)) {           // only the from-train is processed; it is the first train
            has_root_reference = true        // the from-train is referenced from a root: even after its
                                             // objects are copied to a rear car, the train stays reachable
            if (is_in_from_car(r))           // and only the from-car, the first car of the from-train,
                r = copy(r, to_car)          // is evacuated: root-referenced live objects move to the new train
        }
    // at this point every root-referenced live object in the first car of the first train
    // has been copied away, into the new train!

    // this extra check is the important one
    if (has_root_reference == false && is_empty(train_rs($from_car))) {
        // the core of the train algorithm: if this train is referenced neither from the roots
        // nor from any other train, the whole train is garbage, not just one car, and all of it
        // can be reclaimed. Cyclic references are reclaimed here as well!
        reclaim_train($from_car)
        return
    }

    // process the objects referenced from the remembered sets; references from other
    // trains or cars cause various adjustments
    scan_rs(train_rs($from_car))
    scan_rs(car_rs($from_car))
    add_to_freelist($from_car)               // by now this from_car can be reclaimed
    $from_car = $from_car.next
}

// the other core piece of the train algorithm: gather objects referenced from the
// remembered set into the same train as the objects that reference them
scan_rs(rs) {
    for (remembered_obj : rs)
        for (child : children(remembered_obj))
            if (is_in_from_car(child)) {
                to_car = get_last_car(obj_to_car(remembered_obj))   // move next to the object that references it
                child = copy(child, to_car)
            }
}
```
Finally, take a look at write_barrier:
```
// the operation here is obj.field = new_obj
write_barrier(obj, field, new_obj) {
    if (obj >= $old_start) {
        if (new_obj < $old_start) {
            // an old-generation object references a new-generation one: the simple case
            add(obj, $young_rs)
        } else {
            // an old-generation object references another old-generation one: a bit more logic
            src_car  = obj_to_car(obj)       // src is the referrer, dest the referenced object
            dest_car = obj_to_car(new_obj)
            if (src_car.train_num > dest_car.train_num)
                add(obj, train_rs(dest_car))
            else if (src_car.car_num > dest_car.car_num)
                add(obj, car_rs(dest_car))
            // the two conditions above look incomplete, as if some branches were dropped,
            // but those branches genuinely need no handling: collection always moves from
            // front to back, so only references coming from behind matter
        }
    }
    *field = new_obj
}
```
A question remains: objects are copied into the train of the source object that references them, so under what circumstances can the referencing object and the scanned object end up in different trains? As the minor_gc above already hinted: whenever new_car(NULL) is called. Concretely, whether during new-generation-to-old-generation copying or during the old generation's own copying, objects referenced from the roots always go into a newly created train. That behaviour is correct: the root-referenced objects act as the "locomotive" of a train. Finally, an illustration (figure not included here) shows how the cross-car and cross-train remembered sets are handled so that the from-car is eventually copied away completely.
Advantages:
- Old-generation collection never sweeps the whole heap; it handles only one car (or, at most, one whole train), which keeps pause times very short;
- Cyclic garbage can also be reclaimed in one go;
Disadvantages:
- Throughput is somewhat worse, because the bookkeeping for the remembered sets is substantial.