Java Virtual Machine Garbage Collectors and Memory Allocation Policies: Overview
- Which memory needs to be reclaimed, when to reclaim it, and how to reclaim it: these are the three things a GC has to accomplish.
- The program counter, the virtual machine stack, and the native method stack are thread-private. Memory allocation and reclamation in these three areas are deterministic: their memory is reclaimed naturally when a method or thread ends.
- The Java heap and the method area are different: which objects will be created is only known at runtime, so allocation and reclamation of this memory are dynamic. The memory allocation and reclamation discussed in this note refer to the Java heap and the method area.
Determining whether an object is dead
- Reference counting: add a reference counter to each object. Whenever something references the object, the counter is incremented; whenever a reference is dropped, it is decremented. An object whose counter is 0 is considered dead. However, this approach has difficulty with circular references: two objects that only reference each other keep each other's counters above 0 even when nothing else can reach them. For this reason, mainstream Java virtual machines do not use reference counting to decide whether an object is alive.
- Reachability analysis: starting from a set of objects called "GC Roots", the collector searches downward along references; the path it follows is called a reference chain. When no reference chain connects an object A to any GC Root (in graph terms, A is unreachable from the GC Roots), A is considered unusable and can be judged recyclable.
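The two approaches can be compared on a toy object graph. The sketch below is an illustrative model, not JVM code: the heap is a map from hypothetical object names to the names their fields reference, and the roots set stands in for the GC Roots. Objects b and c reference only each other, so reference counting never frees them, while reachability analysis does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical heap model: each object name maps to the names of the objects its fields reference.
class LivenessDemo {
    /** Reference counting: free every object whose count drops to 0, cascading to what it referenced. */
    static Set<String> deadByRefCount(Map<String, List<String>> heap, Set<String> roots) {
        Map<String, Integer> count = new HashMap<>();
        for (String obj : heap.keySet()) count.put(obj, 0);
        for (String root : roots) count.merge(root, 1, Integer::sum);          // +1 per root reference
        for (List<String> fields : heap.values())
            for (String target : fields) count.merge(target, 1, Integer::sum); // +1 per field reference

        Set<String> dead = new HashSet<>();
        Deque<String> zero = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : count.entrySet())
            if (e.getValue() == 0) zero.push(e.getKey());
        while (!zero.isEmpty()) {
            String obj = zero.pop();
            if (!dead.add(obj)) continue;
            for (String target : heap.get(obj))                                // freeing obj drops its references
                if (count.merge(target, -1, Integer::sum) == 0) zero.push(target);
        }
        return dead;
    }

    /** Reachability analysis: anything not on a reference chain from the GC Roots is dead. */
    static Set<String> deadByReachability(Map<String, List<String>> heap, Set<String> roots) {
        Set<String> reachable = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty())
            for (String target : heap.get(work.pop()))
                if (reachable.add(target)) work.push(target);
        Set<String> dead = new HashSet<>(heap.keySet());
        dead.removeAll(reachable);
        return dead;
    }

    public static void main(String[] args) {
        // a is held by a root; b and c reference only each other; d is referenced by nothing.
        Map<String, List<String>> heap = Map.of(
                "a", List.of(), "b", List.of("c"), "c", List.of("b"), "d", List.of());
        Set<String> roots = Set.of("a");
        System.out.println("reference counting frees: " + new TreeSet<>(deadByRefCount(heap, roots)));     // [d]
        System.out.println("reachability frees:       " + new TreeSet<>(deadByReachability(heap, roots))); // [b, c, d]
    }
}
```

The cycle b ↔ c keeps both counters at 1, so only d is freed by reference counting, while reachability analysis frees b, c and d.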
1. References
- Narrow definition of a reference: if the value stored in reference-type data is the starting address of another piece of memory, that data is said to represent a reference. This definition cannot describe objects that are "not worth much, yet a pity to throw away": objects that should be kept while memory is sufficient and can be discarded when memory is still tight after a GC.
- Since JDK 1.2 the concept of a reference has been expanded into four kinds: strong references (Strong Reference), soft references (Soft Reference), weak references (Weak Reference), and phantom references (Phantom Reference).
- (1) A strong reference is an ordinary reference such as Object obj = new Object(). As long as a strong reference still exists, the collector never reclaims the referenced object.
- (2) A soft reference describes objects that are useful but not essential. Before the system would run out of memory, these objects are included in the reclamation scope for a second collection; if there is still not enough memory after that, an OutOfMemoryError is thrown.
- (3) An object reachable only through weak references survives only until the next garbage collection, regardless of whether memory is sufficient.
- (4) A phantom reference is the weakest kind of reference. It does not affect the object's lifetime, and the object cannot be obtained through it; the only purpose of setting a phantom reference on an object is to receive a system notification when the object is reclaimed.
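The four reference strengths correspond to classes in java.lang.ref. The sketch below creates one of each; it is a demonstration rather than a reliable test of GC behaviour, because System.gc() is only a request and whether the weak and phantom referents are actually collected depends on the JVM and collector in use. Only the phantom result of get() is guaranteed (always null).

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    public static void main(String[] args) throws InterruptedException {
        // (1) Strong: the object stays alive as long as `strong` is reachable.
        Object strong = new Object();

        // (2) Soft: cleared only when the JVM is about to run out of memory.
        SoftReference<byte[]> soft = new SoftReference<>(new byte[1024 * 1024]);

        // (3) Weak: the referent is only weakly reachable, so the next GC may clear it.
        WeakReference<Object> weak = new WeakReference<>(new Object());

        // (4) Phantom: get() always returns null; the reference is enqueued after the referent is collected.
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(new Object(), queue);

        System.gc(); // a request, not a command: the JVM may ignore it

        System.out.println("strong alive:     " + (strong != null));
        System.out.println("soft still set:   " + (soft.get() != null)); // typically true while memory is plentiful
        System.out.println("weak cleared:     " + (weak.get() == null)); // typically true after a GC, not guaranteed
        System.out.println("phantom.get():    " + phantom.get());        // always null
        Reference<?> notified = queue.remove(1000);                      // wait up to 1 s for the notification
        System.out.println("phantom enqueued: " + (notified == phantom));
    }
}
```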
2. Reclaiming the method area
- The Java Virtual Machine specification does not require garbage collection in the method area (the permanent generation in HotSpot). In the young generation, a single garbage collection can typically reclaim 70% or more of the space, whereas collection in the method area is far less efficient. It mainly reclaims two things: obsolete constants and useless classes.
- Obsolete constants: suppose the string "abc" is in the constant pool, but no String object in the system refers to "abc". Then "abc" is an obsolete constant and can be removed from the constant pool.
- Useless classes: a class is considered useless only if all three conditions hold: (1) all instances of the class have been reclaimed; (2) the ClassLoader that loaded the class has been reclaimed; (3) the class's java.lang.Class object is not referenced anywhere, so the class cannot be accessed through reflection. HotSpot provides the -Xnoclassgc parameter to control whether useless classes are reclaimed.
Garbage collection algorithms
- Classic collection algorithms: mark-sweep, mark-compact, copying, and the "generational collection" used by mainstream virtual machines.
1. Mark-sweep algorithm
- Mark-sweep is the most basic collection algorithm and runs in two phases: 1. mark all objects that need to be reclaimed; 2. once marking is complete, reclaim all marked objects in one pass.
- Shortcomings of mark-sweep: 1. neither marking nor sweeping is very efficient. 2. Sweeping leaves a large number of non-contiguous fragments. If no contiguous free block is large enough for an object that must be allocated later, another collection is triggered ahead of time, which lowers memory efficiency.
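The fragmentation problem can be seen in a small simulation. In this hypothetical model the heap is an array of slots, each slot holding the name of the object occupying it (null when free), and the set of live objects is assumed to come from a mark phase such as reachability analysis. Only the sweep phase is simulated.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical heap model: slot i holds the name of the object occupying it, or null if free.
class MarkSweepDemo {
    /** Sweep phase: frees every slot whose object was not marked live; live objects stay where they are. */
    static String[] sweep(String[] heap, Set<String> marked) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++)
            if (result[i] != null && !marked.contains(result[i])) result[i] = null;
        return result;
    }

    /** Length of the longest run of free slots: the largest object that can still be allocated. */
    static int largestFreeBlock(String[] heap) {
        int best = 0, run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    static int freeSlots(String[] heap) {
        return (int) Arrays.stream(heap).filter(s -> s == null).count();
    }

    public static void main(String[] args) {
        // Objects A..F take 2 slots each; the mark phase is assumed to have found A, C and E live.
        String[] heap = {"A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"};
        String[] swept = sweep(heap, Set.of("A", "C", "E"));
        System.out.println(Arrays.toString(swept)); // [A, A, null, null, C, C, null, null, E, E, null, null]
        System.out.println("free slots: " + freeSlots(swept) + ", largest free block: " + largestFreeBlock(swept));
    }
}
```

Half of the heap is free after the sweep, yet the largest contiguous block is only 2 slots, so a request for 3 contiguous slots would fail and trigger another collection.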
2. Copying algorithm
- The copying algorithm divides memory into two equal halves, A and B, and uses only one of them at a time, say A. When A is full, the surviving objects are copied to B and A is cleared in one pass. Pros: the implementation is simple and efficient, and it leaves none of the memory fragmentation that mark-sweep does. Cons: only half of the memory is usable at any time, which is too costly.
- Most mainstream commercial virtual machines use this algorithm to collect the young generation. About 98% of young-generation objects die quickly, so the young generation does not need to be split 1:1. HotSpot divides it as Eden : From Survivor : To Survivor = 8 : 1 : 1, so 90% of the young generation is usable and only 10% is held in reserve. When the Survivor space is not large enough to hold the survivors, the old generation provides an allocation guarantee.
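A semispace copy can be sketched with Cheney's breadth-first algorithm, one common way to implement the copying approach (it is named here as an example; the text does not say which variant HotSpot uses). Objects are hypothetical Obj instances whose fields hold indices into the same space, and a forwarding array records where each object was copied, so shared and cyclic references are copied only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy semispace copying collector in the style of Cheney's algorithm (illustrative model, not HotSpot code).
class CopyingDemo {
    /** Hypothetical object: its fields hold indices of other objects in the same space. */
    static final class Obj {
        final String name;
        final int[] fields;
        Obj(String name, int... fields) { this.name = name; this.fields = fields; }
    }

    /** Copies everything reachable from `roots` out of fromSpace; roots are updated to to-space indices. */
    static List<Obj> collect(List<Obj> fromSpace, int[] roots) {
        int[] forward = new int[fromSpace.size()];   // forwarding address per from-space object, -1 = not copied
        Arrays.fill(forward, -1);
        List<Obj> toSpace = new ArrayList<>();
        for (int i = 0; i < roots.length; i++) roots[i] = copy(fromSpace, toSpace, forward, roots[i]);
        // Cheney scan: objects at index >= scan still hold from-space indices in their fields.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            int[] fields = toSpace.get(scan).fields;
            for (int f = 0; f < fields.length; f++) fields[f] = copy(fromSpace, toSpace, forward, fields[f]);
        }
        return toSpace; // survivors are contiguous; all of from-space can now be cleared at once
    }

    private static int copy(List<Obj> from, List<Obj> to, int[] forward, int index) {
        if (forward[index] == -1) {                  // first visit: copy and leave a forwarding address
            Obj o = from.get(index);
            forward[index] = to.size();
            to.add(new Obj(o.name, o.fields.clone()));
        }
        return forward[index];
    }

    public static void main(String[] args) {
        // from-space: 0:A -> 2, 1:B (garbage), 2:C -> 0, 3:D (garbage); the only root is A.
        List<Obj> from = List.of(new Obj("A", 2), new Obj("B"), new Obj("C", 0), new Obj("D"));
        for (Obj o : collect(from, new int[]{0}))
            System.out.println(o.name + " -> " + Arrays.toString(o.fields)); // A -> [1], then C -> [0]
    }
}
```

Only the live objects A and C are copied, in order, to the start of to-space, and their fields are rewritten to the new addresses; B and D disappear with the rest of from-space. With HotSpot's 8:1:1 split, Eden plus one Survivor plays the role of from-space, so 90% of the young generation is usable instead of 50%.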
3. Mark-compact algorithm
- When the object survival rate is high, the copying algorithm has to perform many copies, which hurts performance. In the old generation objects tend to be relatively large and their survival rate is high, so this algorithm generally cannot be used there.
- The mark-compact algorithm was proposed for these characteristics of the old generation (high survival rate, large objects). The marking phase is the same as in mark-sweep, but instead of sweeping dead objects in place, it moves all surviving objects toward one end of memory and then clears everything beyond that boundary.
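The compaction step can be sketched on a hypothetical heap modeled as an array of slots, each holding the name of the object occupying it (null if free), with the live set assumed to come from the mark phase. The sketch first computes a forwarding address (new start slot) for each live object and then slides the data toward slot 0. A real collector would also rewrite every reference, in the roots and in object fields, from old to new addresses between those two passes; this toy model omits that step because its objects hold no references.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical heap model: slot i holds the name of the object occupying it, or null if free.
class MarkCompactDemo {
    /** Pass 1: compute each live object's new start slot (its forwarding address), sliding toward slot 0. */
    static Map<String, Integer> forwardingAddresses(String[] heap, Set<String> marked) {
        Map<String, Integer> forward = new LinkedHashMap<>();
        int free = 0;
        for (String slot : heap) {
            if (slot == null || !marked.contains(slot)) continue;
            forward.putIfAbsent(slot, free); // the object's first slot decides where it starts
            free++;
        }
        return forward;
    }

    /** Pass 2 (moving): copy live slots down in address order; everything after them is one free block. */
    static String[] compact(String[] heap, Set<String> marked) {
        String[] result = new String[heap.length];
        int free = 0;
        for (String slot : heap)
            if (slot != null && marked.contains(slot)) result[free++] = slot;
        return result;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"};
        Set<String> live = Set.of("A", "C", "E");
        System.out.println(forwardingAddresses(heap, live));     // {A=0, C=2, E=4}
        System.out.println(Arrays.toString(compact(heap, live))); // [A, A, C, C, E, E, null, null, null, null, null, null]
    }
}
```

The three live objects end up in slots 0 to 5, and the remaining 6 slots form one contiguous free block.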
4. Generational collection algorithm
- Virtual machine memory is mainly divided into the young generation and the old generation. The young generation generally uses the copying algorithm, with the old generation providing the allocation guarantee; the old generation is collected with mark-compact (or mark-sweep).
HotSpot algorithm implementation
- Enumerating GC Roots: the collector finds reference chains starting from the GC Roots, which are mainly constants, static variables, and data in the local variable tables of stack frames. In applications with many classes and methods, checking every reference one by one to find the reference chains takes a lot of time.
- Reachability analysis is also time-sensitive because of GC pauses: the analysis must run on a consistent snapshot, so the system has to stop while it runs, and object reference relationships must not keep changing during the analysis. This is what is called "Stop The World".
- Mainstream virtual machines use exact GC. HotSpot achieves this with a set of data structures called OopMap: once class loading completes, it records which offsets within an object hold references, and during JIT (just-in-time) compilation it records, at specific locations, which positions in the stack and registers hold references. The GC can then read this information directly while scanning.
- Safepoints: with OopMap, HotSpot can enumerate GC Roots quickly and accurately. However, many instructions change the contents of an OopMap, and generating an OopMap for every instruction would consume a lot of extra space. HotSpot therefore records this information only at specific locations called safepoints (SafePoint), and threads can only be paused for GC once they reach one. Safepoints are placed at instructions such as method calls, loop jumps, and exception jumps.
- Safe regions: a safe region is a piece of code in which reference relationships do not change, so a GC may start at any point within it. When a thread enters a safe region, it first marks itself as being in the safe region; if the JVM starts a GC during that time, it does not need to wait for threads in that state. When the thread is about to leave the safe region, it checks whether the system has completed GC Roots enumeration (or the whole GC). If so, it continues; otherwise it must wait until it receives a signal that it can safely leave.
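How threads are brought to a safepoint can be modeled with plain Java threads. This is a simplified simulation, not HotSpot's mechanism: a hypothetical volatile flag stands for a pending GC, each worker polls it at its loop back-edge (one of the places where safepoints are placed), and the collector waits on a CountDownLatch until every worker has stopped before it would enumerate GC Roots. Safe regions are not modeled.

```java
import java.util.concurrent.CountDownLatch;

// Simplified model of safepoint polling; the flag and thread structure are hypothetical.
class SafepointDemo {
    // Stands in for "a GC wants all threads at a safepoint".
    static volatile boolean safepointRequested = false;

    /** Starts n worker threads, brings them all to a safepoint, and returns how many stopped there. */
    static int stopTheWorld(int n) throws InterruptedException {
        CountDownLatch atSafepoint = new CountDownLatch(n);
        CountDownLatch gcDone = new CountDownLatch(1);
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                long counter = 0;
                while (true) {
                    counter++;                       // "user code" between safepoints
                    if (safepointRequested) {        // poll at the loop back-edge
                        atSafepoint.countDown();     // this thread is now stopped at a safepoint
                        try {
                            gcDone.await();          // stay parked until the collector finishes
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        return;                      // end the demo thread after the pause
                    }
                }
            });
            workers[i].start();
        }
        safepointRequested = true;
        atSafepoint.await();                         // collector waits until every thread has stopped
        int stopped = n - (int) atSafepoint.getCount();
        // ... GC Roots enumeration would happen here, while no thread can change a reference ...
        safepointRequested = false;
        gcDone.countDown();                          // release the world
        for (Thread t : workers) t.join();
        return stopped;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads stopped at safepoint: " + stopTheWorld(4));
    }
}
```

A thread in a tight loop without such a poll would never reach a safepoint, which is why the polls sit at loop jumps and method calls rather than at arbitrary instructions.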
HotSpot's main garbage collectors
Collectors can be divided by generation into young-generation collectors and old-generation collectors. 1. Young-generation collectors mainly include Serial (serial), ParNew (parallel), and Parallel Scavenge (parallel). 2. Old-generation collectors mainly include Serial Old, Parallel Old, and CMS (Concurrent Mark Sweep).
- Serial is the oldest collector: a single-threaded young-generation collector that must suspend all other worker threads while it collects, until collection ends. On a single CPU it is about as fast and efficient as any collector. Serial is still a good choice for virtual machines running in client mode: collecting a young generation of one or two hundred megabytes usually takes less than 100 milliseconds, and pauses of that length are perfectly acceptable.
- ParNew is the multithreaded version of Serial and is the preferred young-generation collector for virtual machines running in server mode, mainly because it can work together with CMS, the mainstream old-generation collector. By default it uses as many collection threads as there are CPU cores.
- The Parallel Scavenge collector focuses on throughput, where throughput = time running user code / (time running user code + garbage collection time). Short pauses suit programs that interact with users, because they improve responsiveness; high throughput makes efficient use of CPU time and finishes the computation as quickly as possible, which suits background computation that needs little interaction. For this reason Parallel Scavenge is also called the throughput-first collector.
- Serial Old is the old-generation version of Serial. Like Serial, it is single-threaded; it uses the mark-compact algorithm and is generally used in client mode.
- Parallel Old is the old-generation counterpart of Parallel Scavenge: it is multithreaded and uses the mark-compact algorithm. If the young generation uses Parallel Scavenge, the old generation can only use Parallel Old or Serial Old. In throughput-oriented systems, the combination of Parallel Scavenge and Parallel Old should be considered first.
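The throughput formula can be checked with a little arithmetic. The numbers are hypothetical, chosen for illustration: a VM that spent 99 minutes running user code and 1 minute collecting, and a 100-second window in which the collector either pauses 10 times for 100 ms or 20 times for 70 ms. The second case gives shorter individual pauses but lower throughput.

```java
class ThroughputDemo {
    /** throughput = time running user code / (time running user code + GC time) */
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    /** Throughput over a fixed wall-clock window containing `pauses` stop-the-world pauses of `pauseMillis` each. */
    static double throughputForPauses(double windowMillis, int pauses, double pauseMillis) {
        double gc = pauses * pauseMillis;
        return throughput(windowMillis - gc, gc);
    }

    public static void main(String[] args) {
        System.out.println(throughput(99, 1)); // 99 min of user code, 1 min of GC: 0.99
        // Hypothetical tuning: 10 pauses of 100 ms vs 20 shorter pauses of 70 ms in the same 100 s window.
        System.out.printf("%.3f vs %.3f%n",
                throughputForPauses(100_000, 10, 100),
                throughputForPauses(100_000, 20, 70));
    }
}
```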
CMS collector
CMS is currently the mainstream old-generation collector and uses the mark-sweep algorithm. Its process has four steps: 1. initial mark; 2. concurrent mark; 3. remark; 4. concurrent sweep. The initial mark and remark steps still require Stop The World.
- The initial mark only records the objects that GC Roots can reach directly, so it is fast. The concurrent mark phase performs GC Roots tracing. The remark phase corrects the marks of objects whose references changed because user threads kept running during concurrent marking.
- The longest steps, concurrent mark and concurrent sweep, run alongside user threads without Stop The World. The benefits of CMS are therefore concurrent collection and low pauses.
- CMS drawback 1: it is very sensitive to CPU resources, and using it on fewer than 4 cores is not recommended. By default CMS starts (number of CPUs + 3) / 4 collection threads.
- CMS drawback 2: CMS cannot handle floating garbage. User threads keep running after marking and produce new garbage, which can only be collected in the next cycle. Because user threads also need space while CMS runs, CMS starts before the old generation is full; since JDK 1.6 the start threshold has been raised to 92%. If the remaining 8% of the old generation cannot satisfy the program's needs during a concurrent collection, a "Concurrent Mode Failure" occurs and the virtual machine falls back to Serial Old to re-collect the old generation, which causes a long pause. The -XX:CMSInitiatingOccupancyFraction parameter sets this start threshold and can be lowered to a value that suits the actual workload.
- CMS drawback 3: because it uses mark-sweep, it leaves a lot of fragmented space. If no contiguous block large enough for an object can be found, a Full GC is triggered.
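The CMS numbers above can be made concrete. The sketch uses the (CPUs + 3) / 4 formula and the 92% threshold from the text; the 1024 MB old generation is a hypothetical size. It shows why CMS is a poor fit for machines with few cores: with 2 CPUs, one collection thread takes half of the CPU capacity during the concurrent phases, while for the 4, 8 and 16 CPU cases printed the share is 25%.

```java
class CmsArithmetic {
    /** Default number of concurrent CMS threads as given in the text: (CPUs + 3) / 4, integer division. */
    static int cmsThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    /** Share of CPU capacity taken from user threads during the concurrent phases. */
    static double cpuShare(int cpus) {
        return (double) cmsThreads(cpus) / cpus;
    }

    /** Old-generation occupancy (in MB) at which a CMS cycle starts for a given initiating percentage. */
    static long startOccupancyMb(long oldGenMb, int initiatingPercent) {
        return oldGenMb * initiatingPercent / 100;
    }

    public static void main(String[] args) {
        for (int cpus : new int[]{2, 4, 8, 16})
            System.out.printf("%2d CPUs -> %d CMS threads (%.1f%% of CPU)%n", cpus, cmsThreads(cpus), 100 * cpuShare(cpus));
        // Hypothetical 1024 MB old generation with the 92% threshold mentioned above:
        long start = startOccupancyMb(1024, 92);
        System.out.println("CMS starts at " + start + " MB; " + (1024 - start)
                + " MB remain for objects allocated while CMS runs concurrently");
    }
}
```

If the program needs more than those remaining megabytes before the concurrent cycle finishes, the result is the Concurrent Mode Failure described above; lowering the threshold enlarges that margin at the cost of starting CMS more often.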
G1 Collector
The G1 collector is considered a key evolutionary feature of HotSpot in JDK 1.7 and is a garbage collector aimed at server-side applications. G1 has the following characteristics:
- Parallelism and concurrency: G1 takes full advantage of multi-CPU, multi-core hardware to shorten Stop-The-World pauses, and it lets Java threads keep running concurrently during parts of the collection.
- Generational collection: G1 can manage the entire GC heap on its own without being combined with other collectors. It handles newly created objects differently from old objects that have survived multiple GCs, which gives better collection results.
- Space integration: G1 is based on the mark-compact algorithm, so it does not produce large amounts of non-contiguous fragmentation, which is a clear advantage over CMS.
- Predictable pauses: reducing pauses is a concern of almost all current Internet enterprise applications. Besides pursuing low pauses, G1 builds a predictable pause-time model that lets the user specify that, within a time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection. This is nearly a characteristic of real-time garbage collectors.
- Other collectors collect the entire young generation or the entire old generation. G1 divides the whole heap into multiple independent regions (Region) of equal size; the young and old generations are no longer physically separated, and each is a set of Regions that need not be contiguous. G1 keeps a priority list and, within the allowed time, first collects the Regions with the greatest value (the most space reclaimed relative to the time needed), which ensures the highest possible collection efficiency in limited time.
- Remembered Set: each Region has a corresponding Remembered Set. Whenever the program writes a reference-type field, a write barrier checks whether the referenced object lives in a different Region; if so, the reference is recorded in the Remembered Set of the Region that holds the referenced object. During collection the Remembered Set is added to the GC Roots enumeration, which guarantees that the whole heap does not have to be scanned and that no live object is missed.
- Not counting the maintenance of Remembered Sets, G1 runs in the following steps: 1. initial mark; 2. concurrent mark; 3. final mark; 4. live data counting and evacuation (screening collection).
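The Remembered Set bookkeeping can be sketched as follows. The region size, the use of numeric addresses, and recording field addresses directly are simplifying assumptions; real G1 records cards rather than individual field addresses. The point is the rule itself: only cross-region reference stores are recorded, in the set of the Region being pointed into, and those entries become extra roots when that Region is collected.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy G1-style heap: equal-size regions, each with a remembered set (simplified model, not HotSpot's data structures).
class RememberedSetDemo {
    static final int REGION_SIZE = 1024; // hypothetical region size in "address units"

    /** Region index -> addresses in OTHER regions whose fields point into this region. */
    final Map<Integer, Set<Long>> rememberedSets = new HashMap<>();

    static int regionOf(long address) {
        return (int) (address / REGION_SIZE);
    }

    /** Write barrier run on every reference store: the field at fieldAddress now points to targetAddress. */
    void writeReference(long fieldAddress, long targetAddress) {
        int from = regionOf(fieldAddress), to = regionOf(targetAddress);
        if (from != to) // same-region references need no bookkeeping
            rememberedSets.computeIfAbsent(to, r -> new HashSet<>()).add(fieldAddress);
    }

    /** Extra roots for collecting one region: its remembered set, instead of a scan of the whole heap. */
    Set<Long> extraRootsFor(int region) {
        return rememberedSets.getOrDefault(region, Set.of());
    }

    public static void main(String[] args) {
        RememberedSetDemo heap = new RememberedSetDemo();
        heap.writeReference(100, 200);   // region 0 -> region 0: not recorded
        heap.writeReference(3000, 150);  // region 2 -> region 0: recorded in region 0's set
        heap.writeReference(5000, 1500); // region 4 -> region 1: recorded in region 1's set
        System.out.println("extra roots for region 0: " + heap.extraRootsFor(0)); // [3000]
        System.out.println("extra roots for region 1: " + heap.extraRootsFor(1)); // [5000]
    }
}
```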
Memory allocation policy
- Objects are allocated in Eden first.
- Large objects go directly into the old generation: code should avoid short-lived large objects. -XX:PretenureSizeThreshold=3145728 means that objects larger than 3 MB are allocated directly in the old generation.
- Long-lived objects enter the old generation: the virtual machine gives each object an age counter, and each time the object survives a Minor GC its age increases by 1. When the age reaches a threshold (15 by default), the object is promoted to the old generation. With -XX:MaxTenuringThreshold=1, an object that has survived one Minor GC is promoted at the next one.
- Dynamic object age determination: if the total size of all objects of the same age in a Survivor space exceeds half of that Survivor space, objects of that age or older go directly into the old generation without waiting for the MaxTenuringThreshold age.
- Space allocation guarantee: if the young generation does not have enough space for a new object, memory is allocated in the old generation instead. This requires the old generation to have enough contiguous space for the object; otherwise a Full GC may be triggered.
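The allocation and promotion rules above can be written as small decision functions. This is a hypothetical model that follows the rules as stated in this note (sizes in bytes, ages as integers); HotSpot's own bookkeeping differs in detail. The 3145728-byte threshold mirrors the -XX:PretenureSizeThreshold example and 15 mirrors the default age threshold, while the 1 MB Survivor size and the age distribution are made up.

```java
import java.util.Map;

// Hypothetical model of the allocation and promotion rules described in this note.
class AllocationPolicyDemo {
    /** Large objects bypass Eden, as with -XX:PretenureSizeThreshold (0 = rule disabled). */
    static String allocationSpace(long objectBytes, long pretenureThreshold) {
        return (pretenureThreshold > 0 && objectBytes > pretenureThreshold) ? "old" : "eden";
    }

    /**
     * Age at or above which survivors are promoted at the next Minor GC: the fixed threshold, lowered by
     * the dynamic rule when objects of a single age take more than half of the Survivor space.
     */
    static int promotionAge(Map<Integer, Long> bytesByAge, long survivorBytes, int maxTenuringThreshold) {
        int age = maxTenuringThreshold;
        for (Map.Entry<Integer, Long> e : bytesByAge.entrySet())
            if (e.getValue() > survivorBytes / 2) age = Math.min(age, e.getKey()); // dynamic age rule
        return age;
    }

    public static void main(String[] args) {
        long threshold = 3145728;                                  // -XX:PretenureSizeThreshold=3145728
        System.out.println(allocationSpace(4L * 1024 * 1024, threshold)); // old
        System.out.println(allocationSpace(64L * 1024, threshold));       // eden

        long survivor = 1024 * 1024;                               // hypothetical 1 MB Survivor space
        Map<Integer, Long> ages = Map.of(1, 100_000L, 2, 200_000L, 3, 600_000L);
        System.out.println(promotionAge(ages, survivor, 15));                // 3: age-3 objects fill over half
        System.out.println(promotionAge(Map.of(1, 100_000L), survivor, 15)); // 15: fixed threshold applies
    }
}
```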
Summary
This note mainly introduces several garbage collection algorithms, the major HotSpot collectors, and the main memory allocation policies. Choosing, for the scenario at hand, whether to prioritize pause time or throughput, and whether to use a serial or a parallel collector, is an important part of tuning. Understanding memory allocation also helps when writing Java code.