About GC, a copy of the picture about Spring Festival
Before introducing GC, we first need to understand how the JVM divides memory; that makes GC and the various collectors easier to understand later.
The figure below is "borrowed" from someone else. It is a classic picture of the JVM architecture. We only need to pay attention to the largest part: the runtime data area.
The runtime data area, as its name implies, is the memory structure of the JVM at run time. It has five main parts.
1. Method Area
The method area is a memory area shared by all threads. When the virtual machine loads a class file, it parses type information from the binary data and stores it in the method area; class static variables are stored here as well. The virtual machine specification describes this region as logically part of the heap, but it also goes by the alias Non-Heap, which is clearly meant to distinguish it from the heap. When discussing GC we are used to calling this region the permanent generation. Strictly speaking, the two are not the same concept: for HotSpot, the permanent generation is only one way of implementing the method area, and HotSpot plans to remove the permanent generation (JDK 8 did so, replacing it with Metaspace). If memory in this region runs out, an OutOfMemoryError (OOM) is thrown.
2. Heap
Like the method area, the heap is a memory area shared by all threads. Almost all object instances, including arrays, are allocated here; for example, when we call new Object(), the memory comes from the heap. The Java heap is the largest memory area managed by the JVM. Note that, like the method area, the heap is not required by the JVM specification to be contiguous, and the JVM can expand and shrink it dynamically at run time. To make GC work better, modern JVMs subdivide the heap into different regions. The figure below comes from the official Oracle website and shows the heap layout in detail.
The figure is rather large, and horizontal at that. Let's take a look. The whole heap is divided into three areas: the Young area, the Tenured area (that is, the Old area), and the Perm area; traditionally we call them the young generation, the old generation, and the permanent generation (GC in fact collects according to these three generations). Careful readers may notice that each area has a "virtual" part, so it is worth explaining what that is. We know the heap can grow at run time: when configuring virtual machine parameters you usually specify -Xmx and -Xms, the maximum and initial heap sizes. The virtual part is the reserved memory, whose size is the maximum heap minus the initial heap. The JVM reserves memory of the -Xmx size from the operating system at startup, but it may not need that much space at first, so it marks part of it as the virtual area to be used for later expansion. Of the three areas, the Young area is a little more complex. It contains three spaces: one Eden and two Survivor spaces (a "to" Survivor and a "from" Survivor). The names are quite interesting: one is the Garden of Eden and the others are where survivors live. What each space actually stores is explained further below.
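As a rough sketch of the -Xms / -Xmx relationship described above, the standard java.lang.Runtime API reports the maximum heap and the heap currently committed to the JVM; the difference approximates the reserved "virtual" part. The class name HeapSizes and the helper reservedBytes are made up for this illustration, and how closely the numbers map onto the regions in the figure depends on the JVM and collector in use.

```java
class HeapSizes {
    /** Heap the JVM may still grow into: maximum minus currently committed. */
    static long reservedBytes(long maxBytes, long committedBytes) {
        return Math.max(0L, maxBytes - committedBytes);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // upper bound, roughly what -Xmx sets
        long committed = rt.totalMemory(); // heap currently committed to the JVM
        long free = rt.freeMemory();       // unused part of the committed heap

        long mb = 1024L * 1024L;
        System.out.println("max       = " + max / mb + " MB");
        System.out.println("committed = " + committed / mb + " MB");
        System.out.println("used      = " + (committed - free) / mb + " MB");
        System.out.println("reserved  = " + reservedBytes(max, committed) / mb + " MB");
    }
}
```

Running it with, say, -Xms64m -Xmx512m and then with equal -Xms and -Xmx values should show the reserved figure shrinking toward zero.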
3. Virtual Machine Stack
The virtual machine stack is private to each thread, so values in this region need no concurrency protection. The JVM allocates one stack per Java thread, and the stack memory is released when the thread is destroyed. The JVM creates a stack frame for each method invocation to hold information such as the local variable table and the operand stack (these concepts were also mentioned in my previous blog post). Primitive values and object references can be stored on the stack. Both StackOverflowError and OOM may be thrown in this region.
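To see the stack-frame behaviour for yourself, here is a minimal sketch that recurses until the thread's stack is exhausted and reports how many frames fit. The class name StackDepthProbe is hypothetical, and the depth reached varies with the JVM, the frame size and the thread stack size (the -Xss option), so no particular number should be expected.

```java
class StackDepthProbe {
    private static int depth = 0;

    private static void recurse() {
        depth++;
        recurse(); // every call pushes one more stack frame
    }

    /** Recurses until the stack overflows and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has unwound by the time we get here, so the thread can go on.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measureDepth());
    }
}
```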
4. Native Method Stack
The native method stack is similar to the virtual machine stack, except that the virtual machine stack serves Java methods while the native method stack serves native methods.
5. Program Counter
The program counter is a very small memory area. It is the line number indicator for the bytecode executed by the current thread and always points to the next instruction to be executed. It is the only memory area in which OOM does not occur.
For the three areas above, the VM stack, the native method stack and the program counter, memory allocation is essentially known at compile time, and the memory is freed when the method or the thread ends, so this memory needs no garbage collection. By contrast, allocation in the method area and the heap is unpredictable, and memory there is allocated and reclaimed dynamically. The JVM therefore performs GC on these two areas.
Now that we have covered how the heap is divided into generations, let's see why it is divided that way and which objects each generation stores.
One obvious reason for the generational division is to make garbage collection easier. Garbage collection only targets unreferenced, orphaned objects, and research shows that most objects in Java are short-lived while some survive for a long time. To collect these different kinds of objects efficiently, the heap is divided into generations that store them separately. In other words, the objects in the Young area are relatively "young", and the Old area can be understood the same way. That is where the names Young and Old come from.
GC is not unique to Java; in fact, GC has a longer history than Java. We will not go into the basic principles of GC here, such as reference counting and reachability analysis. There are two main kinds of GC: minor GC and major GC, also known as young GC and old GC. Minor GC happens in the Young area and is usually very short; major GC happens in the Old area and takes much longer. What we need to control is the number of major GCs and the time spent in GC.
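The split between heap and non-heap memory can be inspected at run time. The sketch below uses the standard java.lang.management API to list the JVM's memory pools by type; the class name MemoryPools is hypothetical, and the pool names printed (and whether a permanent generation appears at all) depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    /** Names of the memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```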
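One way to keep an eye on GC counts and GC time from inside a program is the standard GarbageCollectorMXBean interface. The following is only a sketch: the class name GcStats is made up, the collector names printed depend on which collectors the JVM is running, and which bean corresponds to "minor" or "major" collections has to be read off those names.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    /** Sum of collection counts over all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionCount());
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        System.out.println("total collections so far: " + totalCollections());
    }
}
```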
Based on how long an object has been alive, the JVM gives each object something like a human "age": each time an object survives a GC, its age increases by 1. In most cases objects are first allocated in the Eden space, which is where the name Eden comes from. When Eden does not have enough memory for an allocation, a minor GC is triggered, and the age of each surviving object increases by 1. Once an object reaches a certain age it is considered relatively stable and is moved to the old generation (the parameter -XX:MaxTenuringThreshold controls how many minor GCs an object goes through before entering the Old area; the default is 15). If the Survivor space cannot hold the surviving objects during a minor GC, they are placed directly in the old generation. Note that large objects (such as big arrays and long strings) are also allocated directly in the old generation, so we should try to avoid large objects, especially short-lived ones, because they can easily exhaust the Old area and trigger a full GC early. When the old generation runs out of memory, a major GC is triggered. A minor GC is generally very short and its impact on the program can be ignored, but a major GC takes much longer, so we should try to avoid major GCs (when a GC occurs, all application threads are suspended, officially called "stop the world"; the time referred to here is the length of that pause).
Let's look at the main GC algorithms. They coexist because different algorithms suit different regions.
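The aging rule can be pictured with a toy model. The sketch below is not how HotSpot is implemented: the class TenuringModel and its fields are invented for illustration, every tracked object is assumed to survive each minor GC, and an object is promoted as soon as its age reaches the threshold (HotSpot additionally adjusts the effective threshold dynamically).

```java
import java.util.ArrayList;
import java.util.List;

class TenuringModel {
    final int maxTenuringThreshold;                   // plays the role of -XX:MaxTenuringThreshold
    final List<Integer> youngAges = new ArrayList<>(); // ages of objects still in the young generation
    int promoted = 0;                                  // objects moved to the old generation so far

    TenuringModel(int maxTenuringThreshold) {
        this.maxTenuringThreshold = maxTenuringThreshold;
    }

    /** A new object starts in Eden with age 0. */
    void allocate() {
        youngAges.add(0);
    }

    /** One minor GC in which every tracked object survives and ages by 1. */
    void minorGc() {
        List<Integer> stillYoung = new ArrayList<>();
        for (int age : youngAges) {
            int newAge = age + 1;
            if (newAge >= maxTenuringThreshold) {
                promoted++;             // old enough: goes to the old generation
            } else {
                stillYoung.add(newAge); // stays in a Survivor space
            }
        }
        youngAges.clear();
        youngAges.addAll(stillYoung);
    }

    public static void main(String[] args) {
        TenuringModel model = new TenuringModel(3);
        model.allocate();
        for (int gc = 1; gc <= 3; gc++) {
            model.minorGc();
            System.out.println("after minor GC " + gc + ": young=" + model.youngAges
                    + ", promoted=" + model.promoted);
        }
    }
}
```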
1. Mark-Sweep
As the name suggests, this algorithm has two steps: first mark the objects to be reclaimed, then sweep them away. It has two main disadvantages. One is efficiency: neither marking nor sweeping is very efficient. The other is that sweeping leaves many non-contiguous memory fragments, so when a large object has to be allocated later and no contiguous block is big enough, another garbage collection is triggered early.
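The two steps can be sketched on a toy object graph. This is an illustration only, with invented names (MarkSweepDemo, Node, collect), and it follows the usual practical formulation: the mark step flags everything reachable from the roots, and the sweep step then discards whatever was left unmarked.

```java
import java.util.ArrayList;
import java.util.List;

class MarkSweepDemo {
    static class Node {
        final List<Node> refs = new ArrayList<>(); // outgoing references
        boolean marked;
    }

    /** Mark phase: flag everything reachable from the given object. */
    static void mark(Node node) {
        if (node == null || node.marked) {
            return; // already visited, which also stops cycles
        }
        node.marked = true;
        for (Node ref : node.refs) {
            mark(ref);
        }
    }

    /** Sweep phase: drop unmarked objects, reset marks, return how many were freed. */
    static int sweep(List<Node> heap) {
        int before = heap.size();
        heap.removeIf(n -> !n.marked);
        for (Node n : heap) {
            n.marked = false;
        }
        return before - heap.size();
    }

    static int collect(List<Node> heap, List<Node> roots) {
        for (Node root : roots) {
            mark(root);
        }
        return sweep(heap);
    }

    /** Builds a -> b with c unreachable, then collects. */
    static int demo() {
        Node a = new Node();
        Node b = new Node();
        Node c = new Node();
        a.refs.add(b);

        List<Node> heap = new ArrayList<>();
        heap.add(a);
        heap.add(b);
        heap.add(c);
        List<Node> roots = new ArrayList<>();
        roots.add(a);
        return collect(heap, roots);
    }

    public static void main(String[] args) {
        System.out.println("objects freed: " + demo());
    }
}
```

Note that the toy "heap" here is just a list, so it cannot show fragmentation; in a real heap the freed objects leave holes wherever they happened to sit.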
2. Copying
The copying algorithm divides memory into two equal halves and uses only one at a time. When that half is used up, all surviving objects are copied to the other half and the first half is cleared in one go. The benefit is obvious: no memory fragments are produced. But being able to use only half of the memory is costly, and when many objects survive, efficiency is bound to be low. In fact the vast majority of Java objects are short-lived, so memory does not need to be split 1:1. By default HotSpot sizes Eden and the two Survivor spaces so that Eden:Survivor = 8:1 (the ratio can be set through -XX:SurvivorRatio; a value of 8 means Eden takes 8/10 of the Young area and the two Survivor spaces take 1/10 each). This algorithm is well suited to minor GC.
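The ratio arithmetic can be written out as a small sketch. It assumes the simple model in which the Young area is split into one Eden and two equal Survivor spaces with Eden:Survivor = SurvivorRatio:1; the class and method names are hypothetical, and a real JVM also applies alignment and adaptive sizing, so actual sizes may differ slightly.

```java
class YoungGenLayout {
    /** Eden size when Eden:Survivor = survivorRatio:1 with two Survivor spaces. */
    static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes * survivorRatio / (survivorRatio + 2);
    }

    /** Size of ONE Survivor space under the same assumption. */
    static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long young = 100L * 1024 * 1024; // a 100 MB young generation, chosen for illustration
        int ratio = 8;                   // the value of -XX:SurvivorRatio discussed above
        long eden = edenBytes(young, ratio);
        long survivor = survivorBytes(young, ratio);
        System.out.println("eden          = " + eden / (1024 * 1024) + " MB");
        System.out.println("each survivor = " + survivor / (1024 * 1024) + " MB");
        // Only Eden plus one Survivor space is in use at any time.
        System.out.println("usable        = " + (eden + survivor) / (1024 * 1024) + " MB");
    }
}
```

With a ratio of 8, only one tenth of the Young area sits idle at any time, compared with one half under a plain 1:1 split.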
3. Mark-Compact
Because the copying algorithm is inefficient when many objects survive, it is not suitable for the old generation, while mark-sweep suffers from fragmentation. This is where the mark-compact algorithm comes in. Its first step is marking, just as in mark-sweep, but the second step moves the surviving objects toward one end and then frees all memory beyond the boundary. It produces no fragments and is not too inefficient.
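The compaction step can be sketched on an array that stands in for the heap. This is a toy model with invented names (CompactDemo, compact): marking is assumed to have happened already and is passed in as a boolean array, and a real collector would also have to update every reference to a moved object.

```java
import java.util.Arrays;

class CompactDemo {
    /**
     * Slides the live cells to the start of the heap and clears everything after them.
     * Returns the index of the first free cell (the boundary).
     */
    static int compact(String[] heap, boolean[] live) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[free] = heap[i]; // move the survivor toward one end
                free++;
            }
        }
        Arrays.fill(heap, free, heap.length, null); // free everything beyond the boundary
        return free;
    }

    public static void main(String[] args) {
        String[] heap = {"A", "B", "C", "D", "E"};
        boolean[] live = {true, false, true, false, true}; // result of the mark phase
        int boundary = compact(heap, live);
        System.out.println(Arrays.toString(heap) + ", free space starts at index " + boundary);
    }
}
```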
For the above algorithms, HotSpot provides the following implementations:
In the figure, collectors on a yellow background work on the young generation and those on a light gray background work on the old generation; the blue background marks the garbage collectors, and a line joining two collectors means they can be used together. The six collectors are:
1. "Serial": a single-threaded collector based on the copying algorithm; it causes stop the world.
2. "ParNew": the multi-threaded version of Serial. Unlike "Parallel Scavenge", ParNew can be paired with CMS.
3. "Parallel Scavenge": a multi-threaded collector based on the copying algorithm; it causes stop the world.
4. "Serial Old": a single-threaded collector based on mark-sweep-compact; it causes stop the world.
5. "CMS": a concurrent, low-pause collector. Some of its steps still cause stop the world (STW), as explained in detail later.
6. "Parallel Old": a multi-threaded collector based on mark-compact, the old-generation counterpart of Parallel Scavenge.
Of these six, CMS is the most complex and is explained in detail later.
ParNew is a multi-threaded parallel collector, while CMS is a concurrent collector. The distinction matters here: parallel means several garbage collection threads collect together while the application threads are stopped, whereas concurrent means the collection threads run at the same time as the application threads, so collection does not interrupt the application (this is only relative: the actual CMS process has several steps, and some of them do stop the world).
Since ParNew and Parallel Scavenge are both parallel collectors for the young generation, what is the difference between them?
Collectors such as CMS and ParNew focus on shortening the application pauses caused by collection, while Parallel Scavenge focuses on application throughput. Throughput is the ratio of CPU time spent running application code to total CPU time consumed. For example, if the virtual machine runs for 100 minutes in total and garbage collection takes 1 minute, throughput is 99/100 = 99%. Parallel Scavenge also has a special parameter, -XX:+UseAdaptiveSizePolicy; with it the JVM adjusts the size of each heap region automatically at run time, without manual configuration.
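The throughput calculation above is simple enough to write down directly. The sketch below reproduces the 100-minute example and then makes a rough estimate for the running JVM; the class name ThroughputEstimate is hypothetical, and the live estimate uses wall-clock uptime and accumulated collection time from the standard management beans rather than true CPU time, so it should be read as an approximation only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class ThroughputEstimate {
    /** Fraction of the total time that was not spent in garbage collection. */
    static double throughput(double totalTime, double gcTime) {
        return (totalTime - gcTime) / totalTime;
    }

    public static void main(String[] args) {
        // The example from the text: 100 minutes in total, 1 minute of GC.
        System.out.println("example throughput: " + throughput(100, 1));

        // Rough estimate for this JVM (wall-clock based, an approximation).
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        long gcMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMs += Math.max(0L, gc.getCollectionTime()); // -1 means "undefined"
        }
        if (uptimeMs > 0) {
            System.out.println("estimated throughput: " + throughput(uptimeMs, gcMs));
        }
    }
}
```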
Different collectors are selected with -XX VM parameters:
-XX:+UseSerialGC: "Serial" + "Serial Old"
-XX:+UseParNewGC: "ParNew" + "Serial Old"
-XX:+UseConcMarkSweepGC: "ParNew" + "CMS" + "Serial Old". The old generation is mostly collected by "CMS", but when a concurrent mode failure occurs, the JVM falls back to "Serial Old".
-XX:+UseParallelGC: "Parallel Scavenge" + "Serial Old"
-XX:+UseParallelOldGC: "Parallel Scavenge" + "Parallel Old"
The figure above shows the stages of the CMS collector: initial mark, concurrent mark, remark, and concurrent sweep. Stages 1 and 3 must suspend all application threads. The first pause marks the live objects directly reachable from the root objects; this stage is called the initial mark. The second pause comes after the concurrent mark: all application threads are stopped and the objects missed during the concurrent mark are re-marked (they were missed because object state kept changing while the concurrent mark ran). The first pause is short and the second is usually longer; the remark can be carried out by several threads in parallel. One CMS cycle therefore has two STW pauses. So with the CMS collector, the full GC count we see with jstat (there is a saying that the full GC count is really the number of STW pauses) rises by two for each CMS cycle. CMS also has many parameters of its own that deserve attention.
Other GC parameters include the following:
-XX:+PrintGC: output GC logs
-XX:+PrintGCDetails: output detailed GC logs
-XX:+PrintGCTimeStamps: output GC timestamps (relative to JVM start)
-XX:+PrintGCDateStamps: output GC timestamps (as a date, for example 2013-05-04T21:53:59.234+0800)
-XX:+PrintHeapAtGC: print heap information before and after each GC
-Xloggc:../logs/gc.log: output path of the log file
The parameters above mostly concern GC log output; the many other JVM parameters are not described here, and plenty of detail is available online.
Having covered the GC logging parameters, let's analyze a GC log entry.
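To check which of these logging options a running JVM was actually started with, the standard RuntimeMXBean exposes the JVM's input arguments. The following is a minimal sketch: the class name GcLogFlags and the prefix-based filter are choices made for this illustration, and the filter only recognizes the options listed above.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcLogFlags {
    /** Keeps only the arguments that look like the GC logging options listed above. */
    static List<String> gcLogFlags(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:+PrintGC")        // PrintGC, PrintGCDetails, PrintGC*Stamps
                    || arg.equals("-XX:+PrintHeapAtGC")
                    || arg.startsWith("-Xloggc:")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC logging options in effect: " + gcLogFlags(jvmArgs));
    }
}
```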
519.514: [GC 519.514: [ParNew: 5149852K->83183K(5662336K), 0.0831770 secs] 6955196K->1905793K(9856640K), 0.0833560 secs] [Times: user=0.57 sys=0.03, real=0.08 secs]
The leading 519.514 is the number of seconds since the VM started at which the GC occurred. [GC means this is an ordinary GC (there is also [Full GC), and [ParNew means the ParNew collector was used to collect the young generation. 5149852K->83183K(5662336K) gives, in order, the memory used in the region before GC, the memory used in the region after GC, and the total size of the region. 0.0831770 secs is the time the GC took, in seconds. The detailed times that follow, user=0.57 sys=0.03, real=0.08 secs, have the same meaning as the output of the Linux time command.
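The same reading of the log line can be done mechanically. The sketch below pulls the three young-generation numbers out of a ParNew entry with a regular expression; the class name GcLogLine and the pattern are written for this one log format only (they are not a general GC log parser), and the pattern tolerates optional spaces around "K", "->" and the parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLogLine {
    // Matches "[ParNew: <before>K-><after>K(<capacity>K), <seconds> secs]".
    private static final Pattern PAR_NEW = Pattern.compile(
            "\\[ParNew:\\s*(\\d+)\\s*K\\s*->\\s*(\\d+)\\s*K\\s*\\(\\s*(\\d+)\\s*K\\s*\\),"
            + "\\s*[\\d.]+\\s*secs\\]");

    /** Returns {usedBeforeK, usedAfterK, capacityK} for the young generation, or null if absent. */
    static long[] parseYoung(String line) {
        Matcher m = PAR_NEW.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        String line = "519.514: [GC 519.514: [ParNew: 5149852K->83183K(5662336K), 0.0831770 secs] "
                + "6955196K->1905793K(9856640K), 0.0833560 secs] "
                + "[Times: user=0.57 sys=0.03, real=0.08 secs]";
        long[] young = parseYoung(line);
        System.out.println("young generation freed " + (young[0] - young[1])
                + "K out of a capacity of " + young[2] + "K");
    }
}
```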