HBase GC: Past and Present – Origins

Source: Internet
Author: User

NetEase Video Cloud is a cloud-based distributed multimedia processing cluster and professional audio/video technology service built by NetEase. It provides stable, smooth, low-latency, high-concurrency live video streaming, recording, storage, transcoding, and VOD as PaaS services, so that users in online education, telemedicine, entertainment shows, online finance, and other industries can build an online audio/video platform with only simple development. Today, a NetEase Video Cloud technical expert shares a technical article: HBase GC: Past and Present – Origins.

As mentioned in the earlier HBase BlockCache series, the LRUBlockCache caching mechanism can produce excessive memory fragmentation under the CMS GC policy, which may trigger the infamous full GC and its dreaded 'stop-the-world' pause, seriously affecting upper-layer business. The BucketCache mechanism, by contrast, requests fixed-size memory areas at initialization; cache eviction is no longer managed by the JVM, and caching a data block only writes into and overwrites this space, which greatly reduces memory fragmentation and lowers the frequency of full GC. How does the CMS GC policy lead to excessive memory fragmentation? How much fragmentation will trigger a full GC? How has HBase continually optimized CMS GC along the way? This two-part series, 'HBase GC: Past and Present', will unravel these mysteries one by one: this article, 'Origins', walks you through the GC mechanism of HBase, and the next, 'Evolution', shows how HBase has continuously optimized GC over the course of its development.

Java GC Overview

All of HBase runs on the JVM, so understanding HBase's memory management and the impact of different caching mechanisms on GC requires a solid grasp of Java GC. A deep treatment of how Java GC works is beyond the scope of this article; and of course, if you are already familiar with Java GC, you can skip this section.

Java GC is built on an empirical assumption: most memory objects are either short-lived and soon no longer referenced, such as the buffers that handle RPC requests, which may survive only a few microseconds; or long-lived, such as hot data blocks in the BlockCache, which may survive for minutes or even longer. Based on this, the JVM divides heap memory into two parts: the young generation and the tenured (old) generation. In addition, the JVM has a non-heap memory area, the perm area, which mainly stores class information and other metadata. The memory structure is shown below:


Among them, the young generation is further divided into an Eden area and two survivor areas, S0 and S1. When a memory object is created, it first requests a piece of memory in the young generation; if the object stays alive in the young generation long enough, it is migrated to the old generation. For most latency-sensitive workloads such as HBase, the following JVM parameters are recommended: -XX:+UseParNewGC and -XX:+UseConcMarkSweepGC. The former enables a parallel garbage collector for the young generation, while the latter enables the concurrent mark-sweep collector for the old generation. As you can see, the JVM allows different GC policies to be applied to different memory areas.
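As a quick sanity check, the minimal sketch below (not from the original article) prints the arguments a running JVM was started with, which is one way to confirm that a RegionServer process really has -XX:+UseParNewGC and -XX:+UseConcMarkSweepGC in effect:

import java.lang.management.ManagementFactory;

public class GcFlagCheck {
    public static void main(String[] args) {
        // JVM flags this process was started with; with the recommended settings the
        // list should contain -XX:+UseParNewGC and -XX:+UseConcMarkSweepGC
        System.out.println(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}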

Young Generation GC Strategy – Parallel New Collector

As mentioned above, newly created objects are placed in the young generation, more specifically in the Eden area. Once Eden fills up, a GC is performed. The GC algorithm checks the references of all objects; if an object is still referenced, it is alive. After the check, the surviving objects are moved to the S0 area and the entire Eden space is reclaimed; this is called a minor GC. New objects continue to be placed in Eden. At the next GC, S0 and Eden are checked for surviving objects, all of them are moved to S1, and the whole of S0 and Eden is reclaimed. It is easy to see that of the two areas S0 and S1, one is always kept empty for the survivors of the next GC.

The entire process is illustrated below:


This algorithm is called the copying algorithm. Two points are worth noting:

1. The algorithm incurs a 'stop-the-world' pause, but the pause is very short. Because the young generation is usually set relatively small (generally it is not recommended to exceed 512 MB) and the JVM starts a large number of threads to collect it in parallel, a minor GC typically completes within a few milliseconds.

2. No fragmentation occurs: after each GC the surviving objects are placed in contiguous space (S0 or S1).
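To watch this copying behavior in practice, here is a small sketch added for illustration (the "Par Eden Space" / "Par Survivor Space" pool names are an assumption that holds when the ParNew collector is enabled). It allocates short-lived buffers and periodically prints Eden and survivor usage, so Eden can be seen emptying after each minor GC while only a small number of survivors land in the survivor space:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.List;

public class MinorGcDemo {
    public static void main(String[] args) throws InterruptedException {
        List<MemoryPoolMXBean> pools = ManagementFactory.getMemoryPoolMXBeans();
        for (int i = 0; i < 1_000_000; i++) {
            byte[] shortLived = new byte[64 * 1024];   // dies almost immediately
            if (i % 10_000 == 0) {
                for (MemoryPoolMXBean pool : pools) {
                    // With -XX:+UseParNewGC the young pools are usually named
                    // "Par Eden Space" and "Par Survivor Space"
                    if (pool.getName().contains("Eden") || pool.getName().contains("Survivor")) {
                        System.out.printf("%-20s used=%d KB%n",
                                pool.getName(), pool.getUsage().getUsed() / 1024);
                    }
                }
                Thread.sleep(10);
            }
        }
    }
}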

Every object in memory maintains a counter, and each time a minor GC moves the object, its counter is incremented by one. When the counter exceeds a certain threshold, the algorithm considers the object long-lived and moves it into the old generation. The threshold can be specified with the JVM parameter -XX:MaxTenuringThreshold.

Old Generation GC Strategy – Concurrent Mark-Sweep

After each minor GC, some long-lived objects are moved into the old generation, and after a while the old generation space fills up. At that point a GC must be performed on the old generation, which is where the Concurrent Mark-Sweep (CMS) algorithm comes in. The CMS algorithm is divided into six phases, some of which incur a 'stop-the-world' pause while others run concurrently with the application threads:

1. Initial mark: in this phase the virtual machine pauses all running tasks and marks all 'root objects'; a 'root object' generally means an object directly referenced by a running thread. Although the entire JVM is paused, this phase is usually fast because the number of root objects is small.

2. Concurrent mark: the garbage collector starts from the root objects and marks all reachable objects. In this phase the application threads and the marking threads run concurrently, so the user feels no pause.

3. Concurrent preclean: this phase also runs concurrently. The virtual machine looks for objects that newly entered the old generation during the marking phase (some objects may have been promoted from the young generation, and some may have been allocated directly in the old generation).

4. Remark: the objects found in phase 3 are re-marked. This phase pauses the entire JVM, but because phase 3 has already identified the newly entered objects, the process is fast.

5. Concurrent sweep: the previous phases have marked all referenced objects, so this phase reclaims all unmarked objects as garbage. The application threads and the sweeping threads run concurrently.

6. Concurrent reset: resets the CMS collector's data structures and waits for the next garbage collection.

Correspondingly, two points are worth noting about the CMS algorithm:

1. The 'stop-the-world' pauses are short; the time-consuming marking and sweeping are performed concurrently.

2. The CMS algorithm does not compact the surviving objects after marking and sweeping, so the old generation accumulates a lot of memory fragmentation.
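One simple way to see how often these CMS cycles run compared with young-generation collections is to poll the garbage collector MXBeans. The sketch below is illustrative only and assumes ParNew and CMS are the enabled collectors; it prints cumulative collection counts and accumulated collection time for each collector:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcActivityMonitor {
    public static void main(String[] args) throws InterruptedException {
        while (true) {
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                // With ParNew + CMS, getName() typically returns "ParNew" for minor GCs
                // and "ConcurrentMarkSweep" for old-generation cycles.
                System.out.printf("%-20s collections=%d totalTime=%d ms%n",
                        gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
            }
            System.out.println("----");
            Thread.sleep(5_000);   // sample every 5 seconds
        }
    }
}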

CMS Failure Modes

As mentioned above, under normal circumstances the pauses during a CMS cycle are very short, usually around 10 ms to 100 ms. Reality, however, does not always match: on online clusters under heavy read/write pressure, long pauses often occur, some lasting several minutes, causing severe read/write blocking and even session timeouts between the RegionServer and ZooKeeper that take the RegionServer offline abnormally. In fact, CMS is not perfect; it produces severe full GCs in two scenarios, described separately below.

Concurrent Mode Failure

This scenario is relatively simple: suppose the system is performing a CMS collection of the old generation and, during the collection, a batch of objects arrives, but unfortunately the old generation has no room left to accommodate them. In this scenario the CMS collector stops working, the system enters 'stop-the-world' mode, and the collection algorithm degrades into a single-threaded copying algorithm that re-copies the surviving objects of the entire heap into S0 and frees all other space. Obviously, the whole process can take a very 'long' time. This problem is easy to avoid, though: simply make the CMS collector start collecting a little earlier. The JVM provides the parameter -XX:CMSInitiatingOccupancyFraction=N to set when a CMS collection begins, where N is the percentage of old-generation memory currently in use relative to the old generation's total capacity; the default is 68. Lowering this value makes collection start earlier.
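If you want to check what threshold a running process is actually using, the sketch below (illustrative and HotSpot-specific, not from the original article) reads the flag value through the HotSpot diagnostic MXBean:

import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class CmsOccupancyCheck {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Prints the occupancy percentage at which CMS starts an old-generation cycle
        // (on some JDK versions the default prints as -1, meaning the JVM derives the
        // threshold internally rather than using an explicit percentage).
        System.out.println("CMSInitiatingOccupancyFraction = "
                + diag.getVMOption("CMSInitiatingOccupancyFraction").getValue());
    }
}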

Promotion Failure

Suppose -XX:CMSInitiatingOccupancyFraction=60 is set, yet even though used memory has not reached 60% of the old generation's total, there is no space to accommodate the objects being promoted from the young generation. Oh my god! How can that be? The main culprit is memory fragmentation: as mentioned above, the CMS algorithm produces a lot of fragmentation, and when it accumulates to a certain degree the above scenario occurs. In this scenario the CMS collector likewise stops working and enters a long 'stop-the-world' mode. The JVM provides the parameter -XX:+UseCMSCompactAtFullCollection to reduce fragmentation, which compacts the old generation during full collections; clearly, this compaction has a significant performance cost. For a latency-sensitive business like HBase, this is not a perfect solution.

HBase Memory Fragmentation Statistics Experiment

In real online environments, full GCs caused by concurrent mode failure are rare; most full GC scenes are promotion failures. Our online cluster triggers a promotion-failure full GC roughly every half month. To better understand how memory fragmentation triggers promotion failure under the CMS policy, let's do a simple experiment. The JVM provides the parameter -XX:PrintFLSStatistics=1 to print memory fragmentation statistics before and after each GC. The statistics consist of three main dimensions: Free Space, Max Chunk Size, and Num Chunks. Free Space is the total memory currently free in the old generation, Max Chunk Size is the size of the largest contiguous free chunk in the old generation, and Num Chunks is the total number of free chunks in the old generation. We set this parameter on a test cluster (four RegionServers in total), then ran read and write workloads with a YCSB client and tracked, from the logs, how Free Space and Max Chunk Size changed over time.
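A small log-parsing helper along the lines below could be used to extract the two metrics for plotting. It is a sketch added for illustration, not part of the original experiment, and it assumes the GC log contains lines of the form "Total Free Space: N" and "Max   Chunk Size: N", which is roughly how -XX:PrintFLSStatistics=1 output typically appears (units as printed in the log); the log path passed as args[0] is hypothetical:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlsStatsExtractor {
    // Matches FLS statistics lines such as "Total Free Space: 40107649"
    // and "Max   Chunk Size: 38147547" (assumed format, whitespace-tolerant).
    private static final Pattern FREE  = Pattern.compile("Total Free Space:\\s+(\\d+)");
    private static final Pattern CHUNK = Pattern.compile("Max\\s+Chunk Size:\\s+(\\d+)");

    public static void main(String[] args) throws IOException {
        // args[0]: path to the RegionServer GC log (hypothetical)
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            Matcher f = FREE.matcher(line);
            Matcher c = CHUNK.matcher(line);
            if (f.find()) System.out.println("freeSpace," + f.group(1));
            if (c.find()) System.out.println("maxChunk," + c.group(1));
        }
    }
}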

The test results are shown below. The first graph plots total Free Space over time, and the second plots Max Chunk Size over time; the horizontal axis is time and the vertical axis is the corresponding memory size.



From the first graph, the total free memory of the old generation stays between roughly 300 MB and 400 MB: when free space drops to about 300 MB, a GC occurs, after which free space returns to about 400 MB. The second graph shows the promotion failure caused by memory fragmentation more vividly: Max Chunk Size keeps shrinking as data is written, and for a long period it basically stays around 30 MB. At around 1093 on the time axis we deliberately increased the size of individual written values from 500 bytes to 5 MB; after that, Max Chunk Size shrinks further, and when it drops low enough, the curve suddenly jumps back up to about 350 MB. The logs confirm that at that point the JVM performed a promotion-failure full GC lasting about 4.91 s, and full GCs continued to occur for some time afterwards.

From the above analysis we can conclude: CMS GC continuously produces memory fragmentation; once the fragments shrink to a certain size they basically stay unchanged, and if the business then writes some large KeyValues, a promotion-failure full GC may be triggered.
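For reference, the large-value writes that trigger this behavior can be issued with the standard HBase client API along the lines below. This is a sketch under assumptions: the table name "usertable", column family "f", and qualifier "q" are hypothetical, and the 5 MB value size simply mirrors the experiment above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class LargeValueWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        byte[] largeValue = new byte[5 * 1024 * 1024];   // 5 MB value, as in the experiment
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("usertable"))) {   // hypothetical table
            for (int i = 0; i < 100; i++) {
                Put put = new Put(Bytes.toBytes("bigrow-" + i));
                put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), largeValue);
                table.put(put);
            }
        }
    }
}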

Summary

This article first introduced the two common Java GC policies, then described the two full GC modes that the CMS policy can cause, and finally showed through a small experiment that CMS GC does produce memory fragmentation, which can lead to long full GCs. The next article, 'Evolution', will detail how HBase has optimized for CMS from the very beginning, so stay tuned!
