The Past and Present of HBase GC

Source: Internet
Author: User

Netease Video Cloud is a PaaS service built by Netease on a cloud-based distributed multimedia processing cluster and professional audio/video technologies. It provides stable, smooth, low-latency, high-concurrency services for live video, recording, storage, transcoding, and audio/video on demand, serving industries such as online education, telemedicine, entertainment shows, and online finance; enterprise users can build an online audio/video platform with only simple development. Here, Netease Video Cloud's technical experts share a technical article: the history of HBase GC.

As mentioned in the earlier HBase BlockCache article, LRUBlockCache can cause excessive memory fragmentation under the CMS GC policy, which in turn can lead to the notorious Full GC: the dreaded 'stop-the-world' pause is triggered and seriously affects upper-layer services. The BucketCache mechanism, by contrast, applies for a fixed-size block of memory as the cache during initialization, so cache eviction is no longer managed by the JVM; caching a data block only reads or overwrites that pre-allocated space, which greatly reduces memory fragmentation and the frequency of Full GC. So how does the CMS GC policy produce excessive memory fragmentation? How does excessive fragmentation trigger a Full GC? And how has HBase continuously optimized CMS GC as it evolved? This series, 'The History of HBase GC', answers these questions in two articles: this first article gives you a comprehensive understanding of the GC mechanisms relevant to HBase, and the next article, 'Evolution', shows how HBase has continuously optimized Full GC along its development path.

Java GC Overview

The entire HBase system runs on the JVM. Therefore, to understand HBase's memory management and the impact of different cache mechanisms on GC, you need a general understanding of Java GC. An in-depth treatment of how Java GC works is beyond the scope of this article; and if you are already familiar with Java GC, you can skip this section.

Java GC is based on the assumption that most memory objects fall into one of two groups: either they have short lifecycles and soon become unreachable, for example a buffer handling an RPC request may only survive for a few microseconds; or they have long lifecycles, for example a hot Block in the BlockCache may survive for several minutes or even longer. Based on this, the JVM divides heap memory into two parts: the young generation and the tenured (old) generation. The JVM also has a non-heap area, the Perm area, which stores class information and other metadata. The memory layout is shown in the figure below:


The Young area is further divided into an Eden area and two Survivor areas, S0 and S1. After a memory object is created, it is first allocated in the young generation; if the object stays alive in the young generation long enough, it is migrated to the old generation. For latency-sensitive workloads such as HBase, the commonly recommended JVM parameters are -XX:+UseParNewGC and -XX:+UseConcMarkSweepGC: the former enables a parallel collector for the young generation, while the latter enables a concurrent mark-sweep collector for the old generation. As you can see, the JVM allows different GC policies to be applied to different memory areas.
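For reference, a minimal sketch of how these options might be passed to a RegionServer through hbase-env.sh; the heap and young-generation sizes below are purely illustrative, not recommendations:

    # hbase-env.sh (illustrative values; JDK 7/8 era flags)
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
      -Xms16g -Xmx16g -Xmn512m \
      -XX:+UseParNewGC \
      -XX:+UseConcMarkSweepGC"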

New Generation GC Policy - Parallel New Collector

As described above, a new object is placed in the Young area after allocation, more specifically in the Eden area. When the Eden area is full, a GC is performed. The GC algorithm checks the references to all objects; an object that is still referenced survives. After the check is complete, the surviving objects are copied to the S0 area and the entire Eden area is reclaimed. This is called a Minor GC. New objects then continue to be allocated in the Eden area; when S0 and Eden are full, the surviving objects in S0 and Eden are checked, all survivors are copied to the S1 area, and the space of S0 and Eden is reclaimed. It is easy to see that one of S0 and S1 is always kept empty to hold the survivors of the next collection.

The entire process is illustrated in the figure below:


This algorithm is called a copying algorithm. Two points are worth noting about it:

1. The algorithm does pause the application ('stop-the-world'), but only briefly. Because the Young area is usually set to a relatively small size (generally recommended not to exceed 512 MB) and the JVM uses multiple parallel threads, a Minor GC is usually completed within a few milliseconds.

2. No fragmentation is produced: after each GC, the surviving objects are placed in a contiguous space (S0 or S1).

The JVM maintains a counter for every object in memory. Each time a Minor GC copies an object, its counter is incremented by one. When the counter reaches a certain threshold, the object is considered long-lived and is moved into the old generation. The threshold can be specified through the JVM parameter -XX:MaxTenuringThreshold.
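As a hedged example, the flag combination below (all values are examples only, not from the original article) sets the young-generation size and tenuring threshold and prints the age distribution of surviving objects, which makes the promotion behaviour described above visible in the GC log:

    # Illustrative flags for observing object aging and promotion (JDK 7/8 era)
    -Xmn512m \
    -XX:MaxTenuringThreshold=15 \
    -XX:+PrintTenuringDistribution \
    -XX:+PrintGCDetails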

Old Generation GC Policy - Concurrent Mark-Sweep

After each Minor GC, some long-lived objects are moved into the old generation. Over time, the old generation fills up and a GC must be performed on it as well. This is where the Concurrent Mark-Sweep (CMS) algorithm comes in. The CMS algorithm proceeds in six phases; some of them pause the application ('stop-the-world'), while others run concurrently with the application threads:

1. initial-mark: in this phase the JVM pauses all running application threads and marks all 'root objects'. A 'root object' is typically an object directly referenced by a running thread. Although the entire JVM is paused, this step is usually very fast because there are relatively few root objects.

2. concurrent mark: the garbage collector starts from the root objects and marks all reachable objects. In this phase the application threads and the marking threads run concurrently, so the user does not perceive any pause.

3. concurrent precleaning: the pre-cleaning phase also runs concurrently. In this phase the JVM looks for objects that entered the old generation while the concurrent mark phase was running (some objects may have been promoted from the young generation, and some may have been allocated directly in the old generation).

4. remark: the objects found in phase 3 are re-marked. The entire JVM is paused during this phase, but because phase 3 has already examined the newly entered objects, the process is very fast.

5. concurrent sweep: with marking completed by the previous phases, this phase reclaims all unmarked objects as garbage. The application threads and the sweeping threads run concurrently.

6. concurrent reset: the internal data structures of the CMS collector are reset, ready for the next garbage collection.
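To watch these six phases in practice, GC logging can be enabled with flags such as the ones below (JDK 7/8 style; the log path is an assumption, and the exact log wording varies by JVM version):

    -XX:+PrintGCDetails \
    -XX:+PrintGCDateStamps \
    -Xloggc:/path/to/gc.log
    # In the resulting log the phases typically appear as entries such as
    # "CMS Initial Mark", "CMS-concurrent-mark", "CMS-concurrent-preclean",
    # "CMS Final Remark", "CMS-concurrent-sweep" and "CMS-concurrent-reset";
    # only the initial mark and final remark entries report an application pause.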

Correspondingly, two points are also worth noting about the CMS algorithm:

1. The 'stop-the-world' pauses are also short; the time-consuming marking and sweeping work is performed concurrently.

2. The CMS algorithm does not compact the surviving objects after marking and sweeping, so the old generation accumulates a lot of memory fragmentation over time.

CMS Failure Mode

As mentioned above, under normal circumstances the pauses of the whole CMS process are short, typically on the order of tens of milliseconds. However, this does not match what we observe online: when a cluster is under heavy read and write pressure, it often stalls for a long time, sometimes for several minutes. This causes serious read/write blocking and can even make the session between a RegionServer and ZooKeeper time out, taking the RegionServer offline abnormally. In fact, CMS is not perfect; there are two scenarios in which it produces serious Full GC, introduced separately below.

Concurrent Failure

This scenario is fairly simple. Suppose the system is currently running CMS to reclaim old-generation space and, during the collection, objects are promoted from the young generation, but unfortunately the old generation has no room to accommodate them. In this scenario the CMS collector stops working, the system enters 'stop-the-world' mode, and the collection algorithm degrades to a single-threaded copying algorithm that reallocates all surviving objects of the entire heap into S0 and frees all other space. Obviously, this process takes a very long time. Fortunately, the problem is easy to mitigate: simply make the CMS collector start its work earlier. The JVM provides the parameter -XX:CMSInitiatingOccupancyFraction=N to control when CMS starts, where N is the percentage of the old generation's total capacity that is in use. The default is 68; lowering the value makes collection start earlier.
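A minimal sketch of how this might be configured; 60 is only an example value. Note that on many JVM versions the companion flag -XX:+UseCMSInitiatingOccupancyOnly is needed so that the threshold applies to every collection rather than only serving as an initial hint:

    -XX:CMSInitiatingOccupancyFraction=60 \
    -XX:+UseCMSInitiatingOccupancyOnly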

Promotion Failure

Suppose -XX:CMSInitiatingOccupancyFraction=60 is set, yet even before old-generation usage reaches 60% of its capacity there is already no room to accommodate the objects being promoted from the young generation. Why? The culprit is memory fragmentation. As mentioned above, the CMS algorithm produces a large number of fragments; when fragmentation reaches a certain level, this scenario occurs. The CMS collector then stops working and the JVM enters a long 'stop-the-world' pause. The JVM provides the parameter -XX:+UseCMSCompactAtFullCollection to reduce fragmentation: it causes a compaction to be performed after each CMS full collection. Obviously, this has a significant performance impact and is not a perfect solution for latency-sensitive services such as HBase.
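For completeness, a hedged sketch of the compaction-related flags (JDK 7/8 era; the value of CMSFullGCsBeforeCompaction is illustrative). Compacting less often trades more fragmentation for fewer long pauses, but every compaction is still a full stop-the-world pause:

    -XX:+UseCMSCompactAtFullCollection \
    -XX:CMSFullGCsBeforeCompaction=5   # compact only every 5th full collection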

HBase Memory Fragmentation Experiment

In real online environments, Full GC in the concurrent failure mode is actually rare; most Full GCs occur in the promotion failure mode. Our online cluster triggers a Full GC roughly every half month due to promotion failure. To better understand how memory fragmentation triggers promotion failure under the CMS policy, let's run a simple experiment. The JVM provides the parameter -XX:PrintFLSStatistics=1 to print statistics about old-generation free space before and after each GC, covering three metrics: Total Free Space, Max Chunk Size, and Num Chunks. Total Free Space is the total free memory of the old generation, Max Chunk Size is the size of the largest contiguous free chunk in the old generation, and Num Chunks is the total number of free chunks in the old generation. On a test cluster (four RegionServers in total), we set this parameter and then used a YCSB client to run a read-and-write workload, recording how the Free Space and Max Chunk Size metrics in the log changed over time.
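As a rough sketch of how such metrics can be pulled out of the log afterwards (the log file name is an assumption, and the exact wording of the statistics lines differs slightly across JDK versions):

    # -XX:PrintFLSStatistics=1 prints "Statistics for BinaryTreeDictionary"
    # blocks before and after each GC; the lines of interest can be extracted with e.g.:
    grep -E "Total Free Space|Max.*Chunk Size|Number of Blocks" regionserver-gc.log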

The test results are shown in the two figures below. The first shows how Total Free Space changes over time, and the second shows how Max Chunk Size changes over time. In both, the horizontal axis represents time and the vertical axis represents memory size.



According to the first graph, the total free memory of the old generation fluctuates within a roughly fixed band: when free memory drops to a certain level a GC is performed, after which free memory returns to its previous level. The second curve shows how memory fragmentation leads to promotion failure. At the beginning, as data is continuously written, Max Chunk Size keeps decreasing and then stays around 30 MB for a long time. At the 1093 mark on the x-axis, the size of a single data entry was manually increased to 5 MB; Max Chunk Size then decreased further, and at a certain point the curve abruptly moved to about 4.91 MB. The logs confirm that at that moment the JVM performed a Full GC in promotion failure mode, which lasted a long time, and Full GCs kept occurring afterwards.

From the above analysis we know that CMS GC continuously generates memory fragmentation; once fragmentation develops to a certain degree, the largest contiguous free chunk stays small, and if the business then writes KeyValues with large values, a Full GC in promotion failure mode can be triggered.
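To make this last point concrete, below is a small, self-contained toy program (not taken from the article's experiment) that mimics the pattern described above under CMS: many small long-lived blocks are promoted and randomly evicted, fragmenting the old generation, while an occasional large value plays the role of a big KeyValue. Whether it actually triggers a promotion-failure Full GC depends on the JVM version and heap sizing; the run command and all sizes are assumptions:

    // FragmentationDemo.java -- illustrative only.
    // Possible run command (JDK 7/8 era flags):
    //   java -Xms256m -Xmx256m -Xmn64m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
    //        -XX:+PrintGCDetails -XX:PrintFLSStatistics=1 FragmentationDemo
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class FragmentationDemo {
        private static final int SMALL_BLOCK = 8 * 1024;        // long-lived small object
        private static final int LARGE_BLOCK = 5 * 1024 * 1024; // the "big KeyValue"
        private static final int CACHE_LIMIT = 12_000;          // bounds old-generation usage

        public static void main(String[] args) {
            Random rnd = new Random(0);
            List<byte[]> cache = new ArrayList<>();
            for (long i = 0; i < 10_000_000L; i++) {
                cache.add(new byte[SMALL_BLOCK]);            // survives long enough to be promoted
                byte[] garbage = new byte[64 * 1024];        // short-lived, dies in the young generation
                garbage[0] = 1;
                if (cache.size() > CACHE_LIMIT) {
                    cache.remove(rnd.nextInt(cache.size())); // random eviction leaves holes behind
                }
                if (i % 50_000 == 0) {
                    cache.add(new byte[LARGE_BLOCK]);        // a large value that needs a contiguous chunk
                }
            }
            System.out.println("done, cached blocks: " + cache.size());
        }
    }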

Summary

This article first introduced the two common Java GC policies, then described the two modes in which the CMS policy can cause a Full GC, and finally used a small experiment to show that CMS GC does produce memory fragmentation and that this can lead to long Full GC pauses. The next article, 'Evolution', will detail how HBase has optimized CMS step by step, so stay tuned!
