ByteBuf Memory Management in Netty 4


Reposted from: http://iteches.com/archives/65193

Netty 4 brings a distinctive design to its ByteBuf implementation. To be honest, java.nio.ByteBuffer has always been an awkward API for me; by contrast, io.netty.buffer.ByteBuf, which maintains two independent read and write pointers, is much simpler and more efficient. The biggest difference, however, is that Netty's ByteBuf is no longer managed by the JVM's traditional GC; instead it uses a mechanism similar to malloc/free in C/C++, which requires developers to manage allocation and release manually. Moving from manual memory management to GC was great historical progress, yet twenty years later we curve back toward manual memory management, confirming the Marxist view of philosophy: society advances in a spiral, and the newest is not always the best.
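The two-pointer idea can be illustrated with a minimal sketch. This is a hypothetical toy class, not Netty's actual implementation: reads advance a readerIndex, writes advance a writerIndex, and no flip() is ever needed.

```java
// Minimal sketch of the two-pointer design behind io.netty.buffer.ByteBuf.
// Hypothetical illustration only, not Netty's real class.
class SimpleBuf {
    private final byte[] data;
    private int readerIndex = 0; // next byte to read
    private int writerIndex = 0; // next free slot to write

    SimpleBuf(int capacity) { data = new byte[capacity]; }

    void writeByte(byte b) { data[writerIndex++] = b; }
    byte readByte()        { return data[readerIndex++]; }

    int readableBytes() { return writerIndex - readerIndex; }
    int writableBytes() { return data.length - writerIndex; }
}
```

Unlike java.nio.ByteBuffer, which shares a single position pointer for reads and writes, reads and writes here can be freely interleaved without flip() or rewind().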

Indeed, when it comes to memory management, the value of GC is self-evident: it greatly reduces the programmer's mental burden, greatly reduces crashes caused by memory mismanagement, and brought spring to functional programming (with its large numbers of temporary objects) and to scripting languages. Moreover, efficient GC algorithms let programs run faster in most cases. However, there are still situations where manual memory management may be more appropriate. For example:

1. Large numbers of long-lived objects, such as entries managed by a cache. Here the GC just struggles through inefficient cleanup work. The lack of a really mature cache solution in Java has a lot to do with this.

2. Overly frequent object allocation under high throughput. In this mode the time slice of a single service call is very short, yet it produces a very large number of object allocations. Although these objects also have very short lifetimes, the allocation rate makes GC frequent, and even a single young GC can cost far more than one service call.

So, in theory, GC is actually best suited to the situation between these two extremes: object allocation is modest relative to processing time, and object lifetimes are short. Typically, an OLTP-style service with processing capacity around 1K QPS, allocating on the order of 10K-50K per request, can go 5-10 s between young GCs, with each GC kept at the 10 ms level. Such applications suit the GC model perfectly, and combined with Java's efficient generational GC, it is a perfect match.
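The arithmetic behind this sweet spot is easy to check. Using the numbers from the text (1K QPS, 10K-50K per request) plus an illustrative young-generation size of 256 MB, which is my assumption and not from the article:

```java
// Back-of-the-envelope check of the GC sweet spot described above.
// The 256 MB young-generation size used in the test is an illustrative
// assumption, not a figure from the article.
public class GcMath {
    // Seconds until the young generation fills at a given allocation rate.
    static long secondsBetweenYoungGcs(long qps, long bytesPerRequest, long youngGenBytes) {
        long garbagePerSecond = qps * bytesPerRequest; // allocation rate
        return youngGenBytes / garbagePerSecond;       // time to fill the young gen
    }
}
```

At 1K QPS and 50 KB per request, a 256 MB young generation fills in about 5 seconds, matching the pattern the text calls ideal; at 1M QPS, even 1 KB per request fills it in well under a second.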

However, for relatively simple business logic, such as a network routing and forwarding application (many Erlang applications are actually of this type), QPS can be very high, say at the 1M level. In that case, even 1K of garbage per request causes frequent GC. In this mode, either Erlang's per-process heap recycling or the manual release mechanism of C/C++ is more efficient.

As for cache applications, because object lifetimes are so long, GC becomes essentially worthless.

Netty 4's introduction of a manual memory model is, I think, a big innovation, and the pattern can even be extended to cache applications. In fact, combined with many of the JVM's best features, implementing a Redis-style cache, an in-memory SQL engine, or a MongoDB in Java would, I think, be much simpler than in C++. The JVM already provides the machinery to make this possible, namely direct memory and the Unsafe class; on this foundation we can manipulate memory directly, much as in C. Netty 4's ByteBuf is built exactly on this foundation.
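Of the two mechanisms, direct memory is available through the standard NIO API; a minimal sketch:

```java
import java.nio.ByteBuffer;

// Off-heap allocation via the standard NIO API. The backing memory lives
// outside the Java heap, so the GC neither scans nor moves it (the small
// ByteBuffer wrapper object itself is still an ordinary heap object).
public class DirectMemoryDemo {
    static ByteBuffer allocate(int size) {
        ByteBuffer buf = ByteBuffer.allocateDirect(size);
        // isDirect() confirms the buffer is backed by off-heap memory.
        assert buf.isDirect();
        return buf;
    }
}
```

Reads and writes on such a buffer go straight to native memory, which is the raw material Netty's pooled allocator carves up.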

This article briefly analyzes how memory allocation is managed in Netty 4.

Netty 4 introduces a high-performance buffer pool that is a variant of jemalloc, combining buddy allocation and slab allocation.

Following the official material, I looked at the basic buddy-allocation and slab-allocation algorithms, then read the Netty source code and roughly organized the data structures as follows.

PoolChunk: a large contiguous region of memory; in Netty this is 16M, allocated in one shot through java.nio.ByteBuffer as direct memory, outside the JVM's GC range. Multiple PoolChunks can be combined to form a PoolArena. Each PoolChunk is divided into blocks according to the buddy algorithm: the smallest block, the order-0 block (called a PoolSubpage), is 8K; then the order-1 block is 16K, the order-2 block 32K, and so on up to the order-11 block of 16M.
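The block sizes follow directly from the page size: an order-n block covers 2^n pages. A small sketch of that arithmetic:

```java
// Block sizes in the buddy layout described above: the page (order-0 block)
// is 8 KB, and an order-n block is (8 KB << n), up to order 11 = the whole
// 16 MB chunk.
public class BuddySizes {
    static final int PAGE_SIZE = 8 * 1024; // order-0 block (one page)
    static final int MAX_ORDER = 11;       // order-11 block = whole chunk

    static long blockSize(int order) {
        return (long) PAGE_SIZE << order;
    }
}
```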

Within a PoolChunk, an int[4096] memoryMap describes all blocks; it is actually a binary tree:

                1
        2               3
    4       5       6       7
  8   9  10  11  12  13  14  15

Here, memoryMap[1] represents the order-11 block, which can be split into two order-10 blocks, represented by memoryMap[2] and memoryMap[3]. Each entry is an int composed of 3 bit fields:

31 ..... 17 | 16 ..... 2 | 1  0
   offset   |    size    | flags

Bits 0-1: flag bits. 00 = unused (the block is not in use); 01 = branch (the current block has been split into two lower-order blocks); 10 = allocated; 11 = allocated subpage (the lowest-level allocated block, an 8K page subdivided for slab allocation).

Bits 2-16: the size of the current block, in pages.

Bits 17-31: the offset of the current block relative to the chunk's base address, also in pages.
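Under this layout, the three fields can be packed into and unpacked from a single int with shifts and masks. A sketch following the field boundaries described above (constant names are mine, not Netty's):

```java
// Pack/unpack the per-block descriptor: bits 0-1 flags, bits 2-16 size
// (in pages), bits 17-31 offset (in pages), as described in the text.
public class BlockEntry {
    static final int FLAG_UNUSED    = 0; // 00: block not in use
    static final int FLAG_BRANCH    = 1; // 01: split into two lower-order blocks
    static final int FLAG_ALLOCATED = 2; // 10: allocated as a whole block
    static final int FLAG_SUBPAGE   = 3; // 11: allocated as a slab subpage

    static int pack(int flags, int sizeInPages, int offsetInPages) {
        return (offsetInPages << 17) | (sizeInPages << 2) | flags;
    }

    static int flags(int entry)  { return entry & 0x3; }
    static int size(int entry)   { return (entry >>> 2) & 0x7FFF; } // 15 bits
    static int offset(int entry) { return entry >>> 17; }           // 15 bits
}
```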

The advantage of using an array instead of a linked binary tree is that memory is greatly saved: with the root at index 1, the children of node n are simply 2n and 2n+1. Here a 16M chunk is described using just 16K of metadata, about 0.1%.
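With the root at index 1 (as in memoryMap[1] above), the index arithmetic is a one-liner each way, and the metadata overhead is easy to verify:

```java
// Array-encoded binary tree as used by memoryMap: root at index 1,
// children of node n at 2n and 2n+1, parent at n/2.
public class TreeIndex {
    static int leftChild(int n)  { return 2 * n; }
    static int rightChild(int n) { return 2 * n + 1; }
    static int parent(int n)     { return n / 2; }

    // int[4096] = 16 KB of metadata for a 16 MB chunk, i.e. roughly 0.1 %.
    static double overheadRatio() {
        return (4096 * 4.0) / (16 * 1024 * 1024);
    }
}
```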

PoolSubpage corresponds to an allocated page. In the slab implementation, each PoolSubpage serves allocations of only one particular size: sizes increase from 16 and 32 bytes upward (the tiny pool), followed by 1K, 2K, 4K and so on (the small pool). Each subpage allocates a fixed size of memory (called a slab). The advantage is that a simple bitmap suffices to record which slabs are allocated: with the smallest 16-byte slab, an 8K page needs only 64 bytes of bitmap information (0.78%), and for larger slabs the overhead is even smaller. (Netty's current implementation has a shortcoming here: every page carries a 64-byte bitmap regardless of slab size, presumably to simplify pooling the subpages themselves.)
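The bitmap bookkeeping can be sketched in a few lines. This is a simplified toy, not Netty's PoolSubpage, assuming an 8K page cut into fixed-size slabs:

```java
// Toy slab bitmap for one 8 KB page divided into fixed-size slabs.
// With 16-byte slabs: 512 slabs tracked by 512 bits = eight longs =
// 64 bytes, the 0.78 % overhead mentioned above.
public class SlabBitmap {
    static final int PAGE_SIZE = 8 * 1024;
    final int slabCount;
    final long[] bitmap; // one bit per slab; 1 = allocated

    SlabBitmap(int slabSize) {
        this.slabCount = PAGE_SIZE / slabSize;
        this.bitmap = new long[(slabCount + 63) / 64];
    }

    // Returns the index of a free slab, or -1 if the page is full.
    int allocate() {
        for (int i = 0; i < slabCount; i++) {
            if ((bitmap[i >> 6] & (1L << (i & 63))) == 0) {
                bitmap[i >> 6] |= 1L << (i & 63);
                return i;
            }
        }
        return -1;
    }

    // Releasing a slab only needs to clear its bit.
    void free(int slab) {
        bitmap[slab >> 6] &= ~(1L << (slab & 63));
    }
}
```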

Each chunk organizes multiple queues by slab size; each member of a queue is a subpage, so an allocation request of a given size can quickly find a subpage to serve it.

Each ByteBuf keeps a reference to its chunk. When it needs to be released, the corresponding page within the chunk can be located quickly from the current memory address, and the memory is then returned to the corresponding subpage; the whole process only needs to update the corresponding bitmap.

Ignoring other memory overhead, the metadata cost of the slab+buddy scheme is under 1%, and allocation and release are very fast. The downside of the slab approach is that overall memory utilization can be lower, because memory is not shared between different slab sizes; it is rather like digging a big hole for the big cat and then digging a separate small hole for the kitten.

=========================

In general, Netty's ByteBuf offers us a revelation: even on the JVM, we need not cling to GC-based memory management. Where manual management works better, we are free to use it. For managing massive in-memory data, cached data, and high-frequency allocation patterns in Java, the power of the JVM can still be used while memory is managed directly and flexibly.
