Abstract: Netty usually runs several IO threads that work independently. In the NioEventLoop-based implementation, each IO thread polls its own Selector instance for IO events and starts processing an event as soon as it arrives. The most common IO events are reads and writes, which require the IO thread to read or write data. At the NIO level, this means copying data from a kernel buffer into a user buffer, or from a user buffer into a kernel buffer. NIO provides two kinds of buffers, DirectBuffer and HeapBuffer. This article first introduces the two kinds of buffers, then explains the implementation principle and application of Netty's ThreadLocal-based memory pool technology, and finally gives some simple test data.
DirectBuffer and HeapBuffer
DirectBuffer, as the name implies, is allocated in direct memory. Direct memory is not part of the JVM runtime data area, nor is it a memory area defined in the Java Virtual Machine Specification, but it is used frequently all the same. The Channel- and Buffer-based IO that NIO introduced in JDK 1.4 lets us allocate direct memory through native calls and operate on it through a reference object on the JVM heap; the direct memory is returned to the operating system when that heap reference is reclaimed. HeapBuffer is a buffer allocated on the JVM heap; it can be understood simply as a wrapper around a byte[] array.
An IO write based on HeapBuffer usually allocates a temporary buffer in direct memory, copies the data into it, sends the data from direct memory to the IO device buffer, and finally destroys the temporary direct memory area. An IO read based on HeapBuffer works similarly. Using DirectBuffer avoids copying data between the JVM heap and direct memory, which improves performance significantly in some scenarios. In addition to avoiding the extra copy, another benefit is fast access, which is related to how the JVM accesses objects.
The disadvantage of DirectBuffer is that allocating and reclaiming direct memory is relatively expensive, so DirectBuffer suits scenarios in which buffers can be reused.
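To make the two buffer kinds concrete, here is a minimal sketch that uses only the JDK's java.nio.ByteBuffer. The class name BufferKindsDemo and its helper methods are made up for illustration and do not come from Netty or from the original article.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only ByteBuffer and its methods are real JDK API.
class BufferKindsDemo {
    /** Allocates a heap buffer, which is backed by a byte[] on the JVM heap. */
    static ByteBuffer heapBuffer(int capacity) {
        return ByteBuffer.allocate(capacity);
    }

    /** Allocates a direct buffer, which is backed by memory outside the JVM heap. */
    static ByteBuffer directBuffer(int capacity) {
        return ByteBuffer.allocateDirect(capacity);
    }

    /** Writes a string into the buffer, flips it, and reads the bytes back. */
    static String roundTrip(ByteBuffer buffer, String text) {
        buffer.clear();
        buffer.put(text.getBytes(StandardCharsets.UTF_8));
        buffer.flip();
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer heap = heapBuffer(64);
        ByteBuffer direct = directBuffer(64);
        System.out.println("heap:   isDirect=" + heap.isDirect() + ", hasArray=" + heap.hasArray());
        System.out.println("direct: isDirect=" + direct.isDirect());
        // The same direct buffer serves two writes, so it is allocated only once.
        System.out.println(roundTrip(direct, "first"));
        System.out.println(roundTrip(direct, "second"));
    }
}
```

Reusing one direct buffer for several operations, as main does, is the usage pattern in which the higher allocation cost is paid only once.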
The buffers in Netty
In Netty, buffers also come in two forms, HeapBuffer and DirectBuffer, and Netty pools both of them:
The pooled implementations for heap memory and direct memory are PooledHeapByteBuf and PooledDirectByteBuf respectively. Each of them maintains a Recycler, which is the focus of this article and the core of Netty's lightweight memory pool technology.
Recycler and internal components
Recycler is an abstract class. It exposes two public methods, get and recycle, which fetch an object from the object pool and return an object to it. It also declares a protected abstract method, newObject, which the user implements to create a new object when the pool has none available. The type of the pooled object is passed in as a generic type parameter.
/**
 * Light-weight object pool based on a thread-local stack.
 *
 * @param <T> the type of the pooled object
 */
public abstract class Recycler<T>
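The get / recycle / newObject contract can be sketched in plain Java as follows. This is a deliberately simplified illustration, not Netty's code: SimpleRecycler and SimpleRecyclerDemo are hypothetical names, the pool is a plain per-thread ArrayDeque, and there is no Handle, no capacity limit, and no cross-thread recycling.

```java
import java.util.ArrayDeque;

/** Simplified thread-local object pool; a sketch, not Netty's Recycler. */
abstract class SimpleRecycler<T> {
    // Each thread gets its own stack of reusable objects, so no locking is needed.
    private final ThreadLocal<ArrayDeque<T>> threadLocal =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Returns a pooled object if one is available, otherwise creates a new one. */
    public final T get() {
        T pooled = threadLocal.get().pollFirst();
        return pooled != null ? pooled : newObject();
    }

    /** Puts an object back into the calling thread's pool for later reuse. */
    public final void recycle(T object) {
        threadLocal.get().addFirst(object);
    }

    /** Implemented by the user to create an object when the pool is empty. */
    protected abstract T newObject();
}

class SimpleRecyclerDemo {
    static final SimpleRecycler<StringBuilder> BUILDERS = new SimpleRecycler<StringBuilder>() {
        @Override
        protected StringBuilder newObject() {
            return new StringBuilder();
        }
    };

    public static void main(String[] args) {
        StringBuilder first = BUILDERS.get();
        BUILDERS.recycle(first);
        StringBuilder second = BUILDERS.get();
        System.out.println("same instance reused: " + (first == second));
    }
}
```

The user only supplies newObject; whether an object is freshly created or taken from the pool is invisible to the caller of get.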
Recycler consists of three core components, each responsible for a specific part of the object pool implementation, while Recycler itself exposes a unified interface for creating and recycling objects:
Handle
WeakOrderQueue
Stack
The functions of each component are as follows:
Handle
Recycler provides a default implementation of Handle as an inner class, DefaultHandle. Handle mainly exposes a recycle interface that implements the actual recycling of an object. Each Handle has a value field that stores the pooled object. Remember that every pooled object in the pool is wrapped by a Handle; the Handle is the basic unit of object pool management. In addition, each Handle points to its Stack, which is where handles are actually stored, maintained, and managed.
Stack
Stack maintains the object pool's data and gives Recycler two main access interfaces, push and pop: pop takes a reusable object out of the stack, and push returns an object so that it can be reused later.
WeakOrderQueue
The function of WeakOrderQueue can be described by two interfaces, add and transfer. add puts a Handle (the basic unit of object pool management) into the queue, and transfer moves reusable objects into the Stack. We can think of WeakOrderQueue as an object warehouse: the Stack itself maintains only a Handle array that serves Recycler directly. When no object is available in this array, the Stack looks for an associated WeakOrderQueue and calls its transfer method to replenish itself.
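The relationship between Handle and Stack described above can be sketched like this. The names HandleStack and Handle follow the article's vocabulary, but the code is a hypothetical single-threaded illustration rather than Netty's source; the initial capacity of 4 and the doubling growth policy are assumptions.

```java
import java.util.Arrays;

/** Sketch of a stack of handles; single-threaded, not Netty's implementation. */
final class HandleStack<T> {
    /** Wraps one pooled value and remembers the stack it belongs to. */
    static final class Handle<T> {
        final HandleStack<T> stack;
        T value;

        Handle(HandleStack<T> stack) {
            this.stack = stack;
        }

        /** Returns this handle, together with its value, to the owning stack. */
        void recycle() {
            stack.push(this);
        }
    }

    private Handle<T>[] elements = newArray(4); // assumed initial capacity
    private int size;

    @SuppressWarnings("unchecked")
    private static <T> Handle<T>[] newArray(int length) {
        return (Handle<T>[]) new Handle[length];
    }

    /** Creates an empty handle bound to this stack. */
    Handle<T> newHandle() {
        return new Handle<>(this);
    }

    /** Pops a reusable handle, or returns null when the stack is empty. */
    Handle<T> pop() {
        if (size == 0) {
            return null;
        }
        size--;
        Handle<T> ret = elements[size];
        elements[size] = null;
        return ret;
    }

    /** Pushes a handle back, doubling the array when it is full. */
    void push(Handle<T> handle) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = handle;
    }

    int size() {
        return size;
    }
}
```

Because every pooled value travels inside a Handle that already knows its Stack, recycling needs no lookup: the handle simply pushes itself back.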
Recycler implementation principle
Here is an overview diagram first; if anything below is unclear, refer back to it:
The diagram shows how Recycler works. Recycler#get is the interface exposed for getting an object from the object pool:
public final T get() {
    Stack<T> stack = threadLocal.get();
    DefaultHandle handle = stack.pop();
    if (handle == null) {
        handle = stack.newHandle();
        handle.value = newObject(handle);
    }
    return (T) handle.value;
}
Recycler first obtains the Stack bound to the current thread. This shows that Netty actually associates an object pool with every thread, and the directly associated object is a Stack. It first checks whether the pool has an available object and returns it if so. Otherwise it creates a new Handle, calls newObject to create a new object, and stores that object in the Handle's value field. newObject is implemented by the user.
Here Recycler uses the Stack's pop interface, so let's look at it:
DefaultHandle pop() {
    int size = this.size;
    if (size == 0) {
        if (!scavenge()) {
            return null;
        }
        size = this.size;
    }
    size--;
    DefaultHandle ret = elements[size];
    if (ret.lastRecycledId != ret.recycleId) {
        throw new IllegalStateException("recycled multiple times");
    }
    ret.recycleId = 0;
    ret.lastRecycledId = 0;
    this.size = size;
    return ret;
}
pop first checks whether the Stack's elements array has an available object; if so, it decrements size and returns the object. If the elements array is empty, it has to look for usable objects in the warehouse, which is what scavenge does; scavenge in turn calls scavengeSome. The Stack's warehouse is a linked list of WeakOrderQueue nodes, and the Stack holds the head pointer of that list. Each WeakOrderQueue in turn maintains a linked list whose nodes are Link objects. Link is very simple: it extends AtomicInteger and holds a Handle array, a read pointer, and a pointer to the next node. Link cleverly uses the AtomicInteger value as the write pointer into the array, which avoids concurrency problems.
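The idea of using the inherited AtomicInteger value as a write pointer can be sketched as follows. LinkSketch is a hypothetical class written for this illustration; it assumes exactly one producer thread calling offer and one consumer thread calling poll, and it is not Netty's Link.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of a fixed-size link with one producer and one consumer.
 * The inherited AtomicInteger value serves as the write index.
 */
final class LinkSketch<T> extends AtomicInteger {
    private final Object[] elements;
    private int readIndex;   // touched only by the consumer thread
    LinkSketch<T> next;      // next link in the queue's list

    LinkSketch(int capacity) {
        elements = new Object[capacity];
    }

    /** Producer side: stores the item, then publishes it by advancing the write index. */
    boolean offer(T item) {
        int writeIndex = get();
        if (writeIndex == elements.length) {
            return false; // full; a caller would append a new link here
        }
        elements[writeIndex] = item;
        // Ordered store: the element write becomes visible before the new index.
        lazySet(writeIndex + 1);
        return true;
    }

    /** Consumer side: reads only slots below the published write index. */
    @SuppressWarnings("unchecked")
    T poll() {
        if (readIndex == get()) {
            return null; // nothing new has been published
        }
        T item = (T) elements[readIndex];
        elements[readIndex++] = null;
        return item;
    }
}
```

The producer never touches readIndex and the consumer never writes the atomic value, so the two sides do not need a lock; the consumer simply never reads a slot at or beyond the published write index.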
Object storage in the Recycler object pool is thus split into two parts: the Stack's Handle array, and the WeakOrderQueue linked list that the Stack points to.
private DefaultHandle[] elements;
private volatile WeakOrderQueue head;
private WeakOrderQueue cursor, prev;
The Stack keeps the head pointer and the read cursor of the WeakOrderQueue list. Each WeakOrderQueue maintains a list of Link nodes, and each Link maintains a Handle array.
Reading and writing of objects in a pool
Objects are fetched from the pool primarily from the Stack's Handle array, while the backup supply for that array comes from the WeakOrderQueue linked list. The objects in the elements array and those in the WeakOrderQueue list come from different sources:
public void recycle() {
    Thread thread = Thread.currentThread();
    if (thread == stack.thread) {
        stack.push(this);
        return;
    }
    // we don't want to have a ref to the queue as the value in our weak map
    // so we null it out; to ensure there are no races with restoring it later
    // we impose a memory ordering here (no-op on x86)
    Map<Stack<?>, WeakOrderQueue> delayedRecycled = DELAYED_RECYCLED.get();
    WeakOrderQueue queue = delayedRecycled.get(stack);
    if (queue == null) {
        delayedRecycled.put(stack, queue = new WeakOrderQueue(stack, thread));
    }
    queue.add(this);
}
The recycle implementation of Handle shows that if an object is recycled by the thread that owns the Stack, the Stack's push method is called and the object goes straight into the Stack's array. If it is recycled by another thread, the object is put into the WeakOrderQueue that the recycling thread associates with this Stack in its Map<Stack, WeakOrderQueue>. That queue is in fact inserted at the head of the Stack's WeakOrderQueue linked list:
WeakOrderQueue(Stack<?> stack, Thread thread) {
    head = tail = new Link();
    owner = new WeakReference<Thread>(thread);
    synchronized (stack) {
        next = stack.head;
        stack.head = this;
    }
}
Each thread that does not own the Stack creates a new WeakOrderQueue node the first time it recycles one of the Stack's objects, and that node is inserted at the head of the Stack's WeakOrderQueue linked list. As a result, objects recycled by multiple threads end up in the Stack's WeakOrderQueue linked list, and the thread that owns the Stack can read the objects supplied by other threads.
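The owner-thread versus foreign-thread split can be sketched in a few lines. Note the substitution: instead of one WeakOrderQueue per recycling thread, this hypothetical OwnedPool uses a single java.util.concurrent.ConcurrentLinkedQueue shared by all foreign threads, which is simpler but is not how Netty does it. All names here are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Sketch of owner-thread vs. foreign-thread recycling; not Netty's implementation. */
final class OwnedPool<T> {
    private final Thread owner = Thread.currentThread();
    private final ArrayDeque<T> local = new ArrayDeque<>();              // owner thread only
    private final ConcurrentLinkedQueue<T> delayed = new ConcurrentLinkedQueue<>(); // other threads

    /** The owner pushes directly; any other thread hands the object to the shared queue. */
    void recycle(T object) {
        if (Thread.currentThread() == owner) {
            local.push(object);
        } else {
            delayed.add(object);
        }
    }

    /** Must be called by the owner: tries the local stack first, then scavenges. */
    T get() {
        T object = local.poll();
        if (object == null && scavenge()) {
            object = local.poll();
        }
        return object;
    }

    /** Moves everything that other threads returned into the local stack. */
    private boolean scavenge() {
        boolean moved = false;
        for (T object; (object = delayed.poll()) != null; ) {
            local.push(object);
            moved = true;
        }
        return moved;
    }

    /** Demo helper: recycles the object from a second thread and waits for it to finish. */
    static <T> void recycleFromOtherThread(OwnedPool<T> pool, T object) {
        Thread other = new Thread(() -> pool.recycle(object));
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

As in the article's description, the fast path (owner thread, local stack) involves no synchronization at all, and the concurrent structure is consulted only when the local stack runs dry.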
Simple test data
Let's compare performance data for the lightweight memory pool against ordinary usage. We use RecyclableArrayList, a simple recyclable list that Netty provides, and compare it with the traditional ArrayList. The advantage of RecyclableArrayList over ArrayList is mainly that when lists are created repeatedly, RecyclableArrayList instances are not actually created but taken from the pool. Every new ArrayList, by contrast, creates an object on the JVM heap, so we can expect the young generation to be collected fairly frequently when ArrayList is used. For simplicity, the example does not involve direct memory, so what we mainly care about is the improvement in GC frequency. Here are my two pieces of test code:
Code Listing 1:
public static void main(String... s) {
    int i = 0, times = 1000000;
    byte[] data = new byte[1024];
    while (i++ < times) {
        // Taken from the pool instead of being newly created each time
        RecyclableArrayList list = RecyclableArrayList.newInstance();
        int count = 100;
        for (int j = 0; j < count; j++) {
            list.add(data);
        }
        // Return the list to the pool for reuse
        list.recycle();
        System.out.println("count:[" + count + "]");
        sleep(1);
    }
}
Code Listing 2:
public static void main(String... s) {
    int i = 0, times = 1000000;
    byte[] data = new byte[1024];
    while (i++ < times) {
        // A new list is allocated on the heap in every iteration
        ArrayList<byte[]> list = new ArrayList<byte[]>();
        int count = 100;
        for (int j = 0; j < count; j++) {
            list.add(data);
        }
        System.out.println("count:[" + count + "]");
        sleep(1);
    }
}
Both listings follow the same logic: they loop 1,000,000 times, and each iteration creates a list and adds 100 references to a 1 KB byte array. The memory consumed here is mainly that of creating the ArrayList objects. Because an ArrayList is backed internally by an object array, each one consumes little memory, so only by creating them rapidly in a loop can we make the change in memory usage visible.
The left figure shows the test data for the traditional ArrayList, and the right figure shows the test data for RecyclableArrayList. GC is much more frequent with the non-recyclable ArrayList than with RecyclableArrayList. The tool also reports 16 GCs taking 77.624 ms in the left figure, versus 3 GCs taking 26.740 ms in the right figure.
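GC counts like the ones quoted above can also be read programmatically instead of from a monitoring tool. The following sketch uses the JDK's GarbageCollectorMXBean; the GcCounter class and its method names are hypothetical, and the workload in main only mirrors the shape of Code Listing 2 (without the sleep call). The numbers it prints depend on the JVM, the collector, and the heap settings, so no particular output is implied.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; only the java.lang.management calls are real JDK API.
class GcCounter {
    /** Sums collection counts over all collectors; undefined (-1) values are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = bean.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Runs the workload and returns how many collections were recorded meanwhile. */
    static long gcCountDuring(Runnable workload) {
        long before = totalGcCount();
        workload.run();
        return totalGcCount() - before;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1024];
        long collections = gcCountDuring(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                // A fresh list per iteration, as in Code Listing 2
                List<byte[]> list = new ArrayList<>();
                for (int j = 0; j < 100; j++) {
                    list.add(data);
                }
            }
        });
        System.out.println("GC count during workload: " + collections);
    }
}
```

Wrapping each listing's loop in gcCountDuring would give a rough, tool-free way to compare the pooled and non-pooled variants on a given machine.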
Recycler Object Pool Summary
In Netty, virtually every IO operation uses a buffer, whether the HeapBuffer or the DirectBuffer described above, and the consequences of not being able to reuse these buffers are easy to imagine. For heap memory, it triggers relatively frequent GC; for direct memory, it causes frequent buffer allocation and reclamation. Either way there is a significant performance loss. Netty's ThreadLocal-based lightweight object pool reduces the performance loss caused by GC and by allocation and reclamation to a certain extent, so Netty's threads run faster and overall performance is better.
The overall benefits of implementing buffers on top of memory pool technology can be summarized as follows:
With a pooled HeapBuffer, Netty can reuse heap memory regions, which reduces both the frequency of memory allocation requests and the frequency of JVM GC.
With a pooled DirectBuffer, Netty can reuse direct memory regions when allocating buffers, so direct memory keeps its existing advantages over HeapBuffer while its own drawback, the relatively high cost of allocation and reclamation, is offset.
The essence of Netty: implementation principle and application of lightweight memory pool technology