Flink Memory Management Source Code Walkthrough: Basic Data Structures

Last Update:2016-03-26 Source: Internet

Author: User


Overview

In distributed real-time computing, making a framework or engine efficient enough to access and process large amounts of data in memory is a very hard problem. Flink handles this problem very well; the quality of its self-managed memory design arguably exceeds the project's own popularity. I have been studying the Flink source code recently, so I am writing two articles on the design of Flink's memory management.

The highlight of Flink's memory management is this: although Flink is a Java-based program (parts of it are implemented in Scala, a functional programming language that follows the JVM specification and runs on the JVM), it manages memory on its own and does not rely entirely on the JVM's memory management mechanism. The advantage is flexibility. In big-data scenarios it avoids the performance fluctuations caused by frequent, uncontrolled GC and, to some extent, escapes the limitations of the JVM. It is an approach worth learning from.

We divide Flink's memory design into two parts, following its package structure:

Basic data structures (package: org.apache.flink.core.memory)
Memory management mechanism (package: org.apache.flink.runtime.memory)

We cover them separately. This article focuses on the basic data structures; the memory management mechanism is left to a follow-up article.

Class diagram of all classes in the package (figure not preserved).

Of these classes, MemorySegment, HeapMemorySegment, and HybridMemorySegment are the three most important, and they are the focus of this analysis.

Memory types abstracted by Flink

Flink abstracts the memory it manages into two types, distinguished mainly by where the memory is located:

HEAP: JVM heap memory
OFF_HEAP: off-heap (non-heap) memory

In Flink these are defined as an enumeration type: MemoryType.

MemorySegment

The memory managed by Flink is abstracted as one data structure: MemorySegment.

Flink provides two implementations of it:

HeapMemorySegment: manages memory that is part of the JVM heap
HybridMemorySegment: a hybrid (on-heap or off-heap) memory segment; its memory may or may not be JVM heap memory

Relevant fields of MemorySegment:

UNSAFE: the JVM's unsafe API, used to operate on heap and off-heap memory
BYTE_ARRAY_BASE_OFFSET: the offset of the first element of a byte array, relative to the start of the byte array object
LITTLE_ENDIAN: a boolean, whether the byte order is little-endian
heapMemory: for heap memory, the reference to the byte array being accessed; null if the memory is not heap memory
address: the address of the data relative to the byte array object (if heapMemory is null, the absolute address of the off-heap memory; explained later)
addressLimit: the end position of the segment (address + size)
size: the number of bytes in the memory segment

LITTLE_ENDIAN holds the native byte order of the current platform as a boolean. Many of the put/get operations that follow check it to decide between big-endian and little-endian handling.
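As a rough illustration of how such a flag is used, the sketch below reads a long in native byte order and reverses its bytes on little-endian platforms to obtain the big-endian value. It is not Flink's code: ByteOrderSketch and its methods are hypothetical names, and a ByteBuffer stands in for the Unsafe-based memory access.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch {
    // True when the platform stores the least significant byte first.
    static final boolean LITTLE_ENDIAN = ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;

    // Reads 8 bytes in the platform's native order (stand-in for a raw memory read).
    static long getLongNative(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.nativeOrder()).getLong(offset);
    }

    // Returns the value of the 8 bytes interpreted as big-endian, on any platform.
    static long getLongBigEndian(byte[] data, int offset) {
        long value = getLongNative(data, offset);
        return LITTLE_ENDIAN ? Long.reverseBytes(value) : value;
    }
}
```

Whatever the platform, the eight bytes 01 02 03 04 05 06 07 08 come back as 0x0102030405060708 from getLongBigEndian.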

If byte order is unfamiliar, it is worth looking up before reading on.

Turning to the code: MemorySegment provides two constructors, one for on-heap memory and one for off-heap memory.
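A minimal sketch of how two such constructors could set up the fields listed earlier. This is an illustration, not Flink's source: SegmentSketch is a hypothetical class, BYTE_ARRAY_BASE_OFFSET is a made-up constant here, and the free() behavior is an assumption chosen to match the "address > addressLimit" freed checks that appear in the methods later in this article.

```java
final class SegmentSketch {
    // Hypothetical stand-in for the JVM-specific array base offset.
    static final long BYTE_ARRAY_BASE_OFFSET = 16;

    final byte[] heapMemory;  // non-null only for on-heap segments
    long address;             // relative (on-heap) or absolute (off-heap) start address
    final long addressLimit;  // address + size at construction time
    final int size;

    // On-heap: the segment is backed by a byte array.
    SegmentSketch(byte[] buffer) {
        if (buffer == null) {
            throw new NullPointerException("buffer");
        }
        this.heapMemory = buffer;
        this.address = BYTE_ARRAY_BASE_OFFSET;
        this.size = buffer.length;
        this.addressLimit = this.address + this.size;
    }

    // Off-heap: only an absolute address and a size are known.
    SegmentSketch(long offHeapAddress, int size) {
        if (offHeapAddress <= 0 || size < 0) {
            throw new IllegalArgumentException("invalid pointer or size");
        }
        this.heapMemory = null;
        this.address = offHeapAddress;
        this.size = size;
        this.addressLimit = this.address + this.size;
    }

    boolean isOffHeap() {
        return heapMemory == null;
    }

    // Assumption: moving address past addressLimit makes every later bounds check fail.
    void free() {
        address = addressLimit + 1;
    }

    boolean isFreed() {
        return address > addressLimit;
    }
}
```

No memory is touched in this sketch, so the off-heap constructor can be exercised with any positive number as the address.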

In addition, many get/put methods are provided, and most of these getXxx/putXxx methods call UNSAFE.getXxx/UNSAFE.putXxx directly or indirectly. These methods, which handle the different memory types uniformly, are implemented as common code in MemorySegment.

These are only some of the methods the class offers.

The memory accesses that are specific to one memory type are implemented in the two subclasses.

Three further methods of the MemorySegment class are worth a look:

    public final void copyTo(int offset, MemorySegment target, int targetOffset, int numBytes) {
        final byte[] thisHeapRef = this.heapMemory;
        final byte[] otherHeapRef = target.heapMemory;
        final long thisPointer = this.address + offset;
        final long otherPointer = target.address + targetOffset;

        if ((numBytes | offset | targetOffset) >= 0 &&
                thisPointer <= this.addressLimit - numBytes &&
                otherPointer <= target.addressLimit - numBytes) {
            UNSAFE.copyMemory(thisHeapRef, thisPointer, otherHeapRef, otherPointer, numBytes);
        }
        else if (this.address > this.addressLimit) {
            throw new IllegalStateException("This memory segment has been freed.");
        }
        else if (target.address > target.addressLimit) {
            throw new IllegalStateException("Target memory segment has been freed.");
        }
        else {
            throw new IndexOutOfBoundsException(
                    String.format("offset=%d, targetOffset=%d, numBytes=%d, address=%d, targetAddress=%d",
                            offset, targetOffset, numBytes, this.address, target.address));
        }
    }

This is a bulk copy method: it copies numBytes bytes from the current memory segment, starting at offset, into the target memory segment, starting at targetOffset.
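The combined check (numBytes | offset | targetOffset) >= 0 works because the bitwise OR of several ints has its sign bit set as soon as any one of them is negative. A minimal sketch of the same guard on plain byte arrays, with System.arraycopy standing in for UNSAFE.copyMemory (CopySketch is a hypothetical name, not part of Flink):

```java
final class CopySketch {
    static void copy(byte[] src, int offset, byte[] target, int targetOffset, int numBytes) {
        // One comparison rejects any negative argument; the other two keep both ranges in bounds.
        if ((numBytes | offset | targetOffset) >= 0
                && offset <= src.length - numBytes
                && targetOffset <= target.length - numBytes) {
            System.arraycopy(src, offset, target, targetOffset, numBytes);
        } else {
            throw new IndexOutOfBoundsException(String.format(
                    "offset=%d, targetOffset=%d, numBytes=%d", offset, targetOffset, numBytes));
        }
    }
}
```

Writing the range checks as "offset <= length - numBytes" rather than "offset + numBytes <= length" avoids integer overflow when the arguments are large.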

    public final int compare(MemorySegment seg2, int offset1, int offset2, int len) {
        while (len >= 8) {
            long l1 = this.getLongBigEndian(offset1);
            long l2 = seg2.getLongBigEndian(offset2);

            if (l1 != l2) {
                return (l1 < l2) ^ (l1 < 0) ^ (l2 < 0) ? -1 : 1;
            }

            offset1 += 8;
            offset2 += 8;
            len -= 8;
        }
        while (len > 0) {
            int b1 = this.get(offset1) & 0xff;
            int b2 = seg2.get(offset2) & 0xff;
            int cmp = b1 - b2;
            if (cmp != 0) {
                return cmp;
            }
            offset1++;
            offset2++;
            len--;
        }
        return 0;
    }

A hand-written comparison method: it compares len bytes of the current memory segment starting at offset1 with len bytes of seg2 starting at offset2.

There are two while loops:

The first loop compares eight bytes at a time. While len is at least 8, it reads a long in big-endian order from each segment at the current offsets and compares the two. If they are equal, both offsets advance by 8 bytes, len decreases by 8, and the loop repeats.
The second loop handles the remaining bytes (fewer than 8) and compares them one byte at a time.
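The expression (l1 < l2) ^ (l1 < 0) ^ (l2 < 0) is an unsigned comparison of two signed longs: the signed result is flipped exactly when the two sign bits differ. Read in big-endian order, comparing longs this way gives the same ordering as comparing the bytes one by one as unsigned values. A sketch of the two loops on plain byte arrays (CompareSketch is a hypothetical name, and ByteBuffer replaces the segment's own accessors):

```java
import java.nio.ByteBuffer;

final class CompareSketch {
    static int compare(byte[] a, int offset1, byte[] b, int offset2, int len) {
        ByteBuffer bufA = ByteBuffer.wrap(a); // ByteBuffer is big-endian by default
        ByteBuffer bufB = ByteBuffer.wrap(b);

        // Fast path: eight bytes per iteration.
        while (len >= 8) {
            long l1 = bufA.getLong(offset1);
            long l2 = bufB.getLong(offset2);
            if (l1 != l2) {
                // Unsigned comparison of two signed longs.
                return (l1 < l2) ^ (l1 < 0) ^ (l2 < 0) ? -1 : 1;
            }
            offset1 += 8;
            offset2 += 8;
            len -= 8;
        }

        // Tail: fewer than eight bytes left, compared as unsigned bytes.
        while (len > 0) {
            int b1 = a[offset1] & 0xff;
            int b2 = b[offset2] & 0xff;
            int cmp = b1 - b2;
            if (cmp != 0) {
                return cmp;
            }
            offset1++;
            offset2++;
            len--;
        }
        return 0;
    }
}
```

With a signed comparison, a leading byte of 0x80 would sort before 0x00; the XOR with the two sign tests is what corrects this.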

    public final void swapBytes(byte[] tempBuffer, MemorySegment seg2, int offset1, int offset2, int len) {
        if ((offset1 | offset2 | len | (tempBuffer.length - len)) >= 0) {
            final long thisPos = this.address + offset1;
            final long otherPos = seg2.address + offset2;

            if (thisPos <= this.addressLimit - len && otherPos <= seg2.addressLimit - len) {
                // this -> temp buffer
                UNSAFE.copyMemory(this.heapMemory, thisPos, tempBuffer, BYTE_ARRAY_BASE_OFFSET, len);

                // other -> this
                UNSAFE.copyMemory(seg2.heapMemory, otherPos, this.heapMemory, thisPos, len);

                // temp buffer -> other
                UNSAFE.copyMemory(tempBuffer, BYTE_ARRAY_BASE_OFFSET, seg2.heapMemory, otherPos, len);
                return;
            }
            else if (this.address > this.addressLimit) {
                throw new IllegalStateException("This memory segment has been freed.");
            }
            else if (seg2.address > seg2.addressLimit) {
                throw new IllegalStateException("Other memory segment has been freed.");
            }
        }

        // index is in fact invalid
        throw new IndexOutOfBoundsException(
                String.format("offset1=%d, offset2=%d, len=%d, bufferSize=%d, address1=%d, address2=%d",
                        offset1, offset2, len, tempBuffer.length, this.address, seg2.address));
    }

This method swaps a range of bytes between two memory segments. Apart from the bounds checks, it is an ordinary swap through a temporary variable, except that UNSAFE.copyMemory takes the place of the assignment operator.

Below we look at the two kinds of memory Flink manages: on-heap and off-heap.
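The three-step swap can be sketched on plain byte arrays as follows. SwapSketch is a hypothetical name, and System.arraycopy stands in for UNSAFE.copyMemory; the guard mirrors the one above, including the check that the temporary buffer is at least len bytes long.

```java
final class SwapSketch {
    static void swapBytes(byte[] tempBuffer, byte[] a, int offset1, byte[] b, int offset2, int len) {
        if ((offset1 | offset2 | len | (tempBuffer.length - len)) < 0
                || offset1 > a.length - len
                || offset2 > b.length - len) {
            throw new IndexOutOfBoundsException(String.format(
                    "offset1=%d, offset2=%d, len=%d, bufferSize=%d",
                    offset1, offset2, len, tempBuffer.length));
        }
        System.arraycopy(a, offset1, tempBuffer, 0, len); // a -> temp buffer
        System.arraycopy(b, offset2, a, offset1, len);    // b -> a
        System.arraycopy(tempBuffer, 0, b, offset2, len); // temp buffer -> b
    }
}
```

Because the caller supplies the temporary buffer, one array can be reused across many swaps instead of being allocated on every call.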

HeapMemorySegment

A memory segment backed by JVM heap memory (on-heap) is Flink's earliest form of self-managed memory. The class holds a reference to the byte array that makes up the segment, and its implementations of the abstract methods of MemorySegment operate on that byte array reference. This way they get the JVM's built-in checks (such as array bounds checks) instead of additional self-implemented ones. What does that mean? Given the field

    private byte[] memory;

which refers to the same array as heapMemory in MemorySegment, a method implemented like this

    public final byte get(int index) {
        return this.memory[index];
    }

lets the JVM itself verify that index lies between 0 and length - 1, without using fields such as address to check the range. Compare how the same method is implemented in HybridMemorySegment:

    public byte get(int index) {
        final long pos = address + index;
        if (index >= 0 && pos < addressLimit) {
            return UNSAFE.getByte(heapMemory, pos);
        }
        else if (address > addressLimit) {
            throw new IllegalStateException("segment has been freed");
        }
        else {
            // index is in fact invalid
            throw new IndexOutOfBoundsException();
        }
    }

This implementation has to check the bounds itself.
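The difference between the two styles can be tried out with the following sketch. It is a simulation under stated assumptions: BoundsCheckSketch is a hypothetical class, address and addressLimit are passed in as plain parameters, and an ordinary array read replaces UNSAFE.getByte so that nothing unsafe is executed.

```java
final class BoundsCheckSketch {
    // Heap style: no explicit check, the JVM's array bounds check does the work.
    static byte heapGet(byte[] memory, int index) {
        return memory[index];
    }

    // Hybrid style: validate against address/addressLimit before touching memory.
    static byte checkedGet(byte[] memory, long address, long addressLimit, int index) {
        final long pos = address + index;
        if (index >= 0 && pos < addressLimit) {
            // Stand-in for UNSAFE.getByte(heapMemory, pos).
            return memory[(int) (pos - address)];
        } else if (address > addressLimit) {
            throw new IllegalStateException("segment has been freed");
        } else {
            // index is in fact invalid
            throw new IndexOutOfBoundsException();
        }
    }
}
```

An out-of-range index makes heapGet fail with the JVM's own ArrayIndexOutOfBoundsException, while checkedGet has to raise its exception explicitly; a raw Unsafe read would perform no check at all.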

Because the memory lives on the JVM heap, many methods can use the JDK's own facilities, such as array copying:

    @Override
    public final void get(int index, byte[] dst, int offset, int length) {
        // System arraycopy does the boundary checks anyways, no need to check extra
        System.arraycopy(this.memory, index, dst, offset, length);
    }

    @Override
    public final void put(int index, byte[] src, int offset, int length) {
        // System arraycopy does the boundary checks anyways, no need to check extra
        System.arraycopy(src, offset, this.memory, index, length);
    }

The remaining methods are implemented conventionally and need no further comment.

HybridMemorySegment

This is the other memory segment implementation: it supports both on-heap and off-heap memory. At first glance this seems odd. Since an on-heap implementation already exists, why build a hybrid one rather than a pure off-heap one? And handling two different memory regions in one class looks like a recipe for confusion.

So let's look at how Flink "gracefully" avoids that confusion. It is possible thanks to a family of methods offered by the JVM's unsafe operation class (Unsafe).

...)

These methods have the following characteristics:
(1) If the object o is not null and the address or position that follows is relative, the method operates on the given position within that object (for example an array). Because the object is not null, this case is naturally the on-heap scenario.
(2) If the object o is null and the address that follows is the absolute address of a memory block, the method operates on that memory block directly. Because o is null, the block being accessed is not JVM heap memory, which is the off-heap scenario.
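The two cases can be sketched with sun.misc.Unsafe directly. This is an illustration under assumptions, not Flink's code: UnsafeAddressingSketch and its methods are hypothetical names, the Unsafe instance is obtained reflectively through its theUnsafe field, and sun.misc.Unsafe is an internal, unsupported API whose memory-access methods are deprecated in recent JDKs and may produce warnings there.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class UnsafeAddressingSketch {
    static final Unsafe UNSAFE = loadUnsafe();
    static final long BYTE_ARRAY_BASE_OFFSET = UNSAFE.arrayBaseOffset(byte[].class);

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available on this JVM", e);
        }
    }

    // One pair of accessors serves both cases:
    // (array, relative address) for on-heap, (null, absolute address) for off-heap.
    static void putLong(byte[] heapMemory, long address, int index, long value) {
        UNSAFE.putLong(heapMemory, address + index, value);
    }

    static long getLong(byte[] heapMemory, long address, int index) {
        return UNSAFE.getLong(heapMemory, address + index);
    }

    static long roundTripOnHeap(long value) {
        byte[] array = new byte[16];
        putLong(array, BYTE_ARRAY_BASE_OFFSET, 8, value);
        return getLong(array, BYTE_ARRAY_BASE_OFFSET, 8);
    }

    static long roundTripOffHeap(long value) {
        long address = UNSAFE.allocateMemory(16);
        try {
            putLong(null, address, 8, value);
            return getLong(null, address, 8);
        } finally {
            UNSAFE.freeMemory(address);
        }
    }
}
```

The accessors never branch on the memory type; only the pair of arguments (object reference, address) differs between the on-heap and the off-heap call.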

Recall two of the fields mentioned when we introduced the MemorySegment class:

heapMemory
address

Together, these two fields cover both of the scenarios above. Moreover, one of the MemorySegment constructor parameters, offHeapAddress, makes it plain that this constructor exists specifically for off-heap memory.

MemorySegment provides common implementations of the accessors for specific data types, most of which call Unsafe methods with the characteristics above, so MemorySegment itself is in effect already hybrid.

So how does Flink obtain the memory address of off-heap data? The answer is in the following code snippets:

    /** The reflection fields with which we access the off-heap pointer from direct ByteBuffers */
    private static final Field ADDRESS_FIELD;

    static {
        try {
            ADDRESS_FIELD = java.nio.Buffer.class.getDeclaredField("address");
            ADDRESS_FIELD.setAccessible(true);
        }
        catch (Throwable t) {
            throw new RuntimeException(
                    "Cannot initialize HybridMemorySegment: off-heap memory is incompatible with this JVM.", t);
        }
    }

This obtains the address field of the Buffer class through reflection. Then

    private static long getAddress(ByteBuffer buffer) {
        if (buffer == null) {
            throw new NullPointerException("buffer is null");
        }
        try {
            return (Long) ADDRESS_FIELD.get(buffer);
        }
        catch (Throwable t) {
            throw new RuntimeException("Could not access direct byte buffer address.", t);
        }
    }

reads the off-heap address of a given buffer from that field.

Combining these two MemorySegment fields with the special behavior of the Unsafe methods keeps the implementation of HybridMemorySegment clear and concise. Even so, the class also keeps a reference to the off-heap data it manages: offHeapBuffer. This serves two purposes: it keeps that memory from being released, and it is used to implement some of the class's own methods.

MemorySegmentFactory

MemorySegmentFactory is used to create MemorySegment instances, and Flink strongly recommends using it rather than instantiating segments by hand. The goal is that only one MemorySegment subclass is instantiated at runtime, not both. Once both subclasses are in use, the JIT compiler has to load both and select between them at every call (it can no longer de-virtualize and inline the calls), which costs significant performance. The official Flink blog has a post devoted to the comparison and the benchmarks; see reference [1].

Class diagram of MemorySegmentFactory and related classes (figure not preserved).

This is clearly the factory method design pattern.

MemorySegmentFactory contains an inner interface, Factory. Each of the two MemorySegment implementation classes has an inner class that implements this interface and defines its own factory: HybridMemorySegmentFactory and HeapMemorySegmentFactory. There is nothing special here, except that the constructors of both are private to prevent direct instantiation from outside.

The MemorySegmentFactory class exposes methods that mirror those of the Factory interface, adding a layer of logic that decides which concrete Factory instance to use (nearly every method calls ensureInitialized first):

    private static void ensureInitialized() {
        if (factory == null) {
            factory = HeapMemorySegment.FACTORY;
        }
    }

As this shows, MemorySegmentFactory defaults to the HeapMemorySegment factory when creating MemorySegment instances.
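The pattern described above can be sketched in a few lines. Everything here is hypothetical naming for illustration (FactorySketch, Segment, HeapSegment, HeapFactory), and the initializeFactory method with its "only once" rule is an assumption about how a different factory could be chosen before first use; it is not taken from the listing above.

```java
final class FactorySketch {
    interface Segment {
        int size();
    }

    interface Factory {
        Segment allocate(int size);
    }

    static final class HeapSegment implements Segment {
        static final HeapFactory FACTORY = new HeapFactory();

        private final byte[] memory;

        private HeapSegment(int size) {
            this.memory = new byte[size];
        }

        @Override
        public int size() {
            return memory.length;
        }

        static final class HeapFactory implements Factory {
            private HeapFactory() {} // not instantiable from other classes

            @Override
            public Segment allocate(int size) {
                return new HeapSegment(size);
            }
        }
    }

    private static Factory factory;

    // Assumption: the factory may be chosen explicitly, but only once.
    static synchronized void initializeFactory(Factory f) {
        if (factory != null) {
            throw new IllegalStateException("Factory has already been initialized");
        }
        factory = f;
    }

    // Falls back to the heap factory when nothing was chosen explicitly.
    private static synchronized void ensureInitialized() {
        if (factory == null) {
            factory = HeapSegment.FACTORY;
        }
    }

    static Segment allocate(int size) {
        ensureInitialized();
        return factory.allocate(size);
    }
}
```

Because every allocation goes through one fixed factory, a run of the program only ever creates instances of one implementation class, which is the property the factory exists to protect.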

Views: an abstraction built on top of MemorySegment

Besides MemorySegment and its implementations, Flink's core package also provides a higher-level abstraction built on top of MemorySegment: DataView (the data view).

Class diagram of the data view classes (figure not preserved).

There are two interfaces: the output view DataOutputView (for writing data) and the input view DataInputView (for reading data). Each is accompanied by a position-based seek variant, which supports reading or writing at a specified position. In addition, there are two implementation classes, each of which wraps the corresponding stream interface. There is nothing special here, so we do not go into further detail.
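To make the idea of a view that wraps a stream concrete, here is a small sketch built only on java.io. The interface and class names (OutputView, InputView and the two wrappers) and the skip methods are hypothetical stand-ins, not Flink's actual definitions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class ViewSketch {
    interface OutputView extends DataOutput {
        void skipBytesToWrite(int numBytes) throws IOException;
    }

    interface InputView extends DataInput {
        void skipBytesToRead(int numBytes) throws IOException;
    }

    // Output view implemented by wrapping a plain output stream.
    static final class OutputViewStreamWrapper extends DataOutputStream implements OutputView {
        OutputViewStreamWrapper(OutputStream out) {
            super(out);
        }

        @Override
        public void skipBytesToWrite(int numBytes) throws IOException {
            for (int i = 0; i < numBytes; i++) {
                write(0); // a stream cannot seek, so skipped bytes are written as zeros
            }
        }
    }

    // Input view implemented by wrapping a plain input stream.
    static final class InputViewStreamWrapper extends DataInputStream implements InputView {
        InputViewStreamWrapper(InputStream in) {
            super(in);
        }

        @Override
        public void skipBytesToRead(int numBytes) throws IOException {
            if (skipBytes(numBytes) != numBytes) {
                throw new EOFException("Could not skip " + numBytes + " bytes.");
            }
        }
    }

    // Writes a 4-byte gap followed by the value, then reads the value back.
    static int roundTrip(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            OutputViewStreamWrapper out = new OutputViewStreamWrapper(bytes);
            out.skipBytesToWrite(4);
            out.writeInt(value);
            out.flush();

            InputViewStreamWrapper in =
                    new InputViewStreamWrapper(new ByteArrayInputStream(bytes.toByteArray()));
            in.skipBytesToRead(4);
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Code written against the view interfaces does not need to know whether the bytes end up in a stream, as here, or in memory segments.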

This concludes the walkthrough of the data structures behind Flink's self-managed memory.

Reference

[1] https://flink.apache.org/news/2015/09/16/off-heap-memory.html


This article is an English translation of an article originally published in Chinese on aliyun.com.
