From 0 to 1▏Netty codec framework: common decoders explained through usage examples

Source: Internet
Author: User

From: http://weiyoux.com/keji/hulianwang/17822.html

Encoding (Encode) is also commonly called serialization: it turns an object into a byte array for network transmission, data persistence, or other purposes.

Conversely, decoding (Decode), or deserialization, restores a byte array read from the network, disk, etc. to the original object (usually a copy of the original object) so that subsequent business logic can operate on it.

When making a remote cross-process service invocation (for example, an RPC call), the objects transmitted over the network must be encoded and decoded with a specific codec technique for the call to complete.

Background: common codec frameworks


Java serialization


The first serialization or codec technology that most Java programmers encounter is the serialization mechanism built into Java. A Java object that needs to be serialized only has to implement the java.io.Serializable interface and declare a serialization ID (serialVersionUID); it can then be serialized and deserialized through java.io.ObjectOutput and java.io.ObjectInput.

Because it is simple and has a low development threshold, Java serialization is widely used, but because of its many drawbacks, most RPC frameworks do not choose it.
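A minimal sketch of this mechanism, using only the JDK, might look as follows. The class names SerializationDemo and User are hypothetical and chosen only for illustration; the round trip goes through ObjectOutputStream and ObjectInputStream, the standard implementations of ObjectOutput and ObjectInput.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationDemo {

    // Hypothetical POJO: implementing Serializable and declaring a
    // serialVersionUID is all that JDK serialization requires.
    static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;

        User(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Encode: object -> byte array.
    static byte[] encode(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Decode: byte array -> a copy of the original object.
    static Object decode(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        User original = new User(1, "netty");
        byte[] bytes = encode(original);
        User copy = (User) decode(bytes);
        System.out.println(copy.id + " " + copy.name + " (" + bytes.length + " bytes)");
    }
}
```

Note that decode returns a new object with the same field values, not the original instance.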


The main drawbacks of Java serialization are as follows:

Cannot cross languages

This is the most fatal problem with Java serialization. In cross-process service invocations, the service provider may be developed in C++ or another language, and when we need to interact with processes written in heterogeneous languages, Java serialization is hard to use. Because Java serialization is a private protocol internal to the Java language, other languages do not support it, and it is a complete black box for users. Other languages cannot deserialize a byte array produced by Java serialization, which severely limits its scope of application.

The serialized code stream is too large

For example, when a binary codec technique is used to encode the same complex POJO object, its code stream is only about 20% of that produced by Java serialization. The code streams produced by the current mainstream codec frameworks are all much smaller than native Java serialization.

Poor serialization efficiency

Under the same hardware conditions, the same POJO object was serialized 1,000,000 times with binary encoding and with native Java serialization. The performance comparison is shown in the following figure: native Java serialization takes 16.2 times as long as binary encoding, which is very inefficient.

Figure 1-1 Comparison of binary coding and Java native serialization performance
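The size difference can be observed with a small experiment. The sketch below is not the benchmark behind the figures quoted above: SizeComparison and UserInfo are hypothetical names, and the "binary" encoding is simply a hand-written java.nio.ByteBuffer layout rather than any particular framework. The exact ratio depends on the object being encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SizeComparison {

    // Hypothetical POJO used for both encodings.
    static final class UserInfo implements Serializable {
        private static final long serialVersionUID = 1L;
        final int userId;
        final String userName;

        UserInfo(int userId, String userName) {
            this.userId = userId;
            this.userName = userName;
        }

        // Hand-written binary layout: [name length:int][name bytes][userId:int]
        byte[] toBinary() {
            byte[] name = userName.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(4 + name.length + 4);
            buf.putInt(name.length);
            buf.put(name);
            buf.putInt(userId);
            return buf.array();
        }
    }

    // Native JDK serialization of the same object.
    static byte[] toJdkSerialized(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        UserInfo info = new UserInfo(100, "Welcome to Netty");
        System.out.println("JDK serialization: " + toJdkSerialized(info).length + " bytes");
        System.out.println("Hand-written binary: " + info.toBinary().length + " bytes");
    }
}
```

The JDK stream is larger mainly because it carries a stream header and a class descriptor (class name, serialVersionUID, field names and types) in addition to the field values.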

Google's protobuf


Protobuf, whose full name is Google Protocol Buffers, was open-sourced by Google and has been tested over a long period inside Google. It describes data structures in .proto files, and a code generation tool produces the POJO objects corresponding to those data structures along with the Protobuf-related methods and properties.

It features the following:


1) A structured data storage format (similar to XML, JSON, etc.);

2) Efficient encoding and decoding performance;

3) Language-independent, platform-independent, and highly extensible;

4) Official support for three languages: Java, C++, and Python.


First, let's look at why we do not use XML. Although XML is very readable and extensible and is well suited to describing data structures, the time overhead of parsing XML and the space overhead XML sacrifices for readability are so high that it is not suitable as a high-performance communication protocol. Protobuf uses binary encoding, which gives it a greater advantage in space and performance.

Another attractive aspect of Protobuf is its data description file and code generation mechanism. The advantages of describing data structures in a data description file are as follows:

1) A text-based data structure description language is language- and platform-independent, which makes it especially suitable for integration between heterogeneous systems;

2) By tagging each field with its order (its field number), forward compatibility of the protocol can be achieved;

3) Code is generated automatically, so there is no need to hand-write C++ and Java versions of the same data structure;

4) It eases subsequent management and maintenance: structured documents are easier to manage and maintain than code.
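The role of field numbers in compatibility can be illustrated with a hand-rolled sketch of the Protocol Buffers wire format for integer fields. It does not use the protobuf library or generated code; WireSketch and its method names are hypothetical. On the wire, each field is written as a key, (fieldNumber << 3) | wireType, followed by its value as a base-128 varint, so a reader can identify every field by number regardless of what else the message contains.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

final class WireSketch {
    static final int WIRE_TYPE_VARINT = 0;

    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Writes one varint field: key first, then the value.
    static void writeVarintField(ByteArrayOutputStream out, int fieldNumber, long value) {
        writeVarint(out, ((long) fieldNumber << 3) | WIRE_TYPE_VARINT);
        writeVarint(out, value);
    }

    // pos is a one-element array used as a mutable read cursor.
    static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Reads every varint field into a map of fieldNumber -> value.
    // This sketch handles only the varint wire type.
    static Map<Integer, Long> readVarintFields(byte[] data) {
        Map<Integer, Long> fields = new LinkedHashMap<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            long key = readVarint(data, pos);
            if ((key & 0x7) != WIRE_TYPE_VARINT) {
                throw new IllegalArgumentException("only varint fields are supported in this sketch");
            }
            fields.put((int) (key >>> 3), readVarint(data, pos));
        }
        return fields;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarintField(out, 1, 150); // a field known to old and new readers
        writeVarintField(out, 2, 7);   // a field added in a later schema version
        Map<Integer, Long> fields = readVarintFields(out.toByteArray());
        // An older reader that only knows field 1 looks it up and ignores the rest.
        System.out.println("field 1 = " + fields.get(1));
    }
}
```

Because a reader looks fields up by number, a message written with an extra field 2 can still be consumed by code that only knows field 1.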


Apache Thrift


Thrift originated at Facebook, and in 2007 Facebook submitted Thrift to the Apache Foundation as an open-source project. For Facebook at the time, Thrift was created to solve the problems of large-volume data transmission between Facebook's various systems and of the need for cross-platform support among them. Thrift therefore supports a variety of programming languages, such as C++, C#, Cocoa, Erlang, Haskell, Java, OCaml, Perl, PHP, Python, Ruby, and Smalltalk.

When communicating between many different languages, Thrift can serve as high-performance communication middleware; it supports data (object) serialization and multiple types of RPC services. Thrift is suited to static data exchange: the data structure must be determined in advance, and when it changes, the IDL file must be edited again, the code regenerated, and everything recompiled. Compared with other IDL tools, this can be regarded as a weakness of Thrift. Thrift is suitable for building general-purpose tools for large-scale data exchange and storage; for internal data transfer in large systems, it has obvious advantages over JSON and XML in both performance and transmission size.


Thrift is made up of 5 main parts:


1) Language system and IDL compiler: responsible for generating interface code in the corresponding language from the user-supplied IDL file;

2) TProtocol: the RPC protocol layer, where a variety of object serialization methods can be selected, such as JSON and Binary;

3) TTransport: the RPC transport layer, where likewise different transport implementations can be selected, such as socket, NIO, MemoryBuffer, etc.;

4) TProcessor: the link between the protocol layer and the user-provided service implementation, responsible for invoking the interfaces of the service implementation;

5) TServer: aggregates the TProtocol, TTransport, and TProcessor objects.


Our focus is on the codec framework, which corresponds to TProtocol. Because Thrift's RPC service invocation and its codec framework are bound together, when we use Thrift we usually adopt it as an RPC framework. However, its TProtocol codec framework can also be used independently as a class library.

Similar to Protobuf, Thrift describes interfaces and data structure definitions through IDL. It supports the 8 Java primitive types, Map, Set, and List, and supports optional and required definitions, so it is very powerful. Because the order of the fields in a data structure can be defined, it can also support forward compatibility of the protocol.


Thrift supports three typical codec approaches:


1) General-purpose binary codec;

2) Compressed binary codec;

3) Optimized compressed codec for optional fields.


Because it supports binary and compressed codecs, Thrift's codec performance is also excellent, far exceeding Java serialization, RMI, and the like.
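To give a feel for where a compressed binary codec saves space, the following sketch contrasts a fixed-width 4-byte integer encoding with a variable-length one. It assumes the compressed form writes integers as zigzag-encoded varints, as Thrift's compact protocol does; it does not use the Thrift library, and IntEncodingSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

final class IntEncodingSketch {

    // Fixed-width encoding: always 4 bytes, big-endian.
    static byte[] encodeFixed(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Zigzag maps values of small magnitude, positive or negative,
    // to small unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigzag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Variable-length encoding: zigzag, then 7 payload bits per byte.
    static byte[] encodeCompact(int value) {
        int v = zigzag(value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] samples = {1, -1, 300, Integer.MAX_VALUE};
        for (int s : samples) {
            System.out.println(s + ": fixed=" + encodeFixed(s).length
                    + " bytes, compact=" + encodeCompact(s).length + " bytes");
        }
    }
}
```

Small values, which dominate many real messages, shrink to one or two bytes, while the largest values grow to five; the trade-off pays off when most integers are small.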

JBoss Marshalling


JBoss Marshalling is a serialization API package for Java objects. It fixes many problems of the JDK's built-in serialization package while remaining compatible with the java.io.Serializable interface, and it adds some tunable parameters and extra features, all of which can be configured through factory classes.


Compared with the traditional Java serialization mechanism, its advantages are as follows:


1) A pluggable class resolver that makes it easier to customize the class-loading strategy, which can be done by implementing a single interface;

2) Pluggable object replacement, with no need to go through inheritance;

3) A pluggable predefined class cache table, which reduces the length of the serialized byte array and improves serialization performance for commonly used object types;

4) Java serialization is possible without implementing the java.io.Serializable interface;

5) Caching is used to improve object serialization performance.


Compared with the two codec frameworks described earlier, JBoss Marshalling is used mostly inside JBoss, and its scope of application is limited.

Other codec frameworks


In addition to the codec frameworks and technologies mentioned above, MessagePack, Kryo, Hessian, JSON, and others are also commonly used. For reasons of space they are not enumerated one by one here; interested readers can consult the relevant material.

Netty codec framework: why Netty provides a codec framework


For a high-performance, asynchronous NIO communication framework, the codec framework is an important part of Netty. Although from a microkernel point of view the codec framework is not part of Netty's microkernel, the codec framework built by extending ChannelHandler is indispensable.

Let's look at this topic in detail from several angles. First, consider Netty's logical architecture diagram:

Figure 2-1 Netty logical architecture diagram

An inbound message read from the network must be decoded, converting the binary datagram into an application-layer protocol message or business message, before the upper-layer application logic can recognize and process it. Likewise, an outbound business message that the user sends to the network must be encoded into a binary byte array (a ByteBuf in Netty's case) before it can be sent to the peer over the network. Encoding and decoding are an integral part of an NIO framework; whether implemented through business-specific extensions or through the framework's built-in codec capabilities, they are essential.

To lower the difficulty of development, Netty wraps commonly used functionality in APIs that hide the underlying implementation details. For developers familiar with Netty's internals, customizing codec functionality by extending ChannelHandler directly is not very difficult. But for most beginners, or for users unwilling to learn the underlying details, simpler class libraries and APIs are needed rather than ChannelHandler itself.

Netty does this very well. For codec functionality, it provides a general codec framework for users to extend and also a library of common codecs that users can use directly. While preserving extensibility, it minimizes the user's development workload and entry threshold and improves development efficiency.


Netty's preset codec functions include Base64, Protobuf, JBoss Marshalling, SPDY, and others:

Figure 2-2 List of Netty codec functions


Netty Codec framework: commonly used decoders


LineBasedFrameDecoder decoder


LineBasedFrameDecoder is a carriage-return/line-feed decoder. If the messages a user sends end with a carriage return and line feed, they can be decoded directly with Netty's LineBasedFrameDecoder. All that is needed is to add LineBasedFrameDecoder to the ChannelPipeline when initializing the Netty server or client; there is no need to implement a new line-break decoder.

LineBasedFrameDecoder works by iterating sequentially over the readable bytes in the ByteBuf to see whether they contain "\n" or "\r\n". If so, that position is the end position, and the bytes from the readable index to the end position form one line. It is a decoder keyed on the line terminator; it supports two decoding modes, with or without the terminator in the decoded frame, and supports configuring a maximum length for a single line. If no line break has been found after reading up to the maximum length, it throws an exception and discards the abnormal code stream read so far. This prevents memory overflow caused by an unbounded backlog of received ByteBuf data when a datagram carries no line break.


Its use is as follows:

Before decoding:
+------------------------------------------------------------------+
                        Received datagram
"This is a netty example for using the NIO framework.\r\n when you"
+------------------------------------------------------------------+

After decoding, the ChannelHandler receives the following object:
+------------------------------------------------------------------+
                   Text message after decoding
"This is a netty example for using the NIO framework."
+------------------------------------------------------------------+
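The scanning logic described above can be sketched in plain Java. This is not Netty's source code: it works on java.nio.ByteBuffer instead of Netty's ByteBuf, and the class LineFramer, its constructor parameters, and the use of IllegalStateException are assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class LineFramer {
    private final int maxLength;          // maximum length of a single line
    private final boolean stripDelimiter; // drop "\n" / "\r\n" from decoded lines?

    LineFramer(int maxLength, boolean stripDelimiter) {
        this.maxLength = maxLength;
        this.stripDelimiter = stripDelimiter;
    }

    // Extracts every complete line from buf. Bytes after the last line break
    // stay unread in buf until more data arrives.
    List<String> decode(ByteBuffer buf) {
        List<String> lines = new ArrayList<>();
        while (true) {
            int eol = findEndOfLine(buf);
            if (eol < 0) {
                if (buf.remaining() > maxLength) {
                    buf.position(buf.limit()); // discard the oversized data
                    throw new IllegalStateException("line exceeds " + maxLength + " bytes");
                }
                return lines;
            }
            int delimLength = buf.get(eol) == '\r' ? 2 : 1;
            int contentLength = eol - buf.position();
            if (contentLength > maxLength) {
                buf.position(eol + delimLength); // skip the oversized line
                throw new IllegalStateException("line exceeds " + maxLength + " bytes");
            }
            byte[] frame = new byte[stripDelimiter ? contentLength : contentLength + delimLength];
            buf.get(frame);
            buf.position(eol + delimLength);
            lines.add(new String(frame, StandardCharsets.UTF_8));
        }
    }

    // Returns the index where the line terminator starts ("\n" or "\r\n"), or -1.
    private static int findEndOfLine(ByteBuffer buf) {
        for (int i = buf.position(); i < buf.limit(); i++) {
            if (buf.get(i) == '\n') {
                return (i > buf.position() && buf.get(i - 1) == '\r') ? i - 1 : i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "This is a netty example for using the NIO framework.\r\n when you"
                .getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.wrap(data);
        System.out.println(new LineFramer(1024, true).decode(buf));
        System.out.println(buf.remaining() + " bytes left waiting for a line break");
    }
}
```

Leaving the incomplete tail unread mirrors how a frame decoder accumulates bytes across reads until a full line is available.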



In general, LineBasedFrameDecoder is used together with StringDecoder to form a line-based text decoder. For parsing text protocols, a line-based text decoder is very practical, for example for parsing HTTP message headers or FTP protocol messages.

Here is a brief example of how to use the line-based text decoder:

    @Override
    protected void initChannel(SocketChannel arg0) throws Exception {
        // Split the inbound byte stream into lines of at most 1024 bytes.
        arg0.pipeline().addLast(new LineBasedFrameDecoder(1024));
        // Convert each line from ByteBuf to String.
        arg0.pipeline().addLast(new StringDecoder());
        // Business handler; receives one String per line.
        arg0.pipeline().addLast(new UserServerHandler());
    }

When initializing the channel, first add LineBasedFrameDecoder to the ChannelPipeline, then add the string decoder StringDecoder and the business handler in turn.


DelimiterBasedFrameDecoder decoder

DelimiterBasedFrameDecoder is a delimiter decoder: the user specifies the delimiter that ends a message, and it automatically decodes messages from a code stream in which that delimiter marks the end of each message. The carriage-return/line-feed decoder is in effect a special case of DelimiterBasedFrameDecoder.

Delimiter decoders are also widely used in practice. The author works in the telecommunications industry, where many simple private text protocols use a special delimiter to mark the end of a message, particularly text-based private protocols that run over long-lived connections.

Specifying the delimiter: contrary to what most people would expect, the delimiter is passed to the constructor not as a char or String but as a ByteBuf. A practical example of its usage follows.

If messages use "$_" as the delimiter, the code that initializes the ChannelPipeline on the server or client is as follows:

    @Override
    public void initChannel(SocketChannel ch) throws Exception {
        // The delimiter must be supplied as a ByteBuf.
        ByteBuf delimiter = Unpooled.copiedBuffer("$_".getBytes());
        // Frames end at "$_"; the maximum frame length is 1024 bytes.
        ch.pipeline().addLast(new DelimiterBasedFrameDecoder(1024, delimiter));
        ch.pipeline().addLast(new StringDecoder());
        ch.pipeline().addLast(new UserServerHandler());
    }


First, "$_" is converted to a ByteBuf object and passed as a parameter to construct the DelimiterBasedFrameDecoder, which is added to the ChannelPipeline. Then the string decoder (commonly used for text decoding) and the user handler are added. Note the order in which the decoders and the handler are added: if the order is reversed, message decoding will fail.

DelimiterBasedFrameDecoder principle analysis: during decoding, it checks whether the ByteBuf currently being read contains the delimiter ByteBuf; if it does, it slices off the corresponding ByteBuf and returns it.
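The idea can be sketched in plain Java as follows. This is an illustration of the principle only, not Netty's implementation: it uses java.nio.ByteBuffer and a byte[] delimiter in place of ByteBuf, and the class DelimiterFramer, its methods, and the exception types are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class DelimiterFramer {

    // Returns the offset (relative to the read position) of the first
    // occurrence of delimiter in the readable bytes, or -1 if absent.
    static int indexOf(ByteBuffer haystack, byte[] delimiter) {
        for (int i = haystack.position(); i <= haystack.limit() - delimiter.length; i++) {
            int j = 0;
            while (j < delimiter.length && haystack.get(i + j) == delimiter[j]) {
                j++;
            }
            if (j == delimiter.length) {
                return i - haystack.position();
            }
        }
        return -1;
    }

    // Cuts buf into frames at each delimiter. Bytes after the last delimiter
    // stay unread in buf until more data arrives.
    static List<String> decode(ByteBuffer buf, byte[] delimiter, int maxFrameLength) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("empty delimiter");
        }
        List<String> frames = new ArrayList<>();
        int frameLength;
        while ((frameLength = indexOf(buf, delimiter)) >= 0) {
            if (frameLength > maxFrameLength) {
                // Skip the oversized frame and its delimiter, then fail.
                buf.position(buf.position() + frameLength + delimiter.length);
                throw new IllegalStateException("frame exceeds " + maxFrameLength + " bytes");
            }
            byte[] frame = new byte[frameLength];
            buf.get(frame);                                  // the message body
            buf.position(buf.position() + delimiter.length); // skip the delimiter
            frames.add(new String(frame, StandardCharsets.UTF_8));
        }
        return frames;
    }

    public static void main(String[] args) {
        byte[] delimiter = "$_".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.wrap("hello$_world$_par".getBytes(StandardCharsets.UTF_8));
        System.out.println(decode(buf, delimiter, 1024));
        System.out.println(buf.remaining() + " bytes left waiting for a delimiter");
    }
}
```

With "\n" as the delimiter this reduces to line-based framing, which is why the line decoder can be seen as a special case of the delimiter decoder.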
