Hadoop Serialization System


This article is the set of notes from my second read-through of the Hadoop 0.20.2 source code. I ran into many problems along the way and eventually resolved most of them by one means or another. Hadoop as a whole is well designed, and its source code is worth reading for anyone studying distributed systems. I will post all of my notes one by one, in the hope that they make the Hadoop source easier to read and save others some detours.

1 Serialization Core Technology

In Hadoop 0.20.2, ObjectWritable supports serializing the following kinds of data:

| Data type | Example | Description |
|---|---|---|
| ObjectWritable.NullInstance | ObjectWritable.NullInstance | Stands in for a null value and internally records the declared type of that null |
| String | String | Written as a UTF-8 encoded string by UTF8.writeString |
| Java primitive types | byte, char, short, int, long, float, double, boolean | Each type is handled by its own write call |
| Enum | enum | The name of the enum constant is written so that it can be restored |
| Writable | VIntWritable, ObjectWritable.NullInstance | Any data type that implements the Writable interface |
| Array | | Every element must be one of the types above and is written one by one |

As this list shows, the core of serialization in the current version of Hadoop (from here on, "Hadoop" without a version number means 0.20.2) is the Writable type. Writable is an interface that declares two methods. Every class that implements it must implement both, and objects of such a class can then be serialized. Here is the definition of the Writable interface:

public interface Writable {
  /**
   * Serialize the fields of this object to out.
   *
   * @param out DataOutput to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /**
   * Deserialize the fields of this object.
   *
   * For efficiency, implementations should attempt to re-use storage in the
   * existing object where possible.
   *
   * @param in DataInput to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}


Writable defines two methods: one serializes the object to an output stream, and the other restores the object by reading bytes from an input stream. This is the core of every serialization system: turning all kinds of data into a byte sequence so that it can be stored on a storage device or transmitted over the network. Hadoop contains many implementations of the Writable interface; some are of little use outside Hadoop and some are obsolete. These types are listed below:



Figure 1 Types of writable interfaces implemented in Hadoop
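To make the write/readFields contract concrete, here is a minimal sketch. It declares a local Writable interface with the same two methods as Hadoop's, only so that it runs without Hadoop on the classpath. The names WritableRoundTrip and PointWritable are hypothetical and used purely for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WritableRoundTrip {
    // Local stand-in with the same two methods as Hadoop's Writable (assumption:
    // declared here only so the sketch compiles without Hadoop).
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical example type: a pair of ints.
    static final class PointWritable implements Writable {
        int x;
        int y;

        PointWritable() { }                    // empty instance, to be filled by readFields

        PointWritable(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);                   // fields are written in a fixed order
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readInt();                  // read back in the same order,
            y = in.readInt();                  // overwriting this object's fields
        }
    }

    // Serialize any Writable into a byte array.
    static byte[] toBytes(Writable w) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Fill an existing object from bytes produced by toBytes.
    static void fromBytes(Writable target, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new PointWritable(3, -7));
        PointWritable copy = new PointWritable();
        fromBytes(copy, data);
        System.out.println(data.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}
```

Note that readFields fills an object that already exists instead of returning a new one, which is what lets a caller re-use the same instance for many records.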

2 Comparing Objects of Writable Types

Figure 1 includes the WritableComparable interface, which extends both Writable and Comparable and so combines serialization with ordering. It is defined as follows:

public interface WritableComparable<T> extends Writable, Comparable<T> {
}

Many data types implement this interface to provide both serialization and comparison. Two types are involved in comparison: the interface WritableComparable and the class WritableComparator. Types that implement WritableComparable typically contain code like the following, taken from IntWritable:

/** Compares two IntWritables. */
public int compareTo(Object o) {
  int thisValue = this.value;
  int thatValue = ((IntWritable) o).value;
  return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
}

/** A Comparator optimized for IntWritable. */
public static class Comparator extends WritableComparator {
  public Comparator() {
    super(IntWritable.class);
  }

  public int compare(byte[] b1, int s1, int l1,
                     byte[] b2, int s2, int l2) {
    int thisValue = readInt(b1, s1);
    int thatValue = readInt(b2, s2);
    return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
  }
}

static {                                        // register this comparator
  WritableComparator.define(IntWritable.class, new Comparator());
}


The first method, compareTo, implements the Comparable interface; after it comes a static nested class. compareTo compares object values, while the nested Comparator class works on bytes. Its compare method still converts the bytes to int values before comparing them, but because it reads them straight from the serialized form, serialized records can be compared without first being deserialized into objects.

The code ends with a static initializer block. A type that implements WritableComparable in this style has a compareTo method, a nested Comparator class, and a one-line static initializer. That line can be seen as registering the WritableComparable implementation: it passes the class object of the current class, together with an instance of its nested Comparator, to WritableComparator.define. WritableComparator therefore stores one comparator per WritableComparable class rather than one per object. Because Comparator is a static nested class, it does not depend on instance members or methods of the enclosing class, so all objects of the same class can share a single comparator.
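The registration and byte-level comparison described above can be sketched without Hadoop. This is an illustration of the idea, not Hadoop's implementation: RawCompareSketch, RawComparator, IntRecord and IntComparator are hypothetical names, and the registry is a plain map keyed by class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RawCompareSketch {
    // Hypothetical counterpart of WritableComparator: compares serialized
    // records in place and keeps one shared comparator per registered class.
    abstract static class RawComparator {
        private static final Map<Class<?>, RawComparator> REGISTRY = new ConcurrentHashMap<>();

        static void define(Class<?> type, RawComparator comparator) {
            REGISTRY.put(type, comparator);
        }

        static RawComparator get(Class<?> type) {
            return REGISTRY.get(type);
        }

        // Decode a big-endian int, the layout produced by DataOutput.writeInt.
        static int readInt(byte[] bytes, int start) {
            return ((bytes[start] & 0xff) << 24)
                 | ((bytes[start + 1] & 0xff) << 16)
                 | ((bytes[start + 2] & 0xff) << 8)
                 |  (bytes[start + 3] & 0xff);
        }

        abstract int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
    }

    // Hypothetical int-valued record type with its own raw comparator.
    static final class IntRecord {
        static final class IntComparator extends RawComparator {
            @Override
            int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
                return Integer.compare(readInt(b1, s1), readInt(b2, s2));
            }
        }

        static {   // runs once, when IntRecord is first initialized
            RawComparator.define(IntRecord.class, new IntComparator());
        }

        // Serialize an int the way DataOutput.writeInt does (4 bytes, big-endian).
        static byte[] encode(int value) {
            return new byte[] {
                (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value
            };
        }
    }

    public static void main(String[] args) {
        byte[] a = IntRecord.encode(-5);      // first use also runs IntRecord's static block
        byte[] b = IntRecord.encode(300);
        RawComparator comparator = RawComparator.get(IntRecord.class);
        System.out.println(comparator.compare(a, 0, 4, b, 0, 4) < 0);
    }
}
```

One detail worth noticing: a static block only runs when its class is initialized, and merely writing IntRecord.class does not trigger that. The sketch calls IntRecord.encode before looking up the comparator for this reason.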

3 Writable Type Factory

The Hadoop serialization system defines a WritableFactory interface and a WritableFactories class. WritableFactory is implemented in only one form, as in the following code:

static {                                      // register a ctor
  WritableFactories.setFactory
    (BlockLocation.class,
     new WritableFactory() {
       public Writable newInstance() { return new BlockLocation(); }
     });
}



The goal is to register Writable types with WritableFactories, which can then create instances of the types registered with it. I think the main benefit of this centralized approach is that it makes the available Writable types easier to manage. In a large system where Writable types are scattered throughout the code, it is not always obvious whether a given type is a Writable.
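A registry of this kind can be sketched in a few lines. The following is an assumption-laden illustration rather than Hadoop's code: FactoryRegistrySketch and Block are hypothetical names, and java.util.function.Supplier stands in for the WritableFactory interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class FactoryRegistrySketch {
    private static final Map<Class<?>, Supplier<?>> FACTORIES = new ConcurrentHashMap<>();

    // Register a factory for a type (the role played by WritableFactories.setFactory).
    static <T> void setFactory(Class<T> type, Supplier<? extends T> factory) {
        FACTORIES.put(type, factory);
    }

    static boolean isRegistered(Class<?> type) {
        return FACTORIES.containsKey(type);
    }

    // Create an instance through the registered factory.
    static <T> T newInstance(Class<T> type) {
        Supplier<?> factory = FACTORIES.get(type);
        if (factory == null) {
            throw new IllegalStateException("no factory registered for " + type.getName());
        }
        return type.cast(factory.get());
    }

    // Hypothetical type that registers itself from a static block.
    static final class Block {
        long offset;

        static {
            FactoryRegistrySketch.setFactory(Block.class, Block::new);
        }
    }

    public static void main(String[] args) {
        new Block();    // first use initializes Block and runs its static block
        Block created = newInstance(Block.class);
        System.out.println(isRegistered(Block.class) + " " + (created != null));
    }
}
```

Because every type registers itself in one place, asking the registry whether a class is known becomes a simple lookup, which is the management convenience mentioned above.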

4 The ObjectWritable Type

The core class of the Hadoop serialization system is ObjectWritable, a general-purpose wrapper that can handle every supported kind of data. Because reading is simply the reverse of writing, only the write code is described here:

/** Write a {@link Writable}, {@link String}, primitive type, or an array of
 * the preceding. */
public static void writeObject(DataOutput out, Object instance,
                               Class declaredClass,
                               Configuration conf) throws IOException {

  if (instance == null) {                       // null
    instance = new NullInstance(declaredClass, conf);
    declaredClass = Writable.class;
  }

  UTF8.writeString(out, declaredClass.getName()); // always write declared

  if (declaredClass.isArray()) {                // array
    int length = Array.getLength(instance);
    out.writeInt(length);
    for (int i = 0; i < length; i++) {
      writeObject(out, Array.get(instance, i),
                  declaredClass.getComponentType(), conf);
    }

  } else if (declaredClass == String.class) {   // String
    UTF8.writeString(out, (String) instance);

  } else if (declaredClass.isPrimitive()) {     // primitive type

    if (declaredClass == Boolean.TYPE) {        // boolean
      out.writeBoolean(((Boolean) instance).booleanValue());
    } else if (declaredClass == Character.TYPE) { // char
      out.writeChar(((Character) instance).charValue());
    } else if (declaredClass == Byte.TYPE) {    // byte
      out.writeByte(((Byte) instance).byteValue());
    } else if (declaredClass == Short.TYPE) {   // short
      out.writeShort(((Short) instance).shortValue());
    } else if (declaredClass == Integer.TYPE) { // int
      out.writeInt(((Integer) instance).intValue());
    } else if (declaredClass == Long.TYPE) {    // long
      out.writeLong(((Long) instance).longValue());
    } else if (declaredClass == Float.TYPE) {   // float
      out.writeFloat(((Float) instance).floatValue());
    } else if (declaredClass == Double.TYPE) {  // double
      out.writeDouble(((Double) instance).doubleValue());
    } else if (declaredClass == Void.TYPE) {    // void
    } else {
      throw new IllegalArgumentException("Not a primitive: " + declaredClass);
    }
  } else if (declaredClass.isEnum()) {          // enum
    UTF8.writeString(out, ((Enum) instance).name());
  } else if (Writable.class.isAssignableFrom(declaredClass)) { // Writable
    UTF8.writeString(out, instance.getClass().getName());
    ((Writable) instance).write(out);

  } else {
    throw new IOException("Can't write: " + instance + " as " + declaredClass);
  }
}



The parameters mean the following:

out - the output stream to serialize to;

instance - the object to serialize;

declaredClass - the declared type of instance;

conf - the Hadoop Configuration, passed on to NullInstance and to the recursive calls.


The method works as follows:

First, if instance is null, a NullInstance is created that records declaredClass as its type, and declaredClass is then set to Writable.class.

Second, the name of declaredClass is written. What is written next depends on the type:

1. If declaredClass is an array type, the array length is written first, and then each element is written by a recursive call to writeObject.

2. If it is String, UTF8.writeString writes the string in UTF-8 encoding.

3. If it is a primitive type, the value is written with the matching DataOutput method; a boolean, for example, is written as a single byte, 0 or 1.

4. If it is an enum type, the name of the enum constant is written.

5. If it is a Writable type, the class name of the instance is written, and then the instance's own write method is called.

Note: when UTF8 writes a string, it first writes the length of the encoded string as a short and then the byte sequence produced by the UTF-8 encoding. The length prefix is what allows the string to be read back from the stream.
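The dispatch described above, together with the reverse (read) direction that the article leaves out, can be sketched with the standard library alone. This is a reduced illustration, not ObjectWritable itself: it handles only arrays, String, int and boolean, the name TaggedCodecSketch is hypothetical, and DataOutput.writeUTF is used as a stand-in for UTF8.writeString because it also writes a two-byte length followed by the encoded bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Array;

final class TaggedCodecSketch {
    // Write the declared type name first, then a payload that depends on the type.
    static void writeObject(DataOutput out, Object instance, Class<?> declared) throws IOException {
        out.writeUTF(declared.getName());
        if (declared.isArray()) {                          // array: length, then each element
            int length = Array.getLength(instance);
            out.writeInt(length);
            for (int i = 0; i < length; i++) {
                writeObject(out, Array.get(instance, i), declared.getComponentType());
            }
        } else if (declared == String.class) {             // String: 2-byte length + bytes
            out.writeUTF((String) instance);
        } else if (declared == Integer.TYPE) {             // int
            out.writeInt((Integer) instance);
        } else if (declared == Boolean.TYPE) {             // boolean: one byte, 0 or 1
            out.writeBoolean((Boolean) instance);
        } else {
            throw new IOException("Can't write: " + instance + " as " + declared);
        }
    }

    // Reverse process: read the type name, then the matching payload.
    static Object readObject(DataInput in) throws IOException {
        Class<?> declared = resolve(in.readUTF());
        if (declared.isArray()) {
            int length = in.readInt();
            Object array = Array.newInstance(declared.getComponentType(), length);
            for (int i = 0; i < length; i++) {
                Array.set(array, i, readObject(in));
            }
            return array;
        } else if (declared == String.class) {
            return in.readUTF();
        } else if (declared == Integer.TYPE) {
            return in.readInt();
        } else if (declared == Boolean.TYPE) {
            return in.readBoolean();
        }
        throw new IOException("Can't read: " + declared);
    }

    // Class.forName does not resolve primitive names, so map them by hand.
    private static Class<?> resolve(String name) throws IOException {
        if (name.equals("int")) return Integer.TYPE;
        if (name.equals("boolean")) return Boolean.TYPE;
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IOException("Unknown type: " + name, e);
        }
    }

    static byte[] encode(Object instance, Class<?> declared) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeObject(out, instance, declared);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return readObject(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] back = (int[]) decode(encode(new int[] {1, 2, 3}, int[].class));
        System.out.println(back.length + " elements, " + decode(encode("hi", String.class)));
    }
}
```

The sketch also makes one cost of this format visible: the type name is repeated for every array element, so the format favors generality over compactness.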

5 Other Serialization Systems

A good serialization framework should be highly efficient in both time and space, and it should also take extensibility and compatibility between old and new protocol versions into account. The original post showed performance data here for some common serialization frameworks, taken from a well-known benchmark.



5.1 Hessian

Hessian's full name is the Hessian binary web service protocol. It is a lightweight, open-source RPC framework from Caucho: http://hessian.caucho.com. Hessian communicates over HTTP and exposes services through a servlet, and its communication is more efficient than that of SOAP-style web services and Java's built-in serialization.

A call from a Hessian client is processed as follows:

The Hessian client serializes the call request and sends the serialized bytes to the server in an HTTP request. The server deserializes the request, executes the corresponding service, and returns the result to the client through the same serialization and deserialization steps.

For sample code, see the references. The examples on the web all deploy only a single service, so how do you deploy several? Consider the following web.xml configuration:









To configure multiple services, make one copy of the first servlet node for each additional service.


Reference documents:

[1] Official website

[2] Hessian parsing and application (Spring integration)

[3] Hessian Introduction

[4] Charming Hessian, you need to understand

[5] Hessian Protocol-dubbo-alibaba Open Sesame

5.2 Kryo

Kryo is a fast and efficient object graph serialization framework for Java. Its goals are speed, efficiency, and an easy-to-use API. The project is useful whenever objects need to be persisted, whether to a file, a database, or over the network.

The performance data referred to above shows that Kryo has clear advantages in both speed and size. However, according to an engineer at Taobao, the framework has only a Java implementation (not a problem in itself), and the serialized bytes contain no field metadata, so it is hard to keep old and new versions of a protocol compatible.

Sample code:

Kryo kryo = new Kryo();
// ...
Output output = new Output(new FileOutputStream("file.bin"));
SomeClass someObject = ...
kryo.writeObject(output, someObject);
output.close();
// ...
Input input = new Input(new FileInputStream("file.bin"));
SomeClass someObject = kryo.readObject(input, SomeClass.class);
input.close();

As you can see, it is very easy to use.


[1] Official website

[2] The Kryo of the Java serialization framework

[3] Java object Serialization Framework Kryo

[4] Why kryo faster than Hessian

5.3 Protostuff-runtime

Google's Protobuf is an excellent serialization tool: it is cross-language, fast, and produces small serialized output.

One drawback of Protobuf is that data structures must go through a precompilation step. You first write a definition file in the .proto format and then use the tools that Protobuf provides to generate code for each target language. Because Java has reflection and dynamic code generation, this precompilation step is not strictly necessary and can be performed while the code runs.

Protostuff is based on Google Protobuf but provides more functionality and is simpler to use. Its protostuff-runtime module can serialize and deserialize Java beans in Protobuf format without a precompilation step:

Schema<Foo> schema = RuntimeSchema.getSchema(Foo.class);
LinkedBuffer buffer = getApplicationBuffer();

// serialize
byte[] protostuff;
try {
    protostuff = ProtostuffIOUtil.toByteArray(foo, schema, buffer);
} finally {
    buffer.clear();
}

// deserialize
Foo f = new Foo();
ProtostuffIOUtil.mergeFrom(protostuff, f, schema);

The limitation of protostuff-runtime is that the schema must be supplied before serialization. Deserialization does not create the object; it only copies data into an object you have already created, so a default constructor is required. In addition, Protostuff can serialize to formats such as JSON, YAML, and XML based on the Protobuf configuration.
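The idea that a schema can be derived at run time by reflection, and that deserialization merges into an object the caller created, can be sketched as follows. This is not how protostuff-runtime is implemented and it does not produce Protobuf's wire format: ReflectiveSchemaSketch and Foo are hypothetical names, only int and non-null String fields are supported, and fields are ordered by name as a simplifying assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReflectiveSchemaSketch<T> {
    private final List<Field> fields = new ArrayList<>();

    // Build the "schema" at run time by inspecting the class.
    ReflectiveSchemaSketch(Class<T> type) {
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (Modifier.isStatic(modifiers) || Modifier.isTransient(modifiers) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            fields.add(field);
        }
        // Sort by name so the field order does not depend on reflection order.
        fields.sort(Comparator.comparing(Field::getName));
    }

    byte[] toByteArray(T message) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Field field : fields) {
                if (field.getType() == int.class) {
                    out.writeInt(field.getInt(message));
                } else if (field.getType() == String.class) {
                    out.writeUTF((String) field.get(message));   // assumes non-null
                } else {
                    throw new IllegalArgumentException("unsupported field: " + field);
                }
            }
        } catch (IOException | IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    // Copies the decoded values into an object the caller has already created.
    void mergeFrom(byte[] data, T message) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (Field field : fields) {
                if (field.getType() == int.class) {
                    field.setInt(message, in.readInt());
                } else if (field.getType() == String.class) {
                    field.set(message, in.readUTF());
                } else {
                    throw new IllegalArgumentException("unsupported field: " + field);
                }
            }
        } catch (IOException | IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical bean with a default constructor.
    static final class Foo {
        int id;
        String name;
    }

    public static void main(String[] args) {
        ReflectiveSchemaSketch<Foo> schema = new ReflectiveSchemaSketch<>(Foo.class);
        Foo foo = new Foo();
        foo.id = 42;
        foo.name = "bar";
        Foo copy = new Foo();                       // the caller creates the target
        schema.mergeFrom(schema.toByteArray(foo), copy);
        System.out.println(copy.id + " " + copy.name);
    }
}
```

No field names are written to the output; both sides rely on the schema to know what each byte means, which is also why a changed class definition needs care.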


[1] Official website | Wiki Protostuffruntime | Wiki things you need to know

[2] Java serialization Lib Protostuff

5.4 Msgpack

Msgpack is an efficient binary object serialization framework. It is similar to JSON but faster and smaller. Msgpack has client libraries for a particularly wide range of languages, including all the common ones. Compared with JSON, serialization and deserialization reportedly take less than one-third of the time, and the serialized output is about half the size. The official site claims that it is 4 times faster than Google Protocol Buffers.

Sample code:

// Create the objects to serialize.
List<String> src = new ArrayList<String>();
src.add("msgpack");
src.add("kumofs");
src.add("viver");

MessagePack msgpack = new MessagePack();
// Serialize
byte[] raw = msgpack.write(src);

// Deserialize directly using a template
List<String> dst1 = msgpack.read(raw, Templates.tList(Templates.TString));
System.out.println(dst1.get(0));
System.out.println(dst1.get(1));
System.out.println(dst1.get(2));

// Or, deserialize to Value, then convert the type.
Value dynamic = msgpack.read(raw);
List<String> dst2 = new Converter(dynamic)
        .read(Templates.tList(Templates.TString));
System.out.println(dst2.get(0));
System.out.println(dst2.get(1));
System.out.println(dst2.get(2));
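To see where the size difference from JSON comes from, the same three strings can be encoded by hand. The sketch below does not use the msgpack library; it implements only two formats from the MessagePack specification, fixarray (one byte, 0x90 plus the element count, up to 15 elements) and fixstr (one byte, 0xa0 plus the byte length, up to 31 bytes). MsgPackSizeSketch is a hypothetical name, and the JSON side assumes strings that need no escaping.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class MsgPackSizeSketch {
    // Encode a short list of short strings as fixarray + fixstr.
    static byte[] pack(List<String> items) {
        if (items.size() > 15) {
            throw new IllegalArgumentException("fixarray holds at most 15 elements");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x90 | items.size());                 // array header: 1 byte
        for (String item : items) {
            byte[] utf8 = item.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > 31) {
                throw new IllegalArgumentException("fixstr holds at most 31 bytes");
            }
            out.write(0xa0 | utf8.length);              // string header: 1 byte
            out.write(utf8, 0, utf8.length);            // raw UTF-8 bytes, no quotes
        }
        return out.toByteArray();
    }

    // The same data as compact JSON text (assumes no characters need escaping).
    static byte[] json(List<String> items) {
        StringBuilder text = new StringBuilder("[");
        for (int i = 0; i < items.size(); i++) {
            if (i > 0) {
                text.append(',');
            }
            text.append('"').append(items.get(i)).append('"');
        }
        text.append(']');
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> src = Arrays.asList("msgpack", "kumofs", "viver");
        System.out.println("binary: " + pack(src).length + " bytes, JSON: " + json(src).length + " bytes");
    }
}
```

For this input the binary form takes 22 bytes and the JSON text 28: each string costs one header byte instead of two quotes, and separators disappear. The saving depends heavily on the data, so the ratios quoted above should be read as claims for particular workloads.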

Reference documents:

[1] msgpack.org

[2] Msgpack

[3] A serialization package that is 10 times faster than JSON: Msgpack

[4] Binary data format messagepack: Faster and lighter than JSON

5.5 Json/jackson/databind

The Jackson project consists of several components: the core streaming parser/generator, Jackson annotations, and the Jackson data processors. The databind library for JSON is one of the data processors. The most common use of Json/jackson/databind is to build POJOs (plain old Java objects) from JSON or, conversely, to serialize POJOs into JSON strings.

Here is the sample code:

// Note: can use getters/setters as well; here we just use public fields directly:
public class MyValue {
  public String name;
  public int age;
  // NOTE: if using getters/setters, can keep fields `protected` or `private`
}

// Declare a com.fasterxml.jackson.databind.ObjectMapper for serialization and deserialization
ObjectMapper mapper = new ObjectMapper(); // create once, reuse

// Deserialization: read object data from JSON into a class instance
MyValue value = mapper.readValue(new File("data.json"), MyValue.class);
// or:
value = mapper.readValue(new URL("http://some.com/api/entry.json"), MyValue.class);
// or:
value = mapper.readValue("{\"name\":\"Bob\", \"age\":13}", MyValue.class);

// Serialization: save object data as JSON
mapper.writeValue(new File("result.json"), myResultObject);
// or:
byte[] jsonBytes = mapper.writeValueAsBytes(myResultObject);
// or:
String jsonString = mapper.writeValueAsString(myResultObject);

As you can see, the API is very simple to use. Jackson also supports generic collections and a tree model. See the reference for details:

[1] Official website

5.6 Json/flexjson

Flexjson is a lightweight framework for serializing Java objects into JSON and deserializing JSON back into Java objects. It differs from other serialization systems in two ways:

(1) Flexjson lets you control whether an object is serialized deeply or shallowly. Most JSON serialization tools try to serialize every object reachable in the object graph into JSON text, which can cause problems. Suppose, for example, that you want to send the client a single object that is connected to many others on the server. A typical JSON serializer will try to transmit the entire object graph, so that object may not reach the client in the form you intended. Serializing an object from an object-oriented model and sending it over the network can therefore be troublesome.

(2) Many JSON serialization tools require a lot of boilerplate code for every serialization or deserialization. Flexjson tries to minimize this boilerplate by providing a higher-level API.

All of the code below uses the entity classes from the class diagram in the original post. You control how deep serialization goes. By default Flexjson serializes shallowly: the object's immediate members are written, but collections are not, and nested custom objects may be left out as well:

Code:

public String doSomething(Object arg1, ...) {
    Person p = ...load a person...;
    JSONSerializer serializer = new JSONSerializer();
    return serializer.serialize(p);
}

Output:

{
    "class": "Person",
    "name": "William Shakespeare",
    "nickname": "Bill"
}


The following code also serializes phoneNumbers:

public String doSomething(Object arg1, ...) {
    Person p = ...load a person...;
    return new JSONSerializer().include("phoneNumbers").serialize(p);
}

Output:

{
    "class": "Person",
    "name": "William Shakespeare",
    "nickname": "Bill",
    "phoneNumbers": [
        {
            "class": "Phone",
            "name": "cell",
            "number": "555-123-4567"
        },
        {
            "class": "Phone",
            "name": "home",
            "number": "555-987-6543"
        },
        {
            "class": "Phone",
            "name": "work",
            "number": "555-678-3542"
        }
    ]
}




The include method of JSONSerializer accepts multiple arguments, so several members can be serialized selectively:

public String doSomething(Object arg1, ...) {
    Person p = ...load a person...;
    return new JSONSerializer().include("phoneNumbers", "addresses").serialize(p);
}

You can even specify that only addresses.zipcode is serialized, and Flexjson works out how it should be serialized. Flexjson can also customize the generated JSON string in many ways to meet the needs of different consumers, such as JavaScript. The official Flexjson documentation is a valuable reference and may be worth translating later. References:

[1] Official website

[2] Converting JSON to objects and objects to JSON: flexible use of Flexjson

5.7 Json/google-gson/databind

Gson is an open-source Google project that converts Java objects to JSON and JSON to Java objects. Gson can serialize and deserialize arbitrary Java objects, even when you do not have their source code.

Many open-source projects convert between Java objects and JSON, but most of them require you to add Java annotations to your classes, which you cannot do without the source code, and many do not fully support generics. Gson treats these two points as its most important design goals.

Gson's design goals are as follows:

(1) Provide simple toJson() and fromJson() methods to convert Java objects to JSON and vice versa;

(2) Allow existing objects that cannot be modified to be converted to and from JSON;

(3) Provide extensive support for Java generics;

(4) Allow custom representations for objects;

(5) Support arbitrarily complex objects, including deep inheritance hierarchies and extensive use of generic types.


[1] Official website

5.8 Json/fastjson/databind

Fastjson is a high-performance, full-featured JSON library written in Java. According to its authors, Fastjson uses an original algorithm that pushes parsing speed to the extreme, surpassing all other JSON libraries, including the once-touted Jackson, and even Google's binary Protocol Buffers. It supports the various JDK types, including primitive types, JavaBeans, Collection, Map, Enum, and generics, and it supports circular references. It needs no extra jars and runs directly on the JDK.

Fastjson was developed in China.

5.9 Bson/jackson/databind

BSON is a binary encoding of JSON. bson4jackson plugs into Jackson so that Jackson can read and write BSON documents. Because bson4jackson is fully integrated with Jackson, you can use the Jackson API to serialize POJOs to BSON. BSON is also the primary data interchange format of the MongoDB database.

Define a JavaBean class, Person:

public class Person {
  private String _name;

  public void setName(String name) {
    _name = name;
  }

  public String getName() {
    return _name;
  }
}



The Jackson API is then used to serialize and deserialize:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import com.fasterxml.jackson.databind.ObjectMapper;
import de.undercouch.bson4jackson.BsonFactory;

public class ObjectMapperSample {
  public static void main(String[] args) throws Exception {
    // create dummy POJO
    Person bob = new Person();
    bob.setName("Bob");

    // serialize data
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectMapper mapper = new ObjectMapper(new BsonFactory());
    mapper.writeValue(baos, bob);

    // deserialize data
    ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
    Person clone_of_bob = mapper.readValue(bais, Person.class);

    assert bob.getName().equals(clone_of_bob.getName());
  }
}



Reference documents:

[1] Binary JSON with Bson4jackson

[2] Official website

5.10 Xml/xstream+c

Many serialization tools use JSON as the intermediate format. I think one reason is that JSON is very simple to parse; another may be that JSON stores data more compactly. In XML, every element has an opening and a closing tag, and documents carry header and trailer information. That much redundancy is ill-suited to network transmission, and an overly complex format makes parsers complex and slow. This is why JSON serialization frameworks are so common.

XStream is a simple XML serialization framework. The sample code is as follows:

package com.hmkcode.vo;

import java.util.LinkedList;
import java.util.List;

public class Article {
    private String title;
    private String url;
    private boolean published;
    private List categories;
    private List tags;

    // getters & setters
}

XStream xs = new XStream();

// serialize the object to XML
String xml = xs.toXML(createArticle());

// deserialize the XML back into an object
Article article = (Article) xs.fromXML(xml);

The resulting XML is as follows:


XStream has the following characteristics:

(1) Ease of use. A high-level facade simplifies the common use cases;

(2) No mappings required. Most objects can be serialized without specifying any mapping;

(3) Performance. Speed and a low memory footprint are central design goals, which makes XStream suitable for large object graphs and for systems with high message throughput;

(4) Clean XML. The generated XML leaves out unnecessary information, so it is easy for people to read and more compact than the output of native Java serialization;

(5) No modifications to objects required. Internal fields are serialized, including private and final ones. Non-public and inner classes are supported, and classes do not need a default constructor;

(6) Full object graph support. Duplicate references between objects are preserved, and circular references are supported;

(7) Integration with other data structures. XStream can serialize to and deserialize from tree structures other than XML;

(8) Customizable conversion strategies. You can register how a particular type is represented as XML;

(9) Error messages. When a problem occurs while parsing XML, XStream reports detailed diagnostics;

(10) Alternative output formats. For example, XStream now supports JSON.

5.11 Avro, Protobuf, Thrift

The serialization frameworks mentioned most often today are Avro, Protobuf, and Thrift, not only because all three are powerful and perform well, but also because they are deployed at some of the most successful IT companies in the world: Avro is used in Hadoop, Protobuf is the serialization framework Google uses internally, and Facebook uses Thrift. There are many discussions and comparisons of these systems on the web. As the most powerful frameworks, they combine a variety of serialization and deserialization techniques. Their content is quite complex and hard to understand thoroughly, so here I only excerpt some views and explanations from the web.

Avro and Thrift are both cross-language, high-performance, binary-based communication middleware. Both provide data serialization and RPC services. Their overall functionality is similar, but their philosophies differ. Thrift came out of Facebook, where it is used for communication between back-end services, and its design emphasizes a unified programming interface for a multi-language communication framework. Avro comes from Doug Cutting, the father of Hadoop. Thrift was already quite popular when Avro was launched; Avro's goal is not just to provide Thrift-like communication middleware but to establish a new, standard protocol for data exchange and storage in cloud computing. The two also differ in outlook. Thrift holds that no single solution is perfect for every problem, so it tries to remain a neutral framework into which different implementations can be plugged and made to interoperate. Avro leans towards practicality: it rules out the confusion that multiple alternatives can cause, advocates a single unified standard, and does not mind adopting specific optimizations. Avro's innovation is to combine an explicit, declarative schema with an efficient binary data representation, emphasizing self-describing data and overcoming the shortcomings of earlier pure-XML or pure-binary systems. Avro's dynamic schema loading, which Thrift's programming interface lacks, suits applications such as Hive and Pig on Hadoop and NoSQL stores, which are ad hoc yet still demand performance.

Serialization has moved from XML to JSON and from JSON to Avro and Google Protocol Buffers. The technology keeps evolving, and the interval between generations keeps getting shorter. One article even sums up the history of serialization as "XML -> JSON -> Protobuf & Thrift & Avro". As that article says:

XML is a standard information interchange format that is easy to understand, extensible, and widely supported. To implement a web service, SOAP always had to be supported.

In fact, SOAP has some problems. By default it uses complex POST commands to make calls, and both requests and responses are wrapped in SOAP envelopes. Sacrificing performance for robustness and rich functionality is clumsy, especially since SOAP-style calls are hard to cache. As a result, although some regard abandoning SOAP in favor of plain HTTP GET and POST as a step backwards, mainstream web services still use HTTP GET and POST to improve performance and simplify programming, and REST APIs can likewise be exposed directly over HTTP.

However, people are still looking for more lightweight data formats and web service implementations. Extra bytes cost CPU and memory, network bandwidth and available storage are limited, and everyone wants to save as much as possible.

JSON can therefore be used in scenarios that previously required XML. With standard compression and native JavaScript support, JSON is the preferred data interchange format when simplicity and speed matter. BSON is sometimes used to reduce the storage overhead of plain strings, but because BSON has problems similar to those of Protocol Buffers, it is not very popular.

Google's Protocol Buffers save more space than BSON by compressing integer data. For faster processing, Protocol Buffers replace field names with integers. This trades analyzability, readability, and extensibility for performance, and the result is that Protocol Buffers messages can only be handled by code generated for that purpose. Different versions of a message are accessed in different ways. Worse, the length-prefix scheme makes particularly long messages difficult to handle and can sometimes even cost performance. Protocol Buffers are therefore not a particularly ideal serialization scheme.

Because Protocol Buffers was not open source, Apache Thrift was developed as a serialization scheme by engineers who had no access to Google's implementation. Thrift does not use the small but very complex integer compression scheme. Instead it implements a complete RPC stack, which makes it less lightweight but more complete. Since Protocol Buffers had become a building block for RPC, why not create a language-neutral RPC solution to replace platform-dependent ones such as RMI, .NET Remoting, and Sun/ONC RPC? Thrift is thus a complete replacement for SOAP. Unfortunately, providing complete RPC functionality for every language and platform is not easy, and it is hard to keep up the effort.

In most cases, Apache Avro can be seen as an alternative to XML. Avro supports both binary and JSON serialization; its design is excellent and combines the best of both worlds. Avro does not require generated code for parsing, and its architecture makes this possible even with binary serialization. In addition, in contrast to Thrift, Avro follows the principle of "less is more". Avro is therefore a strong competitor among the leading data formats. Its only problem is that it is somewhat more complicated and sometimes slower than using JSON directly.
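Two of the space-saving ideas attributed to Protocol Buffers in the passage above, integer field numbers in place of field names and compact integers, can be sketched with the base-128 varint encoding that the Protocol Buffers encoding documentation describes. The sketch is hand-written and does not use the protobuf library; VarintSketch and its method names are hypothetical, and only the varint wire type (0) is covered.

```java
import java.io.ByteArrayOutputStream;

final class VarintSketch {
    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static long readVarint(byte[] data, int offset) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = data[offset++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    // A field is a key followed by its value. The key packs the field number and
    // the wire type into one varint: (fieldNumber << 3) | wireType, wire type 0 = varint.
    static byte[] encodeIntField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | 0);
        writeVarint(out, value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] field = encodeIntField(1, 150);
        StringBuilder hex = new StringBuilder();
        for (byte b : field) {
            hex.append(String.format("%02x ", b));
        }
        System.out.println(hex.toString().trim());   // prints "08 96 01"
    }
}
```

Field number 1 with value 150 takes three bytes, and small values take two, where a field name plus a fixed-width integer would take considerably more. The price is the one the passage names: without the schema, the bytes say nothing about what field 1 means.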

Another article on a similar theme:

Suppose you have some data that you want to store in a file or send over the network. You may find that your approach goes through several stages:

1. You use your programming language's built-in serialization, such as Java serialization, Ruby's Marshal, or Python's pickle, or you invent your own format.

2. When you realize that being locked into a single programming language is a serious limitation, you move to a widely supported, language-neutral format such as JSON (or XML, if you are still living in 1999).

3. Then you decide that JSON is too verbose and too slow to parse, you are annoyed that it does not distinguish integers from floating-point numbers, and you would like binary strings as well as Unicode strings. So you invent a binary data format that is like JSON, but binary.

4. Later, you find that people stuff all kinds of fields into their objects with inconsistent types, and you long for a schema and some documentation. Perhaps you are also using a statically typed language and want to generate model classes from a schema. You also notice that your JSON-like binary format is still not compact enough, because it stores the field names over and over again. With a schema you would not need to store field names, and you could save even more bytes.

When you reach stage 4, your choice is usually Thrift, Protocol Buffers, or Avro. All three use a schema to provide efficient, cross-language data serialization, along with code generation for Java.
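The saving described in stage 4 is easy to measure on a toy record. The sketch below writes the same (name, age) record twice with java.io.DataOutputStream: once with the field names embedded in every record, and once in an order fixed by an agreed schema. SchemaSizeSketch and the two-field schema are hypothetical, and neither layout is the actual wire format of Thrift, Protocol Buffers, or Avro.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class SchemaSizeSketch {
    // Self-describing layout: every record repeats its field names.
    static byte[] selfDescribing(String name, int age) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF("name");
            out.writeUTF(name);
            out.writeUTF("age");
            out.writeInt(age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Schema-based layout: writer and reader agree in advance on
    // (name: string, age: int), so only the values are written.
    static byte[] schemaBased(String name, int age) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(name);
            out.writeInt(age);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("self-describing: " + selfDescribing("Bob", 13).length
                + " bytes, schema-based: " + schemaBased("Bob", 13).length + " bytes");
    }
}
```

For the record ("Bob", 13) the self-describing form takes 20 bytes and the schema-based form 9. The difference is paid once per record, so it grows with the number of records, while the schema is stored or agreed on only once.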

Original link: http://www.cnblogs.com/ahhuiyang/p/3852367.html
