Hadoop serialization vs. Java serialization

Source: Internet
Author: User

Serialization is the conversion of the state of an in-memory object into a sequence of bytes for storage (persistence) or network transport.

Deserialization is the conversion of a received sequence of bytes or persistent data from a hard disk into an in-memory object.

1. JDK serialization

Any class that implements the Serializable interface can be serialized and deserialized. It is important to declare a serialization version ID, serialVersionUID, which identifies the version of the class that produced the serialized data. For example, if you want different versions of a class to remain serialization-compatible, every version must declare the same serialVersionUID.
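As a minimal sketch of the mechanism described above, the following round-trips an object through JDK serialization. The `User` class, its fields, and the helper method names are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class JdkSerializationDemo {

    // Hypothetical example class; the name and fields are illustrative only.
    static class User implements Serializable {
        // Keep this value unchanged across compatible versions of the class.
        private static final long serialVersionUID = 1L;

        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serialize an object to a byte array using the JDK mechanism.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuild an object from bytes produced by serialize().
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        User copy = (User) deserialize(serialize(new User("alice", 30)));
        System.out.println(copy.name + " " + copy.age); // prints "alice 30"
    }
}
```

If a later version of `User` declared a different serialVersionUID, deserializing bytes written by this version would fail with an `InvalidClassException`.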

The Java serialization algorithm works as follows:

It writes the class metadata for the object instance.

It recursively writes the description of each superclass until there are no more superclasses.

Once the class metadata has been written, it writes the actual field values of the object instance, starting from the topmost superclass.

It recursively writes the instance data from top to bottom.

As a result, Java serialization is very powerful and the serialized information is very detailed, but the serialized form takes up a lot of memory.
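To make the overhead concrete, here is a rough sketch that compares the size of a JDK-serialized object with the size of its field values alone. The `Point` class is a hypothetical example. The exact size of the JDK output depends on the class name and field names, so the code prints it rather than assuming a number.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializedSizeDemo {

    // Hypothetical type used only for this comparison.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x = 1;
        long y = 2L;
    }

    // JDK serialization: stream header, class descriptor, then field values.
    static int jdkSize(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        return bytes.size();
    }

    // Field values only: 4 bytes for the int plus 8 bytes for the long.
    static int rawSize(Point p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeLong(p.y);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        Point p = new Point();
        System.out.println("JDK serialization: " + jdkSize(p) + " bytes");
        System.out.println("Raw field values:  " + rawSize(p) + " bytes");
    }
}
```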

2. Hadoop serialization

Hadoop serialization is more concise than JDK serialization. In a cluster, large volumes of information are transferred mainly as these serialized bytes, so Hadoop serialization is designed to be faster and to produce smaller output.

Features of Hadoop serialization:

1. Compact: Bandwidth is the most valuable resource for information transmission in a cluster, so we have to reduce the size of each message as much as possible.

Java serialization is not flexible enough. To get better control over the entire serialization process, Hadoop uses Writable.

Java serialization preserves all of a class's information and dependencies; Hadoop serialization does not need to.

2. Object reuse: JDK deserialization creates a new object every time, which incurs overhead. In Hadoop deserialization, the readFields method of a single object can be called repeatedly, so the same instance is refilled with the state of different records.

Java deserialization creates a new object for each record it reads, so memory consumption is high. A Writable can be reused.
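The reuse pattern can be sketched without Hadoop on the classpath. `IntRecord` below is a hypothetical stand-in shaped like a Writable that holds one int; it is not a Hadoop class. One instance is allocated and then refilled for every record in the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class ObjectReuseDemo {

    // Hypothetical stand-in shaped like a Hadoop Writable holding one int.
    static class IntRecord {
        int value;

        void write(DataOutput out) throws IOException {
            out.writeInt(value);
        }

        // Overwrites this instance's state instead of allocating a new object.
        void readFields(DataInput in) throws IOException {
            value = in.readInt();
        }
    }

    // Writes each value as one serialized record.
    static byte[] encode(int... values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            IntRecord record = new IntRecord();
            for (int v : values) {
                record.value = v;
                record.write(out);
            }
        }
        return bytes.toByteArray();
    }

    // Reads `count` records through one reused instance and sums them.
    static long sumWithReuse(byte[] data, int count) throws IOException {
        IntRecord reused = new IntRecord(); // allocated once
        long sum = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                reused.readFields(in); // same object, new contents
                sum += reused.value;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sumWithReuse(encode(1, 2, 3), 3)); // prints 6
    }
}
```

One consequence of this design is that code holding a reference to a reused object must copy the values it wants to keep before the next record is read.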

3. Extensibility

It is easy to write your own serialization in Hadoop by implementing the Writable interface. Hadoop can also compare the serialized byte streams directly to determine the ordering of two Writable objects.

Java serialization cannot do this. Java's serialization mechanism saves each class's information, such as the class name, the first time an object of that class appears; later objects of the same class carry a reference to that class description, which wastes space.

Open source serialization frameworks such as Protocol Buffers and Avro can also be used.
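The idea of comparing serialized bytes directly can be illustrated with a small sketch. It assumes each value is an int written with `DataOutput.writeInt`, which produces 4 big-endian bytes. The class and method names here are hypothetical; in Hadoop itself this role is played by the `RawComparator` interface, which this sketch does not use.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class RawCompareDemo {

    // Serializes an int as 4 big-endian bytes, the layout DataOutput.writeInt uses.
    static byte[] encode(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        }
        return bytes.toByteArray();
    }

    // Decodes a big-endian int straight from the buffer, with no stream or object.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24)
             | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8)
             | (b[off + 3] & 0xff);
    }

    // Orders two serialized ints without deserializing them into objects.
    static int compareRaw(byte[] a, int aOff, byte[] b, int bOff) {
        return Integer.compare(readInt(a, aOff), readInt(b, bOff));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(compareRaw(encode(-5), 0, encode(3), 0) < 0); // prints true
    }
}
```

Skipping object creation in the comparison matters most when sorting large numbers of serialized records.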

Hadoop's native serialization classes need to implement an interface called Writable, which is similar to the Serializable interface.

Implementing the Writable interface requires two methods: write(DataOutput out) and readFields(DataInput in).
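A minimal sketch of such an implementation follows. To keep it runnable without Hadoop on the classpath, it declares a local `Writable` interface with the same two methods as `org.apache.hadoop.io.Writable`; in a real job you would import the Hadoop interface instead. The `PageView` class and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class WritableSketch {

    // Local stand-in (assumption) mirroring the two methods of Hadoop's Writable.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type with two fields.
    static class PageView implements Writable {
        String url = "";
        long hits;

        @Override
        public void write(DataOutput out) throws IOException {
            // Only field values are written; no class metadata.
            out.writeUTF(url);
            out.writeLong(hits);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order they were written.
            url = in.readUTF();
            hits = in.readLong();
        }
    }

    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    static void fromBytes(Writable w, byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        PageView original = new PageView();
        original.url = "/index.html";
        original.hits = 42L;

        PageView copy = new PageView();
        fromBytes(copy, toBytes(original));
        System.out.println(copy.url + " " + copy.hits); // prints "/index.html 42"
    }
}
```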

YARN's serialization is a framework built on Google Protocol Buffers. Protocol Buffers supports several languages, including C++, Java, and Python, so at the RPC layer we can also work with other languages.

Apache Thrift and Google Protocol Buffers are also popular serialization frameworks, but their use in Hadoop is limited: they are used only for RPC and data exchange.
