Serialization and Writable Interfaces in Hadoop


Original article: http://www.cnblogs.com/archimedes/p/hadoop-writable-interface.html. Please indicate the source when reprinting.

Introduction

Serialization and deserialization are the conversions between structured objects and byte streams. They are used mainly for interprocess communication and for persistent storage.

Communication Format Requirements

Hadoop uses RPC for communication between nodes. The RPC protocol serializes a message into a binary byte stream before sending it to the remote node, and the remote node deserializes the byte stream back into the original message. A serialization format for RPC should meet the following requirements:

1. Compact: a compact format makes the best use of limited network bandwidth.

2. Fast: interprocess communication is the backbone of a distributed system, so serialization and deserialization must be fast enough not to become a transmission bottleneck.

3. Extensible: protocols change over time; when a new server adds a parameter for new clients, old clients should still work.

4. Interoperable: clients written in different languages should be supported.
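To make the "compact" requirement concrete, here is a small self-contained sketch (plain java.io, not Hadoop code; the class and method names are illustrative) comparing the fixed 4-byte binary encoding an int gets from a DataOutput stream with its variable-length textual form:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CompactnessDemo {
    // Serialize an int in fixed-width binary form, as DataOutput does.
    public static byte[] toBinary(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(value);   // always exactly 4 bytes, big-endian
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int value = 1234567890;
        byte[] binary = toBinary(value);
        byte[] text = Integer.toString(value).getBytes(StandardCharsets.UTF_8);
        System.out.println("binary: " + binary.length + " bytes");  // binary: 4 bytes
        System.out.println("text:   " + text.length + " bytes");    // text:   10 bytes
    }
}
```

The binary form is both smaller for large values and fixed-width, which is also what makes the raw byte-level comparisons discussed later in this article possible.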

Storage Format Requirements

On the surface, persistent storage might seem to demand different features from the serialization framework, but in fact the same four points apply:

1. Compact: takes up less storage space.

2. Fast: can be read and written quickly.

3. Extensible: new code can still read data written in the old format.

4. Interoperable: the data can be read and written from multiple languages.

Writable interface

The Writable interface defines two methods: one writes the object's state to a DataOutput binary stream, and the other reads the object's state from a DataInput binary stream:

```java
package org.apache.hadoop.io;

import java.io.*;

public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
```
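As a sketch of how these two methods are typically implemented, here is a hypothetical PointWritable (not part of Hadoop) that follows the same contract using only java.io; in real Hadoop code it would declare `implements Writable`:

```java
import java.io.*;

// Hypothetical example class following the Writable contract:
// a symmetric write/readFields pair over two int fields.
public class PointWritable {
    private int x;
    private int y;

    public PointWritable() { }                 // no-arg constructor, needed for deserialization
    public PointWritable(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public int getY() { return y; }

    // Serialize: write the fields to the binary stream.
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    // Deserialize: read the fields back in exactly the order they were written.
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }

    // Helper: serialize p, then deserialize the bytes into a fresh object.
    public static PointWritable roundTrip(PointWritable p) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            p.write(new DataOutputStream(buf));
            PointWritable q = new PointWritable();
            q.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return q;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PointWritable q = roundTrip(new PointWritable(3, 7));
        System.out.println(q.getX() + "," + q.getY());   // prints 3,7
    }
}
```

The key discipline is that readFields must consume the fields in exactly the order write produced them, since the stream carries no field names or type tags.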

Let's see how the Writable interface is used for serialization and deserialization:

```java
package org.apache.hadoop.io;

import java.io.*;
import org.apache.hadoop.util.StringUtils;
import junit.framework.Assert;

public class WritableExample {

    public static byte[] bytes = null;

    // Serializes an object that implements the Writable interface into a byte array
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    // Populates an object that implements the Writable interface from a byte array
    public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream dataIn = new DataInputStream(in);
        writable.readFields(dataIn);
        dataIn.close();
        return bytes;
    }

    public static void main(String[] args) {
        try {
            IntWritable writable = new IntWritable(123);
            bytes = serialize(writable);
            System.out.println("after serialize: " + bytes);
            Assert.assertEquals(bytes.length, 4);
            Assert.assertEquals(StringUtils.byteToHexString(bytes), "0000007b");

            IntWritable newWritable = new IntWritable();
            deserialize(newWritable, bytes);
            System.out.println("after deserialize: " + bytes);
            Assert.assertEquals(newWritable.get(), 123);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
WritableComparable and Comparators

IntWritable implements WritableComparable, which is a sub-interface of both Writable and java.lang.Comparable&lt;T&gt;:

```java
package org.apache.hadoop.io;

public interface WritableComparable<T>
        extends org.apache.hadoop.io.Writable, java.lang.Comparable<T> {
}
```
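To show what compareTo adds on top of the write/readFields pair, here is a minimal sketch of a key type with a single int field. The class is hypothetical; a real Hadoop key would also implement write/readFields and declare `implements WritableComparable<IntKey>`:

```java
// Hypothetical key type: Comparable ordering is what MapReduce
// uses when it sorts map output by key.
public class IntKey implements Comparable<IntKey> {
    private final int value;

    public IntKey(int value) { this.value = value; }

    public int get() { return value; }

    @Override
    public int compareTo(IntKey other) {
        // Ascending numeric order; Integer.compare avoids overflow
        // pitfalls of the "value - other.value" idiom.
        return Integer.compare(value, other.value);
    }

    public static void main(String[] args) {
        System.out.println(new IntKey(123).compareTo(new IntKey(32)) > 0);  // prints true
    }
}
```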

MapReduce sorts records by key, so comparing key types efficiently is very important. RawComparator is an enhanced version of Comparator:

```java
package org.apache.hadoop.io;

public interface RawComparator<T> extends java.util.Comparator<T> {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
```

It can compare two objects' binary byte streams directly, without deserializing them first:

```java
package org.apache.hadoop.io;

import java.io.*;

public class ComparatorExample {

    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
        IntWritable w1 = new IntWritable(123);
        IntWritable w2 = new IntWritable(32);

        // Compare the deserialized objects
        if (comparator.compare(w1, w2) <= 0) {
            System.exit(0);
        }

        try {
            // Compare the serialized byte streams directly
            byte[] b1 = serialize(w1);
            byte[] b2 = serialize(w2);
            if (comparator.compare(b1, 0, b1.length, b2, 0, b2.length) <= 0) {
                System.exit(0);
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
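To see why byte-level comparison can work at all, here is a self-contained sketch (not Hadoop's actual implementation) that orders the 4-byte big-endian encodings of non-negative ints by unsigned lexicographic byte comparison. For non-negative values this agrees with numeric order, which is what lets a raw comparator skip deserialization; Hadoop's own IntWritable comparator instead decodes the ints before comparing, which also handles negative values correctly:

```java
// Sketch: raw (byte-level) comparison of serialized non-negative ints.
public class RawIntComparator {
    // Compare two 4-byte big-endian encodings without deserializing.
    public static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        for (int i = 0; i < 4; i++) {
            int a = b1[s1 + i] & 0xff;   // unsigned byte value
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;            // first differing byte decides the order
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] oneTwoThree = {0, 0, 0, 123};   // 123 encoded big-endian
        byte[] thirtyTwo   = {0, 0, 0, 32};    // 32 encoded big-endian
        System.out.println(compare(oneTwoThree, 0, thirtyTwo, 0) > 0);  // prints true
    }
}
```

Skipping deserialization this way matters in practice because MapReduce compares keys enormous numbers of times during the sort phase.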
Resources

Hadoop: The Definitive Guide
