Original article: http://www.cnblogs.com/archimedes/p/hadoop-writable-interface.html. Please credit the source when reprinting.
Introduction
Serialization is the process of turning structured objects into a byte stream; deserialization is the reverse. In Hadoop they arise mainly in two contexts: interprocess communication and persistent storage.
Communication Format Requirements
Hadoop nodes communicate with each other via RPC. The RPC protocol serializes a message into a binary byte stream, sends it to the remote node, and the remote node deserializes the byte stream back into the original message. An RPC serialization format should satisfy the following requirements:
1. Compact: a compact format makes the best use of scarce network bandwidth.
2. Fast: interprocess communication forms the backbone of a distributed system, so serialization and deserialization must be fast enough not to become a bottleneck.
3. Extensible: protocols change over time; for example, the server side should be able to add a new parameter while old clients continue to work.
4. Interoperable: clients written in multiple languages should be supported.
Storage format Requirements
At first glance, persistent storage might seem to demand different properties from a serialization framework, but in fact the same four points apply:
1. Compact: takes up less storage space
2. Fast: can be read and written quickly
3. Extensible: data written in an old format can still be read
4. Interoperable: can be read and written by clients in multiple languages
Writable interface
The Writable interface defines two methods: one writes the object's state to a DataOutput binary stream, and the other reads its state back from a DataInput binary stream:
```java
package org.apache.hadoop.io;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
```
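As a sketch of how a custom type would implement these two methods, the following self-contained example declares an equivalent Writable interface locally (so it runs without a Hadoop dependency) and round-trips a hypothetical Point type through a byte stream; Point is an illustration, not part of Hadoop:

```java
import java.io.*;

// Local stand-in for org.apache.hadoop.io.Writable, declared here only so
// the sketch compiles without the Hadoop jars on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A hypothetical custom type: serializes its two fields in a fixed order,
// and reads them back in exactly the same order.
class Point implements Writable {
    int x, y;

    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        Point p = new Point();
        p.x = 3;
        p.y = 4;

        // Serialize p into an in-memory byte stream
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        p.write(new DataOutputStream(buf));

        // Deserialize into a fresh object and confirm the state survived
        Point q = new Point();
        q.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(q.x + "," + q.y); // prints "3,4"
    }
}
```

The key contract is that readFields consumes the fields in exactly the order write produced them.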
Let's look at how the Writable interface is used for serialization and deserialization:
```java
package org.apache.hadoop.io;

import java.io.*;
import org.apache.hadoop.util.StringUtils;
import junit.framework.Assert;

public class WritableExample {

    public static byte[] bytes = null;

    // Serialize an object that implements the Writable interface into a byte array
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    // Populate an object that implements the Writable interface from a byte array
    public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream dataIn = new DataInputStream(in);
        writable.readFields(dataIn);
        dataIn.close();
        return bytes;
    }

    public static void main(String[] args) {
        try {
            IntWritable writable = new IntWritable(123);
            bytes = serialize(writable);
            System.out.println("after serialize: " + StringUtils.byteToHexString(bytes));
            Assert.assertEquals(bytes.length, 4);
            Assert.assertEquals(StringUtils.byteToHexString(bytes), "0000007b");

            IntWritable newWritable = new IntWritable();
            deserialize(newWritable, bytes);
            System.out.println("after deserialize: " + newWritable.get());
            Assert.assertEquals(newWritable.get(), 123);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
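The 4-byte value 0000007b seen above is simply the big-endian encoding that DataOutput.writeInt produces, which is what IntWritable.write delegates to. A Hadoop-free sketch using only java.io reproduces it:

```java
import java.io.*;

public class IntEncoding {
    // Serialize an int the way IntWritable.write does under the hood:
    // DataOutputStream writes it as 4 big-endian bytes.
    public static byte[] serializeInt(int value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        dataOut.writeInt(value);
        dataOut.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serializeInt(123);
        // Render the bytes as a hex string, like StringUtils.byteToHexString
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(bytes.length + " " + hex); // prints "4 0000007b"
    }
}
```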
WritableComparable and comparators
IntWritable implements WritableComparable, which is a sub-interface of both Writable and java.lang.Comparable&lt;T&gt;:
```java
package org.apache.hadoop.io;

public interface WritableComparable<T> extends org.apache.hadoop.io.Writable, java.lang.Comparable<T> {
}
```
MapReduce sorts records by comparing keys, so efficient comparison of key types is very important. RawComparator is an enhanced version of java.util.Comparator:
```java
package org.apache.hadoop.io;

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
```
It can compare two records directly on their serialized byte streams, without deserializing them first:
```java
package org.apache.hadoop.io;

import java.io.*;
import org.apache.hadoop.util.StringUtils;
import junit.framework.Assert;

public class ComparatorExample {

    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
        IntWritable w1 = new IntWritable(123);
        IntWritable w2 = new IntWritable(32);

        // Compare the deserialized objects
        if (comparator.compare(w1, w2) <= 0)
            System.exit(0);

        try {
            // Compare the serialized byte streams directly
            byte[] b1 = serialize(w1);
            byte[] b2 = serialize(w2);
            if (comparator.compare(b1, 0, b1.length, b2, 0, b2.length) <= 0) {
                System.exit(0);
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
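The idea behind comparing serialized bytes directly can be sketched without Hadoop. For non-negative ints, the big-endian encoding preserves numeric ordering under lexicographic unsigned-byte comparison; the compareBytes helper below is a hand-written stand-in for what a raw comparator does (IntWritable's actual comparator reads the int values back, which also handles negative numbers correctly):

```java
import java.io.*;

public class RawCompareSketch {
    // Big-endian 4-byte encoding, as DataOutputStream.writeInt produces
    static byte[] serialize(int v) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        dataOut.writeInt(v);
        dataOut.close();
        return out.toByteArray();
    }

    // Lexicographic comparison of unsigned bytes over the given ranges;
    // shorter input sorts first when one is a prefix of the other.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) throws IOException {
        byte[] b1 = serialize(123); // 00 00 00 7b
        byte[] b2 = serialize(32);  // 00 00 00 20
        // 123 > 32, and the byte comparison agrees without any deserialization
        System.out.println(compareBytes(b1, 0, b1.length, b2, 0, b2.length) > 0); // prints "true"
    }
}
```

Skipping deserialization this way is why raw comparators matter in MapReduce's sort phase, where keys are compared many times.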
Resources
Tom White, Hadoop: The Definitive Guide
Serialization and Writable interfaces in Hadoop