Original article: http://www.cnblogs.com/archimedes/p/hadoop-writable-interface.html. Please credit the source when reprinting.
Introduction
Serialization is the process of turning structured objects into a byte stream; deserialization is the reverse. In Hadoop they arise mainly in two contexts: interprocess communication and persistent storage.
Communication Format Requirements
Hadoop nodes communicate with each other via RPC. The RPC protocol serializes a message into a binary byte stream, sends it to the remote node, and the remote node deserializes the byte stream back into the original message. An RPC serialization format should satisfy the following requirements:
1. Compact: a compact format makes the best use of scarce network bandwidth.
2. Fast: interprocess communication forms the backbone of a distributed system, so serialization and deserialization must be fast enough not to become a bottleneck.
3. Extensible: protocols change over time; for example, the server side should be able to add a new parameter while old clients continue to work.
4. Interoperable: clients written in multiple languages should be supported.
Storage format Requirements
At first glance, persistent storage might seem to demand different properties from a serialization framework, but in fact the same four points apply:
1. Compact: takes up less storage space
2. Fast: can be read and written quickly
3. Extensible: data written in an old format can still be read
4. Interoperable: can be read and written by clients in multiple languages
Writable interface
The Writable interface defines two methods: one writes the object's state to a DataOutput binary stream, and the other reads its state back from a DataInput binary stream:
```java
package org.apache.hadoop.io;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
```
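As a sketch of how a custom type would implement these two methods, the following self-contained example declares an equivalent Writable interface locally (so it runs without a Hadoop dependency) and round-trips a hypothetical Point type through a byte stream; Point is an illustration, not part of Hadoop:

```java
import java.io.*;

// Local stand-in for org.apache.hadoop.io.Writable, declared here only so
// the sketch compiles without the Hadoop jars on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A hypothetical custom type: serializes its two fields in a fixed order,
// and reads them back in exactly the same order.
class Point implements Writable {
    int x, y;

    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

public class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        Point p = new Point();
        p.x = 3;
        p.y = 4;

        // Serialize p into an in-memory byte stream
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        p.write(new DataOutputStream(buf));

        // Deserialize into a fresh object and confirm the state survived
        Point q = new Point();
        q.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(q.x + "," + q.y); // prints "3,4"
    }
}
```

The key contract is that readFields consumes the fields in exactly the order write produced them.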
Let's look at how the Writable interface is used for serialization and deserialization:
```java
package org.apache.hadoop.io;

import java.io.*;
import org.apache.hadoop.util.StringUtils;
import junit.framework.Assert;

public class WritableExample {

    public static byte[] bytes = null;

    // Serialize an object that implements the Writable interface into a byte array
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    // Populate an object that implements the Writable interface from a byte array
    public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream dataIn = new DataInputStream(in);
        writable.readFields(dataIn);
        dataIn.close();
        return bytes;
    }

    public static void main(String[] args) {
        try {
            IntWritable writable = new IntWritable(123);
            bytes = serialize(writable);
            System.out.println("after serialize: " + StringUtils.byteToHexString(bytes));
            Assert.assertEquals(bytes.length, 4);
            Assert.assertEquals(StringUtils.byteToHexString(bytes), "0000007b");

            IntWritable newWritable = new IntWritable();
            deserialize(newWritable, bytes);
            System.out.println("after deserialize: " + newWritable.get());
            Assert.assertEquals(newWritable.get(), 123);
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
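The 4-byte value 0000007b seen above is simply the big-endian encoding that DataOutput.writeInt produces, which is what IntWritable.write delegates to. A Hadoop-free sketch using only java.io reproduces it:

```java
import java.io.*;

public class IntEncoding {
    // Serialize an int the way IntWritable.write does under the hood:
    // DataOutputStream writes it as 4 big-endian bytes.
    public static byte[] serializeInt(int value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        dataOut.writeInt(value);
        dataOut.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serializeInt(123);
        // Render the bytes as a hex string, like StringUtils.byteToHexString
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(bytes.length + " " + hex); // prints "4 0000007b"
    }
}
```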
WritableComparable and comparators
IntWritable implements WritableComparable, which is a sub-interface of both Writable and java.lang.Comparable&lt;T&gt;:
```java
package org.apache.hadoop.io;

public interface WritableComparable<T> extends org.apache.hadoop.io.Writable, java.lang.Comparable<T> {
}
```
MapReduce sorts records by comparing keys, so efficient comparison of key types is very important. RawComparator is an enhanced version of java.util.Comparator:
```java
package org.apache.hadoop.io;

import java.util.Comparator;

public interface RawComparator<T> extends Comparator<T> {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
```
It can compare two records directly on their serialized byte streams, without deserializing them first:
```java
package org.apache.hadoop.io;

import java.io.*;
import org.apache.hadoop.util.StringUtils;
import junit.framework.Assert;

public class ComparatorExample {

    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
        IntWritable w1 = new IntWritable(123);
        IntWritable w2 = new IntWritable(32);

        // Compare the deserialized objects
        if (comparator.compare(w1, w2) <= 0)
            System.exit(0);

        try {
            // Compare the serialized byte streams directly
            byte[] b1 = serialize(w1);
            byte[] b2 = serialize(w2);
            if (comparator.compare(b1, 0, b1.length, b2, 0, b2.length) <= 0) {
                System.exit(0);
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
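The idea behind comparing serialized bytes directly can be sketched without Hadoop. For non-negative ints, the big-endian encoding preserves numeric ordering under lexicographic unsigned-byte comparison; the compareBytes helper below is a hand-written stand-in for what a raw comparator does (IntWritable's actual comparator reads the int values back, which also handles negative numbers correctly):

```java
import java.io.*;

public class RawCompareSketch {
    // Big-endian 4-byte encoding, as DataOutputStream.writeInt produces
    static byte[] serialize(int v) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        dataOut.writeInt(v);
        dataOut.close();
        return out.toByteArray();
    }

    // Lexicographic comparison of unsigned bytes over the given ranges;
    // shorter input sorts first when one is a prefix of the other.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    public static void main(String[] args) throws IOException {
        byte[] b1 = serialize(123); // 00 00 00 7b
        byte[] b2 = serialize(32);  // 00 00 00 20
        // 123 > 32, and the byte comparison agrees without any deserialization
        System.out.println(compareBytes(b1, 0, b1.length, b2, 0, b2.length) > 0); // prints "true"
    }
}
```

Skipping deserialization this way is why raw comparators matter in MapReduce's sort phase, where keys are compared many times.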
Resources
Tom White, Hadoop: The Definitive Guide
Serialization and Writable interfaces in Hadoop