When reprinting, please cite the source: http://blog.csdn.net/lastsweetop/article/details/9193907
All source code is on GitHub: https://github.com/lastsweetop/styhadoop
Serialization and deserialization are conversions between structured objects and byte streams. They are mainly used for interprocess communication and persistent storage.
Communication format requirements
Hadoop uses RPC for internal communication between nodes. RPC serializes a message into a binary byte stream and sends it to the remote node, which then turns the binary stream back into the original message through deserialization. RPC serialization needs the following properties:
1. Compact. A compact format uses less network bandwidth.
2. Fast. Interprocess communication is the high-speed link of a distributed system, so serialization and deserialization must be fast and must not become the bottleneck.
3. Extensible. When a new server adds a parameter for new clients, old clients can still use it.
4. Interoperable. Clients written in multiple languages are supported.
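To make the "compact" point concrete, here is a minimal sketch that uses only the JDK. It writes the same int once as raw big-endian binary and once through Java's built-in object serialization, then reports both sizes. The class name CompactnessSketch and its helper methods are hypothetical, and the comparison with Java serialization is my own illustration rather than something taken from Hadoop.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class CompactnessSketch {
    /** Size in bytes of an int written as raw big-endian binary. */
    public static int rawSize(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return out.size();
    }

    /** Size in bytes of the same value written with Java's built-in object serialization. */
    public static int javaSerializedSize(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream objects = new ObjectOutputStream(out)) {
            objects.writeObject(Integer.valueOf(value));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return out.size();
    }

    public static void main(String[] args) {
        System.out.println("raw binary:         " + rawSize(163) + " bytes");
        System.out.println("Java serialization: " + javaSerializedSize(163) + " bytes");
    }
}
```

The raw form is always 4 bytes. The built-in form is larger because it also carries a stream header and class metadata, which is exactly the kind of overhead a compact RPC format tries to avoid.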
Storage format requirements
On the surface, a serialization framework might seem to need different features for persistent storage, but in fact the same four points apply:
1. Compact. It takes up less space.
2. Fast. It can be read and written quickly.
3. Extensible. Data written in an old format can still be read.
4. Interoperable. The data can be read and written from multiple languages.
Hadoop's serialization format
Hadoop's own serialization format consists of classes that implement the Writable interface. It only achieves the first two points: it is compact and fast. It is not easy to extend, and it is not cross-language. Let's take a look at the Writable interface. It defines two methods:
1. Write the object's data to a binary stream.
2. Read the object's data from a binary stream.
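package org.apache.hadoop.io;

public interface Writable {
    // Write this object's fields to the binary stream.
    void write(java.io.DataOutput out) throws java.io.IOException;

    // Read this object's fields from the binary stream.
    void readFields(java.io.DataInput in) throws java.io.IOException;
}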
Let's see how the Writable interface relates to serialization and deserialization:
package com.sweetop.styhadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.StringUtils;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

import java.io.*;

/**
 * Created with IntelliJ IDEA.
 * User: lastsweetop
 * Date: 13-7-4
 */
public class TestWritable {
    byte[] bytes = null;

    /**
     * Initialize an IntWritable instance and serialize it.
     */
    @Before
    public void init() throws IOException {
        IntWritable writable = new IntWritable(163);
        bytes = serialize(writable);
    }

    /**
     * A serialized IntWritable is a 4-byte stream in big-endian order.
     */
    @Test
    public void testSerialize() throws IOException {
        Assert.assertEquals(4, bytes.length);
        Assert.assertEquals("000000a3", StringUtils.byteToHexString(bytes));
    }

    /**
     * Create an empty IntWritable, read the bytes into it through
     * deserialization, and check that get() returns the original value, 163.
     */
    @Test
    public void testDeserialize() throws IOException {
        IntWritable newWritable = new IntWritable();
        deserialize(newWritable, bytes);
        Assert.assertEquals(163, newWritable.get());
    }

    /**
     * Serialize an object that implements Writable into a byte array.
     */
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    /**
     * Read a byte array into an object that implements Writable.
     */
    public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream dataIn = new DataInputStream(in);
        writable.readFields(dataIn);
        dataIn.close();
        return bytes;
    }
}
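The same write/readFields pattern extends to types with more than one field. Below is a minimal sketch that runs on the JDK alone: it declares a local Writable interface that only mirrors the shape of org.apache.hadoop.io.Writable (it is not the real Hadoop interface), and PageHits is a hypothetical record type invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {
    /** Local stand-in with the same two methods as Hadoop's Writable (assumption, not the real interface). */
    public interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical record: a page id plus a hit count. */
    public static class PageHits implements Writable {
        public int pageId;
        public long hits;

        public PageHits() {} // empty instance, to be filled by readFields

        public PageHits(int pageId, long hits) {
            this.pageId = pageId;
            this.hits = hits;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(pageId); // 4 bytes, big-endian
            out.writeLong(hits);  // 8 bytes, big-endian
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order they were written.
            pageId = in.readInt();
            hits = in.readLong();
        }
    }

    public static byte[] serialize(Writable writable) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream dataOut = new DataOutputStream(out)) {
            writable.write(dataOut);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for in-memory streams
        }
        return out.toByteArray();
    }

    public static void deserialize(Writable writable, byte[] bytes) {
        try (DataInputStream dataIn = new DataInputStream(new ByteArrayInputStream(bytes))) {
            writable.readFields(dataIn);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new PageHits(163, 76L));
        PageHits copy = new PageHits();
        deserialize(copy, bytes);
        // prints "12 bytes, pageId=163, hits=76"
        System.out.println(bytes.length + " bytes, pageId=" + copy.pageId + ", hits=" + copy.hits);
    }
}
```

Nothing in the 12 bytes describes the fields; the layout lives only in the code. That keeps the format compact and fast, and it also hints at why it is hard to extend: a reader with a different field list cannot tell what it is looking at.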
WritableComparable and comparators
IntWritable implements WritableComparable. Looking at the source code of the interface, WritableComparable is a subinterface of both Writable and java.lang.Comparable<T>.
package org.apache.hadoop.io;

public interface WritableComparable<T> extends org.apache.hadoop.io.Writable, java.lang.Comparable<T> {
}
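As a rough illustration of what a type gains by implementing both interfaces, here is a JDK-only sketch. The Writable and WritableComparable interfaces declared inside it are local stand-ins that only mirror the shape of the Hadoop ones, and IntKey is a hypothetical key type, not Hadoop's IntWritable.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Arrays;

public class ComparableKeySketch {
    /** Local stand-ins mirroring the Hadoop interfaces (assumption, not the real ones). */
    public interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    public interface WritableComparable<T> extends Writable, Comparable<T> {}

    /** Hypothetical key type wrapping a single int. */
    public static class IntKey implements WritableComparable<IntKey> {
        private int value;

        public IntKey() {}
        public IntKey(int value) { this.value = value; }
        public int get() { return value; }

        @Override
        public void write(DataOutput out) throws IOException { out.writeInt(value); }

        @Override
        public void readFields(DataInput in) throws IOException { value = in.readInt(); }

        @Override
        public int compareTo(IntKey other) {
            // Integer.compare avoids the overflow that "value - other.value" can cause.
            return Integer.compare(value, other.value);
        }
    }

    /** Wrap the values in keys, sort the keys by natural order, and return the plain ints. */
    public static int[] sortedValues(int... values) {
        IntKey[] keys = new IntKey[values.length];
        for (int i = 0; i < values.length; i++) {
            keys[i] = new IntKey(values[i]);
        }
        Arrays.sort(keys); // relies on compareTo
        int[] result = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].get();
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "[-5, 76, 163]"
        System.out.println(Arrays.toString(sortedValues(163, 76, -5)));
    }
}
```

One type can therefore be both written to a stream and ordered as a key.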
MapReduce sorts records by key during the sort phase, so comparing values of a type is quite important. RawComparator is an enhanced version of Comparator.
package org.apache.hadoop.io;

public interface RawComparator<T> extends java.util.Comparator<T> {
    // b1/b2: buffers, s1/s2: start offsets, l1/l2: lengths of the two serialized records
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
It can compare records directly in their serialized binary form, without deserializing them into objects first:
package com.sweetop.styhadoop;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparator;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Created with IntelliJ IDEA.
 * User: lastsweetop
 * Date: 13-7-5
 */
public class TestComparator {
    RawComparator<IntWritable> comparator;
    IntWritable w1;
    IntWritable w2;

    /**
     * Get the comparator for IntWritable and initialize two IntWritable instances.
     */
    @Before
    @SuppressWarnings("unchecked")
    public void init() {
        comparator = WritableComparator.get(IntWritable.class);
        w1 = new IntWritable(163);
        w2 = new IntWritable(76);
    }

    /**
     * Compare the two objects.
     */
    @Test
    public void testComparator() {
        Assert.assertTrue(comparator.compare(w1, w2) > 0);
    }

    /**
     * Compare the serialized forms directly.
     */
    @Test
    public void testCompare() throws IOException {
        byte[] b1 = serialize(w1);
        byte[] b2 = serialize(w2);
        Assert.assertTrue(comparator.compare(b1, 0, b1.length, b2, 0, b2.length) > 0);
    }

    /**
     * Serialize an object that implements Writable into a byte array.
     */
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }
}
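To show what "comparing without deserialization" can look like for a 4-byte big-endian int, here is a minimal JDK-only sketch. RawCompareSketch and its methods are hypothetical names; the compare method only borrows the argument shape of RawComparator.compare and is not Hadoop's implementation.

```java
import java.nio.ByteBuffer;

public class RawCompareSketch {
    /** Decode a big-endian 32-bit int at the given offset, without creating any object. */
    public static int readInt(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xff) << 24)
             | ((bytes[offset + 1] & 0xff) << 16)
             | ((bytes[offset + 2] & 0xff) << 8)
             |  (bytes[offset + 3] & 0xff);
    }

    /**
     * Compare two serialized ints in place. The lengths l1 and l2 are unused here
     * because every record in this sketch is exactly 4 bytes wide.
     */
    public static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    /** Helper: encode an int as 4 big-endian bytes (ByteBuffer defaults to big-endian). */
    public static byte[] toBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        // Two records stored back to back in one buffer: 163 at offset 0, 76 at offset 4.
        byte[] buffer = ByteBuffer.allocate(8).putInt(163).putInt(76).array();
        System.out.println(compare(buffer, 0, 4, buffer, 4, 4) > 0); // prints "true"
    }
}
```

The offsets let both records live in one shared buffer, so a sort can walk over serialized data without allocating an object per record. Note that the bytes are decoded as a signed int before comparing: a plain unsigned byte-by-byte comparison would rank negative numbers above positive ones.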