Research on the serialization of Hadoop

Source: Internet
Author: User
Tags: serialization

Rather than relying on Java's built-in serialization mechanism, Hadoop provides its own set of serialization interfaces and classes.

For basic data types, the Writable interface represents data that can be serialized. The interface defines two methods: write serializes the object's fields to the DataOutput passed as its parameter, and readFields reads the serialized bytes from a DataInput and deserializes them back into the Hadoop object:

public interface Writable {
  /**
   * Serialize the fields of this object to <code>out</code>.
   *
   * @param out <code>DataOutput</code> to serialize this object into.
   * @throws IOException
   */
  void write(DataOutput out) throws IOException;

  /**
   * Deserialize the fields of this object from <code>in</code>.
   *
   * <p>For efficiency, implementations should attempt to re-use storage in the
   * existing object where possible.</p>
   *
   * @param in <code>DataInput</code> to deserialize this object from.
   * @throws IOException
   */
  void readFields(DataInput in) throws IOException;
}
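
As a minimal sketch of how a class might implement these two methods, the example below round-trips a small record through a byte array. To keep it runnable without Hadoop on the classpath, it declares a local stand-in interface, SimpleWritable, with the same two method signatures as Writable. SimpleWritable, PointRecord, and WritableRoundTrip are hypothetical names introduced only for illustration; against a real Hadoop dependency the class would implement org.apache.hadoop.io.Writable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Demo entry point: serializes a record to bytes and reads it back.
class WritableRoundTrip {

    // Serialize any SimpleWritable into a fresh byte array.
    static byte[] toBytes(SimpleWritable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Populate an existing object from bytes (re-using its storage).
    static void fromBytes(SimpleWritable target, byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            target.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes(new PointRecord(3, -7));
        PointRecord copy = new PointRecord();
        fromBytes(copy, bytes);
        // prints "8 bytes, x=3, y=-7" (two 4-byte ints)
        System.out.println(bytes.length + " bytes, x=" + copy.x + ", y=" + copy.y);
    }
}

// Assumption: local stand-in mirroring the two methods of Hadoop's Writable.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record with two int fields.
class PointRecord implements SimpleWritable {
    int x;
    int y;

    PointRecord() { }

    PointRecord(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields are written in a fixed order...
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // ...and must be read back in exactly the same order.
        x = in.readInt();
        y = in.readInt();
    }
}
```

The point to notice is that write and readFields must agree on field order and types, since the byte stream itself carries no field names or type tags.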

In Hadoop, however, serialization generally happens inside MapReduce jobs, so we never see its intermediate output. To capture that output, we wrote a utility method that serializes a Hadoop object into a byte array, so that we can print the array's contents and inspect the serialized result:

package com.charles.hadoop.serial;

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

/**
 * Description: This class provides a utility method for capturing serialized output.
 * In Hadoop, serialization and deserialization are defined by the Writable interface,
 * and a Writable is a serializable Hadoop object. We therefore store the serialized
 * result in a byte array so that its contents can be inspected.
 *
 * @author Charles.wang
 * @created June 2, 9:32:41 AM
 */
public class HadoopSerializationUtil {

    // Serializes a Hadoop object (a Writable, which is serializable) into a byte array
    // and returns the contents of that array.
    // Parameter: the Writable object to serialize
    // Return value: the serialized byte array
    public static byte[] serialize(Writable writable) throws IOException {
        // Create a byte array output stream
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Create a DataOutputStream that wraps the byte array stream to hold the serialized bytes
        DataOutputStream dataout = new DataOutputStream(out);
        // Let the Hadoop object serialize itself into the stream backed by the byte array
        writable.write(dataout);
        dataout.close();
        // Return the serialized bytes
        return out.toByteArray();
    }
}
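
To illustrate what printing such a byte array can look like, the sketch below hex-dumps the bytes of a single int. It uses only java.io so that it runs without Hadoop, and the names SerializedBytesDemo, toHex, and intBytes are hypothetical helpers, not part of the original utility. It assumes that Hadoop's IntWritable writes its value with DataOutput.writeInt, in which case the output should match what the serialize method would return for an IntWritable holding the same value.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class SerializedBytesDemo {

    // Render a byte array as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes do not print sign-extended.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Stand-in for serializing an int-valued Writable: writes one
    // big-endian 4-byte int, as DataOutput.writeInt specifies.
    static byte[] intBytes(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 163 is 0xa3, so this prints 000000a3
        System.out.println(toHex(intBytes(163)));
    }
}
```

With Hadoop available, the same toHex helper could be applied directly to the array returned by the serialize method.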
