Typical Writable classes in Hadoop explained


Original article: http://www.cnblogs.com/archimedes/p/hadoop-writable.html (please cite the source when reprinting).

Hadoop has many Writable classes in the org.apache.hadoop.io package. The most important are the wrappers for Java primitive types, Text, the Writable collections, and ObjectWritable. This article focuses on the implementations of the Java primitive wrappers and of ObjectWritable.

1. Writable wrappers for Java primitive types

The following table lists the Java primitive types and their corresponding Writable wrappers. All of these Writable classes implement WritableComparable, which means they are comparable. Each of them also has get() and set() methods for retrieving and setting the wrapped value.

Table: Writable wrappers for Java primitive types

Java primitive type    Writable class     Serialized length (bytes)
boolean                BooleanWritable    1
byte                   ByteWritable       1
int                    IntWritable        4
                       VIntWritable       1~5
float                  FloatWritable      4
long                   LongWritable       8
                       VLongWritable      1~9
double                 DoubleWritable     8

As the table shows, integers (int and long) can be encoded in either a fixed-length format (IntWritable and LongWritable) or a variable-length format (VIntWritable and VLongWritable). In the fixed-length format the serialized data always has the same length, while the variable-length format uses a more flexible encoding that saves space for integers of small magnitude. Because VIntWritable and VLongWritable use the same encoding rules, output written by VIntWritable can be read back with VLongWritable. The following uses VIntWritable as an example of how a Java primitive type is wrapped as a Writable. The code is as follows:

 public class VIntWritable implements WritableComparable {
   private int value;
   ...
   // set the value of this VIntWritable
   public void set(int value) { this.value = value; }

   // get the value of this VIntWritable
   public int get() { return value; }

   public void readFields(DataInput in) throws IOException {
     value = WritableUtils.readVInt(in);
   }

   public void write(DataOutput out) throws IOException {
     WritableUtils.writeVInt(out, value);
   }
   ...
 }

Each Writable wrapper for a Java primitive type contains a member variable value, plus get() and set() methods of the corresponding primitive type for reading and assigning it. To implement the readFields() and write() methods required by the Writable interface, VIntWritable reads and writes its data by calling readVInt() and writeVInt() from the utility class WritableUtils. readVInt() and writeVInt() simply delegate to readVLong() and writeVLong(), so data written by writeVInt() can naturally be read back with readVLong().
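To see the wrapper pattern on its own, here is a minimal sketch that runs without Hadoop. SimpleWritable, SimpleIntWritable and WritableRoundTrip are hypothetical names: SimpleWritable stands in for the write()/readFields() pair of org.apache.hadoop.io.Writable, and SimpleIntWritable follows the fixed-length IntWritable style by delegating to DataOutput.writeInt() and DataInput.readInt() rather than to WritableUtils.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for org.apache.hadoop.io.Writable (not the real interface).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Mimics IntWritable: one int field, get()/set(), fixed 4-byte encoding.
class SimpleIntWritable implements SimpleWritable {
    private int value;

    SimpleIntWritable() {}
    SimpleIntWritable(int value) { this.value = value; }

    public void set(int value) { this.value = value; }
    public int get() { return value; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value); // always 4 bytes, big-endian
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }
}

// Hypothetical helpers that hide the stream plumbing.
final class WritableRoundTrip {
    private WritableRoundTrip() {}

    // Serializes any SimpleWritable into a byte array.
    static byte[] serialize(SimpleWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            w.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Refills an existing object from a byte array instead of creating a new one.
    static void deserialize(byte[] data, SimpleWritable w) {
        try {
            w.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this sketch, WritableRoundTrip.serialize(new SimpleIntWritable(1)) always produces 4 bytes, even for a value that would fit in one; that fixed cost is what the variable-length formats avoid. deserialize() fills an existing object rather than returning a new one, mirroring how readFields() lets an instance be reused.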

The writeVLong() method implements the variable-length encoding of integer values. The encoding works as follows:

If the input integer is between -112 and 127 inclusive, the encoding needs only 1 byte, which holds the value itself. Otherwise, the first byte of the serialized result records the sign of the input integer and the number of bytes that follow, according to these rules:

If the number is positive, the first byte v falls between -113 and -120 (inclusive), and the number of following bytes is -(v + 112).

If the number is negative, the first byte v falls between -121 and -128 (inclusive), and the number of following bytes is -(v + 120).

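The following bytes then hold the input integer, most significant byte first, with leading all-zero bytes omitted; for a negative input, its one's complement (i ^ -1, which is non-negative) is written instead. The code is as follows: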

 import java.io.DataOutput;
 import java.io.IOException;

 public final class WritableUtils {
   public static void writeVInt(DataOutput stream, int i) throws IOException {
     writeVLong(stream, i);
   }

   /**
    * @param stream output stream that receives the serialized result
    * @param i      the integer to serialize
    * @throws java.io.IOException
    */
   public static void writeVLong(DataOutput stream, long i) throws IOException {
     // Case 1: integer in [-112, 127], a single byte holds the value
     if (i >= -112 && i <= 127) {
       stream.writeByte((byte) i);
       return;
     }
     // Case 2: compute the first byte (sign and number of following bytes)
     int len = -112;
     if (i < 0) {
       i ^= -1L; // one's complement, so the payload is non-negative
       len = -120;
     }
     long tmp = i;
     while (tmp != 0) {
       tmp = tmp >> 8;
       len--;
     }
     stream.writeByte((byte) len);
     len = (len < -120) ? -(len + 120) : -(len + 112);
     // output the following bytes, most significant byte first
     for (int idx = len; idx != 0; idx--) {
       int shiftbits = (idx - 1) * 8;
       long mask = 0xFFL << shiftbits;
       stream.writeByte((byte) ((i & mask) >> shiftbits));
     }
   }
 }
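The following is a standalone sketch of the same encoding together with a matching decoder, so the rules above can be checked by round-tripping values. VLongCodec and its encode()/decode() helpers are hypothetical names; the writer follows the writeVLong() logic shown above, while the reader is written here from the encoding rules rather than copied from WritableUtils.readVLong(). As a worked example, 300 is 0x012C and needs two payload bytes, so the first byte is -112 - 2 = -114, followed by 0x01 and 0x2C. For -300 the payload is the one's complement 299 = 0x012B, giving -122, 0x01, 0x2B.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical standalone re-implementation of the variable-length format
// described above; it is not Hadoop's WritableUtils class.
final class VLongCodec {
    private VLongCodec() {}

    static void writeVLong(DataOutput out, long i) throws IOException {
        if (i >= -112 && i <= 127) {       // small values: one byte holds the value itself
            out.writeByte((byte) i);
            return;
        }
        int len = -112;
        if (i < 0) {
            i ^= -1L;                      // one's complement makes the payload non-negative
            len = -120;
        }
        for (long tmp = i; tmp != 0; tmp >>= 8) {
            len--;                         // one step per significant payload byte
        }
        out.writeByte((byte) len);         // first byte: sign + payload length
        len = (len < -120) ? -(len + 120) : -(len + 112);
        for (int idx = len; idx != 0; idx--) { // payload, most significant byte first
            int shift = (idx - 1) * 8;
            out.writeByte((byte) ((i >> shift) & 0xFF));
        }
    }

    static long readVLong(DataInput in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first;                  // single-byte case
        }
        boolean negative = first < -120;   // -121..-128 marks a negative input
        int len = negative ? -(first + 120) : -(first + 112);
        long i = 0;
        for (int idx = 0; idx < len; idx++) {
            i = (i << 8) | (in.readByte() & 0xFF);
        }
        return negative ? (i ^ -1L) : i;   // undo the one's complement
    }

    static byte[] encode(long v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeVLong(out, v);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long decode(byte[] data) {
        try {
            return readVLong(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this sketch, any value in the int range fits in at most 5 bytes and any long in at most 9, which matches the 1~5 and 1~9 lengths in the table.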
2. Implementation of the ObjectWritable class

For Java primitives, strings, enums, Writable objects, null values, and arrays of these types, ObjectWritable provides a wrapper for fields that may hold values of more than one type. ObjectWritable is used to serialize and deserialize method parameters in Hadoop remote procedure calls (RPC). Another typical use is a field that must hold objects of different types: for example, if the values in a SequenceFile are of different types, such as LongWritable values and Text values, you can declare the value type as ObjectWritable.

The ObjectWritable implementation is verbose, because each kind of object that may be wrapped needs different handling. ObjectWritable has three member variables: the wrapped object (instance), the Class object of its declared type (declaredClass), and the Configuration object (conf).

ObjectWritable's write() method calls the static method ObjectWritable.writeObject(), which can write many kinds of Java objects to a DataOutput.

writeObject() first outputs the name of the declared class (obtained via declaredClass.getName()) and then serializes the object into the output stream according to its type. The logic of ObjectWritable.writeObject() handles six cases separately: null, Java arrays, String, Java primitives, enums, and Writable subclasses. Because of class inheritance, Writable objects need extra care: their serialized result consists of three parts, namely the declared class name, the actual class name of the object, and the object's own serialized data.

Why is the object's actual class name needed? Under Java's single-rooted inheritance, the declaredClass passed to ObjectWritable may be either the Class object of the instance's own class or the Class object of one of its superclasses. When serializing and deserializing, however, a subclass object generally cannot be handled by the superclass's serialization methods (such as write()), so the actual class name of the object must be recorded in the serialized result. The relevant code is as follows:

 public class ObjectWritable implements Writable, Configurable {
   private Class declaredClass; // Class object for the declared type of the stored object
   private Object instance;     // the wrapped object
   private Configuration conf;

   public ObjectWritable() {}

   public ObjectWritable(Object instance) {
     set(instance);
   }

   public ObjectWritable(Class declaredClass, Object instance) {
     this.declaredClass = declaredClass;
     this.instance = instance;
   }
   ...
   public void readFields(DataInput in) throws IOException {
     readObject(in, this, this.conf);
   }

   public void write(DataOutput out) throws IOException {
     writeObject(out, instance, declaredClass, conf);
   }
   ...
   public static void writeObject(DataOutput out, Object instance,
       Class declaredClass, Configuration conf) throws IOException {
     if (instance == null) {                        // null
       instance = new NullInstance(declaredClass, conf);
       declaredClass = Writable.class;
     }
     // write the name of declaredClass
     UTF8.writeString(out, declaredClass.getName());
     if (declaredClass.isArray()) {                 // array
       ...
     } else if (declaredClass == String.class) {    // String
       ...
     } else if (declaredClass.isPrimitive()) {      // primitive type
       if (declaredClass == Boolean.TYPE) {         // boolean
         out.writeBoolean(((Boolean) instance).booleanValue());
       } else if (declaredClass == Character.TYPE) { // char
         ...
       }
       ...
     } else if (declaredClass.isEnum()) {           // enum
       ...
     } else if (Writable.class.isAssignableFrom(declaredClass)) {
       // Writable subclass: also record the actual class of the instance
       UTF8.writeString(out, instance.getClass().getName());
       ((Writable) instance).write(out);
     } else {
       ...
     }
   }

   public static Object readObject(DataInput in, ObjectWritable objectWritable,
       Configuration conf) throws IOException {
     ...
     Class instanceClass = null;
     ...
     Writable writable = WritableFactories.newInstance(instanceClass, conf);
     writable.readFields(in);
     instance = writable;
     ...
   }
 }
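Why the actual class name matters can be shown with a small standalone sketch. Payload, Shape, Circle and TaggedIO are hypothetical names; DataOutput.writeUTF()/DataInput.readUTF() replace Hadoop's UTF8 helper, and plain reflection (Class.forName() plus getDeclaredConstructor()) replaces WritableFactories/ReflectionUtils. A Circle is written with declared type Shape; because the actual class name is stored, the reader instantiates Circle, whose readFields() also consumes the extra radius field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.reflect.Constructor;

// Hypothetical Writable-like interface for this sketch.
interface Payload {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// The declared type.
class Shape implements Payload {
    int id;
    public void write(DataOutput out) throws IOException { out.writeInt(id); }
    public void readFields(DataInput in) throws IOException { id = in.readInt(); }
}

// The runtime type: serializes an extra field that Shape knows nothing about.
class Circle extends Shape {
    double radius;
    @Override public void write(DataOutput out) throws IOException {
        super.write(out);
        out.writeDouble(radius);
    }
    @Override public void readFields(DataInput in) throws IOException {
        super.readFields(in);
        radius = in.readDouble();
    }
}

// Mimics the three-part layout: declared class name, actual class name, object data.
final class TaggedIO {
    private TaggedIO() {}

    static void writeObject(DataOutput out, Payload instance, Class<?> declaredClass) throws IOException {
        out.writeUTF(declaredClass.getName());
        out.writeUTF(instance.getClass().getName()); // the actual (runtime) class
        instance.write(out);
    }

    static Payload readObject(DataInput in) throws IOException {
        String declaredName = in.readUTF();
        String actualName = in.readUTF();
        try {
            Class<?> declared = Class.forName(declaredName);
            Class<?> actual = Class.forName(actualName);
            if (!declared.isAssignableFrom(actual)) {
                throw new IOException(actualName + " is not a " + declaredName);
            }
            // Instantiate the actual class, so Circle.readFields() also reads the radius.
            Constructor<?> ctor = actual.getDeclaredConstructor();
            ctor.setAccessible(true);
            Payload value = (Payload) ctor.newInstance();
            value.readFields(in);
            return value;
        } catch (ReflectiveOperationException e) {
            throw new IOException("cannot instantiate " + actualName, e);
        }
    }

    static Payload roundTrip(Payload instance, Class<?> declaredClass) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeObject(out, instance, declaredClass);
            out.flush();
            return readObject(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the reader instantiated the declared class Shape instead, it would read only the int and leave the double unread in the stream, corrupting whatever is read next.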

Mirroring the output side, ObjectWritable's readFields() method calls the static method ObjectWritable.readObject(), whose implementation parallels writeObject(). The only part worth examining is the handling of Writable objects, where readObject() relies on the WritableFactories class. WritableFactories lets non-public Writable subclasses define an object factory for creating their instances; in the readObject() code above, for example, the static method WritableFactories.newInstance() creates a Writable object of type instanceClass. The relevant code is as follows:

 public class WritableFactories {
   // maps each class to its WritableFactory
   private static final HashMap<Class, WritableFactory> CLASS_TO_FACTORY =
       new HashMap<Class, WritableFactory>();
   ...
   public static Writable newInstance(Class<? extends Writable> c, Configuration conf) {
     WritableFactory factory = WritableFactories.getFactory(c);
     if (factory != null) {
       Writable result = factory.newInstance();
       if (result instanceof Configurable) {
         ((Configurable) result).setConf(conf);
       }
       return result;
     } else {
       // no factory registered: create the object with the reflection utility ReflectionUtils
       return ReflectionUtils.newInstance(c, conf);
     }
   }
 }

WritableFactories.newInstance() looks up the WritableFactory registered for the given type and calls its newInstance() to create the object; if the object implements Configurable, newInstance() also configures it through the object's setConf() method. If no factory is registered, it falls back to ReflectionUtils.
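The lookup order just described (registered factory first, reflection as a fallback, configuration for Configurable objects) can be sketched without Hadoop. MiniFactories, ConfigurableLike, ConfiguredThing and FactoryOnly are hypothetical names; a Map<String, String> stands in for Configuration and java.util.function.Supplier stands in for WritableFactory. Unlike the excerpt above, this sketch calls setConf() explicitly on both paths.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for Hadoop's Configurable; a plain Map plays the role of Configuration.
interface ConfigurableLike {
    void setConf(Map<String, String> conf);
}

// Hypothetical analogue of WritableFactories: factory first, reflection second.
final class MiniFactories {
    private static final Map<Class<?>, Supplier<?>> CLASS_TO_FACTORY = new HashMap<>();

    private MiniFactories() {}

    static synchronized <T> void setFactory(Class<T> c, Supplier<? extends T> factory) {
        CLASS_TO_FACTORY.put(c, factory);
    }

    static synchronized Supplier<?> getFactory(Class<?> c) {
        return CLASS_TO_FACTORY.get(c);
    }

    static <T> T newInstance(Class<T> c, Map<String, String> conf) {
        Supplier<?> factory = getFactory(c);
        T result = (factory != null) ? c.cast(factory.get()) : viaReflection(c);
        if (result instanceof ConfigurableLike) {
            ((ConfigurableLike) result).setConf(conf); // applied on both paths in this sketch
        }
        return result;
    }

    // Fallback: only works for classes with a no-argument constructor.
    private static <T> T viaReflection(Class<T> c) {
        try {
            Constructor<T> ctor = c.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("cannot instantiate " + c.getName(), e);
        }
    }
}

// Has a no-arg constructor, so reflection alone can create it; it also wants configuration.
class ConfiguredThing implements ConfigurableLike {
    String name;
    public void setConf(Map<String, String> conf) { name = conf.get("name"); }
}

// No no-arg constructor: only a registered factory can create it.
class FactoryOnly {
    final String origin;
    FactoryOnly(String origin) { this.origin = origin; }
}
```

FactoryOnly has no no-argument constructor, so the reflective fallback fails for it until a factory is registered; that is one situation in which a per-class factory is useful.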

WritableFactories provides a registration mechanism through which Writable subclasses register their factories in the static member variable CLASS_TO_FACTORY. The following is a typical WritableFactory implementation, taken from Block, the HDFS data block class. WritableFactories.setFactory() takes two parameters: the Class object of the class being registered, and an implementation of the WritableFactory interface that can construct instances of that class. In the following code, the WritableFactory implementation is an anonymous class whose newInstance() method creates a new Block object.

 public class Block implements Writable, Comparable<Block> {
   static {
     // register a factory that creates Block objects
     WritableFactories.setFactory(Block.class, new WritableFactory() {
       public Writable newInstance() {
         return new Block();
       }
     });
   }
   ...
 }
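The self-registration idiom used by Block can be reproduced in plain Java. FactoryRegistry and BlockLike are hypothetical names, and a method reference (BlockLike::new) replaces the anonymous WritableFactory class. One consequence under standard Java semantics: the static block runs when the class is initialized, not when it is merely named through a class literal such as BlockLike.class, so the factory appears in the registry only after something initializes the class (here, Class.forName()).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical minimal registry standing in for WritableFactories.
final class FactoryRegistry {
    private static final Map<Class<?>, Supplier<?>> FACTORIES = new ConcurrentHashMap<>();

    private FactoryRegistry() {}

    static void setFactory(Class<?> c, Supplier<?> factory) { FACTORIES.put(c, factory); }

    static Supplier<?> getFactory(Class<?> c) { return FACTORIES.get(c); }
}

// Mirrors the Block pattern: the class registers its own factory.
final class BlockLike {
    static {
        // Runs once, when BlockLike is initialized; BlockLike::new can reach the private constructor.
        FactoryRegistry.setFactory(BlockLike.class, BlockLike::new);
    }

    private long blockId;

    private BlockLike() {}

    long getBlockId() { return blockId; }
}
```

Because the constructor is private, outside code can obtain BlockLike instances only through the registered factory, which is the kind of access the registration mechanism is meant to provide.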
    

As a general-purpose mechanism, ObjectWritable is quite wasteful, because it writes the class name of the wrapped type with every output. If there are only a few types and they are known in advance, efficiency can be improved by keeping a static array of the types and serializing the array index as the reference to each type. GenericWritable was introduced into the org.apache.hadoop.io package for this purpose.
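The index-instead-of-class-name idea can be sketched as follows. This is not GenericWritable's actual code; TypedValue, IntValue, StringValue and IndexedCodec are hypothetical names. Writer and reader share one static type table, and each value is prefixed by a single byte holding its position in that table, so an IntValue costs 1 + 4 = 5 bytes instead of carrying a full class name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical Writable-like interface for this sketch.
interface TypedValue {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class IntValue implements TypedValue {
    int v;
    IntValue() {}
    IntValue(int v) { this.v = v; }
    public void write(DataOutput out) throws IOException { out.writeInt(v); }
    public void readFields(DataInput in) throws IOException { v = in.readInt(); }
}

class StringValue implements TypedValue {
    String s = "";
    StringValue() {}
    StringValue(String s) { this.s = s; }
    public void write(DataOutput out) throws IOException { out.writeUTF(s); }
    public void readFields(DataInput in) throws IOException { s = in.readUTF(); }
}

// Writes a one-byte index into a fixed, pre-agreed type table instead of a class name.
final class IndexedCodec {
    // Writer and reader must use the same table in the same order.
    private static final Class<?>[] TYPES = { IntValue.class, StringValue.class };

    private IndexedCodec() {}

    static byte[] write(TypedValue value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(indexOf(value.getClass())); // type reference: one byte
            value.write(out);                         // then the value's own data
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static TypedValue read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            Class<?> type = TYPES[in.readByte()];
            TypedValue value = (TypedValue) type.getDeclaredConstructor().newInstance();
            value.readFields(in);
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static int indexOf(Class<?> c) {
        for (int i = 0; i < TYPES.length; i++) {
            if (TYPES[i] == c) return i;
        }
        throw new IllegalArgumentException("unregistered type: " + c.getName());
    }
}
```

The trade-off is that the table becomes part of the format: in this sketch, appending a type at the end of the array keeps old data readable, while reordering the array breaks previously written data.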
