Java serialization and deserialization, and a look at Hadoop serialization


1. What are serialization and deserialization?

So what is serialization? Serialization is the conversion of an object's in-memory state into a sequence of bytes, so that it can be stored (persisted) or transmitted over a network. (Without an agreed serialization format, the byte sequences you receive over the network or read from disk cannot be interpreted; they are just meaningless bytes.)
Deserialization is the reverse: converting a byte sequence received over the network or persisted to disk back into an in-memory object.

2. JDK serialization

With the JDK, a class can be serialized and deserialized only if it implements the Serializable interface. Be sure to also declare a serialized version ID, serialVersionUID, which identifies the version of the class being serialized. We declare this version ID explicitly because:

1) In some cases you want different versions of the class to be serialization-compatible, so you need to ensure that the different versions have the same serialVersionUID;

2) In other cases you do not want different versions of the class to be serialization-compatible, so you need to ensure that they have different serialVersionUID values.
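As a minimal sketch of how the version ID is declared (a hypothetical class, not from the original post):

import java.io.Serializable;

// Hypothetical versioned class, for illustration only.
public class User implements Serializable {
    // Keep this value identical across versions of User that should be
    // serialization-compatible. If a reader's class declares a different
    // serialVersionUID than the one recorded in the stream,
    // ObjectInputStream.readObject() fails with java.io.InvalidClassException.
    private static final long serialVersionUID = 1L;

    private String name;
}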

The Java serialization algorithm does the following:

1) Outputs the class metadata related to the object instance.

2) Recursively outputs the descriptions of the superclasses of the class until there are no more superclasses.

3) After the class metadata has been written, outputs the actual data values of the object instance, starting from the topmost superclass.

4) Recursively outputs the instance data from top to bottom.
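A minimal sketch of that top-down recursion, using hypothetical Parent and Child classes (my own illustration, not from the original): the stream written for a Child instance carries descriptors for both classes, and on deserialization the Parent portion is restored before the Child portion:

import java.io.*;

class Parent implements Serializable {
    private static final long serialVersionUID = 1L;
    int parentField = 1; // written before any Child data
}

class Child extends Parent {
    private static final long serialVersionUID = 1L;
    int childField = 2; // written after the Parent portion
}

public class SuperclassOrderDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bytes);
        oos.writeObject(new Child()); // metadata for Child and Parent, then data
        oos.close();

        ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        Child c = (Child) ois.readObject();
        System.out.println(c.parentField + " " + c.childField); // prints: 1 2
    }
}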

So Java serialization really is very powerful: the serialized output carries very detailed information, which makes deserialization easy. But that is also its drawback, because the serialized form takes up a lot of space, and detail is not always a benefit; simple is sometimes better. In Hadoop, a separate serialization framework is implemented, and compared with JDK serialization it is much simpler and more compact. The traffic inside a cluster consists largely of these serialized byte sequences, so being faster and smaller matters a great deal. But enough digression; back to JDK serialization. Let's look at how serialization works in the JDK.
First, we need a class to serialize, as follows (it must implement the Serializable interface):
import java.io.Serializable;

public class Block implements Serializable {

    private static final long serialVersionUID = 1L;

    private int id;
    private String name;

    public Block(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

Let's test the results of the serialization:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class TestSerializable {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        // Write the serialized data of 100 objects to the file ./out (persistence).
        FileOutputStream fos = new FileOutputStream("./out");
        ObjectOutputStream oos = new ObjectOutputStream(fos);
        for (int i = 0; i < 100; i++) {
            Block b = new Block(i, "B" + i);
            oos.writeObject(b);
        }
        oos.flush();
        oos.close();

        // Read the byte sequence of a serialized object back in, i.e. deserialization.
        FileInputStream fis = new FileInputStream("./out");
        ObjectInputStream ois = new ObjectInputStream(fis);
        Block b2 = (Block) ois.readObject();
        ois.close();
        System.out.println(b2.getName());
    }
}

Result of the test (printing the name of the first object):
B0
The persisted data for the 100 objects comes to 1.60 KB (1,643 bytes), an average of roughly 16 bytes per object. Yet the class has only two fields, an int and a String, and the String is only two or three characters long, so the redundancy is clearly quite large.
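For comparison, here is a minimal sketch (my own addition, not part of the original test) that writes the same 100 records with a plain DataOutputStream, i.e. just the raw field values with no class metadata; it comes to about 890 bytes, roughly half of the ObjectOutputStream figure:

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class RawSizeComparison {
    public static void main(String[] args) throws IOException {
        // Write only the raw field values: a 4-byte int plus a
        // length-prefixed UTF string per record, no class metadata.
        DataOutputStream out = new DataOutputStream(new FileOutputStream("./raw_out"));
        for (int i = 0; i < 100; i++) {
            out.writeInt(i);       // 4 bytes
            out.writeUTF("B" + i); // 2-byte length prefix + 2..3 bytes of text
        }
        out.close();
        System.out.println("raw bytes written: " + out.size()); // ~890 bytes
    }
}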
3. Hadoop serialization

The features of Hadoop serialization are:
1) Compact: bandwidth is the most valuable resource when transferring information within a cluster, so we must try to keep messages small; Hadoop serialization is designed with exactly this in mind (a size-comparison sketch appears at the end of this post).
2) Reusable objects: JDK deserialization keeps creating new objects, which inevitably incurs some overhead; in Hadoop deserialization, the readFields method can be called repeatedly on the same object to read different records, instead of allocating a new object each time (a reuse sketch follows the example below).
3) Extensible: there are many options for Hadoop serialization today. You can leverage Hadoop's Writable interface, or use open-source serialization frameworks such as Protocol Buffers, Avro, and others.

It is worth noting that since Hadoop 2.x there is a cluster operating system called YARN; all applications (MapReduce, or other real-time and offline computing frameworks such as Spark) can run on YARN, which is also responsible for resource scheduling and so on. YARN's RPC serialization is built with Google Protocol Buffers, which currently supports three languages (C++, Java, Python), so at the RPC layer we can work in other languages and meet the needs of developers in those languages.

But I digress; back to Hadoop's native serialization. A Hadoop-serializable class needs to implement an interface called Writable, analogous to the Serializable interface. Hadoop also provides several ready-made serialization classes that directly or indirectly implement Writable, such as IntWritable, LongWritable, Text, and so on. Implementing Writable means implementing two methods: write(DataOutput out) and readFields(DataInput in). Here is an example of Hadoop serialization:
package hadoop;

import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.junit.Test;

public class TestHadoop_Serializable_Writable {
    @Test
    public void serializable() throws IOException {
        // Serialize ten MyWritable instances into a byte buffer,
        // then persist the buffer to the file ./hadoop_out.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        FileOutputStream fos = new FileOutputStream("./hadoop_out");
        for (int i = 0; i < 10; i++) {
            Text t1 = new Text(String.valueOf(i));
            Text t2 = new Text("mw");
            MyWritable mw = new MyWritable(t1, t2);
            mw.write(dataOut);
        }
        dataOut.close();
        fos.write(out.toByteArray());
        fos.flush();
        fos.close();

        // Deserialize: read the ten records back with readFields.
        FileInputStream fis = new FileInputStream("./hadoop_out");
        DataInputStream dis = new DataInputStream(fis);
        for (int i = 0; i < 10; i++) {
            MyWritable mw = new MyWritable(new Text(), new Text());
            mw.readFields(dis);
            System.out.println(mw.getId() + " " + mw.getName());
        }
    }
}

class MyWritable implements Writable {
    private Text id;
    private Text name;

    public MyWritable(Text id, Text name) {
        super();
        this.id = id;
        this.name = name;
    }

    public synchronized Text getId() {
        return id;
    }

    public synchronized void setId(Text id) {
        this.id = id;
    }

    public synchronized Text getName() {
        return name;
    }

    public synchronized void setName(Text name) {
        this.name = name;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        id.write(out);
        name.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id.readFields(in);
        name.readFields(in);
    }
}

We can see that we implemented our own serialization class, MyWritable. It has two fields of type Text; Text is a serialization class that ships with Hadoop and can be thought of as roughly the Writable counterpart of String. write() and readFields() simply delegate to the corresponding methods of the fields, writing to the output stream (DataOutput) and reading from the input stream (DataInput) in the manner of a callback (hook).
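To illustrate feature 2 (reusable objects) from above, here is a minimal sketch, a variation on the example rather than code from the original post, that allocates a single MyWritable up front and calls readFields on it for every record instead of constructing a new object per iteration:

package hadoop;

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.io.Text;

public class ReuseDemo {
    public static void main(String[] args) throws IOException {
        DataInputStream dis = new DataInputStream(new FileInputStream("./hadoop_out"));
        // One instance, reused for all ten records: readFields simply
        // overwrites the previous contents, so there is no per-record allocation.
        MyWritable mw = new MyWritable(new Text(), new Text());
        for (int i = 0; i < 10; i++) {
            mw.readFields(dis);
            System.out.println(mw.getId() + " " + mw.getName());
        }
        dis.close();
    }
}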
The results of the run are as follows (the original post showed two screenshots here: the generated byte sequence in ./hadoop_out, and the command-line output listing the ten deserialized id/name pairs).
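Finally, to make feature 1 (compactness) concrete, a minimal sketch of my own comparing the serialized size of a single int in both worlds; the exact JDK figure depends on the JDK version, so treat the ~80 bytes as indicative:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.hadoop.io.IntWritable;

public class CompactDemo {
    public static void main(String[] args) throws IOException {
        // Hadoop: IntWritable writes exactly the 4 payload bytes.
        ByteArrayOutputStream hadoopBytes = new ByteArrayOutputStream();
        new IntWritable(42).write(new DataOutputStream(hadoopBytes));
        System.out.println("IntWritable: " + hadoopBytes.size() + " bytes"); // 4

        // JDK: ObjectOutputStream adds a stream header and class metadata.
        ByteArrayOutputStream jdkBytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(jdkBytes);
        oos.writeObject(Integer.valueOf(42));
        oos.close();
        System.out.println("Integer via JDK: " + jdkBytes.size() + " bytes"); // ~80
    }
}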
That's all!