Serialization in Spark

Source: Internet
Author: User
1. Serialization is used for network transport and for persisting data to storage. Spark creates two serializers:
val serializer = instantiateClassFromConf[Serializer](
    "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
logDebug(s"Using serializer: ${serializer.getClass}")
The second one, the closure serializer, is not used by BlockManager for the time being:
val closureSerializer = instantiateClassFromConf[Serializer](
    "spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer")
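The idea behind instantiateClassFromConf (read a class name from the configuration, fall back to a default, and instantiate it reflectively) can be sketched in plain Java. This is a simplified illustration, not Spark's code: ConfInstantiator, the configuration map, and the key demo.list are hypothetical, and only a no-argument constructor is tried.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for the lookup-then-instantiate idea.
final class ConfInstantiator {
    // Read a class name from conf (or use the default) and create an instance
    // through its no-argument constructor.
    static Object instantiateFromConf(Map<String, String> conf, String key, String defaultClassName) {
        String className = conf.getOrDefault(key, defaultClassName);
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

With an empty map the default class is used; setting the key swaps the implementation without changing the calling code.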
2. Two typical serialization scenarios in Spark
Scenario A: closures. An RDD operation such as map first calls clean(f), which cleans the closure f internally and then serializes it to check that it is serializable
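private def ensureSerializable(func: AnyRef) {
    try {
        if (SparkEnv.get != null) {
            // Serialize the closure once; failure means the task cannot be shipped
            SparkEnv.get.closureSerializer.newInstance().serialize(func)
        }
    } catch {
        case ex: Exception => throw new SparkException("Task not serializable", ex)
    }
}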
val closureSerializer = instantiateClassFromConf[Serializer](
    "spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer")
Conclusion:
The spark.closure.serializer setting determines how functions (closures) are serialized
Scenario B: BlockManager
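The check in ensureSerializable can be imitated with plain Java serialization: serialize the function once, discard the bytes, and fail early if it cannot be serialized. The sketch below is an illustration under that assumption, not Spark's implementation; ClosureCheck, SerFn, and the two sample closures are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

final class ClosureCheck {
    // A function type that is also Serializable (hypothetical helper type).
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Serialize func and throw away the result; fail fast if it is not serializable.
    static void ensureSerializable(Object func) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(func);
        } catch (IOException e) {
            throw new IllegalStateException("Task not serializable", e);
        }
    }

    // Captures only an int, so the closure serializes without problems.
    static SerFn<Integer, Integer> addConstant(int n) {
        return x -> x + n;
    }

    // Captures a plain Object, which is not Serializable, so the check fails.
    static SerFn<Integer, Integer> captureNonSerializable() {
        Object handle = new Object();
        return x -> x + (handle == null ? 0 : 1);
    }
}
```

Running the check when the closure is defined, rather than when a task is shipped, surfaces the problem close to the code that caused it.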
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
    serializer, conf, memoryManager, mapOutputTracker, shuffleManager,
    blockTransferService, securityManager, numUsableCores)
val serializer = instantiateClassFromConf[Serializer](
    "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
Conclusion:
The spark.serializer setting determines how BlockManager serializes its data
3. Spark uses the Java serializer by default. Kryo serialization is recommended for better performance. The following compares the two mechanisms
A. First, the Java serialization mechanism
override def serialize[T: ClassTag](t: T): ByteBuffer = {
    val bos = new ByteArrayOutputStream()
    val out = serializeStream(bos)
    out.writeObject(t)
    out.close()
    ByteBuffer.wrap(bos.toByteArray)
}
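The same steps (write the object to an in-memory stream, close it, wrap the bytes in a ByteBuffer) can be reproduced with the standard library alone. A minimal sketch, assuming plain ObjectOutputStream stands in for Spark's serializeStream; JavaSerDemo is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

final class JavaSerDemo {
    // Serialize an object into a ByteBuffer backed by a byte array.
    static ByteBuffer serialize(Object t) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(t);
            out.close();
            return ByteBuffer.wrap(bos.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the object back from the buffer's remaining bytes.
    static Object deserialize(ByteBuffer bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(
                bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining()))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Java serialization writes a stream header and type information along with the payload, so the buffer is larger than the raw data. That overhead is one reason Kryo is usually more compact.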
B. Next, the Kryo serialization mechanism
// Two lazily initialized streams (Kryo output and input)
private lazy val output = ks.newKryoOutput()
private lazy val input = new KryoInput()

override def serialize[T: ClassTag](t: T): ByteBuffer = {
    // Reset position = 0 and total = 0
    output.clear()
    // Get a Kryo object
    val kryo = borrowKryo()
    try {
        // Start serializing the object t
        kryo.writeClassAndObject(output, t)
    } catch {
        ....
    } finally {
        // Put the current Kryo back into the cache
        releaseKryo(kryo)
    }
    ByteBuffer.wrap(output.toBytes)
}
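The borrowKryo / releaseKryo pair used above behaves like a single-slot cache: reuse the cached instance if there is one, otherwise build a new one, and put it back when done. The following is a generic sketch of that pattern, not Spark's class; SingleSlotCache and its reset callback are hypothetical, and the sketch is not thread-safe.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical single-slot cache for an object that is expensive to create.
final class SingleSlotCache<T> {
    private final Supplier<T> factory;  // builds a new instance when the slot is empty
    private final Consumer<T> reset;    // clears leftover state before reuse
    private T cached;                   // null while borrowed or before first release
    private int created;                // number of instances built so far

    SingleSlotCache(Supplier<T> factory, Consumer<T> reset) {
        this.factory = factory;
        this.reset = reset;
    }

    // Reuse the cached instance if present (after resetting it), else create one.
    T borrow() {
        if (cached != null) {
            T instance = cached;
            reset.accept(instance);
            cached = null;
            return instance;
        }
        created++;
        return factory.get();
    }

    // Return an instance to the slot; a second instance is simply dropped.
    void release(T instance) {
        if (cached == null) {
            cached = instance;
        }
    }

    int createdCount() {
        return created;
    }
}
```

Releasing in a finally block, as the serialize method does, guarantees the instance returns to the cache even when serialization throws.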
=================================== Kryo analysis ===================================
A: Obtaining and initializing the Kryo object (in KryoSerializer)
The process:
First, check whether there is a cached Kryo object. If there is, reset it to clear any leftover state, set the cache to null, and return it.
If there is no Kryo in the cache, create a new one. Creating a Kryo is a tedious process that includes constructing the Kryo object itself:
public Kryo () {
    this(new DefaultClassResolver(), new MapReferenceResolver());
}
public Kryo (ClassResolver classResolver, ReferenceResolver referenceResolver) {
    if (classResolver == null) throw new IllegalArgumentException("classResolver cannot be null.");

    this.classResolver = classResolver;
    classResolver.setKryo(this);

    this.referenceResolver = referenceResolver;
    if (referenceResolver != null) {
        referenceResolver.setKryo(this);
        references = true;
    }

    addDefaultSerializer(byte[].class, ByteArraySerializer.class);
    addDefaultSerializer(char[].class, CharArraySerializer.class);
    addDefaultSerializer(short[].class, ShortArraySerializer.class);
    addDefaultSerializer(int[].class, IntArraySerializer.class);
    addDefaultSerializer(long[].class, LongArraySerializer.class);
    addDefaultSerializer(float[].class, FloatArraySerializer.class);
    addDefaultSerializer(double[].class, DoubleArraySerializer.class);
    addDefaultSerializer(boolean[].class, BooleanArraySerializer.class);
    addDefaultSerializer(String[].class, StringArraySerializer.class);
    addDefaultSerializer(Object[].class, ObjectArraySerializer.class);
    addDefaultSerializer(BigInteger.class, BigIntegerSerializer.class);
    addDefaultSerializer(BigDecimal.class, BigDecimalSerializer.class);
    addDefaultSerializer(Class.class, ClassSerializer.class);
    addDefaultSerializer(Date.class, DateSerializer.class);
    addDefaultSerializer(Enum.class, EnumSerializer.class);
    addDefaultSerializer(EnumSet.class, EnumSetSerializer.class);
    addDefaultSerializer(Currency.class, CurrencySerializer.class);
    addDefaultSerializer(StringBuffer.class, StringBufferSerializer.class);
    addDefaultSerializer(StringBuilder.class, StringBuilderSerializer.class);
    addDefaultSerializer(Collections.EMPTY_LIST.getClass(), CollectionsEmptyListSerializer.class);
    addDefaultSerializer(Collections.EMPTY_MAP.getClass(), CollectionsEmptyMapSerializer.class);
    addDefaultSerializer(Collections.EMPTY_SET.getClass(), CollectionsEmptySetSerializer.class);
    addDefaultSerializer(Collections.singletonList(null).getClass(), CollectionsSingletonListSerializer.class);
    addDefaultSerializer(Collections.singletonMap(null, null).getClass(), CollectionsSingletonMapSerializer.class);
    addDefaultSerializer(Collections.singleton(null).getClass(), CollectionsSingletonSetSerializer.class);
    addDefaultSerializer(Collection.class, CollectionSerializer.class);
    addDefaultSerializer(TreeMap.class, TreeMapSerializer.class);
    addDefaultSerializer(Map.class, MapSerializer.class);
    addDefaultSerializer(KryoSerializable.class, KryoSerializableSerializer.class);
    addDefaultSerializer(TimeZone.class, TimeZoneSerializer.class);
    addDefaultSerializer(Calendar.class, CalendarSerializer.class);
    lowPriorityDefaultSerializerCount = defaultSerializers.size();

    // Primitives and string. Primitive wrappers automatically use the same registration as primitives.
    register(int.class, new IntSerializer());
    register(String.class, new StringSerializer());
    register(float.class, new FloatSerializer());
    register(boolean.class, new BooleanSerializer());
    register(byte.class, new ByteSerializer());
    register(char.class, new CharSerializer());
    register(short.class, new ShortSerializer());
    register(long.class, new LongSerializer());
    register(double.class, new DoubleSerializer());
}
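The constructor adds TreeMap.class before Map.class, which suggests that the order of default serializers matters. The sketch below illustrates one way such a table can work, assuming a first-match lookup by assignability. DefaultSerializerTable is a hypothetical class, serializers are represented by their names only, and the FieldSerializer fallback is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical ordered table of (type, serializer name) pairs.
final class DefaultSerializerTable {
    private static final class Entry {
        final Class<?> type;
        final String serializerName;

        Entry(Class<?> type, String serializerName) {
            this.type = type;
            this.serializerName = serializerName;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Append a default; entries added earlier take precedence in lookup.
    void addDefault(Class<?> type, String serializerName) {
        entries.add(new Entry(type, serializerName));
    }

    // Return the first entry whose type equals or is a supertype of the requested class.
    String lookup(Class<?> type) {
        for (Entry e : entries) {
            if (e.type.isAssignableFrom(type)) {
                return e.serializerName;
            }
        }
        return "FieldSerializer"; // assumed fallback label for this sketch
    }
}
```

Under this assumption a TreeMap resolves to the more specific entry because it is listed first, while any other Map implementation falls through to the general Map entry.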

Next, register the types that your Spark application needs to serialize, for example:
kryo.register(classOf[HttpBroadcast[_]], new KryoJavaSerializer())
kryo.register(classOf[Array[Tuple2[Any, Any]]])
kryo.register(classOf[GenericRecord], new GenericAvroSerializer(avroSchemas))
Conclusion: the register method has several overloads, so different serializers can be registered for the same class, and the registered serializer determines how serialization (read and write) is performed. Internally, register looks up the class in an ObjectMap<Class, Registration> to get the corresponding Registration and then updates that Registration.
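The lookup-then-update behavior described in the conclusion can be sketched with an ordinary map from class to registration. This is an illustration only: RegistrationTable and its Registration record are hypothetical, a HashMap stands in for Kryo's ObjectMap, and serializers are represented by name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: one Registration per class, updated in place on re-registration.
final class RegistrationTable {
    static final class Registration {
        final Class<?> type;
        final int id;
        String serializerName; // mutable: re-registering a class swaps the serializer

        Registration(Class<?> type, int id, String serializerName) {
            this.type = type;
            this.id = id;
            this.serializerName = serializerName;
        }
    }

    private final Map<Class<?>, Registration> classToRegistration = new HashMap<>();
    private int nextId;

    // Look up the class first; update the existing entry or create a new one.
    Registration register(Class<?> type, String serializerName) {
        Registration existing = classToRegistration.get(type);
        if (existing != null) {
            existing.serializerName = serializerName;
            return existing;
        }
        Registration created = new Registration(type, nextId++, serializerName);
        classToRegistration.put(type, created);
        return created;
    }

    Registration get(Class<?> type) {
        return classToRegistration.get(type);
    }
}
```

Because the existing entry is updated rather than replaced, the class keeps its id while the serializer changes.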
