1. Serialization is commonly used for network transport and for persisting data to storage. Spark creates two serializers, each from its own configuration key:
val serializer = instantiateClassFromConf[Serializer](
  "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
logDebug(s"Using serializer: ${serializer.getClass}")
// The closure serializer below is, for now, not used by BlockManager
val closureSerializer = instantiateClassFromConf[Serializer](
  "spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer")
2. Two typical serialization scenarios in Spark
Serialization scenario A: when an RDD operation such as map is performed, clean(f) is called first. Internally it cleans the closure f and then checks that f can be serialized:
private def ensureSerializable(func: AnyRef) {
  try {
    if (SparkEnv.get != null) {
      SparkEnv.get.closureSerializer.newInstance().serialize(func)
    }
  } catch {
    case ex: Exception => throw new SparkException("Task not serializable", ex)
  }
}
// where closureSerializer is created as:
val closureSerializer = instantiateClassFromConf[Serializer](
  "spark.closure.serializer", "org.apache.spark.serializer.JavaSerializer")
Conclusion:
The spark.closure.serializer configuration determines how functions (closures) are serialized.
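The check in scenario A can be imitated with plain Java serialization. The following is only a sketch of the idea, not Spark's code: the class `ClosureCheck`, the `SerializableFunction` interface, and the helper methods are hypothetical names introduced for illustration. It tries to write a function object to a throwaway stream and reports a failure when the function captures something that is not serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical illustration of a "can this closure be serialized?" check.
final class ClosureCheck {
    /** A function type that is also Serializable, so lambdas of this type can be written out. */
    interface SerializableFunction<T, R> extends Function<T, R>, Serializable {}

    /** Throws IllegalStateException if func cannot be written with Java serialization. */
    static void ensureSerializable(Object func) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(func);
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            throw new IllegalStateException("Task not serializable", e);
        }
    }

    /** A closure that captures nothing, so it serializes cleanly. */
    static SerializableFunction<Integer, Integer> addOne() {
        return x -> x + 1;
    }

    /** A closure that captures a plain Object, which is not Serializable. */
    static SerializableFunction<Integer, Integer> capturesPlainObject() {
        Object notSerializable = new Object();
        return x -> x + notSerializable.hashCode();
    }

    /** Convenience wrapper that turns the exception into a boolean. */
    static boolean isSerializable(Object func) {
        try {
            ensureSerializable(func);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

The second closure fails because serializing a lambda also serializes every value it captured, which is the usual cause of "Task not serializable" errors.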
Serialization scenario B: in BlockManager
val blockManager = new BlockManager(executorId, rpcEnv, blockManagerMaster,
  serializer, conf, memoryManager, mapOutputTracker, shuffleManager,
  blockTransferService, securityManager, numUsableCores)
// where serializer is created as:
val serializer = instantiateClassFromConf[Serializer](
  "spark.serializer", "org.apache.spark.serializer.JavaSerializer")
Conclusion:
The spark.serializer configuration determines how BlockManager serializes the data it handles.
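Both conclusions rest on the same mechanism: a configuration key names a class, and that class is instantiated reflectively, with the Java serializer as the fallback. Here is a minimal sketch of that idea using only the JDK. It is not Spark's instantiateClassFromConf, which works on a SparkConf and tries several constructor signatures. The `Serializer` interface and the two stand-in classes are hypothetical.

```java
import java.util.Map;

// Hypothetical illustration of "class name in config, default if absent".
final class ConfInstantiation {
    /** Hypothetical stand-in for Spark's Serializer base class. */
    interface Serializer {
        String name();
    }

    /** Hypothetical stand-in for the default (Java) serializer. */
    static final class JavaLikeSerializer implements Serializer {
        public JavaLikeSerializer() {}
        public String name() { return "java"; }
    }

    /** Hypothetical stand-in for an alternative (Kryo-style) serializer. */
    static final class KryoLikeSerializer implements Serializer {
        public KryoLikeSerializer() {}
        public String name() { return "kryo"; }
    }

    /** Looks up the class name under key, falls back to the default, and instantiates it. */
    static Serializer instantiateFromConf(Map<String, String> conf, String key, String defaultClassName) {
        String className = conf.getOrDefault(key, defaultClassName);
        try {
            // Assumes the named class has a public no-argument constructor.
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            return (Serializer) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}
```

With an empty configuration map the default class is used; putting a different class name under the key switches the implementation without touching the calling code.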
3. Spark uses the Java serializer by default. I recommend Kryo serialization to improve performance. The following compares the two serialization mechanisms.
A. First, the Java serialization mechanism
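Switching to Kryo is a configuration change. A minimal fragment for spark-defaults.conf might look like the following. The class name com.example.MyRecord is a hypothetical placeholder for one of your own types.

```properties
# Use Kryo instead of the default Java serializer
spark.serializer               org.apache.spark.serializer.KryoSerializer
# Optional: comma-separated list of application classes to register with Kryo
spark.kryo.classesToRegister   com.example.MyRecord
```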
override def serialize[T: ClassTag](t: T): ByteBuffer = {
  val bos = new ByteArrayOutputStream()
  val out = serializeStream(bos)
  out.writeObject(t)
  out.close()
  ByteBuffer.wrap(bos.toByteArray)
}
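The same steps (write the object to an in-memory stream, close it, wrap the bytes in a ByteBuffer) can be reproduced with the JDK alone. The sketch below is a standalone illustration. The class `JavaRoundTrip` and its methods are hypothetical names, and the deserialize half is added only to show that the bytes round-trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration of Java serialization into a ByteBuffer and back.
final class JavaRoundTrip {
    /** Serializes value with Java serialization and wraps the resulting bytes. */
    static ByteBuffer serialize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ByteBuffer.wrap(bos.toByteArray());
    }

    /** Reads one object back from the buffer without changing the buffer's position. */
    static Object deserialize(ByteBuffer bytes) {
        byte[] raw = new byte[bytes.remaining()];
        bytes.duplicate().get(raw);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of serialized object not found", e);
        }
    }
}
```

Java serialization writes class descriptors (class name, field names and types) into the stream along with the data, which is one reason its output tends to be larger than Kryo's.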
B. Next, the Kryo serialization mechanism
// Create the two lazily initialized Kryo streams (output and input)
private lazy val output = ks.newKryoOutput()
private lazy val input = new KryoInput()
override def serialize[T: ClassTag](t: T): ByteBuffer = {
  // Reset the output buffer: position = 0 and total = 0
  output.clear()
  // Obtain a Kryo instance
  val kryo = borrowKryo()
  try {
    // Serialize the object t
    kryo.writeClassAndObject(output, t)
  } catch {
    ....
  } finally {
    // Hand the Kryo instance back so it can be cached and reused
    releaseKryo(kryo)
  }
  ByteBuffer.wrap(output.toBytes)
}
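The borrow/release pair around the write is a single-slot cache: an expensive object is reused across calls instead of being rebuilt each time. Below is a minimal sketch of that pattern in plain Java. `Codec` is a hypothetical stand-in for the Kryo instance, and none of these names come from Spark.

```java
// Hypothetical illustration of a single-slot borrow/release cache.
final class BorrowReleaseCache {
    /** Hypothetical stand-in for an expensive-to-create object such as a Kryo instance. */
    static final class Codec {
        int pendingState = 0;
        void reset() { pendingState = 0; }
    }

    private Codec cached;      // the one cached instance, or null if it is currently borrowed
    private int created = 0;   // how many instances have been constructed so far

    /** Returns the cached instance (after resetting it) or creates a new one. */
    Codec borrow() {
        if (cached != null) {
            Codec codec = cached;
            codec.reset();     // clear leftover state from the previous use
            cached = null;     // mark the slot as empty while the instance is in use
            return codec;
        }
        created++;
        return new Codec();
    }

    /** Puts the instance back into the slot for the next caller. */
    void release(Codec codec) {
        cached = codec;
    }

    int createdCount() {
        return created;
    }

    /** Typical usage: borrow, work, and always release in a finally block. */
    int useOnce() {
        Codec codec = borrow();
        try {
            codec.pendingState++;
            return codec.pendingState;
        } finally {
            release(codec);
        }
    }
}
```

Releasing in a finally block matters: if the write throws, the instance still returns to the slot, so the next call does not pay the construction cost again.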
=================================== Kryo analysis ===================================
A: Obtaining and initializing the Kryo object
KryoSerializer
The process:
First, check whether a cached Kryo object exists. If it does, reset its state, set the cache to null, and return it.
If there is no cached instance, create a new Kryo. Creating a Kryo is a tedious process, which includes constructing the Kryo object:
public Kryo () {
    this(new DefaultClassResolver(), new MapReferenceResolver());
}
public Kryo (ClassResolver classResolver, ReferenceResolver referenceResolver) {
    if (classResolver == null) throw new IllegalArgumentException("classResolver cannot be null.");
    this.classResolver = classResolver;
    classResolver.setKryo(this);
    this.referenceResolver = referenceResolver;
    if (referenceResolver != null) {
        referenceResolver.setKryo(this);
        references = true;
    }
    addDefaultSerializer(byte[].class, ByteArraySerializer.class);
    addDefaultSerializer(char[].class, CharArraySerializer.class);
    addDefaultSerializer(short[].class, ShortArraySerializer.class);
    addDefaultSerializer(int[].class, IntArraySerializer.class);
    addDefaultSerializer(long[].class, LongArraySerializer.class);
    addDefaultSerializer(float[].class, FloatArraySerializer.class);
    addDefaultSerializer(double[].class, DoubleArraySerializer.class);
    addDefaultSerializer(boolean[].class, BooleanArraySerializer.class);
    addDefaultSerializer(String[].class, StringArraySerializer.class);
    addDefaultSerializer(Object[].class, ObjectArraySerializer.class);
    addDefaultSerializer(BigInteger.class, BigIntegerSerializer.class);
    addDefaultSerializer(BigDecimal.class, BigDecimalSerializer.class);
    addDefaultSerializer(Class.class, ClassSerializer.class);
    addDefaultSerializer(Date.class, DateSerializer.class);
    addDefaultSerializer(Enum.class, EnumSerializer.class);
    addDefaultSerializer(EnumSet.class, EnumSetSerializer.class);
    addDefaultSerializer(Currency.class, CurrencySerializer.class);
    addDefaultSerializer(StringBuffer.class, StringBufferSerializer.class);
    addDefaultSerializer(StringBuilder.class, StringBuilderSerializer.class);
    addDefaultSerializer(Collections.EMPTY_LIST.getClass(), CollectionsEmptyListSerializer.class);
    addDefaultSerializer(Collections.EMPTY_MAP.getClass(), CollectionsEmptyMapSerializer.class);
    addDefaultSerializer(Collections.EMPTY_SET.getClass(), CollectionsEmptySetSerializer.class);
    addDefaultSerializer(Collections.singletonList(null).getClass(), CollectionsSingletonListSerializer.class);
    addDefaultSerializer(Collections.singletonMap(null, null).getClass(), CollectionsSingletonMapSerializer.class);
    addDefaultSerializer(Collections.singleton(null).getClass(), CollectionsSingletonSetSerializer.class);
    addDefaultSerializer(Collection.class, CollectionSerializer.class);
    addDefaultSerializer(TreeMap.class, TreeMapSerializer.class);
    addDefaultSerializer(Map.class, MapSerializer.class);
    addDefaultSerializer(KryoSerializable.class, KryoSerializableSerializer.class);
    addDefaultSerializer(TimeZone.class, TimeZoneSerializer.class);
    addDefaultSerializer(Calendar.class, CalendarSerializer.class);
    lowPriorityDefaultSerializerCount = defaultSerializers.size();
    // Primitives and string. Primitive wrappers automatically use the same registration as primitives.
    register(int.class, new IntSerializer());
    register(String.class, new StringSerializer());
    register(float.class, new FloatSerializer());
    register(boolean.class, new BooleanSerializer());
    register(byte.class, new ByteSerializer());
    register(char.class, new CharSerializer());
    register(short.class, new ShortSerializer());
    register(long.class, new LongSerializer());
    register(double.class, new DoubleSerializer());
}
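The constructor above fills two kinds of tables: default serializers, which are keyed by a type and apply to its subtypes, and explicit registrations for exact classes. The sketch below shows one way such a lookup can work. It is loosely modeled on that idea and is not Kryo's implementation: `SerializerRegistry` is a hypothetical class, serializers are represented by their names as strings, and the fallback name is an assumption.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "explicit registration first, then defaults by assignability".
final class SerializerRegistry {
    // Explicit registrations: exact class -> serializer name.
    private final Map<Class<?>, String> registrations = new HashMap<>();
    // Default serializers, checked in insertion order.
    private final List<SimpleImmutableEntry<Class<?>, String>> defaults = new ArrayList<>();

    /** Adds a default that applies to type and to every subtype of it. */
    void addDefaultSerializer(Class<?> type, String serializerName) {
        defaults.add(new SimpleImmutableEntry<Class<?>, String>(type, serializerName));
    }

    /** Registers a serializer for exactly this class, replacing any earlier registration. */
    void register(Class<?> type, String serializerName) {
        registrations.put(type, serializerName);
    }

    /** Resolves the serializer name for a class. */
    String serializerFor(Class<?> type) {
        String explicit = registrations.get(type);
        if (explicit != null) {
            return explicit;
        }
        for (SimpleImmutableEntry<Class<?>, String> entry : defaults) {
            // The first default whose type is a supertype of (or equal to) the class wins.
            if (entry.getKey().isAssignableFrom(type)) {
                return entry.getValue();
            }
        }
        return "FieldSerializer"; // assumed generic fallback for unregistered classes
    }
}
```

Because the first matching default wins in this sketch, a more specific type such as TreeMap has to be added before a general one such as Map, which matches the order used in the constructor.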
Then register the types that your Spark application must use, for example:
kryo.register(classOf[HttpBroadcast[_]], new KryoJavaSerializer())
kryo.register(classOf[Array[Tuple2[Any, Any]]])
kryo.register(classOf[GenericRecord], new GenericAvroSerializer(avroSchemas))
Conclusion: the register method has many overloads, so different serializers can be registered for the same class, and the serializer determines how serialization (read and write) is performed. Internally, registration looks up the
Registration for the Class in an ObjectMap<Class, Registration>
and then updates the
Registration in the