First, Java's serialization mechanism
Java's default serialization mechanism is built on ObjectOutputStream / ObjectInputStream.
The advantage of this default mechanism is that it is easy to use: you do not have to do anything by hand. As long as the variables used inside an operator implement the Serializable interface, they can be serialized.
The disadvantage is that it is inefficient: serialization is relatively slow, and the serialized data takes up a relatively large amount of memory.
You can also hand-write the serialization format to optimize it.
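To make the trade-off concrete, here is a minimal sketch that uses only the JDK. The Point class and the method names are hypothetical and chosen for illustration only. It serializes the same object once through ObjectOutputStream and once through a hand-written format that stores just the two int fields, so the sizes can be compared.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JavaSerializationDemo {

    // Hypothetical example class: implementing Serializable is all
    // the default mechanism requires.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Default mechanism: the stream carries a header and a class
    // descriptor in addition to the field values.
    static byte[] defaultSerialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Reads an object back from bytes produced by defaultSerialize.
    static Object defaultDeserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Hand-written format: only the two raw int values, no metadata.
    static byte[] manualSerialize(Point p) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point(3, 4);
        byte[] viaDefault = defaultSerialize(p);
        byte[] viaManual = manualSerialize(p);

        Point copy = (Point) defaultDeserialize(viaDefault);
        System.out.println("round trip: (" + copy.x + ", " + copy.y + ")");
        System.out.println("default: " + viaDefault.length
                + " bytes, manual: " + viaManual.length + " bytes");
    }
}
```

The hand-written format is much smaller, but you have to write and maintain the matching read code yourself, and that effort is what a library such as Kryo is meant to save.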
Spark supports the Kryo serialization mechanism. Kryo is faster than the default Java serialization mechanism, and the serialized data is smaller, roughly 1/10 the size of the Java-serialized data.
As a result, less data is transferred over the network and less memory is consumed.

Second, where the Kryo serialization mechanism takes effect:
1. External variables used in operator functions.
2. Serialization when an RDD is persisted, for example with StorageLevel.MEMORY_ONLY_SER.
3. Shuffle.

Third, the effect
1. External variables used in operator functions: with Kryo, network transfer performs better and the cluster consumes less memory.
2. Persisted RDDs: memory usage is reduced. The less memory a persisted RDD occupies, the less often the objects created during task execution fill up memory and trigger frequent GC.
3. Shuffle: network transfer performs better.

Fourth, how to use it

Step one: set a property on SparkConf.
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
Kryo is not the default serialization library because, to achieve its best performance, it requires you to register your custom classes. For example, if an operator function uses an external variable of a custom type, you must register that class; otherwise Kryo will not reach its best performance.

Step two: register the custom classes that you use and that need to be serialized by Kryo.
.registerKryoClasses(new Class[]{CategorySortKey.class});
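
Putting the two steps together, a minimal sketch of a complete Java driver might look like this. It assumes Spark is on the classpath and runs in local mode. The CategorySortKey class here is only a stand-in with made-up fields, because the real class is not shown in this article, and the application name and sample data are likewise assumptions. The sketch also persists the RDD with StorageLevel.MEMORY_ONLY_SER, one of the places where Kryo takes effect.

```java
import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class KryoConfigSketch {

    // Stand-in for the article's CategorySortKey; the fields are
    // assumptions made for this example.
    public static class CategorySortKey implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long clickCount;
        private final long orderCount;

        public CategorySortKey(long clickCount, long orderCount) {
            this.clickCount = clickCount;
            this.orderCount = orderCount;
        }

        public long getClickCount() { return clickCount; }
        public long getOrderCount() { return orderCount; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("KryoConfigSketch")   // hypothetical app name
                .setMaster("local[2]")            // local mode, for illustration
                // Step one: switch the serializer to Kryo.
                .set("spark.serializer",
                     "org.apache.spark.serializer.KryoSerializer")
                // Step two: register the custom classes Kryo will serialize.
                .registerKryoClasses(new Class<?>[]{CategorySortKey.class});

        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<CategorySortKey> keys = sc.parallelize(Arrays.asList(
                    new CategorySortKey(10L, 2L),
                    new CategorySortKey(7L, 5L)));

            // Persist in serialized form; the cached partitions are
            // written with the configured serializer.
            keys.persist(StorageLevel.MEMORY_ONLY_SER());

            System.out.println("cached elements: " + keys.count());
        }
    }
}
```

If a class is used but not registered, Kryo still works, but it writes the full class name alongside each object, which gives back part of the size advantage.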