Spark Performance Tuning: Using Kryo Serialization in Real-World Projects

First: Java's serialization mechanism

By default, Spark serializes data with Java's built-in ObjectOutputStream/ObjectInputStream object-stream mechanism.

The advantage of this default mechanism is that it is easy to use: you do not need to do anything manually. As long as the variables used inside an operator implement the Serializable interface, they can be serialized.
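To make the default mechanism concrete, here is a minimal JDK-only sketch (no Spark involved). The `Point` class and the helper names are hypothetical; the point is that implementing `Serializable` is the only requirement, and `ObjectOutputStream`/`ObjectInputStream` do the rest.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class JavaSerializationDemo {
    // Hypothetical example type: implementing Serializable is all that is needed.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Write any Serializable object to a byte array with ObjectOutputStream.
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object from bytes with ObjectInputStream.
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Point copy = (Point) deserialize(serialize(new Point(3, 4)));
        System.out.println(copy.x + "," + copy.y); // prints 3,4
    }
}
```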

The disadvantage is that the default mechanism is inefficient: serialization is slow, and the serialized data takes up a relatively large amount of memory.
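One reason for the size overhead is that Java serialization writes a class descriptor (class name, field names and types) alongside the field values. The JDK-only sketch below compares that output with the raw field bytes for a hypothetical two-int class. The raw `DataOutputStream` encoding is only a lower bound for comparison, not Kryo's actual format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class JavaSerializationOverhead {
    // Hypothetical class whose actual payload is two ints (8 bytes).
    static class Pair implements Serializable {
        private static final long serialVersionUID = 1L;
        final int a;
        final int b;

        Pair(int a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    // Bytes produced by default Java serialization: stream header, class
    // descriptor (class name, field names and types), then the field values.
    static int javaSerializedSize(Pair p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Bytes needed for the raw field values alone (a lower bound, no metadata).
    static int rawFieldSize(Pair p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.a);
            out.writeInt(p.b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Pair p = new Pair(1, 2);
        System.out.println("ObjectOutputStream: " + javaSerializedSize(p) + " bytes");
        System.out.println("raw int fields:     " + rawFieldSize(p) + " bytes"); // 8 bytes
    }
}
```

For a single small object the descriptor dominates the output, which is the kind of overhead a more compact format avoids.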

You can, however, optimize the serialization format yourself.

Spark supports the Kryo serialization mechanism. Kryo is faster than the default Java serialization mechanism, and its serialized output is smaller, roughly 1/10 the size of what Java produces.

This reduces the amount of data transmitted over the network and lowers memory consumption.

Second: where Kryo serialization takes effect

1. External variables used in operator functions.
2. Persisted RDDs stored in serialized form, for example with StorageLevel.MEMORY_ONLY_SER.
3. Shuffle.

Third: the effects

1. External variables used in operator functions: with Kryo, network transmission performs better and memory consumption across the cluster can be reduced.

2. Persisted RDDs: memory usage drops. The less memory a persisted RDD consumes, the more room is left for the objects that tasks create while executing, so garbage collection runs less often.

3. Shuffle: network transmission performs better.

Fourth: how to use it

Step one: set a property in SparkConf.

.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
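For reference, step one in Spark's Java API might look like the sketch below. It assumes `spark-core` is on the classpath; the application name and the `local[*]` master URL are placeholders chosen for illustration. The sketch only builds the configuration and reads the property back; it does not start a job.

```java
import org.apache.spark.SparkConf;

public class KryoStepOne {
    // Build a SparkConf that replaces the default JavaSerializer with Kryo.
    // App name and master URL are placeholders for this sketch.
    static SparkConf kryoConf() {
        return new SparkConf()
                .setAppName("kryo-tuning-demo")
                .setMaster("local[*]")
                // The fully qualified class name is case-sensitive.
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    }

    public static void main(String[] args) {
        SparkConf conf = kryoConf();
        System.out.println(conf.get("spark.serializer"));
    }
}
```

In a real application the resulting `SparkConf` would be passed to `new JavaSparkContext(conf)`.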

Kryo is not the default serialization library because, to reach its best performance, it requires you to register your custom classes. For example, if your operator function uses an external variable of a custom type, you need to register that class. Unregistered classes still work, but Kryo then has to store the full class name with every serialized object, which wastes space.

Step two: register the custom classes that Kryo needs to serialize.

.registerKryoClasses(new Class<?>[]{CategorySortKey.class});
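Putting both steps together, a minimal sketch (again assuming `spark-core` on the classpath) could register a class named `CategorySortKey`. The article does not show that class, so the body below is a hypothetical stand-in, and the app name and master URL are placeholders.

```java
import java.io.Serializable;

import org.apache.spark.SparkConf;

public class KryoStepTwo {
    // Hypothetical stand-in for the article's CategorySortKey; fields are illustrative.
    // Serializable is kept because Spark still uses Java serialization for task closures.
    public static class CategorySortKey implements Serializable {
        private static final long serialVersionUID = 1L;
        final long clickCount;
        final long orderCount;

        public CategorySortKey(long clickCount, long orderCount) {
            this.clickCount = clickCount;
            this.orderCount = orderCount;
        }
    }

    static SparkConf kryoConf() {
        return new SparkConf()
                .setAppName("kryo-tuning-demo")   // placeholder
                .setMaster("local[*]")            // placeholder
                // Step one: use Kryo instead of the default JavaSerializer.
                .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                // Step two: register custom classes so Kryo can write a compact
                // registration ID instead of the full class name.
                .registerKryoClasses(new Class<?>[]{CategorySortKey.class});
    }

    public static void main(String[] args) {
        SparkConf conf = kryoConf();
        System.out.println(conf.get("spark.kryo.classesToRegister"));
    }
}
```

`registerKryoClasses` records the class names under the `spark.kryo.classesToRegister` property and also sets `spark.serializer` to Kryo, so the explicit `set` call is redundant here; it is kept only to mirror the two steps.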
