Using KryoRegistrator in Spark: a Java code example

Source: Internet
Author: User

Reprinted from: http://www.cnblogs.com/tovin/p/3833985.html

While developing with Spark recently, I found that cached data consumes a lot of memory when the data volume is large. To reduce memory consumption, I tested Kryo serialization.

The code consists of three classes: KryoTest, MyRegistrator, and Qualify.

By default, Spark uses Java's built-in serialization mechanism. To use Kryo serialization instead, set the spark.serializer property in the KryoTest class to Spark's Kryo serializer class, and set spark.kryo.registrator to the name of the registrator class.

You also need to add the MyRegistrator class, which registers the classes that will be serialized with Kryo.
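Kryo writes a short numeric ID for a registered class and falls back to the fully qualified class name for an unregistered one. The following is a minimal JDK-only sketch of that idea. It is not Kryo's actual wire format, and the RegistrationSketch class with its register and classTag methods is hypothetical, introduced only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class RegistrationSketch {
    // Hypothetical registry mapping each registered class to a small integer ID.
    // This toy version assumes fewer than 128 registered classes.
    private final Map<Class<?>, Integer> ids = new HashMap<>();

    public void register(Class<?> type) {
        ids.putIfAbsent(type, ids.size());
    }

    // Bytes this toy format would spend on identifying the class of one object.
    public byte[] classTag(Class<?> type) {
        Integer id = ids.get(type);
        if (id != null) {
            // Registered: a single byte holding the ID.
            return new byte[] { id.byteValue() };
        }
        // Unregistered: the full class name.
        return type.getName().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RegistrationSketch registry = new RegistrationSketch();
        registry.register(java.util.ArrayList.class);
        // Registered class: 1 byte.
        System.out.println(registry.classTag(java.util.ArrayList.class).length);
        // Unregistered class: length of "java.util.LinkedList", 20 bytes.
        System.out.println(registry.classTag(java.util.LinkedList.class).length);
    }
}
```

With millions of cached records, the difference between a short ID and a repeated class name adds up, which is the motivation for registering application classes.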

    // KryoTest.java
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.storage.StorageLevel;

    public class KryoTest {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf();
            conf.setMaster("local");
            conf.setAppName("KryoTest");
            // Use Kryo instead of the default Java serializer
            conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
            // Class that registers the application classes with Kryo
            conf.set("spark.kryo.registrator", "MyRegistrator");
            JavaSparkContext sc = new JavaSparkContext(conf);
            JavaRDD<String> rdd = sc.textFile("/home/hdpusr/qualifying.txt");
            // Parse each comma-separated line into a Qualify object
            JavaRDD<Qualify> map = rdd.map(new Function<String, Qualify>() {
                public Qualify call(String v1) throws Exception {
                    String[] s = v1.split(",");
                    Qualify q = new Qualify();
                    q.setA(Integer.parseInt(s[0]));
                    q.setB(Long.parseLong(s[1]));
                    q.setC(s[2]);
                    return q;
                }
            });
            // Cache in serialized form so that the serializer affects memory use
            map.persist(StorageLevel.MEMORY_AND_DISK_SER());
            System.out.println(map.count());
        }
    }

    // MyRegistrator.java
    import org.apache.spark.serializer.KryoRegistrator;
    import com.esotericsoftware.kryo.Kryo;

    public class MyRegistrator implements KryoRegistrator {
        public void registerClasses(Kryo arg0) {
            // Register every class that will be serialized with Kryo
            arg0.register(Qualify.class);
        }
    }

    // Qualify.java
    import java.io.Serializable;

    public class Qualify implements Serializable {
        int a;
        long b;
        String c;

        public int getA() { return a; }
        public void setA(int a) { this.a = a; }
        public long getB() { return b; }
        public void setB(long b) { this.b = b; }
        public String getC() { return c; }
        public void setC(String c) { this.c = c; }
    }

Let's compare the results of Java serialization and Kryo serialization.

Java serialization

[Screenshot: memory usage with Java serialization]

Kryo serialization

[Screenshot: memory usage with Kryo serialization]

The measured results show that Kryo serialization saves a considerable amount of memory. Using Kryo serialization in this way is recommended when memory is tight.
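One reason for the difference is that Java serialization writes a stream header and a class descriptor (class name, field names and types) alongside the field values. The following JDK-only sketch makes that overhead visible by comparing the Java-serialized size of one object with the size of its raw field values. It does not use Kryo or Spark, so it only illustrates the overhead and does not reproduce the measurement above. The Record class is a hypothetical stand-in for Qualify, and the sample values are invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializedSizeSketch {

    // Hypothetical stand-in for the article's Qualify class (same three fields).
    public static class Record implements Serializable {
        private static final long serialVersionUID = 1L;
        final int a;
        final long b;
        final String c;

        public Record(int a, long b, String c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }
    }

    // Size of one object written with standard Java serialization.
    public static int javaSerializedSize(Record r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Size of the field values alone, with no class metadata.
    public static int fieldsOnlySize(Record r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(r.a);   // 4 bytes
            out.writeLong(r.b);  // 8 bytes
            out.writeUTF(r.c);   // 2-byte length prefix plus the encoded characters
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        Record r = new Record(1, 2L, "abc");
        System.out.println("Java serialization: " + javaSerializedSize(r) + " bytes");
        System.out.println("Field values only:  " + fieldsOnlySize(r) + " bytes");
    }
}
```

For the sample record the field values take 17 bytes, while the Java-serialized form is larger because of the metadata it carries. A serializer that identifies registered classes compactly avoids much of that overhead.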
