Reprinted from: http://www.cnblogs.com/tovin/p/3833985.html
Recently, while developing with Spark, I found that cached data consumes a lot of memory when the data set is large. To reduce memory consumption, I tested Kryo serialization.
The code consists of three classes: KryoTest, MyRegistrator, and Qualify.
By default, Spark uses Java's built-in serialization mechanism. To use Kryo serialization instead, you only need to set spark.serializer to org.apache.spark.serializer.KryoSerializer in the KryoTest class, which tells Spark which serializer class to use.
You also need to add the MyRegistrator class, which registers the classes that Kryo should serialize, and point spark.kryo.registrator at it.
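To illustrate why registration matters: when a class is not registered, Kryo has to store the full class name with the serialized object, whereas a registered class is identified by a small numeric ID. The sketch below is not Kryo's implementation. ClassTagDemo, its one-byte marker, and its two-byte ID are hypothetical choices made only to show the size difference between the two ways of identifying a class.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; Kryo's real wire format is different.
class ClassTagDemo {
    // Each registered class gets a small integer ID, in registration order.
    private final Map<Class<?>, Integer> ids = new HashMap<>();

    void register(Class<?> type) {
        ids.putIfAbsent(type, ids.size());
    }

    // Returns the bytes used to identify the class of one serialized object.
    byte[] classTag(Class<?> type) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            Integer id = ids.get(type);
            if (id != null) {
                out.writeBoolean(true);        // marker: registered class
                out.writeShort(id);            // compact numeric ID
            } else {
                out.writeBoolean(false);       // marker: unregistered class
                out.writeUTF(type.getName());  // full class name
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        ClassTagDemo demo = new ClassTagDemo();
        demo.register(StringBuilder.class);
        System.out.println("registered:   " + demo.classTag(StringBuilder.class).length + " bytes");
        System.out.println("unregistered: " + demo.classTag(java.util.ArrayList.class).length + " bytes");
    }
}
```

In this toy format the registered class costs a fixed three bytes, while the unregistered one grows with the length of its fully qualified name. Across millions of cached records that difference adds up.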
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.storage.StorageLevel;

public class KryoTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setMaster("local");
        conf.setAppName("KryoTest");
        // Replace the default Java serializer with Kryo
        conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        // Registrator class that registers the classes Kryo will serialize
        conf.set("spark.kryo.registrator", "MyRegistrator");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> rdd = sc.textFile("/home/hdpusr/qualifying.txt");
        // Parse each comma-separated line into a Qualify object
        JavaRDD<Qualify> map = rdd.map(new Function<String, Qualify>() {
            public Qualify call(String v1) throws Exception {
                String[] s = v1.split(",");
                Qualify q = new Qualify();
                q.setA(Integer.parseInt(s[0]));
                q.setB(Long.parseLong(s[1]));
                q.setC(s[2]);
                return q;
            }
        });
        // Cache in serialized form, so the serializer determines the cached size
        map.persist(StorageLevel.MEMORY_AND_DISK_SER());
        System.out.println(map.count());
    }
}

import org.apache.spark.serializer.KryoRegistrator;
import com.esotericsoftware.kryo.Kryo;

public class MyRegistrator implements KryoRegistrator {
    // Called by Spark to register the classes that Kryo will serialize
    public void registerClasses(Kryo kryo) {
        kryo.register(Qualify.class);
    }
}

import java.io.Serializable;

public class Qualify implements Serializable {
    int a;
    long b;
    String c;

    public int getA() {
        return a;
    }

    public void setA(int a) {
        this.a = a;
    }

    public long getB() {
        return b;
    }

    public void setB(long b) {
        this.b = b;
    }

    public String getC() {
        return c;
    }

    public void setC(String c) {
        this.c = c;
    }
}
Let's compare the results with Java serialization and with Kryo serialization:
[Figure: Java serialization]
[Figure: Kryo serialization]
The actual run data shows that Kryo saves a considerable amount of memory. Using Kryo serialization in this way is recommended when memory is tight.
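One reason the gap can be this large is that Java serialization writes a class descriptor (class name, serialVersionUID, field names and types) alongside the field values. The sketch below measures that overhead using only the JDK. SampleRow is a hypothetical stand-in for Qualify, and fieldOnlySize is a hand-written encoding that only approximates what a compact serializer does. It is not Kryo's format, and it does not reproduce the measurements above.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the Qualify class: an int, a long, and a String.
class SampleRow implements Serializable {
    final int a;
    final long b;
    final String c;

    SampleRow(int a, long b, String c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }
}

class SizeDemo {
    // Size in bytes of one object written with standard Java serialization.
    static int javaSize(SampleRow row) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(row);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // Size in bytes when only the field values are written (no class metadata).
    static int fieldOnlySize(SampleRow row) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(row.a);
            out.writeLong(row.b);
            out.writeUTF(row.c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        SampleRow row = new SampleRow(1, 2L, "abc");
        System.out.println("Java serialization: " + javaSize(row) + " bytes");
        System.out.println("Field values only:  " + fieldOnlySize(row) + " bytes");
    }
}
```

For a single small object the metadata dominates the payload. The real savings in Spark depend on the data and on how records are batched, so measure with your own data set.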