Last night I listened to Lesson 18 of Liaoliang's Spark IMF legendary action course: RDD persistence, broadcast variables, and accumulators. The homework was to test unpersist and to read the accumulator source code to understand its internal working mechanism:
scala> val rdd = sc.parallelize(1 to 1000)
rdd: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:21

scala> rdd.persist
res0: rdd.type = ParallelCollectionRDD[0] at parallelize at <console>:21

scala> rdd.count
16/01/24 11:42:56 INFO DAGScheduler: Job 0 finished: count at <console>:24, took 1.451543 s
res1: Long = 1000

scala> rdd.count
16/01/24 11:43:14 INFO DAGScheduler: Job 2 finished: count at <console>:24, took 0.094119 s
res3: Long = 1000

scala> rdd.unpersist()
16/01/24 11:43:43 INFO ParallelCollectionRDD: Removing RDD 0 from persistence list
16/01/24 11:43:43 INFO BlockManager: Removing RDD 0
res5: rdd.type = ParallelCollectionRDD[0] at parallelize at <console>:21

scala> rdd.count
16/01/24 11:44:56 INFO DAGScheduler: Job 0 finished: count at <console>:24, took 1.475321 s
res1: Long = 1000
After persist, the second count runs much faster because the RDD is read from the cache; after unpersist, count slows down again because the RDD must be recomputed from scratch.
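The persist/unpersist behavior above can be sketched as a standalone local-mode program (a sketch assuming a Spark dependency on the classpath; in spark-shell, `sc` already exists and these calls can be made directly):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  def main(args: Array[String]): Unit = {
    // Local-mode context; the app name and master are illustrative
    val sc = new SparkContext(new SparkConf().setAppName("persist-sketch").setMaster("local[2]"))
    val rdd = sc.parallelize(1 to 1000)
    // persist() with no arguments defaults to StorageLevel.MEMORY_ONLY;
    // an explicit level such as MEMORY_AND_DISK spills to disk when memory is tight
    rdd.persist(StorageLevel.MEMORY_AND_DISK)
    println(rdd.count()) // first count computes the RDD and caches its blocks
    println(rdd.count()) // second count is served from the cache
    rdd.unpersist()      // drop the cached blocks; the next action recomputes
    sc.stop()
  }
}
```

Note that persist is lazy: nothing is cached until the first action (here, count) actually materializes the RDD.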
Accumulator: globally unique within the application. Executors can only add to it, never read it; only the driver can read its value. Since tasks may only apply the add operation, the value can only grow.
val sum = sc.accumulator(0)
val d1 = sc.parallelize(1 to 5)
d1.foreach(item => sum += item)
println(sum)
The result is 15.
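The lesson title also covers broadcast variables, for which these notes give no example. A minimal local-mode sketch (the lookup map and values here are hypothetical): a broadcast variable is shipped to each executor once, is read-only there, and is accessed through `.value`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch").setMaster("local[2]"))
    // Broadcast a read-only lookup table once per executor,
    // instead of serializing it into the closure of every task
    val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))
    val result = sc.parallelize(Seq(1, 2, 1))
      .map(k => lookup.value.getOrElse(k, "?")) // tasks read, never modify
      .collect()
    println(result.mkString(",")) // a,b,a
    sc.stop()
  }
}
```

This mirrors the accumulator's asymmetry in reverse: a broadcast variable is written once by the driver and only read by executors, while an accumulator is written by executors and only read by the driver.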
For follow-up lessons, see Liaoliang (DT Big Data Dream Factory) on Sina Weibo: http://weibo.com/ilovepains
Liaoliang, China's foremost Spark expert, public WeChat account DT_Spark.
Please credit the source when reposting.
Summary of Spark IMF legendary action, Lesson 18: RDD persistence, broadcast variables, and accumulators.