Tonight I listened to Liaoliang's Spark "IMF legendary action" lesson 16 on RDDs; the class notes are as follows:
RDD operation types: transformation, action, controller
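The key distinction between the first two types is laziness. Here is a local Scala analogy (not the Spark API itself): transformations are lazy, like operations on a LazyList; actions force evaluation and return a result to the driver; controller operations (cache/persist/checkpoint) affect how computed results are retained.

```scala
// "Transformation" analogy: mapping a LazyList computes nothing yet.
var evaluated = 0
val source = LazyList.from(1).take(5)
val mapped = source.map { x => evaluated += 1; x * 2 }
assert(evaluated == 0) // still lazy, like an RDD transformation

// "Action" analogy: materializing the list forces the computation,
// like collect() or count() on an RDD.
val result = mapped.toList
assert(result == List(2, 4, 6, 8, 10))
assert(evaluated == 5)
```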
The function passed to reduce must be commutative and associative, so that the result does not depend on the order in which partitions are combined.
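To see why, we can simulate Spark's partition-wise reduce with plain Scala collections (a local sketch, not Spark code): each partition is reduced on its own, then the partial results are combined.

```scala
// Reduce each partition locally, then reduce the partial results
// (Spark gives no guarantee about the combining order).
def partitionedReduce[A](parts: Seq[Seq[A]])(f: (A, A) => A): A =
  parts.map(_.reduce(f)).reduce(f)

val data = (1 to 8).toList

// '+' is commutative and associative: the result is the same
// no matter how the data is partitioned.
assert(partitionedReduce(data.grouped(2).toList)(_ + _) == 36)
assert(partitionedReduce(data.grouped(4).toList)(_ + _) == 36)

// '-' is neither: different partitionings give different results.
val a = partitionedReduce(data.grouped(2).toList)(_ - _)
val b = partitionedReduce(data.grouped(4).toList)(_ - _)
assert(a != b)
```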
val textLines = lineCount.reduceByKey(_ + _, 1)
textLines.collect.foreach(pair => println(pair._1 + "=" + pair._2))

Spark's collect is implemented as:

def collect(): Array[T] = withScope {
  val results = sc.runJob(this, (iter: Iterator[T]) => iter.toArray)
  Array.concat(results: _*)
}
You can see that the array returned by collect is an array of tuples, i.e. key/value pairs.
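The same pipeline can be mimicked with plain Scala collections, which makes the resulting key/value pairs easy to inspect (groupMapReduce plays the role of reduceByKey(_ + _); the sample data is made up):

```scala
// Word-count pairs as they might come out of a map step
val lineCount = List(("spark", 1), ("hadoop", 1), ("spark", 1))

// Combine values per key, like reduceByKey(_ + _)
val reduced: Map[String, Int] =
  lineCount.groupMapReduce(_._1)(_._2)(_ + _)

reduced.toList.sorted.foreach { case (k, v) => println(k + "=" + v) }
// hadoop=1
// spark=2
```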
If the degree of parallelism is not set explicitly, it is determined by the input: the number of files and their size (i.e., the number of input splits).
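A configuration sketch of how parallelism can be set explicitly, using the standard Spark API (the application name and HDFS path here are hypothetical):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ParallelismSketch")          // hypothetical app name
  .setMaster("local[4]")
  .set("spark.default.parallelism", "8")    // default for shuffles like reduceByKey

val sc = new SparkContext(conf)

// The second argument requests a minimum number of partitions; without it,
// the partition count follows the input splits (file count and block size).
val lines = sc.textFile("hdfs://namenode/path/input.txt", 4) // hypothetical path
println(lines.getNumPartitions)

sc.stop()
```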
The two stages may well be executed on different nodes.
One of the diagrams from the lesson (image not reproduced in these notes).
For follow-up courses, see Liaoliang's Sina Weibo, "DT Big Data Dream Factory": http://weibo.com/ilovepains
Liaoliang, China's leading Spark expert; public account: DT_Spark
Please credit the source when reposting.
Summary of Spark IMF legendary action lesson 16: RDDs in practice