Spark has developed rapidly in recent years as a distributed parallel data-processing framework, and understanding and mastering Spark is essential for learning big data. But Spark relies on the function as its basic unit: what does its functional programming look like, and how do we apply it?
I. Functional programming of Spark
Spark relies on the function as the basic unit of its programming. A function has only input and output, with no state and no side effects. The key idea is to pass functions as input to other functions. In practice, the function passed is often an anonymous function, because it only needs to satisfy the current computation and there is no need to preserve it for reuse elsewhere.
Many RDD operations take functions as parameters. Consider the pseudocode for the RDD map operation, which applies a function fn to each record of the RDD. Note that this is not the actual code Spark executes; it only illustrates the logic being performed.
[Screenshot from the original post: pseudocode of the RDD map operation.]
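The map logic described above can be sketched in plain Python. Note this is a toy illustration, not Spark's actual implementation: a real RDD is distributed, partitioned, and lazily evaluated, while the class below (all names are illustrative) only mirrors the idea that map(fn) applies fn to every record and returns a new collection without mutating the old one.

```python
class ToyRDD:
    """Toy stand-in for an RDD; illustrative only, not Spark's API."""

    def __init__(self, records):
        self.records = list(records)

    def map(self, fn):
        # Apply fn to each record, producing a new ToyRDD.
        # The original records are left unchanged (no side effects).
        return ToyRDD(fn(r) for r in self.records)

    def collect(self):
        # Return the records as a plain list.
        return list(self.records)


rdd = ToyRDD([1, 2, 3])
doubled = rdd.map(lambda x: x * 2)
print(doubled.collect())  # [2, 4, 6]
print(rdd.collect())      # [1, 2, 3] -- the source RDD is untouched
```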
Example: Passing a named function
[Screenshot from the original post: example of passing a named function to map.]
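Since the original screenshot is lost, here is a hedged sketch of passing a named function. It uses Python's built-in map so it runs standalone; with a live SparkContext sc (an assumption, not shown in the original), sc.parallelize(data).map(to_upper) would pass the function the same way.

```python
def to_upper(s):
    """A named function: pure, stateless, no side effects."""
    return s.upper()


data = ["spark", "hadoop", "flink"]

# The function itself is the argument -- note there are no parentheses
# after to_upper; we pass the function, we do not call it here.
result = list(map(to_upper, data))
print(result)  # ['SPARK', 'HADOOP', 'FLINK']
```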
Anonymous functions are inline-defined functions without an identifier, best suited for short, one-off use. They are supported in many programming languages, for example:
(1) Python: lambda x: ...
(2) Scala: x => ...
(3) Java 8: x -> ...
Example: Passing anonymous functions
(1) Python
[Screenshot from the original post: Python anonymous-function (lambda) example.]
(2) Scala
[Screenshot from the original post: Scala anonymous-function example.]
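As a stand-in for the lost screenshots, a minimal Python sketch of passing an anonymous function: the lambda is defined inline, used once, and never named. With an RDD (an assumption here), rdd.map(lambda s: len(s)) would read identically.

```python
data = ["spark", "hadoop", "flink"]

# The lambda exists only for this one call; it satisfies the current
# computation and is never reused, so it needs no name.
lengths = list(map(lambda s: len(s), data))
print(lengths)  # [5, 6, 5]
```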
(1) Python
[Screenshot from the original post: a further Python example; content not recoverable.]
(2) Scala
[Screenshot from the original post: a further Scala example; content not recoverable.]
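The screenshots above are not recoverable, so the following is an assumed, illustrative follow-on rather than the original example: chaining functional operations, where each step takes a function as input, in plain Python (with an RDD, rdd.filter(...).map(...) would chain analogously).

```python
data = [1, 2, 3, 4, 5]

# filter keeps records for which the predicate returns True;
# map then transforms each surviving record. Both steps take a
# function as their argument and neither mutates `data`.
evens_squared = list(map(lambda x: x * x,
                         filter(lambda x: x % 2 == 0, data)))
print(evens_squared)  # [4, 16]
```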
Spark is an important branch of today's big data field, and it must be learned and mastered in depth. But big data is still in its early stages of development and has not yet formed a complete, mature theoretical system, so we need to dig in and learn from many angles and channels. The "Big data CN" public platform is recommended here; it introduces a great deal of big-data-related knowledge.
This article is from the "11872756" blog; please be sure to keep this source: http://11882756.blog.51cto.com/11872756/1893173