One. Compiling the Spark source code
Steps:
wget http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
tar -zxvf spark-1.6.0-bin-hadoop2.6.tgz
cd spark-1.6.0-bin-hadoop2.6
./sbt/sbt gen-idea
Note: After a lengthy wait, the commands above generate the SBT project files, and we can then open the project in IntelliJ IDEA as an SBT project.
Two. RDD implementation in detail
You can persist an RDD by calling persist() or cache(); cache() is a shortcut for persist() with the default storage level (MEMORY_ONLY). Cached data can be lost, and the lost partitions then have to be recomputed from the RDD's lineage. To avoid that recomputation overhead, we can use Spark's checkpoint mechanism, so that when a downstream RDD fails, computation can resume from the checkpointed RDD.
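The difference between caching and checkpointing can be sketched roughly as follows. This is a minimal example, not taken from the Spark source: the application name, the local[2] master, the checkpoint directory /tmp/spark-checkpoint-sketch, and the sample data are all assumptions chosen for illustration. It assumes the Spark 1.6 core library is on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheAndCheckpointSketch {
  def main(args: Array[String]): Unit = {
    // Local mode and the app name are illustrative assumptions.
    val conf = new SparkConf()
      .setAppName("cache-checkpoint-sketch")
      .setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical directory; on a cluster this would normally be an HDFS path.
    sc.setCheckpointDir("/tmp/spark-checkpoint-sketch")

    val squares = sc.parallelize(1 to 100).map(x => x * x)

    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY).
    squares.cache()

    // checkpoint() only marks the RDD; it must be called before the first
    // action on it. The data is written when that action runs.
    squares.checkpoint()

    // First action: computes the partitions, caches them, and triggers
    // the checkpoint write.
    val total = squares.reduce(_ + _)
    println(s"sum of squares = $total")
    println(s"checkpointed   = ${squares.isCheckpointed}")

    // A downstream RDD can use a different storage level. If its cached
    // blocks are lost, recomputation can start from the checkpointed data
    // of `squares` instead of replaying the whole lineage.
    val evens = squares.filter(_ % 2 == 0).persist(StorageLevel.MEMORY_AND_DISK)
    println(s"even squares   = ${evens.count()}")

    sc.stop()
  }
}
```

Calling cache() before checkpoint() is a common pattern: without the cache, the checkpoint write would typically compute the RDD a second time.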
Three. Scheduler module in detail
Four. Deploy module in detail
Five. Executor module in detail
Six. Shuffle module in detail
Seven. Storage module in detail
References:
[1] Scala Tutorial: Simple Build Tool (sbt): http://www.importnew.com/4311.html
[2] Spark's cache and checkpoint: http://www.fuqingchuan.com/2015/06/949.html?utm_source=tuicool&utm_medium=Referral
[3] Spark Technology Insider: An In-depth Understanding of the Design and Implementation Principles of the Spark Kernel Architecture
Spark Source Learning and Summary 1