The simplest WordCount:

sc.textFile("hdfs://...").flatMap(line => line.split(" ")).map(w => (w, 1)).reduceByKey(_ + _).foreach(println)
The same job without reduceByKey, using groupByKey instead (in practice reduceByKey is preferred, since it combines values on the map side before the shuffle, while groupByKey ships every value across the network):
sc.textFile("hdfs://...").flatMap(l => l.split(" ")).map(w => (w, 1)).groupByKey().map((p: (String, Iterable[Int])) => (p._1, p._2.sum)).collect()
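For reference, a complete, self-contained version of the reduceByKey variant, runnable without a cluster; the local[2] master and the inline sample data are illustrative stand-ins for the hdfs://... input above:

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // local[2] runs everything in one JVM with two worker threads (illustrative).
    val conf = new SparkConf().setMaster("local[2]").setAppName("WordCount")
    val sc = new SparkContext(conf)
    sc.parallelize(Seq("a b a", "b c"))   // sample data in place of hdfs://...
      .flatMap(_.split(" "))
      .map(w => (w, 1))
      .reduceByKey(_ + _)
      .foreach(println)
    sc.stop()
  }
}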
The call path from spark-shell to the creation of the SparkContext:
spark-shell -> spark-submit -> spark-class -> SparkSubmit.main -> SparkILoop -> createSparkContext
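At the end of that chain, SparkILoop.createSparkContext builds the context. A rough sketch, simplified from the Spark 1.x repl source (the getMaster helper and the exact option handling are assumptions here):

// Simplified sketch, not the verbatim Spark source.
def createSparkContext(): SparkContext = {
  val conf = new SparkConf()
    .setMaster(getMaster())     // assumed helper that resolves the master URL
    .setAppName("Spark shell")
  new SparkContext(conf)
}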
The parameter SparkContext receives at initialization is a SparkConf.
First, the SparkConf is generated from the initialization settings, and a SparkEnv is then created from that SparkConf.
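In user code those two steps look like this; the master URL and settings below are examples only:

val conf = new SparkConf()
  .setMaster("spark://host:7077")      // deployment mode, later read by createTaskScheduler
  .setAppName("MyApp")
  .set("spark.executor.memory", "1g")  // configuration travels with the conf into SparkEnv
val sc = new SparkContext(conf)        // the constructor creates SparkEnv from this conf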
Second, create the TaskScheduler: the SchedulerBackend matching Spark's run mode is selected, and the TaskScheduler is started:
private[spark] var taskScheduler = SparkContext.createTaskScheduler(this, master, appName)
taskScheduler.start()
createTaskScheduler is the most critical step: based on the master environment variable it determines how Spark is currently deployed and instantiates the corresponding SchedulerBackend subclass. The purpose of taskScheduler.start() is to start that SchedulerBackend.
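A minimal sketch of that dispatch; the regexes and backend names mirror, in abbreviated form, the pattern match inside SparkContext.createTaskScheduler, and the function itself is illustrative rather than Spark's actual code:

// Illustrative only: maps a master URL to the kind of backend Spark would pick.
def backendFor(master: String): String = {
  val LocalN     = """local\[([0-9]+)\]""".r
  val Standalone = """spark://(.+)""".r
  master match {
    case "local"       => "LocalBackend, single thread"
    case LocalN(n)     => s"LocalBackend, $n threads"
    case Standalone(_) => "SparkDeploySchedulerBackend, standalone cluster"
    case "yarn-client" | "yarn-cluster" => "YARN scheduler backend"
    case other         => s"unsupported master: $other"
  }
}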
Third, create a DAGScheduler, passing the TaskScheduler instance from the previous step as its constructor parameter, and start it:
private[spark] var dagScheduler = new DAGScheduler(taskScheduler)
dagScheduler.start()
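Once both schedulers exist, every action flows through them: an action such as the collect() below ends up in SparkContext.runJob, which hands the job to the DAGScheduler for stage splitting and then to the TaskScheduler for execution:

val doubled = sc.parallelize(1 to 4).map(_ * 2)   // lazy, nothing runs yet
doubled.collect()   // action -> SparkContext.runJob -> DAGScheduler -> TaskScheduler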
Fourth, start the web UI:
ui.start()
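The UI listens on port 4040 by default; the port can be overridden through SparkConf before the context is created, for example:

val conf = new SparkConf().set("spark.ui.port", "8080")   // default is 4040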