scala> val result = rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect
14/12/18 01:14:51 INFO spark.SparkContext: Starting job: collect at <console>:14
14/12/18 01:14:51 INFO scheduler.DAGScheduler: Registering RDD 9 (map at <console>:14)
14/12/18 01:14:51 INFO scheduler.DAGScheduler: Got job 0 (collect at <console>:14) with 1 output partitions (allowLocal=false)
14/12/18 01:14:51 INFO scheduler.DAGScheduler: Final stage: Stage 0 (collect at <console>:14)
14/12/18 01:14:51 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
14/12/18 01:14:51 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
14/12/18 01:14:51 INFO scheduler.DAGScheduler: Submitting Stage 1 (MappedRDD[9] at map at <console>:14), which has no missing parents
14/12/18 01:14:51 INFO storage.MemoryStore: ensureFreeSpace(3440) called with curMem=413313, maxMem=280248975
14/12/18 01:14:51 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 3.4 KB, free 266.9 MB)
14/12/18 01:14:51 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 1 (MappedRDD[9] at map at <console>:14)
14/12/18 01:14:51 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
14/12/18 01:14:51 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, localhost, ANY, 1185 bytes)
14/12/18 01:14:51 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 0)
14/12/18 01:14:51 INFO rdd.HadoopRDD: Input split: hdfs://192.168.0.245:8020/test/README.md:0+4811
14/12/18 01:14:51 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/12/18 01:14:51 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/12/18 01:14:51 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/12/18 01:14:51 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/12/18 01:14:51 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/12/18 01:14:52 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 0). 1860 bytes result sent to driver
14/12/18 01:14:53 INFO scheduler.DAGScheduler: Stage 1 (map at <console>:14) finished in 1.450 s
14/12/18 01:14:53 INFO scheduler.DAGScheduler: looking for newly runnable stages
14/12/18 01:14:53 INFO scheduler.DAGScheduler: running: Set()
14/12/18 01:14:53 INFO scheduler.DAGScheduler: waiting: Set(Stage 0)
14/12/18 01:14:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1419 ms on localhost (1/1)
14/12/18 01:14:53 INFO scheduler.DAGScheduler: failed: Set()
14/12/18 01:14:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/12/18 01:14:53 INFO scheduler.DAGScheduler: Missing parents for Stage 0: List()
14/12/18 01:14:53 INFO scheduler.DAGScheduler: Submitting Stage 0 (ShuffledRDD[10] at reduceByKey at <console>:14), which is now runnable
14/12/18 01:14:53 INFO storage.MemoryStore: ensureFreeSpace(2112) called with curMem=416753, maxMem=280248975
14/12/18 01:14:53 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 2.1 KB, free 266.9 MB)
14/12/18 01:14:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (ShuffledRDD[10] at reduceByKey at <console>:14)
14/12/18 01:14:53 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/12/18 01:14:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 1, localhost, PROCESS_LOCAL, 948 bytes)
14/12/18 01:14:53 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 1)
14/12/18 01:14:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/12/18 01:14:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
14/12/18 01:14:53 INFO storage.BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 5 ms
14/12/18 01:14:53 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 1). 8680 bytes result sent to driver
14/12/18 01:14:53 INFO scheduler.DAGScheduler: Stage 0 (collect at <console>:14) finished in 0.108 s
14/12/18 01:14:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 1) in 99 ms on localhost (1/1)
14/12/18 01:14:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/12/18 01:14:53 INFO spark.SparkContext: Job finished: collect at <console>:14, took 1.884598939 s
result: Array[(String, Int)] = Array((For,5), (Programs,1), (gladly,1), (Because,1), (,1), (agree,1), (cluster.,1), (webpage,1), (its,1), (-Pyarn,3), (under,2), (legal,1), (APIs,1), (1.x,,1), (computation,1), (Try,1), (MRv1,1), (have,2), (Thrift,2), (add,2), (through,1), (several,1), (This,2), (Whether,1), ("yarn-cluster",1), (%,2), (graph,1), (storage,1), (To,2), (setting,2), (any,2), (Once,1), (application,1), (JDBC,3), (use:,1), (prefer,1), (SparkPi,2), (engine,1), (version,3), (file,1), (documentation,1), (processing,2), (Along,1), (the,28), (explicitly,,1), (entry,1), (author.,1), (are,2), (systems.,1), (params,1), (not,2), (different,1), (refer,1), (Interactive,2), (given.,1), (if,5), ('-Pyarn':,1), (build,3), (when,3), (be,2), (Tests,1), (file's,1), (Apache,6), (./bin/run-e...
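The pipeline typed into the shell above can also be written as a standalone application. The following is a minimal sketch, not the book's own code: the HDFS path is taken from the `Input split` log line, while the `SparkConf` setup and the creation of `rdd` are assumptions, since the transcript begins after `rdd` was already defined. The comments map each transformation onto the stages visible in the log.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Assumed setup; in the transcript the shell provides sc automatically.
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

    // Reading the file yields a HadoopRDD (the "Input split" line in the log).
    val rdd = sc.textFile("hdfs://192.168.0.245:8020/test/README.md")

    val result = rdd
      .flatMap(_.split(" "))   // one record per word
      .map((_, 1))             // pair each word with a count of 1 (MappedRDD, Stage 1)
      .reduceByKey(_ + _)      // shuffle and sum the counts (ShuffledRDD, Stage 0)
      .collect()               // action: triggers the job seen in the log

    result.foreach(println)
    sc.stop()
  }
}
```

Note the stage boundary: `flatMap` and `map` are narrow transformations that run together in Stage 1, while `reduceByKey` requires a shuffle, which is why the log shows a separate Stage 0 fetching shuffle blocks before `collect` returns the `Array[(String, Int)]`.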