scala> val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
15/07/12 21:38:43 INFO FileInputFormat: Total input paths to process : 1
count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:23

scala>

scala> count.collect()
15/07/12 21:39:25 INFO SparkContext: Starting job: collect at <console>:26
15/07/12 21:39:25 INFO DAGScheduler: Registering RDD 7 (map at <console>:23)
15/07/12 21:39:25 INFO DAGScheduler: Got job 0 (collect at <console>:26) with 3 output partitions (allowLocal=false)
15/07/12 21:39:25 INFO DAGScheduler: Final stage: ResultStage 1 (collect at <console>:26)
15/07/12 21:39:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
15/07/12 21:39:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0)
15/07/12 21:39:25 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[7] at map at <console>:23), which has no missing parents
15/07/12 21:39:25 INFO MemoryStore: ensureFreeSpace(4128) called with curMem=297554, maxMem=278302556
15/07/12 21:39:25 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.0 KB, free 265.1 MB)
15/07/12 21:39:25 INFO MemoryStore: ensureFreeSpace(2305) called with curMem=301682, maxMem=278302556
15/07/12 21:39:25 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.3 KB, free 265.1 MB)
15/07/12 21:39:25 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:60268 (size: 2.3 KB, free: 265.4 MB)
15/07/12 21:39:25 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/07/12 21:39:25 INFO DAGScheduler: Submitting 3 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[7] at map at <console>:23)
15/07/12 21:39:25 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
15/07/12 21:39:25 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, ANY, 1406 bytes)
15/07/12 21:39:25 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 1406 bytes)
15/07/12 21:39:25 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/07/12 21:39:25 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/07/12 21:39:25 INFO HadoopRDD: Input split: hdfs://9.125.73.217:9000/hbase/hbase.version:0+3
15/07/12 21:39:25 INFO HadoopRDD: Input split: hdfs://9.125.73.217:9000/hbase/hbase.version:3+3
15/07/12 21:39:25 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/07/12 21:39:25 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/07/12 21:39:25 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/07/12 21:39:25 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/07/12 21:39:25 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/07/12 21:39:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2003 bytes result sent to driver
15/07/12 21:39:25 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2003 bytes result sent to driver
15/07/12 21:39:25 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, ANY, 1406 bytes)
15/07/12 21:39:25 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
15/07/12 21:39:25 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 162 ms on localhost (1/3)
15/07/12 21:39:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 179 ms on localhost (2/3)
15/07/12 21:39:25 INFO HadoopRDD: Input split: hdfs://9.125.73.217:9000/hbase/hbase.version:6+1
15/07/12 21:39:25 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 2003 bytes result sent to driver
15/07/12 21:39:25 INFO DAGScheduler: ShuffleMapStage 0 (map at <console>:23) finished in 0.205 s
15/07/12 21:39:25 INFO DAGScheduler: looking for newly runnable stages
15/07/12 21:39:25 INFO DAGScheduler: running: Set()
15/07/12 21:39:25 INFO DAGScheduler: waiting: Set(ResultStage 1)
15/07/12 21:39:25 INFO DAGScheduler: failed: Set()
15/07/12 21:39:25 INFO DAGScheduler: Missing parents for ResultStage 1: List()
15/07/12 21:39:25 INFO DAGScheduler: Submitting ResultStage 1 (ShuffledRDD[8] at reduceByKey at <console>:23), which is now runnable
15/07/12 21:39:25 INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in ms on localhost (3/3)
15/07/12 21:39:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks all completed, from pool
15/07/12 21:39:25 INFO MemoryStore: ensureFreeSpace(2288) called with curMem=303987, maxMem=278302556
15/07/12 21:39:25 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.2 KB, free 265.1 MB)
15/07/12 21:39:25 INFO MemoryStore: ensureFreeSpace(1377) called with curMem=306275, maxMem=278302556
15/07/12 21:39:25 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 1377.0 B, free 265.1 MB)
15/07/12 21:39:25 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:60268 (size: 1377.0 B, free: 265.4 MB)
15/07/12 21:39:25 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:874
15/07/12 21:39:25 INFO DAGScheduler: Submitting 3 missing tasks from ResultStage 1 (ShuffledRDD[8] at reduceByKey at <console>:23)
15/07/12 21:39:25 INFO TaskSchedulerImpl: Adding task set 1.0 with 3 tasks
15/07/12 21:39:25 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 3, localhost, PROCESS_LOCAL, 1165 bytes)
15/07/12 21:39:25 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 4, localhost, PROCESS_LOCAL, 1165 bytes)
15/07/12 21:39:25 INFO Executor: Running task 0.0 in stage 1.0 (TID 3)
15/07/12 21:39:25 INFO Executor: Running task 1.0 in stage 1.0 (TID 4)
15/07/12 21:39:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 3 blocks
15/07/12 21:39:25 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 3 blocks
15/07/12 21:39:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 7 ms
15/07/12 21:39:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 8 ms
15/07/12 21:39:25 INFO Executor: Finished task 1.0 in stage 1.0 (TID 4). 1031 bytes result sent to driver
15/07/12 21:39:25 INFO Executor: Finished task 0.0 in stage 1.0 (TID 3). 1029 bytes result sent to driver
15/07/12 21:39:25 INFO TaskSetManager: Starting task 2.0 in stage 1.0 (TID 5, localhost, PROCESS_LOCAL, 1165 bytes)
15/07/12 21:39:25 INFO Executor: Running task 2.0 in stage 1.0 (TID 5)
15/07/12 21:39:25 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 3) in ms on localhost (1/3)
15/07/12 21:39:25 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 3 blocks
15/07/12 21:39:25 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/07/12 21:39:25 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 4) in ms on localhost (2/3)
15/07/12 21:39:25 INFO Executor: Finished task 2.0 in stage 1.0 (TID 5). 882 bytes result sent to driver
15/07/12 21:39:25 INFO TaskSetManager: Finished task 2.0 in stage 1.0 (TID 5) in 6 ms on localhost (3/3)
15/07/12 21:39:25 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks all completed, from pool
15/07/12 21:39:25 INFO DAGScheduler: ResultStage 1 (collect at <console>:26) finished in 0.043 s
15/07/12 21:39:25 INFO DAGScheduler: Job 0 finished: collect at <console>:26, took 0.352074 s
res1: Array[(String, Int)] = Array((?8,1), (PBUF,1))

scala>
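The word count in the session above can be reproduced without a cluster. What follows is a minimal plain-Scala sketch, not Spark itself: `groupBy` stands in for the shuffle that splits the job into ShuffleMapStage 0 (the `map`) and ResultStage 1 (the `reduceByKey`), and `WordCountSketch` / `wordCount` are hypothetical names. It also assumes that `file` was created with something like `sc.textFile("hdfs://9.125.73.217:9000/hbase/hbase.version")`, and that this 7-byte file (input splits 0+3, 3+3 and 6+1) holds `PBUF`, a newline, the control byte 0x01 and the character `8`. That assumption would explain why `collect()` returned exactly two words with a count of 1, `PBUF` and `?8`, where `?` is the unprintable 0x01.

```scala
// Minimal sketch: the same word-count pipeline as the spark-shell session,
// written against plain Scala collections so it runs without Spark.
// WordCountSketch and wordCount are hypothetical names.
object WordCountSketch {

  // Mirrors file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" ")) // one element per space-separated token
      .map(word => (word, 1))           // pair each token with a count of 1
      .groupBy(_._1)                    // local stand-in for the shuffle in reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the counts per word

  def main(args: Array[String]): Unit = {
    // ASSUMPTION: hbase.version contains "PBUF", a newline, byte 0x01 and '8',
    // so a line-oriented reader such as textFile sees two lines.
    val content = "PBUF\n\u00018"
    val lines = content.split("\n").toSeq
    wordCount(lines).foreach(println) // prints the pairs in unspecified order
  }
}
```

On a real RDD, `reduceByKey` also combines counts inside each partition before the shuffle, which the local `groupBy` does not model; the final per-word totals are the same.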