This small problem may well turn out to be a big one; I hope an expert can help answer it.
The question: the same code runs when the master is set with setMaster("local"), but fails when it is set with setMaster("local[3]") or setMaster("local[*]").

I. The local running modes in Spark
Spark has three local running modes, as follows (see the sketch after this list):
(1) local mode: runs with a single thread;
(2) local[K] mode: runs with K threads locally;
(3) local[*] mode: runs with as many threads as there are logical cores on the machine.
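A minimal sketch of the three settings, assuming the Spark 1.x SparkConf/SparkContext API that appears later in this post (uncomment the master you want to test):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local")        // (1) one worker thread
  // .setMaster("local[3]")  // (2) three worker threads
  // .setMaster("local[*]")  // (3) one worker thread per logical core
  .setAppName("Count Test")
val sc = new SparkContext(conf)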
II. The input data
The job reads a CSV file (e:/data/gps201608.csv, per the log below) whose first three columns are a car ID, a longitude, and a latitude.
III. The code involved in the array-out-of-bounds error
val pieces = line.replaceAll("\"", "")   // strip quotation marks
val carID  = pieces.split(',')(0)        // column 0: car ID
val lngStr = pieces.split(',')(1)        // column 1: longitude
val latStr = pieces.split(',')(2)        // column 2: latitude
var lng = BigDecimal(0)
var lat = BigDecimal(0)
try {
  lng = myRound(BigDecimal(lngStr), 3)   // myRound: the author's rounding helper
  lat = myRound(BigDecimal(latStr), 3)
} catch {
  case e: NumberFormatException =>
    println("...help..." + lngStr + "..." + latStr)
}
The file contains dirty data: some rows are missing values in the first three columns, so splitting such a line on commas yields an array with fewer than three elements, and indexing it throws an array-out-of-bounds exception. But then why is the job still allowed to run when the master is local? (A defensive variant of the parse is sketched below.)
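For reference, a defensive variant of the parse (my sketch, not the original code): check the field count before indexing, so a malformed row is skipped instead of throwing:

val fields = line.replaceAll("\"", "").split(',')
if (fields.length >= 3) {
  val carID  = fields(0)
  val lngStr = fields(1)
  val latStr = fields(2)
  // ... round lngStr / latStr exactly as in the original code ...
} else {
  println("skipping malformed line: " + line)   // hypothetical message
}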
IV. Problems encountered with local, local[K], and local[*]
The master in Spark is initialized as follows:
val sparkConf = new SparkConf().setMaster("local").setAppName("Count Test")
The problem is as follows: with everything else in the code unchanged, merely changing the argument of setMaster() gives different results:
(1) with setMaster("local"), the code runs correctly;
(2) with setMaster("local[3]"), it fails;
(3) with setMaster("local[*]"), it fails with the same error as (2).
The error output for setMaster("local[3]") and setMaster("local[*]") is identical, and reads as follows:
17/07/31 13:39:01 INFO HadoopRDD: Input split: file:/e:/data/gps201608.csv:0+7683460
17/07/31 13:39:02 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:50541 in memory (size: 1848.0 B, free: 133.6 MB)
17/07/31 13:39:05 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 2)
java.lang.ArrayIndexOutOfBoundsException
    at java.lang.System.arraycopy(Native Method)
    at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:100)
    at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:47)
    at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:83)
    at count$.count$$mystatistics$(count.scala:76)
    at count$$anonfun$2.apply(count.scala:87)
    at count$$anonfun$2.apply(count.scala:87)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
17/07/31 13:39:05 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.ArrayIndexOutOfBoundsException
    (same stack trace as above)
17/07/31 13:39:05 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
17/07/31 13:39:05 INFO TaskSchedulerImpl: Cancelling stage 1
17/07/31 13:39:05 INFO TaskSchedulerImpl: Stage 1 was cancelled
17/07/31 13:39:05 INFO Executor: Executor is trying to kill task 1.0 in stage 1.0 (TID 3)
17/07/31 13:39:05 INFO DAGScheduler: ResultStage 1 (saveAsTextFile at count.scala:90) failed in 3.252 s
17/07/31 13:39:05 INFO Executor: Executor killed task 1.0 in stage 1.0 (TID 3)
17/07/31 13:39:05 INFO DAGScheduler: Job 1 failed: saveAsTextFile at count.scala:90, took 3.278665 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.ArrayIndexOutOfBoundsException
    (same stack trace as above)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
17/07/31 13:39:05 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 3, localhost): TaskKilled (killed intentionally)
17/07/31 13:39:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/07/31 13:39:05 INFO SparkContext: Invoking stop() from shutdown hook
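Worth noting about this trace: the exception is raised inside ArrayBuffer.ensureSize / System.arraycopy during an append (ArrayBuffer.$plus$eq) in mystatistics, not at the split(',')(n) indexing, and that is the classic signature of several threads appending to one shared, non-thread-safe ArrayBuffer at once. Under local there is a single worker thread, so the appends happen to be serialized; under local[3] or local[*] tasks run concurrently in one JVM and can corrupt the buffer. A minimal, hypothetical sketch (not the original code) that can reproduce this class of failure:

import scala.collection.mutable.ArrayBuffer

object SharedBufferRepro {
  def main(args: Array[String]): Unit = {
    val shared = ArrayBuffer[Int]()               // one buffer shared by all threads
    val threads = (1 to 3).map { _ =>
      new Thread(new Runnable {
        def run(): Unit =
          for (i <- 1 to 1000000) shared += i     // unsynchronized append: not thread-safe
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    // Often prints fewer than 3000000 elements; can also crash with
    // ArrayIndexOutOfBoundsException inside ensureSize/arraycopy.
    println(shared.length)
  }
}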
V. A strange follow-up
After leaving things alone for a while, without changing anything, I found that with the master set to local[3] the job could now run and produced 2 result files, but both were empty.
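One fact that may explain the file count (general Spark behavior, not something confirmed in the post): saveAsTextFile, the action named at count.scala:90 in the log, writes one part-NNNNN file per partition of the saved RDD, so two files means two partitions, and empty files mean those partitions held no records. A hypothetical illustration, with a made-up output path:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[3]").setAppName("Count Test"))
val rdd = sc.parallelize(Seq("a", "b", "c"), 2)  // an RDD with 2 partitions
rdd.saveAsTextFile("e:/data/output")             // writes part-00000 and part-00001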