A small problem that may hide a big one; hoping someone can help: in Spark local run mode, the same code runs with setMaster("local") but fails with setMaster("local[3]") or setMaster("local[*]")

This small problem may hide a big one; I hope someone can help. The same code behaves differently:

Setmaster ("local") can be run, but the error is set to Setmaster ("local[3]") or Setmaster ("local[*]"). local running mode in Spark

There are three local running modes in Spark, as follows (a minimal sketch follows the list):

(1) local mode: runs with a single thread;
(2) local[k] mode: runs with k threads locally;
(3) local[*] mode: runs with as many threads as there are cores on the local machine.
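For reference, the three variants differ only in the master URL string passed to SparkConf; a minimal sketch (the app name "demo" is arbitrary, not from the original post):

import org.apache.spark.SparkConf

// single worker thread
val confLocal = new SparkConf().setMaster("local").setAppName("demo")
// three worker threads
val confThree = new SparkConf().setMaster("local[3]").setAppName("demo")
// one worker thread per logical core
val confAll = new SparkConf().setMaster("local[*]").setAppName("demo")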
II. The data being read

(The original post showed a screenshot of the input file here. Judging from the code below, each line of the CSV should contain at least three comma-separated fields: a car ID, a longitude, and a latitude.)
III. The code involved in the array out-of-bounds is as follows

val pieces = line.replaceAll("\"", "")   // strip the quote characters
val carid = pieces.split(',')(0)
val lngstr = pieces.split(',')(1)
val latstr = pieces.split(',')(2)

var lng = BigDecimal(0)
var lat = BigDecimal(0)

try {
  lng = myRound(BigDecimal(lngstr), 3)
  lat = myRound(BigDecimal(latstr), 3)
} catch {
  case e: NumberFormatException =>
    println("...help..." + lngstr + "..." + latstr)
}
The file contains dirty data: some lines are missing values in the first three columns, so splitting on the comma yields an array with fewer than three elements, and indexing it throws an array-out-of-bounds exception. So why is this tolerated when the master is plain "local"?
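As an aside, a minimal defensive sketch (reusing the variable names above; the rounding step is elided) that skips short rows instead of letting the task crash:

val fields = line.replaceAll("\"", "").split(',')
if (fields.length >= 3) {
  val carid = fields(0)
  val lngstr = fields(1)
  val latstr = fields(2)
  // ... parse and round as in the try/catch above ...
} else {
  // log and skip the dirty line instead of throwing
  println("skipping malformed line: " + line)
}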

IV. Problems encountered with local, local[k], and local[*]

Master initialization in Spark is as follows:

val sparkConf = new SparkConf().setMaster("local").setAppName("Count Test")
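For context, a minimal sketch of how this conf is typically wired into a job. The input path is taken from the log below; the job body is a placeholder, not the poster's actual count.scala:

import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf().setMaster("local").setAppName("Count Test")
val sc = new SparkContext(sparkConf)
val lines = sc.textFile("e:/data/gps201608.csv")
// ... the parsing from section III, then an action such as saveAsTextFile ...
sc.stop()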

The problem is described as follows:

With the rest of the code unchanged, varying only the setMaster() argument gives different results:

(1) with setMaster("local"), the code runs correctly;

(2) with setMaster("local[3]"), it throws an error;

(3) with setMaster("local[*]"), it throws an error with the same content as (2).

The error output is identical for setMaster("local[3]") and setMaster("local[*]"), as follows:

17/07/31 13:39:01 INFO HadoopRDD: Input split: file:/e:/data/gps201608.csv:0+7683460
17/07/31 13:39:02 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:50541 in memory (size: 1848.0 B, free: 133.6 MB)
17/07/31 13:39:05 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 2)
java.lang.ArrayIndexOutOfBoundsException
	at java.lang.System.arraycopy(Native Method)
	at scala.collection.mutable.ResizableArray$class.ensureSize(ResizableArray.scala:100)
	at scala.collection.mutable.ArrayBuffer.ensureSize(ArrayBuffer.scala:47)
	at scala.collection.mutable.ArrayBuffer.$plus$eq(ArrayBuffer.scala:83)
	at count$.count$$myStatistics$(count.scala:76)
	at count$$anonfun$2.apply(count.scala:87)
	at count$$anonfun$2.apply(count.scala:87)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277)
	at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
	at org.apache.spark.scheduler.Task.run(Task.scala:70)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
17/07/31 13:39:05 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.ArrayIndexOutOfBoundsException
	(stack trace identical to the one above)
17/07/31 13:39:05 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
17/07/31 13:39:05 INFO TaskSchedulerImpl: Cancelling stage 1
17/07/31 13:39:05 INFO TaskSchedulerImpl: Stage 1 was cancelled
17/07/31 13:39:05 INFO Executor: Executor is trying to kill task 1.0 in stage 1.0 (TID 3)
17/07/31 13:39:05 INFO DAGScheduler: ResultStage 1 (saveAsTextFile at count.scala:90) failed in 3.252 s
17/07/31 13:39:05 INFO Executor: Executor killed task 1.0 in stage 1.0 (TID 3)
17/07/31 13:39:05 INFO DAGScheduler: Job 1 failed: saveAsTextFile at count.scala:90, took 3.278665 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost): java.lang.ArrayIndexOutOfBoundsException
	(stack trace identical to the one above)
Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
17/07/31 13:39:05 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 3, localhost): TaskKilled (killed intentionally)
17/07/31 13:39:05 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/07/31 13:39:05 INFO SparkContext: Invoking stop() from shutdown hook
(The original post also included a screenshot of this error output, omitted here.)

V. A strange problem

After a while, without changing anything, I found that with the master set to local[3] the job could run and generated two result files, but both were empty, as shown in the figure below:

(Screenshot of the two empty output files, omitted here.)