Overview
Today's post is only about techniques for reading the source code, not about any of the deep technology inside Spark. We all know Spark is written in Scala, but Scala's heavy use of syntactic sugar means that when you follow the code, you often lose the thread. Second, Spark components interact through Akka, so how do you find out who the recipient of a message is?
new Throwable().printStackTrace()
When following the code we often have to rely on the log, and for every line printed in the log we want to know who produced it. But if you don't know the Spark codebase well, or aren't familiar with Scala, you can't figure this out in a short time. Is there an easier way?
My approach is to add the following line wherever the log statement appears:
new Throwable().printStackTrace()
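To see why this works, here is a minimal, self-contained sketch. The object and method names below are made up for illustration (in Spark you would drop the line directly into the real source file). The key fact is that a `Throwable` records the call stack at the point where it is constructed, so printing it reveals every caller leading to that point:

```scala
// Hypothetical stand-ins for a logging method and its caller,
// mimicking the MemoryStore.tryToPut situation described above.
object TraceDemo {
  // In Spark you would simply call `new Throwable().printStackTrace()`
  // here, which dumps the frames to stderr; returning them as well lets
  // us inspect the trace programmatically.
  def tryToPut(): Array[StackTraceElement] = {
    val t = new Throwable()
    t.printStackTrace() // prints "java.lang.Throwable\n  at ...tryToPut(...)" etc.
    t.getStackTrace
  }

  def putValues(): Array[StackTraceElement] = tryToPut()

  def main(args: Array[String]): Unit = {
    val frames = putValues()
    // Frame 0 is where the Throwable was created; frame 1 is its caller.
    println(frames(0).getMethodName) // tryToPut
    println(frames(1).getMethodName) // putValues
  }
}
```

Because the trace is captured at construction time, the method you instrument always appears as the top frame, and its callers follow in order, no debugger required.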
Here is a practical example to illustrate.
For example, start spark-shell and enter the very simple sc.textFile("README.md"); the following log output is produced:
14/07/05 19:53:27 INFO MemoryStore: ensureFreeSpace(32816) called with curMem=0, maxMem=308910489
14/07/05 19:53:27 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 294.6 MB)
14/07/05 19:53:27 DEBUG BlockManager: Put block broadcast_0 local took 78 ms
14/07/05 19:53:27 DEBUG BlockManager: Putting block broadcast_0 without replication took 79 ms
res0: org.apache.spark.rdd.RDD[String] = README.md MappedRDD[1] at textFile at <console>:13
Now suppose I want to know, for the second line, where the tryToPut function that produced it is called from. What do I do?
The solution is to open MemoryStore.scala and find the following statement:
logInfo("Block %s stored as %s in memory (estimated size %s, free %s)".format(blockId, valuesOrBytes, Utils.bytesToString(size), Utils.bytesToString(freeMemory)))
Right after this statement, add:
new Throwable().printStackTrace()
Then recompile the source:
sbt/sbt assembly
Open spark-shell again and run sc.textFile("README.md") to get the following output, from which you can clearly see who the callers of tryToPut are:
14/07/05 19:53:27 INFO MemoryStore: ensureFreeSpace(32816) called with curMem=0, maxMem=308910489
14/07/05 19:53:27 WARN MemoryStore: just show the calltrace by entering some modified code
java.lang.Throwable
	at org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:182)
	at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:76)
	at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
	at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:699)
	at org.apache.spark.storage.BlockManager.put(BlockManager.scala:570)
	at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:821)
	at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
	at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
	at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
	at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
	at org.apache.spark.SparkContext.broadcast(SparkContext.scala:787)
	at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:556)
	at org.apache.spark.SparkContext.textFile(SparkContext.scala:468)
	at $line5.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
	at $line5.$read$$iwC$$iwC$$iwC.<init>(<console>:18)
	at $line5.$read$$iwC$$iwC.<init>(<console>:20)
	at $line5.$read$$iwC.<init>(<console>:22)
	at $line5.$read.<init>(<console>:24)
	at $line5.$read$.<init>(<console>)
	at $line5.$read$.<clinit>(<console>)
	at $line5.$eval$.<init>(<console>:7)
	at $line5.$eval$.<clinit>(<console>)
	at $line5.$eval.$print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
	at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
	at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:483)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
14/07/05 19:53:27 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 294.6 MB)
14/07/05 19:53:27 DEBUG BlockManager: Put block broadcast_0 local took 78 ms
14/07/05 19:53:27 DEBUG BlockManager: Putting block broadcast_0 without replication took 79 ms
res0: org.apache.spark.rdd.RDD[String] = README.md MappedRDD[1] at textFile at <console>:13

Git Synchronization
After modifying the code, if you don't want to commit your changes, how do you sync your local tree back to the latest upstream content?
git reset --hard
git pull origin master

Akka Message Tracking
Tracking down who the recipient of a message is turns out to be relatively easy: grep is all you need, provided you have a basic understanding of the actor model.
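For readers hazy on the actor model: each actor declares a receive block that pattern-matches incoming messages, so the class whose receive matches a given message is, by definition, its recipient. That is exactly the textual pattern grep finds. Here is a minimal sketch of that dispatch shape without the Akka dependency (the message and handler types below are simplified stand-ins, not Spark's real classes):

```scala
// Simplified stand-ins for Spark's cluster messages (in Spark these
// extend CoarseGrainedClusterMessage and carry a SerializableBuffer).
sealed trait ClusterMessage
case class LaunchTask(data: String) extends ClusterMessage
case object StopExecutor extends ClusterMessage

// Plays the role of the receiving actor: its receive function
// pattern-matches the messages it handles. Finding `case LaunchTask`
// in a receive block identifies the recipient.
object ExecutorBackendSketch {
  def receive(msg: ClusterMessage): String = msg match {
    case LaunchTask(data) => s"launching task from $data"
    case StopExecutor     => "executor stopped"
  }
}

object AkkaSketch {
  def main(args: Array[String]): Unit = {
    // The sender side would do `executorActor ! LaunchTask(...)`;
    // here we invoke receive directly for illustration.
    println(ExecutorBackendSketch.receive(LaunchTask("serializedTask")))
  }
}
```

The `!` (tell) operator on the sender side and the matching `case` clause on the receiver side are the two ends of every Akka conversation, which is why grepping for the message name surfaces both.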
Again, an example. We know that CoarseGrainedSchedulerBackend sends the LaunchTask message; who is the receiver? Just run the following command:
grep LaunchTask -r core / src / main
From the output below, it is clear that CoarseGrainedExecutorBackend is the receiver of LaunchTask. To see the business logic performed on receipt, you only have to read the receiver's receive function.
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala:    case LaunchTask(data) =>
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala:        logError("Received LaunchTask command but executor was null")
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala:  case class LaunchTask(data: SerializableBuffer) extends CoarseGrainedClusterMessage
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala:        executorActor(task.executorId) ! LaunchTask(new SerializableBuffer(serializedTask))

Summary
Today's content is fairly simple, with no deep technology; I am writing it down for my own reference, lest I forget it after a while.