Tips for Reading Spark Source Code


Summary

Today we only talk about techniques for reading Spark's code, without going into Spark's more complex internals. As we all know, Spark is developed in Scala, but Scala's heavy use of syntactic sugar often makes the code hard to follow. A second question: Spark's components interact via Akka, so how do we find out who the recipient of a given message is?

new Throwable().printStackTrace()

When following the code we often have to rely on the logs, and for each log line we want to know who produced it. But if you do not know Spark well, or are not familiar with Scala, you cannot figure this out in a short time. Is there an easier way?

My approach is to add the following line wherever the log statement of interest appears:

new Throwable().printStackTrace()
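To see the trick in isolation, here is a minimal, self-contained sketch (TraceDemo and its methods are made-up names for illustration, not Spark code): a logging helper prints its message and immediately dumps the current call stack, so the log line and its caller show up together.

```scala
// Minimal sketch of the call-trace trick; all names here are hypothetical.
object TraceDemo {
  // Returns the method names currently on the call stack, innermost
  // first -- the same data that printStackTrace prints.
  def currentCallers(): List[String] =
    new Throwable().getStackTrace.toList.map(_.getMethodName)

  def logInfo(msg: String): Unit = {
    println(s"INFO: $msg")
    // The one-liner this article adds next to interesting log statements:
    // prints the full call stack to stderr without throwing anything.
    new Throwable().printStackTrace()
  }

  // A caller we want to identify from the log output.
  def storeBlock(): Unit = logInfo("Block broadcast_0 stored")

  def main(args: Array[String]): Unit = storeBlock()
}
```

Running this prints the INFO line followed by a java.lang.Throwable stack trace whose frames include storeBlock and main, which is exactly the caller information we want from the Spark logs. Note that nothing is actually thrown; constructing a Throwable merely captures the stack.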

Now a practical example to illustrate.

For example, if we start spark-shell and enter the very simple sc.textFile("README.md"), the following log is output:

14/07/05 19:53:27 INFO MemoryStore: ensureFreeSpace(32816) called with curMem=0, maxMem=308910489
14/07/05 19:53:27 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 294.6 MB)
14/07/05 19:53:27 DEBUG BlockManager: Put block broadcast_0 local took 78 ms
14/07/05 19:53:27 DEBUG BlockManager: Putting block broadcast_0 without replication took 79 ms
res0: org.apache.spark.rdd.RDD[String] = README.md MappedRDD[1] at textFile at <console>:13

Suppose I now want to know who calls the tryToPut function that produced the second log line. What do I do?

The solution is to open MemoryStore.scala and find the following statement:

logInfo("Block %s stored as %s in memory (estimated size %s, free %s)".format(blockId, valuesOrBytes, Utils.bytesToString(size), Utils.bytesToString(freeMemory)))

Right after this statement, add the following line:

new Throwable().printStackTrace()

Then recompile the source:

sbt/sbt assembly

Open spark-shell again and run sc.textFile("README.md"); you get the following output, from which you can clearly see who the caller of tryToPut is:

14/07/05 19:53:27 INFO MemoryStore: ensureFreeSpace(32816) called with curMem=0, maxMem=308910489
14/07/05 19:53:27 WARN MemoryStore: just show the calltrace by entering some modified code
java.lang.Throwable
    at org.apache.spark.storage.MemoryStore.tryToPut(MemoryStore.scala:182)
    at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:76)
    at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:92)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:699)
    at org.apache.spark.storage.BlockManager.put(BlockManager.scala:570)
    at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:821)
    at org.apache.spark.broadcast.HttpBroadcast.<init>(HttpBroadcast.scala:52)
    at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:35)
    at org.apache.spark.broadcast.HttpBroadcastFactory.newBroadcast(HttpBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:787)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:556)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:468)
    at $line5.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:13)
    at $line5.$read$$iwC$$iwC$$iwC.<init>(<console>:18)
    at $line5.$read$$iwC$$iwC.<init>(<console>:20)
    at $line5.$read$$iwC.<init>(<console>:22)
    at $line5.$read.<init>(<console>:24)
    at $line5.$read$.<init>(<console>)
    at $line5.$read$.<clinit>(<console>)
    at $line5.$eval$.<init>(<console>:7)
    at $line5.$eval$.<clinit>(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
14/07/05 19:53:27 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 32.0 KB, free 294.6 MB)
14/07/05 19:53:27 DEBUG BlockManager: Put block broadcast_0 local took 78 ms
14/07/05 19:53:27 DEBUG BlockManager: Putting block broadcast_0 without replication took 79 ms
res0: org.apache.spark.rdd.RDD[String] = README.md MappedRDD[1] at textFile at <console>:13

Git Synchronization

After modifying the code, if you do not want to commit your changes, how do you synchronize the latest upstream content locally?

git reset --hard
git pull origin master

Akka Message Tracking

Tracking down the recipient of a message is relatively easy: grep is all you need. The premise, of course, is that you have some understanding of the actor model.

Again by example: we know that CoarseGrainedSchedulerBackend sends the LaunchTask message, so who is the receiver? Just run the following command:

grep LaunchTask -r core/src/main

From the following output, it is clear that CoarseGrainedExecutorBackend is the receiver of LaunchTask. To see how the message is handled after it is received, just look at the receiver's receive function.

core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala:    case LaunchTask(data) =>
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala:        logError("Received LaunchTask command but executor was null")
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala:  case class LaunchTask(data: SerializableBuffer) extends CoarseGrainedClusterMessage
core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala:        executorActor(task.executorId) ! LaunchTask(new SerializableBuffer(serializedTask))
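The send/receive shape that grep reveals can be sketched without pulling in Akka itself (the message and handler below are simplified stand-ins for illustration, not Spark's actual classes): the sender constructs a case-class message, and the receiver's receive is a pattern match over the message types.

```scala
// Simplified stand-ins for CoarseGrainedClusterMessage / LaunchTask;
// no real Akka here, just the pattern-matching shape of a receive method.
sealed trait ClusterMessage
case class LaunchTask(data: String) extends ClusterMessage
case object StopExecutor extends ClusterMessage

object ExecutorBackendDemo {
  // Mirrors the `case LaunchTask(data) => ...` branch that grep located.
  val receive: PartialFunction[ClusterMessage, String] = {
    case LaunchTask(data) => s"launching task: $data"
    case StopExecutor     => "stopping executor"
  }

  def main(args: Array[String]): Unit =
    println(receive(LaunchTask("serialized-task-bytes")))
}
```

Grep works so well here precisely because the message is a concrete case class: its name appears both at the `!` send site and in the receiver's `case` branch.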

Summary

Today's content is relatively simple and not technically deep; I am writing it down for my own reference, lest I forget it after a while.
