You are welcome to reprint this article; please credit the source, huichiro.
Summary
The previous post showed how to modify the source code to view the call stack. That approach is practical, but every modification requires a recompile, which is slow and inefficient, and it is also an invasive, inelegant change. This article describes how to use IntelliJ IDEA to trace and debug the Spark source code.
Prerequisites
This article assumes that the development environment is Linux and that the following software is already installed. I personally use Arch Linux.
- JDK
- Scala
- SBT
- IntelliJ IDEA Community Edition
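Before going further, it can help to confirm that the command-line tools are actually on the PATH. The following is a minimal sketch, not part of the original setup: the function name missing_tools is made up for illustration, and the command names java, scala, sbt, and git are assumptions about how the packages above are exposed on your system.

```shell
#!/bin/sh
# Sketch: report which of the given commands are missing from PATH.
# Prints one missing command name per line and returns non-zero if any
# command is missing. The function name is illustrative only.
missing_tools() {
    status=0
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example call for this article's toolchain (command names are assumptions):
#   missing_tools java scala sbt git
```

If the call prints nothing, every listed command was found. IDEA itself is a GUI application and is not covered by this check.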
Install the Scala plug-in
To install the Scala plug-in for IDEA, follow these steps:
1. Select File > Settings.
2. Select Install JetBrains plugin, enter Scala in the search box on the left side of the pop-up window, and click Install.
3. Once the Scala plug-in is installed, restart IDEA for it to take effect.
Because IDEA 13 already supports SBT, you do not need to install a separate SBT plug-in for IDEA.
Download and import source code
Download the source code. This assumes that you use git to fetch the latest source code.
git clone https://github.com/apache/spark.git
Generate an IDEA project
sbt/sbt gen-idea
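The gen-idea step only works from the root of the checkout and only if the bundled sbt/sbt launcher script is present. As a hedged sketch, the hypothetical helper below (gen_idea_project is not a Spark or SBT command) checks for the script before running it, assuming the checkout still ships sbt/sbt as it does in this article.

```shell
#!/bin/sh
# Sketch: run the bundled sbt launcher's gen-idea task inside a Spark checkout.
# Assumption: the checkout contains an executable sbt/sbt script.
# The function name and its directory argument are illustrative only.
gen_idea_project() {
    spark_dir="${1:-.}"
    if [ ! -x "$spark_dir/sbt/sbt" ]; then
        echo "sbt/sbt not found or not executable in $spark_dir" >&2
        return 1
    fi
    # Use a subshell so the caller's working directory is left unchanged.
    ( cd "$spark_dir" && sbt/sbt gen-idea )
}

# Example call, assuming the repository was cloned into ./spark:
#   gen_idea_project spark
```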
Import the Spark source code
1. Select File > Import Project and specify the Spark source code directory in the pop-up window.
2. Select SBT project as the project type and click Next.
3. Click Finish in the next pop-up window.
After the import settings are complete, IDEA takes a long time to compile the imported source code and build its file index.
If a message saying that IDEA is waiting for ".sbt.ivy.lock" appears in the status bar, the lock file cannot be created and the stale one needs to be deleted manually.
cd $HOME/.ivy2
rm -f .sbt.ivy.lock *.lock
After deleting the lock manually, restart IDEA. Once it has restarted, the SBT process that was left unfinished will continue.
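The manual cleanup can also be wrapped in a small helper. This is a sketch under stated assumptions rather than an official SBT or Ivy tool: the function name clean_ivy_locks is hypothetical, the directory defaults to $HOME/.ivy2, and every file ending in .lock directly inside it is treated as stale. Run it only while no SBT or IDEA process is using the directory.

```shell
#!/bin/sh
# Sketch: remove stale lock files left directly inside the Ivy home directory.
# Assumptions: the directory defaults to $HOME/.ivy2 and lock files end in
# ".lock". The function name and optional directory argument are illustrative.
clean_ivy_locks() {
    ivy_home="${1:-$HOME/.ivy2}"
    if [ ! -d "$ivy_home" ]; then
        echo "no such directory: $ivy_home" >&2
        return 1
    fi
    # find's -name also matches dotfiles such as .sbt.ivy.lock, which a plain
    # shell glob like *.lock skips by default. Each removed path is printed.
    find "$ivy_home" -maxdepth 1 -type f -name '*.lock' -print -exec rm -f {} +
}

# Example call using the default location:
#   clean_ivy_locks
```

Limiting the search to the top level of the directory keeps the helper from touching anything inside the dependency cache itself.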
Source code compilation
When IDEA compiles the Spark source code, several errors occur along the way. The root cause is that dependencies are not fully resolved when sbt/sbt gen-idea is run.
The solution is as follows:
1. Select File > Project Structure.
2. Under Dependencies on the right, add a new module and select spark-core.
Other modules such as streaming-twitter, streaming-kafka, streaming-flume, and streaming-mqtt can be fixed in a similar way.
Note that compilation errors in the examples module are handled slightly differently: when specifying dependencies, choose Module Dependency rather than Library, and select sql in the pop-up window.
For more on resolving these compilation errors, see this thread: http://apache-spark-user-list.1001560.n3.nabble.com/Errors-occurred-while-compiling-module-spark-streaming-zeromq-IntelliJ-IDEA-13-0-2-td1282.html
Debug LogQuery
1. Select Run > Edit Configurations.
2. Add an Application configuration. Pay attention to the settings in the pane on the right, including Main class, VM options, Working directory, and Use classpath of module.
The VM option -Dspark.master=local specifies the Spark run mode and can be changed as needed.
3. A "Run LogQuery" item now appears in the Run menu; run it once to confirm that compilation succeeds.
4. Set a breakpoint by clicking in the left margin of the source file, then select Run > "Debug LogQuery". You can then inspect the variables and the call stack.
Reference
- http://8liang.cn/intellij-idea-spark-development