Apache Spark Source Code Walkthrough 18: Using IntelliJ IDEA to Debug Spark Source Code


Reprints are welcome. Please credit the source, huichiro.

Summary

The previous post showed how to modify the source code to view the call stack. That approach is practical, but every change requires a recompile, which is slow and inefficient, and it is an invasive modification that is hardly elegant. This article describes how to use IntelliJ IDEA to trace and debug the Spark source code.

Prerequisites

This article assumes a Linux development environment with the following software installed. I use Arch Linux myself.

  1. JDK
  2. Scala
  3. SBT
  4. IntelliJ IDEA Community Edition
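
As a quick sanity check of the list above, here is a minimal sketch of a shell helper that reports which tools are missing from PATH. The function name missing_tools is hypothetical, and the command names java, scala, sbt, and git are assumptions about how the tools are exposed on your system (git is included because it is used for the download step later on).

```shell
# Print the name of every given command that is not found on PATH.
# Returns 0 only when all of them were found.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Example call (command names are assumptions, adjust to your installation):
# missing_tools java scala sbt git || echo "install the tools listed above first"
```

IDEA itself is not checked here, because how it is launched differs between distributions.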

Install the Scala plug-in

To install the Scala plug-in for IDEA, follow these steps:

  1. Select File -> Settings.

  2. Select Install JetBrains plugin, enter Scala on the left side of the pop-up window, and click Install.

  3. Once the Scala plug-in is installed, restart IDEA for it to take effect.

Because IDEA 13 already supports SBT, you do not need to install a separate SBT plug-in.

Download and import source code

Download the source code. This example uses git to fetch the latest source.

git clone https://github.com/apache/spark.git

Generate an IDEA project

sbt/sbt gen-idea
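
The two commands above can be combined into one helper. This is only a sketch: the function name prepare_spark_idea and the default directory name spark are hypothetical, and it assumes a Spark checkout whose build still provides the gen-idea task used in this article.

```shell
# Clone Spark if it is not already present, then generate the IDEA project files.
# $1: target directory (hypothetical default: ./spark)
prepare_spark_idea() {
    dir="${1:-spark}"
    if [ ! -d "$dir/.git" ]; then
        git clone https://github.com/apache/spark.git "$dir" || return 1
    fi
    # Run the sbt launcher shipped in the source tree from inside the checkout
    (cd "$dir" && sbt/sbt gen-idea)
}

# Example call:
# prepare_spark_idea "$HOME/src/spark"
```

Running it a second time skips the clone and only regenerates the project files.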

Import Spark Source Code

1. Select File -> Import Project and specify the Spark source code directory in the pop-up window.

2. Select SBT project as the project type and click Next.

3. Click Finish in the new pop-up window.

 

Once the import is configured, IDEA takes a long time to compile the imported source code and build its file index.

If a message such as "waiting for .sbt.ivy.lock" appears in the status bar, the lock file cannot be acquired and must be deleted manually.

cd $HOME/.ivy2
rm -f .sbt.ivy.lock

After deleting the lock, restart IDEA. Once it has restarted, the SBT process that was left unfinished will continue.
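
A slightly more defensive version of the cleanup can be sketched as follows. The function name remove_ivy_locks is hypothetical, and the sketch assumes that no SBT or IDEA process is still using the Ivy directory when it runs.

```shell
# Remove stale lock files directly under the Ivy directory.
# $1: Ivy directory (defaults to $HOME/.ivy2, as in the article)
remove_ivy_locks() {
    ivy_dir="${1:-$HOME/.ivy2}"
    # Nothing to do if the directory does not exist
    [ -d "$ivy_dir" ] || return 0
    # find also matches hidden files such as .sbt.ivy.lock,
    # which a bare *.lock glob in the shell would skip by default
    find "$ivy_dir" -maxdepth 1 -type f -name '*.lock' -exec rm -f {} +
}

# Example call:
# remove_ivy_locks
```

Only files ending in .lock at the top level of the directory are touched, so the downloaded artifacts in the cache are left alone.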

Source code compilation

Compiling the Spark source code in IDEA produces several errors along the way. The root cause is that sbt/sbt gen-idea does not resolve all dependencies correctly.

The fix is as follows:

1. Select File -> Project Structure.

2. Under Dependencies on the right, add a new dependency and select spark-core.

Other modules such as streaming-twitter, streaming-kafka, streaming-flume, and streaming-mqtt can be fixed in the same way.

Note that the errors reported when compiling the examples module are handled slightly differently: when adding the dependency, choose a module dependency instead of a library, and select sql in the pop-up window.

For more on resolving these compilation errors, see this thread: http://apache-spark-user-list.1001560.n3.nabble.com/Errors-occurred-while-compiling-module-spark-streaming-zeromq-IntelliJ-IDEA-13-0-2-td1282.html

Debug LogQuery

1. Select Run -> Edit Configurations.

2. Add an Application configuration. Pay attention to the settings in the pane on the right: Main class, VM options, Working directory, and Use classpath of module.

In VM options, -Dspark.master=local specifies the Spark run mode and can be changed as needed.

3. At this point a "Run LogQuery" item appears in the Run menu. Run it once to confirm that compilation succeeds.

4. Set a breakpoint by clicking in the left margin of the source file, then choose Run -> "Debug LogQuery". You can then inspect variables and the call stack.
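
For orientation, the run configuration from step 2 corresponds roughly to a plain java invocation. The sketch below only prints such a command. The function name logquery_command is hypothetical, the classpath argument is a placeholder that depends on your build output, and org.apache.spark.examples.LogQuery is assumed to be the main class entered in the configuration.

```shell
# Print a java command roughly equivalent to the IDEA run configuration.
# $1: Spark master, e.g. local or local[2] (defaults to local)
# $2: classpath (placeholder, IDEA derives this from the selected module)
logquery_command() {
    master="${1:-local}"
    classpath="$2"
    printf 'java -Dspark.master=%s -cp %s org.apache.spark.examples.LogQuery\n' \
        "$master" "$classpath"
}

# Example call:
# logquery_command 'local[2]' /path/to/classes
```

Inside IDEA the classpath is taken from the "Use classpath of module" setting, so only the master value normally needs editing.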

 

Reference

  1. http://8liang.cn/intellij-idea-spark-development