IntelliJ IDEA Spark Source Analysis

Source: Internet
Author: User

To follow the development of the Spark source code, and to read and analyze it in detail once you have some experience with Spark, this article describes how to use IntelliJ IDEA to import the latest Spark source code from GitHub and compile it.

Preparatory work

First, you need JDK 1.6+ and Scala installed on your system. After downloading the latest version of IntelliJ IDEA, install the Scala plugin (IntelliJ IDEA recommends it the first time it opens); the installation steps are not covered here. At this point, you should be able to run scala from the command line. My system environment is as follows:

1. Mac OS X (10.9.5)

2. JDK 1.7.71

3. Scala 2.10.4

4. IntelliJ IDEA 14
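
Before importing the source, the prerequisites above can be checked from a terminal. The following is a minimal sketch: the helper name check_tool is made up for this example, and it only tests whether each command is on the PATH, not which version is installed.

```shell
#!/bin/sh
# check_tool is a hypothetical helper: it reports whether a command is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: MISSING"
        return 1
    fi
}

# Check the tools this article relies on; keep going even if one is missing.
for tool in java scala git; do
    check_tool "$tool" || true
done
```

To see the installed versions themselves, run java -version and scala -version.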

In addition, it is recommended that you start with a pre-built Spark distribution to understand how Spark works and how to use it, and write a few Spark applications. After that, move on to reading the source code, then try modifying the code and compiling it yourself.

Importing the Spark Project from GitHub

After opening IntelliJ IDEA, select VCS → Checkout from Version Control → Git in the menu bar, then enter the address of the Spark project in the Git Repository URL field and specify the local path.

After you click Clone in the window, IntelliJ IDEA starts cloning the project from GitHub, which takes about 3-10 minutes depending on your network speed.
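
The same checkout can also be done from a terminal instead of the IDE. This is only a sketch: the article does not spell out the repository address, so the URL below (the Apache Spark repository on GitHub) and the target directory in the example call are assumptions, and clone_spark is a name chosen for illustration.

```shell
#!/bin/sh
# Assumed repository address; the article does not state the URL explicitly.
SPARK_REPO_URL="https://github.com/apache/spark.git"

# clone_spark DEST
# Clones Spark into DEST unless a Git checkout already exists there.
clone_spark() {
    dest="$1"
    if [ -d "$dest/.git" ]; then
        echo "already cloned: $dest"
    else
        git clone "$SPARK_REPO_URL" "$dest"
    fi
}

# Example call (commented out because it downloads the whole repository):
# clone_spark "$HOME/src/spark"
```

A directory cloned this way can then be opened in IntelliJ IDEA through its pom.xml file.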

Compiling Spark

When the clone finishes, IntelliJ IDEA detects that the project contains a pom.xml file and asks whether to open it. Choose to open the pom.xml file; IntelliJ IDEA then resolves the project's dependencies automatically, which takes a varying amount of time depending on your network and system environment.

After this step is complete, manually edit the pom.xml file in the Spark root directory and find the line that specifies the Java version (java.version). Set it to match your system environment: if you are using JDK 1.7, you may need to change its value to 1.7 (the default is 1.6).
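
This edit can also be scripted. The sketch below assumes the property is written on a single line in the form <java.version>1.6</java.version>, which may not hold for every Spark version, so inspect pom.xml before relying on it. The function name set_java_version is chosen for this example.

```shell
#!/bin/sh
# set_java_version FILE VERSION
# Rewrites the <java.version> property in a Maven POM (assumed to sit on one line)
# and keeps the original file as FILE.bak.
set_java_version() {
    sed -i.bak "s|<java.version>[^<]*</java.version>|<java.version>$2</java.version>|" "$1"
}

# Example call from the Spark root directory
# (commented out so nothing is changed by accident):
# set_java_version pom.xml 1.7
```

The -i.bak form is used because it is accepted by both the BSD sed shipped with Mac OS X and GNU sed.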

Then open a shell terminal, change to the root directory of the Spark project you just imported, and run

sbt/sbt assembly

This command builds Spark with the default configuration. If you want to specify the versions of related components, see the Building Spark page on the Spark website (http://spark.apache.org/docs/latest/building-spark.html), which lists the common build options. At the time of writing, the build completed without a VPN. To estimate how much longer the build will take, you can open a new shell terminal and check the size of the Spark project directory from time to time. I used the default configuration, and after a successful build the Spark directory was 2.0 GB.
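
The size check described above can be run as a small loop in the second terminal. This is a minimal sketch: watch_size is a hypothetical helper name, and the interval and sample count in the example call are arbitrary.

```shell
#!/bin/sh
# watch_size DIR INTERVAL COUNT
# Prints the total size of DIR COUNT times, waiting INTERVAL seconds between samples.
watch_size() {
    i=0
    while [ "$i" -lt "$3" ]; do
        du -sh "$1"
        i=$((i + 1))
        if [ "$i" -lt "$3" ]; then
            sleep "$2"
        fi
    done
}

# Example call from the Spark root directory: one sample per minute for an hour.
# watch_size . 60 60
```

Comparing the printed size with the 2.0 GB final size reported above gives only a rough sense of progress, since the final size depends on the build options and the Spark version.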

Conclusion

At this point, to verify the build, go to the bin directory of the Spark project on the command line and run spark-shell. If it starts normally, the build succeeded. If you modify the Spark source, you can rebuild with sbt, and the build will take less time than the first one.

