Reference article:
Build a development environment for Spark source reading and code debugging
Apache Spark Source Reading Environment
Tool | Version
Scala | 2.12.2
Java | 1.8.0_92
sbt | 0.13.13
Maven | 3.3.9
IntelliJ IDEA | CE 2017.1.4
macOS | 10.12.5
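Before cloning, it can help to confirm that the command-line tools are on the PATH. The following is a minimal sketch, not part of Spark: `require_tools` is a hypothetical helper name, and the command names in the usage comment (`git`, `java`, `mvn`, `sbt`, `scala`) are assumed to be the usual ones for the tools in the table.

```shell
# Hypothetical helper: report any of the named commands that are not on the PATH.
# Returns 0 when every tool is found, 1 otherwise.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumed command names):
#   require_tools git java mvn sbt scala
```

This only checks that the commands exist; it does not compare versions against the table.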
git clone
git clone https://github.com/apache/spark.git
Compiling the source code
build/mvn -T 4 -DskipTests clean package
This takes a while. I compiled it at the office, and with interruptions it took about 2 hours. A successful build looks like the following figure:
IDEA Import
Menu: File -> Open, select {spark dir}/pom.xml, and choose Open as Project
The import is fairly fast and finishes in a few minutes. The project structure after a successful import is as follows:
Incremental Development
During development we often make a small change and want to see its effect without recompiling all the dependencies, building only what changed. sbt's incremental compilation can be used for this.
build/sbt clean package  # first, one full build
export SPARK_PREPEND_CLASSES=true  # then use the incrementally compiled classes
or
build/sbt ~compile  # continuous incremental compilation
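The incremental loop above can be sketched as a pair of small shell helpers. This is only an illustration: `spark_dev_on` and `spark_dev_off` are hypothetical names, and the build commands are left as comments so the snippet can be sourced without starting a build. It assumes the commands are run from the Spark source root.

```shell
# Hypothetical helpers to toggle use of locally compiled classes.
# With SPARK_PREPEND_CLASSES set, Spark's launch scripts put the freshly
# compiled classes ahead of the packaged jars on the classpath.
spark_dev_on()  { export SPARK_PREPEND_CLASSES=true; }
spark_dev_off() { unset SPARK_PREPEND_CLASSES; }

# Typical loop, run from the Spark source root:
#   build/sbt clean package   # one full build
#   spark_dev_on
#   build/sbt compile         # recompile after each change
#   ./bin/spark-shell         # now runs against the recompiled classes
#   spark_dev_off             # back to the packaged jars
```

Unsetting the variable afterwards matters, because a stale setting in a long-lived shell can make later runs pick up half-compiled classes.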
Scalastyle check conflicts with IDEA auto-format
scalastyle-config.xml configures a set of code style checks, but some of them conflict with IDEA's default code formatting. When I used IDEA's formatter, the following errors were reported:
Use Javadoc style indentation for multiline comments  // comments must use Javadoc-style indentation
File line length exceeds 100 characters  // a line is longer than 100 characters
Insert a space after the start of the comment  // a comment must start with a space
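The line-length error is easy to pre-check locally before running the full style check. The sketch below is a rough stand-in, not scalastyle itself: `long_lines` is a hypothetical helper, the limit of 100 comes from the error message above, and `awk`'s `length` may count bytes instead of characters in some implementations, so lines with non-ASCII text can be over-reported.

```shell
# Hypothetical helper: print file, line number, and length for every line
# longer than 100 characters in the given files.
long_lines() {
  awk 'length($0) > 100 { printf "%s:%d: %d characters\n", FILENAME, FNR, length($0) }' "$@"
}

# Example:
#   long_lines path/to/SomeFile.scala
```

An empty result means no line in the listed files exceeds the limit; the other two checks still need scalastyle.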
To stay consistent with the official Spark code style, modify IDEA's code style settings. First open the code style page:
Preferences -> Editor -> Code Style -> Scala
Then change the following.
ScalaDoc:
Uncheck "Enable scaladoc formatting"
Uncheck "Use scaladoc indent for leading asterisk"
Wrapping and Braces:
Method declaration parameters: uncheck "Align when multiline"
Check "Ensure right margin is not exceeded" (set Right margin to 100 under Code Style)
One more comment-related issue remained: the source itself is not fully standardized, and some comments end with **/. After deleting that, the error went away. These changes only resolve the error messages; many differences remain between IDEA's code style and the source. To avoid changing the existing style through local IDEA formatting, format only the code you changed and do not reformat whole files. It would be best to find the code style configuration file used by the Spark developers and import it directly.
Code Style Guide