You are welcome to reprint this article; please credit the source, huichiro.
Summary
There is usually not much to say about compiling source code: for a typical Java project, a simple Maven or Ant command is enough. Spark, however, is not so simple; following the official Spark documentation, I always ran into one compilation error or another, which was annoying.
Today I had some spare time, so I tried again, and this time it worked. I am writing the steps down here for future reference.
Preparation
My build machine runs Arch Linux, with the following software installed (a sample install command follows the list):
- Scala 2.11
- Maven
- Git
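On Arch Linux these can usually be installed straight from the official repositories. A minimal sketch, assuming the standard scala, maven and git package names:
# install the build prerequisites (Arch Linux package names assumed)
sudo pacman -S scala maven git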
Download source code
The first step is to download the source code from GitHub.
git clone https://github.com/apache/spark.git
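If you only intend to build the current snapshot and do not need the full commit history, a shallow clone downloads considerably less data; this is just an optional shortcut:
# shallow clone, history limited to the latest commit
git clone --depth 1 https://github.com/apache/spark.git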
Source code compilation
Rather than invoking Maven or SBT directly, you can use the build script that ships with Spark: make-distribution.sh.
export SCALA_HOME=/usr/share/scala
cd $SPARK_HOME
./make-distribution.sh
If everything goes well, the target jar is generated under the $SPARK_HOME/assembly/target/scala-2.10 directory, for example:
assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
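The hadoop1.0.4 part of the jar name above is just the default Hadoop version. If you need to build against a different Hadoop version, the Maven build accepts a hadoop.version property; the sketch below follows the build documentation of that era, and the exact profiles may vary between Spark versions:
# build against Hadoop 2.2.0 with the YARN profile (profile/property names may differ per Spark version)
export SCALA_HOME=/usr/share/scala
cd $SPARK_HOME
mvn -Pyarn -Dhadoop.version=2.2.0 -DskipTests clean package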
Compile with SBT
The main reason the SBT build fails is that some jar files cannot be downloaded because of the GFW. The fix is to go through a proxy.
There are several ways to configure the proxy, and which one works can vary, so try them one by one against the latest Spark source. Method 1: set the http_proxy environment variable by running the following command.
export http_proxy=http://proxy-server:port
Method 2: set JAVA_OPTS
export JAVA_OPTS="-Dhttp.proxyHost=proxy-server -Dhttp.proxyPort=portNumber"
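With the proxy options in place, the SBT build itself can be launched through the sbt launcher script bundled in the source tree. A minimal sketch, assuming the bundled sbt/sbt script picks up JAVA_OPTS:
# launch the SBT build via the bundled launcher
cd $SPARK_HOME
sbt/sbt assembly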
Run Test Cases
Now that the jar file builds cleanly, you will naturally want to change a line or two of code and see the effect. The best way to confirm that your change has actually taken effect is to run the test cases.
Assuming some source code under $SPARK_HOME/core has been modified, re-compile by running the following commands:
export SCALA_HOME=/usr/share/scala
mvn package -DskipTests
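If only the core module was touched, you can ask Maven to rebuild just that module and whatever it depends on, which is noticeably faster than a full build. A sketch using the standard Maven -pl/-am flags:
# rebuild only the core module plus the modules it depends on
mvn package -DskipTests -pl core -am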
If you want to run the RandomSamplerSuite test suite under the $SPARK_HOME/core directory, run the following commands.
export SPARK_LOCAL_IP=127.0.0.1
export SPARK_MASTER_IP=127.0.0.1
mvn -Dsuites=org.apache.spark.util.random.RandomSamplerSuite test
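The same suite can also be run through SBT instead of Maven. A sketch, assuming the bundled sbt/sbt launcher and sbt's standard test-only task; the exact project scoping may differ depending on the Spark build definition:
export SPARK_LOCAL_IP=127.0.0.1
export SPARK_MASTER_IP=127.0.0.1
sbt/sbt "test-only org.apache.spark.util.random.RandomSamplerSuite"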