1. Download the Spark source code and extract it to /usr/local/spark-1.5.0-cdh5.5.1, then check that a pom.xml file exists in that directory.
2. Switch to /usr/local/spark-1.5.0-cdh5.5.1. Compiling the Spark source code downloads dependency packages from the Internet, so the build machine must stay connected to the network for the whole build. Run the following to compile:
[hadoop@hadoop spark-1.5.0-cdh5.5.1]$ export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m"
[hadoop@hadoop spark-1.5.0-cdh5.5.1]$ mvn -Pyarn -Dhadoop.version=2.5.0-cdh5.3.3 -Dscala-3.9.9 -Phive -Phive-thriftserver -DskipTests clean package

Resolving compile errors:

Edit /usr/local/maven/conf/settings.xml and add the following section (highlighted in red in the original post):

<servers>
  <server>
    <id>hadoop</id>
    <username>hadoop</username>
    <password>hadoop</password>
    <privateKey>/home/hadoop/.ssh/id_rsa</privateKey>
    <passphrase>some_passphrase</passphrase>
    <filePermissions>664</filePermissions>
    <directoryPermissions>775</directoryPermissions>
    <configuration></configuration>
  </server>
</servers>

Edit /usr/local/spark/pom.xml and add the following section:

<modules>
  <module>core</module>
  <module>bagel</module>
  <module>graphx</module>
  <module>mllib</module>
  <module>tools</module>
  <module>network/common</module>
  <module>network/shuffle</module>
  <module>streaming</module>
  <module>sql/catalyst</module>
  <module>sql/core</module>
  <module>sql/hive</module>
  <module>unsafe</module>
  <module>assembly</module>
  <module>external/twitter</module>
  <module>external/flume</module>
  <module>external/flume-sink</module>
  <!-- DISABLED IN CDH.
  <module>external/flume-assembly</module>
  -->
  <module>sql/hive-thriftserver</module>
  <module>external/mqtt</module>
  <module>external/mqtt-assembly</module>
  <module>external/zeromq</module>
  <module>examples</module>
  <module>repl</module>
  <module>launcher</module>
  <module>external/kafka</module>
  <!-- DISABLED IN CDH
  <module>external/kafka-assembly</module>
  -->
</modules>

3. Build the Spark deployment package:

./make-distribution.sh --name 2.5.0 --tgz --with-tachyon
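The compile step above can be consolidated into one script. The source path and versions are taken from the text; the Scala property from the original command is left out here because its value appears garbled, so treat the profile set as an assumption and adjust it for your CDH release:

```shell
#!/bin/sh
# Sketch of the compile step; path and versions follow the text above.
SPARK_SRC=/usr/local/spark-1.5.0-cdh5.5.1

# Maven needs extra heap and PermGen space for the larger Spark modules,
# otherwise the build can fail with OutOfMemoryError.
MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m"
export MAVEN_OPTS

# Only run the build when the source tree (with its top-level pom.xml)
# is actually present on this machine.
if [ -f "$SPARK_SRC/pom.xml" ]; then
  cd "$SPARK_SRC" &&
  mvn -Pyarn -Dhadoop.version=2.5.0-cdh5.3.3 \
      -Phive -Phive-thriftserver -DskipTests clean package
fi
```

Checking for pom.xml before invoking Maven also covers the sanity check from step 1: if the extraction went wrong, the script stops instead of producing a confusing Maven error.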
The script make-distribution.sh in the Spark source root directory generates the deployment package. Package it by running:

./make-distribution.sh [--name] [--tgz] [--with-tachyon] <maven build options>
--name NAME: combined with --tgz, generates the deployment package spark-$VERSION-bin-$NAME.tgz; without this parameter, NAME defaults to the Hadoop version number.
--tgz: generates spark-$VERSION-bin.tgz in the root directory; without this parameter, no tgz file is generated, only the /dist directory.
--with-tachyon: adds support for the Tachyon in-memory file system; without this parameter, Tachyon is not supported.
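The naming rules above can be sketched as a small helper that predicts the artifact name. This is illustrative only and not part of make-distribution.sh; `dist_name` and its parameters are hypothetical:

```shell
#!/bin/sh
# Predict the file name make-distribution.sh would produce, following the
# option descriptions above (hypothetical helper, not part of the script).
dist_name() {
  version="$1"   # Spark version, e.g. 1.5.0
  name="$2"      # value of --name; empty when --name was not given
  if [ -n "$name" ]; then
    echo "spark-${version}-bin-${name}.tgz"
  else
    echo "spark-${version}-bin.tgz"
  fi
}

# Example: the packaging command from step 3 used --name 2.5.0 --tgz
dist_name 1.5.0 2.5.0   # prints spark-1.5.0-bin-2.5.0.tgz
```

So the command `./make-distribution.sh --name 2.5.0 --tgz --with-tachyon` from step 3 should leave a spark-1.5.0-bin-2.5.0.tgz in the source root.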