1. Download and compile Spark source code
Download Spark from http://spark.apache.org/downloads.html; I downloaded version 1.2.0.
Unzip the source. Before compiling, you can modify pom.xml to match your machine's environment; my environment is Hadoop 2.4.1, so I changed the minor version number. The build includes support for Hive, YARN, Ganglia, etc.
tar xzf ~/source/spark-1.2.0.tgz
cd spark-1.2.0
vi pom.xml
./make-distribution.sh --name 2.4.1 --with-tachyon --tgz -Pspark-ganglia-lgpl -Pyarn -Pkinesis-asl -Phive-0.13.1 -Phive-thriftserver -Phadoop-2.4 -Djava.version=1.6 -Dhadoop.version=2.4.1 -DskipTests
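If the build succeeds, make-distribution.sh with --tgz leaves a distribution tarball in the source root; the exact filename below is an assumption based on the --name flag above (it matches the directory name used in step 2):
# Sanity check after the build (filename assumed from the --name flag)
ls spark-1.2.0-bin-2.4.1.tgz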
Note: the pom.xml configuration may be adjusted with each Spark release, so tune the compile-time parameters according to what is actually in the pom.xml file.
2. Spark-related configuration
Unzip the compiled .tgz file, then set up the environment variables and the Spark configuration files as follows:
Environment variables (only Spark-related entries are listed):
export SCALA_HOME=/home/ocdc/bin/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/home/ocdc/bin/spark-1.2.0-bin-2.4.1
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
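Assuming these lines live in ~/.bash_profile (an assumption; use whichever profile file your shell actually reads), reload and verify:
# Reload the profile and confirm the variables took effect
source ~/.bash_profile
echo $SPARK_HOME
which spark-shell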
spark-env.sh
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=17077
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_INSTANCES=1
# When configuring HA for the master, the following key must be set, and ZooKeeper must be started beforehand
#export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,node1:2181,node2:2181"
slaves
node1
node2
node3
spark-defaults.conf
spark.master             spark://master:17077
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://cluster1:8020/eventlogdir
spark.executor.memory    512m
spark.driver.memory      512m
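Spark's event logging expects the target HDFS directory to exist before jobs run; a minimal sketch, assuming the HDFS client is configured on this node:
# Create the event log directory in HDFS up front (path taken from spark-defaults.conf above)
hadoop fs -mkdir -p hdfs://cluster1:8020/eventlogdir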
Copy Spark to each node
scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node1:~/bin/
scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node2:~/bin/
scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node3:~/bin/
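Equivalently, a small loop avoids repeating the command per node (a sketch, assuming the same ocdc user exists on every node):
# Same copy as above, one scp per slave listed in conf/slaves
for n in node1 node2 node3; do
  scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@$n:~/bin/
done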
3. Start Spark (single master)
cd $SPARK_HOME
sbin/start-all.sh
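To confirm the daemons came up, run jps (shipped with the JDK) on each node; the process names below are standard for Spark standalone mode:
# On the master node, expect a Master process; on each slave, a Worker
jps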
4. Start Spark (ZooKeeper-based master HA)
Configure a ZooKeeper cluster using three nodes: master, node1, and node2 (the ZooKeeper cluster configuration itself is skipped here). Start ZooKeeper on all three nodes:
zkServer.sh start
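Optionally, check each node's ZooKeeper role before moving on (assumes zkServer.sh is on the PATH, as above):
# One node should report "leader", the others "follower"
zkServer.sh status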
Add the ZooKeeper-related configuration to spark-env.sh. (Note: with HA there can be more than one master, so SPARK_MASTER_IP must not be set in the configuration file, otherwise Spark will not start properly.)
#export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=17077
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_INSTANCES=1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,node1:2181,node2:2181"
Start Spark on the master node:
sbin/start-all.sh
Start the standby master on node1:
sbin/start-master.sh
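A quick failover test (optional; my own sketch rather than part of the original steps): stop the active master and watch the standby take over. The URL assumes the SPARK_MASTER_WEBUI_PORT=18080 setting above:
# On the current active master: stop the Master daemon
sbin/stop-master.sh
# On node1, the web UI at http://node1:18080 should switch from STANDBY to ALIVE
# (recovery can take several seconds while workers re-register via ZooKeeper)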
5. Start spark-shell
1) Single-master startup
bin/spark-shell --master spark://master:17077
2) HA-mode startup
bin/spark-shell --master spark://master:17077,node1:17077
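Once the shell is up, a one-line job verifies the cluster end to end. The piped form below is a sketch of my own (spark-shell also accepts the same expression interactively):
# Should print 5050 if executors are running on the workers
echo 'sc.parallelize(1 to 100).reduce(_ + _)' | bin/spark-shell --master spark://master:17077,node1:17077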
6. Start history-server
Start the history server on node1; the relevant settings were already configured in spark-defaults.conf above:
sbin/start-history-server.sh hdfs://cluster1:8020/eventlogdir
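A hedged check: the history server's web UI listens on spark.history.ui.port, which defaults to 18080 and would collide with the SPARK_MASTER_WEBUI_PORT=18080 standby master on node1; if both run there, move one of the ports:
# Assuming the default history-server port 18080 is free on node1
curl http://node1:18080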