1. Download and compile Spark source code
Download Spark from http://spark.apache.org/downloads.html; I downloaded version 1.2.0.
Unzip the source. Before compiling, you can modify pom.xml to match your machine's environment; my environment is Hadoop 2.4.1, so I changed the minor version number. The build includes support for Hive, YARN, Ganglia, etc.
tar xzf ~/source/spark-1.2.0.tgz
cd spark-1.2.0
vi pom.xml
./make-distribution.sh --name 2.4.1 --with-tachyon --tgz -Pspark-ganglia-lgpl -Pyarn -Pkinesis-asl -Phive-0.13.1 -Phive-thriftserver -Phadoop-2.4 -Djava.version=1.6 -Dhadoop.version=2.4.1 -DskipTests
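If the build succeeds, make-distribution.sh with --tgz leaves a distribution tarball in the source root; the exact filename below is an assumption based on the --name flag above (it matches the directory name used in step 2):
# Sanity check after the build (filename assumed from the --name flag)
ls spark-1.2.0-bin-2.4.1.tgz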
Note: the pom.xml configuration may be adjusted with each Spark release, so tune the compile-time parameters according to what is actually in the pom.xml file.
2. Spark-related configuration
Unzip the compiled .tgz file, then set up the environment variables and the Spark configuration files as follows:
Environment variables (only Spark-related entries are listed):
export SCALA_HOME=/home/ocdc/bin/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/home/ocdc/bin/spark-1.2.0-bin-2.4.1
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
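Assuming these lines live in ~/.bash_profile (an assumption; use whichever profile file your shell actually reads), reload and verify:
# Reload the profile and confirm the variables took effect
source ~/.bash_profile
echo $SPARK_HOME
which spark-shell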
spark-env.sh
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=17077
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_INSTANCES=1
# When configuring HA for the master, the following key must be set, and ZooKeeper must be started beforehand
#export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,node1:2181,node2:2181"
slaves
node1
node2
node3
spark-defaults.conf
spark.master             spark://master:17077
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://cluster1:8020/eventlogdir
spark.executor.memory    512m
spark.driver.memory      512m
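Spark's event logging expects the target HDFS directory to exist before jobs run; a minimal sketch, assuming the HDFS client is configured on this node:
# Create the event log directory in HDFS up front (path taken from spark-defaults.conf above)
hadoop fs -mkdir -p hdfs://cluster1:8020/eventlogdir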
Copy Spark to each node
scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node1:~/bin/
scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node2:~/bin/
scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node3:~/bin/
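Equivalently, a small loop avoids repeating the command per node (a sketch, assuming the same ocdc user exists on every node):
# Same copy as above, one scp per slave listed in conf/slaves
for n in node1 node2 node3; do
  scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@$n:~/bin/
done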
3. Start Spark (single master)
cd $SPARK_HOME
sbin/start-all.sh
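To confirm the daemons came up, run jps (shipped with the JDK) on each node; the process names below are standard for Spark standalone mode:
# On the master node, expect a Master process; on each slave, a Worker
jps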
4. Start Spark (ZooKeeper-based master HA)
Configure a ZooKeeper cluster using three nodes: master, node1, and node2 (the ZooKeeper cluster configuration itself is skipped here). Start ZooKeeper on all three nodes:
zkServer.sh start
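Optionally, check each node's ZooKeeper role before moving on (assumes zkServer.sh is on the PATH, as above):
# One node should report "leader", the others "follower"
zkServer.sh status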
Add the ZooKeeper-related configuration to spark-env.sh. (Note: with HA there can be more than one master, so SPARK_MASTER_IP must not be set in the configuration file, otherwise Spark will not start properly.)
#export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=17077
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_INSTANCES=1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,node1:2181,node2:2181"
Start Spark on the master node:
sbin/start-all.sh
Start the standby master on node1:
sbin/start-master.sh
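A quick failover test (optional; my own sketch rather than part of the original steps): stop the active master and watch the standby take over. The URL assumes the SPARK_MASTER_WEBUI_PORT=18080 setting above:
# On the current active master: stop the Master daemon
sbin/stop-master.sh
# On node1, the web UI at http://node1:18080 should switch from STANDBY to ALIVE
# (recovery can take several seconds while workers re-register via ZooKeeper)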
5. Start spark-shell
1) Single-master startup
bin/spark-shell --master spark://master:17077
2) HA-mode startup
bin/spark-shell --master spark://master:17077,node1:17077
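Once the shell is up, a one-line job verifies the cluster end to end. The piped form below is a sketch of my own (spark-shell also accepts the same expression interactively):
# Should print 5050 if executors are running on the workers
echo 'sc.parallelize(1 to 100).reduce(_ + _)' | bin/spark-shell --master spark://master:17077,node1:17077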
6. Start history-server
Start the history server on node1; the relevant settings were already configured in spark-defaults.conf above:
sbin/start-history-server.sh hdfs://cluster1:8020/eventlogdir
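A hedged check: the history server's web UI listens on spark.history.ui.port, which defaults to 18080 and would collide with the SPARK_MASTER_WEBUI_PORT=18080 standby master on node1; if both run there, move one of the ports:
# Assuming the default history-server port 18080 is free on node1
curl http://node1:18080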