Spark Compilation, Installation, and Deployment


1. Download and compile Spark source code

Download Spark from http://spark.apache.org/downloads.html. I downloaded version 1.2.0.

Unpack the source and compile it. Before compiling, you can adjust pom.xml to match your machine's environment; mine is Hadoop 2.4.1, so I changed the minor version number. The build includes support for Hive, YARN, Ganglia, etc.

tar xzf ~/source/spark-1.2.0.tgz
cd spark-1.2.0
vi pom.xml
./make-distribution.sh --name 2.4.1 --with-tachyon --tgz -Pspark-ganglia-lgpl -Pyarn -Pkinesis-asl -Phive-0.13.1 -Phive-thriftserver -Phadoop-2.4 -Djava.version=1.6 -Dhadoop.version=2.4.1 -DskipTests

Note: the pom.xml configuration may change with each Spark release, so adjust the compile-time parameters to match the pom.xml file you are actually building from.
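
If you are unsure which -P profiles your checkout defines, one way to list them (assuming Maven is on the PATH) is:

mvn help:all-profiles | grep "Profile Id"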

2. Spark-related configuration

Unpack the compiled .tgz file, then set up the environment variables and the Spark configuration files as follows:

Environment variables (only Spark-related entries are listed):

export SCALA_HOME=/home/ocdc/bin/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/home/ocdc/bin/spark-1.2.0-bin-2.4.1
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
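
A quick sanity check, assuming the lines above were appended to ~/.bashrc:

source ~/.bashrc
echo $SPARK_HOME      # should print /home/ocdc/bin/spark-1.2.0-bin-2.4.1
which spark-shell     # should resolve to a path under $SPARK_HOME/bin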

spark-env.sh

export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=17077
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_INSTANCES=1
# When configuring HA for the master, the following key is required, and ZooKeeper must be started beforehand
#export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,node1:2181,node2:2181"

slaves

node1
node2
node3

spark-defaults.conf

spark.master             spark://master:17077
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://cluster1:8021/eventlogdir
spark.executor.memory    512m
spark.driver.memory      512m
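
Note that spark.eventLog.dir must already exist in HDFS before the first application starts, or the driver will fail when event logging initializes. A hedged one-liner, using the same URI as above:

hdfs dfs -mkdir -p hdfs://cluster1:8021/eventlogdir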

Copy Spark to each node

scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node1:~/bin/
scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node2:~/bin/
scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@node3:~/bin/
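
Equivalently, as a small loop (this assumes passwordless SSH from the master to each node and the same ocdc user everywhere):

for node in node1 node2 node3; do
  scp -r ~/bin/spark-1.2.0-bin-2.4.1/ ocdc@$node:~/bin/
done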

3. Start Spark (single master)

cd $SPARK_HOME
sbin/start-all.sh
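
To verify the daemons are up, jps (bundled with the JDK) should list a Master process on this node and a Worker on each slave:

jps | grep -E 'Master|Worker'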


4. Start Spark (ZooKeeper-based master HA)

Configure a ZooKeeper cluster on the master, node1, and node2 nodes; the ZooKeeper cluster configuration itself is skipped here. Start ZooKeeper on all three nodes:

zkServer.sh start
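
Once all three have started, each node reports its role; exactly one should answer as leader:

zkServer.sh status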

Add the ZooKeeper-related settings to the spark-env.sh configuration file. (Note: with HA there can be more than one master, so SPARK_MASTER_IP must not be set in this file; otherwise the cluster will not start properly.)

#export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=17077
export SPARK_MASTER_WEBUI_PORT=18080
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_WEBUI_PORT=18081
export SPARK_WORKER_INSTANCES=1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,node1:2181,node2:2181"

Start Spark on the master node:

sbin/start-all.sh

Start the standby master on node1:

sbin/start-master.sh
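
To sanity-check the HA setup, one option is to stop the active master and watch node1's web UI (port 18080) switch from STANDBY to ALIVE once the ZooKeeper session times out:

sbin/stop-master.sh    # run on the master node; node1 should take over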


5. Start spark-shell

1) Startup against a single master

bin/spark-shell --master spark://master:17077

2) Startup in HA mode

bin/spark-shell --master spark://master:17077,node1:17077
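
Either way, a one-line smoke test confirms the shell can actually run a job on the cluster (sc is the SparkContext that spark-shell predefines; this pipes a single statement into the shell and exits):

echo 'println(sc.parallelize(1 to 100).count())' | bin/spark-shell --master spark://master:17077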

6. Start the history server

Start the history server on the node1 node; the required settings are already in spark-defaults.conf above.

sbin/start-history-server.sh hdfs://cluster1:8020/eventlogdir
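
If the history UI does not come up, the daemon log records the bound port and any HDFS errors; the exact file name varies with user and host, so a glob like this should find it:

tail -n 50 $SPARK_HOME/logs/spark-*HistoryServer*.out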

