Spark 1.0.0 source code compilation and installation

Source: Internet
Author: User

Recently I have wanted to review what I learned, and writing a blog is a good way to do that: it helps others and reinforces the material for me. This is an early attempt, so I apologize in advance for its shortcomings.

 

Before compiling, install JDK 1.6 or above, Scala, Maven, Ant, and Hadoop 2.2.0, and add them to your environment (for example, in /etc/profile):
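As a sketch, the /etc/profile entries might look like the following; every install path below is an assumption, so substitute your own locations:

```shell
# Hypothetical install paths -- adjust to where you actually installed each tool
export JAVA_HOME=/usr/java/jdk1.7.0
export SCALA_HOME=/opt/scala-2.10.4
export MAVEN_HOME=/opt/apache-maven-3.0.5
export ANT_HOME=/opt/apache-ant-1.9.4
export HADOOP_HOME=/opt/hadoop-2.2.0
# Make all of the tools available on the command line
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$MAVEN_HOME/bin:$ANT_HOME/bin:$HADOOP_HOME/bin
```

After editing the file, run source /etc/profile so the current shell picks up the changes.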

 

Spark compilation provides two methods:

  1. Maven compilation: add export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" to the /etc/profile file, then run:
     mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package
  2. SBT compilation: the build is configured in project/SparkBuild.scala under the Spark source directory. Run:
     SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
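The two build paths above can be sketched as one shell session. The mvn/sbt invocations are shown as comments because they assume a checked-out Spark 1.0.0 source tree:

```shell
# Method 1: Maven (the export line belongs in /etc/profile)
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
# Then, from the Spark source root:
#   mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

# Method 2: SBT (build definition lives in project/SparkBuild.scala)
#   SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
```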

Note: the first compilation takes a long time because the dependent jar packages must be downloaded. (If the compilation fails, it is usually due to network problems; re-running the build command generally succeeds.)
After Spark is compiled, the whole directory is about 900 MB. Copying all of it to the other nodes with scp would waste space, so the next step is to generate a Spark deployment package instead.

 

The Spark deployment package is generated with the make-distribution.sh script, which takes the following options:
--hadoop VERSION: the Hadoop version number. If this option is not given, the Hadoop version defaults to 1.0.4.
--with-yarn: whether to support Hadoop YARN. If this option is not given, YARN is not supported.
--with-hive: whether to support Hive in Spark SQL. If this option is not given, Hive is not supported.
--skip-java-test: whether to skip the Java version check during compilation. If this option is not given, the check runs and may prompt you (see the note below).
--with-tachyon: whether to support the Tachyon in-memory file system. If this option is not given, Tachyon is not supported.
--tgz: generate spark-$VERSION-bin.tgz in the root directory. If this option is not given, no tgz file is generated and only the /dist directory is produced.
--name NAME: used together with --tgz; appends NAME to the name of the generated package.

Example: generate a deployment package that supports YARN and Hive:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --with-hive --tgz

Note: the options are order-sensitive. During the build, if your JDK version is not 1.6 you will be prompted; type yes and press Enter.

 

After the Spark deployment package is generated, scp it to each node, decompress it, and change the following configuration:
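A loop like the following can push the package to the nodes. It is a dry run that only prints the commands; host names, the destination directory, and the package name are all examples (the name assumes the --tgz build above). Remove the leading echo to actually execute:

```shell
# Dry run: prints the scp/ssh commands; remove the leading 'echo' to execute them.
# Host names, destination directory, and package name are examples only.
PKG=spark-1.0.0-bin-2.2.0.tgz
DEST=/opt
for host in hadoop1 hadoop2 hadoop3; do
  echo scp "$PKG" "${host}:${DEST}/"
  echo ssh "$host" "tar -zxf ${DEST}/${PKG} -C ${DEST}"
done
```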

  • Configure the conf/slaves file in the Spark installation directory and add the IP address or host name of each node (if host names are used, map them to their IP addresses in the /etc/hosts file).
  • Configure the conf/spark-env.sh file in the Spark installation directory:

export SPARK_MASTER_IP=chenx [master host name]
export SPARK_MASTER_PORT=7077 [access port]
export SPARK_WORKER_CORES=1 [number of cores used per worker]
export SPARK_WORKER_INSTANCES=1 [worker instances per node]
export SPARK_WORKER_MEMORY=3g [memory per worker]
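The two configuration steps above can be sketched as a script. It writes the files into a scratch directory so it is safe to run anywhere; the host names and values are examples, and in practice you would edit the files under the Spark installation's conf/ directory instead:

```shell
# Sketch: write conf/slaves and conf/spark-env.sh into a scratch directory.
# Host names and values are examples; edit $SPARK_HOME/conf in a real deployment.
CONF_DIR=./spark-conf-demo
mkdir -p "$CONF_DIR"

# One worker host per line
cat > "$CONF_DIR/slaves" <<'EOF'
hadoop1
hadoop2
hadoop3
EOF

# Master and worker settings
cat > "$CONF_DIR/spark-env.sh" <<'EOF'
export SPARK_MASTER_IP=chenx
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=3g
EOF
```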

The following are Spark's HA configurations:

# File-system-based Spark HA configuration
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/temp/recovery"

# ZooKeeper-based Spark HA configuration
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop1:2181,hadoop2:2181,hadoop3:2181 -Dspark.deploy.zookeeper.dir=/temp/recover"

 

Finally, run: ./sbin/start-all.sh

In standalone mode, run the jps command; if the Master and Worker processes are both present, everything is OK.
