Spark 1.0.0 source code compilation and installation

Source: Internet
Author: User

Recently I have wanted to review what I learned, and writing a blog is a good way to do that: it helps others and reinforces the material for me. This is an early attempt, so I apologize in advance for its shortcomings.

 

Before compiling, install JDK 1.6 or above, Scala, Maven, Ant, and Hadoop 2.2.0, and add them to your environment (for example, in /etc/profile):
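As a sketch, the /etc/profile entries might look like the following; every install path below is an assumption, so substitute your own locations:

```shell
# Hypothetical install paths -- adjust to where you actually installed each tool
export JAVA_HOME=/usr/java/jdk1.7.0
export SCALA_HOME=/opt/scala-2.10.4
export MAVEN_HOME=/opt/apache-maven-3.0.5
export ANT_HOME=/opt/apache-ant-1.9.4
export HADOOP_HOME=/opt/hadoop-2.2.0
# Make all of the tools available on the command line
export PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:$MAVEN_HOME/bin:$ANT_HOME/bin:$HADOOP_HOME/bin
```

After editing the file, run source /etc/profile so the current shell picks up the changes.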

 

Spark compilation provides two methods:

  1. Maven compilation: add export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" to the /etc/profile file, then run:
     mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package
  2. SBT compilation: the build is configured in project/SparkBuild.scala under the Spark source directory. Run:
     SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
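The two build paths above can be sketched as one shell session. The mvn/sbt invocations are shown as comments because they assume a checked-out Spark 1.0.0 source tree:

```shell
# Method 1: Maven (the export line belongs in /etc/profile)
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
# Then, from the Spark source root:
#   mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package

# Method 2: SBT (build definition lives in project/SparkBuild.scala)
#   SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
```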

Note: the first compilation takes a long time because the dependent jar packages must be downloaded. (If the compilation fails, it is usually due to network problems; re-running the build command generally succeeds.)
After Spark is compiled, the whole directory is about 900 MB. Copying all of it to the other nodes with scp would waste space, so the next step is to generate a Spark deployment package instead.

 

The Spark deployment package is generated with the make-distribution.sh script, which takes the following options:
--hadoop VERSION: the Hadoop version number. If this option is not given, the Hadoop version defaults to 1.0.4.
--with-yarn: whether to support Hadoop YARN. If this option is not given, YARN is not supported.
--with-hive: whether to support Hive in Spark SQL. If this option is not given, Hive is not supported.
--skip-java-test: whether to skip the Java version check during compilation. If this option is not given, the check runs and may prompt you (see the note below).
--with-tachyon: whether to support the Tachyon in-memory file system. If this option is not given, Tachyon is not supported.
--tgz: generate spark-$VERSION-bin.tgz in the root directory. If this option is not given, no tgz file is generated and only the /dist directory is produced.
--name NAME: used together with --tgz; appends NAME to the name of the generated package.

Example: generate a deployment package that supports YARN and Hive:
./make-distribution.sh --hadoop 2.2.0 --with-yarn --with-hive --tgz

Note: the options are order-sensitive. During the build, if your JDK version is not 1.6 you will be prompted; type yes and press Enter.

 

After the Spark deployment package is generated, scp it to each node, decompress it, and change the following configuration:
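A loop like the following can push the package to the nodes. It is a dry run that only prints the commands; host names, the destination directory, and the package name are all examples (the name assumes the --tgz build above). Remove the leading echo to actually execute:

```shell
# Dry run: prints the scp/ssh commands; remove the leading 'echo' to execute them.
# Host names, destination directory, and package name are examples only.
PKG=spark-1.0.0-bin-2.2.0.tgz
DEST=/opt
for host in hadoop1 hadoop2 hadoop3; do
  echo scp "$PKG" "${host}:${DEST}/"
  echo ssh "$host" "tar -zxf ${DEST}/${PKG} -C ${DEST}"
done
```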

  • Configure the conf/slaves file in the Spark installation directory and add the IP address or host name of each node (if host names are used, map them to their IP addresses in the /etc/hosts file).
  • Configure the conf/spark-env.sh file in the Spark installation directory:

export SPARK_MASTER_IP=chenx [master host name]
export SPARK_MASTER_PORT=7077 [access port]
export SPARK_WORKER_CORES=1 [number of cores used per worker]
export SPARK_WORKER_INSTANCES=1 [worker instances per node]
export SPARK_WORKER_MEMORY=3g [memory per worker]
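The two configuration steps above can be sketched as a script. It writes the files into a scratch directory so it is safe to run anywhere; the host names and values are examples, and in practice you would edit the files under the Spark installation's conf/ directory instead:

```shell
# Sketch: write conf/slaves and conf/spark-env.sh into a scratch directory.
# Host names and values are examples; edit $SPARK_HOME/conf in a real deployment.
CONF_DIR=./spark-conf-demo
mkdir -p "$CONF_DIR"

# One worker host per line
cat > "$CONF_DIR/slaves" <<'EOF'
hadoop1
hadoop2
hadoop3
EOF

# Master and worker settings
cat > "$CONF_DIR/spark-env.sh" <<'EOF'
export SPARK_MASTER_IP=chenx
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=3g
EOF
```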

The following are Spark's HA configurations:

# File-system-based Spark HA configuration
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/temp/recovery"

# ZooKeeper-based Spark HA configuration
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=hadoop1:2181,hadoop2:2181,hadoop3:2181 -Dspark.deploy.zookeeper.dir=/temp/recover"

 

Finally, run: ./sbin/start-all.sh

In standalone mode, run the jps command; if the Master and Worker processes are both present, everything is OK.
