Deploy Spark onto Hadoop 2.2.0

Source: Internet
Author: User

This article describes how to deploy Apache Spark on Hadoop 2.2.0. If you run another Hadoop version, such as CDH4, refer directly to the official documentation for instructions.

There are two points to note:

(1) Hadoop must be from the 2.0 series, such as 0.23.x, 2.0.x, 2.x.x, CDH4, or CDH5. Running Spark on Hadoop essentially means running Spark on Hadoop YARN, because Spark itself provides only job management and relies on a third-party system such as YARN or Mesos for resource scheduling.

(2) YARN is used here rather than Mesos because YARN has strong community support and is gradually becoming the standard resource management system.
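The Hadoop version requirement can be checked from the command line. The following is a minimal sketch, not an official tool: the helper name is_yarn_capable is made up for illustration, and the script assumes that the first line of `hadoop version` has the form "Hadoop 2.2.0".

```shell
# Hypothetical helper: returns 0 when the version string belongs to a
# YARN-capable Hadoop series (0.23.x or 2.x, which also covers CDH4/CDH5
# version strings such as 2.0.0-cdh4.2.0), and 1 otherwise.
is_yarn_capable() {
  case "$1" in
    0.23.*|2.*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Example usage: read the version from `hadoop version` if the command exists.
if command -v hadoop >/dev/null 2>&1; then
  ver=$(hadoop version 2>/dev/null | awk 'NR==1 {print $2}')
  if is_yarn_capable "$ver"; then
    echo "Hadoop $ver belongs to a YARN-capable series"
  else
    echo "Hadoop $ver does not belong to a YARN-capable series"
  fi
fi
```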

Note that Spark 0.8.1 has been officially released, and you can download the package that matches your environment. If you are using Hadoop 2.2.0 or CDH5, a matching package can be downloaded directly.

Deploying Spark to Hadoop 2.2.0 requires the following steps:

Step 1: Prepare the basic software

Step 2: Download and compile Spark 0.8.1 or later

Step 3: Run the Spark instance

The following sections describe these steps in detail.

Step 1: Prepare the basic software

(1) Basic software

You need a Linux operating system, Hadoop 2.2.0 or later, and Maven 3.0.4 (or the latest 3.0.x release). Hadoop 2.2.0 only needs the simplest installation; for details, see my article "Hadoop YARN installation and deployment". Installing Maven is straightforward: download the binary release from http://maven.apache.org/download.cgi, extract it, and set the MAVEN_HOME and PATH environment variables. Guides such as "Linux installation maven" can be found online. Make sure the Maven version is 3.0.x, because Spark has strict requirements on the Maven version.
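The Maven environment setup described above might look like the following sketch. The install path /opt/apache-maven-3.0.4 and the helper name is_maven_3_0 are assumptions for illustration; use the directory where you actually extracted the archive. The script also assumes that the first line of `mvn -version` has the form "Apache Maven 3.0.4 ...".

```shell
# Assumed install location (hypothetical); point this at the directory
# where the Maven binary archive was extracted.
export MAVEN_HOME="${MAVEN_HOME:-/opt/apache-maven-3.0.4}"
export PATH="$MAVEN_HOME/bin:$PATH"

# Hypothetical helper: succeeds only for 3.0 or 3.0.x version strings.
is_maven_3_0() {
  case "$1" in
    3.0|3.0.*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Report the installed version if mvn is on the PATH.
if command -v mvn >/dev/null 2>&1; then
  mvn_ver=$(mvn -version 2>/dev/null | awk 'NR==1 {print $3}')
  if is_maven_3_0 "$mvn_ver"; then
    echo "Maven $mvn_ver is a 3.0.x release"
  else
    echo "Maven $mvn_ver is not a 3.0.x release"
  fi
fi
```

To make the settings permanent, the two export lines can be placed in a shell startup file such as ~/.bashrc.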

(2) Hardware preparation

Spark 0.8.1 introduced a new-yarn profile to support Hadoop 2.2.0. Because the Hadoop 2.2.0 API contains incompatible changes, Spark must be compiled and packaged separately with Maven. The compilation is slow (about 2 hours on an ordinary machine) and uses a lot of memory, so you need a build machine that meets the following conditions:

Condition 1: Network access. On the first compilation, Maven needs to download many jar packages from the Internet, which is relatively slow. If your network connection is poor, compiling is not recommended.

Condition 2: More than 2 GB of memory.
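Both conditions can be checked before starting a long build. The sketch below assumes a Linux machine with /proc/meminfo and the curl command available. The helper names are made up for illustration, and the Maven Central URL is an assumption about where Maven will fetch its jars by default.

```shell
# Minimum memory from Condition 2, in kB (2 GB).
MIN_MEM_KB=$((2 * 1024 * 1024))

# Print the MemTotal value (in kB) from a meminfo-format file.
# Defaults to /proc/meminfo when no file is given.
mem_total_kb() {
  awk '/^MemTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

# Return 0 when the given kB value is above the 2 GB threshold.
has_enough_memory() {
  [ "$1" -gt "$MIN_MEM_KB" ]
}

# Return 0 when Maven Central answers within 10 seconds (assumed repository URL).
can_reach_maven_central() {
  curl -sfI --max-time 10 https://repo.maven.apache.org/maven2/ >/dev/null
}

# Example usage: report the memory check on this machine.
if [ -r /proc/meminfo ]; then
  total=$(mem_total_kb)
  if has_enough_memory "$total"; then
    echo "Memory check passed: ${total} kB"
  else
    echo "Memory check failed: ${total} kB is not above 2 GB"
  fi
fi
```

The network check is defined but not called automatically; run can_reach_maven_central by hand when the build machine is expected to be online.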

Step 2: Download and compile Spark 0.8.1 or later

You can download Spark 0.8.1 with git or directly with wget:

wget https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.zip

Note that versions before 0.8.1 do not support Hadoop 2.2.0; support begins with version 0.8.1.

After downloading, unzip it:

unzip v0.8.1-incubating

Then change into the extracted directory and run the following commands:

cd incubator-spark-0.8.1-incubating

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pnew-yarn -DskipTests package
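Because the build runs for a long time, it can help to wrap the Maven invocation in a small script that prints the command for review before running it. This is only a sketch: the function name build_spark and the DRY_RUN variable are made up for illustration, and the article confirms the new-yarn profile only for Hadoop 2.2.0, so passing any other version is an untested assumption.

```shell
# JVM settings for Maven, as given in the article.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# Hypothetical wrapper: prints the Maven command by default and runs it
# only when DRY_RUN=0. The Hadoop version defaults to 2.2.0.
build_spark() {
  hadoop_version="${1:-2.2.0}"
  # Rebuild the positional parameters as the full Maven command line.
  set -- mvn "-Dyarn.version=${hadoop_version}" \
             "-Dhadoop.version=${hadoop_version}" \
             -Pnew-yarn -DskipTests package
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the command for review; nothing is compiled in this mode.
build_spark 2.2.0
```

To start the real build from the Spark source directory, run `DRY_RUN=0 build_spark 2.2.0`.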

The build usually takes a long time. After compilation finishes, package the Spark core into a standalone jar; the command is as follows:
