This article describes how to deploy Apache Spark to Hadoop 2.2.0. If your Hadoop is another version, such as CDH4, you can follow the official documentation directly.
There are two points to note: (1) Hadoop must be from the 2.0 series, such as 0.23.x, 2.0.x, 2.x.x, CDH4, or CDH5. Running Spark on Hadoop essentially means running Spark on Hadoop YARN, because Spark itself provides only job management; resource scheduling is delegated to a third-party system such as YARN or Mesos. (2) YARN is chosen here rather than Mesos because YARN has strong community support and has gradually become the de facto standard for resource management systems.
Note that the 0.8.1 release is now official, so you can choose and download the appropriate version directly; if you are using Hadoop 2.2.0 or CDH5, a matching build can also be downloaded directly.
Deploying Spark to Hadoop 2.2.0 requires the following steps:
Step 1: Prepare the basic software
Step 2: Download and compile Spark 0.8.1 or later
Step 3: Run the Spark instance
The following sections describe these steps in detail.
Step 1: Prepare the basic software
(1) Basic software
This includes a Linux operating system, Hadoop 2.2.0 or higher, and Maven 3.0.4 (or the latest 3.0.x release). Hadoop 2.2.0 only needs the simplest installation; for details, see my article "Hadoop YARN installation and deployment". Installing Maven is straightforward: download a binary release from http://maven.apache.org/download.cgi, extract it, and configure the MAVEN_HOME and PATH environment variables (you can find guides online, such as "Linux installation maven"). Note that the version must be 3.0.x; Spark has strict requirements on the Maven version.
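A minimal sketch of the Maven environment setup described above. The install path is an assumption; point MAVEN_HOME at wherever you actually extracted the Maven 3.0.x tarball.

```shell
#!/bin/sh
# Sketch: configure Maven after extracting the binary tarball.
# /usr/local/apache-maven-3.0.5 is an assumed path -- adjust to your install.
export MAVEN_HOME=/usr/local/apache-maven-3.0.5
export PATH="$MAVEN_HOME/bin:$PATH"

# To verify, run: mvn -version
# (it should report a 3.0.x version)
```

Adding these two exports to your shell profile (e.g. ~/.bashrc) makes the setting persistent across sessions.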
(2) Hardware preparation
Spark 0.8.1 introduced a new-yarn profile to support Hadoop 2.2.0. Because of incompatible API changes in Hadoop 2.2.0, Spark must be compiled and packaged separately with Maven. The compilation is very slow (about 2 hours on an ordinary machine) and uses a lot of memory, so you need a build machine that meets the following conditions:
Condition 1: Network access. On the first compilation, Maven needs to download many jar packages from the Internet, which is slow; if your network connection is unreliable, it is best not to attempt the build.
Condition 2: More than 2 GB of memory.
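The conditions above can be checked before kicking off the long build. A small pre-flight sketch (Linux-specific, since it reads /proc/meminfo; the 2 GB threshold mirrors Condition 2):

```shell
#!/bin/sh
# Pre-flight check before starting the Spark build (sketch).
# Reads total RAM in MB from /proc/meminfo (Linux only).
mem_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
if [ "$mem_mb" -lt 2048 ]; then
  echo "Warning: only ${mem_mb} MB RAM; the Spark build may fail." >&2
else
  echo "Memory check passed: ${mem_mb} MB available."
fi
```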
Step 2: Download and compile Spark 0.8.1 or later
You can fetch Spark 0.8.1 with Git, or download it directly with wget:
wget https://github.com/apache/incubator-spark/archive/v0.8.1-incubating.zip
Note that versions before 0.8.1 did not support Hadoop 2.2.0; support was added in 0.8.1.
After downloading, unzip it:
unzip v0.8.1-incubating.zip
Then enter the extracted directory and run the following commands:
cd incubator-spark-0.8.1-incubating
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pnew-yarn -DskipTests package
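Once the build finishes, it is worth confirming that an assembly jar was actually produced before moving on. A sketch of such a check; the exact jar location is an assumption (for Spark 0.8.1 it typically lands under assembly/target/scala-2.9.3/):

```shell
#!/bin/sh
# Sketch: verify the Maven build produced a Spark assembly jar.
# The jar name/location is an assumption for Spark 0.8.1.
check_build() {
  dir=${1:-.}
  jar=$(find "$dir" -name 'spark-assembly*.jar' 2>/dev/null | head -n 1)
  if [ -n "$jar" ]; then
    echo "Build artifact found: $jar"
    return 0
  else
    echo "No assembly jar found under $dir; the build may have failed." >&2
    return 1
  fi
}

# Usage, from the source tree root after the mvn command above:
#   check_build .
```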
This generally takes a long time. After compilation completes, package the Spark kernel into a standalone jar with the following command: