Getting Started with Hadoop (3) -- Hadoop 2.0 Basics: Installation and Deployment Methods


First, the Hadoop 2.0 installation and deployment process

1. Automated installation and deployment: Ambari, Minos (from Xiaomi), Cloudera Manager (commercial)

2. Installation and deployment from RPM packages: not provided by Apache Hadoop; provided by HDP and CDH

3. Installation and deployment from the JAR package: available for every version. (This approach is recommended while first learning Hadoop.)

Deployment process:

Preparing the hardware (Linux operating system)

Prepare the Software installation package and install the base software (primarily JDK)

Distribute the Hadoop installation package to the same directory in each node and unzip

Modify the configuration files

Start the service

Verify that startup is successful
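The distribute-and-unpack step above can be sketched as a loop over worker nodes. The following is a minimal, locally runnable simulation: each "node" is a directory, and the hostnames node1-node3 and the tarball name are hypothetical. On a real cluster the cp and tar lines would become scp and ssh calls, as noted in the comments.

```shell
# Simulated distribution of a Hadoop tarball to several nodes.
workdir=$(mktemp -d)
mkdir -p "$workdir/pkg"
echo "hadoop files" > "$workdir/pkg/README"
tar -czf "$workdir/hadoop-2.0.tar.gz" -C "$workdir" pkg   # stand-in tarball

for node in node1 node2 node3; do
  mkdir -p "$workdir/$node"                              # stand-in for a remote host
  cp "$workdir/hadoop-2.0.tar.gz" "$workdir/$node/"      # real cluster: scp "$TARBALL" "$node:$DIR/"
  tar -xzf "$workdir/$node/hadoop-2.0.tar.gz" -C "$workdir/$node"  # real cluster: ssh "$node" tar ...
done

ls "$workdir/node2/pkg"   # every node now holds the same unpacked tree
```

The point of the loop is the invariant the text calls out: the package lands in the same directory on every node, so the same scripts and configuration work everywhere.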

Second, Hadoop 2.0 hardware and software preparation

Hardware preparation: a test environment needs only a single Linux machine; a production environment needs multiple Linux machines.

At least 4 GB of memory is recommended (for performance).

Software preparation: JDK 1.6+ (CDH5 recommends JDK 7) and the Hadoop 2.0 installation package.

1. We recommend installing Hadoop as a non-root user. (Some Hadoop features cannot be run as root.)

2. Configure passwordless SSH login, to make starting the Hadoop cluster convenient.
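A common sketch of the passwordless-SSH setup, assuming OpenSSH is installed. Here a temporary directory stands in for ~/.ssh so the example is safe to run; on a real cluster you would use ~/.ssh and copy the public key to every worker, e.g. with ssh-copy-id.

```shell
# Generate a key pair with an empty passphrase and authorize it.
keydir=$(mktemp -d)                               # stand-in for ~/.ssh
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"    # -N "" = no passphrase
cat "$keydir/id_rsa.pub" >> "$keydir/authorized_keys"
chmod 600 "$keydir/authorized_keys"
ls "$keydir"
# Real cluster: ssh-copy-id -i "$keydir/id_rsa.pub" user@worker  (per worker)
```

With the master's public key in each worker's authorized_keys, the start-dfs.sh and start-yarn.sh scripts can launch daemons on all nodes without password prompts.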

Third, downloading the Hadoop 2.0 installation package

It is recommended to use a commercial vendor's free distribution, mainly so that you do not have to sort out component version compatibility yourself.

http://archive.cloudera.com/cdh4/cdh/4

http://archive.cloudera.com/cdh5/cdh/5

Hadoop Directory Structure Analysis:

bin: the most basic management and usage scripts. These scripts are the underlying implementation of the management scripts in the sbin directory.

etc: the directory containing the configuration files, including core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, etc.

include: header files for the externally provided programming libraries, typically used by C/C++ programs to access HDFS.

lib: the dynamic and static libraries that Hadoop provides externally for programming.

libexec: the directory containing the shell configuration files for each service.

sbin: the directory containing the Hadoop management scripts, mainly the start and stop scripts for the various HDFS and YARN services.

share: the directory containing the compiled JAR packages of each Hadoop module.

Fourth, building the Hadoop 2.0 test environment (single machine)

Here is a theoretical explanation only.

1. Put the installation package into a directory and unpack it.

2. Modify the XML configuration files under etc/hadoop in the extracted directory:

hadoop-env.sh, modify the following configuration: export JAVA_HOME=/home/....

The slaves file is modified to the following configuration: YARN001

mapred-site.xml: mapreduce.framework.name=yarn

core-site.xml: fs.default.name=hdfs://yarn001:8020

yarn-site.xml: yarn.nodemanager.aux-services=mapreduce_shuffle

hdfs-site.xml: dfs.replication=1 (note: dfs.replication is an HDFS property and belongs in hdfs-site.xml, not core-site.xml)
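The four XML edits above can be sketched as a script that writes minimal configuration files. The hostname yarn001 follows the example in the text; a temporary directory stands in for etc/hadoop so the sketch is safe to run anywhere.

```shell
# Write minimal single-node Hadoop 2.0 config files.
conf=$(mktemp -d)          # stand-in for etc/hadoop

cat > "$conf/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://yarn001:8020</value>
  </property>
</configuration>
EOF

cat > "$conf/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>  <!-- single node: only one replica possible -->
  </property>
</configuration>
EOF

cat > "$conf/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>  <!-- run MapReduce on YARN -->
  </property>
</configuration>
EOF

cat > "$conf/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

grep -l configuration "$conf"/*.xml   # sanity check: four files written
```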

3. Start the services:

Format HDFS: bin/hadoop namenode -format

Start HDFS: sbin/start-dfs.sh

Start YARN: sbin/start-yarn.sh

4. Verify success:

Use jps to check whether the corresponding services have started:

NameNode, DataNode, NodeManager, ResourceManager, SecondaryNameNode

Visit YARN: http://yarn001:8088

Visit HDFS: http://yarn001:50070
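The jps check can be scripted. In this sketch the $sample variable stands in for real jps output so the example runs anywhere; on a live node you would replace it with sample=$(jps).

```shell
# Verify the expected Hadoop daemons appear in (sample) jps output.
expected="NameNode DataNode NodeManager ResourceManager SecondaryNameNode"
sample="1201 NameNode
1302 DataNode
1403 NodeManager
1504 ResourceManager
1605 SecondaryNameNode
1706 Jps"                       # live node: sample=$(jps)

for d in $expected; do
  if printf '%s\n' "$sample" | grep -qw "$d"; then
    echo "$d: running"
  else
    echo "$d: MISSING"
  fi
done
```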

Common problem:

Hadoop fails to start after the virtual machine reboots: the /tmp folder is emptied on reboot, so configure the data directories to a folder outside /tmp.

Add in hdfs-site.xml (these are HDFS properties, not core-site.xml settings): dfs.namenode.name.dir=/xxx; dfs.datanode.data.dir=/xxxx
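The fix can be sketched as an hdfs-site.xml fragment. The /data/hadoop paths below are placeholders; any directory outside /tmp that survives a reboot will do.

```shell
# Persist NameNode metadata and DataNode blocks outside /tmp so a
# reboot does not wipe them. A temp dir stands in for etc/hadoop.
conf=$(mktemp -d)
cat > "$conf/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/name</value>  <!-- placeholder path -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/data</value>  <!-- placeholder path -->
  </property>
</configuration>
EOF
grep -c '<property>' "$conf/hdfs-site.xml"   # → 2
```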

Fifth, building the Hadoop 2.0 production environment (multiple machines)

1. Put the installation package into a directory and unpack it.

2. Modify the XML configuration files under etc/hadoop in the extracted directory.

3. Format and start HDFS

4. Start YARN

The differences from the single-machine setup are that the configuration file contents modified in step 2 differ, and the detailed steps in step 3 differ.

HDFS HA deployment method: see subsequent articles

HDFS HA + Federation deployment method: see subsequent articles

YARN deployment method: see subsequent articles
