Getting Started with Hadoop (3) -- Hadoop 2.0 Basics: Installation and Deployment Methods


First, the Hadoop 2.0 installation and deployment process

1. Automated installation and deployment: Ambari, Minos (from Xiaomi), Cloudera Manager (commercial)

2. Installation and deployment from RPM packages: not provided by Apache Hadoop; provided by HDP and CDH

3. Installation and deployment from the JAR package: available for every version. (This approach is recommended while first learning Hadoop.)

Deployment process:

Preparing the hardware (Linux operating system)

Prepare the Software installation package and install the base software (primarily JDK)

Distribute the Hadoop installation package to the same directory in each node and unzip

Modify the configuration files

Start the service

Verify that startup is successful
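The distribute-and-unpack step above can be sketched as a loop over worker nodes. The following is a minimal, locally runnable simulation: each "node" is a directory, and the hostnames node1-node3 and the tarball name are hypothetical. On a real cluster the cp and tar lines would become scp and ssh calls, as noted in the comments.

```shell
# Simulated distribution of a Hadoop tarball to several nodes.
workdir=$(mktemp -d)
mkdir -p "$workdir/pkg"
echo "hadoop files" > "$workdir/pkg/README"
tar -czf "$workdir/hadoop-2.0.tar.gz" -C "$workdir" pkg   # stand-in tarball

for node in node1 node2 node3; do
  mkdir -p "$workdir/$node"                              # stand-in for a remote host
  cp "$workdir/hadoop-2.0.tar.gz" "$workdir/$node/"      # real cluster: scp "$TARBALL" "$node:$DIR/"
  tar -xzf "$workdir/$node/hadoop-2.0.tar.gz" -C "$workdir/$node"  # real cluster: ssh "$node" tar ...
done

ls "$workdir/node2/pkg"   # every node now holds the same unpacked tree
```

The point of the loop is the invariant the text calls out: the package lands in the same directory on every node, so the same scripts and configuration work everywhere.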

Second, Hadoop 2.0 hardware and software preparation

Hardware preparation: a test environment needs only a single Linux machine; a production environment needs multiple Linux machines.

At least 4 GB of memory is recommended (for performance).

Software preparation: JDK 1.6+ (CDH5 recommends JDK 7) and the Hadoop 2.0 installation package.

1. We recommend installing Hadoop as a non-root user. (Some Hadoop features cannot be run as root.)

2. Configure passwordless SSH login, to make starting the Hadoop cluster convenient.
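A common sketch of the passwordless-SSH setup, assuming OpenSSH is installed. Here a temporary directory stands in for ~/.ssh so the example is safe to run; on a real cluster you would use ~/.ssh and copy the public key to every worker, e.g. with ssh-copy-id.

```shell
# Generate a key pair with an empty passphrase and authorize it.
keydir=$(mktemp -d)                               # stand-in for ~/.ssh
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"    # -N "" = no passphrase
cat "$keydir/id_rsa.pub" >> "$keydir/authorized_keys"
chmod 600 "$keydir/authorized_keys"
ls "$keydir"
# Real cluster: ssh-copy-id -i "$keydir/id_rsa.pub" user@worker  (per worker)
```

With the master's public key in each worker's authorized_keys, the start-dfs.sh and start-yarn.sh scripts can launch daemons on all nodes without password prompts.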

Third, downloading the Hadoop 2.0 installation package

It is recommended to use a commercial vendor's free distribution, mainly so that you do not have to sort out component version compatibility yourself.

http://archive.cloudera.com/cdh4/cdh/4

http://archive.cloudera.com/cdh5/cdh/5

Hadoop Directory Structure Analysis:

bin: the most basic management and usage scripts. These scripts are the underlying implementation of the management scripts in the sbin directory.

etc: the directory containing the configuration files, including core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, etc.

include: header files for the externally provided programming libraries, typically used by C/C++ programs to access HDFS.

lib: the dynamic and static libraries that Hadoop provides externally for programming.

libexec: the directory containing the shell configuration files for each service.

sbin: the directory containing the Hadoop management scripts, mainly the start and stop scripts for the various HDFS and YARN services.

share: the directory containing the compiled JAR packages of each Hadoop module.

Fourth, building the Hadoop 2.0 test environment (single machine)

Here is a theoretical explanation only.

1. Put the installation package into a directory and unpack it.

2. Modify the XML configuration files under etc/hadoop in the extracted directory:

hadoop-env.sh, modify the following configuration: export JAVA_HOME=/home/....

The slaves file is modified to the following configuration: YARN001

mapred-site.xml: mapreduce.framework.name=yarn

core-site.xml: fs.default.name=hdfs://yarn001:8020

yarn-site.xml: yarn.nodemanager.aux-services=mapreduce_shuffle

hdfs-site.xml: dfs.replication=1 (note: dfs.replication is an HDFS property and belongs in hdfs-site.xml, not core-site.xml)
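The four XML edits above can be sketched as a script that writes minimal configuration files. The hostname yarn001 follows the example in the text; a temporary directory stands in for etc/hadoop so the sketch is safe to run anywhere.

```shell
# Write minimal single-node Hadoop 2.0 config files.
conf=$(mktemp -d)          # stand-in for etc/hadoop

cat > "$conf/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://yarn001:8020</value>
  </property>
</configuration>
EOF

cat > "$conf/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>  <!-- single node: only one replica possible -->
  </property>
</configuration>
EOF

cat > "$conf/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>  <!-- run MapReduce on YARN -->
  </property>
</configuration>
EOF

cat > "$conf/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

grep -l configuration "$conf"/*.xml   # sanity check: four files written
```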

3. Start the services:

Format HDFS: bin/hadoop namenode -format

Start HDFS: sbin/start-dfs.sh

Start YARN: sbin/start-yarn.sh

4. Verify success:

Use jps to check whether the corresponding services have started:

NameNode, DataNode, NodeManager, ResourceManager, SecondaryNameNode

Visit YARN: http://yarn001:8088

Visit HDFS: http://yarn001:50070
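The jps check can be scripted. In this sketch the $sample variable stands in for real jps output so the example runs anywhere; on a live node you would replace it with sample=$(jps).

```shell
# Verify the expected Hadoop daemons appear in (sample) jps output.
expected="NameNode DataNode NodeManager ResourceManager SecondaryNameNode"
sample="1201 NameNode
1302 DataNode
1403 NodeManager
1504 ResourceManager
1605 SecondaryNameNode
1706 Jps"                       # live node: sample=$(jps)

for d in $expected; do
  if printf '%s\n' "$sample" | grep -qw "$d"; then
    echo "$d: running"
  else
    echo "$d: MISSING"
  fi
done
```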

Common problem:

Hadoop fails to start after the virtual machine reboots: the /tmp folder is emptied on reboot, so configure the data directories to a folder outside /tmp.

Add in hdfs-site.xml (these are HDFS properties, not core-site.xml settings): dfs.namenode.name.dir=/xxx; dfs.datanode.data.dir=/xxxx
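The fix can be sketched as an hdfs-site.xml fragment. The /data/hadoop paths below are placeholders; any directory outside /tmp that survives a reboot will do.

```shell
# Persist NameNode metadata and DataNode blocks outside /tmp so a
# reboot does not wipe them. A temp dir stands in for etc/hadoop.
conf=$(mktemp -d)
cat > "$conf/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/name</value>  <!-- placeholder path -->
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/data</value>  <!-- placeholder path -->
  </property>
</configuration>
EOF
grep -c '<property>' "$conf/hdfs-site.xml"   # → 2
```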

Fifth, building the Hadoop 2.0 production environment (multiple machines)

1. Put the installation package into a directory and unpack it.

2. Modify the XML configuration files under etc/hadoop in the extracted directory.

3. Format and start HDFS

4. Start YARN

The differences from the single-machine setup are that the configuration file contents modified in step 2 differ, and the detailed steps in step 3 differ.

HDFS HA deployment method: see subsequent articles

HDFS HA + Federation deployment method: see subsequent articles

YARN deployment method: see subsequent articles
