Hadoop distributed file system architecture and deployment


Hadoop is an open-source distributed computing framework from the Apache Software Foundation, and it is used at many of the largest web sites, such as Amazon, Facebook and Yahoo. My own recent use case is log analysis for a service integration platform: such a platform produces a large volume of logs, which fits the typical scenarios for distributed computing well (log analysis and indexing are two major application scenarios).

In this article we build a Hadoop 2.2.0 cluster hands-on, using the current mainstream server operating system, CentOS 5.8.

First, the environment

System version: CentOS 5.8 x86_64

Java version: JDK 1.7.0_25

Hadoop version: hadoop-2.2.0

192.168.149.128  namenode (namenode, secondary namenode and ResourceManager roles)

192.168.149.129  datanode1 (datanode and nodemanager roles)

192.168.149.130  datanode2 (datanode and nodemanager roles)

Second, system preparation

1. The latest Hadoop 2.2 release can be downloaded directly from the official Apache website. The official binaries are currently built for 32-bit Linux, so if you need to deploy on a 64-bit system you must download the separate source (src) package and compile it yourself. (For a real production environment, please use a 64-bit build of Hadoop to avoid many problems; here I experiment with the 32-bit version.)

Hadoop download address:

http://apache.claz.org/hadoop/common/hadoop-2.2.0/

Java download address:

http://www.Oracle.com/technetwork/java/javase/downloads/index.html
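
For example, the Hadoop package can be fetched straight onto the namenode with wget (a small sketch; the JDK has to be downloaded through Oracle's site after accepting the license, so only the Hadoop download is shown here):

wget http://apache.claz.org/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz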

2. Here we use three CentOS servers to build the Hadoop cluster, with the roles assigned as indicated above.

Step one: set the corresponding host names in /etc/hosts on the three servers as follows (a real environment can use internal DNS resolution instead):

[root@node1 hadoop]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
192.168.149.128 node1
192.168.149.129 node2
192.168.149.130 node3

(Note: the hosts file must be configured on all three servers, namenode and datanodes alike.)
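
To confirm that name resolution works before continuing, a quick optional check (a sketch, run from node1 and assuming the hosts file above is in place on each node) is to ping each host name once:

for h in node1 node2 node3; do ping -c 1 $h; done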

Step two: enable password-less SSH login from the namenode to the datanode servers, which requires the following configuration:

Run ssh-keygen on the namenode (192.168.149.128) and press Enter at each prompt to accept the defaults.

Then copy the public key /root/.ssh/id_rsa.pub to the datanode servers as follows:

ssh-copy-id -i .ssh/id_rsa.pub root@192.168.149.129

ssh-copy-id -i .ssh/id_rsa.pub root@192.168.149.130
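
As a quick sanity check (not part of the original steps, just a minimal sketch), you can verify the password-less login by running a remote command from the namenode; it should print the remote host name without asking for a password:

ssh root@192.168.149.129 hostname

ssh root@192.168.149.130 hostname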

Third, Java installation and configuration

tar -xvzf jdk-7u25-linux-x64.tar.gz && mkdir -p /usr/java/ ; mv jdk1.7.0_25 /usr/java/

Configure the Java environment variables by adding the following lines at the end of /etc/profile:

export JAVA_HOME=/usr/java/jdk1.7.0_25/
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:./

Save and exit, then run source /etc/profile for the changes to take effect. If java -version prints output like the following, the JDK is installed successfully.

[root@node1 ~]# java -version
java version "1.7.0_25"
Java(TM) SE Runtime Environment (build 1.7.0_25-b15)
Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

(Note: the Java JDK must be installed on all three servers, namenode and datanodes alike.)
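
Rather than repeating the JDK steps by hand on each machine, one option (a sketch that assumes the password-less SSH set up above) is to push the already-installed JDK from node1 to the two datanodes, and then append the same three export lines to /etc/profile on each of them:

for i in 129 130; do ssh root@192.168.149.$i "mkdir -p /usr/java"; scp -r /usr/java/jdk1.7.0_25 root@192.168.149.$i:/usr/java/; done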

Fourth, Hadoop installation

The official Hadoop 2.2.0 download can be used directly after extraction; no compilation or installation step is required. Proceed as follows:

Step one, extract the archive:

tar -xzvf hadoop-2.2.0.tar.gz && mv hadoop-2.2.0 /data/hadoop/

(Note: install Hadoop on the namenode server first. The datanodes do not need a separate installation; after the configuration is finished, the whole installation will be copied to the datanodes in one step.)

Step two, configure the environment variables:

Add the following lines at the end of /etc/profile and run source /etc/profile for them to take effect.

export HADOOP_HOME=/data/hadoop/
export PATH=$PATH:$HADOOP_HOME/bin/
export JAVA_LIBRARY_PATH=/data/hadoop/lib/native/

(Note: the Hadoop-related variables must be configured on all three servers, namenode and datanodes alike.)
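
With the variables loaded, an optional quick check (a sketch, assuming the extraction and PATH settings above) is to ask Hadoop for its version, which should report 2.2.0:

hadoop version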

Fifth, configure Hadoop

On the namenode, we need to modify the following configuration files:

1. Modify /data/hadoop/etc/hadoop/core-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.149.128:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

2. Modify /data/hadoop/etc/hadoop/mapred-site.xml as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.149.128:9001</value>
  </property>
</configuration>
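
Note that the Hadoop 2.2.0 distribution may not ship a mapred-site.xml by default; if the file is missing, it can be created from the bundled template before editing (a sketch):

cp /data/hadoop/etc/hadoop/mapred-site.xml.template /data/hadoop/etc/hadoop/mapred-site.xml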

3. Modify /data/hadoop/etc/hadoop/hdfs-site.xml as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/data_name1,/data/hadoop/data_name2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data/hadoop/data_1,/data/hadoop/data_2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
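
The name and data directories referenced above are not created by any earlier step. Hadoop can usually create them itself, but creating them explicitly on the namenode avoids permission surprises, and they will be copied to the datanodes together with the rest of /data/hadoop later (a sketch):

mkdir -p /data/hadoop/data_name1 /data/hadoop/data_name2 /data/hadoop/data_1 /data/hadoop/data_2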

4. Add the JAVA_HOME variable at the end of the /data/hadoop/etc/hadoop/hadoop-env.sh file:

echo "export JAVA_HOME = / usr / java / jdk1.7.0_25 /" >> /data/hadoop/etc/hadoop/hadoop-env.sh

5. Modify the /data/hadoop/etc/hadoop/masters file as follows:

192.168.149.128

6. Modify the /data/hadoop/etc/hadoop/slaves file as follows:

192.168.149.129

192.168.149.130

That completes the configuration. The specific meaning of each setting is not explained in detail here; if anything is unclear while you build, consult the relevant official documentation.

With the namenode essentially set up, we next deploy the datanodes. Deploying the datanodes is relatively simple: just run the following command.

for i in `seq 129 130`; do scp -r /data/hadoop/ root@192.168.149.$i:/data/; done

At this point the basic cluster build is complete; the next step is to start the Hadoop cluster.
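
Starting the cluster itself is not shown above; as a rough sketch of the usual Hadoop 2.2.0 sequence (using the stock scripts from the distribution and the paths used in this article), you would format HDFS once on the namenode and then start the HDFS and YARN daemons from there:

/data/hadoop/bin/hdfs namenode -format

/data/hadoop/sbin/start-dfs.sh

/data/hadoop/sbin/start-yarn.sh

Running jps on each node then lists the daemons that are up, and the NameNode web UI is normally reachable on port 50070 (the ResourceManager UI on port 8088).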
