Hadoop Installation and Configuration Manual


I. Preparation

Hadoop's runtime environment requires:

  1. A working SSH service
  2. A JDK

If these are not yet installed, install them before continuing.

 

II. Basics (single-node Hadoop)

  1. Hadoop download

Hadoop download: http://hadoop.apache.org/releases.html#Download

This article is based on Hadoop 1.0.4, downloaded from: http://labs.mop.com/apache-mirror/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz

Extract the downloaded tarball to a suitable location, such as /Users/yinxiu/dev/hadoop-1.0.4 (the Hadoop installation directory used throughout this article).
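For example (a sketch; adjust the path to wherever you saved the tarball):

tar -xzf hadoop-1.0.4.tar.gz -C /Users/yinxiu/dev
cd /Users/yinxiu/dev/hadoop-1.0.4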

  2. Environment variables (hadoop-env.sh)

Directory: /Users/yinxiu/dev/hadoop-1.0.4/conf

2.1 Required variable: JAVA_HOME

export JAVA_HOME=<actual JDK path>

For example:

export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home

2.2 Optional variable: HADOOP_HOME

By default, HADOOP_HOME is the parent directory of the bin directory; in this article it is /Users/yinxiu/dev/hadoop-1.0.4.

export HADOOP_HOME=/Users/yinxiu/dev/hadoop-1.0.4

Note: in testing, after HADOOP_HOME was configured, starting Hadoop produced the prompt: $HADOOP_HOME is deprecated.

This warning indicates that the variable is being defined twice.

The warning comes from HADOOPINSTALL/bin/hadoop-config.sh:

if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then
  echo "Warning: \$HADOOP_HOME is deprecated." 1>&2
  echo 1>&2
fi

export HADOOP_HOME=${HADOOP_PREFIX}

Workaround: remove the HADOOP_HOME setting, or add export HADOOP_HOME_WARN_SUPPRESS=TRUE to hadoop-env.sh.
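Putting section 2 together, the relevant lines of conf/hadoop-env.sh would look like this (a sketch using this article's paths; the suppress line is only needed if HADOOP_HOME stays set):

export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
export HADOOP_HOME_WARN_SUPPRESS=TRUE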

  3. Configuration files

There are three main configuration files: core-site.xml, hdfs-site.xml, and mapred-site.xml.

3.1 conf/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://(master ip):9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>(temporary directory for Hadoop)</value>
  </property>
</configuration>

The temporary directory used in this article is /Users/yinxiu/dev/hadoopdata/temp.
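As a filled-in single-node sketch (assuming the master is localhost and using this article's temporary directory), core-site.xml would read:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/yinxiu/dev/hadoopdata/temp</value>
  </property>
</configuration>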

3.2 conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>(DFS data storage directory)</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>(DFS NameNode storage directory)</value>
  </property>
</configuration>

DFS data storage directory: /Users/yinxiu/dev/hadoopdata/data

DFS NameNode storage directory: /Users/yinxiu/dev/hadoopdata/temp/dfs/name

3.3 conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>(master ip):9001</value>
  </property>
</configuration>

You can also configure the maximum number of map tasks and reduce tasks that run simultaneously on a single node:

mapred.tasktracker.map.tasks.maximum = 8

mapred.tasktracker.reduce.tasks.maximum = 6
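In mapred-site.xml these are ordinary property entries; a sketch (the values above are examples, tune them to your hardware):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>
</property>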

  4. SSH configuration (so that you can log on via ssh without a password, using key-based authentication)

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Note: "connect to host localhost port 22: Connection refused"

Make sure sshd is running before you start Hadoop. This error can have several causes; for example, remote login may not be enabled.

Enabling remote login on macOS: http://www.bluishcoder.co.nz/articles/mac-ssh.html
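Once sshd is accepting connections, you can verify that key-based login works (a quick check, not part of the original steps); the command should run without prompting for a password:

ssh localhost date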

  5. Start

Go to the HADOOPINSTALL directory

5.1 Format the NameNode

Run bin/hadoop namenode -format

5.2 Start all daemons

bin/start-all.sh

5.3 Stop all daemons

bin/stop-all.sh
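To confirm which daemons came up, you can list the Java processes with the JDK's jps tool (a quick check, not part of the original steps):

jps

On a healthy single node the output should include NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker.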

  6. Verification test

After startup succeeds, you can view the following addresses in a browser:

6.1 JobTracker

http://(master ip):50030

 

6.2 NameNode

http://(master ip):50070

 

III. Advanced (multi-node Hadoop)

Take five machines as an example.

Node-1 NameNode

Node-2 DataNode

Node-3 DataNode

Node-4 DataNode

Node-5 DataNode

 

  1. Cluster SSH setup (the NameNode must be able to log on to each DataNode via ssh without a password)

Generate a key pair on the machine that will act as the NameNode:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Mind the exact syntax of the ssh-keygen and cat commands.

If you are prompted for a passphrase for the generated key, press Enter to leave it empty.

Copy the contents of id_rsa.pub into the ~/.ssh/authorized_keys file of every machine (including the local machine); if an authorized_keys file already exists, append the contents of id_rsa.pub to it.

Use cp for local copies and scp for remote copies, for example as sketched below.
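A sketch of distributing the key to one slave, using this article's host names (repeat for Node-3 through Node-5):

$ scp ~/.ssh/id_rsa.pub Node-2:/tmp/id_rsa.pub
$ ssh Node-2 'cat /tmp/id_rsa.pub >> ~/.ssh/authorized_keys'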

Once ssh is configured, the first time you connect to a host a message like the following is displayed:

The authenticity of host [servername-2] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?

OpenSSH is telling you that it does not recognize this host. There is no need to worry: this is simply the first time you have logged on to it. Type "yes", and the host's identification is added to the ~/.ssh/known_hosts file; the prompt will not appear the next time you access the host.

Note: "Authentication refused: bad ownership or modes for directory /root" errors are usually caused by permission or ownership problems on the home or .ssh directories. See:

http://recursive-design.com/blog/2010/09/14/ssh-authentication-refused/

http://bbs.csdn.net/topics/380198627
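A common fix (a sketch, not from the original article) is to tighten the permissions sshd checks before it will trust the key files:

$ chmod go-w ~
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys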

  2. Host configuration

The configuration on each host is basically the same as in the Basics section above.

  3. masters/slaves file configuration


Add the NameNode's host name to the HADOOPINSTALL/conf/masters file. In this example, the masters file contains:

Node-1

 

Add the host name of each DataNode to HADOOPINSTALL/conf/slaves, one host name per line (both files can also be written from the shell, as sketched after this list). The content is:

Node-2

Node-3

Node-4

Node-5
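A sketch of writing both files from the shell on Node-1 (using this article's host names):

$ echo 'Node-1' > conf/masters
$ printf 'Node-2\nNode-3\nNode-4\nNode-5\n' > conf/slaves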

  4. Deploy the Hadoop cluster

As mentioned above, Hadoop's environment variables and configuration files are all maintained on the master host Node-1. Distribute the configured Hadoop directory to the same location on each slave, making sure the directory structure stays consistent.

Use scp for the distribution.
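For example, a sketch run from Node-1 with this article's paths and host names:

$ for host in Node-2 Node-3 Node-4 Node-5; do
    scp -r /Users/yinxiu/dev/hadoop-1.0.4 "$host":/Users/yinxiu/dev/
  done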

  5. Start

After the configuration is complete, format the NameNode:

bin/hadoop namenode -format

Starting is the same as in the Basics section: start and stop everything with start-all.sh and stop-all.sh, and remember to format the NameNode before the first start.

  6. Test and verification

Verification is the same as in the Basics section: check the JobTracker and NameNode web interfaces on the master, for example with the end-to-end check below.
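A quick end-to-end check (a sketch, not from the original article) is to run one of the example jobs bundled with the release, such as the pi estimator; if it finishes and prints an estimate of pi, both HDFS and MapReduce are working:

bin/hadoop jar hadoop-examples-1.0.4.jar pi 10 100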


Further reading:

http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html

 

 
