Hadoop Installation and Configuration Manual
I. Preparation
Hadoop runtime environment:
- SSH service running properly
- JDK
If either of these is missing, install it before continuing.
II. Basics (single-node Hadoop)
- Hadoop download
Hadoop download: http://hadoop.apache.org/releases.html#Download
This article is based on Hadoop 1.0.4, download: http://labs.mop.com/apache-mirror/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
Extract the downloaded tarball to a suitable location, such as /Users/yinxiu/dev/hadoop-1.0.4 (the Hadoop installation directory used throughout this article).
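For example, assuming the mirror URL above is still reachable, the download and extraction steps from the command line are:
$ cd /Users/yinxiu/dev
$ curl -O http://labs.mop.com/apache-mirror/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
$ tar -xzf hadoop-1.0.4.tar.gz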
- Environment variables (hadoop-env.sh)
These variables are set in /Users/yinxiu/dev/hadoop-1.0.4/conf/hadoop-env.sh.
2.1 Required variable: JAVA_HOME
export JAVA_HOME=<actual JDK path>
For example:
export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
2.2 Optional variable: HADOOP_HOME
By default, HADOOP_HOME is the parent directory of the bin directory; in this article it is /Users/yinxiu/dev/hadoop-1.0.4.
export HADOOP_HOME=/Users/yinxiu/dev/hadoop-1.0.4
Note: in our tests, after HADOOP_HOME was configured, starting Hadoop printed the warning: $HADOOP_HOME is deprecated.
This warning means the variable is defined redundantly: Hadoop 1.x derives it internally, and setting it yourself is deprecated.
This warning comes from HADOOPINSTALL/bin/hadoop-config.sh:
if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then
  echo "Warning: \$HADOOP_HOME is deprecated." 1>&2
  echo 1>&2
fi
export HADOOP_HOME=${HADOOP_PREFIX}
Workaround: remove the HADOOP_HOME setting, or add export HADOOP_HOME_WARN_SUPPRESS=TRUE to hadoop-env.sh.
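Putting it together, the relevant lines of conf/hadoop-env.sh in this article's setup would look roughly like this (the third line is only needed if you keep the HADOOP_HOME export and want to suppress the warning):
export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
export HADOOP_HOME=/Users/yinxiu/dev/hadoop-1.0.4
export HADOOP_HOME_WARN_SUPPRESS=TRUE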
- Configuration files
There are three main configuration files: core-site.xml, hdfs-site.xml, and mapred-site.xml.
3.1 conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://(master ip):9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>(temporary directory for the Hadoop runtime)</value>
  </property>
</configuration>
Temporary directory for the Hadoop runtime used in this article: /Users/yinxiu/dev/hadoopdata/temp
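For the single-node setup, where the NameNode runs locally, a filled-in core-site.xml would look like this (localhost is an assumption; substitute your master's address):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/yinxiu/dev/hadoopdata/temp</value>
  </property>
</configuration>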
3.2 conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>(DFS data storage directory)</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>(DFS NameNode storage directory)</value>
  </property>
</configuration>
DFS data storage directory: /Users/yinxiu/dev/hadoopdata/data
DFS NameNode storage directory: /Users/yinxiu/dev/hadoopdata/temp/dfs/name
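With this article's paths substituted, hdfs-site.xml reads:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/Users/yinxiu/dev/hadoopdata/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/Users/yinxiu/dev/hadoopdata/temp/dfs/name</value>
  </property>
</configuration>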
3.3 conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>(master ip):9001</value>
  </property>
</configuration>
You can also configure the maximum number of map tasks and reduce tasks that run simultaneously on a single node:
mapred.tasktracker.map.tasks.maximum = 8
mapred.tasktracker.reduce.tasks.maximum = 6
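For example, a complete mapred-site.xml for the single-node setup, including the optional task limits (localhost again assumed as the master):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
  </property>
</configuration>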
- SSH configuration (so that you can log in via ssh without a password, i.e. using key-based authentication)
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
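If the keys are set up correctly, the following should log you in without a password prompt:
$ ssh localhost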
Note: "connect to host localhost port 22: Connection refused"
Make sure the ssh service is running before starting Hadoop. This error can have several causes; for example, remote login may not be enabled.
To enable remote login on a Mac, see: http://www.bluishcoder.co.nz/articles/mac-ssh.html
- Start
Go to the HADOOPINSTALL directory
5.1 Format the NameNode
Run bin/hadoop namenode -format
5.2 Start all daemons
bin/start-all.sh
5.3 Stop all daemons
bin/stop-all.sh
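After start-all.sh completes, you can confirm that all five daemons are running with jps (shipped with the JDK). The expected output looks roughly like this; the process IDs shown are illustrative:
$ jps
12150 NameNode
12261 DataNode
12370 SecondaryNameNode
12452 JobTracker
12561 TaskTracker
12620 Jps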
- Verification Test
After startup succeeds, you can open the following addresses in a browser:
6.1 JobTracker
http://(master ip):50030
6.2 NameNode
http://(master ip):50070
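You can also check HDFS from the command line; for example, dfsadmin -report prints the configured capacity and the list of live DataNodes:
$ bin/hadoop dfsadmin -report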
III. Advanced (multi-node Hadoop)
Take five machines as an example.
Node-1 NameNode
Node-2 DataNode
Node-3 DataNode
Node-4 DataNode
Node-5 DataNode
- Cluster SSH setup (so the NameNode can log in to each DataNode via ssh without a password)
Generate a key pair on the machine that will act as the NameNode:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Note the exact ssh-keygen and cat invocations above. If you are prompted for a passphrase for the generated key, press Enter to leave it empty.
Copy the content of id_rsa.pub into the .ssh/authorized_keys file on every machine in the cluster (including the local machine); if an authorized_keys file already exists, append the content of id_rsa.pub to it.
Use cp for local copies and scp for remote copies.
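A sketch of that distribution step, assuming the Node-2 through Node-5 host names above resolve and that you still have password access at this point (you will be prompted once per host):
$ for host in Node-2 Node-3 Node-4 Node-5; do
>   scp ~/.ssh/id_rsa.pub $host:~/master_key.pub
>   ssh $host 'mkdir -p ~/.ssh && cat ~/master_key.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
> done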
Once ssh is configured, the first connection to a host displays a message like:
The authenticity of host [servername-2] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?
OpenSSH is telling you that it does not recognize this host. Since this is the first time you log in to it, there is nothing to worry about: type "yes", and the host's identification will be added to the ~/.ssh/known_hosts file. The prompt will not appear again on subsequent connections.
Note: an "Authentication refused: bad ownership or modes for directory /root" error is usually caused by incorrect permissions or ownership on the home or .ssh directory; see the following references:
http://recursive-design.com/blog/2010/09/14/ssh-authentication-refused/
http://bbs.csdn.net/topics/380198627
- Host Configuration
The per-host configuration is essentially the same as in the basics section; note that (master ip) now refers to Node-1.
- Masters/Slaves file configuration
Add the master's host name to the HADOOPINSTALL/conf/masters file on the NameNode node. In this example, the masters file contains:
Node-1
Add the host names of the DataNode nodes to HADOOPINSTALL/conf/slaves, one host name per line. The content is:
Node-2
Node-3
Node-4
Node-5
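These files can also be created in one step on Node-1, for example:
$ echo Node-1 > conf/masters
$ printf 'Node-2\nNode-3\nNode-4\nNode-5\n' > conf/slaves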
- Deploy a Hadoop Cluster
As mentioned above, Hadoop's environment variables and configuration files are all maintained on the master host, Node-1. Distribute the configured Hadoop installation to the same location on every slave, making sure the directory structure is identical on all nodes.
Use scp for the distribution, as sketched below.
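A minimal sketch, assuming the /Users/yinxiu/dev directory exists on every node and the slaves are reachable by the host names above:
$ for host in Node-2 Node-3 Node-4 Node-5; do
>   scp -r /Users/yinxiu/dev/hadoop-1.0.4 $host:/Users/yinxiu/dev/
> done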
- Start
After the configuration is complete, format the NameNode on Node-1:
bin/hadoop namenode -format
Startup is the same as in the basics section: bring the cluster up and down with start-all.sh and stop-all.sh. Remember to format the NameNode before the first start.
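As in the single-node case, jps gives a quick sanity check that each daemon landed on the right machine. With the default scripts and the masters/slaves files above, you would expect roughly the following (process IDs are illustrative):
On Node-1:
$ jps
10021 NameNode
10135 SecondaryNameNode
10248 JobTracker
On each of Node-2 through Node-5:
$ jps
20011 DataNode
20122 TaskTracker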
- Test and verification
Further reading:
http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html