Hadoop Installation and Configuration Manual
I. Preparation
Hadoop runtime environment:
- SSH service running properly
- JDK
If either of these is missing, install it before continuing.
II. Basics (single-node Hadoop)
- Hadoop download
Hadoop download: http://hadoop.apache.org/releases.html#Download
This article is based on Hadoop 1.0.4, download: http://labs.mop.com/apache-mirror/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
Extract the downloaded tarball to a suitable location, such as /Users/yinxiu/dev/hadoop-1.0.4 (the Hadoop installation directory used throughout this article).
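For example, assuming the mirror URL above is still reachable, the download and extraction steps from the command line are:
$ cd /Users/yinxiu/dev
$ curl -O http://labs.mop.com/apache-mirror/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
$ tar -xzf hadoop-1.0.4.tar.gz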
- Environment variables (hadoop-env.sh)
These variables are set in /Users/yinxiu/dev/hadoop-1.0.4/conf/hadoop-env.sh.
2.1 Required variable: JAVA_HOME
export JAVA_HOME=<actual JDK path>
For example:
export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
2.2 Optional variable: HADOOP_HOME
By default, HADOOP_HOME is the parent directory of the bin directory; in this article it is /Users/yinxiu/dev/hadoop-1.0.4.
export HADOOP_HOME=/Users/yinxiu/dev/hadoop-1.0.4
Note: in our tests, after HADOOP_HOME was configured, starting Hadoop printed the warning: $HADOOP_HOME is deprecated.
This warning means the variable is defined redundantly: Hadoop 1.x derives it internally, and setting it yourself is deprecated.
This warning comes from HADOOPINSTALL/bin/hadoop-config.sh:
if [ "$HADOOP_HOME_WARN_SUPPRESS" = "" ] && [ "$HADOOP_HOME" != "" ]; then
  echo "Warning: \$HADOOP_HOME is deprecated." 1>&2
  echo 1>&2
fi
export HADOOP_HOME=${HADOOP_PREFIX}
Workaround: remove the HADOOP_HOME setting, or add export HADOOP_HOME_WARN_SUPPRESS=TRUE to hadoop-env.sh.
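Putting it together, the relevant lines of conf/hadoop-env.sh in this article's setup would look roughly like this (the third line is only needed if you keep the HADOOP_HOME export and want to suppress the warning):
export JAVA_HOME=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
export HADOOP_HOME=/Users/yinxiu/dev/hadoop-1.0.4
export HADOOP_HOME_WARN_SUPPRESS=TRUE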
- Configuration files
There are three main configuration files: core-site.xml, hdfs-site.xml, and mapred-site.xml.
3.1 conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://(master ip):9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>(temporary directory for the Hadoop runtime)</value>
  </property>
</configuration>
Temporary directory for the Hadoop runtime used in this article: /Users/yinxiu/dev/hadoopdata/temp
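For the single-node setup, where the NameNode runs locally, a filled-in core-site.xml would look like this (localhost is an assumption; substitute your master's address):
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/yinxiu/dev/hadoopdata/temp</value>
  </property>
</configuration>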
3.2 conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>(DFS data storage directory)</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>(DFS NameNode storage directory)</value>
  </property>
</configuration>
DFS data storage directory: /Users/yinxiu/dev/hadoopdata/data
DFS NameNode storage directory: /Users/yinxiu/dev/hadoopdata/temp/dfs/name
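With this article's paths substituted, hdfs-site.xml reads:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/Users/yinxiu/dev/hadoopdata/data</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/Users/yinxiu/dev/hadoopdata/temp/dfs/name</value>
  </property>
</configuration>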
3.3 conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>(master ip):9001</value>
  </property>
</configuration>
You can also configure the maximum number of map tasks and reduce tasks that run simultaneously on a single node:
mapred.tasktracker.map.tasks.maximum = 8
mapred.tasktracker.reduce.tasks.maximum = 6
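For example, a complete mapred-site.xml for the single-node setup, including the optional task limits (localhost again assumed as the master):
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>6</value>
  </property>
</configuration>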
- SSH configuration (so that you can log in via ssh without a password, i.e. using key-based authentication)
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
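If the keys are set up correctly, the following should log you in without a password prompt:
$ ssh localhost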
Note: "connect to host localhost port 22: Connection refused"
Make sure the ssh service is running before starting Hadoop. This error can have several causes; for example, remote login may not be enabled.
To enable remote login on a Mac, see: http://www.bluishcoder.co.nz/articles/mac-ssh.html
- Start
Go to the HADOOPINSTALL directory
5.1 Format the NameNode
Run bin/hadoop namenode -format
5.2 Start all daemons
bin/start-all.sh
5.3 Stop all daemons
bin/stop-all.sh
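After start-all.sh completes, you can confirm that all five daemons are running with jps (shipped with the JDK). The expected output looks roughly like this; the process IDs shown are illustrative:
$ jps
12150 NameNode
12261 DataNode
12370 SecondaryNameNode
12452 JobTracker
12561 TaskTracker
12620 Jps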
- Verification Test
After startup succeeds, you can open the following addresses in a browser:
6.1 JobTracker
http://(master ip):50030
6.2 NameNode
http://(master ip):50070
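You can also check HDFS from the command line; for example, dfsadmin -report prints the configured capacity and the list of live DataNodes:
$ bin/hadoop dfsadmin -report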
III. Advanced (multi-node Hadoop)
Take five machines as an example.
Node-1 NameNode
Node-2 DataNode
Node-3 DataNode
Node-4 DataNode
Node-5 DataNode
- Cluster SSH setup (so the NameNode can log in to each DataNode via ssh without a password)
Generate a key pair on the machine that will act as the NameNode:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Note the exact ssh-keygen and cat invocations above. If you are prompted for a passphrase for the generated key, press Enter to leave it empty.
Copy the content of id_rsa.pub into the .ssh/authorized_keys file on every machine in the cluster (including the local machine); if an authorized_keys file already exists, append the content of id_rsa.pub to it.
Use cp for local copies and scp for remote copies.
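A sketch of that distribution step, assuming the Node-2 through Node-5 host names above resolve and that you still have password access at this point (you will be prompted once per host):
$ for host in Node-2 Node-3 Node-4 Node-5; do
>   scp ~/.ssh/id_rsa.pub $host:~/master_key.pub
>   ssh $host 'mkdir -p ~/.ssh && cat ~/master_key.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
> done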
Once ssh is configured, the first connection to a host displays a message like:
The authenticity of host [servername-2] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?
OpenSSH is telling you that it does not recognize this host. Since this is the first time you log in to it, there is nothing to worry about: type "yes", and the host's identification will be added to the ~/.ssh/known_hosts file. The prompt will not appear again on subsequent connections.
Note: an "Authentication refused: bad ownership or modes for directory /root" error is usually caused by incorrect permissions or ownership on the home or .ssh directory; see the following references:
http://recursive-design.com/blog/2010/09/14/ssh-authentication-refused/
http://bbs.csdn.net/topics/380198627
- Host Configuration
The per-host configuration is essentially the same as in the basics section; note that (master ip) now refers to Node-1.
- Masters/Slaves file configuration
Add the master's host name to the HADOOPINSTALL/conf/masters file on the NameNode node. In this example, the masters file contains:
Node-1
Add the host names of the DataNode nodes to HADOOPINSTALL/conf/slaves, one host name per line. The content is:
Node-2
Node-3
Node-4
Node-5
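These files can also be created in one step on Node-1, for example:
$ echo Node-1 > conf/masters
$ printf 'Node-2\nNode-3\nNode-4\nNode-5\n' > conf/slaves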
- Deploy a Hadoop Cluster
As mentioned above, Hadoop's environment variables and configuration files are all maintained on the master host, Node-1. Distribute the configured Hadoop installation to the same location on every slave, making sure the directory structure is identical on all nodes.
Use scp for the distribution, as sketched below.
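A minimal sketch, assuming the /Users/yinxiu/dev directory exists on every node and the slaves are reachable by the host names above:
$ for host in Node-2 Node-3 Node-4 Node-5; do
>   scp -r /Users/yinxiu/dev/hadoop-1.0.4 $host:/Users/yinxiu/dev/
> done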
- Start
After the configuration is complete, format the NameNode on Node-1:
bin/hadoop namenode -format
Startup is the same as in the basics section: bring the cluster up and down with start-all.sh and stop-all.sh. Remember to format the NameNode before the first start.
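As in the single-node case, jps gives a quick sanity check that each daemon landed on the right machine. With the default scripts and the masters/slaves files above, you would expect roughly the following (process IDs are illustrative):
On Node-1:
$ jps
10021 NameNode
10135 SecondaryNameNode
10248 JobTracker
On each of Node-2 through Node-5:
$ jps
20011 DataNode
20122 TaskTracker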
- Test and verification
Further reading:
http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html