Hadoop deployment
This article describes how to deploy and configure hadoop on RedHat Linux ES5.
Deployment Environment list:
Redhat Linux ES5: 10.68.219.42 linuxidc-42; 10.68.199.165 linuxidc-165
JDK 1.6.20
Hadoop 0.20.203
1. Hardware environment
First, make sure that the host name and IP address of each machine can be resolved correctly. A simple test is to ping each machine by host name; for example, ping linuxidc-42 from linuxidc-165 and confirm that the ping succeeds.
If the names cannot be resolved, modify the /etc/hosts file. If a machine is used as the Namenode, you need to add the IP addresses of all the Datanode machines in the cluster and their corresponding host names. If a machine is used as a Datanode, you only need to add the IP address of the Namenode machine and its corresponding host name to that machine's hosts file.
Take this installation as an example:
The hosts file on the linuxidc-42 is as follows:
# Do not remove the following line, or various programs
# That require network functionality will fail.
::1 localhost6.localdomain6 localhost6
10.68.219.42 linuxidc-42
10.68.199.165 linuxidc-165
(Note: delete or comment out the entry "127.0.0.1 localhost", as it interferes with normal Hadoop operation.)
The hosts file on the linuxidc-165 is as follows:
# Do not remove the following line, or various programs
# That require network functionality will fail.
10.68.219.42 linuxidc-42
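As a quick check (using the hostnames of this installation), verify from each machine that the other one resolves and responds:
[hadoop@linuxidc-165 ~]$ ping -c 3 linuxidc-42
[hadoop@linuxidc-42 ~]$ ping -c 3 linuxidc-165
If both names resolve to the IP addresses listed in the hosts files, name resolution is set up correctly.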
For Hadoop, in HDFS, nodes are classified as the Namenode and Datanodes: there is only one Namenode (a secondary Namenode has since been added), while there can be many Datanodes. In MapReduce, nodes are classified as the Jobtracker and Tasktrackers: there is only one Jobtracker, while there can be many Tasktrackers.
In this deployment, the Namenode and Jobtracker run on linuxidc-42, and linuxidc-165 serves as a Datanode and Tasktracker.
Of course, linuxidc-42 itself also acts as a Datanode and Tasktracker.
[Users and directories]
Create the user hadoop (password hadoop) on both linuxidc-42 and linuxidc-165;
Create the Hadoop installation directory and change its owner to the newly created hadoop user;
In this installation, Hadoop is installed to /usr/install/hadoop; ($ chown -R hadoop /usr/install/hadoop)
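For reference, a minimal sketch of these user and directory steps as run by root (the useradd/passwd invocations are assumptions; adjust them to your environment):
# useradd hadoop
# passwd hadoop
# mkdir -p /usr/install/hadoop
# chown -R hadoop /usr/install/hadoop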
2. SSH settings
When Hadoop starts, the Namenode starts and stops the various daemons on each node over SSH (Secure Shell). Commands executed between nodes must therefore not require a password, so SSH needs to be configured to use passwordless public-key authentication.
For the SSH connections here, linuxidc-42 is the SSH client and linuxidc-165 is the SSH server, so make sure the sshd service is running on linuxidc-165. Simply put, a key pair (a private key and a public key) is generated on linuxidc-42 and the public key is copied to linuxidc-165. When linuxidc-42 initiates an SSH connection to linuxidc-165, linuxidc-165 generates a random number, encrypts it with linuxidc-42's public key, and sends it to linuxidc-42. linuxidc-42 decrypts it with its private key and sends the decrypted number back; once linuxidc-165 confirms that the number is correct, it allows linuxidc-42 to connect. This completes one public-key authentication exchange.
First, ensure that the SSH server is installed on each machine and starts properly.
[Configuration]
Configure linuxidc-42
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa (generate the key pair: private key and public key)
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(This appends the public key to the end of authorized_keys on the local machine. If there is no authorized_keys file yet, you can simply cp the .pub file instead.)
$ chmod 644 authorized_keys
(This step is critical. authorized_keys must be writable only by its owner; nobody else may have write permission, otherwise SSH will not work.)
Configure linuxidc-165
[hadoop@linuxidc-42:.ssh]$ scp authorized_keys linuxidc-165:/home/hadoop/.ssh/
scp here is a remote copy over SSH; at this point you still need to enter the password of the remote host, that is, the password of the hadoop account on linuxidc-165. Of course, you can also copy the authorized_keys file to the other machine by other means.
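For example (an alternative, assuming the ssh-copy-id utility is available on linuxidc-42), the public key can be installed on the remote machine in one step:
[hadoop@linuxidc-42:~]$ ssh-copy-id -i ~/.ssh/id_dsa.pub hadoop@linuxidc-165
ssh-copy-id appends the key to the remote authorized_keys file for you; you will be prompted for the hadoop password once.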
[hadoop@linuxidc-165:.ssh]$ chmod 644 authorized_keys
[Test]
Now that the SSH configuration on each machine is complete, you can test it, for example by initiating an SSH connection from linuxidc-42 to linuxidc-165.
[hadoop@linuxidc-42:~]$ ssh linuxidc-165
If SSH is configured correctly, a message like the following is displayed:
The authenticity of host [linuxidc-165] can't be established.
Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.
Are you sure you want to continue connecting (yes/no)?
OpenSSH is telling you that it does not know this host, but you do not need to worry: this is simply the first time you have logged on to it. Type "yes", and the host's identification will be added to the ~/.ssh/known_hosts file. This prompt will not appear the next time you access this host.
Then you will find that you can establish an ssh connection without entering the password. Congratulations, the configuration is successful.
But don't forget to also test the local connection: ssh linuxidc-42.
3. Configure Hadoop
First, decompress and install Hadoop to /usr/install/hadoop.
A. Configure conf/hadoop-env.sh
Set JAVA_HOME as the root path for java installation
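For example, in conf/hadoop-env.sh (the path below is only an assumed location for a JDK 1.6.0_20 install; use your actual Java installation root):
export JAVA_HOME=/usr/java/jdk1.6.0_20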
B. Configure conf/masters (Namenode node)
As follows:
[hadoop@linuxidc-42 ~]$ vi /usr/install/hadoop-0.20.203.0/conf/masters
linuxidc-42
C. Configure conf/slaves (DataNode node)
As follows:
[hadoop@linuxidc-42 ~]$ vi /usr/install/hadoop-0.20.203.0/conf/slaves
linuxidc-42
linuxidc-165
D. Configure conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://linuxidc-42:9000</value>
  </property>
</configuration>
E. Configure conf/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/install/datanodespace</value>
  </property>
</configuration>
F. Configure conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>linuxidc-42:9001</value>
  </property>
</configuration>
In older versions of Hadoop there was only a single configuration file, hadoop-site.xml; in newer versions it has been split into the three configuration files above.
G. Configure datanode machine, linuxidc-165
As mentioned above, the environment variables and configuration files of Hadoop are on the linuxidc-42 machine. Now we need to deploy hadoop on other machines to ensure the directory structure is consistent.
[hadoop@linuxidc-42:~]$ scp -r /usr/install/hadoop linuxidc-165:/usr/install/hadoop
So far, we can say that Hadoop has been deployed on various machines. Now let's start Hadoop.
4. Start Hadoop
[Format namenode] Before starting, we need to format the namenode. First enter the /usr/install/hadoop directory and execute the following command:
[hadoop@linuxidc-42:hadoop]$ bin/hadoop namenode -format
If the format succeeds you can continue; if it fails, check the log files under the hadoop/logs/ directory.
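For example (the log file name follows Hadoop's hadoop-<user>-<daemon>-<host>.log pattern, so the exact name on your machine may differ):
[hadoop@linuxidc-42:hadoop]$ ls logs/
[hadoop@linuxidc-42:hadoop]$ tail -n 50 logs/hadoop-hadoop-namenode-linuxidc-42.log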
[Start] Now we can officially start Hadoop. There are many startup scripts under bin/, which can be run as needed.
* start-all.sh starts all the Hadoop daemons: namenode, datanode, jobtracker, and tasktracker
* stop-all.sh stops all the Hadoop daemons
* start-mapred.sh starts the Map/Reduce daemons: jobtracker and tasktracker
* stop-mapred.sh stops the Map/Reduce daemons
* start-dfs.sh starts the HDFS daemons: namenode and datanode
* stop-dfs.sh stops the HDFS daemons
Here, we simply start all the daemons
[hadoop@linuxidc-42:hadoop]$ bin/start-all.sh
Similarly, if you want to stop hadoop
[hadoop@linuxidc-42:hadoop]$ bin/stop-all.sh
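After start-all.sh has been run (and before stopping it), a quick way to confirm that the daemons are actually up is the jps tool shipped with the JDK, run as the hadoop user on each node:
[hadoop@linuxidc-42:hadoop]$ jps
On linuxidc-42 you should see NameNode, SecondaryNameNode, JobTracker, DataNode and TaskTracker listed (each with its process ID); on linuxidc-165, DataNode and TaskTracker.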
5. Test
The NameNode and JobTracker each provide a web interface. Their addresses are:
NameNode - http://linuxidc-42:50070/
JobTracker - http://linuxidc-42:50030/
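As an additional command-line check (a simple smoke test, not required by the setup above), you can ask HDFS for a cluster report and try a basic filesystem operation:
[hadoop@linuxidc-42:hadoop]$ bin/hadoop dfsadmin -report
[hadoop@linuxidc-42:hadoop]$ bin/hadoop fs -mkdir /test
[hadoop@linuxidc-42:hadoop]$ bin/hadoop fs -ls /
The report should show two live datanodes (linuxidc-42 and linuxidc-165).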