Hadoop 0.20 installation and deployment notes on RedHat Linux ES5


Hadoop deployment

This article describes how to deploy and configure Hadoop on RedHat Linux ES5.

Deployment Environment list:

Redhat Linux ES5: 10.68.219.42 linuxidc-42; 10.68.199.165 linuxidc-165

JDK 1.6.20

Hadoop 0.20.203

1. Hardware environment

First, make sure that the host name and IP address of each machine can be correctly resolved. A simple test is to ping the machines by host name; for example, ping linuxidc-42 from linuxidc-165 and check that it responds.
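For example, a quick check from linuxidc-165 (-c limits the number of ping packets sent):

$ ping -c 3 linuxidc-42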

If the names cannot be resolved, modify the /etc/hosts file. If the machine is to be used as a Namenode, you need to add the IP addresses of all the Datanode machines in the cluster and their corresponding host names. If the machine is to be used as a Datanode, you only need to add the IP address of the Namenode machine and its corresponding host name to the local hosts file.

Take this installation as an example:

The hosts file on linuxidc-42 is as follows:

# Do not remove the following line, or various programs
# that require network functionality will fail.
::1 localhost6.localdomain6 localhost6
10.68.219.42 linuxidc-42
10.68.199.165 linuxidc-165

(Note: delete or comment out the "127.0.0.1 localhost" entry, because it interferes with normal Hadoop operation.)

 

The hosts file on linuxidc-165 is as follows:

# Do not remove the following line, or various programs
# that require network functionality will fail.
10.68.219.42 linuxidc-42

 

For Hadoop, HDFS nodes are classified into Namenode and Datanode. There is only one Namenode (a SecondaryNameNode has since been added), while there can be many Datanodes. In MapReduce, nodes are classified into Jobtracker and Tasktracker; there is only one Jobtracker, while there can be many Tasktrackers.

I deploy the Namenode and Jobtracker on linuxidc-42 and use linuxidc-165 as a Datanode and Tasktracker.

Of course, linuxidc-42 itself also acts as a Datanode and Tasktracker.

 

[Users and directories]

Create the user hadoop (password hadoop) on both linuxidc-42 and linuxidc-165;

Create the Hadoop installation directory and change its owner to the newly created hadoop user;

In this installation, I install Hadoop to /usr/install/hadoop; ($ chown -R hadoop /usr/install/hadoop)
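A minimal sketch of these preparation steps as shell commands, run as root on both machines (the user name, password, and path follow this installation; adjust them to your own environment):

# useradd hadoop                          (create the hadoop user)
# passwd hadoop                           (interactively set its password to hadoop)
# mkdir -p /usr/install/hadoop            (create the installation directory)
# chown -R hadoop /usr/install/hadoop     (make the hadoop user its owner)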

2. SSH settings

After Hadoop is started, the Namenode starts and stops the various daemons on each node through SSH (Secure Shell), so commands executed between nodes must not prompt for a password. Therefore, SSH needs to be configured to use passwordless public-key authentication.

For this SSH setup, linuxidc-42 is the SSH client and linuxidc-165 is the SSH server, so make sure the sshd service is running on linuxidc-165. Simply put, a key pair, that is a private key and a public key, needs to be generated on linuxidc-42, and the public key copied to linuxidc-165. Then, when linuxidc-42 initiates an ssh connection to linuxidc-165, linuxidc-165 generates a random number, encrypts it with linuxidc-42's public key, and sends it to linuxidc-42; linuxidc-42 decrypts it with its private key and sends the decrypted number back to linuxidc-165, and linuxidc-165 allows linuxidc-42 to connect after confirming that the decrypted number is correct. This completes one public-key authentication exchange.

First, ensure that the SSH server is installed on each machine and starts properly.

[Configuration]

Configure linuxidc-42

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa (create the key pair: -P '' gives the private key an empty passphrase)

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

(This appends the public key to the end of authorized_keys on the local machine. If there is no authorized_keys file yet, you can copy the file over directly with cp, as shown below.)
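For example (assuming ~/.ssh/authorized_keys does not exist yet):

$ cp ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys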

$ chmod 644 authorized_keys

(This step is critical. You must ensure that only the owner of authorized_keys can write to it; nobody else may have write permission, otherwise SSH public-key authentication will not work.)

Configure linuxidc-165

[hadoop@linuxidc-42:.ssh]$ scp authorized_keys linuxidc-165:/home/hadoop/.ssh/

scp here is a remote copy over ssh, so you need to enter the password of the remote host, that is, the password of the hadoop account on the linuxidc-165 machine. Of course, you can also copy the authorized_keys file to the other machine by any other method.

[hadoop@linuxidc-165:.ssh]$ chmod 644 authorized_keys

[Test]

Now that the SSH configuration on each machine is complete, you can test it, for example by initiating an ssh connection from linuxidc-42 to linuxidc-165.

[hadoop@linuxidc-42:~]$ ssh linuxidc-165

If ssh is configured correctly, a message like the following is displayed:

The authenticity of host [linuxidc-165] can't be established.

Key fingerprint is 1024 5f:a0:0b:65:d3:82:df:ab:44:62:6d:98:9c:fe:e9:52.

Are you sure you want to continue connecting (yes/no)?

OpenSSH is telling you that it does not recognize this host, but you do not need to worry, because this is the first time you are logging on to it. Type "yes"; this adds the host's identification to the ~/.ssh/known_hosts file, and the prompt will not appear again the next time you access this host.

Then you will find that you can establish an ssh connection without entering the password. Congratulations, the configuration is successful.

But don't forget to also test ssh to the local machine, linuxidc-42.
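For example, from linuxidc-42 itself:

[hadoop@linuxidc-42:~]$ ssh linuxidc-42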

3. Configure Hadoop

First, decompress and install Hadoop to /usr/install/hadoop.

A. Configure conf/hadoop-env.sh

Set JAVA_HOME to the root path of the Java installation.
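For example, the corresponding line in conf/hadoop-env.sh would look like the following (the JDK path is only an assumed example; use the actual root of your Java installation):

export JAVA_HOME=/usr/java/jdk1.6.0_20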

B. Configure conf/masters (Namenode node)

As follows:

[hadoop@linuxidc-42 ~]# vi /usr/install/hadoop-0.20.203.0/conf/masters

linuxidc-42

C. Configure conf/slaves (DataNode node)

As follows:

[hadoop@linuxidc-42 ~]# vi /usr/install/hadoop-0.20.203.0/conf/slaves

linuxidc-42
linuxidc-165

D. Configure conf/core-site.xml

 

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://linuxidc-42:9000</value>
  </property>
</configuration>

E. Configure conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/install/datanodespace</value>
  </property>
</configuration>

F. Configure conf/mapred-site.xml

 

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>linuxidc-42:9001</value>
  </property>
</configuration>

In older versions of Hadoop there was only one configuration file, hadoop-site.xml; in newer versions it has been split into these three configuration files.

G. Configure datanode machine, linuxidc-165

As mentioned above, Hadoop's environment variables and configuration files are now in place on the linuxidc-42 machine. We now need to deploy Hadoop to the other machines, keeping the directory structure consistent.

[hadoop@linuxidc-42:~]$ scp -r /usr/install/hadoop linuxidc-165:/usr/install/hadoop

So far, we can say that Hadoop has been deployed on various machines. Now let's start Hadoop.

4. Start Hadoop

[Format the Namenode] Before starting Hadoop, we need to format the Namenode. First enter the /usr/install/hadoop directory, then execute the following command:

[hadoop@linuxidc-42:hadoop]$ bin/hadoop namenode -format

If the format succeeds, you can move on; if it fails, check the log files in the hadoop/logs/ directory.

[Start] Now we can officially start Hadoop. There are several startup scripts under bin/, which can be run as needed.

* start-all.sh starts all the Hadoop daemons, including the Namenode, Datanodes, Jobtracker, and Tasktrackers.

* stop-all.sh stops all the Hadoop daemons.

* start-mapred.sh starts the Map/Reduce daemons, namely the Jobtracker and Tasktrackers.

* stop-mapred.sh stops the Map/Reduce daemons.

* start-dfs.sh starts the Hadoop DFS daemons, namely the Namenode and Datanodes.

* stop-dfs.sh stops the DFS daemons.

Here, we simply start all the daemons

[hadoop@linuxidc-42:hadoop]$ bin/start-all.sh

Similarly, to stop Hadoop:

[hadoop@linuxidc-42:hadoop]$ bin/stop-all.sh

5. Test

The NameNode and JobTracker each provide a web interface, at the following addresses:

NameNode - http://linuxidc-42:50070/

JobTracker - http://linuxidc-42:50030/
