Configuration example for a 4-node Hadoop cluster


Environment:

Operating system: CentOS 6.5, 64-bit

Hadoop: version 1.2.1

Servers: hadoopnamenode, hadoop2ndnamenode, hadoopdatanode1, hadoopdatanode2

Note: For convenience, I operate directly as the root account on all 4 servers.

Download and environment variable settings:

On all 4 servers:

Download hadoop-1.2.1-bin.tar.gz from the Apache website, unpack it, and place it in a directory. I put it under /usr/local and, for convenience, renamed the directory from hadoop-1.2.1 to hadoop.

Modify ~/.bashrc and add the following environment variables:

export HADOOP_PREFIX=/usr/local/hadoop

export PATH=$PATH:$HADOOP_PREFIX/bin
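
Optionally, reload the shell configuration and confirm that the hadoop command is on the PATH (a quick sanity check, nothing more):

source ~/.bashrc

hadoop version                                    // should report Hadoop 1.2.1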

Configure the hosts file:

Add to /etc/hosts:

153.65.170.11 hadoopnamenode

153.65.170.45 hadoop2ndnamenode

153.65.171.174 hadoopdatanode1

153.65.171.24 hadoopdatanode2
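
Optionally, verify from each server that the names resolve to the right addresses, for example:

ping -c 1 hadoopnamenode

ping -c 1 hadoopdatanode2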

Configure SSH:

Execute on hadoopnamenode:

ssh-keygen                                        // generate the public and private key pair

ssh-copy-id -i ~/.ssh/id_rsa.pub root@hadoop2ndnamenode    // copy hadoopnamenode's public key to the three other servers

ssh-copy-id -i ~/.ssh/id_rsa.pub root@hadoopdatanode1

ssh-copy-id -i ~/.ssh/id_rsa.pub root@hadoopdatanode2

The purpose of this is to let hadoopnamenode SSH into the other three servers without a password. After ssh-copy-id, the public key has in fact been appended to the ~/.ssh/authorized_keys file on each of the other three servers.

For example, when logging in to hadoop2ndnamenode from hadoopnamenode, the process is roughly: hadoop2ndnamenode sends a random string to hadoopnamenode; hadoopnamenode encrypts it with its own private key and sends it back; hadoop2ndnamenode decrypts it with the stored hadoopnamenode public key. If this succeeds, the user is proven trustworthy and is dropped straight into a shell with no password required.
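
To confirm passwordless login is working before moving on, run a trivial command on each target over SSH; none of these should prompt for a password:

ssh root@hadoop2ndnamenode hostname

ssh root@hadoopdatanode1 hostname

ssh root@hadoopdatanode2 hostname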

Configure Hadoop:

In general, we use the following commands to start HDFS (that is, the name node, secondary name node, and data nodes) and MapReduce:

/usr/local/hadoop/bin/start-dfs.sh

/usr/local/hadoop/bin/start-mapred.sh

The start-dfs.sh process is roughly as follows:

    1. The machine on which the command is executed automatically becomes the name node (and job tracker).
    2. All machines listed in /usr/local/hadoop/conf/slaves are started and act as data nodes (and task trackers).
    3. All machines listed in /usr/local/hadoop/conf/masters are started and act as secondary name nodes.

The process of start-mapred.sh is similar:

    1. The machine on which the command is executed automatically becomes the job tracker.
    2. All machines listed in /usr/local/hadoop/conf/slaves are started and act as task trackers.

Note: The conf/masters file is often confusing; intuitively it feels as if it configures the name node, but it does not. The name node is determined by the machine on which start-dfs.sh is executed (point 1 above); conf/masters only lists the secondary name node.

Based on the above, we modify the conf/masters file on hadoopnamenode: delete the original content and add one line:

hadoop2ndnamenode

Modify the conf/slaves file on hadoopnamenode: delete the original content and add two lines:

hadoopdatanode1

hadoopdatanode2
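
The resulting files are tiny; a quick cat on hadoopnamenode should show exactly the hostnames above:

cat /usr/local/hadoop/conf/masters               // hadoop2ndnamenode

cat /usr/local/hadoop/conf/slaves                // hadoopdatanode1, hadoopdatanode2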

In addition, hadoopdatanode1 and hadoopdatanode2 need to be configured so that the data node knows where the name node is and the task tracker knows where the job tracker is. So modify conf/core-site.xml on hadoopdatanode1 and hadoopdatanode2 as follows:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopnamenode:10001</value>
  </property>
</configuration>

and conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopnamenode:10002</value>
  </property>
</configuration>
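
If it is more convenient to edit these two files once on hadoopnamenode and push them out (purely optional; the paths assume the /usr/local/hadoop layout used above), scp works:

scp /usr/local/hadoop/conf/core-site.xml /usr/local/hadoop/conf/mapred-site.xml root@hadoopdatanode1:/usr/local/hadoop/conf/

scp /usr/local/hadoop/conf/core-site.xml /usr/local/hadoop/conf/mapred-site.xml root@hadoopdatanode2:/usr/local/hadoop/conf/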

Format the name node:

Execute on hadoopnamenode:

hadoop namenode -format

Start Hadoop:

First, execute the following command on hadoopnamenode to start the name node, secondary name node, and data nodes:

start-dfs.sh

You can use the jps command to view the Java processes currently running on the 4 servers; normally you should see:

hadoopnamenode: NameNode

hadoop2ndnamenode: SecondaryNameNode

hadoopdatanode1 / hadoopdatanode2: DataNode

Second, execute the following command on hadoopnamenode to start the job tracker and task trackers:

start-mapred.sh

Again use the jps command to view the Java processes running on the 4 servers; normally you should now see:

hadoopnamenode: NameNode, JobTracker

hadoop2ndnamenode: SecondaryNameNode

hadoopdatanode1 / hadoopdatanode2: DataNode, TaskTracker
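
To double-check that both data nodes have registered with the name node, an optional report from the HDFS admin tool helps:

hadoop dfsadmin -report                           // should list 2 live data nodes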

Stop Hadoop:

On hadoopnamenode:

stop-mapred.sh

stop-dfs.sh

Other:

Name node admin interface: http://hadoopnamenode:50070/

Job tracker admin interface: http://hadoopnamenode:50030/
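
As an optional end-to-end smoke test, one of the example jobs bundled with the distribution can be run (the jar path assumes the hadoop-1.2.1 tarball unpacked and renamed to /usr/local/hadoop as above):

hadoop jar /usr/local/hadoop/hadoop-examples-1.2.1.jar pi 2 100    // estimates pi with 2 map tasks; a clean finish confirms HDFS and MapReduce are both working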
