Hadoop Fully Distributed Cluster Setup


Build a Hadoop distributed cluster (Environment: Linux virtual machine)

1. Preparations: plan the host names, IP addresses, and roles. Set up three hosts first; a fourth host will be added dynamically later.

(The namenode, secondaryNamenode, and jobTracker listed in the role column can also be deployed on separate machines, depending on actual needs; this layout is not the only option.)

Host name     IP address        Role

cloud01       192.168.1.101     namenode / secondaryNamenode / jobTracker

cloud02       192.168.1.102     datanode / taskTracker

cloud03       192.168.1.103     datanode / taskTracker

cloud04       192.168.1.104     datanode / taskTracker (added dynamically in step 5)

2. Configure the Linux environment (the same steps as for the pseudo-distributed setup)

2.1 Modify the host name of each machine (cloud01, cloud02, cloud03)

2.2 Modify the IP address of each machine (assign the addresses yourself)
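As an illustration of 2.1 and 2.2, a minimal sketch for cloud01, assuming a RHEL/CentOS 6-style distribution (the file locations, netmask, and gateway are assumptions; adjust them to your distribution and network):

    In /etc/sysconfig/network (host name, takes effect after reboot):
        HOSTNAME=cloud01

    In /etc/sysconfig/network-scripts/ifcfg-eth0 (static IP for the first network interface):
        DEVICE=eth0
        BOOTPROTO=static
        ONBOOT=yes
        IPADDR=192.168.1.101
        NETMASK=255.255.255.0    # assumed /24 network
        GATEWAY=192.168.1.1      # assumed gateway; change to match your network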

2.3 Modify the mapping between host names and IP addresses in /etc/hosts

(Only edit the file on cloud01, then copy it to the other machines. The commands are as follows:

scp /etc/hosts cloud02:/etc/

scp /etc/hosts cloud03:/etc/)
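For reference, the /etc/hosts entries matching the plan above would look like this (the 127.0.0.1 line is the usual default; the cloud04 line can be added later, when the node joins in step 5):

    127.0.0.1       localhost
    192.168.1.101   cloud01
    192.168.1.102   cloud02
    192.168.1.103   cloud03
    192.168.1.104   cloud04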

2.4 Disable the firewall
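A sketch for 2.4, assuming a CentOS 6-style system that uses iptables (other distributions use different tools, such as ufw or firewalld):

    service iptables stop      # stop the firewall for the current session
    chkconfig iptables off     # keep it disabled after reboot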

2.5 Restart the machines

3. Install the JDK (refer to the pseudo-distributed setup; jdk1.6.0_45 is used as an example)

You only need to install it on one machine and then copy it to the others (this is why the software should be kept under one unified directory).

For example, on cloud01 the JDK is installed under /soft/java.

(Commands: scp -r /soft/java/ cloud02:/soft/

scp -r /soft/java/ cloud03:/soft/

You could copy the JDK now, but we will not copy it yet; after Hadoop is installed below, everything will be copied together.)
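As an illustration, a minimal sketch of installing the JDK under /soft/java on cloud01, assuming the self-extracting .bin distribution of jdk1.6.0_45 (the installer file name is an assumption; use the file you actually downloaded):

    mkdir -p /soft/java
    cd /soft/java
    chmod +x jdk-6u45-linux-x64.bin    # assumed installer name
    ./jdk-6u45-linux-x64.bin           # extracts to /soft/java/jdk1.6.0_45

    # make java available system-wide: append these lines to /etc/profile, then run "source /etc/profile"
    export JAVA_HOME=/soft/java/jdk1.6.0_45
    export PATH=$PATH:$JAVA_HOME/bin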

4. Install the Hadoop cluster (hadoop-1.1.2 is used as an example)

4.1 Upload the Hadoop tarball to the /soft directory and extract it there (refer to the pseudo-distributed setup)
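For example (hadoop-1.1.2.tar.gz is the standard name of the release tarball; adjust if your archive is named differently):

    cd /soft
    tar -zxvf hadoop-1.1.2.tar.gz      # extracts to /soft/hadoop-1.1.2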

4.2 Configure Hadoop (six files need to be edited this time)

4.2.1 hadoop-env.sh

On line 9, remove the leading # and set:

export JAVA_HOME=/soft/java/jdk1.6.0_45

4.2.2 core-site.xml

<!-- Specify the address of the HDFS namenode -->
<property>
    <name>fs.default.name</name>
    <value>hdfs://cloud01:9000</value>
</property>

<!-- Specify the directory for files generated while Hadoop is running -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/soft/hadoop-1.1.2/tmp</value>
</property>

4.2.3 hdfs-site.xml

<!-- Set the number of HDFS replicas (choose according to your needs; the default is 3) -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>

4.2.4 mapred-site.xml

<!-- Specify the jobtracker address -->
<property>
    <name>mapred.job.tracker</name>
    <value>cloud01:9001</value>
</property>

4.2.5 masters (specifies the host that runs the secondaryNamenode)

cloud01

4.2.6 slaves (the worker nodes)

cloud02

cloud03

4.3 Copy the configured Hadoop to the other two machines.

Copy the whole /soft folder directly (it contains both the JDK and Hadoop); this is why keeping the software under a single directory is strongly recommended.

Commands:

scp -r /soft/ cloud02:/

scp -r /soft/ cloud03:/

4.4 Configure passwordless SSH login

Passwordless login is needed from the master node to the worker nodes,

that is, from cloud01 to cloud02 and cloud03.

Generate a key pair on cloud01.

Command: ssh-keygen -t rsa

Then copy the public key to the other two machines.

Commands: ssh-copy-id -i cloud02

ssh-copy-id -i cloud03
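To confirm that the keys were copied correctly, logging in from cloud01 should now work without a password prompt, for example:

    ssh cloud02            # should open a shell on cloud02 without asking for a password
    exit
    ssh cloud03 hostname   # should print "cloud03" without asking for a password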

4.5 Format HDFS

You only need to format it on cloud01 (the master node, where the namenode runs).

Command: hadoop namenode -format

4.6 Verification

Start the cluster with the command: start-all.sh

If a safemode-related exception is reported during startup,

run the command: hadoop dfsadmin -safemode leave (exits safe mode)

and then start Hadoop again.

Then run the jps command on each machine and check whether the running daemons match the roles planned in the table above.

If they match the plan, the cluster is up.
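For reference, the jps output should look roughly like the following (the process IDs are only illustrative):

    On cloud01:
        2345 NameNode
        2468 SecondaryNameNode
        2590 JobTracker
        2712 Jps

    On cloud02 and cloud03:
        1987 DataNode
        2054 TaskTracker
        2171 Jps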

5. Dynamically add a node

(This is a common and practical operation in real production.)

cloud04       192.168.1.104     datanode / taskTracker

5.1 Add a Linux machine by cloning (a clone of cloud01 is used as an example; this is not how it is done in real production, where virtual machines are rarely used and physical servers are deployed directly). Note that the machine being cloned must be shut down before cloning.

5.2 Modify the host name and IP address, configure the hosts mapping, disable the firewall, and then configure Hadoop: add cloud04 to the slaves file (see the example below), set up passwordless login to it, and restart.

(If the machine was cloned, you do not need to configure the hosts mapping or disable the firewall again; the cloned machine already carries that configuration.)
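After this step, the slaves file on cloud01 (/soft/hadoop-1.1.2/conf/slaves) would contain:

    cloud02
    cloud03
    cloud04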

5.3 After the machine restarts, start the datanode and taskTracker on it.

Commands: hadoop-daemon.sh start datanode

hadoop-daemon.sh start tasktracker

5.4 On cloud01, run the command that makes the namenode refresh its list of nodes:

hadoop dfsadmin -refreshNodes

5.5 Verification

Open http://<namenode IP>:50070 (the HDFS management interface)

and check whether there is one more node. If there is, the new node has been added successfully.

6. Remove a node (decommissioning)

6.1 On cloud01, modify /soft/hadoop-1.1.2/conf/hdfs-site.xml

Add the following configuration:

<property>
    <name>dfs.hosts.exclude</name>
    <value>/soft/hadoop-1.1.2/conf/excludes</value>
</property>

6.2 List the machines to be decommissioned

The file referenced by dfs.hosts.exclude lists the machines to be decommissioned, one host per line.
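For example, to decommission cloud04 (used here purely as an illustration), /soft/hadoop-1.1.2/conf/excludes would contain a single line:

    cloud04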

6.3 Force the configuration to be reloaded

Command: hadoop dfsadmin -refreshNodes

6.4 Monitor the decommissioning

Command: hadoop dfsadmin -report

This shows the nodes currently connected to the cluster.

While decommissioning is running, the node reports:

Decommission Status: Decommission in progress

When it has finished, it reports:

Decommission Status: Decommissioned

6.5 Edit the excludes file again

Once the machines have been decommissioned, they can be removed from the excludes file.

If you log on to a decommissioned machine, you will find that the DataNode process is gone but the TaskTracker is still running,

so it has to be stopped manually.
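To stop the leftover TaskTracker by hand, the same daemon script used in step 5.3 can be run on the decommissioned machine:

    hadoop-daemon.sh stop tasktracker    # afterwards, jps should show no Hadoop daemons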

