Precautions for Hadoop Deployment (Basics)

Source: Internet
Author: User
Tags: free, ssh



This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China Mainland License.

 

 

Cloud computing has recently become very popular in China, but reading about the concept alone can be daunting. Fortunately, not all of it is hype: Hadoop is one of the real technologies behind it. I recently read through the basic Hadoop materials, especially the technical documents on how to deploy Hadoop, and found many details worth paying attention to.

There are already many articles on the Internet about how to deploy Hadoop, so I am not going to repeat them here. The Hadoop Developer Getting Started Journal (PDF) is an authoritative document; for any technical questions about Hadoop, go to the Hadoop Technology Forum (http://bbs.hadoopor.com/) and discuss them there.

Composition of a Hadoop Cluster

Hadoop has two core functions: HDFS and MapReduce. The services related to HDFS are namenode, secondarynamenode, and datanode; the services related to MapReduce are jobtracker and tasktracker.

A Hadoop cluster has two roles: master and slave. The master role is further divided into the primary master and secondary masters. Where:

  • The primary master provides the namenode, secondarynamenode, and jobtracker services;
  • A secondary master provides only the secondarynamenode service;
  • Every slave provides two services: datanode and tasktracker.
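
In the classic Hadoop distribution, these roles are declared in two plain-text files under conf/: conf/masters lists the hosts that run the secondarynamenode service, and conf/slaves lists the slave hosts. A minimal sketch, with placeholder host names:

```
# conf/masters -- hosts that provide the secondarynamenode service
secondary1

# conf/slaves -- hosts that provide the datanode and tasktracker services
slave1
slave2
slave3
```

The start scripts read these files and log on to each listed host via SSH, which is why the password-less SSH setup described below matters.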

Hadoop supports three cluster modes:

  • Local (standalone) mode (no cluster)
  • Pseudo-distributed mode (single-host cluster)
  • Fully-distributed mode (multi-host cluster)

A Hadoop cluster consists of multiple computers, and each computer can take one or more roles. When a Hadoop cluster is created in pseudo-distributed mode, one computer fulfills the master and slave roles at the same time. In fully-distributed mode, if only one computer acts as a master, that computer fulfills the primary master's tasks; if multiple computers act as masters, the first fulfills the primary master's tasks and the remaining ones act as secondary masters.

Password-less SSH Login

The following command is called on the master to start Hadoop:

$ ${HADOOP_HOME}/bin/start-all.sh

During this call, Hadoop starts the following services in sequence:

  • Start the namenode service on the primary master;
  • Start the secondarynamenode service on the primary master;
  • Start the secondarynamenode service on every secondary master;
  • Start the datanode service on every slave;
  • Start the jobtracker service on the primary master;
  • Start the tasktracker service on every slave.

Pay attention to the following points:

  1. Starting the namenode and jobtracker services does not require SSH authorization;
  2. Starting the secondarynamenode, datanode, and tasktracker services requires logging on to the target server via SSH first. Therefore:
    1. Because the secondarynamenode service must be started on the master, SSH authorization must be provided for the master;
    2. Because the secondarynamenode service must be started on every secondary master, SSH authorization must be provided for all secondary masters;
    3. Because the datanode and tasktracker services must be started on every slave, SSH authorization must be provided for all slaves.

All in all, you need to provide SSH authorization for every computer in the Hadoop cluster.

Why is password-less SSH login required? Simply to save trouble: imagine how annoying it would be to type the SSH password of every computer by hand each time you start the Hadoop cluster! The SSH authorization procedure itself is not described in detail here, since password-less SSH login is a mature, well-documented technique. However, you do need to pay attention to file access permissions:

  • In Linux, the .ssh directory under $HOME is owned by the user, and its permission must be 700 (only the owner can access it);
  • The authorization file authorized_keys in the .ssh directory is owned by the user, and its permission must be 644.
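
As a reference, here is a minimal sketch of setting up password-less login from the master to itself (which even pseudo-distributed mode needs); for fully-distributed mode the same public key must also be appended to authorized_keys on every secondary master and slave. The commands assume OpenSSH:

```shell
# Run as the hadoop user on the master.
mkdir -p ~/.ssh
chmod 700 ~/.ssh                       # sshd refuses looser permissions

# Generate a passphrase-less RSA key pair if none exists yet.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -q -t rsa -N "" -f ~/.ssh/id_rsa

# Authorize the key locally; for a remote machine, ssh-copy-id <host>
# performs the equivalent append on that host.
touch ~/.ssh/authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 644 ~/.ssh/authorized_keys
```

After the key has been copied to a host, logging on to it via ssh should no longer prompt for a password.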

 

Disable Firewall

When deploying a Hadoop cluster, the firewalls on both the master and the slaves must be disabled. The fundamental purpose of disabling the firewall is, again, to save trouble, because HDFS and MapReduce open many listening ports. They are:

Address and port properties related to HDFS

fs.default.name

Location: conf/core-site.xml
Required: yes
Common value: hdfs://[domain name or IP address]:9000
Description: address of the namenode master server

  • This property must be set in conf/core-site.xml on every master and slave. In addition, because the Hadoop architecture has a single master, the fs.default.name value set on all masters and slaves in a cluster should be the address of the one and only namenode master server.
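
For example, the conf/core-site.xml on every machine of a cluster whose namenode runs on a host named master (a placeholder name) might contain:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```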

dfs.datanode.address

Location: conf/hdfs-site.xml
Required: no
Default value: 0.0.0.0:50010
Description: datanode service address

dfs.datanode.ipc.address

Location: conf/hdfs-site.xml
Required: no
Default value: 0.0.0.0:50020
Description: address of the datanode IPC service

dfs.http.address

Location: conf/hdfs-site.xml
Required: no
Default value: 0.0.0.0:50070
Description: namenode HTTP status monitoring address

dfs.secondary.http.address

Location: conf/hdfs-site.xml
Required: no
Default value: 0.0.0.0:50090
Description: secondarynamenode HTTP status monitoring address

dfs.datanode.http.address

Location: conf/hdfs-site.xml
Required: no
Default value: 0.0.0.0:50075
Description: datanode HTTP status monitoring address
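
These optional properties usually keep their defaults. If you ever do need to move one of these ports, the override goes in conf/hdfs-site.xml; a hypothetical sketch:

```xml
<?xml version="1.0"?>
<!-- conf/hdfs-site.xml: hypothetical override of one default port -->
<configuration>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50175</value> <!-- moved from the default 50075 -->
  </property>
</configuration>
```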

 

Address and port properties related to MapReduce

mapred.job.tracker

Location: conf/mapred-site.xml
Required: yes
Common value: [domain name or IP address]:9001
Description: address and port of the jobtracker master server

  • This property must be set in conf/mapred-site.xml on every master and slave. In addition, because the Hadoop architecture has a single master, the mapred.job.tracker value set on all masters and slaves in a cluster should be the address of the one and only jobtracker master server.
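
Correspondingly, the conf/mapred-site.xml on every machine might point at a jobtracker running on a host named master (again a placeholder name):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```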

mapred.task.tracker.report.address

Location: conf/mapred-site.xml
Required: no
Default value: 127.0.0.1:0
Description: address of the report service to which tasktracker's tasks submit reports

mapred.job.tracker.http.address

Location: conf/mapred-site.xml
Required: no
Default value: 0.0.0.0:50030
Description: jobtracker HTTP status monitoring address

mapred.task.tracker.http.address

Location: conf/mapred-site.xml
Required: no
Default value: 0.0.0.0:50060
Description: tasktracker HTTP status monitoring address
