This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China Mainland License.
Recently, cloud computing has become very popular in China, but the concept itself can be daunting. Fortunately, it is not all hype: Hadoop is one of the real technologies behind it. I recently went through the basic Hadoop materials carefully, especially the technical documents on how to deploy Hadoop, and found many points worth attention.
There are already many articles on the Internet about how to deploy Hadoop, so this article does not repeat them; instead it summarizes those points.
The Hadoop Developer Getting Started Journal (PDF) is an authoritative document; if you have technical questions about Hadoop, take them to the Hadoop Technology Forum (http://bbs.hadoopor.com/) for discussion.
Composition of a Hadoop Cluster
Hadoop has two core components: HDFS and MapReduce. HDFS-related services include namenode, secondarynamenode, and datanode; MapReduce-related services include jobtracker and tasktracker.
A Hadoop cluster has two roles: master and slave. Masters are further divided into the primary master and secondary masters. Specifically:
- The primary master provides the namenode, secondarynamenode, and jobtracker services;
- A secondary master provides only the secondarynamenode service;
- Every slave provides the datanode and tasktracker services.
Hadoop supports three cluster modes:
- Local (standalone) mode (no cluster)
- Pseudo-distributed mode (single-host cluster)
- Fully-distributed mode (multi-host cluster)
A Hadoop cluster consists of one or more computers, and each computer can take on one or more roles. In pseudo-distributed mode, a single computer performs the tasks of both the master and slave roles. In fully-distributed mode, if only one computer acts as a master, it performs the primary master's tasks; if several computers act as masters, the first performs the primary master's tasks and the others act as secondary masters.
Password-less SSH Login
The following command is called on the master to start Hadoop:
$ $HADOOP_HOME/bin/start-all.sh
During this call, Hadoop starts the following services in sequence:
- the namenode service on the master;
- the secondarynamenode service on the master;
- the secondarynamenode service on every secondary master;
- the datanode service on every slave;
- the jobtracker service on the master;
- the tasktracker service on every slave.
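For reference, in the classic Hadoop 0.x/1.x layout the start scripts decide where to start these services from two plain-text host lists under conf/ (the host names below are made-up placeholders):

```text
# conf/masters — one host per line; secondarynamenode is started on these
# (i.e. the secondary masters)
secondary-master1.example.com

# conf/slaves — one host per line; datanode and tasktracker are started on these
slave1.example.com
slave2.example.com
```

The namenode and jobtracker are started locally on whichever machine runs start-all.sh, which is why that machine is the primary master.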
Pay attention to the following points:
- Starting the namenode and jobtracker services does not require SSH authorization;
- Starting the secondarynamenode, datanode, and tasktracker services requires logging in to the target machine via SSH first.
Therefore:
- Because the secondarynamenode service must be started on the master, SSH authorization must be provided for the master;
- Because the secondarynamenode service must be started on every secondary master, SSH authorization must be provided for all secondary masters;
- Because the datanode and tasktracker services must be started on every slave, SSH authorization must be provided for all slaves.
All in all, you need to provide SSH authorization for every computer in the Hadoop cluster.
Why is password-less SSH login required? Purely to save trouble: imagine how annoying it would be to manually enter the SSH password for each computer every time you start the cluster! The SSH authorization procedure is not described in detail here, since password-less SSH login is a mature, well-documented technique. However, pay attention to file access permissions:
- On Linux, the .ssh directory under $HOME must be owned by the user, with permission 700 (accessible only by the owner);
- The authorization file authorized_keys in the .ssh directory must be owned by the user, with permission 644.
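The permission requirements above can be satisfied with a short shell sketch. This is a minimal example, assuming OpenSSH's ssh-keygen is available and that a passphrase-less key is acceptable (reasonable only on a trusted internal network):

```shell
# Create the .ssh directory with the permissions Hadoop's SSH login expects
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Generate a passphrase-less RSA key pair (skipped if ssh-keygen is
# unavailable or a key already exists)
if command -v ssh-keygen >/dev/null && [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q
fi

# Authorize the key locally; on a real cluster, append id_rsa.pub to
# ~/.ssh/authorized_keys on every master and slave instead
[ -f "$HOME/.ssh/id_rsa.pub" ] && cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
touch "$HOME/.ssh/authorized_keys"
chmod 644 "$HOME/.ssh/authorized_keys"
```

Once the public key has been appended to authorized_keys on a target machine, an SSH login to it should no longer prompt for a password.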
Disable Firewall
When deploying a Hadoop cluster, the firewall must be disabled on both the masters and the slaves. The fundamental purpose of disabling the firewall is simply convenience, because HDFS and MapReduce open many listening ports. They are:
Address and port properties related to HDFS

fs.default.name
- Location: conf/core-site.xml
- Required: Yes
- Common value: hdfs://[domain name or IP address]:9000
- Description: the namenode master server address. This property must be set in conf/core-site.xml on every master and slave. In addition, because Hadoop uses a single-master architecture, the fs.default.name value set on all masters and slaves in a cluster must point to the one and only namenode master server.

dfs.datanode.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50010
- Description: the datanode service address

dfs.datanode.ipc.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50020
- Description: the datanode IPC service address

dfs.http.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50070
- Description: the namenode HTTP status monitoring address

dfs.secondary.http.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50090
- Description: the secondarynamenode HTTP status monitoring address

dfs.datanode.http.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50075
- Description: the datanode HTTP status monitoring address
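As an illustration, the one required HDFS property above might be set in conf/core-site.xml like this (namenode.example.com is a made-up placeholder; substitute your namenode's domain name or IP address):

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: must be identical on every master and slave -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- placeholder host; point this at the single namenode master server -->
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
```

The other dfs.* properties are optional and only need to appear in conf/hdfs-site.xml if their defaults must be changed.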
Address and port properties related to MapReduce

mapred.job.tracker
- Location: conf/mapred-site.xml
- Required: Yes
- Common value: [domain name or IP address]:9001
- Description: the jobtracker master server address and port

mapred.task.tracker.report.address
- Location: conf/mapred-site.xml
- Required: No
- Default value: 127.0.0.1:0
- Description: the address of the service the tasktracker uses to submit reports

mapred.job.tracker.http.address
- Location: conf/mapred-site.xml
- Required: No
- Default value: 0.0.0.0:50030
- Description: the jobtracker HTTP status monitoring address

mapred.task.tracker.http.address
- Location: conf/mapred-site.xml
- Required: No
- Default value: 0.0.0.0:50060
- Description: the tasktracker HTTP status monitoring address
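Likewise, the one required MapReduce property might be set in conf/mapred-site.xml as follows (jobtracker.example.com is again a made-up placeholder; in a small cluster this is usually the same machine as the namenode):

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml: only mapred.job.tracker is mandatory -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- placeholder host; point this at the jobtracker master server -->
    <value>jobtracker.example.com:9001</value>
  </property>
</configuration>
```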