This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China Mainland License.
Recently, cloud computing has become very popular in China, but the concept itself can be daunting. Fortunately, it is not all hype: Hadoop is one of the real technologies behind it. I recently went through the basic Hadoop materials carefully, especially the technical documents on how to deploy Hadoop, and found many points worth attention.
There are already many articles on the Internet about how to deploy Hadoop, so this article does not repeat them; instead it summarizes those points.
The Hadoop Developer Getting Started Journal (PDF) is an authoritative document; if you have technical questions about Hadoop, take them to the Hadoop Technology Forum (http://bbs.hadoopor.com/) for discussion.
Composition of a Hadoop Cluster
Hadoop has two core components: HDFS and MapReduce. HDFS-related services include namenode, secondarynamenode, and datanode; MapReduce-related services include jobtracker and tasktracker.
A Hadoop cluster has two roles: master and slave. Masters are further divided into the primary master and secondary masters. Specifically:
- The primary master provides the namenode, secondarynamenode, and jobtracker services;
- A secondary master provides only the secondarynamenode service;
- Every slave provides the datanode and tasktracker services.
Hadoop supports three cluster modes:
- Local (standalone) mode (no cluster)
- Pseudo-distributed mode (single-host cluster)
- Fully-distributed mode (multi-host cluster)
A Hadoop cluster consists of one or more computers, and each computer can take on one or more roles. In pseudo-distributed mode, a single computer performs the tasks of both the master and slave roles. In fully-distributed mode, if only one computer acts as a master, it performs the primary master's tasks; if several computers act as masters, the first performs the primary master's tasks and the others act as secondary masters.
Password-less SSH Login
The following command is called on the master to start Hadoop:
$ $HADOOP_HOME/bin/start-all.sh
During this call, Hadoop starts the following services in sequence:
- the namenode service on the master;
- the secondarynamenode service on the master;
- the secondarynamenode service on every secondary master;
- the datanode service on every slave;
- the jobtracker service on the master;
- the tasktracker service on every slave.
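For reference, in the classic Hadoop 0.x/1.x layout the start scripts decide where to start these services from two plain-text host lists under conf/ (the host names below are made-up placeholders):

```text
# conf/masters — one host per line; secondarynamenode is started on these
# (i.e. the secondary masters)
secondary-master1.example.com

# conf/slaves — one host per line; datanode and tasktracker are started on these
slave1.example.com
slave2.example.com
```

The namenode and jobtracker are started locally on whichever machine runs start-all.sh, which is why that machine is the primary master.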
Pay attention to the following points:
- Starting the namenode and jobtracker services does not require SSH authorization;
- Starting the secondarynamenode, datanode, and tasktracker services requires logging in to the target machine via SSH first.
Therefore:
- Because the secondarynamenode service must be started on the master, SSH authorization must be provided for the master;
- Because the secondarynamenode service must be started on every secondary master, SSH authorization must be provided for all secondary masters;
- Because the datanode and tasktracker services must be started on every slave, SSH authorization must be provided for all slaves.
All in all, you need to provide SSH authorization for every computer in the Hadoop cluster.
Why is password-less SSH login required? Purely to save trouble: imagine how annoying it would be to manually enter the SSH password for each computer every time you start the cluster! The SSH authorization procedure is not described in detail here, since password-less SSH login is a mature, well-documented technique. However, pay attention to file access permissions:
- On Linux, the .ssh directory under $HOME must be owned by the user, with permission 700 (accessible only by the owner);
- The authorization file authorized_keys in the .ssh directory must be owned by the user, with permission 644.
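The permission requirements above can be satisfied with a short shell sketch. This is a minimal example, assuming OpenSSH's ssh-keygen is available and that a passphrase-less key is acceptable (reasonable only on a trusted internal network):

```shell
# Create the .ssh directory with the permissions Hadoop's SSH login expects
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Generate a passphrase-less RSA key pair (skipped if ssh-keygen is
# unavailable or a key already exists)
if command -v ssh-keygen >/dev/null && [ ! -f "$HOME/.ssh/id_rsa" ]; then
    ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa" -q
fi

# Authorize the key locally; on a real cluster, append id_rsa.pub to
# ~/.ssh/authorized_keys on every master and slave instead
[ -f "$HOME/.ssh/id_rsa.pub" ] && cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
touch "$HOME/.ssh/authorized_keys"
chmod 644 "$HOME/.ssh/authorized_keys"
```

Once the public key has been appended to authorized_keys on a target machine, an SSH login to it should no longer prompt for a password.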
Disable Firewall
When deploying a Hadoop cluster, the firewall must be disabled on both the masters and the slaves. The fundamental purpose of disabling the firewall is simply convenience, because HDFS and MapReduce open many listening ports. They are:
Address and port properties related to HDFS

fs.default.name
- Location: conf/core-site.xml
- Required: Yes
- Common value: hdfs://[domain name or IP address]:9000
- Description: the namenode master server address. This property must be set in conf/core-site.xml on every master and slave. In addition, because Hadoop uses a single-master architecture, the fs.default.name value set on all masters and slaves in a cluster must point to the one and only namenode master server.

dfs.datanode.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50010
- Description: the datanode service address

dfs.datanode.ipc.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50020
- Description: the datanode IPC service address

dfs.http.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50070
- Description: the namenode HTTP status monitoring address

dfs.secondary.http.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50090
- Description: the secondarynamenode HTTP status monitoring address

dfs.datanode.http.address
- Location: conf/hdfs-site.xml
- Required: No
- Default value: 0.0.0.0:50075
- Description: the datanode HTTP status monitoring address
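As an illustration, the one required HDFS property above might be set in conf/core-site.xml like this (namenode.example.com is a made-up placeholder; substitute your namenode's domain name or IP address):

```xml
<?xml version="1.0"?>
<!-- conf/core-site.xml: must be identical on every master and slave -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- placeholder host; point this at the single namenode master server -->
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
```

The other dfs.* properties are optional and only need to appear in conf/hdfs-site.xml if their defaults must be changed.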
Address and port properties related to MapReduce

mapred.job.tracker
- Location: conf/mapred-site.xml
- Required: Yes
- Common value: [domain name or IP address]:9001
- Description: the jobtracker master server address and port

mapred.task.tracker.report.address
- Location: conf/mapred-site.xml
- Required: No
- Default value: 127.0.0.1:0
- Description: the address of the service the tasktracker uses to submit reports

mapred.job.tracker.http.address
- Location: conf/mapred-site.xml
- Required: No
- Default value: 0.0.0.0:50030
- Description: the jobtracker HTTP status monitoring address

mapred.task.tracker.http.address
- Location: conf/mapred-site.xml
- Required: No
- Default value: 0.0.0.0:50060
- Description: the tasktracker HTTP status monitoring address
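Likewise, the one required MapReduce property might be set in conf/mapred-site.xml as follows (jobtracker.example.com is again a made-up placeholder; in a small cluster this is usually the same machine as the namenode):

```xml
<?xml version="1.0"?>
<!-- conf/mapred-site.xml: only mapred.job.tracker is mandatory -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- placeholder host; point this at the jobtracker master server -->
    <value>jobtracker.example.com:9001</value>
  </property>
</configuration>
```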