Hadoop 2.2.0 (YARN) Build notes


For recent work I needed to build a Hadoop 2.2.0 (YARN) cluster by trial and error. I ran into some problems along the way, so I am recording them here in the hope that they help others who need them.

This article does not cover compiling Hadoop 2.2; compilation issues are covered in a separate article, "Hadoop 2.2.0 Source Compilation Notes". Here we assume you already have a Hadoop 2.2.0 64-bit release package.

Due to Spark compatibility issues, we later switched to Hadoop 2.0.5-alpha (2.2.0 is a stable release). Where the 2.0.5 configuration differs, it is called out explicitly.

1. Introduction

"This section is excerpted from http://www.cnblogs.com/xia520pi/archive/2012/05/16/2503949.html"

Hadoop is an open-source distributed computing platform from the Apache Software Foundation. At its core, the Hadoop Distributed File System (HDFS) and MapReduce (an open-source implementation of Google's MapReduce) provide the user with a distributed infrastructure that is transparent to low-level system details.

For Hadoop clusters, there are two broad categories of roles: master and slave. An HDFS cluster consists of one NameNode and several DataNodes. The NameNode, as the primary server, manages the file system namespace and client access to the file system; the DataNodes manage the data stored on their nodes. The MapReduce framework consists of a JobTracker running on the master node and a TaskTracker running on each slave node. The master schedules all the tasks that make up a job, distributes them across the slave nodes, monitors their execution, and restarts tasks that fail; the slave nodes only execute the tasks the master assigns them. When a job is submitted, the JobTracker receives the job and its configuration information, distributes the configuration to the slave nodes, schedules the tasks, and monitors TaskTracker execution.

As this introduction shows, HDFS and MapReduce together form the core of the Hadoop distributed system architecture. HDFS provides a distributed file system across the cluster; MapReduce provides distributed computation and task processing on top of it. HDFS supplies file storage and I/O during MapReduce jobs, while MapReduce handles the distribution, tracking, and execution of tasks on top of HDFS and collects the results. Together they carry out the main work of a Hadoop distributed cluster.
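To make this division of labor concrete, here is a sketch of what a typical client interaction with a running HDFS cluster looks like (localfile.txt and /demo are placeholders of my own, for once the cluster built below is up; the NameNode handles the namespace operations while the DataNodes store and serve the actual blocks):

bin/hadoop fs -mkdir /demo                # namespace change, recorded by the NameNode
bin/hadoop fs -put localfile.txt /demo/   # file blocks are written to DataNodes
bin/hadoop fs -cat /demo/localfile.txt    # blocks are read back directly from the DataNodes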
2. System Environment

System version

CentOS 6.4 64bit

uname -a
Linux * 2.6.32_1-7-0-0 #1 SMP * * x86_64 x86_64 x86_64 GNU/Linux

Java environment

Install Java 1.6: extract the JDK to a local directory, then add the JAVA_HOME and related environment variables to the .bashrc file:

export JAVA_HOME=/home/<hostname>/local/jdk1.6.0_45
export JRE_HOME=/home/<hostname>/local/jdk1.6.0_45/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
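To apply the settings and sanity-check them, something like the following should work (assuming the paths above match your install):

source ~/.bashrc
echo $JAVA_HOME    # should print the JDK path set above
java -version      # should report a 1.6.0_45 JVM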

Hadoop

Extract hadoop-2.2.0-bin_64.tar.gz (a package I compiled myself on CentOS 6.4 64bit) into the user's home directory, then add the following to .bashrc:

export HADOOP_HOME=/home/<hostname>/hadoop-2.2.0
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
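A quick check that the variables took effect (assuming the exports above):

source ~/.bashrc
hadoop version    # should report Hadoop 2.2.0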

Test local mode

Hadoop is configured in local (standalone) mode by default, so after extraction you can run a local test without modifying any configuration.

Create a local directory:

mkdir input

Populate it with data:

cp conf/*.xml input

Run Hadoop:

bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

View the results:

cat output/*
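Note that these paths follow the Hadoop 1.x layout. If your package instead uses the stock 2.2.0 layout, the config files live under etc/hadoop and the examples jar under share/hadoop/mapreduce, so the equivalent test would be:

mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar grep input output 'dfs[a-z.]+'
cat output/*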

3. Network Environment

Because of the existing test environment and configuration, we simply use two nodes:

Master machine, acting as NameNode & DataNode
Slave machine, acting as DataNode

Set hostname
HDFS nodes use hostnames rather than IP addresses to communicate with each other. Hadoop reverse-resolves hostnames, and even if you configure IP addresses it will still use the hostname to start the TaskTracker, so all configuration files must use hostnames, never IPs (learned through many tears). We configure the two machines as follows:

Machine   IP               Hostname   Role
Master    192.168.216.135  master     NameNode, DataNode
Slave     192.168.216.136  slave1     DataNode

To change the hostname temporarily (requires root):

hostname <new_name>

To change it permanently, edit the configuration file /etc/sysconfig/network:

HOSTNAME=<new_name>
Modify the hosts file

Edit the /etc/hosts file (on every machine) and add the following:

192.168.216.135 master
192.168.216.136 slave1

The /etc/hosts entries for the NameNode and DataNodes must match their actual IP addresses and hostnames. Do not use 127.0.0.1 in place of the local IP address; otherwise, when Hadoop resolves the hostname to find an IP, it will take "127.0.0.1" as the address.
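A quick way to confirm the mappings resolve correctly (run on each machine):

ping -c 1 master    # should resolve to 192.168.216.135
ping -c 1 slave1    # should resolve to 192.168.216.136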
Set up passwordless SSH

Bidirectional passwordless SSH access is required between the master and every slave (slave-to-slave access is optional). See the separate article on passwordless SSH access; this article does not repeat the details, but a minimal sketch follows.
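The usual approach, sketched here under the assumption of an RSA key with an empty passphrase (<user> is a placeholder for the Hadoop account; repeat the same steps in the reverse direction on the slave):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # generate a key pair with an empty passphrase
ssh-copy-id <user>@slave1                   # append the public key to slave1's authorized_keys
ssh slave1 hostname                         # should print "slave1" with no password prompt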
Firewall settings

Strictly speaking, you should open only the specific ports Hadoop needs. For simplicity's sake, we disable SELinux and iptables here instead.

To switch the SELinux mode temporarily:

setenforce 1    (set SELinux to enforcing mode)
setenforce 0    (set SELinux to permissive mode)

To disable it permanently, edit /etc/selinux/config:

SELINUX=disabled
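The original gives no iptables commands; on CentOS 6 the usual way to disable the firewall (an assumption here, not from the original notes) is:

service iptables stop     # stop the firewall immediately
chkconfig iptables off    # prevent it from starting at boot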
