Hadoop cluster fully distributed mode environment deployment


Introduction to Hadoop

Hadoop is an open-source distributed computing platform maintained by the Apache Software Foundation. Its core consists of the Hadoop Distributed File System (HDFS) and MapReduce (an open-source implementation of Google's MapReduce), which together provide users with a distributed infrastructure that hides the low-level details of the system.

A Hadoop cluster has two broad categories of roles: master and slave. An HDFS cluster is made up of one NameNode and several DataNodes. The NameNode acts as the primary server: it manages the file system namespace and clients' access to the file system. The DataNodes manage the data stored on their own nodes. The MapReduce framework consists of a JobTracker running on the master node and a TaskTracker running on each slave node. The master is responsible for scheduling all the tasks that make up a job; these tasks are distributed across the slave nodes, and the master monitors their execution and restarts any that fail, while the slave nodes only execute the tasks assigned to them by the master. When a job is submitted, the JobTracker receives the job and its configuration information, distributes the configuration to the slave nodes, schedules the tasks, and monitors the TaskTrackers' execution.

As can be seen from the introduction above, HDFS and MapReduce together form the core of the Hadoop distributed system architecture. HDFS provides a distributed file system on the cluster, while MapReduce provides distributed computing and task processing on the cluster. HDFS supplies file operations and storage during the execution of MapReduce tasks, and MapReduce, built on top of HDFS, handles the distribution, tracking, and execution of tasks and collects the results. Together they carry out the main work of a Hadoop distributed cluster.

Prerequisites

1) Ensure that all required software is installed on each node of the cluster: Sun JDK, SSH, and Hadoop (a quick check is sketched after this list).
2) Java 1.5.x or later must be installed; the Java release from Sun is recommended.
3) SSH must be installed and sshd must be running, so that the Hadoop scripts can manage the remote Hadoop daemons.
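A quick way to verify these prerequisites on each CentOS node is sketched below (a minimal sketch only; it assumes the stock CentOS 5.9 package and service names):

# java -version           # should report the installed Sun JDK
# rpm -qa | grep openssh  # confirm the OpenSSH client and server packages are present
# service sshd status     # confirm that the sshd daemon is running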

Experimental environment

Operating platform: VMware
Operating system: CentOS 5.9
Software version: hadoop-1.2.1,jdk-6u45
Cluster architecture: 4 nodes in total: 1 master and 3 slaves, connected over a LAN and able to ping each other. The node IP addresses are assigned as follows:

Host name   IP               System version   Hadoop node   Hadoop processes
Master      192.168.137.100  CentOS 5.9       master        NameNode, JobTracker
Slave1      192.168.137.101  CentOS 5.9       slave         DataNode, TaskTracker
Slave2      192.168.137.102  CentOS 5.9       slave         DataNode, TaskTracker
Slave3      192.168.137.103  CentOS 5.9       slave         DataNode, TaskTracker

All four nodes run CentOS 5.9 and have the same user, hadoop. The master machine is configured with the NameNode and JobTracker roles and is responsible for managing the distributed data and decomposing the tasks; the three slave machines are configured with the DataNode and TaskTracker roles and are responsible for distributed data storage and task execution.
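Once the cluster has been installed and started (with start-all.sh, after the steps below), this role assignment can be checked with the jps command on each node. A rough sketch of the expected output follows; the process IDs will vary, and a SecondaryNameNode typically also appears on whichever host is listed in conf/masters:

[hadoop@master ~]$ jps
2001 NameNode
2002 JobTracker
2003 SecondaryNameNode
[hadoop@slave1 ~]$ jps
3001 DataNode
3002 TaskTracker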

Installation steps

Download: jdk-6u45-linux-x64.bin and hadoop-1.2.1.tar.gz (host name and network configuration are omitted here).
Note: In a production Hadoop cluster there may be many servers, so mapping machine names through DNS, rather than through /etc/hosts, avoids having to configure a hosts file on every node; when a new node is added, there is no need to modify the hostname-to-IP mapping file on each existing node. This reduces the configuration steps and time and makes the cluster easier to manage.
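For a small cluster like the one in this article, /etc/hosts mapping is sufficient. A minimal sketch of the entries (taken from the IP plan in the table above) that would be appended to /etc/hosts on every node:

192.168.137.100   master
192.168.137.101   slave1
192.168.137.102   slave2
192.168.137.103   slave3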

1, JDK installation
# /bin/bash jdk-6u45-linux-x64.bin
# mv jdk1.6.0_45 /usr/local/

Add the Java environment variables:
# vim /etc/profile
Add the following at the end of the file:
# set Java environment
export JAVA_HOME=/usr/local/jdk1.6.0_45/
export JRE_HOME=/usr/local/jdk1.6.0_45/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

Make the Java variables take effect:
# source /etc/profile
# java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

Create the same directory on all machines; it is also best to create the same user and use that user's home directory as the Hadoop installation path. The installation path here is /home/hadoop/hadoop-1.2.1.
# useradd hadoop
# passwd hadoop

2, SSH configuration

After Hadoop starts, the NameNode uses SSH (Secure Shell) to start and stop the various daemons on the DataNodes, which requires that commands can be executed between the nodes without entering a password. We therefore need to configure SSH to use password-free public key authentication. Taking the four machines in this article as an example, master is the master node and needs to connect to Slave1, Slave2, and Slave3. Make sure SSH is installed on every machine and that the sshd service is running on the DataNode machines.
Switch to the hadoop user (make sure the hadoop user can log in without a password, because the Hadoop installation we perform later is owned by the hadoop user).
1) Generate a key pair on each host:
# su - hadoop
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub

This command generates a key pair: id_rsa (the private key file) and id_rsa.pub (the public key file), saved by default in the ~/.ssh/ directory.
2) Add the master's public key to the authorized_keys file on the remote host Slave1.
Create an authorized_keys file under /home/hadoop/.ssh/:
$ vim /home/hadoop/.ssh/authorized_keys

Paste in the public key generated above, and set the file's permissions to 600. (This is important: if the permissions are not set to 600, password-free login will fail.)
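For example (a minimal sketch, assuming the default /home/hadoop/.ssh paths), the permissions can be tightened on Slave1 with:

$ chmod 700 /home/hadoop/.ssh
$ chmod 600 /home/hadoop/.ssh/authorized_keys

sshd also refuses key authentication if the .ssh directory itself is group- or world-writable, which is why the directory is set to 700 as well.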
Test the login:
$ ssh Slave1
The authenticity of host 'slave1 (192.168.137.101)' can't be established.
RSA key fingerprint is d5:18:cb:5f:92:66:74:c7:30:30:bb:36:bf:4c:ed:e9.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave1,192.168.137.101' (RSA) to the list of known hosts.
Last login: Fri Aug 21:31:36 2013 from slave1
[hadoop@slave1 ~]$

In the same way, copy the master's public key to the other slave nodes.
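Instead of pasting the key by hand on each slave, a small loop like the one below can do the same thing for all three slaves. This is only a sketch: it assumes the hadoop user already exists on every slave and it will prompt for that user's password once per host.

$ for host in slave1 slave2 slave3; do cat ~/.ssh/id_rsa.pub | ssh hadoop@$host 'mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys; chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys'; done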

3. Installing Hadoop

1) Switch to the hadoop user, download the installation package, and simply extract it to install:
# su - hadoop
$ wget http://apache.stu.edu.tw/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
$ tar -zxvf hadoop-1.2.1.tar.gz

My installation directory is:
/home/hadoop/hadoop-1.2.1
For the convenience of using the hadoop command and the start-all.sh command, modify /etc/profile on the master and add the following:
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin

After the modification is complete, execute source /etc/profile to make it take effect.
2) Configure the conf/hadoop-env.sh file
Edit conf/hadoop-env.sh and add:
export JAVA_HOME=/usr/local/jdk1.6.0_45/

Here, change the path to match the installation location of your own JDK.
To test the Hadoop installation:
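A minimal smoke test (a sketch, assuming the /etc/profile changes above have been sourced so that $HADOOP_HOME/bin is on the PATH) is to ask Hadoop to print its version:

$ hadoop version    # should report Hadoop 1.2.1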
