Hadoop installation memo
This memo follows Liu Peng's "Practical Hadoop" and the instructions for Hadoop 0.20.2.
First, understand the background processes (daemons) that make up Hadoop and their roles: NameNode, Secondary NameNode, JobTracker, TaskTracker, and DataNode.
NameNode: manages the HDFS namespace. It decides how files are split into blocks and which nodes each block is placed on, and it centrally manages this metadata in memory and coordinates file system I/O. The process is deployed on the master node. It is a single point of failure: if the NameNode goes down, the entire system is unavailable.
Secondary NameNode: despite the name, it is a helper rather than a standby NameNode. Each cluster has one; it communicates with the NameNode and regularly stores snapshots of the HDFS metadata. When the NameNode fails, these snapshots can be used to recover its state. It is also deployed on the master node.
JobTracker: responsible for scheduling jobs. It decides which nodes process which files and listens for the heartbeats sent by the TaskTrackers; if a heartbeat stops arriving or a task fails, the task is rescheduled. Each cluster has only one JobTracker, deployed on the master node.
The preceding three processes run on the master node, while the TaskTracker and DataNode processes run on every slave node in the cluster.
DataNode: reads and writes HDFS data blocks on the local file system. When a client wants to read or write a data block, the NameNode tells it which DataNode to go to; the client then communicates directly with that DataNode server to operate on the block.
TaskTracker: also located on the slave nodes, it is responsible for executing individual tasks. Each slave node runs exactly one TaskTracker, but a TaskTracker can spawn multiple Java virtual machines to process several map and reduce tasks in parallel. The TaskTracker also interacts with the JobTracker: the JobTracker assigns it tasks and monitors its heartbeat, and if the heartbeat stops, the TaskTracker is considered dead and its tasks are reassigned to other TaskTrackers.
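In Hadoop 1.x-era releases, which daemon runs where is controlled by two plain-text files in the conf directory. A minimal sketch (the hostnames master, slave1, and slave2 are placeholders for your own machines):

```
# conf/masters — host(s) that run the Secondary NameNode
master

# conf/slaves — hosts that each run a DataNode and a TaskTracker
slave1
slave2
```

The NameNode and JobTracker themselves are not listed here; their addresses come from fs.default.name in core-site.xml and mapred.job.tracker in mapred-site.xml, and the daemons start on whichever host you run the start scripts.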
To summarize the deployment: NameNode, Secondary NameNode, and JobTracker run on the master node, while each slave node runs a DataNode and a TaskTracker.
You can follow the installation steps in the book, but pay attention to the following points.
On both the master and the slaves, create a dedicated user, grid, to run Hadoop, and set up passwordless SSH login for it: generate a key pair on each machine, then merge the contents of the public key files from all machines into a single authorized_keys file so that ssh no longer prompts for a password.
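The per-node part of this setup can be sketched as follows (run as the grid user on each machine; in a real cluster you would additionally append every other node's id_rsa.pub to the same authorized_keys file, for example with ssh-copy-id, so the master can reach each slave):

```shell
# Make sure the .ssh directory exists before generating keys.
mkdir -p ~/.ssh

# Generate an RSA key pair with an empty passphrase, unless one already exists.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Authorize this node's own key; the other nodes' public keys get
# appended to the same file.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# sshd ignores the file unless permissions are strict.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```

After the keys are distributed, ssh slave1 from the master should log in without a password prompt.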
When starting Hadoop, be sure to log on as the grid user and work in the grid user's home directory. Permission issues sometimes occur; in that case, set the owner of the hadoop folder on the master and slaves to the grid user and group by running, as root: chown -R grid:grid /home/grid/hadoop-1.2.1 (adjust the path to wherever Hadoop is installed).
Then run start-all.sh under the bin directory of the Hadoop folder; the script prints a startup line for each daemon, indicating that the startup succeeded.
You can then check which processes are running. Run the jps tool that ships with the JDK on the master, and the Hadoop daemons appear in the list.
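As a rough sketch of what a healthy cluster looks like on this Hadoop generation (the process IDs shown are illustrative and will differ on your machines):

```
# On the master:
$ jps
2287 NameNode
2425 SecondaryNameNode
2518 JobTracker
2750 Jps

# On each slave:
$ jps
1982 DataNode
2079 TaskTracker
2211 Jps
```

If one of the daemons is missing from the list, check its log file under the Hadoop logs directory before proceeding.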
Run the same command on the slave nodes.
Now, Hadoop has been installed successfully.