Hadoop installation memo


This memo follows Liu Peng's "Practical Hadoop" and its instructions, which are written for Hadoop 0.20.2; the installation below uses Hadoop 1.2.1.


First, understand the roles of Hadoop's background processes: NameNode, Secondary NameNode, JobTracker, TaskTracker, and DataNode.

NameNode: responsible for tracking how files are split into data blocks and which nodes each block is stored on. It centrally manages the file system metadata and is memory- and I/O-intensive.

This process is deployed on the master node. It is a single point of failure: if the NameNode goes down, the entire system is down.

Secondary NameNode: not a second NameNode but a helper. Each cluster has one; it communicates with the NameNode and regularly stores snapshots of the HDFS metadata. If the NameNode fails, these snapshots can be used to restore it. It is also deployed on the master node.

JobTracker: responsible for scheduling jobs. It determines which nodes process which files and listens for the heartbeats sent by the TaskTrackers; if a task fails, or a TaskTracker's heartbeat stops arriving, the task is restarted on another node. Each cluster has only one JobTracker, deployed on the master node.

The preceding three processes are all deployed on the master node, while the TaskTracker and DataNode processes are deployed on the slave nodes across the cluster.

DataNode: reads and writes HDFS data blocks on the local file system. When a client reads or writes a data block, the NameNode first tells it which DataNode to contact; the client then communicates directly with that DataNode server to operate on the block.
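To make this read/write path concrete, here is a minimal sketch using the HDFS shell; the file paths are hypothetical:

    # Write: the client asks the NameNode where to place the blocks,
    # then streams the data to the chosen DataNodes.
    bin/hadoop fs -put /home/grid/notes.txt /user/grid/notes.txt

    # Read: the NameNode returns the block locations, and the client
    # fetches the blocks directly from those DataNodes.
    bin/hadoop fs -cat /user/grid/notes.txt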

TaskTracker: also located on the slave nodes, it is responsible for independently executing individual tasks. Each slave node runs only one TaskTracker, but a TaskTracker can spawn multiple Java virtual machines to process multiple map and reduce tasks in parallel. The TaskTracker also interacts with the JobTracker: the JobTracker assigns tasks and monitors the TaskTracker's heartbeat, and if no heartbeat arrives, the TaskTracker is considered to have crashed and its tasks are reassigned to other TaskTrackers.
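How many map and reduce JVMs a TaskTracker runs in parallel is bounded by its slot settings. A sketch of the relevant properties in conf/mapred-site.xml on each slave (the values here are illustrative, not a recommendation):

    <!-- inside <configuration> in conf/mapred-site.xml; illustrative values -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>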

The deployment diagram (omitted in this text version) shows the NameNode, Secondary NameNode, and JobTracker on the master node, and a DataNode plus a TaskTracker on each slave node.

You can follow the installation steps from the book, but pay attention to the points below.
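For orientation, a minimal Hadoop 1.2.1 configuration (the same on all nodes) looks roughly like this; the hostname master and the replication factor are assumptions to adapt to your cluster, and conf/masters and conf/slaves should list your actual hostnames:

    <!-- conf/core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
      </property>
    </configuration>

    <!-- conf/hdfs-site.xml -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>

    <!-- conf/mapred-site.xml -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
      </property>
    </configuration>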

Create a dedicated user, grid, for running Hadoop on the master and the slaves, and set up a password-free SSH login mechanism: collect the public keys from all machines into a single authorized_keys file on each machine so that ssh logins need no password.
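A sketch of the key setup, assuming the grid user, default key locations, and a slave named slave1 (a placeholder hostname):

    # On each machine, as grid: generate a key pair with an empty passphrase
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

    # On the master: merge all public keys into one authorized_keys file...
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    scp grid@slave1:~/.ssh/id_rsa.pub /tmp/slave1.pub
    cat /tmp/slave1.pub >> ~/.ssh/authorized_keys

    # ...then push that file to every machine and fix its permissions
    scp ~/.ssh/authorized_keys grid@slave1:~/.ssh/
    chmod 600 ~/.ssh/authorized_keys

    # Test: this should print the date without prompting for a password
    ssh slave1 date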

When starting Hadoop, be sure to log in as the grid user and perform the operations in the grid user's home directory. If permission issues occur, set the owner of the Hadoop folder on the master and the slaves to the grid user and group: run chown -R grid:grid /home/grid/hadoop-1.2.1 (this is the directory where Hadoop is placed; adjust the path as needed, and run the command as root).
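A sketch of applying that ownership fix to all machines at once, assuming root SSH access and the placeholder hostnames master, slave1, and slave2:

    # Run from any machine with root SSH access to the others
    for host in master slave1 slave2; do
      ssh root@$host 'chown -R grid:grid /home/grid/hadoop-1.2.1'
    done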

Then run start-all.sh under the bin directory of the Hadoop folder; it prints a line for each daemon it starts on the master and slaves, which indicates the startup succeeded.
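A sketch of the startup sequence, assuming Hadoop is unpacked at /home/grid/hadoop-1.2.1; note that HDFS must be formatted once (and only once) before the first start:

    cd /home/grid/hadoop-1.2.1
    # First start only: format the HDFS namespace on the master
    bin/hadoop namenode -format
    # Start the HDFS and MapReduce daemons on the master and all slaves
    bin/start-all.sh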

Now you can check that the processes actually started. Run the jps tool from the JDK on the master, and you should see the three master daemons plus jps itself.
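Typical output on the master looks like this (the PIDs are illustrative):

    $ jps
    4530 NameNode
    4712 SecondaryNameNode
    4801 JobTracker
    4933 Jps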

Run the same command on the slave node.
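There you should see the two worker daemons instead (PIDs again illustrative):

    $ jps
    3187 DataNode
    3296 TaskTracker
    3410 Jps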

Now, Hadoop has been installed successfully.
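As a final check beyond jps, you can run the wordcount example that ships with the 1.2.1 distribution; the HDFS paths here are hypothetical:

    # Put some text into HDFS and run the bundled example job
    bin/hadoop fs -mkdir /user/grid/in
    bin/hadoop fs -put conf/*.xml /user/grid/in
    bin/hadoop jar hadoop-examples-1.2.1.jar wordcount /user/grid/in /user/grid/out
    bin/hadoop fs -cat /user/grid/out/part-r-00000

You can also browse the NameNode web UI on port 50070 and the JobTracker web UI on port 50030 of the master.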

