Hadoop Installation Memo


Refer to Liu Peng's book "Hadoop in Action" (实战Hadoop), which targets Hadoop 0.20.2; the notes below record a few points of attention from that installation procedure.

First, an overview of the background daemons in Hadoop: the NameNode, Secondary NameNode, JobTracker, TaskTracker, and DataNode.

NameNode: decides how files are split into data blocks and which nodes each block is placed on, and centrally manages the HDFS metadata; it is memory- and I/O-intensive.

This daemon is deployed on the master node and is a single point of failure: if it goes down, the whole cluster goes down.

Secondary NameNode: an auxiliary to the NameNode, one per cluster. It communicates with the NameNode and periodically saves snapshots of the HDFS metadata; when the NameNode fails, these snapshots can be used to help recover it (note that it is not a hot standby). It is also deployed on the master node.

JobTracker: responsible for scheduling jobs. It decides which nodes process which files, and listens for heartbeats sent by the TaskTrackers; when a TaskTracker's heartbeat stops arriving, its tasks are considered failed and are rescheduled elsewhere. There is only one JobTracker per cluster, and it is deployed on the master node.

The three daemons above are deployed on the master node, while a TaskTracker and a DataNode must be deployed on every slave node in the cluster.

DataNode: reads and writes HDFS data blocks on the node's local file system. When a client reads or writes a data block, the NameNode tells it which DataNode to use, and the client then communicates directly with that DataNode to operate on the block.

TaskTracker: also runs on the slave nodes and is responsible for executing individual tasks. Each slave node runs exactly one TaskTracker, but a TaskTracker can spawn multiple Java virtual machines to process map and reduce tasks in parallel. The TaskTracker also interacts with the JobTracker: the JobTracker assigns tasks and monitors TaskTracker heartbeats; if a heartbeat stops, that TaskTracker is considered to have crashed and its tasks are reassigned to other TaskTrackers.

The deployment diagram for each process is as follows:

[Figure: deployment diagram of the Hadoop daemons across the master and slave nodes]
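In Hadoop 1.x this master/slave layout is driven by two files under conf/. A minimal sketch for one master and two slaves is shown below; the hostnames master, slave1, and slave2 are placeholders, and the NameNode and JobTracker start on whichever host runs start-all.sh.

```
# conf/masters — host that runs the Secondary NameNode
master

# conf/slaves — hosts that each run a DataNode and a TaskTracker
slave1
slave2
```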

The detailed installation steps can be found in the reference linked below, but a few points deserve attention.

On both the master and the slaves, create a dedicated grid user to run Hadoop, and set up passwordless SSH login between the nodes (see http://chenlb.iteye.com/blog/211809 for reference). Merge the contents of every machine's public key file into a single authorized_keys file so that all the nodes can SSH into each other without a password.
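The key-merging step above can be sketched as follows. This is written against a scratch directory so it is safe to run on one machine; on a real cluster, replace SSH_DIR with ~/.ssh of the grid user, run the keygen step on every node, and append every node's public key to the shared authorized_keys file before copying it to all nodes.

```shell
# Scratch stand-in for the grid user's ~/.ssh directory (assumption for the sketch).
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"

# Each node generates its own key pair with an empty passphrase.
ssh-keygen -t rsa -N "" -f "$SSH_DIR/id_rsa" -q

# Collect the public key(s) of all nodes into one authorized_keys file;
# on the cluster this file is then distributed to every node.
cat "$SSH_DIR"/*.pub >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
```

sshd refuses keys when authorized_keys is writable by others, which is why the chmod 600 step matters.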

When starting Hadoop, be sure to log in as the grid user and work from the grid user's home directory. Permissions matter: on both the master and the slaves, the owner of the Hadoop directory must be set to the grid user and group. As root, run chown -R grid:grid /home/grid/hadoop-1.2.1 (replace the path with wherever Hadoop is installed).
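The ownership fix can be sketched like this. On the real cluster the command is run as root as chown -R grid:grid /home/grid/hadoop-1.2.1; here a scratch directory and the current user stand in so the sketch runs without root.

```shell
# Scratch stand-in for the Hadoop install directory (assumption for the sketch).
HADOOP_DIR="$(mktemp -d)/hadoop-1.2.1"
mkdir -p "$HADOOP_DIR/bin" "$HADOOP_DIR/conf"
touch "$HADOOP_DIR/conf/core-site.xml"

# Recursively hand the whole tree to the user and group that will run Hadoop
# (grid:grid on the real nodes; the current user here).
chown -R "$(id -un):$(id -gn)" "$HADOOP_DIR"

# Verify who owns the tree now.
stat -c '%U' "$HADOOP_DIR"
```

The -R flag is the important part: the start scripts write logs and pids under the install tree, so every file and subdirectory must belong to the grid user, not just the top-level folder.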

You can then run start-all.sh from the bin directory of the Hadoop folder; output like the following indicates that the startup succeeded.

[Figure: console output of bin/start-all.sh]

At this point you can also check which processes have started: run the JDK's jps tool on the master, and you should see something like the following.

[Figure: jps output on the master node]

Running the same command on a slave node shows:

[Figure: jps output on a slave node]

At this point, the installation of Hadoop has been successful.
