Hadoop Learning Notes 1: Introduction to Hadoop

Source: Internet
Author: User
Tags hadoop mapreduce

Hadoop is an Apache project. It consists of HDFS, MapReduce, HBase, Hive, ZooKeeper, and other members. HDFS and MapReduce are the two most basic and important of these.

HDFS is an open-source implementation modeled on Google's GFS. It is a highly fault-tolerant distributed file system that provides high-throughput data access and is suitable for storing massive (PB-level) data sets made up of large files (usually more than 64 MB each). It works as follows:

HDFS uses a master/slave structure. The NameNode maintains the metadata for the cluster and provides operations for creating, opening, deleting, and renaming files and directories. DataNodes store the data and serve the read/write requests for it. Each DataNode periodically sends a heartbeat to the NameNode, and the NameNode controls the DataNode through its replies to those heartbeats.
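The heartbeat exchange described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not Hadoop code: the class ToyNameNode, its method names, the 30-second timeout, and the command strings are all hypothetical and chosen for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: tracks DataNode liveness and queued commands."""
    timeout_s: float = 30.0                        # assumed value, for illustration only
    last_seen: dict = field(default_factory=dict)  # datanode id -> time of last heartbeat
    pending: dict = field(default_factory=dict)    # datanode id -> commands awaiting delivery

    def queue_command(self, node_id, command):
        """Store a command to hand over in the reply to the node's next heartbeat."""
        self.pending.setdefault(node_id, []).append(command)

    def heartbeat(self, node_id, now=None):
        """Record a heartbeat and return the queued commands as the reply."""
        self.last_seen[node_id] = time.monotonic() if now is None else now
        return self.pending.pop(node_id, [])

    def live_nodes(self, now=None):
        """Return the nodes whose last heartbeat falls within the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout_s)


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.heartbeat("dn1", now=0.0)
    nn.heartbeat("dn2", now=0.0)
    nn.queue_command("dn1", "replicate block_7")  # made-up command string
    print(nn.heartbeat("dn1", now=10.0))          # the reply carries the queued command
    print(nn.live_nodes(now=35.0))                # dn2 has been silent past the timeout
```

The point of the model is that the master never calls the slaves directly: it only answers their heartbeats, and a node that stops reporting is eventually treated as dead.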

InfoWorld named MapReduce one of the top 10 emerging technologies of 2009. MapReduce is a powerful tool for large-scale (TB-level) data computing. Its two main ideas, map and reduce, are borrowed from functional programming languages, as shown in the following figure:

[Figure not preserved in the source]

 

Map is used to scatter the data and reduce is used to aggregate it. You only need to implement the map and reduce interfaces to run computations over TB-level data. Common applications include log analysis, data mining, and other data analysis work. It can also be used for scientific computing, such as calculating the value of pi.
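As a minimal sketch of the map and reduce idea, the following pure-Python word count runs on a single machine. It does not use the Hadoop API; the function names map_fn, reduce_fn, and run_mapreduce and the sample log lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_fn(line):
    """Map step: scatter one input line into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step: aggregate all values collected for one key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Single-process driver: map every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    logs = ["error disk full", "warn disk slow", "error net down"]  # made-up sample input
    print(run_mapreduce(logs, map_fn, reduce_fn))
```

Only map_fn and reduce_fn are specific to the problem. In a real cluster the grouping step in the middle is carried out by the framework across many machines, which is why the user only has to supply the two functions.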

The Hadoop implementation of MapReduce also adopts a master/slave structure. The master is called the JobTracker, and the slaves are called TaskTrackers.

A computation submitted by the user is called a job. Each job is divided into several tasks. The JobTracker is responsible for scheduling jobs and tasks, while the TaskTrackers are responsible for executing the tasks.
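The relationship between a job, its tasks, and the trackers can be sketched as follows. This is a toy illustration, not Hadoop's scheduler: the helper names split_job and schedule, the fixed task size, and the round-robin assignment are all assumptions made to keep the example short.

```python
from itertools import cycle


def split_job(records, task_size):
    """Divide a job's input into tasks of at most task_size records each."""
    return [records[i:i + task_size] for i in range(0, len(records), task_size)]


def schedule(tasks, trackers):
    """Assign task ids to trackers round-robin (an assumption, not Hadoop's real policy)."""
    assignment = {tracker: [] for tracker in trackers}
    for task_id, tracker in zip(range(len(tasks)), cycle(trackers)):
        assignment[tracker].append(task_id)
    return assignment


if __name__ == "__main__":
    job_input = list(range(10))               # made-up job input of ten records
    tasks = split_job(job_input, task_size=4)
    print(tasks)
    print(schedule(tasks, ["tt1", "tt2"]))    # hypothetical TaskTracker names
```

Here split_job and schedule play the role the text assigns to the JobTracker, while each list in the resulting assignment is the work one TaskTracker would execute.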

 

 

 
