What is Hadoop?

Source: Internet
Author: User

What is Hadoop? Hadoop is a software platform for developing and running applications that process large-scale data. It is an Apache open source framework, written in Java, that performs distributed computing on massive data sets across clusters of many computers.

The core of the Hadoop framework's design is HDFS and MapReduce. HDFS provides storage for massive amounts of data, and MapReduce provides computation over that data.

Data processing in Hadoop can be understood simply in terms of the following figure: data goes into the Hadoop cluster, is processed there, and the results come out.

HDFS: the Hadoop Distributed File System.

A large file is divided into data blocks, 64 MB each by default (the Hadoop 1.x default; later releases use 128 MB), and the blocks are distributed across the machines in the cluster for storage.

In the following figure, the file data1 is divided into 3 blocks, and redundant copies of each block are stored on different machines.
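The splitting and replication described above can be sketched in a few lines of Python. This is a toy model rather than the Hadoop API: the function names, the 150 MB file size, the machine names, the replication factor of 3, and the round-robin placement are all assumptions made for illustration.

```python
# Toy model of HDFS block splitting and replica placement (not the Hadoop API).
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size cited above


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs covering a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks, machines, replication=3):
    """Assign each block to `replication` distinct machines, round-robin.

    Round-robin is an assumption for illustration only; it is not the
    placement policy HDFS actually uses.
    """
    if replication > len(machines):
        raise ValueError("need at least as many machines as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            machines[(block_id + r) % len(machines)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    # A hypothetical 150 MB file splits into two full blocks and one partial block.
    blocks = split_into_blocks(150 * 1024 * 1024)
    print(blocks)
    print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
```

Because every block lives on more than one machine, losing a single machine does not lose any part of the file.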

MapReduce: Hadoop creates one map task for each input split. The task calls the map function on each record in the split, in order, and map emits its results as key-value pairs. Hadoop then groups the map output by key and passes it to reduce as input. The output of the reduce tasks is the output of the entire job and is stored on HDFS.
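The map, group-by-key, and reduce flow can be simulated in plain Python without Hadoop. The following is a minimal in-memory sketch, assuming word counting as the job. The names map_fn, reduce_fn, and run_job are hypothetical and are not part of Hadoop's API, and a real job would read its splits from HDFS and write its output back there.

```python
from collections import defaultdict


def map_fn(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: sum all counts collected for one key."""
    return key, sum(values)


def run_job(splits):
    """Simulate one job: map every split, group by key, then reduce."""
    # One "map task" per input split; each processes its records in order.
    map_output = []
    for split in splits:
        for record in split:
            map_output.extend(map_fn(record))

    # The framework's part: collect all values emitted for the same key.
    grouped = defaultdict(list)
    for key, value in map_output:
        grouped[key].append(value)

    # Each key and its list of values is handed to reduce, in key order.
    return [reduce_fn(key, grouped[key]) for key in sorted(grouped)]


if __name__ == "__main__":
    # Two hypothetical input splits, each a list of text records.
    splits = [
        ["hadoop stores data", "hadoop processes data"],
        ["data is split into blocks"],
    ]
    for word, count in run_job(splits):
        print(word, count)
```

The user supplies only map_fn and reduce_fn. Everything in run_job stands in for work the framework does on the user's behalf.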

A Hadoop cluster is mainly composed of the NameNode, DataNodes, the Secondary NameNode, the JobTracker, and TaskTrackers.

As shown in the following illustration:

The NameNode records how files are split into blocks and which DataNodes store those blocks.

The NameNode also holds state information about the running file system.

The DataNodes store the blocks that the files are split into.
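The NameNode's bookkeeping role can be pictured as two lookup tables: file to blocks, and block to DataNodes. The sketch below is a toy illustration under that assumption. ToyNameNode, its methods, and the block and node names are hypothetical and do not reflect Hadoop's real classes or on-disk metadata format.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB blocks, as in the text above


class ToyNameNode:
    """Keeps metadata only; the block contents live on the DataNodes."""

    def __init__(self):
        self.file_blocks = {}      # file name -> ordered list of block ids
        self.block_locations = {}  # block id -> list of DataNode names

    def add_file(self, name, blocks):
        """Register a file. blocks: list of (block_id, [datanode, ...]) in file order."""
        self.file_blocks[name] = [block_id for block_id, _ in blocks]
        for block_id, datanodes in blocks:
            self.block_locations[block_id] = list(datanodes)

    def locate(self, name, offset):
        """Return (block_id, datanodes) for the block containing byte `offset`.

        The toy does not track exact file length, so only whole-block
        bounds are checked.
        """
        block_ids = self.file_blocks[name]
        index = offset // BLOCK_SIZE
        if offset < 0 or index >= len(block_ids):
            raise IndexError("offset is outside the file's blocks")
        block_id = block_ids[index]
        return block_id, self.block_locations[block_id]


if __name__ == "__main__":
    namenode = ToyNameNode()
    namenode.add_file("data1", [
        ("blk_1", ["node1", "node2"]),
        ("blk_2", ["node2", "node3"]),
        ("blk_3", ["node3", "node1"]),
    ])
    # A client asks where byte 70 MB of data1 lives, then reads from a DataNode.
    print(namenode.locate("data1", 70 * 1024 * 1024))
```

Keeping only metadata on the NameNode means a client asks it where a block lives and then fetches the data from a DataNode.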

The Secondary NameNode helps the NameNode collect state information about the running file system.

The JobTracker is responsible for executing a job once it is submitted to the Hadoop cluster, and for scheduling work across the TaskTrackers.

Each TaskTracker is responsible for running map or reduce tasks.
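The division of labor between the JobTracker and the TaskTrackers can be illustrated with a toy scheduler. This sketch assumes plain round-robin assignment, which is a deliberate simplification and not Hadoop's actual scheduling logic. The function name schedule and the task and tracker names are hypothetical.

```python
from collections import defaultdict


def schedule(tasks, tasktrackers):
    """Toy JobTracker: hand out tasks to TaskTrackers in round-robin order."""
    if not tasktrackers:
        raise ValueError("no TaskTrackers available")
    assignment = defaultdict(list)
    for i, task in enumerate(tasks):
        tracker = tasktrackers[i % len(tasktrackers)]
        assignment[tracker].append(task)
    return dict(assignment)


if __name__ == "__main__":
    # A hypothetical job with three map tasks and one reduce task.
    job = ["map-0", "map-1", "map-2", "reduce-0"]
    for tracker, tasks in schedule(job, ["tt1", "tt2"]).items():
        print(tracker, tasks)
```

A real scheduler weighs more than task order, but the shape is the same: one coordinator decides where each task runs, and the workers run the tasks.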
