Running WordCount on a Hadoop Cluster

Source: Internet
Author: User

1. Introduction to MapReduce Theory

1.1. MapReduce Programming Model

MapReduce follows the idea of "divide and conquer": it distributes the processing of a large data set across the nodes managed by a master node, and then obtains the final result by consolidating the intermediate results produced by each node. In short, MapReduce is "the decomposition of tasks and the aggregation of results".
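The decompose-then-aggregate pattern can be sketched in a few lines of plain Python. This is a single-process illustration, not Hadoop code: the function names `split_into_chunks`, `process_chunk` and `consolidate` are hypothetical, and a partial sum stands in for whatever work a real node would do on its share of the data.

```python
from typing import List


def split_into_chunks(data: List[int], num_chunks: int) -> List[List[int]]:
    """Decomposition: divide the input into roughly equal chunks."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk: List[int]) -> int:
    """Work one node would do independently on its chunk (here, a partial sum)."""
    return sum(chunk)


def consolidate(partials: List[int]) -> int:
    """Aggregation: merge the intermediate results into the final answer."""
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1, 101))
    partials = [process_chunk(c) for c in split_into_chunks(data, num_chunks=4)]
    print(partials)               # [325, 950, 1575, 2200]
    print(consolidate(partials))  # 5050
```

Each chunk is handled without looking at any other chunk, which is what lets the partial results be computed on different machines and merged afterwards.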

In classic Hadoop MapReduce (Hadoop 1.x), two machine roles carry out MapReduce tasks: the JobTracker and the TaskTracker. The JobTracker schedules the work, and the TaskTrackers perform it. A Hadoop cluster has only one JobTracker.
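To make the division of labor concrete, here is a toy, single-process model of the two roles. `JobTracker`, `TaskTracker`, `submit` and `run` are hypothetical Python classes and methods, not Hadoop's API, and the scheduling is plain round-robin; the real JobTracker is more involved (for example, it prefers to run map tasks close to their input data).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskTracker:
    """Hypothetical worker: executes whatever tasks it is handed."""
    name: str
    completed: List[str] = field(default_factory=list)

    def run(self, task_id: str, fn: Callable[[], int]) -> int:
        self.completed.append(task_id)  # record which tasks this worker ran
        return fn()


class JobTracker:
    """Hypothetical single coordinator: decides which tracker runs each task."""

    def __init__(self, trackers: List[TaskTracker]) -> None:
        if not trackers:
            raise ValueError("need at least one TaskTracker")
        self.trackers = trackers

    def submit(self, tasks: Dict[str, Callable[[], int]]) -> Dict[str, int]:
        results: Dict[str, int] = {}
        # Simplified round-robin assignment; ignores data locality and load.
        for i, (task_id, fn) in enumerate(tasks.items()):
            tracker = self.trackers[i % len(self.trackers)]
            results[task_id] = tracker.run(task_id, fn)
        return results


if __name__ == "__main__":
    jt = JobTracker([TaskTracker("tt1"), TaskTracker("tt2")])
    tasks = {f"task{i}": (lambda i=i: i * i) for i in range(5)}
    print(jt.submit(tasks))
    # {'task0': 0, 'task1': 1, 'task2': 4, 'task3': 9, 'task4': 16}
    print([t.completed for t in jt.trackers])
    # [['task0', 'task2', 'task4'], ['task1', 'task3']]
```

The single `JobTracker` only decides where work goes; all computation happens inside `TaskTracker.run`, mirroring the scheduling/execution split described above.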

In distributed computing, the MapReduce framework takes care of the hard problems of parallel programming, such as distributed storage, job scheduling, load balancing, fault tolerance and network communication, and abstracts the processing into two functions: map and reduce. Map splits a task into multiple smaller tasks, and reduce aggregates the results of those decomposed tasks.
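For WordCount, map emits a `(word, 1)` pair for every word, the framework groups the pairs by word (the "shuffle"), and reduce sums each group. The sketch below reproduces that data flow in one Python process; it does not use Hadoop, and the names `map_fn`, `shuffle`, `reduce_fn` and `word_count` are hypothetical. In Hadoop's Java API the same roles are played by `Mapper` and `Reducer` subclasses.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: sum all the 1s emitted for the same word."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_fn(line))
    grouped = shuffle(intermediate)
    # Keys are processed in sorted order, similar to how reducers see sorted keys.
    return dict(reduce_fn(w, c) for w, c in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["hello world", "hello hadoop"]))
    # {'hadoop': 1, 'hello': 2, 'world': 1}
```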

Note that a data set (or task) processed with MapReduce must have one key property: it can be decomposed into many small data sets, each of which can be processed fully in parallel.
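When the splits are truly independent, processing them concurrently must give the same answer as processing them one after another. The sketch below checks this with word counts; `count_split` and `parallel_count` are hypothetical names, and threads from `concurrent.futures.ThreadPoolExecutor` are used only to keep the example simple (for CPU-bound work in CPython, `ProcessPoolExecutor` would be the usual choice because of the GIL).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def count_split(split: List[str]) -> Counter:
    """Process one small data set on its own; it needs nothing from other splits."""
    counts: Counter = Counter()
    for line in split:
        counts.update(line.split())
    return counts


def parallel_count(splits: List[List[str]], workers: int = 4) -> Counter:
    # Each split goes to a separate worker. Because the splits are independent,
    # the order in which workers finish does not affect the merged result.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_split, splits))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    splits = [["a b a"], ["b c"], ["a c c"]]
    sequential = count_split([line for s in splits for line in s])
    print(parallel_count(splits) == sequential)  # True
```

A data set where one record's result depends on another record in a different split would break this equivalence, which is why the decomposability requirement matters.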
