Hadoop: The Definitive Guide, Chapter 2: MapReduce

Source: Internet
Author: User

MapReduce

MapReduce is a programming model for data processing. The model is simple, yet not too simple to express useful programs in. Hadoop can run MapReduce programs written in various languages; in this chapter, we shall look at the same program expressed in Java, Ruby, Python, and C++. Most important, MapReduce programs are inherently parallel, thus putting very large-scale data analysis into the hands of anyone with enough machines at their disposal. MapReduce comes into its own for large datasets, so let's start by looking at one.

2.1 Analyzing the Data with Hadoop

To take advantage of the parallel processing that Hadoop provides, we need to express our query as a MapReduce job. After some local, small-scale testing, we will be able to run it on a cluster of machines.

In other words, a query must be written as a MapReduce job before Hadoop can process it in parallel; once the job passes a small local test, it can be run on a cluster.

2.2 Map and Reduce

MapReduce works by breaking the processing into two phases: the map phase and the reduce phase.

Each phase has key-value pairs as input and output, the types of which may be chosen by the programmer. The programmer also specifies two functions: the map function and the reduce function.

Both the map function and the reduce function take key-value pairs as input and produce key-value pairs as output.
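
As a rough illustration of the two functions, the following Python sketch simulates both phases in memory. The word-count task, the names map_fn, reduce_fn, and run_mapreduce, and the sample records are all hypothetical and are not part of Hadoop's API. In a real job the framework groups the map output by key before the reduce phase; here that step is approximated with a dictionary.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: (line offset, line text) -> zero or more (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: (word, list of counts) -> a single (word, total) pair."""
    yield key, sum(values)


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver; it only mimics the two phases."""
    # Map phase: apply map_fn to every input pair and group its
    # output values by output key.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)

    # Reduce phase: call reduce_fn once per key, in sorted key order.
    results = []
    for key in sorted(grouped):
        results.extend(reduce_fn(key, grouped[key]))
    return results


# Input pairs: the key is a made-up line offset, the value is the line.
records = [(0, "the quick brown fox"), (20, "the lazy dog")]
word_counts = run_mapreduce(records, map_fn, reduce_fn)
print(word_counts)
```

The input and output types differ between the phases: map_fn receives (integer, string) pairs and emits (string, integer) pairs, which is the kind of choice the text leaves to the programmer.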

 

2.3 Scaling Out

Data Flow

A MapReduce job is a unit of work that the client wants to be performed: it consists of the input data, the MapReduce program, and configuration information. Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.

A job is a unit of work that the client wants executed. It consists of the input data, the MapReduce program, and configuration information.
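
The relationship between a job and its tasks can be sketched in Python as follows. This is a simplified model, not Hadoop's implementation: the Job class, the plan_tasks function, and the configuration keys records_per_split and num_reduce_tasks are hypothetical names chosen for the example, and creating one map task per fixed-size slice of the input is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    """The three parts of a job named in the text."""
    input_records: list                          # the input data
    map_fn: Callable                             # the MapReduce program:
    reduce_fn: Callable                          # a map and a reduce function
    config: dict = field(default_factory=dict)   # configuration information


def plan_tasks(job):
    """Divide a job into map tasks and reduce tasks (illustrative only)."""
    split_size = job.config.get("records_per_split", 2)   # hypothetical key
    num_reducers = job.config.get("num_reduce_tasks", 1)  # hypothetical key

    records = job.input_records
    # One map task per slice of the input: ("map", task index, slice).
    map_tasks = [
        ("map", start // split_size, records[start:start + split_size])
        for start in range(0, len(records), split_size)
    ]
    # Reduce tasks are only numbered here; they receive no data yet.
    reduce_tasks = [("reduce", index) for index in range(num_reducers)]
    return map_tasks, reduce_tasks


job = Job(
    input_records=["a b", "b c", "c d", "d e", "e f"],
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=lambda word, counts: (word, sum(counts)),
    config={"records_per_split": 2, "num_reduce_tasks": 2},
)
map_tasks, reduce_tasks = plan_tasks(job)
print(len(map_tasks), "map tasks and", len(reduce_tasks), "reduce tasks")
```

With five input records and two records per slice, the sketch plans three map tasks, the last of which holds the single leftover record.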

 
