Hadoop MapReduce Analysis

Tags: hadoop, mapreduce

Abstract: MapReduce is a core module of Hadoop. This article approaches MapReduce through three questions: what MapReduce is, what MapReduce can do, and how MapReduce works.

Keywords: Hadoop, MapReduce, distributed processing

In the face of big data, storage and processing go hand in hand, and both are critically important. Hadoop is well suited to big data problems because of its two core modules: HDFS, its big data storage system, and MapReduce, its big data processing system. For more information about HDFS, see the author's article on Hadoop HDFS. Here we approach MapReduce through the following three questions.

Question 1: What is MapReduce?

Question 2: What can MapReduce do?

Question 3: How does MapReduce work?

For the first question, we can refer to the Apache Foundation's own introduction: "Hadoop MapReduce is a software framework for easily writing applications which process vast amounts of data (multi-terabyte data-sets) in-parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner." From this we can see that Hadoop MapReduce is a software framework on top of which applications can be easily written; those applications run on large clusters of thousands of commodity machines and process terabyte-scale data sets in parallel in a reliable, fault-tolerant manner. The definition contains these keywords: software framework, parallel processing, reliability and fault tolerance, large-scale clusters, and massive data sets. So for MapReduce, we can simply say that it is a software framework, massive data is its "dish", and it "cooks" that dish concurrently, reliably, and fault-tolerantly on a large cluster. Here one can only admire the power of the idea: the magic of decomposition and the cleverness of recombination.

Knowing what MapReduce is, the second question becomes clear: what can MapReduce do? In short, it supports big data processing: the value-oriented processing, mining, and optimization of large data sets.

MapReduce is good at processing big data. Where does this capability come from? The answer lies in its design idea: "divide and conquer". The Mapper is responsible for dividing a complex task into several simple tasks. "Simple" has three meanings here: first, the data or computing scale of each task is greatly reduced compared with the original; second, by the principle of data locality, tasks are assigned to the nodes that store the data they need; third, these small tasks can be computed in parallel, with almost no dependencies between them. The Reducer then summarizes the results of the map stage. As for how many reducers are needed, you can set the mapred.reduce.tasks parameter in the mapred-site.xml configuration file according to the problem at hand; the default value is 1. The word-count sketch below makes this division of labor concrete.
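As an illustration, here is the canonical word-count example written against the org.apache.hadoop.mapreduce API. This is a minimal sketch; the class names are ours, not part of any article the author cites.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: break one line of text into (word, 1) pairs -- a "simple task"
// over a small slice of the data, independent of every other map task.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce: sum the counts for each word; the framework has already
// grouped and sorted the map outputs by key before this is called.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        total.set(sum);
        context.write(key, total);
    }
}
```

Each map call sees only one record of one input split, so thousands of map tasks can run in parallel on different nodes; the framework then groups the (word, 1) pairs by key and hands each group to a reducer.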

How, then, does MapReduce handle big data in practice? You write MapReduce applications that perform the desired operations on the data. But how does a MapReduce program actually run? That is the third question: the working mechanism of MapReduce.

The classic MapReduce workflow (the original figure is not reproduced here) involves the following four independent entities.

Entity 1: the client, which submits MapReduce jobs.

Entity 2: the JobTracker, which coordinates the running of a job.

Entity 3: the TaskTracker, which runs the tasks that the job has been divided into.

Entity 4: HDFS, which shares job files among the other entities.

Walking through this workflow, the entire MapReduce job lifecycle consists of the following steps, carried out in order.

Step 1: Job submission

Step 2: Job initialization

Step 3: Task assignment

Step 4: Task execution

Step 5: Progress and status updates

Step 6: Job completion

For details about what happens in each step, see Chapter 6, "How MapReduce Works", in Hadoop: The Definitive Guide. A minimal driver illustrating Step 1 is sketched below.
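For concreteness, here is a minimal job driver that performs Step 1, submitting the word-count job sketched earlier. The input and output paths are illustrative, and Job.getInstance is the Hadoop 2.x form; this is a sketch, not the article's own code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Overrides the mapred.reduce.tasks setting (default 1).
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
        // Submits the job (Step 1) and polls its progress and status
        // until completion (Steps 5-6).
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Everything between submission and completion, including job initialization, task assignment to TaskTrackers, and task execution, is handled by the framework itself.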

If you want to use MapReduce to handle big data, you need to write MapReduce applications tailored to your needs. Developing programs on the framework therefore requires careful thought and constant practice.

