MapReduce Learning Guide and Troubleshooting Summary
1. The idea behind it: To learn MapReduce, we start with the idea itself. Most ideas, however abstract, come from everyday life, and we understand best what happens around us. The following article therefore explains what MapReduce is from an everyday perspective.
Introduction to Hadoop (1): What is Map/reduce
2. Design Ideas
Having understood the idea, we now need something concrete that we can see and touch: how is this idea implemented, and how should MapReduce be designed? So let's look at its design.
The following post explains the design with a single diagram.
Diagram of the overall MapReduce working mechanism
MapReduce is the core of Hadoop, and it is what gives Hadoop its distributed computing capability. A single diagram may not be clear or detailed enough, so we also need to understand the principles behind it:
How MapReduce works, explained
3. Model implementation
After the articles above we may have formed a view of our own, but we may still not understand MapReduce very well. So let's look at the programming model to deepen that understanding.
Overview of the MapReduce programming model
MapReduce programming model
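To make the programming model concrete, here is a minimal single-process sketch in Python, not Hadoop code. It assumes the usual shape of the model: a map function emits key/value pairs, the pairs are grouped by key, and a reduce function is called once per key. The names `map_fn`, `reduce_fn`, and `run_job` are made up for this illustration.

```python
from collections import defaultdict

def map_fn(_key, line):
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Reduce: sum all the counts collected for one word."""
    yield word, sum(counts)

def run_job(records, mapper, reducer):
    """Run map, group by key (a stand-in for shuffle), then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    results = []
    for out_key in sorted(groups):  # keys reach the reducer in sorted order
        results.extend(reducer(out_key, groups[out_key]))
    return results

# Hypothetical input: (line number, line text) pairs.
lines = [(0, "hello hadoop"), (1, "hello mapreduce")]
word_counts = run_job(lines, map_fn, reduce_fn)
print(word_counts)
```

In a real cluster the grouping step is spread across machines, but the contract of the two functions stays the same.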
4. Questions that arise
After reading the articles above, a number of terms and concepts will have entered our minds:
map, reduce, task, job, shuffle, partition, combiner. These can be confusing.
Beyond the terms themselves, we are left with the following questions:
What determines the number of map tasks, and how is it calculated?
What determines the number of reduce tasks, and how is it calculated?
In short, the number of map tasks is determined by the number of input splits, and the number of reduce tasks matches the number of partitions, which is set in the job configuration.
For details, see:
How to determine the number of Hadoop map and reduce tasks, and how the two numbers relate.
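As a rough illustration of "the number of maps is determined by split", the sketch below counts the splits for one non-empty, splittable file. It is modelled on the rule commonly described for Hadoop's file-based input formats (split size = max(min size, min(max size, block size)), with a 10% tolerance on the last split). The exact behaviour depends on the input format and Hadoop version, so treat the constants and function names here as assumptions.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed tolerance: the last split may be up to 10% larger

def compute_split_size(block_size, min_size, max_size):
    """Clamp the block size between the configured minimum and maximum."""
    return max(min_size, min(max_size, block_size))

def count_splits(file_size, split_size):
    """Count the splits (and so the map tasks) for one non-empty file."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the tail becomes the final split
    return splits

split_size = compute_split_size(block_size=128 * MB, min_size=1, max_size=2**63 - 1)
print(count_splits(300 * MB, split_size))  # 128 + 128 + 44 MB
print(count_splits(130 * MB, split_size))  # within the 10% tolerance
```

The number of reduce tasks is not derived from the data in this way. It is a job setting (in Hadoop, `Job.setNumReduceTasks`), and the partitioner then spreads the keys over that many partitions.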
--------------------------------------------------------------------------------------------------
What is shuffle?
What is partition?
What is a combiner?
How do the three relate to each other?
MapReduce is the core of Hadoop, and shuffle is the core of MapReduce. Personally, I think of shuffle as a dynamic process that can include combine, merge, and other steps. Because shuffle is so central, many write-ups try to be comprehensive and cover combiner, merge, and sort all at once. That is not wrong, but it easily gives beginners the false impression that every one of these steps must be present. In fact, some of them, such as the combiner, are used only when the job needs them.
A thorough understanding of shuffle, the core of MapReduce: answers to various MapReduce questions
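The relationship between partition and shuffle can be sketched in a few lines of Python. This is a simplified single-process model, not Hadoop's implementation: `partition` uses `zlib.crc32` as a stand-in for the key's hash code (Hadoop's default partitioner is also hash-modulo based), and `shuffle` is a hypothetical helper that routes, sorts, and groups the pairs. No combiner appears here, which matches the point that it is optional.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    """Pick a reducer for a key; crc32 stands in for the key's hash code."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair to a partition, then sort and group by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:  # one list of pairs per map task
        for key, value in pairs:
            partitions[partition(key, num_reducers)][key].append(value)
    # Each reducer receives its keys in sorted order, with values grouped.
    return [sorted(p.items()) for p in partitions]

# Made-up outputs of two map tasks.
map_outputs = [
    [("apple", 1), ("pear", 1), ("apple", 1)],
    [("pear", 1), ("fig", 1)],
]
reducer_inputs = shuffle(map_outputs, num_reducers=2)
for index, grouped in enumerate(reducer_inputs):
    print(index, grouped)
```

The property to notice is that every value for a given key ends up at the same reducer, whichever map task produced it.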
For questions about the combiner, see:
The role of the combiner on the mapper side
It answers the following questions:
Why is processing (combining) needed on the mapper side? Why can it be done on the mapper side? And since it can be done on the mapper side, why is it still done on the reducer side? For these three questions, see the post on what combine, partition, and shuffle each do in Hadoop.
For the questions above, also see:
Personal summary of Mapper and Reducer
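The three combiner questions can be explored with a small word-count sketch in Python. It is an illustration under simple assumptions (two map tasks, made-up input, hypothetical helper names), not Hadoop code. It compares how many records would be shuffled with and without a combiner, and checks that the final result is the same.

```python
from collections import Counter, defaultdict

def map_task(lines):
    """Emit (word, 1) for every word seen by one map task."""
    return [(word, 1) for line in lines for word in line.split()]

def combine(pairs):
    """Combiner: pre-sum counts inside one map task before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())

def reduce_all(map_outputs):
    """Reducer side: sum the counts arriving from every map task."""
    totals = defaultdict(int)
    for pairs in map_outputs:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)

task_inputs = [["a b a a"], ["a b b"]]  # made-up input for two map tasks
raw = [map_task(lines) for lines in task_inputs]
combined = [combine(pairs) for pairs in raw]

shuffled_without = sum(len(pairs) for pairs in raw)
shuffled_with = sum(len(pairs) for pairs in combined)
print(shuffled_without, shuffled_with)
print(reduce_all(raw) == reduce_all(combined))
```

Combining on the mapper side is worthwhile because it shrinks the data sent over the network, and it is safe here because summing is associative and commutative. The reducer must still aggregate, because each combiner only sees the output of its own map task.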
5. Programming implementation
MapReduce is a programming model. Now that we understand it, the next step is to implement programs with it. So what can MapReduce do? See the following:
Beginner's guide: how to create a MapReduce program in a development environment
MapReduce Primary Case (1): Deduplicating data with MapReduce
MapReduce Primary Case (2): Sorting data with MapReduce
MapReduce Primary Case (3): Computing average scores with MapReduce
These three examples also confirm what we learned in the earlier sections.
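The three cases (deduplication, sorting, averaging) can each be expressed as a mapper and reducer pair. The Python sketch below runs them in a single process with a hypothetical `run_job` helper and made-up input. It is not the code from the linked posts, and the sorting case relies on there being only one reducer.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Map every record, group values by key, reduce keys in sorted order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Case 1, deduplication: the whole record is the key, emitted once per key.
dedup = run_job(
    ["2012-3-1 a", "2012-3-2 b", "2012-3-1 a"],
    mapper=lambda line: [(line, None)],
    reducer=lambda key, values: [key],
)

# Case 2, sorting: the number is the key, so the sorted key order does the work.
ordered = run_job(
    ["32", "5", "654", "5"],
    mapper=lambda line: [(int(line), 1)],
    reducer=lambda key, values: [key] * len(values),
)

# Case 3, averaging: key by name, then average that name's scores.
averages = run_job(
    ["alice 88", "bob 99", "alice 78", "bob 81"],
    mapper=lambda line: [(line.split()[0], int(line.split()[1]))],
    reducer=lambda key, values: [(key, sum(values) / len(values))],
)

print(dedup)
print(ordered)
print(averages)
```

In all three cases the real work is done by the grouping and sorting between map and reduce, and the mapper and reducer themselves stay very small.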
You can also refer to the following:
Reading the Hadoop Mapper class
Reading the Hadoop Reducer class
MapReduce shuffle and sort
Documentation guidance for MapReduce packages and authoring in Hadoop
Hadoop development environment setup and MapReduce development example (video download)
Guide to secondary development on Hadoop (video download)
6. MapReduce applications
The material above covers the basics. Once we are familiar with it, we will find that MapReduce can be applied in other areas as well. See the following:
An analysis of a Taobao HBase MapReduce example
The application of MapReduce in stress testing