This article focuses on the whole MapReduce process in Hadoop, without digressions, describing each link in the chain. I read a lot of material online, took some notes, and added some opinions of my own; not everything here is necessarily correct, so please judge for yourself. I hope this article is of some help to readers who want to learn about Hadoop MapReduce.
One. Goals
Using the map/reduce model:
1) Provide distributed processing of computation
a) The data is always available when it is needed
b) Applications do not care how many machines provide the service
2) Provide high reliability
a) Applications do not care that a machine or the network may fail, temporarily or permanently
Two. What does the application do?
1) Defines a Mapper class, a Reducer class, and a "startup" program that ties the whole job together (a full WordCount sketch follows at the end of this section)
2) Mapper
a) Input is a (key1, value1) pair
b) Output is a (key2, value2) pair
3) Reducer
a) Input is a key2 and the list of value2s collected for that key
b) Output is a (key3, value3) pair
4) Startup program
a) Creates a JobConf that defines the job
b) Submits the JobConf to the JobTracker and waits for execution
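To make this section concrete, here is a minimal WordCount sketch written against the classic org.apache.hadoop.mapred API (the JobConf/JobTracker era the article describes). The class names and input/output path arguments are illustrative, not taken from the article.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    // Mapper: (key1, value1) = (byte offset, line of text) in,
    //         (key2, value2) = (word, 1) out
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: (key2, list of value2) = (word, counts) in,
    //          (key3, value3) = (word, total) out
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // "Startup" program: build a JobConf and submit it to the JobTracker
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submits the job to the JobTracker and waits for it to finish
        JobClient.runJob(conf);
    }
}
```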
Three. Application data flow diagram
See annex Figure A
Four. Input and output formats
The application also chooses input and output formats, which define how persistent data is read and written. Both are interfaces the application can select or implement (a small configuration sketch follows after this list).
1) Input format
a) Splits the input data to determine the input to each map task
b) Defines a RecordReader that reads (key, value) pairs and passes them to the map task
2) Output format
a) Given (key, value) pairs and a filename, writes the output of the reduce task to persistent storage
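A minimal sketch of how these choices would be made on the JobConf from the earlier example. TextInputFormat and TextOutputFormat are the stock classes in org.apache.hadoop.mapred; the paths are placeholders for this example.

```java
// Choose how persistent data is read: TextInputFormat splits the input files
// and supplies a RecordReader that yields (byte offset, line of text) pairs
// to each map task.
conf.setInputFormat(TextInputFormat.class);

// Choose how persistent data is written: TextOutputFormat writes each reduce
// output (key, value) pair as a "key <TAB> value" line into a part-NNNNN file
// in the output directory.
conf.setOutputFormat(TextOutputFormat.class);

// Hypothetical paths, used only for illustration.
FileInputFormat.setInputPaths(conf, new Path("/user/demo/input"));
FileOutputFormat.setOutputPath(conf, new Path("/user/demo/output"));
```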
Five. Output Sort
The application can control the sort order and the partitioning of its output through the OutputKeyComparator and the Partitioner (a partitioner sketch follows after this list).
1) OutputKeyComparator
a) Defines how serialized key values are compared
b) There is a default OutputKeyComparator, but an application can define its own
I. Effectively written as key1.compareTo(key2)
2) Partitioner
a) Given a map output key and the number of reduces, selects a reduce
b) The default is HashPartitioner, which uses modular arithmetic to distribute the work
I. key.hashCode() % numReduces
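As an illustration of the partitioner contract, here is a sketch of a partitioner that behaves like the default HashPartitioner under the old mapred API. WordPartitioner is a made-up name for this example.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Hypothetical partitioner: given a map output key and the number of reduces,
// pick which reduce receives the record. This mirrors the default HashPartitioner.
public class WordPartitioner implements Partitioner<Text, IntWritable> {

    public void configure(JobConf job) {
        // no configuration needed for this example
    }

    public int getPartition(Text key, IntWritable value, int numReduces) {
        // mask the sign bit so a negative hashCode never yields a negative index
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }
}
```

It would be registered on the job with conf.setPartitionerClass(WordPartitioner.class); similarly, a custom comparator for the serialized keys can be set with conf.setOutputKeyComparatorClass(...).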
Six. Combiners
A combiner is a job optimization that combines the many values for a key into a single value before the reduce.
1) Usually the combiner is the same class as the reducer and runs over the map output on the map side, just before that output is sent to the reducer. This follows the principle that moving computation is cheaper than moving data.
2) For example, the WordCount mapper emits (word, count) pairs, and the combiner and the reducer both sum the counts for each word (see the sketch below).
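Continuing the WordCount sketch from section Two (using the Reduce class defined there, which is illustrative): because summing is associative and commutative, the same class can serve as both combiner and reducer.

```java
// Run the summing logic on the map side, before the shuffle, so only one
// (word, partialCount) pair per word leaves each map task.
conf.setCombinerClass(Reduce.class);

// The reducer then adds up the partial counts arriving from all the maps.
conf.setReducerClass(Reduce.class);
```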