Reposted from: http://blog.csdn.net/Androidlushangderen/article/details/41051027
After spending some time analyzing the Redis source code, I am about to start my next technical learning journey: the currently very popular Hadoop. The Hadoop ecosystem is huge, so my plan is to first pick one module to study and research, and I chose MapReduce. MapReduce was first described in a paper Google published in 2004, and was later implemented with the advent of Hadoop. Unlike my Redis source-code study, I will not try to cover everything; I will pick out only some of the more important processes to analyze, hoping to understand them more deeply. As last time, to learn a technology one should first grasp it as a whole, so I have made a structural classification of Hadoop MapReduce. First, here is a class-relationship diagram:
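Since the outline below assumes familiarity with the map/shuffle/reduce flow that Google's 2004 paper introduced, here is a minimal, framework-free Java sketch of that flow. The class and method names are my own illustrations, not Hadoop API; a real Hadoop job implements the framework's Mapper and Reducer classes instead.

```java
import java.util.*;

// A toy, framework-free sketch of the MapReduce model: the map phase
// emits (key, value) pairs, the shuffle groups them by key, and the
// reduce phase aggregates each group.
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) pairs.add(Map.entry(w, 1));
        }
        return pairs;
    }

    // Reduce phase: sum all values grouped under one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Drives map -> shuffle -> reduce over the whole input.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // shuffle
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, vs) -> result.put(k, reduce(vs)));
        return result;
    }

    public static void main(String[] args) {
        wordCount(List.of("hello world", "hello mapreduce"))
            .forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

In Hadoop the three phases run distributed across machines, with the shuffle performed by the framework between the map and reduce tasks; the single-JVM version above only shows the data flow.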
There is a lot of content. Below is the textual, function-by-function classification that took me a few hours to put together; combined with the diagram, it should be easier to understand:
MapReduce source analysis (four main modules; "Others" stands for the remaining .java files under the parent directory):
1. org.apache.hadoop.mapred (old MapReduce API):
(1). jobcontrol (classes for direct control of jobs)
(2). join (tools that perform join-style processing on data within a job)
(3). lib (utility classes that MapReduce depends on):
|----(1). aggregate (classes for data aggregation)
|----(2). db (database-related classes)
|----(3). Others
(4). pipes (the C++ interface to Hadoop MapReduce)
(5). tools (contains an MRAdmin file used for administration operations; this file no longer exists in the new version)
(6). Others
2. org.apache.hadoop.mapreduce (new MapReduce API):
(1). example (examples of runnable Hadoop jobs)
(2). lib (utility classes that the new MapReduce API depends on):
|----(1). aggregate (classes for data aggregation)
|----(2). db (database-related classes)
|----(3). Others
(3). security (security-related code newly added in Hadoop 1.0):
|----(1). token (tokens used for security verification)
| |----(1). delegation (delegation tokens, under the token directory)
| |----(2). Others
|----(2). Others
(4). server (Hadoop server-side functionality, mainly the JobTracker and TaskTracker):
|----(1). jobtracker (the JobTracker, which schedules and tracks jobs)
|----(2). tasktracker (the TaskTracker, which executes and tracks tasks)
| |----(1). userlogs (user logging for task execution)
| |----(2). Others
(5). split (classes that handle input splits for jobs)
(6). Others
3. org.apache.hadoop.filecache (the file cache, used for file distribution):
(1). DistributedCache.java (distributes the files a job specifies to the task-execution machines before the job runs)
(2). TaskDistributedCacheManager.java (holds a job's cache state: job ID, job conf, configuration parameters, the job file path, the job's tasks currently on this TaskTracker, user permissions, and so on)
(3). TrackerDistributedCacheManager.java (manages the cache files used by all tasks on a machine)
4. org.apache.hadoop --- mapred-default.xml:
The default MapReduce configuration file in the home directory, containing settings such as addresses and port numbers.
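To illustrate the kind of address and port settings this file holds, here is a small fragment in the style of the Hadoop 1.x defaults file. The property names below come from the 1.x configuration as I recall it, so verify the values against your own copy:

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
    <description>Host and port of the JobTracker; the special value
    "local" runs map and reduce tasks in a single local JVM.</description>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
    <description>Address and port of the JobTracker web UI.</description>
  </property>
</configuration>
```

Site-specific overrides for these defaults go into mapred-site.xml rather than being edited in the defaults file itself.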
All of the above is my own summary, so mistakes are inevitable. I hope it helps you first grasp the overall structure of the MapReduce system before breaking it down; if you find problems, feel free to comment. The code I analyze will be synced regularly to my GitHub: https://github.com/linyiqun
MapReduce Overall Architecture Analysis