MapReduce Overall Architecture Analysis

Source: Internet
Author: User
Tags: hadoop, mapreduce, hadoop ecosystem

Transferred from: http://blog.csdn.net/Androidlushangderen/article/details/41051027

After spending some time analyzing the Redis source code, I am starting my next learning journey: Hadoop, which is a very hot technology at the moment. The Hadoop ecosystem is very large, so my plan is to pick one module first and study it in depth, and I chose MapReduce. MapReduce was first described by Google in a paper published in 2004 and was later implemented as part of Hadoop. Unlike my study of the Redis source, I will not go through everything this time; I will pick out a few of the main processes and analyze them, in the hope of understanding them more deeply. As before, to learn a technology you first need to understand it as a whole, so I have classified the structure of Hadoop MapReduce. First, a diagram made from the class relationships:

There is a lot of content, so below is a text classification of the functionality that took me a few hours to put together. Reading it alongside the diagram should make it easier to understand:

MapReduce source analysis (four main modules; "Others" stands for the remaining .java files directly under the parent directory):
1.org.apache.hadoop.mapred (old MapReduce API):
(1). jobcontrol (classes for directly controlling jobs)
(2). join (tools that emulate data join operations within a job)
(3). lib (utility classes that MapReduce relies on)
|----(1). aggregate (files for data aggregation)
|----(2). db (files for database operations)
|----(3). Others
(4). pipes (the C++ interface to Hadoop MapReduce)
(5). tools (contains the MRAdmin file, used to connect to the JobTracker for administrative operations; there is no such file in the new version)
(6). Others
2.org.apache.hadoop.mapreduce (new MapReduce API):
(1). example (example Hadoop jobs that can be run)
(2). lib (utility classes that the new MapReduce API relies on):
|----(1). aggregate (files for data aggregation)
|----(2). db (files for database operations)
|----(3). Others
(3). security (security-related code newly added in Hadoop 1.0)
|----(1). token (token verification for security checks)
| |----(1). delegation (delegation tokens, under the token directory)
| |----(2). Others
|----(2). Others
(4). server (Hadoop server-side functionality, mainly JobTracker and TaskTracker)
|----(1). jobtracker (tracker that schedules tasks)
|----(2). tasktracker (tracker that executes tasks)
| |----(1). userlogs (user log module for task execution)
| |----(2). Others
(5). split (classes for handling a job's input splits)
(6). Others
3.org.apache.hadoop.filecache (file cache, used for file distribution):
(1). DistributedCache.java (before a job runs, distributes the files the job specifies to the machines that will execute its tasks)
(2). TaskDistributedCacheManager.java (holds the job ID, job conf, configuration parameters, job profile path, the set of the job's tasks currently on the TaskTracker, some user permissions, and so on)
(3). TrackerDistributedCacheManager.java (manages the cache files of all tasks on the machine)
4.org.apache.hadoop---mapreduce-default.xml:
The default MapReduce configuration file in the main directory, including settings such as addresses and port numbers.
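
To make the roles behind these packages a little more concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps that the mapred and mapreduce APIs are built around, using word counting as the job. It deliberately uses no Hadoop classes: the class name MiniMapReduce and all of its methods are made up for illustration, and everything runs in a single JVM, whereas in Hadoop the JobTracker and TaskTracker spread this work across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the map -> shuffle -> reduce flow.
// No Hadoop classes are used; every name here is invented for this sketch.
class MiniMapReduce {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle, then reduce each key group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, mapreduce=1}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello mapreduce")));
    }
}
```

The three static methods are kept separate on purpose so that each one lines up with a phase of a job; a real Hadoop job would put the map and reduce logic in Mapper and Reducer classes and leave the shuffle to the framework.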

Everything above is my own summary, so there are bound to be mistakes. I hope it helps you first get a grasp of the overall structure of the MapReduce system; if you spot any problems, feel free to comment directly. The code I analyze from here on will be synced periodically to my GitHub: https://github.com/linyiqun
