Reposted from: http://blog.csdn.net/Androidlushangderen/article/details/41051027
After spending some time analyzing the Redis source code, I am about to start my next technical learning journey: the currently very popular Hadoop. The Hadoop ecosystem is huge, so my plan is to first pick one module to study and research, and I chose MapReduce. MapReduce was first described in a paper Google published in 2004, and was later implemented with the advent of Hadoop. Unlike my Redis source-code study, I will not try to cover everything; I will pick out only some of the more important processes to analyze, hoping to understand them more deeply. As last time, to learn a technology one should first grasp it as a whole, so I have made a structural classification of Hadoop MapReduce. First, here is a class-relationship diagram:
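Since the outline below assumes familiarity with the map/shuffle/reduce flow that Google's 2004 paper introduced, here is a minimal, framework-free Java sketch of that flow. The class and method names are my own illustrations, not Hadoop API; a real Hadoop job implements the framework's Mapper and Reducer classes instead.

```java
import java.util.*;

// A toy, framework-free sketch of the MapReduce model: the map phase
// emits (key, value) pairs, the shuffle groups them by key, and the
// reduce phase aggregates each group.
public class WordCountSketch {

    // Map phase: split one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) pairs.add(Map.entry(w, 1));
        }
        return pairs;
    }

    // Reduce phase: sum all values grouped under one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Drives map -> shuffle -> reduce over the whole input.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>(); // shuffle
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, vs) -> result.put(k, reduce(vs)));
        return result;
    }

    public static void main(String[] args) {
        wordCount(List.of("hello world", "hello mapreduce"))
            .forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

In Hadoop the three phases run distributed across machines, with the shuffle performed by the framework between the map and reduce tasks; the single-JVM version above only shows the data flow.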
There is a lot of content. Below is the textual, function-by-function classification that took me a few hours to put together; combined with the diagram, it should be easier to understand:
MapReduce source analysis (four main modules; "Others" stands for the remaining .java files under the parent directory):
1. org.apache.hadoop.mapred (old MapReduce API):
(1). jobcontrol (classes for direct control of jobs)
(2). join (tools that perform join-style processing on data within a job)
(3). lib (utility classes that MapReduce depends on):
|----(1). aggregate (classes for data aggregation)
|----(2). db (database-related classes)
|----(3). Others
(4). pipes (the C++ interface to Hadoop MapReduce)
(5). tools (contains an MRAdmin file used for administration operations; this file no longer exists in the new version)
(6). Others
2. org.apache.hadoop.mapreduce (new MapReduce API):
(1). example (examples of runnable Hadoop jobs)
(2). lib (utility classes that the new MapReduce API depends on):
|----(1). aggregate (classes for data aggregation)
|----(2). db (database-related classes)
|----(3). Others
(3). security (security-related code newly added in Hadoop 1.0):
|----(1). token (tokens used for security verification)
| |----(1). delegation (delegation tokens, under the token directory)
| |----(2). Others
|----(2). Others
(4). server (Hadoop server-side functionality, mainly the JobTracker and TaskTracker):
|----(1). jobtracker (the JobTracker, which schedules and tracks jobs)
|----(2). tasktracker (the TaskTracker, which executes and tracks tasks)
| |----(1). userlogs (user logging for task execution)
| |----(2). Others
(5). split (classes that handle input splits for jobs)
(6). Others
3. org.apache.hadoop.filecache (the file cache, used for file distribution):
(1). DistributedCache.java (distributes the files a job specifies to the task-execution machines before the job runs)
(2). TaskDistributedCacheManager.java (holds a job's cache state: job ID, job conf, configuration parameters, the job file path, the job's tasks currently on this TaskTracker, user permissions, and so on)
(3). TrackerDistributedCacheManager.java (manages the cache files used by all tasks on a machine)
4. org.apache.hadoop --- mapred-default.xml:
The default MapReduce configuration file in the home directory, containing settings such as addresses and port numbers.
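To illustrate the kind of address and port settings this file holds, here is a small fragment in the style of the Hadoop 1.x defaults file. The property names below come from the 1.x configuration as I recall it, so verify the values against your own copy:

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
    <description>Host and port of the JobTracker; the special value
    "local" runs map and reduce tasks in a single local JVM.</description>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
    <description>Address and port of the JobTracker web UI.</description>
  </property>
</configuration>
```

Site-specific overrides for these defaults go into mapred-site.xml rather than being edited in the defaults file itself.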
All of the above is my own summary, so mistakes are inevitable. I hope it helps you first grasp the overall structure of the MapReduce system before breaking it down; if you find problems, feel free to comment. The code I analyze will be synced regularly to my GitHub: https://github.com/linyiqun
MapReduce Overall Architecture Analysis