Hadoop Technology Insider: In-depth Analysis of MapReduce Architecture Design and Implementation Principles

Tags: hadoop, mapreduce
Basic information

Title: Hadoop Technology Insider: In-depth Analysis of MapReduce Architecture Design and Implementation Principles
Author: Dong Xicheng
Series: Big Data Technology Series
Publisher: China Machine Press (Machinery Industry Press)
ISBN: 9787111422266
Publication date: May 2013
Category: Computer > Software and Program Design > Distributed System Design

About the book: "Hadoop Technology Insider" is a two-volume set written by a senior practitioner in the Hadoop field; one volume analyzes Hadoop Common and HDFS, and this volume analyzes the MapReduce architecture design and implementation principles, both in detail from the source-code perspective. This volume first introduces the design philosophy and programming model of MapReduce. It then analyzes, from the source-code perspective, the architecture design and implementation principles of the Hadoop RPC framework and of the components of the MapReduce runtime environment: the client, the JobTracker, the TaskTracker, and the Task. Finally, it covers advanced topics from a practical angle, including Hadoop performance tuning, the security mechanism, multi-user job schedulers, and the next-generation MapReduce framework. The book is suitable for engineers doing secondary development on Hadoop, application development engineers, and operations and maintenance engineers.

The book contains 12 chapters organized into four parts (excluding the appendix). Part 1 (Chapters 1-2) introduces how the Hadoop source code is organized, how to obtain, compile, and debug it, how to set up a source-code reading environment, and the MapReduce design philosophy and basic architecture. Part 2 (Chapter 3) focuses on the MapReduce programming interfaces, covering the two sets of programming APIs (the old API and the new API) and Hadoop workflows. Part 3 (Chapters 4-8) analyzes the MapReduce runtime environment, including the internal implementation details and mechanisms of the RPC framework, the client, the JobTracker, the TaskTracker, and the Task. Part 4 (Chapters 9-12) describes Hadoop performance tuning, multi-user job schedulers, security mechanisms, and the next-generation MapReduce framework.
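As a rough illustration of the MapReduce programming model the book opens with, a minimal WordCount job written against the new API (org.apache.hadoop.mapreduce) might look like the sketch below. This is not an example taken from the book; the class and job names are illustrative only.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hadoop 1.x constructor; later releases prefer Job.getInstance(conf).
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```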
Directory: Hadoop Technology Insider: In-depth Analysis of MapReduce Architecture Design and Implementation Principles

Preface

Part 1: Basics
Chapter 1: Preparations Before Reading the Source Code
  1.1 Preparing the source code learning environment
    1.1.1 Downloading the basic software
    1.1.2 Preparing a Windows environment
    1.1.3 Preparing a Linux environment
  1.2 Getting the Hadoop source code
  1.3 Building a Hadoop source code reading environment
    1.3.1 Creating a Hadoop project
    1.3.2 Hadoop source code reading tips
  1.4 Hadoop source code structure
  1.5 First experience with Hadoop
    1.5.1 Starting Hadoop
    1.5.2 Introduction to the Hadoop shell
    1.5.3 Introduction to the Hadoop Eclipse plug-in
  1.6 Compiling and debugging the Hadoop source code
    1.6.1 Compiling the Hadoop source code
    1.6.2 Debugging the Hadoop source code
Chapter 2: MapReduce Design Philosophy and Basic Architecture
  2.1 Hadoop development history
    2.1.1 Background of Hadoop
    2.1.2 Features of new Apache Hadoop versions
    2.1.3 Hadoop version changes
  2.2 Hadoop MapReduce design goals
  2.3 MapReduce programming model overview
    2.3.1 Introduction to the MapReduce programming model
    2.3.2 MapReduce programming example
  2.4 Hadoop basic architecture
    2.4.1 HDFS architecture
    2.4.2 Hadoop MapReduce architecture
  2.5 Hadoop MapReduce job lifecycle
  2.6 Summary

Part 2: MapReduce Programming Model
Chapter 3: The MapReduce Programming Model
  3.1 MapReduce programming model overview
    3.1.1 MapReduce programming interface architecture
    3.1.2 Comparison of the new and old MapReduce APIs
  3.2 Basic MapReduce API concepts
    3.2.1 Serialization
    3.2.2 The Reporter parameter
    3.2.3 Callback mechanism
  3.3 Java API analysis
    3.3.1 Job configuration and submission
    3.3.2 InputFormat interface design and implementation
    3.3.3 OutputFormat interface design and implementation
    3.3.4 Mapper and Reducer analysis
    3.3.5 Partitioner interface design and implementation
  3.4 Non-Java API analysis
    3.4.1 Hadoop Streaming implementation principle
    3.4.2 Hadoop Pipes implementation principle
  3.5 Hadoop workflows
    3.5.1 JobControl implementation principle
    3.5.2 ChainMapper/ChainReducer implementation principle
    3.5.3 Hadoop workflow engines
  3.6 Summary

Part 3: MapReduce Core Design
Chapter 4: Hadoop RPC Framework Analysis
  4.1 Hadoop RPC framework overview
  4.2 Java basics
    4.2.1 Java reflection mechanism and dynamic proxy
    4.2.2 Java network programming
    4.2.3 Java NIO
  4.3 Hadoop RPC basic framework analysis
    4.3.1 RPC basic concepts
    4.3.2 The Hadoop RPC basic framework
    4.3.3 Integration with other open-source RPC frameworks
  4.4 MapReduce communication protocol analysis
    4.4.1 MapReduce communication protocol overview
    4.4.2 The JobSubmissionProtocol communication protocol
    4.4.3 The InterTrackerProtocol communication protocol
    4.4.4 The TaskUmbilicalProtocol communication protocol
    4.4.5 Other communication protocols
  4.5 Summary
Chapter 5: Job Submission and Initialization Process Analysis
  5.1 Job submission and initialization overview
  5.2 Job submission process details
    5.2.1 Executing the shell command
    5.2.2 Job file upload
    5.2.3 Generating the InputSplit file
    5.2.4 Job submission to the JobTracker
  5.3 Job initialization process details
  5.4 Hadoop DistributedCache principle analysis
    5.4.1 Usage
    5.4.2 Working principle analysis
  5.5 Summary
Chapter 6: Internal Implementation Analysis of the JobTracker
  6.1 JobTracker overview
  6.2 JobTracker startup process analysis
    6.2.1 JobTracker startup process overview
    6.2.2 Important object initialization
    6.2.3 Functions of the various threads
    6.2.4 Job recovery
  6.3 Heartbeat reception and response
    6.3.1 Updating status
    6.3.2 Issuing commands
  6.4 Job and task runtime information maintenance
    6.4.1 Job description model
    6.4.2 JobInProgress
    6.4.3 TaskInProgress
    6.4.4 Job and task state transition diagram
  6.5 Fault tolerance mechanisms
    6.5.1 JobTracker fault tolerance
    6.5.2 TaskTracker fault tolerance
    6.5.3 Job fault tolerance
    6.5.5 Disk fault tolerance
  6.6 Task speculative execution principle
    6.6.1 Computing model assumptions
    6.6.2 The algorithm in version 1.0.0
    6.6.3 The algorithm in version 0.21.0
    6.6.4 The algorithm in version 2.0
  6.7 Hadoop resource management
    6.7.1 Task scheduling framework analysis
    6.7.2 Task selection policy analysis
    6.7.3 FIFO scheduler analysis
    6.7.4 Hadoop resource management optimization
  6.8 Summary
Chapter 7: Internal Implementation Analysis of the TaskTracker
  7.1 TaskTracker overview
  7.2 TaskTracker startup process analysis
    7.2.1 Important variable initialization
    7.2.2 Important object initialization
    7.2.3 Connecting to the JobTracker
  7.3 Heartbeat mechanism
    7.3.1 Single heartbeat sending
    7.3.2 Status sending
    7.3.3 Command execution
  7.4 TaskTracker behavior analysis
    7.4.1 Starting a new task
    7.4.2 Committing a task
    7.4.3 Killing a task
    7.4.4 Killing a job
    7.4.5 Re-initialization
  7.5 Job directory management
  7.6 Starting a new task
    7.6.1 Task startup process analysis
    7.6.2 Resource isolation mechanism
  7.7 Summary
Chapter 8: Task Running Process Analysis
  8.1 Task running process overview
  8.2 Basic data structures and algorithms
    8.2.1 IFile storage format
    8.2.2 Sorting
    8.2.3 Reporter
  8.3 Map Task internal implementation
    8.3.1 Map Task overall process
    8.3.2 Collect process analysis
    8.3.3 Spill process analysis
    8.3.4 Combine process analysis
  8.4 Reduce Task internal implementation
    8.4.1 Reduce Task overall process
    8.4.2 Shuffle and Merge phase analysis
    8.4.3 Sort and Reduce phase analysis
  8.5 Map/Reduce Task optimization
    8.5.1 Parameter tuning
    8.5.2 System optimization
  8.6 Summary

Part 4: MapReduce Advanced Topics
Chapter 9: Hadoop Performance Tuning
  9.1 Overview
  9.2 Tuning from the administrator's perspective
    9.2.1 Hardware selection
    9.2.2 Operating system parameter tuning
    9.2.3 JVM parameter tuning
    9.2.4 Hadoop parameter tuning
  9.3 Tuning from the user's perspective
    9.3.1 Application programming guidelines
    9.3.2 Job-level parameter tuning
    9.3.3 Task-level parameter tuning
  9.4 Summary
Chapter 10: Hadoop Multi-User Job Schedulers
  10.1 Background of multi-user scheduling
  10.2 HOD
    10.2.1 The Torque resource manager
    10.2.2 HOD job scheduling
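Section 3.1.2 in the directory above compares the old and new MapReduce APIs. As an illustrative sketch (not an example from the book), the same tokenizing mapper written against the old API (org.apache.hadoop.mapred) shows the main surface differences: Mapper is an interface typically used together with MapReduceBase, and output is written through an OutputCollector with progress reported via a Reporter, rather than through a single Context object. The class name below is hypothetical.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Old-API mapper: for text input the key is the byte offset (LongWritable).
public class OldApiTokenizerMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, ONE);  // the new API would use context.write(...)
    }
  }
}
```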
