Two. Distributed Computing (Map/Reduce)
Distributed computing is also a broad concept; here it refers narrowly to a distributed computing framework designed in the image of Google's MapReduce framework. In Hadoop, the distributed file system exists, to a large extent, to serve the needs of this kind of distributed computing. Just as we described the distributed file system as a file system with distributed support added, distributed computing can be seen as a computing function with distributed support added. From the computing point of view, the Map/Reduce framework accepts key-value pairs in various formats as input, reads and computes over them, and finally produces output files in a custom format. From the distributed point of view, the input files of a distributed computation are usually very large and scattered across many machines; single-machine computation would be both impractical and inefficient, so the Map/Reduce framework must provide a mechanism for scaling the computation out to a cluster of arbitrarily many machines. Following this definition, our understanding of the whole Map/Reduce framework can proceed along these two threads ...
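To make the "key-value pairs in, key-value pairs out" idea concrete, here is a minimal word-count sketch against the standard Hadoop `org.apache.hadoop.mapreduce` API. The class names (`WordCountMapper`, `WordCountReducer`) are illustrative, not taken from the original text: the map side turns each line of input into (word, 1) pairs, and the reduce side sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: consumes (byte offset, line of text) pairs and emits (word, 1) pairs.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate key-value pair
        }
    }
}

// Reducer: receives (word, [1, 1, ...]) and emits (word, total count).
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```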
In the Map/Reduce framework, each computation request is called a job. To complete a job, the framework proceeds in two steps. First, the job is split into a number of map tasks, which are assigned to different machines for execution. Each map task takes a portion of the input file as its input and, after some computation, produces an intermediate file whose format is exactly the same as that required for the final output, but which contains only a subset of the data. When all the map tasks are finished, the framework moves to the second step: merging the intermediate files into the final output file. At this point the system generates several reduce tasks, which are likewise assigned to different machines; their goal is to aggregate the intermediate files produced by the map tasks into the final output file. Of course, this aggregation is not always as straightforward as 1 + 1 = 2, and that is exactly where the value of the reduce task lies. After these steps the job is finally complete and the desired target file has been generated. The key to the whole algorithm is the added step of generating intermediate files, which greatly improves flexibility and guarantees distributed scalability ... A sketch of how a job is assembled from these pieces follows below.
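The two-step flow above corresponds directly to how a job is described to Hadoop. Below is a minimal driver sketch, assuming the illustrative `WordCountMapper` and `WordCountReducer` classes from the earlier example; the framework splits the input into map tasks (one per input split), and the reduce tasks it launches merge the intermediate output into the final files.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: describes one job; the job server (JobTracker/Master) splits it into
// map and reduce tasks and assigns them to task servers (TaskTracker/Worker).
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);    // step 1: map tasks write intermediate files
        job.setReducerClass(WordCountReducer.class);  // step 2: reduce tasks merge them
        job.setNumReduceTasks(2);                     // number of final output partitions

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the driver only declares what the job looks like; where the individual map and reduce tasks actually run is decided by the job server at runtime.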
I. Comparison of terminology
As with the distributed file system, Google, Hadoop, and this article each use their own terms for the same concepts. To keep the discussion consistent, the terminology is unified in the following table ...
| Term used in this article | Hadoop terminology | Google terminology | Explanation |
| --- | --- | --- | --- |
| Job | Job | Job | Each computation request submitted by a user is called a job. |
| Job server | JobTracker | Master | The server to which users submit jobs; it is responsible for splitting each job into tasks and for managing all the task servers. |
| Task server | TaskTracker | Worker | The worker bees, responsible for executing the actual tasks. |
| Task | Task | Task | Each job must be split up and completed by multiple servers; the split-out execution units are called tasks. |
| Backup task | Speculative Task | Backup Task | Any task may fail or run slowly; to reduce the cost of this, the system proactively runs the same task on another task server, and this duplicate is the backup task. |