Distributed Systems Basics, Part Two: Distributed Computing (Map/Reduce)


Two. Distributed Computing (Map/Reduce)

Distributed computing is also a broad concept; here it refers narrowly to distributed frameworks designed after Google's Map/Reduce. In Hadoop, the distributed file system exists, to a large extent, to serve the needs of this kind of distributed computation. Just as we described the distributed file system as an ordinary file system with distributed support added, distributed computing can be seen as an ordinary computation with distributed support added. From the computational point of view, the Map/Reduce framework accepts key/value pairs in various formats as input, performs the computation, and finally generates output files in a user-defined format. From the distribution point of view, the input files of a distributed computation are often large and spread across many machines; computing on a single machine would be infeasible and inefficient, so the Map/Reduce framework must provide a mechanism to extend the computation to a cluster of arbitrarily many machines. Our understanding of Map/Reduce as a whole can follow these two threads ...
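The computational point of view above can be sketched in a few lines. This is a minimal, single-machine illustration of the programming model, assuming a hypothetical word-count job; the function names `map_fn` and `reduce_fn` are illustrative, not Hadoop's actual API.

```python
from collections import defaultdict

def map_fn(_key, line):
    # Map: take one input key/value pair, emit intermediate key/value pairs.
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce: aggregate all intermediate values that share the same key.
    return word, sum(counts)

def run_job(lines):
    # The framework's role (simplified): feed inputs to map, group the
    # intermediate pairs by key, then feed each group to reduce.
    intermediate = defaultdict(list)
    for offset, line in enumerate(lines):
        for k, v in map_fn(offset, line):
            intermediate[k].append(v)
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

print(run_job(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

In a real framework the grouping step happens across machines, but the user only ever writes the two functions above.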

In the Map/Reduce framework, each computation request is called a job. To complete a job, the framework takes a two-step strategy. First, the job is split into a number of map tasks, which are assigned to different machines for execution. Each map task takes a portion of the input file as its input and, after some computation, generates an intermediate file whose format is exactly the same as that of the final output file, but which contains only a subset of the data. When all the map tasks are complete, the framework moves to the second step: merging the intermediate files into the final output file. For this, the system generates several reduce tasks, which are likewise assigned to different machines; their goal is to aggregate the intermediate files produced by the map tasks into the final output file. Of course, this aggregation is not always as straightforward as 1 + 1 = 2, and that is precisely where the value of the reduce task lies. After these steps the job is complete and the required output file has been generated. The key to the whole algorithm is the added step of generating intermediate files, which greatly improves flexibility and guarantees distributed scalability ...
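The two-step job execution described above can be simulated on one machine. In this sketch, each map task sees only its own split of the input and produces a partial result standing in for an intermediate file, and a reduce task merges the partial results; all names are illustrative, and a real framework would distribute these tasks over a cluster.

```python
from collections import Counter

def map_task(split):
    # A map task computes over only its slice of the input, producing an
    # "intermediate file" in the same format as the final output but
    # covering only a subset of the data.
    counts = Counter()
    for line in split:
        counts.update(line.split())
    return counts

def reduce_task(intermediates):
    # A reduce task aggregates the intermediate results into the final
    # output. Here the rollup really is a sum, though in general it
    # need not be as simple as 1 + 1 = 2.
    total = Counter()
    for part in intermediates:
        total.update(part)
    return dict(total)

splits = [["a b a"], ["b c"]]                # input pre-split across two map tasks
intermediates = [map_task(s) for s in splits]
print(reduce_task(intermediates))            # {'a': 2, 'b': 2, 'c': 1}
```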

I. Comparison of terminology

As with the distributed file system, Google, Hadoop, and this article each express the same concepts in different terms. To keep the terminology consistent, the terms are unified in the following table ...

| Term in this article | Hadoop term | Google term | Explanation |
|---|---|---|---|
| Job | Job | Job | Each computation request from a user is called a job. |
| Job server | JobTracker | Master | The server to which users submit jobs; it is responsible for splitting each job into tasks and managing all the task servers. |
| Task server | TaskTracker | Worker | The worker bees, responsible for executing specific tasks. |
| Task | Task | Task | Each job is split into execution units to be completed by multiple servers; each unit is called a task. |
| Backup task | Speculative Task | Backup Task | Any task may fail or run slowly; to reduce the cost of this, the system proactively executes the same task on another task server. Such a duplicate is a backup task. |
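The backup-task idea in the last row can be sketched with threads: launch a duplicate of a possibly slow task and take whichever copy finishes first. This is a toy simulation under assumed delays, not how a real task server schedules work.

```python
import concurrent.futures as cf
import time

def run_task(data, delay):
    # Simulate one task server executing the task; delay models how
    # slow (or failed-and-retried) that particular server is.
    time.sleep(delay)
    return sum(data)

def with_backup(data):
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        primary = pool.submit(run_task, data, 0.2)    # the straggler
        backup = pool.submit(run_task, data, 0.01)    # speculative copy
        # Take the result of whichever copy completes first; the other
        # copy's result is simply discarded.
        done, _ = cf.wait([primary, backup], return_when=cf.FIRST_COMPLETED)
        return next(iter(done)).result()

print(with_backup([1, 2, 3]))  # 6, from whichever copy finished first
```

Since both copies compute the same deterministic result, discarding the loser is safe; the benefit is purely latency.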
