Two. Distributed Computing (Map/Reduce)
Distributed computing is also a broad concept; here it refers narrowly to a distributed computing framework designed in the image of Google's MapReduce framework. In Hadoop, the distributed file system exists, to a large extent, to serve the needs of this kind of distributed computing. Just as we described the distributed file system as a file system with distributed support added, distributed computing can be seen as a computing function with distributed support added. From the computing point of view, the Map/Reduce framework accepts key-value pairs in various formats as input, reads and computes over them, and finally produces output files in a custom format. From the distributed point of view, the input files of a distributed computation are usually very large and scattered across many machines; single-machine computation would be both impractical and inefficient, so the Map/Reduce framework must provide a mechanism for scaling the computation out to a cluster of arbitrarily many machines. Following this definition, our understanding of the whole Map/Reduce framework can proceed along these two threads ...
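To make the "key-value pairs in, key-value pairs out" idea concrete, here is a minimal word-count sketch against the standard Hadoop `org.apache.hadoop.mapreduce` API. The class names (`WordCountMapper`, `WordCountReducer`) are illustrative, not taken from the original text: the map side turns each line of input into (word, 1) pairs, and the reduce side sums the counts for each word.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: consumes (byte offset, line of text) pairs and emits (word, 1) pairs.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // intermediate key-value pair
        }
    }
}

// Reducer: receives (word, [1, 1, ...]) and emits (word, total count).
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```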
In the Map/Reduce framework, each computation request is called a job. To complete a job, the framework proceeds in two steps. First, the job is split into a number of map tasks, which are assigned to different machines for execution. Each map task takes a portion of the input file as its input and, after some computation, produces an intermediate file whose format is exactly the same as that required for the final output, but which contains only a subset of the data. When all the map tasks are finished, the framework moves to the second step: merging the intermediate files into the final output file. At this point the system generates several reduce tasks, which are likewise assigned to different machines; their goal is to aggregate the intermediate files produced by the map tasks into the final output file. Of course, this aggregation is not always as straightforward as 1 + 1 = 2, and that is exactly where the value of the reduce task lies. After these steps the job is finally complete and the desired target file has been generated. The key to the whole algorithm is the added step of generating intermediate files, which greatly improves flexibility and guarantees distributed scalability ... A sketch of how a job is assembled from these pieces follows below.
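The two-step flow above corresponds directly to how a job is described to Hadoop. Below is a minimal driver sketch, assuming the illustrative `WordCountMapper` and `WordCountReducer` classes from the earlier example; the framework splits the input into map tasks (one per input split), and the reduce tasks it launches merge the intermediate output into the final files.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: describes one job; the job server (JobTracker/Master) splits it into
// map and reduce tasks and assigns them to task servers (TaskTracker/Worker).
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);    // step 1: map tasks write intermediate files
        job.setReducerClass(WordCountReducer.class);  // step 2: reduce tasks merge them
        job.setNumReduceTasks(2);                     // number of final output partitions

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that the driver only declares what the job looks like; where the individual map and reduce tasks actually run is decided by the job server at runtime.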
I. Comparison of terminology
As with the distributed file system, Google, Hadoop, and this article each use their own terms for the same concepts. To keep the discussion consistent, the terminology is unified in the following table ...
| Term used in this article | Hadoop terminology | Google terminology | Explanation |
| --- | --- | --- | --- |
| Job | Job | Job | Each computation request submitted by a user is called a job. |
| Job server | JobTracker | Master | The server to which users submit jobs; it is responsible for splitting each job into tasks and for managing all the task servers. |
| Task server | TaskTracker | Worker | The worker bees, responsible for executing the actual tasks. |
| Task | Task | Task | Each job must be split up and completed by multiple servers; the split-out execution units are called tasks. |
| Backup task | Speculative Task | Backup Task | Any task may fail or run slowly; to reduce the cost of this, the system proactively runs the same task on another task server, and this duplicate is the backup task. |