The map_reduce mechanism of ceilometer, map_reduce

Last Update: 2014-12-12 Source: Internet

Author: User

Map/Reduce is an aggregation tool. Other examples of aggregate commands include GROUP BY in SQL and group, count and distinct in MongoDB.

Map/Reduce is actually a software framework for implementing distributed computing. If you follow the framework's conventions when writing your upper-layer code, the framework distributes the computation for you and aggregates all the partial results into a single, simple result. Applications written on Map/Reduce can run on clusters of thousands of servers and process data in parallel in a reliable, fault-tolerant manner.

The specific process is as follows:
   Map/Reduce divides a task into many subtasks that can be processed in parallel. These subtasks are distributed to different servers for parallel computation, and when all servers have finished, their results are aggregated into a final result.
   You define a Map function that processes a key/value pair and produces a batch of intermediate key/value pairs, and a Reduce function that merges all intermediate values sharing the same intermediate key.
   To put it simply, Map maps each element of one data set to an element of another, one by one, according to a rule given by a function. For example, mapping [1, 2, 3, 4] with "multiply by 2" yields [2, 4, 6, 8]. Reduce condenses a data set into a single result, again according to a rule given by a function. For example, reducing [1, 2, 3, 4] by summation gives 10, and reducing it by multiplication gives 24.
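The two examples in the paragraph above can be reproduced on a single machine with Python's built-in map and functools.reduce. This is only a minimal sketch; the variable names doubled, total and product are illustrative.

```python
from functools import reduce
import operator

data = [1, 2, 3, 4]

# Map: apply "multiply by 2" to every element independently.
doubled = list(map(lambda x: x * 2, data))

# Reduce: fold the whole list into a single value.
total = reduce(operator.add, data)    # sum of the elements
product = reduce(operator.mul, data)  # product of the elements

print(doubled, total, product)  # [2, 4, 6, 8] 10 24
```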

The Map operation processes each element independently: it produces a new set of data and leaves the original data unchanged, so it is highly parallel. The Reduce operation is less parallel than Map, but it always produces a relatively compact result, and in large-scale jobs the individual reductions are largely independent of each other, so Reduce is also well suited to parallel execution.
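A sketch of why both steps parallelize, assuming the reduce function is associative (as addition is): each chunk of data can be mapped and reduced independently, and the partial results are then reduced once more. Threads from concurrent.futures stand in for cluster servers here purely to show the structure (a real cluster would use separate processes or machines); chunked and map_then_reduce are hypothetical helper names, not part of any framework.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

def chunked(items, n):
    """Split items into n roughly equal, contiguous chunks."""
    size = -(-len(items) // n)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def map_then_reduce(chunk):
    # Map each element independently ("multiply by 2"), then reduce within the chunk.
    return reduce(operator.add, (x * 2 for x in chunk), 0)

data = list(range(1, 101))

# Each "server" handles one chunk on its own.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(map_then_reduce, chunked(data, 4)))

# Because addition is associative, combining the partial sums gives the same
# answer as one sequential reduce over all doubled elements.
result = reduce(operator.add, partials, 0)
print(partials, result)
```

The four partial sums (650, 1900, 3150, 4400) add up to 10100, the same value as doubling and summing 1 to 100 in one pass.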

MapReduce jobs operate on key/value pairs. The framework converts each input record into a key/value pair and feeds each pair to a Map task. The Map task outputs a set of key/value pairs: each input is a single pair, but the output can be several pairs. The framework then groups and sorts the Map output by key. Next, the Reduce method is called once for each sorted key, receiving that key together with the set of values associated with it. The Reduce method can emit any number of key/value pairs, which are written to output files in the job's output directory. If Reduce emits the same keys it received as input, the final output remains sorted.
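The record-to-pair flow described above can be traced with the classic word-count job, sketched here in plain Python on one machine. The function names map_fn, reduce_fn and run_job are hypothetical; sorting followed by itertools.groupby plays the role of the framework's group-and-sort step.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line_no, line):
    # One input pair (line number, text) may produce many output pairs.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Called once per key with all values that share that key.
    yield (word, sum(counts))

def run_job(records):
    # 1. Map every input record (key = line number, value = line text).
    intermediate = [pair for key, value in enumerate(records)
                    for pair in map_fn(key, value)]
    # 2. Sort by key so equal keys become adjacent, then group them.
    intermediate.sort(key=itemgetter(0))
    output = []
    for word, pairs in groupby(intermediate, key=itemgetter(0)):
        # 3. Reduce each key with the values associated with it.
        output.extend(reduce_fn(word, (count for _, count in pairs)))
    return output

lines = ["the quick brown fox", "the lazy dog", "The end"]
print(run_job(lines))
```

Because reduce_fn emits the same key it received, the output stays sorted by word, ending with ('the', 3).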

The framework (in Hadoop MapReduce) provides two processes to manage MapReduce jobs:

TaskTracker manages and executes the individual Map and Reduce tasks on the compute nodes of the cluster.
JobTracker accepts job submissions, provides job monitoring and control, manages jobs, and assigns tasks to TaskTracker nodes.

Typically, each cluster has one JobTracker process, and each worker node in the cluster runs a TaskTracker process. JobTracker is the key component: if a TaskTracker fails, JobTracker reschedules its tasks on other TaskTracker processes.
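The retry behaviour can be illustrated with a toy, single-process simulation. It is not Hadoop's implementation: the TaskTracker and JobTracker classes below are hypothetical stand-ins, a tracker "fails" by raising an exception, and the scheduler simply resubmits the failed task to the next tracker in round-robin order.

```python
class TaskTracker:
    """Hypothetical worker; healthy=False simulates a failed node."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, task):
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed")
        return task()

class JobTracker:
    """Hypothetical coordinator: assigns tasks and retries failures elsewhere."""
    def __init__(self, trackers):
        self.trackers = trackers

    def run_job(self, tasks):
        results, log = [], []
        for i, task in enumerate(tasks):
            # Try trackers in round-robin order, starting at a different one per task.
            for attempt in range(len(self.trackers)):
                tracker = self.trackers[(i + attempt) % len(self.trackers)]
                try:
                    results.append(tracker.run(task))
                    log.append((i, tracker.name))
                    break
                except RuntimeError:
                    continue  # reschedule the task on the next tracker
            else:
                raise RuntimeError(f"task {i} failed on every tracker")
        return results, log

trackers = [TaskTracker("tt1"), TaskTracker("tt2", healthy=False), TaskTracker("tt3")]
tasks = [lambda n=n: n * n for n in range(4)]
results, log = JobTracker(trackers).run_job(tasks)
print(results, log)
```

Task 1 is first assigned to the failing tracker tt2 and is transparently rerun on tt3, so the job still returns [0, 1, 4, 9].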

The Map/reduce algorithm includes the following steps:
1. Partition
Divide the data into N parts.
2. Map
Besides dividing the data, the code that computes on the data must also be distributed to each computing node for concurrent execution. The N nodes each execute their own task and then return their results.
3. Partition
The N execution results need to be merged, so the data is partitioned again.
4. Reduce
The Reduce code and the data to be reduced are distributed to M nodes for execution. Each node returns its data when it finishes. If another round of Reduce is needed, it can be run again. Eventually, Reduce produces the combined result.
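The four steps can be sketched in a single process, assuming N = 3 map partitions and M = 2 reduce partitions. Hashing the key to choose a reduce partition is a common choice; zlib.crc32 is used instead of Python's built-in hash() because string hashing in Python is randomized per process. All function names and the "remainder mod 3" key are illustrative assumptions.

```python
import zlib
from collections import defaultdict

def partition(data, n):
    """Step 1: split the input into n parts."""
    return [data[i::n] for i in range(n)]

def map_part(part):
    """Step 2: each map node emits (key, value) pairs for its own part."""
    return [(x % 3, x) for x in part]  # key = remainder mod 3 (example choice)

def shuffle(mapped_outputs, m):
    """Step 3: re-partition all map output by key into m reduce partitions."""
    buckets = [defaultdict(list) for _ in range(m)]
    for output in mapped_outputs:
        for key, value in output:
            idx = zlib.crc32(str(key).encode()) % m
            buckets[idx][key].append(value)
    return buckets

def reduce_part(bucket):
    """Step 4: each reduce node combines the values for every key it owns."""
    return {key: sum(values) for key, values in bucket.items()}

data = list(range(1, 11))
mapped = [map_part(p) for p in partition(data, 3)]      # N = 3 map tasks
reduced = [reduce_part(b) for b in shuffle(mapped, 2)]  # M = 2 reduce tasks

final = {}
for r in reduced:
    final.update(r)  # each key lives in exactly one reduce partition
print(dict(sorted(final.items())))
```

In a real cluster the map and reduce calls would run on different nodes; running them sequentially here keeps the data flow between the four steps visible.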

In fact, the code we need to write consists of only two methods: a map method, which defines how each piece of data is processed, and a reduce method, which defines how the processed data is merged. The framework sorts the output of the map operations and then feeds it to the reduce tasks.
[Figure: reference diagram]

Summary:
The concept of Map/Reduce is very simple, so it can be implemented in any language. Google's MapReduce is famous not because it contains many clever ideas, but because it captures distributed computing in a very simple model.
For any distributed computation, the two core tasks are 1. dividing the task and 2. merging the data. If a task cannot be divided, no distributed framework will help. For example, when clustering very large matrices, if the algorithm itself cannot be divided, it cannot be distributed at all. Division is therefore the most important issue in every distributed problem.

This article is an English version of an article which is originally in the Chinese language on aliyun.com.