The map_reduce mechanism of Ceilometer

Source: Internet
Author: User
Tags: new, set, sorts

Map/reduce is an aggregation tool. For example, SQL's GROUP BY, COUNT, and DISTINCT, and MongoDB's group and aggregate commands, all perform aggregation of this kind.

Map/reduce is, in practice, a software framework that realizes the idea of distributed computing. If you write your upper-level code according to the framework's specifications, the framework performs the computation in a distributed way and aggregates all the partial results into one final result. Applications built on Map/reduce can run on clusters of thousands of servers and process data in parallel in a reliable, fault-tolerant manner.

The specific process is:
Map/reduce decomposes a task into a number of sub-tasks that can run in parallel, assigns them to different servers for parallel computation, and, when all servers have finished, aggregates the results into one final result.
A user-defined Map function processes a key/value pair and produces a batch of intermediate key/value pairs; a user-defined Reduce function then merges all the intermediate values that share the same intermediate key.
Simply put, map applies a one-to-one transformation to every element of a data set, with the mapping rule specified by a function: for example, mapping [1, 2, 3, 4] with "multiply by 2" yields [2, 4, 6, 8]. Reduce folds a data set down to a single result, with the folding rule again specified by a function: for example, summing [1, 2, 3, 4] gives 10, and taking the product gives 24.
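The examples above can be reproduced directly with Python's built-in `map` and `functools.reduce`; this is just a minimal illustration of the two operations, not a distributed implementation:

```python
from functools import reduce

data = [1, 2, 3, 4]

# map: apply a one-to-one rule to every element
doubled = list(map(lambda x: x * 2, data))   # [2, 4, 6, 8]

# reduce: fold the whole collection down to a single result
total = reduce(lambda a, b: a + b, data)     # 10
product = reduce(lambda a, b: a * b, data)   # 24

print(doubled, total, product)
```

Note that `map` never modifies `data`; it produces a new sequence, which is exactly why it parallelizes so easily.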

The map operation acts on each element independently; in other words, it produces a whole new set of data while the original data remains unchanged, so it is highly parallel. The reduce operation does not parallelize as easily as map, but it always produces a relatively simple result, and large-scale reductions can be split into relatively independent partial reductions, so it too is well suited to parallel execution.

A MapReduce job processes key/value pairs. The framework converts each input record into a key/value pair and feeds each pair into a map task. The output of a map task is again a set of key/value pairs: each input pair may produce zero, one, or many output pairs. The framework then groups and sorts the map output by key, and calls the reduce method once for each sorted key together with its set of associated values. The reduce method may emit any number of key/value pairs, which are written to files in the job's output directory. If the reduce output keys are the same as the reduce input keys, the final output remains sorted.
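The key/value flow just described can be sketched with the classic word-count example. This is a single-process simulation of the map, shuffle (group and sort), and reduce phases, assuming whitespace-delimited input records:

```python
from itertools import groupby
from operator import itemgetter

def map_fn(record):
    # one input record -> many intermediate (key, value) pairs
    return [(word, 1) for word in record.split()]

def reduce_fn(key, values):
    # all values for one key -> a single (key, result) pair
    return (key, sum(values))

records = ["the quick fox", "the lazy dog", "the fox"]

# map phase: every record yields intermediate pairs
intermediate = [pair for rec in records for pair in map_fn(rec)]

# shuffle phase: group and sort the intermediate pairs by key
intermediate.sort(key=itemgetter(0))
grouped = groupby(intermediate, key=itemgetter(0))

# reduce phase: one reduce call per distinct key
output = [reduce_fn(key, [v for _, v in pairs]) for key, pairs in grouped]
print(output)  # [('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Because the intermediate pairs are sorted before reducing, the final output here also comes out sorted by key, matching the note above about sorted output.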

The framework provides two processing processes to manage MapReduce jobs:

    • Tasktracker manages and executes individual map and reduce tasks on the compute nodes of the cluster.
    • Jobtracker accepts job submissions, provides job monitoring and control, manages tasks, and assigns them to tasktracker nodes.

In general, each cluster has one jobtracker process, and each node in the cluster has one or more tasktracker processes. The jobtracker is a single point of failure: if it fails, the whole system fails, whereas if a tasktracker has a problem, the jobtracker simply reschedules its tasks on other tasktrackers.


The map/reduce algorithm consists of four steps:
1. Partition: divide the data into n parts.
2. Map: besides partitioning the data, the code that operates on the data must also be distributed to each of the n compute nodes for concurrent execution. Each node performs its own task and returns its result when finished.
3. Partition (merge): the n intermediate results must be merged, so the data is partitioned once more.
4. Reduce: the reduce code and the repartitioned data are distributed to m nodes, and each node returns its result. If another round of reduction is needed, the process can be repeated; the final reduce produces a single overall result.
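The four steps above can be simulated in one process. This is only a sketch of the control flow: it uses a thread pool to stand in for the n map nodes and m reduce nodes, and computes a sum of squares as the example task:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def partition(data, n):
    # step 1 / step 3: split the data into n roughly equal chunks
    k, m = divmod(len(data), n)
    return [data[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]

def map_task(chunk):
    # step 2: each "node" squares its own chunk independently
    return [x * x for x in chunk]

def reduce_task(chunk):
    # step 4: each "node" folds its chunk into a partial sum
    return reduce(lambda a, b: a + b, chunk, 0)

data = list(range(1, 9))                                         # [1..8]
with ThreadPoolExecutor() as pool:
    mapped = list(pool.map(map_task, partition(data, 4)))        # 4 "map nodes"
    merged = [x for part in mapped for x in part]                # merge results
    partials = list(pool.map(reduce_task, partition(merged, 2))) # 2 "reduce nodes"

result = reduce(lambda a, b: a + b, partials)  # final reduce over partial sums
print(result)  # sum of squares of 1..8 = 204
```

In a real framework the partitions would live on different machines and the merge would be the shuffle phase; the structure of the computation, however, is the same.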

In fact, we only need to write two methods: a map method that says how to process each piece of data, and a reduce method that says how to merge the pieces. The framework sorts the output of the map operation and then feeds the results into the reduce tasks.
To summarize:
The idea behind Map/reduce is very simple; in other words, it can be implemented in any language. Google's map/reduce is famous not because the idea is ingenious, but because it sums up distributed computing in a very simple way.
In any distributed computation, the core tasks are: 1. dividing the task, and 2. merging the data. If the task cannot be divided, no distributed framework will help. For example, in clustering computations on very large matrices, if the algorithm itself cannot be divided, there is no way to distribute it. So in distributed problems, partitioning is the most important part.

