Readers are impatient, so I will state the conclusion first: without writing any code, just by dragging a few icons with the mouse and adjusting some parameters, you can assemble a distributed program that processes billions of records.
Of course, this ideal has not been reached yet, but the road is now clearly laid out in front of us, and we are at least halfway there.
First of all, the MapReduce algorithm itself comes from functional programming, so it is natural to construct algorithms with FP ideas. The previous version of the program was written in Haskell; there is now a new version written in Python.
Having applied MR to some practical problems, I found that many of them share basic algorithm patterns, and several of these patterns are very simple. I will summarize them in more detail later; here is a brief outline (my own summary, so rather home-made):
MapReduce Algorithm Patterns
1. Meta pattern: MR Chain
Multiple MapReduce jobs can be chained together, each job's output feeding the next job's input, to implement arbitrarily complex statistical algorithms.
This can also be called the data flow pattern.
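A minimal in-memory sketch of an MR Chain, assuming each job is a (mapper, reducer) pair of Python generator functions. The helpers `run_mr` and `run_chain` are hypothetical names for illustration, not the post's actual program; a real cluster would replace `run_mr` with a Hadoop (or similar) job. The example chains a word count into a second job that counts how many words occur 1, 2, 3... times.

```python
from collections import defaultdict

def run_mr(records, mapper, reducer):
    # One in-memory MapReduce job: map, shuffle (group values by key), reduce
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend(reducer(key, groups[key]))
    return result

def run_chain(records, stages):
    # MR Chain: the output of each job becomes the input of the next
    for mapper, reducer in stages:
        records = run_mr(records, mapper, reducer)
    return records

# Stage 1 mapper: emit (word, 1) for every word in a line
def words_mapper(line):
    for word in line.split():
        yield word, 1

# Generic reducer: add up the values for a key
def sum_reducer(key, values):
    yield key, sum(values)

# Stage 2 mapper: turn (word, count) into (count, 1)
def freq_mapper(pair):
    word, count = pair
    yield count, 1

lines = ["a b a", "c b a"]
print(run_chain(lines, [(words_mapper, sum_reducer), (freq_mapper, sum_reducer)]))
# [(1, 1), (2, 1), (3, 1)]
```

Because every job consumes and produces plain (key, value) records, any number of jobs can be strung together without the stages knowing about each other.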
2. Map Patterns
There are two: FieldCount and FieldJoin.
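The post only names the two Map patterns, so the following is an assumed reading: FieldCount emits one field of a delimited record as the key with a count of 1, and FieldJoin joins several fields into one composite key. The tab separator, the `|` joiner and the factory names `field_count` / `field_join` are hypothetical choices for this sketch.

```python
def field_count(index, sep="\t"):
    # Assumed FieldCount: key = value of one field, value = 1
    def mapper(line):
        fields = line.split(sep)
        yield fields[index], 1
    return mapper

def field_join(indexes, sep="\t", joiner="|"):
    # Assumed FieldJoin: key = several fields joined into one composite key
    def mapper(line):
        fields = line.split(sep)
        yield joiner.join(fields[i] for i in indexes), 1
    return mapper

# Hypothetical log lines: date, user, page
log = ["2012-01-01\tuser1\t/home", "2012-01-01\tuser2\t/about"]
print([kv for line in log for kv in field_count(1)(line)])
# [('user1', 1), ('user2', 1)]
print([kv for line in log for kv in field_join([0, 2])(line)])
# [('2012-01-01|/home', 1), ('2012-01-01|/about', 1)]
```

Emitting 1 as the value lets either mapper feed straight into a counting or summing reducer.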
3. Reduce Patterns
KeyCount, ValueSum, NubCount, ValueJoin
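One possible reading of the four Reduce patterns as generic reducers, each taking a key and the list of values grouped under it. NubCount is read here as "count of distinct values", after Haskell's `nub` (remove duplicates); this interpretation and the comma separator in ValueJoin are assumptions, not taken from the post's code.

```python
def key_count(key, values):
    # KeyCount: how many records share this key
    yield key, len(values)

def value_sum(key, values):
    # ValueSum: numeric total of the values for this key
    yield key, sum(values)

def nub_count(key, values):
    # NubCount (assumed): number of distinct values, like length (nub xs) in Haskell
    yield key, len(set(values))

def value_join(key, values, sep=","):
    # ValueJoin: concatenate all values for this key into one string
    yield key, sep.join(str(v) for v in values)

visitors = ["u1", "u2", "u1"]
print(list(key_count("/home", visitors)))   # [('/home', 3)]
print(list(nub_count("/home", visitors)))   # [('/home', 2)]
print(list(value_join("/home", visitors)))  # [('/home', 'u1,u2,u1')]
print(list(value_sum("/home", [1, 2, 3])))  # [('/home', 6)]
```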
Core Ideas
(borrowed from the Java world)
1. Data flow programming: source data flows into one end of the MR network, is processed step by step along a processing chain, and comes out as the final result; the chain can have several branches.
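To make the branching concrete, here is a small sketch of an MR network with one shared trunk job whose output flows into two independent branches. The names `run_flow`, `run_stages` and the "user page" input format are hypothetical; the in-memory `run_mr` only stands in for a real distributed job.

```python
from collections import defaultdict

def run_mr(records, mapper, reducer):
    # One in-memory MapReduce job: map, group by key (shuffle), reduce
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend(reducer(key, groups[key]))
    return result

def run_stages(records, stages):
    # A linear chain: each job's output is the next job's input
    for mapper, reducer in stages:
        records = run_mr(records, mapper, reducer)
    return records

def run_flow(records, trunk, branches):
    # Shared trunk first, then the same intermediate data flows into every branch
    shared = run_stages(records, trunk)
    return {name: run_stages(shared, stages) for name, stages in branches.items()}

# Hypothetical input: one "user page" pair per line
def page_user(line):
    user, page = line.split()
    yield page, user

def collect(key, values):
    # Trunk reducer: gather all users seen on a page
    yield key, values

def identity(pair):
    # Pass an existing (key, value) record through unchanged
    yield pair

def count_all(key, values):
    # values holds one list of users per trunk record
    yield key, sum(len(users) for users in values)

def count_distinct(key, values):
    yield key, len({u for users in values for u in users})

visits = ["u1 /home", "u2 /home", "u1 /home", "u1 /about"]
print(run_flow(visits,
               trunk=[(page_user, collect)],
               branches={"views": [(identity, count_all)],
                         "visitors": [(identity, count_distinct)]}))
# {'views': [('/about', 1), ('/home', 3)], 'visitors': [('/about', 1), ('/home', 2)]}
```

The trunk runs once and both branches reuse its output, which is the point of treating the jobs as a data flow network rather than as isolated programs.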
2. Combinatorial programming: generic Mapper and Reducer operators are combined to implement complex functionality.
This is a multiplicative process: combined with MR Chain, it multiplies the complexity of the processing you can express.
Keep each operator as simple and atomic as possible, and keep their functions orthogonal.
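The "multiplication" can be seen directly: with m generic mappers and n generic reducers, every pairing is a valid job, giving m × n jobs from only m + n operators. In this sketch the comma-separated "user,page,bytes" log format and all operator names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def run_mr(records, mapper, reducer):
    # One in-memory MapReduce job: map, group by key, reduce
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend(reducer(key, groups[key]))
    return result

# Generic mappers: choose the key field, keep (other field, bytes) as the value
def by_user(line):
    user, page, size = line.split(",")
    yield user, (page, int(size))

def by_page(line):
    user, page, size = line.split(",")
    yield page, (user, int(size))

# Generic reducers over (other field, bytes) values
def count(key, values):
    yield key, len(values)

def distinct(key, values):
    yield key, len({other for other, _ in values})

def total_bytes(key, values):
    yield key, sum(size for _, size in values)

mappers = {"by_user": by_user, "by_page": by_page}
reducers = {"count": count, "distinct": distinct, "total_bytes": total_bytes}

log = ["u1,/home,100", "u2,/home,50", "u1,/about,20"]
# Every mapper/reducer pairing is a distinct, valid job
jobs = {(m, r): run_mr(log, mappers[m], reducers[r])
        for m, r in product(mappers, reducers)}
print(len(jobs))                          # 6
print(jobs[("by_page", "distinct")])      # [('/about', 1), ('/home', 2)]
print(jobs[("by_user", "total_bytes")])   # [('u1', 120), ('u2', 50)]
```

Keeping the operators atomic and orthogonal is what makes every pairing meaningful; an operator that mixed key selection with aggregation would combine with far fewer partners.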
3. Function currying: fixing some of an operator's parameters generates customized, user-defined functions.
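In Python, the closest everyday tool for this is `functools.partial`, which is partial application rather than currying in the strict Haskell sense, but serves the same purpose here: generic operators take their configuration parameters first and the record last, and fixing the leading parameters yields a user-defined mapper or reducer. The operator names and the threshold reducer below are hypothetical.

```python
from functools import partial

def field_count(index, sep, line):
    # Generic Map operator: configuration first, the record last
    yield line.split(sep)[index], 1

def count_at_least(threshold, key, values):
    # Generic Reduce operator: keep keys whose total reaches `threshold`
    total = sum(values)
    if total >= threshold:
        yield key, total

# Fixing the leading parameters produces user-defined operators
user_mapper = partial(field_count, 0, ",")
frequent_reducer = partial(count_at_least, 2)

print(list(user_mapper("u1,/home")))             # [('u1', 1)]
print(list(frequent_reducer("u1", [1, 1, 1])))   # [('u1', 3)]
print(list(frequent_reducer("u2", [1])))         # []
```

Since the result has exactly the mapper(record) and reducer(key, values) shapes, a drag-and-drop front end would only need to collect the parameters and plug the specialized functions into the chain.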