Cascading is a data processing API for Hadoop clusters that lets developers build complex processing workflows through an expressive API rather than implementing Hadoop MapReduce algorithms directly. The core concepts of the Cascading API are pipes and flows. A pipe is a series of processing steps (parsing, looping, filtering, and so on) that defines the data processing to be performed, and a flow is the combination of pipes with data sources and data sinks.
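To make the relationship between a pipe and a flow concrete, here is a minimal conceptual sketch in plain Python. It is not the Cascading API: the `Pipe` and `Flow` classes and the step functions `parse`, `keep_long_words`, and `to_upper` are hypothetical names invented for illustration, and in-memory lists stand in for the data source and the data sink.

```python
from typing import Callable, Iterable, Iterator, List

# A "step" is any function that turns one stream of records into another.
Step = Callable[[Iterable[str]], Iterator[str]]


def parse(lines: Iterable[str]) -> Iterator[str]:
    """Parsing step: split each line into words."""
    for line in lines:
        yield from line.split()


def keep_long_words(words: Iterable[str]) -> Iterator[str]:
    """Filtering step: drop words of three characters or fewer."""
    for word in words:
        if len(word) > 3:
            yield word


def to_upper(words: Iterable[str]) -> Iterator[str]:
    """Transformation step: upper-case every word."""
    for word in words:
        yield word.upper()


class Pipe:
    """Hypothetical pipe: an ordered series of processing steps."""

    def __init__(self, *steps: Step) -> None:
        self.steps = steps

    def apply(self, records: Iterable[str]) -> Iterable[str]:
        # Chain the steps lazily; nothing runs until the result is consumed.
        for step in self.steps:
            records = step(records)
        return records


class Flow:
    """Hypothetical flow: a pipe bound to a data source and a data sink."""

    def __init__(self, source: Iterable[str], pipe: Pipe, sink: List[str]) -> None:
        self.source = source
        self.pipe = pipe
        self.sink = sink

    def run(self) -> None:
        # Pull records from the source through the pipe into the sink.
        for record in self.pipe.apply(self.source):
            self.sink.append(record)


source = ["the quick brown fox", "jumps over the lazy dog"]
sink: List[str] = []
Flow(source, Pipe(parse, keep_long_words, to_upper), sink).run()
print(sink)
```

The pipe only describes what processing should happen; it does nothing until a flow connects it to a source and a sink. That separation is what lets a planner decide later how the steps are actually executed.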
The processing API allows developers to quickly assemble complex distributed processes without having to "think in" MapReduce. Processes can also be scheduled efficiently based on the dependencies between them and on other metadata. A flow, then, is the channel through which data passes from sources to sinks, and a cascade is the linking, branching, and grouping of multiple flows. The API provides a number of key features:
Topological scheduler and MapReduce planner: these are the two key components of the Cascading API. Flows are scheduled according to their dependencies, so their execution order is independent of the order in which they were constructed, which allows parts of flows and cascades to be invoked concurrently. In addition, the steps of each flow are intelligently converted into MapReduce jobs that run on the Hadoop cluster.
Event notification: each step of a flow can notify the host application through a callback, so that the application can report on and respond to the progress of data processing.
Scripting: the Cascading API has scripting interfaces for Jython, Groovy, and JRuby, which makes it accessible from common dynamic JVM languages.
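The dependency-based scheduling and callback notification described above can be sketched as follows. This is a conceptual illustration in plain Python using the standard library's `graphlib.TopologicalSorter` (Python 3.9+), not Cascading's own scheduler; the flow names, the `schedule` function, and the `on_start` callback are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical flows, deliberately listed in a "wrong" construction order.
# Each flow maps to the set of flows whose output it reads.
dependencies = {
    "report": {"join"},
    "join": {"clean_logs", "clean_users"},
    "clean_users": set(),
    "clean_logs": set(),
}


def schedule(deps, on_start=None):
    """Group flows into batches; flows within one batch have no mutual
    dependencies and could therefore be invoked concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        # Every flow whose dependencies have all completed is ready now.
        ready = sorted(sorter.get_ready())
        for name in ready:
            if on_start is not None:
                on_start(name)  # event notification via callback
        batches.append(ready)
        sorter.done(*ready)  # mark the batch as finished
    return batches


events = []
batches = schedule(dependencies, on_start=events.append)
print(batches)
print(events)
```

The two cleaning flows land in the same batch even though they were declared last, which shows that execution order follows the dependency graph rather than construction order. A real scheduler would run each batch in parallel and fire further callbacks on completion or failure; this sketch only records the start events.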