Cascading: A Data Processing API for Hadoop MapReduce

Source: Internet
Author: User
Keywords: Hadoop, MapReduce
Tags: api, based, data, data processing, data sources, hadoop, processing, sources

Cascading is a new data processing API for Hadoop clusters that lets developers build complex processing workflows through an expressive API rather than implementing Hadoop MapReduce algorithms directly. Its core concepts are pipes and flows. A pipe is a series of processing steps (parsing, looping, filtering, and so on) that defines the data processing to be performed, and a flow is the association of a pipe with a data source and a data sink.
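The relationship between a series of processing steps and the source and sink it is bound to can be sketched in plain Java. This is a toy model under stated assumptions, not Cascading's API: the `Pipe` and `Flow` types, their `each`, `filter`, and `complete` methods, and the in-memory lists standing in for a data source and data sink are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only: Pipe and Flow are hypothetical types, NOT Cascading's classes.
class PipeFlowSketch {

    /** A pipe: an ordered series of processing steps with no data attached yet. */
    static final class Pipe {
        private final List<UnaryOperator<Stream<String>>> steps = new ArrayList<>();

        /** Adds a step that transforms every record. */
        Pipe each(Function<String, String> fn) {
            steps.add(records -> records.map(fn));
            return this;
        }

        /** Adds a step that drops records failing the predicate. */
        Pipe filter(Predicate<String> keep) {
            steps.add(records -> records.filter(keep));
            return this;
        }

        /** Runs the steps over the input in the order they were added. */
        Stream<String> apply(Stream<String> input) {
            Stream<String> current = input;
            for (UnaryOperator<Stream<String>> step : steps) {
                current = step.apply(current);
            }
            return current;
        }
    }

    /** A flow: a pipe bound to a data source and a data sink. */
    static final class Flow {
        private final List<String> source;
        private final Pipe pipe;
        private final List<String> sink;

        Flow(List<String> source, Pipe pipe, List<String> sink) {
            this.source = source;
            this.pipe = pipe;
            this.sink = sink;
        }

        /** Pushes every source record through the pipe and into the sink. */
        void complete() {
            sink.addAll(pipe.apply(source.stream()).collect(Collectors.toList()));
        }
    }

    /** Builds one pipe (trim, drop blanks, upper-case) and runs it as a flow. */
    static List<String> run(List<String> source) {
        Pipe pipe = new Pipe()
                .each(String::trim)
                .filter(line -> !line.isEmpty())
                .each(String::toUpperCase);
        List<String> sink = new ArrayList<>();
        new Flow(source, pipe, sink).complete();
        return sink;
    }

    public static void main(String[] args) {
        // prints [ALPHA, BETA]
        System.out.println(run(Arrays.asList(" alpha ", "", "  beta")));
    }
}
```

The pipe is defined once without reference to any data, so the same steps could be bound to a different source and sink to form another flow.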

The API lets developers quickly assemble complex distributed processes without having to "think" in MapReduce. It also schedules those processes efficiently, based on the dependencies between them and other metadata. A flow, in other words, is a pipe with data passing through it; going one step further, a cascade is the chaining, branching, and grouping of multiple flows. The API provides a number of key features:

  • Dependency-based topological scheduler and MapReduce planning: these are two key components of the Cascading API. Flows are scheduled for invocation based on their dependencies; because execution order is independent of construction order, portions of flows and cascades can often be invoked concurrently. In addition, the steps of the various flows are intelligently converted into MapReduce invocations against the Hadoop cluster.

  • Event notification: the various steps of a flow can send notifications through callbacks, which lets the host application report on and respond to the progress of data processing.

  • Scripting: the Cascading API has scriptable interfaces for Jython, Groovy, and JRuby, which makes it accessible from popular dynamic JVM languages.
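Dependency-based scheduling and completion callbacks can be illustrated with a small plain-Java sketch. It is not how Cascading is implemented: the class `FlowSchedulerSketch`, the flow names, and the `Consumer<String>` callback are assumptions made for illustration, and the ordering uses Kahn's algorithm as one common way to perform a topological sort.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative only: a hypothetical scheduler, NOT Cascading's planner.
class FlowSchedulerSketch {

    /**
     * Returns an order in which every flow runs after the flows it depends on.
     * dependsOn maps a flow name to the flows whose output it reads.
     */
    static List<String> executionOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> pending = new LinkedHashMap<>();   // unmet dependencies per flow
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            pending.putIfAbsent(entry.getKey(), 0);
            for (String dep : entry.getValue()) {
                pending.putIfAbsent(dep, 0);
                pending.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String flow = ready.remove();
            order.add(flow);
            for (String next : dependents.getOrDefault(flow, Collections.emptyList())) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next); // all of its dependencies have now run
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("dependency cycle between flows");
        }
        return order;
    }

    /** Runs the flows in dependency order and notifies the host after each one. */
    static void runAll(Map<String, List<String>> dependsOn, Consumer<String> onCompleted) {
        for (String flow : executionOrder(dependsOn)) {
            // A real flow would execute its MapReduce jobs here; this sketch does no work.
            onCompleted.accept(flow);
        }
    }

    /** Flows declared deliberately "backwards" to show construction order is irrelevant. */
    static Map<String, List<String>> sampleFlows() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("report", Arrays.asList("join"));
        deps.put("join", Arrays.asList("parseLogs", "parseUsers"));
        deps.put("parseLogs", Collections.emptyList());
        deps.put("parseUsers", Collections.emptyList());
        return deps;
    }

    public static void main(String[] args) {
        // prints [parseLogs, parseUsers, join, report]
        System.out.println(executionOrder(sampleFlows()));
        runAll(sampleFlows(), flow -> System.out.println("completed: " + flow));
    }
}
```

In this sample, `parseLogs` and `parseUsers` do not depend on each other, so a scheduler would be free to run them concurrently; the sketch runs them one after another to stay short.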
