Technical Research Reference: A Summary of the Industry's Open Source Real-Time Stream Processing Systems

Source: Internet
Author: User

This article summarizes some of the real-time stream processing systems currently available in the industry, most of them open source, as a reference for future technical research.

S4

S4 (Simple Scalable Streaming System) is an open source computing platform released by Yahoo. It is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable stream computing platform on which programmers can easily develop applications that process unbounded, continuous streams of data. The development language is Java.

Project Link: http://incubator.apache.org/s4/ (Note: S4 0.5.0 adds support for TCP connections and state recovery.)

Storm

Storm is a distributed real-time computation system open-sourced by Twitter. It provides a simple API that lets developers reliably process unbounded, continuous streams of data in real time. Storm is developed in Clojure and Java, and non-JVM languages can communicate with it over stdin/stdout using a JSON-based protocol. Storm has many application scenarios: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL processing, and more.

Project Link: http://storm-project.net
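The idea of a non-JVM component exchanging JSON messages over stdin/stdout can be sketched as follows. This is only an illustration of the general pattern: the message fields ("tuple", "command") and the function name run_worker are assumptions made for this example, not Storm's actual multi-language protocol, which should be taken from the Storm documentation.

```python
import io
import json


def run_worker(stdin, stdout):
    """Read one JSON message per line, split its sentence, emit one JSON line per word.

    The message layout is hypothetical and is NOT Storm's real wire format.
    """
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        message = json.loads(line)
        sentence = message["tuple"][0]
        for word in sentence.split():
            stdout.write(json.dumps({"command": "emit", "tuple": [word]}) + "\n")
    stdout.flush()


# Simulate the parent process with in-memory streams instead of real pipes.
fake_stdin = io.StringIO('{"tuple": ["hello stream world"]}\n')
fake_stdout = io.StringIO()
run_worker(fake_stdin, fake_stdout)
emitted = [json.loads(out)["tuple"][0] for out in fake_stdout.getvalue().splitlines()]
```

In a real deployment the worker would be called with sys.stdin and sys.stdout, so the parent process and the worker only need to agree on the message format, not on a programming language.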

StreamBase

StreamBase is a platform for complex event processing (CEP) and event stream processing. It is a commercial product, but a Developer Edition is available. The development language is Java.

Project Link: http://www.streambase.com

HStreaming

HStreaming is built on top of Hadoop and integrates with Hadoop and its ecosystem to provide real-time stream computing services. This lets HStreaming users analyze and process big data within a single ecosystem. The development language is Java.

Project Link: http://www.hstreaming.com

Esper & NEsper

Esper is a stream processing platform dedicated to complex event processing (CEP). The Java version is Esper and the .NET version is NEsper. Esper and NEsper let developers quickly build applications that process large volumes of messages and events, whether historical or real-time.

Project Link: http://esper.codehaus.org
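A typical CEP task is to evaluate a condition continuously over a sliding window of recent events. The sketch below shows that idea in plain Python. It does not use Esper or its query language; the class name SlidingWindowAverage, the 10-second window, and the threshold of 100 are all assumptions chosen for illustration.

```python
from collections import deque


class SlidingWindowAverage:
    """Flag events whose trailing-window average exceeds a threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def on_event(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict events that have fallen out of the time window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            self.events.popleft()
        average = sum(v for _, v in self.events) / len(self.events)
        return average > self.threshold


detector = SlidingWindowAverage(window_seconds=10, threshold=100.0)
readings = [(0, 90.0), (5, 100.0), (9, 130.0), (14, 140.0)]
alerts = [detector.on_event(ts, value) for ts, value in readings]
```

Because the window is re-evaluated on every incoming event, the same code works whether the events are replayed from history or arrive live.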

Kafka

Kafka is a high-throughput distributed messaging system based on the publish-subscribe model. It was developed by LinkedIn in December 2010 and is used primarily for processing activity stream data. The development language is Scala.

Project Link: http://incubator.apache.org/kafka
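The publish-subscribe model can be sketched as an append-only log in which each subscriber tracks its own read position. This is a minimal in-memory illustration under that assumption. The Topic class and its publish and poll methods are hypothetical names and are not Kafka's client API.

```python
class Topic:
    """In-memory topic: an append-only log plus one read offset per subscriber."""

    def __init__(self):
        self.log = []      # messages in publish order
        self.offsets = {}  # subscriber name -> index of the next unread message

    def publish(self, message):
        self.log.append(message)

    def poll(self, subscriber):
        """Return every message this subscriber has not seen yet."""
        start = self.offsets.get(subscriber, 0)
        batch = self.log[start:]
        self.offsets[subscriber] = len(self.log)
        return batch


topic = Topic()
topic.publish("page_view")
topic.publish("click")
first = topic.poll("analytics")   # both messages published so far
topic.publish("search")
second = topic.poll("analytics")  # only the new message
late = topic.poll("audit")        # a new subscriber reads from the beginning
```

Keeping offsets per subscriber means publishers never wait for consumers, and several independent consumers can read the same stream at their own pace.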

Scribe

Scribe is Facebook's open source log collection system. It is developed in C++, uses Thrift to support a wide range of commonly used client languages, and has been widely used within Facebook. It collects logs from a variety of log sources and stores them on a central storage system (such as NFS or a distributed file system) for centralized statistical analysis. It provides a scalable, highly fault-tolerant solution for "distributed collection, unified processing" of logs. Scribe is typically used together with Hadoop: Scribe pushes logs into HDFS, and Hadoop processes them periodically with MapReduce jobs.

Project Link: http://github.com/facebook/scribe
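The "distributed collection, unified processing" flow described above can be sketched as two steps: merge log lines from many hosts into one central store, then run a periodic map/reduce-style job over that store. Everything here is an assumption made for illustration, including the host names, the log line format, and the function names. It uses neither Scribe's nor Hadoop's APIs.

```python
from collections import defaultdict


def collect(sources):
    """Merge log lines from every host into one central list, tagged by host."""
    central = []
    for host, lines in sources.items():
        central.extend(f"{host} {line}" for line in lines)
    return central


def map_phase(lines):
    """Emit a (log level, 1) pair for each collected line."""
    for line in lines:
        _host, level, _text = line.split(" ", 2)
        yield level, 1


def reduce_phase(pairs):
    """Sum the counts for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


sources = {
    "web1": ["ERROR disk full", "INFO started"],
    "web2": ["INFO started", "ERROR timeout"],
}
central = collect(sources)
counts = reduce_phase(map_phase(central))
```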

Flume

Flume is a distributed, reliable, and highly available log collection system provided by Cloudera for collecting, aggregating, and moving large volumes of log data. The development language is Java. Flume supports custom data senders in the logging system for collecting data. It can also perform simple processing on the data and write it to various customizable data receivers.

Project Link: http://incubator.apache.org/flume
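The sender, simple processing, receiver flow can be sketched as a small pipeline. This is a plain-Python illustration only. The names line_source, drop_debug, ListSink, and run_pipeline are hypothetical and do not correspond to Flume's classes or configuration.

```python
def line_source(lines):
    """Data sender: yields raw log lines one at a time."""
    for line in lines:
        yield line


def drop_debug(events):
    """Simple processing step: discard DEBUG lines and trim whitespace."""
    for event in events:
        if "DEBUG" not in event:
            yield event.strip()


class ListSink:
    """Data receiver: stores events in memory (a real one might write to HDFS)."""

    def __init__(self):
        self.received = []

    def write(self, event):
        self.received.append(event)


def run_pipeline(source, processor, sinks):
    """Push every processed event to every receiver."""
    for event in processor(source):
        for sink in sinks:
            sink.write(event)


raw = ["INFO service started \n", "DEBUG cache miss\n", "ERROR disk full\n"]
archive, alerts = ListSink(), ListSink()
run_pipeline(line_source(raw), drop_debug, [archive, alerts])
```

Because the sender, the processing step, and the receivers only share the event stream between them, each can be swapped out independently, which is the kind of customization the description above refers to.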
