This article summarizes some of the open source real-time stream processing systems currently used in industry, as a reference for future technical research.
S4
S4 (Simple Scalable Streaming System) is an open source computing platform recently released by Yahoo. It is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform for distributed stream computing. On this platform, programmers can easily develop applications that process unbounded, continuous streams of data. The development language is Java.
Project Link: http://incubator.apache.org/s4/ (Note: S4 0.5.0 adds support for TCP connections and state recovery.)
Storm
Storm is a distributed real-time computation system open-sourced by Twitter. It provides simple APIs that let developers reliably process unbounded, continuous streams of data in real time. The development languages are Clojure and Java, and non-JVM languages can communicate with Storm over stdin/stdout using a JSON-based protocol. Storm has many application scenarios: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL processing, and more.
Project Link: http://storm-project.net
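To make the stdin/stdout JSON exchange mentioned above more concrete, here is a minimal Python sketch of a non-JVM component. It is only an illustration under stated assumptions: the field names ("command", "anchors", "tuple", "id") and the convention of ending each message with a line containing only "end" reflect a simplified reading of Storm's multi-language protocol, the initial handshake is left out, and the function names read_message, write_message, and run_bolt are hypothetical. Check the Storm documentation before relying on any of these details.

```python
import io
import json


def read_message(stream):
    """Collect lines up to a line containing only 'end' and parse them as JSON.

    Returns None if the stream closes before a complete message arrives.
    """
    lines = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)
    return None


def write_message(stream, message):
    """Serialize one message as JSON followed by the assumed 'end' delimiter line."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def run_bolt(stdin, stdout):
    """Hypothetical bolt: upper-cases the first field of every incoming tuple."""
    while True:
        msg = read_message(stdin)
        if msg is None:
            break
        word = msg["tuple"][0]
        # Emit a new tuple anchored to the input, then acknowledge the input.
        write_message(stdout, {"command": "emit",
                               "anchors": [msg["id"]],
                               "tuple": [word.upper()]})
        write_message(stdout, {"command": "ack", "id": msg["id"]})


if __name__ == "__main__":
    # Simulate the parent process with in-memory streams instead of real pipes.
    incoming = io.StringIO()
    write_message(incoming, {"id": "1", "tuple": ["storm"]})
    write_message(incoming, {"id": "2", "tuple": ["stream"]})
    incoming.seek(0)
    outgoing = io.StringIO()
    run_bolt(incoming, outgoing)
    print(outgoing.getvalue())
```

In a real deployment the two streams would be sys.stdin and sys.stdout; in-memory streams are used here so the sketch can run on its own.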
StreamBase
StreamBase is a platform for complex event processing (CEP) and event stream processing. It is a commercial product, but a Developer Edition is available. The development language is Java.
Project Link: http://www.streambase.com
HStreaming
Built on top of Hadoop, HStreaming integrates with Hadoop and its ecosystem to provide real-time stream computing services. This lets HStreaming users analyze and process big data within a single ecosystem. The development language is Java.
Project Link: http://www.hstreaming.com
Esper & NEsper
Esper is a stream processing platform dedicated to complex event processing (CEP). The Java version is Esper and the .NET version is NEsper. Esper and NEsper let developers quickly build applications that handle large volumes of messages and events, whether historical or real-time.
Project Link: http://esper.codehaus.org
Kafka
Kafka is a high-throughput, publish-subscribe distributed messaging system developed by LinkedIn in December 2010, used primarily for processing activity stream data. The development language is Scala.
Project Link: http://incubator.apache.org/kafka
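The publish-subscribe model behind a system like Kafka can be sketched in a few lines of Python. This is a single-process toy meant only to show the idea of an append-only log per topic with an independent read position per subscriber. It does not use or imitate Kafka's actual API, and the Broker class and its publish and poll methods are hypothetical names chosen for the illustration.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory publish-subscribe broker (illustration only)."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> append-only list of messages
        self._offsets = defaultdict(int)  # (topic, subscriber) -> next index to read

    def publish(self, topic, message):
        """Append a message to the topic's log and return its position."""
        log = self._logs[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, topic, subscriber, max_messages=10):
        """Return the subscriber's next unread messages and advance its position."""
        key = (topic, subscriber)
        start = self._offsets[key]
        batch = self._logs[topic][start:start + max_messages]
        self._offsets[key] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = Broker()
    for event in ("view:/home", "view:/search", "click:/buy"):
        broker.publish("activity", event)

    # Each subscriber reads the same log at its own pace.
    print(broker.poll("activity", "analytics"))
    print(broker.poll("activity", "audit", max_messages=2))
    print(broker.poll("activity", "audit"))
```

Because messages stay in the log after being read, a slow subscriber does not block a fast one, which is one reason this design suits high-volume stream data.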
Scribe
Scribe is Facebook's open source log collection system. It is developed in C++, uses Thrift to support a wide range of commonly used client languages, and has been widely used within Facebook. It collects logs from a variety of log sources and stores them on a central storage system (such as NFS or a distributed file system) for centralized statistical analysis. It provides a scalable, highly fault-tolerant solution for "distributed collection, unified processing" of logs. Scribe is typically used together with Hadoop: Scribe pushes logs into HDFS, and Hadoop processes them periodically with MapReduce jobs.
Project Link: http://github.com/facebook/scribe
Flume
Flume is a distributed, reliable, highly available log collection system provided by Cloudera for collecting, aggregating, and moving large volumes of log data. The development language is Java. Flume supports customizable data senders in a logging system for collecting data. It can also perform simple processing on the data and write it to a variety of customizable data receivers.
Project Link: http://incubator.apache.org/flume
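The flow described for Flume (data senders, simple processing, data receivers) can be pictured with a small Python sketch. This is not Flume's API or configuration format. The names run_pipeline, file_source, drop_debug, and tag_host are hypothetical, and the sketch assumes each log event is a single line of text.

```python
import io


def run_pipeline(source, transforms, sinks):
    """Pull events from a source, apply each transform, deliver to every sink.

    A transform may return None to drop an event. Returns the number of
    events that reached the sinks.
    """
    delivered = 0
    for event in source:
        for transform in transforms:
            event = transform(event)
            if event is None:
                break
        if event is None:
            continue
        for sink in sinks:
            sink(event)
        delivered += 1
    return delivered


def file_source(handle):
    """Data sender: yield non-empty lines from a file-like object."""
    for line in handle:
        line = line.rstrip("\n")
        if line:
            yield line


def drop_debug(event):
    """Simple processing step: discard DEBUG lines."""
    return None if "DEBUG" in event else event


def tag_host(host):
    """Simple processing step: prefix each event with the originating host."""
    def transform(event):
        return f"{host} {event}"
    return transform


if __name__ == "__main__":
    log = io.StringIO("INFO start\nDEBUG noisy\n\nERROR disk full\n")
    collected = []  # stands in for a data receiver such as central storage
    count = run_pipeline(file_source(log),
                         [drop_debug, tag_host("web01")],
                         [collected.append, print])
    print(count, collected)
```

Keeping senders, processing steps, and receivers as separate callables mirrors the customization points the description mentions: any one of them can be swapped without touching the others.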