Today's meeting discussed why our log processing uses both Flume and Kafka: could we use Kafka alone, without Flume? The counter-proposal was to use only Flume, since it offers both input interfaces (socket and file sources) and output interfaces (Kafka, HDFS, HBase, and other sinks).
For a single, fixed scenario, and purely from the standpoint of keeping the system simple, using just one component that meets the application's needs might be better. Given how the business is likely to evolve, however, designing in some extensibility up front matters more. Compared with Kafka alone, the Flume+Kafka architecture does cost one or two extra machines for Flume log collection, but it makes it much easier to add new ways of processing the log data later, so Flume+Kafka is the recommended architecture.
Flume: pipeline. In my view it fits best when there are multiple producers, or when the same events must be written to HBase, HDFS, and Kafka.
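To make the fan-out case concrete, here is a sketch of a Flume agent configuration that replicates one log source into both a Kafka sink and an HDFS sink. All names (agent1, the tailed file path, the topic and broker addresses) are illustrative assumptions, not taken from the system discussed above.

```properties
# Sketch: one Flume agent, one source fanned out to Kafka and HDFS.
# The default replicating channel selector copies each event into both channels.
agent1.sources = log-src
agent1.channels = kafka-ch hdfs-ch
agent1.sinks = kafka-sink hdfs-sink

# Tail an application log file (path is an example).
agent1.sources.log-src.type = exec
agent1.sources.log-src.command = tail -F /var/log/app/app.log
agent1.sources.log-src.channels = kafka-ch hdfs-ch

agent1.channels.kafka-ch.type = memory
agent1.channels.hdfs-ch.type = memory

# Sink 1: publish events to a Kafka topic.
agent1.sinks.kafka-sink.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka-sink.kafka.topic = app-logs
agent1.sinks.kafka-sink.kafka.bootstrap.servers = broker1:9092
agent1.sinks.kafka-sink.channel = kafka-ch

# Sink 2: roll the same events into HDFS, partitioned by day.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode/flume/logs/%Y-%m-%d
agent1.sinks.hdfs-sink.channel = hdfs-ch
```

Because the routing lives in configuration, adding another destination later means adding a channel and a sink, not changing the producers.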
Kafka: message queue. Because Kafka uses a pull model, it suits scenarios with multiple consumers.
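The pull model is what lets several consumers read the same stream independently: each consumer group tracks its own offset into an append-only log and fetches at its own pace. A minimal in-memory sketch (a stand-in for one Kafka topic partition, not the real client API):

```python
from collections import defaultdict

class MiniLog:
    """A toy stand-in for a Kafka topic partition: an append-only
    record log plus one read offset per consumer group."""

    def __init__(self):
        self.records = []                # append-only message log
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def produce(self, message):
        self.records.append(message)

    def poll(self, group, max_records=10):
        """Pull model: the group asks for data and advances its own offset;
        other groups are unaffected."""
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] += len(batch)
        return batch

topic = MiniLog()
for i in range(3):
    topic.produce(f"log line {i}")

# Two independent consumer groups each see the full stream.
storm_batch = topic.poll("storm")
hdfs_batch = topic.poll("hdfs-writer")
```

Adding a new downstream processor is just a new group name polling the same log, which is why Kafka sits well in front of multiple consumers.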
In the current application scenario, a log forwarder generates the logs and the backend consumes them through Storm, so the suggested pipeline is log --> Kafka --> Storm. If we later need to write to HBase or HDFS, we can either attach the new writer after Kafka, in parallel with Storm, or deploy Flume on the log forwarder to read the log messages and fan them out.