Kafka is a high-throughput distributed publish-subscribe messaging system. It has the following features:
- Message persistence is provided by an O(1) disk data structure, which keeps performance stable even when several terabytes of messages are stored for a long time.
- High throughput: even on very ordinary hardware, Kafka supports 100,000 messages per second.
- Messages can be partitioned across Kafka servers, and their consumption can be distributed across a cluster of consumer machines.
- Supports parallel data loading into Hadoop.
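The persistence and partitioning features above can be illustrated with a small in-memory model. This is only a sketch and not Kafka's actual implementation or client API: the names `PartitionedLog`, `partition_for`, and `assign_partitions`, the use of CRC32 as the hash, and the partition count of 4 are all assumptions made for illustration.

```python
import zlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 4  # assumed value, chosen only for this illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedLog:
    """Toy in-memory stand-in for a partitioned, append-only message log."""

    def __init__(self, num_partitions: int = NUM_PARTITIONS) -> None:
        self.num_partitions = num_partitions
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def append(self, key: str, value: str) -> Tuple[int, int]:
        """Append to the end of the key's partition; return (partition, offset)."""
        p = partition_for(key, self.num_partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int) -> Tuple[str, str]:
        """Read by position: a direct index lookup, independent of log size."""
        return self.partitions[partition][offset]


def assign_partitions(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over a group of consumers in round-robin order."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment
```

In this model, messages with the same key always land in the same partition and keep their relative order there, while different partitions can be handled by different consumer machines in parallel.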
Kafka aims to provide a publish/subscribe solution that can process all the activity stream data of a consumer-scale website. Such activity (page views, searches, and other user actions) is a key ingredient in many social features of the modern web. Because of the throughput requirements, this data is usually handled by logging and log aggregation. That approach is workable for delivering log data to offline analysis systems like Hadoop, but it is limiting when real-time processing is required. Kafka aims to unify online and offline message processing: it supports offline use through Hadoop's parallel loading mechanism and real-time consumption across a cluster of machines.
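One way to picture how a single message stream can serve both online and offline processing is a shared log that each consumer reads at its own offset. The sketch below is a toy model under that assumption, not Kafka's client API; `ActivityLog`, `Consumer`, and the batch sizes are hypothetical names and values chosen for illustration.

```python
from typing import List


class ActivityLog:
    """Toy append-only log of activity events, shared by all consumers."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def publish(self, message: str) -> int:
        """Append a message and return its offset in the log."""
        self._messages.append(message)
        return len(self._messages) - 1

    def fetch(self, offset: int, max_messages: int) -> List[str]:
        """Return up to max_messages messages starting at offset."""
        return self._messages[offset:offset + max_messages]


class Consumer:
    """Reads the shared log at its own pace, tracking only its own offset."""

    def __init__(self, log: ActivityLog, batch_size: int) -> None:
        self.log = log
        self.batch_size = batch_size
        self.offset = 0

    def poll(self) -> List[str]:
        """Fetch the next batch and advance this consumer's offset."""
        batch = self.log.fetch(self.offset, self.batch_size)
        self.offset += len(batch)
        return batch


log = ActivityLog()
realtime = Consumer(log, batch_size=1)     # small batches, low latency
offline = Consumer(log, batch_size=1000)   # large batches, e.g. a periodic bulk load

for event in ["page_view", "search", "click"]:
    log.publish(event)

print(realtime.poll())  # the real-time consumer takes one event at a time
print(offline.poll())   # the offline consumer takes everything published so far
```

Because each consumer keeps its own position, a slow batch loader does not hold back a real-time reader, and both see the same messages in the same order.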