Kafka is a distributed messaging system based on the publish/subscribe model. Its main design goals are as follows:
● Provide message persistence with O(1) time complexity, ensuring constant-time access even for data at the terabyte level and above.
● High throughput: even on very low-cost commodity hardware, a single machine can support the transmission of 100K messages per second.
● Support message partitioning across Kafka servers and distributed consumption, while ensuring that messages within each partition are transmitted in order.
● Support both offline data processing and real-time data processing.
The above is the official introduction to Kafka. From it we can see that the main problem Kafka sets out to solve is reading data. Its function can be summed up briefly: data must be processed. So, in my opinion, Kafka can be thought of as a "buffer" (a "memory"?) in a distributed system. Kafka also has similarities to a database, and many of its concepts can be understood by analogy with HDFS.
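Two of the design goals listed earlier, constant-time access to persisted messages and ordering within a partition, can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `PartitionLog` and `Topic` are hypothetical names invented for this example, not Kafka's API; a Python list stands in for the on-disk log; and `zlib.crc32` is a stand-in hash, not the one Kafka's own partitioner uses.

```python
import zlib


class PartitionLog:
    """Toy append-only log for one partition (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Append to the end of the log and return the new message's offset."""
        self._messages.append(message)  # amortized O(1)
        return len(self._messages) - 1

    def read(self, offset):
        """Fetch a message by offset; list indexing is O(1) at any log size."""
        return self._messages[offset]


class Topic:
    """Toy topic that spreads messages over a fixed number of partitions."""

    def __init__(self, num_partitions):
        self.partitions = [PartitionLog() for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: hash the key with crc32 and take it modulo the
        # partition count, so equal keys always map to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        """Route the value by key and return (partition index, offset)."""
        index = self.partition_for(key)
        offset = self.partitions[index].append(value)
        return index, offset


if __name__ == "__main__":
    topic = Topic(num_partitions=3)
    for i in range(3):
        partition, offset = topic.send("user-42", f"event-{i}")
        print(partition, offset)
```

Because every message with the same key lands in the same partition and is only ever appended, reading that partition by increasing offset returns the messages in the order they were sent. No ordering is implied across different partitions.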
Concepts in Kafka
● Broker: A Kafka cluster contains one or more servers; each such server is called a broker. A rough analogy is the HDFS DataNode.
● Topic: Each message published to a Kafka cluster has a category, called its topic. (Physically, the messages of different topics are stored separately. Logically, a topic's messages may be stored on one or more brokers, but users only need to specify the topic to produce or consume data, without having to care where the data resides.)
● Partition: A partition is a physical concept. Each topic contains one or more partitions, and the number of partitions can be specified when creating the topic. Each partition corresponds to a folder that holds the partition's data and index files.
● Replication: Replication was introduced to achieve durability and high availability, in other words, to ensure that data is not easily lost when a single node goes down.
● Leader Election: There are two kinds of leader election in Kafka:
1. Leader election based on ZooKeeper, where the object is the consumer. Kafka uses ZooKeeper to manage the cluster configuration, elect leaders, and rebalance when a consumer group changes.
2. Leader election for a partition, where the object is the replica. Once replication is introduced, the same partition may have multiple replicas, and one of them must be chosen as the leader. Producers and consumers interact only with this leader; the other replicas act as followers and copy data from the leader.
● Producer: The producer of messages, responsible for publishing messages to a Kafka broker.
● Consumer: The consumer of messages. Each consumer belongs to a specific consumer group (a group name can be specified for each consumer; a consumer without a group name belongs to the default group). When using the high-level consumer API, a message in a topic can be consumed by only one consumer within a given consumer group, but multiple consumer groups can consume the same message simultaneously.
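The consumer group rule described above (one consumer per message within a group, every group sees every message) can be sketched with a small simulation. This is a minimal illustration, not Kafka's client API: `assign_partitions` and `deliver` are hypothetical helpers, partitions are plain Python lists, and the round-robin assignment is an assumed strategy chosen for simplicity.

```python
def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer of a group (round-robin).

    Assumption: consumers are sorted by name so the result is deterministic.
    Consumers left without a partition get an empty list.
    """
    ordered = sorted(consumers)
    assignment = {consumer: [] for consumer in ordered}
    for partition in range(num_partitions):
        owner = ordered[partition % len(ordered)]
        assignment[owner].append(partition)
    return assignment


def deliver(partitions, groups):
    """Simulate consumption of one topic by several consumer groups.

    partitions: list of message lists, one list per partition.
    groups: dict mapping a group name to the names of its consumers.
    Returns {group: {consumer: [messages that consumer received]}}.
    """
    result = {}
    for group, consumers in groups.items():
        assignment = assign_partitions(len(partitions), consumers)
        result[group] = {
            consumer: [msg for p in owned for msg in partitions[p]]
            for consumer, owned in assignment.items()
        }
    return result


if __name__ == "__main__":
    topic_partitions = [["a0", "a1"], ["b0"], ["c0", "c1"]]
    consumer_groups = {"billing": ["w1", "w2"], "audit": ["solo"]}
    for group, received in deliver(topic_partitions, consumer_groups).items():
        print(group, received)
```

Within the hypothetical "billing" group the partitions are split between `w1` and `w2`, so no message is handled twice inside the group, while the single consumer in "audit" independently receives every message.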