Kafka Message Delivery Semantics
- At most once -- Messages may be lost but are never redelivered.
- At least once -- Messages are never lost but may be redelivered.
- Exactly once -- What people actually want: each message is delivered once and only once.
Many systems claim to provide "exactly once" delivery, but it is important to read such claims carefully. Most of them are misleading: they do not account for failures of producers or consumers, for multiple consumer processes running concurrently, or for cases where data written to disk can be lost.
Kafka's message delivery semantics are straightforward. When publishing a message, we have the notion of the message being "committed" to the log. Once a published message is committed, it will not be lost as long as at least one broker that replicates the partition the message was written to remains alive. Since version 0.11.0.0, the Kafka producer has supported an idempotent delivery option that guarantees resending a message will not result in duplicate entries in the log. To achieve this, the broker assigns each producer an ID and deduplicates messages using a sequence number the producer sends along with every message.
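As a concrete illustration, here is a minimal sketch of enabling the idempotent producer with the Java client; the broker address and topic name ("localhost:9092", "events") are assumptions for the example:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdempotentProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        // Idempotent delivery (Kafka >= 0.11.0.0): the broker uses the producer ID
        // and per-message sequence numbers to drop duplicates caused by retries.
        props.put("enable.idempotence", "true");
        props.put("acks", "all"); // required when idempotence is enabled

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Even if this send is internally retried, the log gets one entry.
            producer.send(new ProducerRecord<>("events", "key", "value"));
        }
    }
}
```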
Not all use cases require such a strong guarantee; Kafka also lets latency-sensitive producers specify a weaker durability level.
Now, let's look at this semantics from the consumer's perspective. The consumer uses logs to control its location. If the consumer does not crash, it simply stores the location in the memory, but if the consumer fails, we want another process to take over the partition, then the new process needs to select a proper location for processing. Let's take a look at several options for the consumer to read the Post-processing message and update location of the message.
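This is a minimal sketch of the process-then-commit ("at least once") pattern with the Java consumer; the broker address, group id, and topic name are assumptions:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "demo-group");              // assumed group id
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        // Disable auto-commit so we control when the position is saved.
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));        // assumed topic
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);       // process first...
                }
                consumer.commitSync();     // ...then save the position: at-least-once
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s -> %s%n", record.key(), record.value());
    }
}
```

Committing the position before processing the batch instead would flip the guarantee to "at most once".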
When consuming from one topic and producing to another, we can use the transactional producer. The consumer's position is stored as a message in a topic, so we can write the offsets to Kafka in the same transaction as the output topics receiving the processed data. If the transaction is aborted, the consumer's position reverts to its old value, and whether the produced data on the output topics is visible to other consumers depends on their "isolation level". In the default "read_uncommitted" isolation level, all messages are visible to consumers even if they were part of an aborted transaction; in "read_committed", the consumer only returns messages from committed transactions (plus any messages that were not part of a transaction).
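A sketch of this consume-transform-produce pattern with Kafka transactions follows; the broker address, topics, group id, and transactional.id are assumptions, and error handling is simplified (fatal errors such as producer fencing would require closing the producer rather than just aborting):

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class TransactionalEtlDemo {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        p.put("transactional.id", "etl-demo-1");       // enables transactions
        p.put("key.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer",
              "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(p);
        producer.initTransactions();

        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "etl-demo-group");           // assumed group id
        c.put("enable.auto.commit", "false");          // offsets go in the transaction
        c.put("isolation.level", "read_committed");    // skip aborted data downstream
        c.put("key.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);
        consumer.subscribe(List.of("input-topic"));    // assumed topic

        while (true) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;
            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> rec : records) {
                    producer.send(new ProducerRecord<>("output-topic", rec.key(),
                                                       rec.value().toUpperCase()));
                    offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                                new OffsetAndMetadata(rec.offset() + 1));
                }
                // The consumed offsets commit or abort together with the output.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction(); // position reverts; output stays invisible
            }
        }
    }
}
```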
Summary
1. Message Delivery Semantics
At most once: messages may be lost but will not be delivered repeatedly.
At least once: messages are not lost but may be delivered repeatedly.
Exactly once: each message is delivered once and only once.
2. Kafka assigns an ID to each producer, and the producer attaches a sequence number to each published message. This way, even if the producer resends a message, no duplicate record ends up in the commit log.
3. From the consumer's point of view: saving the position before processing gives "at most once"; processing before saving the position gives "at least once"; for "exactly once", use the transactional producer so that processing the messages and saving the position (offset) happen in the same transaction. If the transaction commits, everyone is happy; if it aborts, the position is rolled back.
Reference: http://kafka.apache.org/documentation/#design