Roaming Kafka, Design Part: Transaction Definitions for Message Delivery

Source: Internet
Author: User

Before discussing how the consumer and producer work, let's look at data transfer. Transaction definitions for data transfer typically fall into the following three levels:

    1. At most once: messages are never resent. Each message is delivered at most one time, but it may not be delivered at all.
    2. At least once: messages are never lost. Each message is delivered at least one time, but it may be delivered more than once.
    3. Exactly once: messages are neither lost nor duplicated. Each message is delivered once and only once, which is what everyone wants.

Most messaging systems claim to be "exactly once", but a careful reading of their documentation shows that these claims can be misleading: they often do not explain what happens when a consumer or producer fails, when multiple consumers run in parallel, or when data written to disk is lost. Kafka's approach is arguably more advanced. When publishing a message, Kafka has the concept of a message being "committed": once a message is committed, it will not be lost as long as a replica of the partition it was written to remains alive. What it means for a replica to be alive is discussed in the next section of the document. For now, assume that the broker does not go down.
If a network error occurs while the producer is publishing a message, the producer cannot be sure whether the error happened before or after the message was committed. This is uncommon, but it must be considered. At the time of writing, Kafka has not resolved this issue; future versions aim to.
Not every use case requires the strong guarantee of "exactly once", so Kafka lets the producer choose its level flexibly. For example, the producer can wait for a notification that the message has been committed, send the message fully asynchronously without waiting for any notification, or wait only until the leader declares that it has received the message (without waiting for the followers).
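The ambiguity described above can be illustrated with a small simulation. This is only a sketch, not Kafka client code: the `Broker` class, its `drop_ack_for` parameter, and the `publish` helper are hypothetical names invented for the example. It assumes the error happens after the commit, so a producer that retries ends up writing the message twice.

```python
class Broker:
    """Toy in-memory broker with an append-only log (not a real Kafka API)."""

    def __init__(self, drop_ack_for=()):
        self.log = []
        # Hypothetical knob: send attempts (1-based) whose acknowledgement is lost.
        self.drop_ack_for = set(drop_ack_for)
        self.attempts = 0

    def send(self, message):
        self.attempts += 1
        self.log.append(message)  # the commit happens on the broker side
        if self.attempts in self.drop_ack_for:
            # The producer sees an error and cannot tell that the commit occurred.
            raise ConnectionError("acknowledgement lost")
        return len(self.log) - 1  # offset of the committed message


def publish(broker, message, retries):
    """Send a message, retrying up to `retries` times after a network error."""
    for _ in range(retries + 1):
        try:
            return broker.send(message)
        except ConnectionError:
            continue
    return None  # gave up without ever seeing an acknowledgement


broker = Broker(drop_ack_for={1})
offset = publish(broker, "m1", retries=1)
print(broker.log)  # ['m1', 'm1']
```

With `retries=0` the producer would give up even though the message is in the log. Had the error occurred before the commit instead, giving up would lose the message, which is why the producer cannot safely choose either behaviour without more information.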
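As a rough sketch, the three acknowledgement choices above can be written as producer settings. The assumption here is the `acks` setting used by current Kafka producer clients, with the values "0", "1", and "all"; the Kafka version this article describes may name the setting differently. The function name and the level names are hypothetical.

```python
def producer_acks(level):
    """Return a producer config fragment for one of the three levels."""
    levels = {
        "no_wait": "0",      # fully asynchronous, wait for no notification
        "leader_only": "1",  # wait until the leader has received the message
        "committed": "all",  # wait until the message is committed
    }
    if level not in levels:
        raise ValueError(f"unknown level: {level}")
    return {"acks": levels[level]}


print(producer_acks("leader_only"))  # {'acks': '1'}
```

The returned dictionary is meant to be merged into whatever configuration the chosen client library accepts; how that is passed depends on the client.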

Now consider the issue from the consumer side. All replicas have the same log with the same offsets, and each consumer maintains the offset of the messages it has consumed. If the consumer never crashed, it could simply keep this value in memory, but of course there is no such guarantee. If a consumer crashes, another consumer takes over consuming the messages and needs to continue from a suitable offset. In this case, there are the following options:

      • The consumer can read the message, then save the offset to the log, and finally process the message. If the consumer crashes after saving the offset but before processing the message, the new consumer resumes from the saved offset and some messages are never processed. This is "at most once".
      • The consumer can read the message, process it, and finally save the offset. If the consumer crashes after processing but before saving the offset, the new consumer processes some messages again. This is "at least once".
      • "Exactly once" can be achieved with a two-phase commit: save the offset, process the message, and commit again once processing has succeeded. But there is a simpler way: store the offset together with the result of processing the message. For example, when processing messages with a Hadoop ETL job, the result and the offset are written to HDFS together, so that the result and the offset are either both updated or neither is.
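The three options above can be compared with a small simulation. This is a sketch under simplifying assumptions, not Kafka consumer code: the `consume` function, its `mode` names, and the `crash_at` parameter are hypothetical, the consumer crashes exactly once, and the "exactly once" case models the atomic write of result and offset as a single step that a crash cannot split.

```python
def consume(messages, mode, crash_at):
    """Run a consumer that crashes once at offset `crash_at`, then restart it.

    Returns every message that was processed across both runs.
    """
    processed = []    # results of processing
    saved_offset = 0  # next offset to read, as stored durably

    def run(start, crash_index):
        nonlocal saved_offset
        for offset in range(start, len(messages)):
            crash = offset == crash_index
            if mode == "at_most_once":
                saved_offset = offset + 1  # save the offset first
                if crash:
                    return False           # crash before processing
                processed.append(messages[offset])
            elif mode == "at_least_once":
                processed.append(messages[offset])  # process first
                if crash:
                    return False           # crash before saving the offset
                saved_offset = offset + 1
            elif mode == "exactly_once":
                if crash:
                    return False           # crash before the atomic write
                # Result and offset are stored together in one step.
                processed.append(messages[offset])
                saved_offset = offset + 1
            else:
                raise ValueError(f"unknown mode: {mode}")
        return True

    if not run(0, crash_at):
        run(saved_offset, None)  # a new consumer resumes from the saved offset
    return processed


msgs = ["a", "b", "c"]
print(consume(msgs, "at_most_once", crash_at=1))   # ['a', 'c']
print(consume(msgs, "at_least_once", crash_at=1))  # ['a', 'b', 'b', 'c']
print(consume(msgs, "exactly_once", crash_at=1))   # ['a', 'b', 'c']
```

The only difference between the three modes is the ordering of "save offset" and "process" relative to the crash, which is what decides whether message "b" is skipped, repeated, or handled once.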
