Kafka 0.8

Document directory
  • Kafka replication high-level design

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.8+Quick+Start

0.8 is a huge step forward in functionality from 0.7.x

 

This release includes the following major features:

  • Partitions are now replicated. Partition replicas protect against data loss caused by broker failure.
    Previously the topic would remain available in the case of server failure, but individual partitions within that topic could disappear when the server hosting them stopped. If a broker failed permanently, any unconsumed data it hosted would be lost.
    Starting with 0.8 all partitions have a replication factor, and we get the prior behavior as the special case where replication factor = 1.
    Replicas have a notion of committed messages and guarantee that committed messages won't be lost as long as at least one replica survives. Replica logs are byte-for-byte identical across replicas.
  • Producer and consumer are replication aware. Both the producer and the consumer understand replicas.
    When running in sync mode, by default, the producer's send() request blocks until the message sent is committed to the active replicas. As a result the sender can depend on the guarantee that a message sent will not be lost.
    Latency-sensitive producers have the option to tune this to block only on the write to the leader broker, or to run completely async if they are willing to forsake this guarantee.
    The consumer will only see messages that have been committed.
  • The consumer has been moved to a "Long poll" model where fetch requests block until there is data available.
    This enables low latency without frequent polling. In general, end-to-end message latency from producer to broker to consumer of only a few milliseconds is now possible.
  • We now retain the key used in the producer for partitioning with each message, so the consumer knows the partitioning key.
  • We have moved from directly addressing messages with a byte offset to using a logical offset (i.e. 0, 1, 2, 3 ...). Logical offsets replace the previous physical offsets.
    The offset still works exactly the same: it is a monotonically increasing number that represents a point in time in the log, but now it is no longer tied to byte layout.
    This has several advantages (see the sketch after this list):
    (1) It is aesthetically nice,
    (2) It makes it trivial to calculate the next offset or to traverse messages in reverse order,
    (3) It fixes a corner-case interaction between consumer commit() and compressed message batches. Data is still transferred using the same efficient zero-copy mechanism as before.
  • We have removed the zookeeper dependency from the producer and replaced it with a simple cluster Metadata API.
  • We now support multiple data directories (i.e. a JBOD setup).
  • We now expose both the partition and the offset for each message in the high-level consumer.
  • We have substantially improved our integration testing, adding a new integration test framework and over 100 distributed regression and performance test scenarios that we run on every checkin.
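
To make advantage (2) concrete, here is a small illustrative sketch (not Kafka code; the messages and byte sizes are invented for the example) contrasting physical byte offsets with logical offsets: with byte offsets, stepping to the next message requires knowing each message's size, while with logical offsets the next position is simply offset + 1.

```python
# Illustrative only: contrasts byte-offset vs logical-offset addressing.
messages = [b"alpha", b"bravo", b"charlie", b"delta"]

# Physical addressing: the "offset" of a message is its byte position in the log,
# so computing the next offset requires the current message's length.
byte_offsets = []
pos = 0
for m in messages:
    byte_offsets.append(pos)
    pos += len(m)            # next offset depends on message size
print(byte_offsets)          # [0, 5, 10, 17]

# Logical addressing (0.8 style): offsets are just 0, 1, 2, 3 ...
logical_offsets = list(range(len(messages)))
next_offset = logical_offsets[-1] + 1            # trivial to compute
reverse_order = list(reversed(logical_offsets))  # trivial to traverse backwards
print(next_offset, reverse_order)
```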

 

In my opinion, these are the main changes:

1. The broker is now more robust. In the original design, a broker failure could cause data loss, which is hard to accept, so the replica feature is necessary.

2. Logical offsets are now used. The advantages are listed above; however, when physical offsets were used, a list of advantages was given for them as well.
In fact, this is a balance between efficiency and ease of use. Previously, physical offsets were chosen in pursuit of efficiency.
Now, since physical offsets are too hard to use, a compromise is made and they are replaced with logical offsets. There is no difference in nature; we only need to add a mapping between the logical offset and the physical offset so that the physical offset is transparent to users (see the sketch below).
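
A minimal sketch of what such a mapping could look like, assuming a sparse in-memory index from logical offset to byte position (this is only an illustration of the idea, not Kafka's actual index format):

```python
import bisect

# Hypothetical sparse index: (logical_offset, byte_position) pairs,
# kept sorted by logical offset. Only some offsets are indexed.
sparse_index = [(0, 0), (100, 4096), (200, 8192), (300, 12288)]

def locate(logical_offset):
    """Return the byte position to start scanning from for a logical offset.

    Find the greatest indexed offset <= logical_offset; the reader then
    scans forward from that byte position to the exact message.
    """
    keys = [off for off, _ in sparse_index]
    i = bisect.bisect_right(keys, logical_offset) - 1
    if i < 0:
        raise ValueError("offset below the start of the log")
    base_offset, byte_pos = sparse_index[i]
    return base_offset, byte_pos

print(locate(150))   # -> (100, 4096): scan forward from byte 4096
```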

3. Better support for Python via kafka-python.

Pure Python implementation with full protocol support. Consumer and producer implementations included; gzip and snappy compression supported.

Maintainer: David Arthur
License: Apache v2.0

https://github.com/mumrah/kafka-python
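
A minimal usage sketch, assuming the early kafka-python API from this repository (KafkaClient, SimpleProducer, SimpleConsumer); the exact constructor signatures varied between early releases, and the broker address, topic, and group names below are placeholders:

```python
# Sketch only, based on the early kafka-python examples; signatures may differ by version.
from kafka.client import KafkaClient
from kafka.producer import SimpleProducer
from kafka.consumer import SimpleConsumer

kafka = KafkaClient("localhost:9092")        # placeholder broker address

# Producer: send a message to a topic.
producer = SimpleProducer(kafka)
producer.send_messages("my-topic", b"hello from kafka-python")

# Consumer: iterate over messages; each message carries its partition offset.
consumer = SimpleConsumer(kafka, "my-group", "my-topic")
for message in consumer:
    print(message.offset, message.message.value)

kafka.close()
```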

Kafka replication high-level design

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Replication

In fact, Kafka can largely borrow from existing NoSQL replication designs here.

There are mainly two problems that we need to solve here:

  1. How to assign replicas of a partition to broker servers evenly?
  2. For a given partition, how to propagate every message to all replicas?
Replica placements

To solve the first problem: how do we distribute partition replicas evenly?
The formulas in the original text are not derived, nor do they match the accompanying figure, which is strange.
In short, this is a simple problem. The design does not seem to take the brokers' existing partition replicas into account; the assignment is based only on the number of partitions, the number of replicas, and the number of brokers.
In addition, the final assignment is recorded in Zookeeper, so there is no consistent-hashing concern: when the broker list changes, a large replica migration is not required.

For each topic, we want to divide the partitions evenly among all the brokers.
We sort the list of brokers and the list of partitions. If there are n brokers, we assign the ith partition to the (i mod n)th broker.

The first replica of this partition will reside on this assigned broker and is referred to as the preferred replica of this partition. We want to place the other replicas in such a way that if a broker is down, its load is spread evenly to all other surviving brokers, instead of to a single one. In order to achieve that, suppose there are m partitions assigned to a broker i. The jth replica of partition k will be assigned to broker (i + j + k) mod n.

We store the information about the replica assignment for each partition in Zookeeper.
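
As a rough illustration of this kind of round-robin placement, here is a simplified sketch (it captures the intent of spreading replicas over brokers, not the exact formula quoted above, which, as noted, does not obviously match the figure):

```python
def assign_replicas(num_partitions, num_brokers, replication_factor):
    """Round-robin placement sketch: spread partitions and their replicas over brokers.

    The preferred replica of partition p goes to broker (p mod n); each additional
    replica is shifted to the next broker so a broker's load spreads across the others.
    """
    assert replication_factor <= num_brokers, "need at least as many brokers as replicas"
    assignment = {}
    for p in range(num_partitions):
        preferred = p % num_brokers
        assignment[p] = [(preferred + j) % num_brokers for j in range(replication_factor)]
    return assignment

# Example: 6 partitions, 3 brokers, replication factor 2.
for partition, brokers in assign_replicas(6, 3, 2).items():
    print(f"partition {partition}: brokers {brokers}")
```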

 

Incrementally add brokers online

When a new broker is added, we will automatically move some partitions from existing brokers to the new one.
Our goal is to minimize the amount of data movement while maintaining a balanced load on each broker.

When a new broker is added, a large data migration is not required, but a small amount of data does need to move.
The algorithm in the original article is essentially to randomly select m/n partitions and move them to the new broker B; in other words, it is simply a random choice of m/n partitions (see the sketch below).
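
A naive sketch of that idea, under the assumption that the new broker should receive roughly m/n randomly chosen partitions while everything else stays put (the partition counts and broker ids are invented for the example):

```python
import random

def rebalance_on_add(assignment, new_broker, num_brokers):
    """Move roughly m/n randomly chosen partitions to the newly added broker.

    `assignment` maps partition -> broker; `num_brokers` is the count after the add.
    Only the moved partitions incur data migration.
    """
    m = len(assignment)
    to_move = random.sample(sorted(assignment), m // num_brokers)
    for p in to_move:
        assignment[p] = new_broker
    return to_move

# Example: 8 partitions on brokers 0 and 1, then broker 2 joins.
assignment = {p: p % 2 for p in range(8)}
moved = rebalance_on_add(assignment, new_broker=2, num_brokers=3)
print("moved partitions:", moved)
print("new assignment:", assignment)
```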

 

Data Replication

There are two common strategies for keeping replicas in sync: primary-backup replication and quorum-based replication.

In primary-backup replication, the leader waits until the write completes on every replica in the group before acknowledging the client. If one of the replicas is down, the leader drops it from the current group and continues to write to the remaining replicas. A failed replica is allowed to rejoin the group if it comes back and catches up with the leader. With f replicas, primary-backup replication can tolerate f - 1 failures.
In the quorum-based approach, the leader waits until a write completes on a majority of the replicas. The size of the replica group doesn't change even when some replicas are down. If there are 2f + 1 replicas, quorum-based replication can tolerate f replica failures. If the leader fails, it needs at least f + 1 replicas to elect a new leader.

In both cases, one replica is designated as the leader and the rest of the replicas are called followers. All write requests go through the leader and the leader propagates the writes to the follower.

 

There are tradeoffs between the 2 approaches:

  1. The quorum-based approach has better write latency than the primary-backup one. A delay (e.g., a long GC) in any replica increases the write latency in the latter, but not the former.
  2. Given the same number of replicas, the primary-backup approach tolerates more concurrent failures.
  3. A replication factor of 2 works well with the primary-backup approach. In quorum-based replication, both replicas have to be up for the system to be available.

We chose primary-backup replication in Kafka since it tolerates more failures and works well with 2 replicas. A hiccup can happen when a replica is down or becomes slow. However, those are relatively rare events, and the hiccup time can be reduced by tuning various timeout parameters.
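
A small worked comparison, just applying the definitions above (primary-backup with f replicas tolerates f - 1 failures; quorum-based with 2f + 1 replicas tolerates f):

```python
def tolerated_failures(num_replicas, strategy):
    """How many concurrent replica failures each strategy survives, per the text above."""
    if strategy == "primary-backup":
        return num_replicas - 1          # any single surviving replica keeps the data
    if strategy == "quorum":
        return (num_replicas - 1) // 2   # needs a majority: 2f + 1 replicas tolerate f
    raise ValueError(strategy)

for n in (2, 3, 5):
    print(n, "replicas:",
          "primary-backup tolerates", tolerated_failures(n, "primary-backup"),
          "| quorum tolerates", tolerated_failures(n, "quorum"))
# With 2 replicas: primary-backup tolerates 1 failure, quorum tolerates 0,
# which is why a replication factor of 2 only works well with primary-backup.
```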
