Kafka Learning Path (iii)--Advanced

Source: Internet
Author: User
Tags: zookeeper, client

Design principle

Kafka is designed to be a unified, real-time message collection platform that must support large volumes of data with good fault tolerance.

Durability

Because Kafka stores messages directly in files, its performance depends heavily on the file system itself. On any OS, optimizing the file system further is nearly impossible; relying on the file cache and direct memory mapping are the common techniques. Because Kafka only appends to its log files, the cost of disk seeks is small. To reduce the number of disk writes, the broker buffers messages temporarily, and when their count (or total size) reaches a threshold, it flushes them to disk, reducing the number of disk I/O calls.
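The buffer-then-flush idea above can be sketched as follows. This is an illustrative toy, not Kafka's actual code: the class name, thresholds, and file layout are assumptions for the example.

```python
import os
import tempfile


class BufferedLogWriter:
    """Sketch of an append-only log that buffers messages in memory and
    flushes to disk once a count or byte threshold is reached, so many
    small writes become one disk I/O call."""

    def __init__(self, path, max_messages=4, max_bytes=1024):
        self.path = path
        self.max_messages = max_messages
        self.max_bytes = max_bytes
        self.buffer = []
        self.buffered_bytes = 0

    def append(self, message: bytes):
        self.buffer.append(message)
        self.buffered_bytes += len(message)
        # Flush when either threshold is reached, like the broker described above.
        if len(self.buffer) >= self.max_messages or self.buffered_bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with open(self.path, "ab") as f:   # append-only, like a Kafka log file
            for m in self.buffer:
                f.write(m)
        self.buffer.clear()
        self.buffered_bytes = 0
```

The trade-off is the one the article mentions later: anything still sitting in the buffer when the process dies is lost.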

Performance

There are many performance points to consider. In addition to disk I/O, we also need to consider network I/O, which directly affects Kafka's throughput. Kafka does not provide many exotic tricks here. On the producer side, messages can be buffered, and once their count reaches a threshold they are sent to the broker in bulk; the same applies on the consumer side, which fetches multiple messages in bulk. In both cases the batch size can be specified in a configuration file. On the broker side, the sendfile system call can improve network I/O performance: the file's data is mapped into system memory and the socket reads the corresponding memory region directly, without an extra copy into and out of user space. For producer, consumer, and broker alike, CPU cost should be small, so enabling message compression is a good strategy: compression consumes a little CPU, but for Kafka, network I/O matters more. Any message transmitted over the network can be compressed; Kafka supports several compression codecs such as gzip and snappy.
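A small demonstration of why batching and compression go together: compressing a whole batch at once (as a producer with compression enabled would) beats compressing each message individually, because repeated structure across messages compresses well. The message contents here are made up for the example.

```python
import gzip
import json

# 100 similar, hypothetical messages, e.g. click events serialized as JSON.
messages = [json.dumps({"user": i, "event": "click"}).encode() for i in range(100)]

raw = b"".join(messages)                                  # bytes with no compression
batch_compressed = gzip.compress(raw)                     # one gzip stream per batch
per_message = sum(len(gzip.compress(m)) for m in messages)  # gzip each message alone

assert len(batch_compressed) < len(raw)         # fewer bytes over the network
assert len(batch_compressed) < per_message      # batch-level compression wins
```

Per-message compression pays the codec's header and dictionary overhead once per message; batch-level compression pays it once per batch and can exploit redundancy across messages.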

Producers

Load balancing: the producer holds socket connections to all partition leaders under a topic, and messages are sent directly from the producer's socket to the broker, with no "routing layer" in between. In fact, which partition a message is routed to is decided by the producer client, e.g. by "random", "key-hash", or "polling" (round-robin) strategies. If a topic has multiple partitions, it is the producer's job to distribute messages evenly across them.

Each partition leader's position (host:port) is registered in ZooKeeper. The producer, acting as a ZooKeeper client, registers a watch for change events so that it is notified when a partition leader changes.

Asynchronous send: a number of messages are buffered on the client and then sent to the broker in bulk. Sending many small pieces of data individually incurs too many I/O round trips and increases overall network latency, so delayed batch delivery actually improves network efficiency. There are pitfalls, however: if the producer fails, buffered messages that have not yet been sent are lost.

Consumers

The consumer sends a "fetch" request to the broker, telling it the offset to read from; the consumer then receives a batch of messages starting at that position. The consumer can also reset the offset to re-consume messages.

In JMS implementations, the topic model is push-based: the broker pushes messages to the consumer. In Kafka, however, the pull model is used: after establishing a connection to the broker, the consumer actively pulls (fetches) messages. This model has some advantages. First, the consumer can fetch and process messages at a rate matching its own capacity, and it controls its own consumption progress (the offset). In addition, the consumer has good control over the volume of consumption through batch fetching.
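The pull model can be sketched as a consumer that owns its offset, fetches batches at its own pace, and can rewind. `log` here is just a Python list standing in for a partition on the broker; the class and method names are invented for the example.

```python
class PullConsumer:
    """Sketch of the pull model: the consumer tracks its own offset,
    fetches batches at its own pace, and can rewind to re-consume."""

    def __init__(self, log):
        self.log = log          # stand-in for a partition held by the broker
        self.offset = 0         # consumption progress, owned by the consumer

    def fetch(self, max_messages: int):
        """Pull up to max_messages starting at the current offset."""
        batch = self.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch

    def seek(self, offset: int):
        """Resetting the offset re-consumes messages from that point."""
        self.offset = offset
```

With push, the broker would have to guess how fast each consumer can go; with pull, backpressure is automatic because a slow consumer simply fetches less often.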

In other JMS implementations, the position of message consumption is kept by the broker in order to avoid sending messages repeatedly, to resend messages that were not successfully consumed, and to track message state. This demands a lot of extra work from the JMS broker. In Kafka, a message in a partition is consumed by only one consumer per group, there is no broker-side message state, and there is no complex message-acknowledgement mechanism, so the Kafka broker is quite lightweight. When a message is received by a consumer, the consumer can save the offset of the last message locally and register that offset with ZooKeeper intermittently. This shows that the consumer client is lightweight as well.

Message delivery mechanism

For JMS implementations, the message transfer guarantee is straightforward: exactly once. Kafka is slightly different:

1) At most once: this is similar to a "non-persistent" message in JMS. The message is sent once and, whether it succeeds or fails, it will not be resent.

2) At least once: the message is delivered at least once; if delivery fails, the message may be resent until it is successfully received.

3) Exactly once: the message is delivered exactly once.

At most once: the consumer fetches a message, saves the offset, and then processes the message. If the client saves the offset but an exception occurs during processing, some messages are never fully handled; those "unhandled" messages will not be fetched again, because the saved offset has already moved past them. This is "at most once".

At least once: the consumer fetches a message, processes it, and then saves the offset. If processing succeeds but saving the offset fails (for example, because of a ZooKeeper exception during the save), ZooKeeper retains the previous offset, so the next fetch may return a message that has already been processed. This is "at least once".
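The two paragraphs above differ only in the ordering of "save offset" and "process". A minimal sketch (the function, log contents, and simulated crash are all invented for illustration) makes the consequence of a crash concrete:

```python
def consume(log, committed_offset, commit_first, crash_on):
    """Contrast the two orderings. Returns (processed, committed_offset).

    commit_first=True  -> save offset, then process: "at most once"
                          (a crash loses the in-flight message)
    commit_first=False -> process, then save offset: "at least once"
                          (a crash causes the message to be redelivered)
    """
    processed = []
    offset = committed_offset
    while offset < len(log):
        msg = log[offset]
        if commit_first:
            committed_offset = offset + 1      # save offset before processing
        if msg == crash_on:
            return processed, committed_offset  # simulated failure mid-processing
        processed.append(msg)
        if not commit_first:
            committed_offset = offset + 1      # save offset after processing
        offset = committed_offset
    return processed, committed_offset
```

Running both orderings through a crash while handling "m1", then restarting from the saved offset, shows the loss in one case and the redelivery in the other.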

Exactly once: Kafka does not strictly implement this (it would require a two-phase commit or transactions); we consider this strategy unnecessary in Kafka.

Usually "at least once" is our first choice (compared with "at most once", receiving a message twice is better than losing it).

Replica Backup

Kafka replicates each partition's data to multiple servers; any partition has one leader and zero or more followers. The number of replicas can be set in the broker configuration file. The leader handles all read-write requests; the followers must stay synchronized with the leader. A follower behaves like a consumer: it fetches messages and saves them in its local log. The leader tracks the status of all followers; if a follower is "behind" by too much or fails, the leader removes it from the replica sync list. A message is considered "committed" only when all in-sync followers have saved it, and only then can consumers consume it. Even if only one replica survives, messages can still be sent and received normally, as long as the ZooKeeper cluster survives. (This is unlike other distributed storage, e.g. HBase, which requires a "majority" to survive.)
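The "committed" rule above can be sketched numerically: the commit point (often called the high watermark) is the smallest offset that every in-sync replica has reached, and followers that lag too far are pruned from the in-sync set. The function names and the lag threshold are assumptions for the example.

```python
def high_watermark(leader_end: int, isr_offsets: dict) -> int:
    """A message is 'committed' once every in-sync replica has saved it;
    consumers may only read up to this point.  The commit point is the
    minimum replicated offset across the leader and its ISR."""
    return min([leader_end] + list(isr_offsets.values()))


def prune_isr(leader_end: int, isr_offsets: dict, max_lag: int) -> dict:
    """The leader drops followers that have fallen too far behind, so a
    single slow replica cannot hold back the commit point forever."""
    return {f: off for f, off in isr_offsets.items() if leader_end - off <= max_lag}
```

Notice the interaction: pruning a lagging follower can *advance* the high watermark, which is exactly why removing stragglers from the sync list keeps the partition available.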

When the leader fails, a new leader must be elected from among the followers. A follower may lag behind the old leader, so an "up-to-date" follower should be chosen. One problem to take into account when choosing is the number of partition leaders already hosted on the candidate server: if one server hosts too many partition leaders, it will bear more I/O pressure. So "load balancing" must also be considered when electing a new leader.

Log

If a topic is named "my_topic" and has 2 partitions, its logs are stored in the two directories my_topic_0 and my_topic_1. A log file holds a sequence of "log entries"; each entry has the format "4 bytes giving the message length n" followed by "n bytes of message content". Each message is uniquely identified by an offset, an 8-byte value giving the message's starting position within its partition. At the physical storage level, each partition consists of multiple log files, called segments. A segment file is named after its minimum offset, with the extension ".kafka"; for example "00000000000.kafka", where the "minimum offset" is the offset of the first message in that segment.
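The entry framing just described ("4-byte length n" + "n bytes of content") can be encoded and decoded in a few lines. This is a sketch of the framing only; real Kafka log entries also carry fields such as a CRC, omitted here.

```python
import struct


def encode_entry(payload: bytes) -> bytes:
    """One log entry: a 4-byte big-endian length n, then n bytes of message."""
    return struct.pack(">I", len(payload)) + payload


def decode_entries(log: bytes):
    """Walk a log's bytes and yield each message payload in order."""
    pos = 0
    while pos < len(log):
        (n,) = struct.unpack_from(">I", log, pos)  # read the 4-byte length prefix
        yield log[pos + 4: pos + 4 + n]
        pos += 4 + n
```

Length-prefixed framing is what makes a byte offset into the file a meaningful message boundary: given any entry's start position, the reader always knows where the next one begins.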



The list of segments held by each partition is stored in ZooKeeper.

A new segment file is created when the current segment reaches a size threshold (configurable via the profile, default 1 GB). When the number of buffered messages reaches a threshold, they are flushed to the log file; a flush is also triggered when the time since the last flush exceeds a threshold. If the broker fails, messages that have not yet been flushed to file will very likely be lost. Also, because of unexpected server failures, the log file format can be corrupted (at the tail of the file), so at startup the server must check whether the last segment's structure is valid and make the necessary repairs.

When fetching messages, the client specifies an offset and a maximum chunk size. The offset marks the starting position of the first message, and the chunk size bounds the total length of the fetched messages (indirectly bounding their count). From the offset you can find the segment file containing the message; then, using that segment's minimum offset, compute the message's relative position in the file and read from there directly.
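Because segments are named by their minimum offset, the segment holding a given offset is the one with the largest base offset not exceeding it, which a binary search over the sorted names finds immediately. A sketch, with the function name and return shape invented for the example:

```python
import bisect


def find_segment(segment_base_offsets, offset):
    """Given the sorted list of segment base offsets (the segment file names),
    return (base_offset, relative_position) for the segment holding `offset`."""
    i = bisect.bisect_right(segment_base_offsets, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the earliest retained segment")
    base = segment_base_offsets[i]
    return base, offset - base
```

This is the payoff of the naming scheme: locating any message costs one binary search over file names plus one relative seek, with no index scan over message contents.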

The deletion policy for log files is simple: a background thread periodically scans the log file list and deletes files that have been retained longer than a threshold (based on when the file was created). To avoid deleting a file that is still being read (consumed by a consumer), a copy-on-write approach is taken.

Simply put, copying an object does not actually copy the original object's data to another memory address. Instead, the new object's memory mapping table points to the same location as the original's, and the copy-on-write bit for that memory is set to 1. A read does not change the data and proceeds directly against the shared memory. Only on a write is the original actually copied to a new address; the new object's memory mapping table is updated to that new location, and the write happens there.

Distribution

Kafka uses ZooKeeper to store meta information, and uses the ZooKeeper watch mechanism to discover changes to that meta information and act accordingly (for example, a consumer failure triggering load balancing).

1) Broker Node Registry: when a Kafka broker starts, it first registers its own node information (an ephemeral znode) with ZooKeeper; when the broker disconnects from ZooKeeper, this znode is deleted.

Format: /brokers/ids/[0...N] --> host:port, where [0...N] is the broker ID. Each broker's configuration file must specify a numeric ID (globally unique). The znode's value is that broker's host:port information.

2) Broker Topic Registry: when a broker starts, it registers its own topic and partition information with ZooKeeper; this is also an ephemeral znode.

Format: /brokers/topics/[topic]/[0...N], where [0...N] is the partition index.

3) Consumer and Consumer Group: when each consumer client is created, it registers its own information with ZooKeeper, primarily for the sake of "load balancing".

Multiple consumers in a group can consume all partitions of a topic in an interleaved fashion. In short, all partitions of the topic are guaranteed to be consumed by the group and, for performance reasons, the partitions are spread relatively evenly across the consumers.

4) Consumer ID Registry: each consumer has a unique ID (host:uuid, either specified in the configuration file or generated by the system), which is used to identify the consumer.

Format:/consumers/[group_id]/ids/[consumer_id]

This is still an ephemeral znode; its value is {"topic_name": #streams ...}, which represents the list of topics and partitions currently consumed by this consumer.

5) Consumer Offset Tracking: tracks the largest offset currently consumed by each consumer in each partition.

Format:/consumers/[group_id]/offsets/[topic]/[broker_id-partition_id]-->offset_value

This znode is a persistent node. Note that the offset is keyed by group_id, so when one consumer in a group fails, another consumer can continue consuming from where it left off.
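The znode path formats listed above are just string templates over the same few identifiers. A small builder (a convenience invented for this article, not part of any Kafka client API) makes the structure explicit:

```python
def consumer_id_path(group_id: str, consumer_id: str) -> str:
    """Ephemeral registration znode for one consumer (item 4 above)."""
    return f"/consumers/{group_id}/ids/{consumer_id}"


def offsets_path(group_id: str, topic: str, broker_id: int, partition_id: int) -> str:
    """Persistent offset-tracking znode (item 5 above); keyed by group_id
    so another consumer in the group can resume from it."""
    return f"/consumers/{group_id}/offsets/{topic}/{broker_id}-{partition_id}"
```

Keeping the offset path rooted under the group, not under an individual consumer ID, is what makes the failover in the paragraph above possible.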

6) Partition Owner Registry: marks which consumer is consuming each partition. Ephemeral znode. Format:

/consumers/[group_id]/owners/[topic]/[broker_id-partition_id] --> consumer_node_id

Actions triggered when a consumer starts:

A) First perform the "Consumer ID Registry".

B) Then register a watch under the "Consumer ID Registry" node to listen for other consumers in the current group "leaving" and "joining"; any change to the child list under this znode path triggers load balancing for the consumers in the group (for example, when one consumer fails, other consumers take over its partitions).

C) Under the "Broker ID Registry" node, register a watch to monitor broker liveness; if the broker list changes, it triggers rebalancing of all consumers in all groups.

1) The producer uses ZooKeeper to "discover" the broker list, establish a socket connection with each partition leader under the topic, and send messages.

2) The broker uses ZooKeeper to register broker information and to monitor partition-leader liveness.

3) The consumer uses ZooKeeper to register consumer information (including the partition list it consumes) and to discover the broker list, establishing socket connections with partition leaders and fetching messages.

