Kafka Controller Election Process Analysis

Source: Internet
Author: User
Tags failover


1. Overview

In day-to-day use of Kafka, most attention goes to the system as a whole. This article takes a closer look at the Kafka controller and its election process.

2. Content

The Kafka controller is itself a broker in the Kafka system. In addition to the normal broker functions, it is responsible for electing the leader of each topic partition. When the Kafka system starts, one of the brokers is elected controller; it manages the state of topic partitions and replicas and carries out administrative tasks such as partition reassignment. If the current controller fails while the system is running, a new controller is elected from the remaining healthy brokers.

2.1 Controller startup sequence

In a Kafka cluster, each broker instantiates the KafkaController class at startup. This class runs a series of steps and elects the leaders of the topic partitions. The steps are as follows:

The first broker to start creates an ephemeral node, /controller, in ZooKeeper and writes its own registration information to it, which makes that broker the controller;
When the other brokers start, they also try to create the /controller node in ZooKeeper. Because the node already exists, the attempt fails and an exception is thrown. From that failure the broker concludes that a controller has already been created in the Kafka cluster, which guarantees that the cluster has exactly one controller;
The other brokers register listeners on the controller. Each listener watches for state changes of its broker node; when the state of a watched node changes, the corresponding handler is triggered.
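
The startup race in the steps above can be sketched with a toy in-memory stand-in for ZooKeeper. FakeZk, createEphemeral, Broker, and StartupElectionDemo are hypothetical names invented for this illustration, not Kafka or ZooKeeper APIs; the only behavior modeled is that creating a path that already exists fails.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for ZooKeeper: a path can be created only once.
class FakeZk {
  private val nodes = mutable.Map.empty[String, String]

  // Returns true if the node was created, false if the path already exists.
  def createEphemeral(path: String, data: String): Boolean = synchronized {
    if (nodes.contains(path)) false
    else { nodes(path) = data; true }
  }

  def read(path: String): Option[String] = synchronized { nodes.get(path) }
}

// Hypothetical broker that tries to become controller at startup.
class Broker(val id: Int, zk: FakeZk) {
  def tryBecomeController(): Boolean =
    zk.createEphemeral("/controller", s"""{"brokerid":$id}""")
}

object StartupElectionDemo {
  // Starts brokers in the given (non-empty) order and returns the winner's id.
  def electByStartupOrder(order: Seq[Int]): Int = {
    val zk = new FakeZk
    val winners = order.map(id => new Broker(id, zk)).filter(_.tryBecomeController())
    winners.head.id
  }
}
```

Calling StartupElectionDemo.electByStartupOrder(Seq(0, 1, 2)) returns the id of the broker that started first, because every later creation attempt finds the path already taken.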
2.2 How to view the controller priority?

The priority for becoming controller follows the order in which the brokers of the Kafka system start successfully. You can observe this by changing the broker startup order and then inspecting the content of the ephemeral /controller node in ZooKeeper, for example:
Connect to the ZooKeeper cluster

$ zkCli.sh -server dn1:2181

Run the command

[zk: dn1:2181(CONNECTED) 1] get /controller
After the command runs successfully, you can see that the controller was created on broker 0 (the dn1 node), as shown below:

[Image: output of the get /controller command]

The current startup order is dn1, dn2, dn3. Change it to dn3, dn1, dn2 and run the get /controller command in ZooKeeper again. The output is as follows:

[Image: output of the get /controller command after changing the startup order]
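
To check programmatically which broker holds the controller role, the value returned by get /controller can be parsed. The sketch below assumes the payload is a small JSON document containing a brokerid field, as is typical for ZooKeeper-based Kafka releases; ControllerZNode is a hypothetical helper, and the sample string used afterwards (including its timestamp) is made up for illustration.

```scala
object ControllerZNode {
  // Matches the "brokerid" field in the assumed JSON payload of /controller.
  private val BrokerId = """"brokerid"\s*:\s*(\d+)""".r

  // Returns the controller's broker id, or None if the field is missing.
  def brokerId(payload: String): Option[Int] =
    BrokerId.findFirstMatchIn(payload).map(_.group(1).toInt)
}
```

For example, ControllerZNode.brokerId("""{"version":1,"brokerid":0,"timestamp":"1500000000000"}""") yields the id of the broker that won the election. A regular expression keeps the sketch free of JSON library dependencies; real tooling would normally use a JSON parser.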

2.3 Switching the controller

When the controller shuts down or loses its connection to ZooKeeper, its ephemeral node in ZooKeeper is removed. The listeners in the Kafka cluster receive the change notification, and each broker attempts to create the ephemeral /controller node in ZooKeeper. The first broker to create it successfully becomes the new controller. Each newly elected controller obtains an incremented controller_epoch value from ZooKeeper.

3. Election process

The core idea of controller election in the Kafka system is that the brokers compete on equal terms to create the ephemeral /controller node in ZooKeeper. The first broker to create it successfully becomes the controller and takes on the job of electing the leaders of the topic partitions. The election process is shown below:

[Image: controller election process]
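
The failover behavior from section 2.3 can be sketched as a small state model: the ephemeral node disappears when its owner stops, the remaining brokers race to recreate it, and the winner increments the epoch. ElectionModel and its methods are hypothetical names for this illustration, and the real race is resolved by ZooKeeper rather than by the insertion order used here.

```scala
import scala.collection.mutable

// Hypothetical toy model of the /controller ephemeral node plus controller_epoch.
class ElectionModel {
  private var controller: Option[Int] = None   // broker id stored in /controller
  private var epoch: Int = 0                   // value of controller_epoch
  private val live = mutable.LinkedHashSet.empty[Int]

  def currentController: Option[Int] = controller
  def controllerEpoch: Int = epoch

  // A broker joins and immediately tries to create /controller.
  def start(brokerId: Int): Unit = {
    live += brokerId
    tryElect(brokerId)
  }

  // The broker's session ends. If it was the controller, the ephemeral node
  // disappears and every remaining live broker tries to recreate it.
  def stop(brokerId: Int): Unit = {
    live -= brokerId
    if (controller.contains(brokerId)) {
      controller = None
      live.foreach(tryElect)
    }
  }

  // Creation succeeds only when no controller exists; the winner bumps the epoch.
  private def tryElect(brokerId: Int): Unit =
    if (controller.isEmpty) {
      controller = Some(brokerId)
      epoch += 1
    }
}
```

Starting brokers 0, 1, and 2 and then stopping broker 0 moves the controller role to another live broker and raises the epoch by one, while stopping a non-controller broker leaves both unchanged.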

When the Kafka system instantiates the KafkaController class, the election process for the leaders of the topic partitions begins. The core classes involved are KafkaController, ZookeeperLeaderElector, LeaderChangeListener, and SessionExpirationListener.

KafkaController: when it instantiates the ZookeeperLeaderElector class, it sets two key callback functions, onControllerFailover and onControllerResignation;
ZookeeperLeaderElector: implements the election itself but does not handle session timeouts between the broker and ZooKeeper. It is mainly responsible for creating the metadata storage path, instantiating the change listener, subscribing to data-change notifications so that changes are observed in real time, and then running the election logic;
LeaderChangeListener: if the node data changes, another broker in the Kafka system may have become the leader, and the Kafka controller then calls the onResigningAsLeader function. If the broker goes down or the node is accidentally deleted, the leader on that node must be re-elected: onResigningAsLeader is called and another healthy broker is elected as the new leader;
SessionExpirationListener: when a broker establishes a new connection to ZooKeeper, the handleNewSession function of SessionExpirationListener is called. It first checks the expired session in ZooKeeper.
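
How the elector, its two callbacks, and the change listener fit together can be sketched as below. This is a simplified illustration and not the Kafka source: SimpleElector and the tryCreate parameter are hypothetical, with tryCreate standing in for the attempt to create the ephemeral /controller node and returning the id of whichever broker owns the node afterwards.

```scala
// Simplified, hypothetical elector. tryCreate stands in for creating the
// ephemeral /controller node and returns the id of the broker that owns it.
class SimpleElector(
    brokerId: Int,
    tryCreate: Int => Int,
    onBecomingLeader: () => Unit,
    onResigningAsLeader: () => Unit) {

  private var leaderId: Int = -1

  def amILeader: Boolean = leaderId == brokerId

  // Attempt to take leadership; run the failover callback on success.
  def elect(): Boolean = {
    leaderId = tryCreate(brokerId)
    if (amILeader) onBecomingLeader()
    amILeader
  }

  // The node's data changed: another broker may now be the leader.
  def handleDataChange(newLeaderId: Int): Unit = {
    val wasLeader = amILeader
    leaderId = newLeaderId
    if (wasLeader && !amILeader) onResigningAsLeader()
  }

  // The node was deleted: resign if we were the leader, then run a new election.
  def handleDataDeleted(): Unit = {
    if (amILeader) onResigningAsLeader()
    elect()
  }
}
```

Passing the callbacks in as plain functions mirrors the description above, where KafkaController supplies onControllerFailover and onControllerResignation when it constructs the elector, so the election logic stays independent of what the controller does once elected.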
4. Registering the partition and replica state machines

The controller of the Kafka system is mainly responsible for managing topics, partitions, and replicas. To react to operations on them, the controller registers a series of listeners on the /brokers/topics node in ZooKeeper and on its child paths. When a topic is created through the Kafka API or a Kafka script, the server returns the result to the client. At the moment the client is told that the topic was created successfully, the server has not actually created the topic yet; it has only created a child node with the topic's name under /brokers/topics in ZooKeeper. When a broker becomes controller, the onBecomingLeader() callback is invoked, which in practice is the onControllerFailover() function. During its initialization phase, onControllerFailover() sets up the partition state machine and the replica state machine. The code is as follows:

def onControllerFailover() {
  if (isRunning) {
    info("Broker %d starting become controller state transition".format(config.brokerId))
    readControllerEpochFromZookeeper()
    incrementControllerEpoch(zkUtils.zkClient)
    // Register the listeners on the /brokers/topics node
    registerReassignedPartitionsListener()
    registerIsrChangeNotificationListener()
    registerPreferredReplicaElectionListener()
    partitionStateMachine.registerListeners() // register the partition state machine
    replicaStateMachine.registerListeners()   // register the replica state machine
    initializeControllerContext()
    // After the controller is initialized, send a metadata update request before the state machines start
    sendUpdateMetadataRequest(controllerContext.liveOrShuttingDownBrokerIds.toSeq)

    replicaStateMachine.startup()   // start the replica state machine
    partitionStateMachine.startup() // start the partition state machine
    // Register the partition change listener for all existing topics on failover
    controllerContext.allTopics.foreach(topic => partitionStateMachine.registerPartitionChangeListener(topic))
    info("Broker %d is ready to serve as the new controller with epoch %d".format(config.brokerId, epoch))
    maybeTriggerPartitionReassignment()
    maybeTriggerPreferredReplicaElection()
    if (config.autoLeaderRebalanceEnable) {
      info("starting the partition rebalance scheduler")
      autoRebalanceScheduler.startup()
      autoRebalanceScheduler.schedule("partition-rebalance-thread", checkAndTriggerPartitionRebalance,
        5, config.leaderImbalanceCheckIntervalSeconds.toLong, TimeUnit.SECONDS)
    }
    deleteTopicManager.start()
  } else
    info("Controller has been shut down, aborting startup/failover")
}
The partition state machine registers two listeners, TopicChangeListener and DeleteTopicListener, in ZooKeeper through the registerListeners() function; TopicChangeListener watches the /brokers/topics node. When a topic is created, the topic's information, including its partitions and replicas, is written to the /brokers/topics node in ZooKeeper, which triggers the listeners registered by the partition and replica state machines.
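
What a topic change listener has to work out when /brokers/topics changes can be sketched as a set difference. This is a minimal illustration, assuming the listener receives the current list of child node names; TopicChange, Diff, and diff are hypothetical names and not part of the Kafka code shown above.

```scala
object TopicChange {
  // Result of comparing the children of /brokers/topics with the known topics.
  final case class Diff(created: Set[String], deleted: Set[String])

  // knownTopics: topics the controller already tracks.
  // currentChildren: child node names reported by the child-change notification.
  def diff(knownTopics: Set[String], currentChildren: Set[String]): Diff =
    Diff(currentChildren -- knownTopics, knownTopics -- currentChildren)
}
```

Topics that appear only in currentChildren are the ones for which the controller still has to create partitions and replicas, which matches the point above that the child node exists before the topic is actually created.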

5. Summary

Debugging the Kafka system is fairly convenient. Download the Kafka source code, import it into an IDE, and start the whole Kafka system from there. You can then step through the controller's execution flow in debug mode.
