Kafka Connector and Debezium
1. Introduction
Kafka Connector is a framework that connects Kafka clusters to other systems: databases, other clusters, and so on. Kafka Connector can bridge a variety of system types and Kafka; its main tasks are reading from Kafka (sink) and writing to Kafka (source), so connectors also fall into two types: source connectors and sink connectors.
Kafka itself ships with only three connectors: the File connector, the SQLite connector, and the HDFS connector. Databases such as MySQL and MongoDB need a corresponding third-party library, registered with Kafka Connector in the form of a plugin. Here, I use the jar packages that Debezium provides for its MySQL connector and MongoDB connector.
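To make the plugin idea concrete, here is a sketch of the kind of properties file a Debezium MySQL source connector takes once its jars are registered. All hostnames, credentials, server IDs, and topic names below are illustrative placeholders, not values from this article:

```properties
name=mysql-source-connector
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=localhost
database.port=3306
database.user=debezium
database.password=dbz
database.server.id=184054
database.server.name=fullfillment
database.history.kafka.bootstrap.servers=localhost:9092
database.history.kafka.topic=dbhistory.fullfillment
```

The `connector.class` line is what ties the configuration back to the plugin jar registered with Kafka Connector.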
2. Quickstart
When initiating a connection, we need to specify not only the connector's configuration properties but also Kafka's schema properties, which indicate how data records are stored and read.
Command format:
path/to/kafka/bin/[connect-standalone|connect-distributed] [schema-properties] [connector-properties]
As an example:
./bin/connect-standalone ./etc/schema-registry/connect-avro-standalone.properties ./etc/kafka/connect-file-source.properties
In this mode of operation, our Kafka server runs locally, so we can run the corresponding connect file directly to initiate the connection. How the properties are configured varies with the specific Kafka connector implementation. We can also do the same thing through the Kafka REST API, and that is what we do in actual production, because there the Kafka server cannot be deployed locally.
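For reference, the first properties file (the worker configuration) typically looks something like this minimal standalone sketch; the converter classes and file paths are illustrative, not taken from this article:

```properties
# Where the Kafka brokers live
bootstrap.servers=localhost:9092
# How record keys and values are (de)serialized
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
# Standalone mode stores source offsets in a local file
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
```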
REST API command format:
curl -X POST [data] http://[ip]:[port]/connectors
By default, the Kafka Connect REST interface listens on port 8083, and we can create, terminate, and inspect a connector through the REST API. For details, refer to the official documentation.
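As an illustration (host, port, and connector name are placeholders, and these commands assume a running Connect worker), the common REST calls look like this:

```shell
# Create a connector; the JSON body carries the connector properties
curl -X POST -H "Content-Type: application/json" \
     --data '{"name": "file-source", "config": {"connector.class": "FileStreamSource", "file": "/tmp/test.txt", "topic": "connect-test", "tasks.max": "1"}}' \
     http://localhost:8083/connectors

# List, inspect, and delete connectors
curl http://localhost:8083/connectors
curl http://localhost:8083/connectors/file-source/status
curl -X DELETE http://localhost:8083/connectors/file-source
```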
3. Debezium Installation
In Kafka, Debezium is registered as a plugin with the Kafka connector framework. In practice this simply means registering Debezium's location on the classpath: move the jar packages into a custom folder under /path/to/kafka/share/java/ whose name starts with kafka-connect.
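Concretely, the registration can be sketched as the following commands; the source directory and target folder name are illustrative:

```shell
# Drop the Debezium jars into a folder that Kafka Connect scans for plugins
mkdir -p /path/to/kafka/share/java/kafka-connect-debezium-mysql
cp debezium-connector-mysql/*.jar /path/to/kafka/share/java/kafka-connect-debezium-mysql/
```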
4. Kafka Connector working principle and partial source code analysis
Adhering to the principle that "there are no secrets in front of the source code", let us now look at the startup process of Kafka Connector and use it as an entry point to see the overall picture.
First, the REST API calls startConnector to establish the connection:
private boolean startConnector(String connectorName) {
    log.info("Starting connector {}", connectorName);
    final Map<String, String> configProps = configState.connectorConfig(connectorName);
    final ConnectorContext ctx = new HerderConnectorContext(this, connectorName);
    final TargetState initialState = configState.targetState(connectorName);
    boolean started = worker.startConnector(connectorName, configProps, ctx, this, initialState);

    // Immediately request configuration since this could be a brand new connector. However, also only update those
    // task configs if they are actually different from the existing ones to avoid unnecessary updates when it's
    // just restoring an existing connector.
    if (started && initialState == TargetState.STARTED)
        reconfigureConnectorTasksWithRetry(connectorName);

    return started;
}
From this code we can clearly see what it does: fetch the connection's configuration properties, obtain the connection's name and context handle, determine the target state, have a worker start the connection, check that the connector started successfully, and then configure its tasks.
Most of this is straightforward property assignment, so let us expand on the connector states: a connector can be in the STARTED or the PAUSED state. Internally, Kafka Connector uses the state-machine idea to control each connector; tasks are handled the same way.
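The target-state machine can be sketched as follows. This is a minimal illustration of the idea, not the actual Kafka classes; the class and method names are made up for the example:

```java
// Minimal sketch of the target-state machine that drives each connector and task.
public class TargetStateMachine {
    public enum TargetState { STARTED, PAUSED }

    private TargetState state = TargetState.STARTED;

    // Mirrors the idea of a transitionTo() call: switch to the externally
    // requested target state; a no-op if we are already there.
    public void transitionTo(TargetState target) {
        if (state == target)
            return;
        state = target;
    }

    public TargetState state() {
        return state;
    }

    public static void main(String[] args) {
        TargetStateMachine connector = new TargetStateMachine();
        connector.transitionTo(TargetState.PAUSED);
        System.out.println(connector.state());
    }
}
```

The real WorkerConnector additionally notifies a status listener on each transition, which is how the REST API learns a connector's current state.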
The startConnector in the Worker is more complex:
public boolean startConnector(String connName, Map<String, String> connProps, ConnectorContext ctx,
                              ConnectorStatus.Listener statusListener, TargetState initialState) {
    if (connectors.containsKey(connName))
        throw new ConnectException("Connector with name " + connName + " already exists");

    final WorkerConnector workerConnector;
    ClassLoader savedLoader = plugins.currentThreadLoader();
    try {
        final ConnectorConfig connConfig = new ConnectorConfig(plugins, connProps);
        final String connClass = connConfig.getString(ConnectorConfig.CONNECTOR_CLASS_CONFIG);
        log.info("Creating connector {} of type {}", connName, connClass);
        final Connector connector = plugins.newConnector(connClass);
        workerConnector = new WorkerConnector(connName, connector, ctx, statusListener);
        log.info("Instantiated connector {} with version {} of type {}", connName, connector.version(), connector.getClass());
        savedLoader = plugins.compareAndSwapLoaders(connector);
        workerConnector.initialize(connConfig);
        workerConnector.transitionTo(initialState);
        Plugins.compareAndSwapLoaders(savedLoader);
    } catch (Throwable t) {
        log.error("Failed to start connector {}", connName, t);
        // Can't be put in a finally block because it needs to be swapped before the call on
        // statusListener
        Plugins.compareAndSwapLoaders(savedLoader);
        statusListener.onFailure(connName, t);
        return false;
    }

    WorkerConnector existing = connectors.putIfAbsent(connName, workerConnector);
    if (existing != null)
        throw new ConnectException("Connector with name " + connName + " already exists");

    log.info("Finished creating connector {}", connName);
    return true;
}
In startConnector we can see that it does a few things: read the registered plugins and create the corresponding connector class, create the WorkerConnector together with the listener that hears status changes, initialize the WorkerConnector, and transition it to its initial state.
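The compareAndSwapLoaders calls that bracket the initialization implement a common pattern: temporarily swap the thread's context classloader to the plugin's own loader so the connector resolves classes in isolation, then restore the original. A minimal sketch of the pattern (a simplified illustration, not the real Plugins class):

```java
// Sketch of the swap-and-restore context-classloader pattern used around
// plugin initialization.
public class LoaderSwapDemo {
    // Swap the current thread's context classloader, returning the previous
    // one so the caller can restore it afterwards.
    public static ClassLoader compareAndSwapLoaders(ClassLoader loader) {
        ClassLoader current = Thread.currentThread().getContextClassLoader();
        if (!current.equals(loader))
            Thread.currentThread().setContextClassLoader(loader);
        return current;
    }

    public static void main(String[] args) {
        ClassLoader pluginLoader = new ClassLoader() { };
        ClassLoader saved = compareAndSwapLoaders(pluginLoader);
        try {
            // ... connector initialization would run here, resolving classes
            // through pluginLoader ...
        } finally {
            compareAndSwapLoaders(saved); // restore the original loader
        }
        System.out.println(Thread.currentThread().getContextClassLoader() == saved);
    }
}
```

Note that in the Worker code above, the restore on the failure path cannot live in a finally block because the loader must be restored before statusListener.onFailure is invoked.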
Next, going back to startConnector, we see that it also executes reconfigureConnectorTasksWithRetry, which checks whether tasks were already executing in a previous run; if so, the tasks' properties, state, and context are all loaded:
private void reconfigureConnector(final String connName, final Callback<Void> cb) {
    try {
        if (!worker.isRunning(connName)) {
            log.info("Skipping reconfiguration of connector {} since it is not running", connName);
            return;
        }

        Map<String, String> configs = configState.connectorConfig(connName);

        ConnectorConfig connConfig;
        List<String> sinkTopics = null;
        if (worker.isSinkConnector(connName)) {
            connConfig = new SinkConnectorConfig(plugins(), configs);
            sinkTopics = connConfig.getList(SinkConnectorConfig.TOPICS_CONFIG);
        } else {
            connConfig = new SourceConnectorConfig(plugins(), configs);
        }

        final List<Map<String, String>> taskProps = worker.connectorTaskConfigs(
                connName, connConfig.getInt(ConnectorConfig.TASKS_MAX_CONFIG), sinkTopics);
        boolean changed = false;
        int currentNumTasks = configState.taskCount(connName);
        if (taskProps.size() != currentNumTasks) {
            log.debug("Change in connector task count from {} to {}, writing updated task configurations",
                    currentNumTasks, taskProps.size());
            changed = true;
        } else {
            int index = 0;
            for (Map<String, String> taskConfig : taskProps) {
                if (!taskConfig.equals(configState.taskConfig(new ConnectorTaskId(connName, index)))) {
                    log.debug("Change in task configurations, writing updated task configurations");
                    changed = true;
                    break;
                }
                index++;
            }
        }
        if (changed) {
            if (isLeader()) {
                configBackingStore.putTaskConfigs(connName, taskProps);
                cb.onCompletion(null, null);
            } else {
                // We cannot forward the request on the same thread because this reconfiguration can happen
                // as a result of connector addition or removal. If we blocked waiting for the response from
                // the leader, we would be kicked out of the worker group.
                forwardRequestExecutor.submit(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            String reconfigUrl = RestServer.urlJoin(leaderUrl(), "/connectors/" + connName + "/tasks");
                            RestServer.httpRequest(reconfigUrl, "POST", taskProps, null);
                            cb.onCompletion(null, null);
                        } catch (ConnectException e) {
                            log.error("Request to leader to reconfigure connector tasks failed", e);
                            cb.onCompletion(e, null);
                        }
                    }
                });
            }
        }
    } catch (Throwable t) {
        cb.onCompletion(t, null);
    }
}
Besides judging whether there is a previous run to restore and performing that restoration, the most important thing this method does is put the task configurations into the configBackingStore, whose listening queue is monitored so the tasks can be started promptly. And if this worker is not the elected leader, it forwards the task configurations to the leader.
That is a brief first look at the source code; I will supplement it when I have time later.