Kafka-SQL Engine Sharing

1. Overview
In most cases, we use Kafka simply for message processing. In some cases, though, we need to read the data in a Kafka cluster multiple times. We can of course do this by invoking the Kafka API directly, but for each different business requirement we have to write, compile, package, and release a separate interface before we finally see the expected results. So, can we have an easier way to implement this functionality and visualize our results simply by writing SQL? Today, I want to share some ideas on how to fulfill these requirements in the form of SQL.
2. Content
The architecture and ideas for implementing these functions are not complex. Here I will present the entire implementation process through a schematic diagram, as shown in the following:
Let me describe the diagram in detail. The message data source is stored in the Kafka cluster. We open two consumer threads, one using the low-level API and one using the high-level API, and share the consumed results with the requester in the form of RPC. Once the data is shared, it flows into the SQL engine, which translates the in-memory data into a SQL tree; the Apache Calcite project takes part here. We then respond to SQL requests from the Web Console through the Thrift protocol, and finally return the results to the front end, where they are visualized as charts.
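To make the consumer side concrete, below is a minimal sketch of one such consumer thread that buffers messages in memory for the engine to scan. I use the standard kafka-clients consumer API (2.x) for illustration; the class name, group id, and buffer structure are my own assumptions, not the project's actual code.

import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Sketch: poll messages from the Kafka cluster and buffer them in memory so
// that the SQL engine (or the RPC layer) can read them as a table.
public class KafkaMessageCollector implements Runnable {

    private final List<String> buffer = new CopyOnWriteArrayList<>();
    private final KafkaConsumer<String, String> consumer;

    public KafkaMessageCollector(String brokers, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokers);
        props.put("group.id", "kafka-sql-engine");       // assumed group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(topic));
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                buffer.add(record.value());              // raw message payload
            }
        }
    }

    // Snapshot of everything consumed so far, shared with the requester.
    public List<String> rows() {
        return buffer;
    }
}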
3. Plug-in configuration
Here, we need to follow Calcite's JSON Models format. For example, for a Kafka cluster, we need to configure the following content:
{
  "version": "1.0",
  "defaultSchema": "Kafka",
  "schemas": [
    {
      "name": "Kafka",
      "type": "custom",
      "factory": "cn.smartloli.kafka.visual.engine.KafkaMemorySchemaFactory",
      "operand": {
        "database": "kafka_db"
      }
    }
  ]
}
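For reference, here is a minimal sketch of what a schema factory named by such a model might look like, using Calcite's public SPI (SchemaFactory, AbstractSchema, ScannableTable). The inline table with a fixed row set is purely illustrative; the real factory would expose the messages consumed from the cluster.

import java.util.Collections;
import java.util.Map;

import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.Schema;
import org.apache.calcite.schema.SchemaFactory;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.Table;
import org.apache.calcite.schema.impl.AbstractSchema;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

public class KafkaMemorySchemaFactory implements SchemaFactory {

    @Override
    public Schema create(SchemaPlus parentSchema, String name,
            Map<String, Object> operand) {
        // "database" comes from the operand block of the JSON model;
        // the real engine would select its tables from this database.
        final String database = (String) operand.get("database");
        return new AbstractSchema() {
            @Override
            protected Map<String, Table> getTableMap() {
                return Collections.<String, Table>singletonMap(
                        "Kafka", new KafkaMemoryTable());
            }
        };
    }

    /** A toy table over a fixed in-memory row set, standing in for the
     * messages consumed from the Kafka cluster. */
    static class KafkaMemoryTable extends AbstractTable
            implements ScannableTable {

        @Override
        public RelDataType getRowType(RelDataTypeFactory typeFactory) {
            return typeFactory.builder()
                    .add("_plat", SqlTypeName.VARCHAR)
                    .add("country", SqlTypeName.VARCHAR)
                    .add("city", SqlTypeName.VARCHAR)
                    .build();
        }

        @Override
        public Enumerable<Object[]> scan(DataContext root) {
            return Linq4j.asEnumerable(Collections.singletonList(
                    new Object[] {"web", "China", "Beijing"}));
        }
    }
}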
In addition, it is best to declare the table schema. The configuration content is as follows:
[ { "table": "Kafka", "schemas": { "_plat": "varchar", "_uid": "varchar", "_tm": " VarChar ", " IP ":" varchar ", " country ":" varchar ", " City ":" varchar ", " location ":" Jsonarray " } }]
4. Operation
Below, I will show you how to operate on the data through SQL. The relevant operations are as follows:
At the query entry, fill in the relevant SQL query statement, then click the Table button to get the results shown below:
We can also export the obtained results in the form of a report.
Of course, we can browse the query history and the currently running query tasks under the profile module. As for the other modules, they are auxiliary functions (displaying cluster information, topic and partition information, etc.), so I will not go into them here.
5. Summary
Reviewing the analysis, the overall architecture and implementation ideas are not too complicated, and there is no great difficulty. What needs attention are some implementation details, such as tuning the consumer API parameters for cluster messages. The low-level consumer API especially needs care: mind the size of its fetch_size, and the offset needs to be maintained by ourselves. When using Calcite as the SQL tree, we have to follow its JSON model and standard SQL syntax to manipulate the data source.
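As a reminder of what that offset maintenance looks like, here is a minimal sketch against the 0.8-era low-level SimpleConsumer API, with an assumed broker host, topic, and fetch size; note that the next offset is advanced by the caller, not by the broker.

import java.nio.ByteBuffer;

import kafka.api.FetchRequest;
import kafka.api.FetchRequestBuilder;
import kafka.javaapi.FetchResponse;
import kafka.javaapi.consumer.SimpleConsumer;
import kafka.message.MessageAndOffset;

public class LowLevelFetchSketch {

    public static void main(String[] args) {
        // host, port, soTimeout, bufferSize, clientId (all illustrative)
        SimpleConsumer consumer = new SimpleConsumer(
                "broker1", 9092, 100000, 64 * 1024, "kafka-sql-engine");
        String topic = "kafka_db";
        int partition = 0;
        long offset = 0L; // maintained by us, not by the broker

        FetchRequest req = new FetchRequestBuilder()
                .clientId("kafka-sql-engine")
                .addFetch(topic, partition, offset, 100000) // fetch_size in bytes
                .build();
        FetchResponse response = consumer.fetch(req);

        for (MessageAndOffset messageAndOffset
                : response.messageSet(topic, partition)) {
            ByteBuffer payload = messageAndOffset.message().payload();
            byte[] bytes = new byte[payload.limit()];
            payload.get(bytes);
            System.out.println(new String(bytes));
            // advance the offset ourselves for the next fetch
            offset = messageAndOffset.nextOffset();
        }
        consumer.close();
    }
}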
6. Concluding remarks
This blog post ends its sharing here. If you encounter any problems in the process of studying, you can join the group to discuss or send me an e-mail, and I will do my best to answer you. Let us encourage each other!