Streaming SQL for Apache Kafka

Source: Internet
Author: User
Tags: kafka streams

KSQL is a streaming SQL engine built on the Kafka Streams API. It lowers the barrier to entry for stream processing and provides a simple, fully interactive SQL interface for processing data in Kafka. KSQL is open source under the Apache 2.0 license, and it is distributed, scalable, reliable, and real-time. It supports a variety of streaming operations, including aggregations, joins, time windows, and sessions. The two core concepts of KSQL are streams and tables (see also: http://www.cnblogs.com/tgzhu/p/7660838.html). Because KSQL integrates the two, a table that represents current state can be joined with a stream that represents events as they occur.

KSQL Project Introduction

In fact, KSQL is quite different from SQL in a relational database. A traditional SQL statement is a one-time operation: whether it is a query or an update, it runs against the current data set and then finishes. KSQL queries and updates, by contrast, run continuously, and the data they operate on keeps growing. In short, what KSQL does is transformation, which is to say stream processing. The project is still in the developer preview stage, but for the foreseeable future KSQL has great potential in real-time monitoring, security detection, online data integration, application development, and more, as described below:

1. Real-time monitoring and analytics

    CREATE TABLE error_counts AS
      SELECT error_code, count(*)
      FROM monitoring_stream
      WINDOW TUMBLING (SIZE 1 MINUTE)
      WHERE type = 'ERROR'
      GROUP BY error_code;

One use of such a query is to define custom business-level metrics that are computed in real time and that you can monitor and alert on, just as you do with CPU load. Another use is to define a notion of correctness for your application in KSQL and check that the application meets it while running in production. When we think about monitoring, we often think of counters and gauges that track low-level performance statistics. Such gauges can tell you that CPU load is high, but they cannot really tell you whether your application is doing what it is supposed to do. KSQL lets you define custom metrics on the raw event streams that applications generate, whether those are log events, database updates, or any other kind of event.

For example, a web application might need to check that every time a new customer signs up, a welcome email is sent, a new user record is created, and the customer's credit card is billed. These functions may be spread across different services or applications, and you would want to verify that each of them happened for every new customer within some SLA, such as 30 seconds.
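The SLA check described above can be sketched outside KSQL as follows. This is only a minimal Python illustration of the logic, not KSQL itself; the event names (`signup`, `welcome_email`, `user_record`, `card_billed`), the tuple layout, and the function `sla_violations` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical event types that must follow each signup (illustrative names).
REQUIRED = {"welcome_email", "user_record", "card_billed"}


def sla_violations(events, sla_seconds=30):
    """Return the customers whose required steps did not all occur
    within sla_seconds of their signup event.

    events: iterable of (timestamp_seconds, customer_id, event_type).
    """
    signup_time = {}
    completed = defaultdict(set)
    for ts, customer, kind in sorted(events):
        if kind == "signup":
            signup_time[customer] = ts
        elif kind in REQUIRED and customer in signup_time:
            # Count a step only if it lands inside the SLA window.
            if ts - signup_time[customer] <= sla_seconds:
                completed[customer].add(kind)
    return sorted(c for c in signup_time if completed[c] != REQUIRED)


events = [
    (0, "alice", "signup"),
    (5, "alice", "welcome_email"),
    (8, "alice", "user_record"),
    (12, "alice", "card_billed"),
    (20, "bob", "signup"),
    (25, "bob", "welcome_email"),
    (70, "bob", "card_billed"),  # too late, and no user record at all
]
print(sla_violations(events))
```

In this sample only `bob` is reported, because his billing event arrives 50 seconds after signup and his user record never arrives. A real deployment would evaluate the same condition continuously over the event streams rather than over a finished list.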

2. Security and anomaly detection

    CREATE TABLE possible_fraud AS
      SELECT card_number, count(*)
      FROM authorization_attempts
      WINDOW TUMBLING (SIZE 5 SECONDS)
      GROUP BY card_number
      HAVING count(*) > 3;

KSQL queries turn event streams into numeric time series, which visualization tools can then display in a UI. The same approach helps detect many kinds of security-threatening behavior, such as fraud and intrusion.
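To make the windowing in the fraud query concrete, here is a rough Python sketch of what a tumbling-window count with a threshold computes. It processes a finished list rather than a live stream, and the function name `possible_fraud`, the tuple layout, and the sample card numbers are assumptions made for illustration.

```python
from collections import Counter


def possible_fraud(attempts, window_seconds=5, threshold=3):
    """Count authorization attempts per (window, card) and keep the
    groups whose count exceeds threshold.

    attempts: iterable of (timestamp_seconds, card_number).
    Returns {(window_start, card_number): count}.
    """
    counts = Counter()
    for ts, card in attempts:
        # A tumbling window is fixed-size and non-overlapping, so each
        # event belongs to exactly one window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, card)] += 1
    # Equivalent of HAVING count(*) > threshold.
    return {key: n for key, n in counts.items() if n > threshold}


attempts = [
    (0, "4111"), (1, "4111"), (2, "4111"), (4, "4111"),  # 4 in window [0, 5)
    (3, "5500"),
    (6, "4111"), (9, "4111"),                            # 2 in window [5, 10)
]
print(possible_fraud(attempts))
```

Only card `4111` in the window starting at 0 is flagged here. The second window holds just two attempts for that card, so it stays below the threshold even though the card made six attempts overall.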

3. Online data integration

    CREATE STREAM vip_users AS
      SELECT userid, page, action
      FROM clickstream c
      LEFT JOIN users u ON c.userid = u.user_id
      WHERE u.level = 'platinum';

Most data processing goes through some form of ETL (extract, transform, load), and such systems usually run as scheduled batch jobs. The latency that batch jobs introduce, however, is often unacceptable. With KSQL and Kafka connectors, batch data integration can be turned into online data integration. For example, a stream-table join lets you enrich the events in a stream with metadata stored in a table, or filter out sensitive information before the data is passed on to other systems.
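The enrichment and filtering just described can be illustrated with a small Python sketch. It models the table as a dictionary keyed by user ID and the stream as a list of events; the field names, the `email` field treated as sensitive, and the function `enrich` are hypothetical and chosen only for this example.

```python
def enrich(clickstream, users, sensitive=("email",)):
    """Left-join each click event with the user table, then drop
    sensitive fields before handing the record on."""
    for event in clickstream:
        # Table lookup by key; a missing user yields no extra fields,
        # which mirrors a LEFT JOIN.
        profile = users.get(event["userid"], {})
        merged = {**event, **profile}
        yield {k: v for k, v in merged.items() if k not in sensitive}


users = {
    "u1": {"level": "platinum", "email": "u1@example.com"},
    "u2": {"level": "basic", "email": "u2@example.com"},
}
clickstream = [
    {"userid": "u1", "page": "/home", "action": "view"},
    {"userid": "u3", "page": "/cart", "action": "click"},
]
enriched = list(enrich(clickstream, users))
vip = [e for e in enriched if e.get("level") == "platinum"]
print(vip)
```

Each event is enriched the moment it is seen, which is the difference from a batch job that would join the two data sets once per schedule. The event for the unknown user `u3` passes through without profile fields and is then excluded by the `platinum` filter.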

4. Application development

For complex applications, the native Kafka Streams API may be more appropriate. For simple applications, however, or for people who prefer not to program in Java, KSQL is a better choice.

KSQL Architecture

    • KSQL runs as a standalone server. Multiple KSQL servers can form a cluster, and server instances can be added to it dynamically.
    • The cluster is fault tolerant: if one server fails, the other servers take over its work.
    • The KSQL command-line client sends requests to the cluster through the REST API to inspect streams and tables, query data, and check the status of queries.
    • Because it is built on the Streams API, KSQL inherits that API's elasticity, state management, and fault tolerance, as well as its exactly-once semantics. The KSQL server embeds these features and adds a distributed SQL engine, automatic bytecode generation to improve query performance, and a REST API for queries and management.
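As a rough idea of what a client request to the REST API might look like, the Python sketch below builds, but does not send, an HTTP request carrying one statement. The server address `http://localhost:8088`, the `/ksql` path, the `Content-Type` value, and the JSON payload layout are assumptions that should be checked against the documentation for the KSQL version in use; `build_ksql_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_ksql_request(statement, server="http://localhost:8088"):
    """Build (but do not send) a POST request carrying one KSQL statement.

    The server address, the /ksql path, and the payload layout are
    assumptions to verify against the documentation of your KSQL version.
    """
    body = json.dumps({"ksql": statement, "streamsProperties": {}})
    return urllib.request.Request(
        url=server + "/ksql",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ksql_request("LIST STREAMS;")
print(request.get_method(), request.full_url)
# To send it to a running server:
#     with urllib.request.urlopen(request) as response:
#         print(json.loads(response.read()))
```

Keeping request construction separate from sending makes the sketch usable without a running cluster. The command-line client performs the equivalent steps on the user's behalf.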

Core abstractions in KSQL

KSQL uses the Kafka Streams API internally, so the two share the same core abstractions for stream processing on Kafka. KSQL has two core abstractions, which map to the two core abstractions in Kafka Streams and let you work with Kafka topics:

1. Stream: A stream is an unbounded sequence of structured data ("facts"). Example: a stream of financial transactions such as "Alice sent $100 to Bob, and Charlie sent $50 to Bob." Facts in a stream are immutable: new facts can be inserted into the stream, but existing facts are never updated or deleted. Streams can be created from Kafka topics or derived from existing streams and tables.

    CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
      WITH (kafka_topic='pageviews', value_format='JSON');

2. Table: A table is a view of a stream or of another table, and it represents a collection of evolving facts. Example: a table holding the latest financial information, such as "Bob's current account balance is $150." It is the equivalent of a traditional database table, enriched with streaming semantics. Facts in a table are mutable: new facts can be inserted into the table, and existing facts can be updated or deleted. Tables can be created from Kafka topics or derived from existing streams and tables.

    CREATE TABLE users (registertime BIGINT, gender VARCHAR, regionid VARCHAR, userid VARCHAR)
      WITH (kafka_topic='users', value_format='DELIMITED');
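The relationship between the two abstractions can be sketched by folding the stream of transfers from the stream example into a table of balances. This is a plain Python illustration, not KSQL; the function `balances` and the assumption that every account starts at zero are introduced here only for the example.

```python
from collections import defaultdict

# The stream: an append-only sequence of immutable facts
# (sender, receiver, amount), matching the example in the text.
transfers = [
    ("Alice", "Bob", 100),
    ("Charlie", "Bob", 50),
]


def balances(stream):
    """Fold a stream of transfers into a table of current balances.

    Assumes every account starts at zero.
    """
    table = defaultdict(int)
    for sender, receiver, amount in stream:
        # Each new fact updates the affected rows in place.
        table[sender] -= amount
        table[receiver] += amount
    return dict(table)


print(balances(transfers))
```

The list of transfers never changes once written, while the row for Bob is overwritten as each transfer arrives. That is the sense in which facts in a stream are immutable and facts in a table are mutable.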

KSQL simplifies streaming applications because it fully integrates the concepts of tables and streams, allowing you to join a table that represents current state with a stream that represents events happening right now. A topic in Apache Kafka can be represented as either a stream or a table in KSQL, depending on the intended semantics of the processing on the topic. For example, if you want to read the data in a topic as a series of independent values, use CREATE STREAM. An example of such a stream is a topic that captures page view events, where each page view event is unrelated to and independent of any other. If, on the other hand, you want to read the data in a topic as an evolving collection of updatable values, use CREATE TABLE. An example of a topic that should be read as a table in KSQL is one that captures user metadata, where each event represents the latest metadata for a particular user ID, such as the user's name, address, or preferences.
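The two ways of reading one topic can be shown side by side in a short Python sketch. The records, keys, and the helper names `as_stream` and `as_table` are hypothetical; the point is only that the same sequence yields different results under the two interpretations.

```python
# Records as they might appear in a topic: (key, value) pairs in order.
records = [
    ("user1", {"region": "EU"}),
    ("user2", {"region": "US"}),
    ("user1", {"region": "APAC"}),  # a later record for the same key
]


def as_stream(topic):
    """Stream reading: every record is an independent event."""
    return list(topic)


def as_table(topic):
    """Table reading: a later record replaces the earlier value
    stored under the same key."""
    table = {}
    for key, value in topic:
        table[key] = value
    return table


print(len(as_stream(records)))
print(as_table(records))
```

Read as a stream, the topic contains three events. Read as a table, it contains two rows, and the row for `user1` holds only the most recent value.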

Resources:

    • https://github.com/confluentinc/ksql
    • https://www.confluent.io/product/ksql/
    • http://geek.csdn.net/news/detail/235801
