International - English

Cart Console

Topic Center

Contact Sales

Home > Others

Streaming SQL in calcite

Last Update:2017-01-18 Source: Internet

Author: User

Tags benchmark

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

The general design of flow-type SQL in calcite flow sqlcalcite

The overall syntax should be compatible with SQL, which is consistent with the current trend of stream processing SQL.
If some of the features are not included in standard SQL, try to use the industry benchmark (Oracle). For example, the function of pattern matching, the current stream processing has not yet reached a consensus on the syntax, then in the design, the use of Oracle Data Warehouse match recognize way. There are also sliding window functions.
If there is no current industry benchmark, then the function of the way to expand, tumbling windows and bounce windows, these two windows are not included in standard SQL, so the use of ceil,tumble,hop functions such as the way to achieve the function.
The general idea is to do as little as possible on the basis of compatible standard SQL, to ensure that semantics are consistent with standard SQL and to minimize the syntax of privatization .

Calcite StreamSQL Description

Explicitly defined in the DDL schema is a stream or table, TODO: No examples on official website, to be supplemented
For example, there are three schemas:
Orders (Rowtime, ProductId, orderId, Units)-both the table and the stream
Products (Rowtime, productId, name)-table
Shipments (Rowtime, orderId)-Flow
If the query contains the Stream keyword, it is a streaming query, if not included, is a table query. A table query can return results immediately and end, and streaming queries will only output results but not end.
For example, here is a sample stream query:

SELECT STREAM *FROM Orders;  rowtime | productId | orderId | units----------+-----------+---------+------- 10:17:00 |        30 |       5 |     4 10:17:05 |        10 |       6 |     1 10:18:05 |        20 |       7 |     2 10:18:07 |        30 |       8 |    20 11:02:00 |        10 |       9 |     6 11:04:00 |        10 |      10 |     1 11:09:30 |        40 |      11 |    12 11:24:11 |        10 |      12 |     4

Table Query Example:

SELECT *FROM Orders;  rowtime | productId | orderId | units----------+-----------+---------+------- 08:30:00 |        10 |       1 |     3 08:45:10 |        20 |       2 |     1 09:12:21 |        10 |       3 |    10 09:27:44 |        30 |       4 |     24 records returned.

Stream and table queries cannot be mixed, or they will be an error

SELECT * FROM Shipments;ERROR: Cannot convert stream ‘SHIPMENTS‘ to a tableSELECT STREAM * FROM Products;ERROR: Cannot convert table ‘PRODUCTS‘ to a stream

Other filtering, sorting, having, and so on, are consistent with standard SQL, no longer an example. The
subquery only needs to write the Stream keyword in the outer statement, and the inner layer writes invalid.
such as:

select  STREAM rowtime, productId from  (select  tumble_end (rowtime, interval   ' 1 '  hour ) as  Rowtime, ProductId, count  (*) as  C, sum  (Units) as  su from  Orders group  by  tumble (rowtime, interval   ' 1 '  hour  ), productId) where  C > 2  or  su > 10 ; Rowtime | ProductId----------+-----------10:00:00 |        30 11:00:00 |        10 11:00:00 | 40

Window function Description Tumbling window (Tumbling window)

Data does not overlap between two windows.

SELECT  STREAM Ceil (rowtime to  hour ) as  rowtime, productId, count  (*) as  C, sum  (Units) as  Unitsfrom  ordersgroup  by  Ceil (rowtime to  hour ), productId;  Rowtime | ProductId | C | Units----------+-----------+---------+------- 11:00:00 | 30 | 2 | 24 11:00:00 | 10 | 1 | 1 11:00:00 | 20 | 1 | 7 12:00:00 | 10 | 3 | 11 12:00:00 | 40 | 1 |

The example outputs the statistical results of this hour at the end of each hour, 11 o'clock, and outputs 1 points. Event-driven, internal timer not included.
The following example is equivalent to the above example

 SELECT STREAM tumble_end (rowtime, INTERVAL ' 1 ' HOUR)  as Rowtime, ProductId, COUNT(*) as C, SUM(Units)  as units from  OrdersGROUP by Tumble (Rowtime, INTERVAL ' 1 ' HOUR), productId;  Rowtime |       ProductId | C | Units----------+-----------+---------+-------11:00:00 |       30 |    2 |        24 11:00:00 |       10 |     1 |        1 11:00:00 |       20 |     1 |        7 12:00:00 |       10 |    3 |        11 12:00:00 |       40 |    1 | 12

Another example, it takes less than half an hour to output a result, to 12 minutes for the alignment time,

 SELECT STREAM tumble_end (rowtime, INTERVAL ' MINUTE, time  ' 0:12 ')  as Rowtime, ProductId, COUNT(*)  as C, SUM(Units)  as units< C12>from OrdersGROUP  by Tumble (Rowtime, INTERVAL ' MINUTE ,  time ' 0:12 '), productId;Rowtime |       ProductId | C | Units----------+-----------+---------+-------10:42:00 |       30 |    2 |        24 10:42:00 |       10 |     1 |        1 10:42:00 |       20 |     1 |        7 11:12:00 |       10 |     2 |        7 11:12:00 |       40 |    1 |        12 11:42:00 |       10 |     1 | 4

Bounce window (HOP window)

There is a certain overlap between the data in the two windows.
Bounce windows are generalized tumbling windows that allow data to be stored in a window for longer periods of time.
For example, the data output time is 11:00, but it also contains data from 08:00 to 11:00, and 09:00 to 12:00, and an input row for three output lines.

 SELECT STREAM hop_end (rowtime, INTERVAL ' 1 ' HOUR, INTERVAL ' 3 '  HOUR)  as Rowtime, COUNT(*)  as C, SUM(Units)  as units< C13>from OrdersGROUP  by HOP (Rowtime, INTERVAL ' 1 ' HOUR, INTERVAL ' 3 ' HOUR);Rowtime | C | Units----------+----------+-------11:00:00 |    4 |        27 12:00:00 |    8 | 50

Sliding window (sliding window)

The sliding window in calcite uses the standard over mode, which directly applies the analytic functions in standard SQL.

SELECT STREAM rowtime,  productId,  units,  SUM(units) OVER (ORDER BY rowtime RANGE INTERVAL ‘1‘ HOUR PRECEDING) unitsLastHourFROM Orders;

The following example shows the average order size for the past 10 minutes and the comparison data for last week's average order

select  STREAM *FROM  (select  STREAM rowtime, productId, units, avg  (Units) over product (RANGE interval   ' 10 '  minute  preceding) as  M10, avg  (Units) over product (RANGE interval   ' 7 '  day  preceding) as  D7 Span class= "Hljs-keyword" >from  Orders WINDOW product as  (order  by  rowtime PARTITION by  productId)) where  M10 > D7;

In this example, the window clause is used to define the partial windows, and then the refinement is done in each of the over clauses. For the first time, you can also define all the windows in the window clause.
In this way, two tables, 10-minute and 7-day window data are maintained in the background. You can access these tables directly and do not need to make the displayed queries.

This syntax can also implement some other features:
* Row Group window
* References to rows that have not yet arrived, the stream waits until they arrive.
* Can support other rank and other statistical analysis functions

Cascade Window (Overlay window)

The following query shows a scenario that returns the statistical results for each record, but the result is reset at a fixed time.

SELECT STREAM rowtime,  productId,  units,  SUM(units) OVER (PARTITION BY FLOOR(rowtime TO HOUR)) AS unitsSinceTopOfHourFROM Orders;

This is similar to a sliding window query, but the monotone expression occurs in the partition by clause. As time goes from 10:59:59 to 11:00:00,floor from 10:00:00 to 11:00:00, a new grouping starts to occur. The consolidated result of sum begins to reset.
Calcite knows that the old grouping will never be used again, so all the statistical results of that grouping are removed from the internal store.

Row Group window

Using the window syntax and over mode can be done.

Monotonous and quasi-monotonous

This is the concept put forward by the author in the StreamSQL of calcite.
If a column or expression is incremented or decremented, it becomes monotonous.
If a column or expression is out of order and there is a mechanism (such as a punctuation mark or watermark) to generate a particular value that will never be seen, then the column or expression is quasi-monotonic.
The concept is very south, but in fact it is required that the flow of data is globally ordered. This can be the order of events, or the order of event IDs. In general, we automatically make time for events.
With this order, we can easily implement the watermark function.

Description of the Stream and table

create  VIEW  Hourlyordertotals (Rowtime, ProductId, C, su) as  select  tumble_end (rowtime, interval   ' 1 '  hour ), productId, count  (*), sum  (Units) from  Orders group  Span class= "Hljs-keyword" >by  tumble (rowtime, interval   ' 1 '  hour ), productId;  SELECT STREAM rowtime, productId from hourlyordertotalsWHERE C > 2 OR su > ten;Rowtime | ProductId----------+-----------10:00:00 |        30 11:00:00 |        10 11:00:00 | 40

Look at the view above, is this a table or a stream?
Because it doesn't use the stream keyword, it's bound to be a relationship, a table, but it's a table that can be converted to a stream . You can use it in queries for flows and relationships.
And it's equivalent to the query:

With Hourlyordertotals (Rowtime, ProductId, C, Su) as ( SELECT tumble_end (rowtime, INTERVAL ' 1 ' HOUR), productId, COUNT(*), SUM  (units)  from Orders GROUP  by Tumble (Rowtime, INTERVAL ' 1 ' HOUR), productId) SELECT STREAM rowtime, ProductId from hourlyordertotalsWHERE C > 2 OR su > C18>10;Rowtime | ProductId----------+-----------10:00:00 |        30 11:00:00 |        10 11:00:00 | 40

This approach is not limited to subqueries and views, and each query in streaming SQL is defined as a relational query and is converted to a stream using the Stream keyword in the topmost select clause .

The join on the stream is divided into two types, the join of the stream and the table, and the join of the stream and the stream
A join on a stream is actually a join of a window and a window, or a join between a window and a table, essentially a join between tables, because a window is a table .
For example, the following join between a stream and a table

  SELECT STREAM o.rowtime, O.productid, O.orderid, O.units, P.name, P.unitprice from Orders as
       oJOIN products  as P  on o.productid = P.productid;Rowtime | ProductId | OrderId | Units | name | UnitPrice----------+-----------+---------+-------+ -------+-----------10:17:00 |       30 |     5 | 4 |        Cheese |        17 10:17:05 |       10 |     6 | 1 |      Beer |        0.25 10:18:05 |       20 |     7 | 2 |         Wine |        6 10:18:07 |       30 |    8 | 20 |        Cheese |        17 11:02:00 |       10 |     9 | 6 |      Beer |        0.25 11:04:00 |      10 |     10 | 1 |      Beer |        0.25 11:09:30 |      40 |    11 | 12 |       Bread |        100 11:24:11 |      10 |     12 | 4 |      Beer | 0.25

Order is a stream, and products are tables. After two joins the result is definitely a stream, and then, because there is no window, by default it should be a window that only holds the current data for a length of 1, and the current order data and products do join.
The join of streams and streams is as follows:

SELECT  STREAM o.rowtime, O.productid, O.orderid, s.rowtime as  shiptimefrom  Orders as  ojoin  Shipments as  s on  o.orderid = S.orderid and  s.rowtime between O.rowtime and  o.rowtime + interval   ' 1 '  hour   ; Rowtime | ProductId | OrderId |        Shiptime----------+-----------+---------+---------- 10:17:00 |       30 | 5 |        10:55:00-10:17:05 |       10 | 6 |        10:20:00-11:02:00 |       10 | 9 |        11:58:00-11:24:11 |      10 | 12 | 11:44:00

There is no explicit definition window in this query, but the data range is actually locked by the where condition. In other words, the data is automatically saved in a window.

DML statements

You can use the CREATE VIEW statement for creating views, which you can already see, and you can use the Insert as Select method to import data from the stream into other streams.
Like what:

CREATE VIEW LargeOrders ASSELECT STREAM * FROM Orders WHERE units > 1000;INSERT INTO LargeOrdersSELECT STREAM * FROM Orders WHERE units > 1000;

You can also maintain window data through Upsert statements.

UPSERT INTO OrdersSummarySELECT STREAM productId,  COUNT(*) OVER lastHour AS cFROM OrdersWINDOW lastHour AS (  PARTITION BY productId  ORDER BY rowtime  RANGE INTERVAL ‘1‘ HOUR PRECEDING)

The development plan has been completed

SELECT on stream, WHERE, GROUP by, have, UNION all, ORDER by
Floor and Ceil functions
Monotonicity (monotonicity)
Disabling streaming result Sets
Like what:

    SELECT STREAM * FROM (VALUES (1, ‘abc‘));    ERROR: Cannot stream VALUES

Not completed

Joins of streams and streams
Joins of streams and tables
View-based streaming
The union All (stream merge) on the stream that contains the order by
A relational query on a stream
Window aggregation on a stream (sliding window and Cascade window)
Ignore stream keywords in views and subqueries
The order by on the stream cannot contain offset and limit
Run time check if there is enough history data to query
Quasi-monotonic – a mechanism is required to declare that once the data has been calculated, it is no longer updated, or after the calculation has been completed, if the latest results are refreshed, then updated. such as the watermark function.
HOP and tumble functions, as well as auxiliary hop_start, Hop_end, Tumble_start, tumble_end functions

Reference:
Http://calcite.apache.org/docs/stream.html

Streaming SQL in calcite

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

Related Keywords:

in sql sql functions in sql server replace apostrophe in sql alphanumeric sorting in sql case in sql w3school cast command in sql ceil function in sql

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

What's Trending

Top 10 Tags

datastax versions naming convention zookeeper client class definition md5 microsoft sql server 2005 data structures exception handling error handling

Top 10 Keywords

microsoft download center down wordpress address url site address url wordpress address url windows installer 4 0 download 302 not found web address url definition site address url wordpress db2 integer mac os installation step by step pdf abbreviation for return

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Streaming SQL in calcite

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support