High-performance complex event processing: pattern matching

Last Update:2018-08-12 Source: Internet

Author: User


Original paper: High-Performance Complex Event Processing over Streams

The algorithm in this paper is based on the SASE language. The language itself is not described in detail here; only the pattern matching algorithm is covered.

1. Query plan-based approach

1.1 Basic query plan

A query plan in SASE is built from a subset of six operators:

- Sequence scan (ss->)

- Sequence construction (sc<-)

- Selection

- Window

- Negation

- Transformation

Consider a concrete example, query Q3. Its pattern is SEQ(A, B, !C, D): an A event followed by a B event and then a D event, with no matching C event between B and D.

In this query, A, B, C, and D are four different event types. The WHERE clause contains three predicates: (1) two equivalence tests on attr1 and attr2, attributes common to A, B, C, and D; (2) a simple predicate on attribute attr3 of type A events; (3) a predicate comparing attribute attr4 of the A and B events. The capital letter T is the window size specified in the WITHIN clause.

Figure 1 shows the basic query plan for Q3 and an example of an event flow.

Figure 1 Execution plan for query Q3

In the event stream at the bottom of Figure 1, each lowercase letter denotes an event, the corresponding uppercase letter is its event type, and the number below each event is its timestamp. The rounded rectangles are the operators of the query plan. From bottom to top, they are:

Sequence scan and construction (SSC)

Sequence scan and sequence construction are always used together. For a query that uses the SEQ construct, SSC handles the positive components of SEQ, which form a "subsequence type" of the original SEQ. For example, the subsequence type of Q3 is (A, B, D), obtained by dropping "!C".

SSC transforms the event stream into a stream of event sequences, where each event sequence is a unique match of the subsequence type. In Figure 1, SSC outputs 7 event sequences.

SSC consists of a sequence scan operator (ss->), which scans the event stream and detects matches of the subsequence type, and a sequence construction operator (sc<-), which creates all the event sequences. Both are described in detail later.

Selection

The selection operator evaluates all predicates on each event sequence and outputs only the sequences that satisfy them. In Figure 1, the selection operator filters out 3 of the 7 input event sequences.

Window

The window operator enforces the WITHIN clause on each pattern match. For each event sequence, it checks whether the difference between the timestamps of the first and last events is less than the window size T. In Figure 1, T is 6, and the second input event sequence is filtered out.
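The window check is simple enough to state directly in code. The following is a minimal Python sketch, not SASE's implementation; the function name `within_window` and the representation of an event as a `(type, timestamp)` tuple are assumptions made for illustration.

```python
def within_window(sequence, window):
    """Return True if the event sequence fits inside the window.

    sequence: list of (event_type, timestamp) tuples in time order
              (an assumed representation, not defined by the article).
    window:   the window size T.
    """
    first_ts = sequence[0][1]
    last_ts = sequence[-1][1]
    # Keep the sequence only if last - first is strictly less than T.
    return last_ts - first_ts < window


# Example with T = 6: a span of 4 passes, a span of 8 is filtered out.
kept = within_window([("A", 1), ("B", 2), ("D", 5)], 6)
dropped = not within_window([("A", 1), ("B", 4), ("D", 9)], 6)
```

The comparison is strict because the text says the difference must be less than T; a query language that treats the bound as inclusive would use `<=` instead.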

Negation

The negation operator handles the negative components of the SEQ construct, that is, the components that SSC ignores. In Figure 1, for each input event sequence, the negation operator checks whether a C event occurs between the B and D events of the sequence and has the same attribute value as B (which, by the equivalence test, is also the value shared by A and D). If such a C event exists, the event sequence is discarded. In Figure 1, the second input event sequence is filtered out.

Transformation

Finally, the transformation operator converts each event sequence into a composite event by concatenating the attributes of all events in the input sequence.

The rest of this section elaborates on the implementation of the SSC and negation operators.

1.2 Sequence scan and construction

Sequence scan (ss->)

For each subsequence type, an NFA is created by mapping successive event types to successive NFA states. For example, Figure 2 shows the NFA created for the subsequence type (A, B, D). State 0 is the start state, state 1 is reached after an A event, state 2 after a B event, and state 3 after a D event. State 3 is drawn as a double circle to indicate that it is the NFA's accepting (final) state; there is only one such state. Note that states 1 and 2 each have a self-loop labeled with the wildcard "*". Given an event, such a state lets the NFA stay in the same state and, nondeterministically, also take a forward transition at the same time.

Figure 2 NFA-based sequence scan and construction

To keep track of these simultaneous states, we use a runtime stack that records the set of active states at each point in time and how that set leads to a new set of active states when an event arrives. Figure 2 shows the runtime stack (growing from left to right) with the event stream below it. Each active state instance in the stack has one or two pointers to its predecessors, that is, to the state instances it came from. Whenever an event arrives, state 0 is activated to start a new search.

Sequence construction (sc<-)

During the sequence scan, sequence construction is invoked to create event sequences as soon as the accepting state is reached. One way to do this is to extract a single-source directed acyclic graph (DAG) from the runtime stack: the DAG starts at the accepting state instance in the rightmost cell of the stack and follows the predecessor pointers back until it reaches the start state. In Figure 2, the bold numbers and arrows mark such a DAG, produced when the event that reaches the accepting state arrives. The event sequences are obtained by enumerating all paths from the source of the DAG to its sinks. On each path, the edges that connect two instances of the same state (the self-loop edges) are removed, and the remaining edges yield a unique event sequence. Figure 2 shows the three event sequences created from the DAG.
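The scan and construction steps described above can be sketched together in Python. This is a simplified illustration under stated assumptions, not the paper's code: events are `(type, timestamp)` tuples, the names `Instance`, `paths`, and `ssc` are hypothetical, the example stream is invented, and the path enumeration is the naive one rather than an optimized search.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One active state instance in a column of the runtime stack."""
    state: int
    # Each entry is (predecessor instance, event consumed on the edge);
    # the event is None for a self-loop edge.
    preds: list = field(default_factory=list)


def paths(inst):
    """Enumerate event sequences by walking predecessor pointers to state 0."""
    if inst.state == 0:
        yield []
        return
    for pred, event in inst.preds:
        for prefix in paths(pred):
            # Self-loop edges carry no event and are dropped from the result.
            yield prefix + ([event] if event is not None else [])


def ssc(events, pattern):
    """Naive NFA-based scan and construction for a subsequence type."""
    accept = len(pattern)
    start = Instance(0)
    column = {0: start}          # active instances after the previous event
    results = []
    for event in events:
        new_column = {0: start}  # state 0 is active in every column
        for state in range(1, accept + 1):
            inst = Instance(state)
            if state < accept and state in column:
                inst.preds.append((column[state], None))       # self-loop on '*'
            if state - 1 in column and event[0] == pattern[state - 1]:
                inst.preds.append((column[state - 1], event))  # forward edge
            if inst.preds:
                new_column[state] = inst
        if accept in new_column:
            results.extend(paths(new_column[accept]))
        column = new_column
    return results


stream = [("A", 1), ("B", 2), ("A", 3), ("B", 4), ("D", 5)]
matches = ssc(stream, ["A", "B", "D"])
# matches holds three sequences: (a1, b2, d5), (a1, b4, d5), (a3, b4, d5)
```

Each column keeps at most one instance per state, and each instance has at most two predecessor entries (one self-loop, one forward edge), which mirrors the "one or two pointers" property of the runtime stack.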

A naive algorithm for searching the DAG has complexity O(P), where P is the number of paths in the DAG, which is exponential in the worst case. We use a depth-first search to reduce the complexity to O(E), where E is the number of edges in the DAG. Because each active state instance has at most two pointers, E is bounded by O(2LS), where L is the length of the subsequence type (which determines the number of states in the NFA) and S is the number of events. In practice, S can be bounded by the window size W, because the window condition is checked dynamically during the DAG search, so the complexity becomes O(2LW).

1.3 Negation

As mentioned earlier, the negation operator (NG) handles the negative components of the SEQ construct that SSC ignores. For each input event sequence, NG performs two tasks for each negative component: (1) check whether an event of the type specified in the negative component occurs within a specific time interval; (2) if such an event exists, check whether it satisfies all relevant predicates. Any event that passes both checks invalidates the current event sequence. Below we focus on the compile-time and run-time support for task (1). Support for task (2) is straightforward and is not discussed further.
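The two tasks NG performs for one negative component can be sketched as a single Python function. This is an illustrative sketch only: the name `passes_negation`, the dict-based event representation, and the example stream are assumptions, the interval bounds are simply passed in as parameters, and the function scans the whole stream for simplicity.

```python
def passes_negation(stream, neg_type, start, end, predicate):
    """Return True if the event sequence survives one negative component.

    stream:    iterable of events, each a dict with "type" and "ts" keys
               (an assumed representation).
    neg_type:  event type named in the negative component, e.g. "C".
    start/end: bounds of the open time interval to inspect.
    predicate: function(event) -> bool standing in for the relevant predicates.
    """
    for event in stream:
        # Task (1): an event of the negated type inside the interval.
        in_interval = start < event["ts"] < end
        # Task (2): that event also satisfies the predicates.
        if event["type"] == neg_type and in_interval and predicate(event):
            return False  # a qualifying event invalidates the sequence
    return True


stream = [
    {"type": "A", "ts": 1, "attr1": 7},
    {"type": "B", "ts": 2, "attr1": 7},
    {"type": "C", "ts": 3, "attr1": 7},
    {"type": "D", "ts": 5, "attr1": 7},
]
# Negated C between B (ts 2) and D (ts 5), with the same attr1 as B.
ok = passes_negation(stream, "C", 2, 5, lambda e: e["attr1"] == 7)
# ok is False: the C event at ts 3 rejects the sequence (a1, b2, d5).
```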

At compile time, the time interval for task (1) is determined as follows. For the sequence (A, !B, C), the interval is (A.timestamp, C.timestamp). For the sequence (!A, B), the window size T is used to define the interval (B.timestamp - T, B.timestamp). The sequence (A, !B) is a special case: given the window size T, the query requires that no B event occur within time T after the A event, so the interval is (A.timestamp, A.timestamp + T).

2. Optimization techniques

2.1 Optimizing sequence scan and construction

The algorithm described above becomes extremely inefficient with a large window. To address this, we use an auxiliary data structure, the Active Instance Stack (AIS), to speed up sequence construction. The algorithm is described below.

Sequence scan

During the sequence scan, the NFA executes as before. In addition, an AIS is created at each NFA state to store the events that triggered a transition into that state; such an event is called an active instance of the state. Figure 3 shows the contents of the three AISs for the example in Figure 2. Within each stack, the active instances (shown in bold) are listed from top to bottom in the order in which they occurred. Adjacent stacks are linked through an extra field in each active instance e, which records the most recent instance in the previous stack (RIP) at the time e arrived. For example, when an instance is pushed onto the B stack, its RIP field is set to the instance that is most recent in the A stack at that moment. The RIP field means that, if a sequence has to be constructed, that A instance and every earlier instance in the A stack are valid matches for the B instance.

Figure 3 SSC using AIS

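Sequence construction

With this method there is no need to record paths with pointers; the arrows in the figure are drawn only to make the paths easy to follow. Construction starts from an instance in the last stack. Its RIP field identifies an instance in the previous stack, and the starting instance must be combined with that instance and with every earlier instance in the same stack. The same step is then repeated for each of those instances, stack by stack, until the first stack is reached.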
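The AIS-based scan and construction can be sketched in Python as follows. This is a minimal illustration under assumptions not taken from the paper: events are `(type, timestamp)` tuples, the event types in the pattern are distinct, the RIP field is stored as an index into the previous stack, window-based pruning of the stacks is omitted, and the name `ais_match` and the example stream are hypothetical.

```python
def ais_match(events, pattern):
    """Sequence scan and construction with one Active Instance Stack per state.

    Assumes the event types in `pattern` are distinct.
    """
    stacks = [[] for _ in pattern]   # stacks[i] holds (event, rip) pairs
    results = []

    def construct(level, index, suffix):
        event, rip = stacks[level][index]
        sequence = [event] + suffix
        if level == 0:
            results.append(sequence)
            return
        # The RIP instance and everything pushed before it are valid predecessors.
        for j in range(rip + 1):
            construct(level - 1, j, sequence)

    for event in events:
        if event[0] not in pattern:
            continue                  # event type is irrelevant to the query
        level = pattern.index(event[0])
        if level > 0 and not stacks[level - 1]:
            continue                  # no earlier instance to extend
        # RIP: index of the most recent instance in the previous stack.
        rip = len(stacks[level - 1]) - 1 if level > 0 else -1
        stacks[level].append((event, rip))
        if level == len(pattern) - 1:
            construct(level, len(stacks[level]) - 1, [])
    return results


stream = [("A", 1), ("B", 2), ("A", 3), ("B", 4), ("D", 5)]
matches = ais_match(stream, ["A", "B", "D"])
# matches holds (a1, b2, d5), (a1, b4, d5), (a3, b4, d5)
```

Storing RIP as an index works here because instances are only appended; an implementation that evicts expired instances from the stacks would need a different reference scheme.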

2.2 Pushing predicates down

To reduce the size of the intermediate result set, we push the evaluation of predicates down into SSC.

2.2.1 Pushing an equivalence test into SSC

An equivalence test effectively acts as a grouping step: it divides one large event stream into smaller streams (groups), and the events in each group share the same attribute value. An intuitive solution is to group the event stream first and then run the query plan on each group. For better performance, we use a more advanced technique called PAIS (Partitioned Active Instance Stack), which has two advantages: (1) the groups are built simultaneously, with one AIS sequence created per group during the sequence scan; (2) events of types that are irrelevant to the query incur no extra overhead (that is, no grouping cost).

The basic idea of PAIS is that, at each state, the active instances are grouped on the attribute used in the equivalence test, and one AIS is created per group. In addition, the stack of a state is linked only to the stack of the previous state in the same group (using the algorithm from section 2.1). Figure 4 shows an example. The equivalence test pushed into SSC is on a single attribute, and the attribute value of each event is shown at the bottom of Figure 4.

Figure 4 PAIS

The PAIS algorithm makes two modifications to the AIS algorithm:

1) Attribute-based transition filtering: in any non-start state, when the NFA is about to take a transition on the current event (in the example, a transition out of state 1), PAIS reads the value of the equivalence-test attribute from the current event ('2' in the example) and checks whether the AIS of the corresponding group in the current state (group '2' of state 1) is empty. If that AIS is non-empty, an earlier event with an equal attribute value exists, so the transition to the new state is taken. Otherwise, the current event is discarded.

2) Stack maintenance: once the transition is taken, the current event is added to the AIS of the same group in the new state (in the example, the stack of group '2' in state 2), and its RIP field is set to the most recent instance of that group in the previous state.

With PAIS, sequence construction only needs to run on the stacks of a single group, which greatly reduces the number of results. In Figure 4, sequence construction produces only one event sequence, whereas the previous method produced three.
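The two PAIS modifications can be sketched in Python by keeping one stack per group at each state. This is a hedged sketch, not the paper's implementation: the `Event` tuple, the function name `pais_match`, the attribute name `x`, and the example stream are all assumptions, and the event types in the pattern are assumed to be distinct.

```python
from collections import defaultdict, namedtuple

# Assumed event representation: type, timestamp, and a dict of attributes.
Event = namedtuple("Event", "type ts attrs")


def pais_match(events, pattern, key_attr):
    """AIS matching with stacks grouped on the equivalence-test attribute."""
    # stacks[level][value] is the AIS of group `value` at that state.
    stacks = [defaultdict(list) for _ in pattern]
    results = []

    def construct(level, key, index, suffix):
        event, rip = stacks[level][key][index]
        sequence = [event] + suffix
        if level == 0:
            results.append(sequence)
            return
        for j in range(rip + 1):      # stay inside the same group
            construct(level - 1, key, j, sequence)

    for ev in events:
        if ev.type not in pattern:
            continue                  # irrelevant types pay no grouping cost
        level = pattern.index(ev.type)
        key = ev.attrs[key_attr]
        rip = -1
        if level > 0:
            # Transition filtering: need a non-empty AIS in the same group.
            previous = stacks[level - 1].get(key)
            if not previous:
                continue              # discard the current event
            rip = len(previous) - 1
        # Stack maintenance: push into the same group of the new state.
        stacks[level][key].append((ev, rip))
        if level == len(pattern) - 1:
            construct(level, key, len(stacks[level][key]) - 1, [])
    return results


stream = [
    Event("A", 1, {"x": 1}),
    Event("A", 2, {"x": 2}),
    Event("B", 3, {"x": 2}),
    Event("B", 4, {"x": 1}),
    Event("D", 5, {"x": 2}),
]
matches = pais_match(stream, ["A", "B", "D"], "x")
timestamps = [[e.ts for e in seq] for seq in matches]
# timestamps == [[2, 3, 5]]: only the group x = 2 completes the pattern
```

Without grouping, the same stream would yield four (A, B, D) combinations that a later selection step would have to prune; with grouping, only the single valid one is constructed.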

2.2.2 Pushing multiple equivalence tests into SSC

A query can contain multiple equivalence tests, and pushing all of them into SSC further reduces the number of intermediate results. A direct extension of the PAIS algorithm is to group on multiple attributes and create one AIS sequence per group. Here we propose two alternative methods.

2.2.2.1 Eager filtering in ss->

The first method, called Multi-PAIS, pushes all equivalence tests into the sequence scan so that more events are filtered out in the "transition filtering" step of the PAIS algorithm. Consider a simple subsequence type (A, B) and two equivalence-test attributes. Figure 5 shows this example.

Figure 5 Multi-PAIS

At each NFA state, one PAIS is created for each attribute. To understand the contents of these stacks, we describe how they are built:

1) Cross-attribute transition filtering: in each non-start state, when the NFA is about to take a transition on the current event (in the example, a transition out of state 1), filtering is performed in two steps:

(1) For each attribute, look up the stack in the current state (state 1 in the example) that corresponds to the current event's value of that attribute.

(2) Compute the intersection of all the stacks found in step (1).

If the intersection is non-empty, there is an earlier event whose values equal those of the current event on all tested attributes, and the transition is taken.

2) Multi-stack maintenance: in the new state reached by the transition, the current event is added to the appropriate stack for each attribute, according to its attribute values. For example, in state 2, an event may be added both to the stack of group '1' for one attribute and to the stack of group '3' for the other.

Sequence construction can still produce event sequences that fail some equivalence test: in Figure 5 it yields two event sequences, although only one of them is a correct result. The selection operator is therefore still needed to filter out the remaining incorrect sequences.
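The scan side of Multi-PAIS (cross-attribute transition filtering plus multi-stack maintenance) can be sketched in Python as follows. The sketch makes several assumptions of its own: the `Event` tuple, the name `multi_pais_scan`, and the attribute names `x` and `y` are hypothetical, timestamps are assumed unique so that they can serve as instance identifiers, and RIP fields and sequence construction are left out.

```python
from collections import defaultdict, namedtuple

# Assumed event representation: type, timestamp, and a dict of attributes.
Event = namedtuple("Event", "type ts attrs")


def multi_pais_scan(events, pattern, key_attrs):
    """Scan side of Multi-PAIS: return the timestamps admitted to each state."""
    # stacks[level][attr][value] -> timestamps of the instances in that group
    stacks = [{a: defaultdict(list) for a in key_attrs} for _ in pattern]
    admitted = [[] for _ in pattern]
    for ev in events:
        if ev.type not in pattern:
            continue
        level = pattern.index(ev.type)
        if level > 0:
            # Step (1): per attribute, fetch the stack matching this event's value.
            candidates = [
                set(stacks[level - 1][a].get(ev.attrs[a], ()))
                for a in key_attrs
            ]
            # Step (2): intersect; empty means no earlier event matches on
            # every tested attribute, so the event is discarded.
            if not set.intersection(*candidates):
                continue
        # Multi-stack maintenance: one push per attribute.
        for a in key_attrs:
            stacks[level][a][ev.attrs[a]].append(ev.ts)
        admitted[level].append(ev.ts)
    return admitted


stream = [
    Event("A", 1, {"x": 1, "y": 3}),
    Event("A", 2, {"x": 2, "y": 4}),
    Event("B", 3, {"x": 1, "y": 4}),  # matches a1 on x and a2 on y, but no single A
    Event("B", 4, {"x": 1, "y": 3}),  # matches a1 on both attributes
]
admitted = multi_pais_scan(stream, ["A", "B"], ["x", "y"])
# admitted == [[1, 2], [4]]: the B event at ts 3 is filtered out during the scan
```

The set intersection on every transition and the extra pushes per attribute are the kinds of overhead that the cross-attribute approach pays in exchange for admitting fewer events.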

2.2.2.2 Dynamic filtering in sc<-

The second method, called dynamic filtering, pushes one equivalence test into the sequence scan and the other equivalence tests into sequence construction, where they are evaluated while the DAG is searched in the AIS. Compared with Multi-PAIS, dynamic filtering filters out fewer events during the sequence scan, so the stacks hold more instances, but it avoids the overhead of cross-attribute transition filtering and multi-stack maintenance.
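The construction side of dynamic filtering can be sketched as an AIS traversal that checks the remaining equivalence tests while it walks the stacks. This is an assumption-laden Python sketch: the `Event` tuple, the function name `construct`, and the attribute names are hypothetical, and the stacks for one group (already grouped on attribute `x` by the scan) are built by hand for the example.

```python
from collections import namedtuple

# Assumed event representation: type, timestamp, and a dict of attributes.
Event = namedtuple("Event", "type ts attrs")


def construct(stacks, level, index, other_attrs, suffix=()):
    """Enumerate sequences ending at stacks[level][index], filtering on the fly.

    stacks:      AIS stacks of a single group; each entry is (event, rip).
    other_attrs: equivalence-test attributes not handled by the scan.
    """
    event, rip = stacks[level][index]
    # Dynamic filtering: prune as soon as an equivalence test fails
    # against the next event in the partial sequence.
    if suffix and any(event.attrs[a] != suffix[0].attrs[a] for a in other_attrs):
        return []
    sequence = (event,) + suffix
    if level == 0:
        return [sequence]
    found = []
    for j in range(rip + 1):
        found.extend(construct(stacks, level - 1, j, other_attrs, sequence))
    return found


# Stacks of the group x = 1 for the subsequence type (A, B).
a1 = Event("A", 1, {"x": 1, "y": 3})
a2 = Event("A", 2, {"x": 1, "y": 4})
b3 = Event("B", 3, {"x": 1, "y": 4})
stacks = [[(a1, -1), (a2, -1)], [(b3, 1)]]

sequences = construct(stacks, 1, 0, ["y"])
timestamps = [[e.ts for e in seq] for seq in sequences]
# timestamps == [[2, 3]]: a1 is pruned because its y value differs from b3's
```

Comparing each event only with its immediate successor is sufficient here because equality is transitive, so every event on a surviving path shares the same values on `other_attrs`.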

SASE can also push simple predicates (predicates that apply to a single event) into the sequence scan.

2.3 Optimized query plan

Figure 6 Optimized query plan

Figure 6 shows the query plan after the optimization techniques are applied. It differs from the basic query plan as follows:

- The window operator is pushed into ss-> and sc<-;

- One equivalence test is pushed into ss->;

- The simple predicate is pushed into ss->;

- The other equivalence test is pushed into sc<-.

In addition, Figure 6 shows an event stream. On it, the optimized query plan produces only two event sequences (the basic query plan produces 7), so the number of intermediate results is significantly reduced.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
