Windows are a central concept in Flink stream processing. This article walks through the window-related concepts and how they are implemented. It focuses mainly on the package org.apache.flink.streaming.api.windowing.
Window
A Window represents a finite collection of objects. A window has a maximum timestamp, which marks the point in time by which all elements that belong in the window should have arrived.
The root window object of Flink is an abstract class that provides only an abstract method:
public abstract long maxTimestamp();
It returns the window's maximum timestamp. Flink provides two concrete window implementations. When extending Window, subclasses should override both equals and hashCode so that two logically equal windows are treated as the same window.
GlobalWindow
GlobalWindow is a global window implemented as a singleton. Its maxTimestamp is set to Long.MAX_VALUE.
The class also contains the serializer for GlobalWindow, defined as a static nested class: Serializer.
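The two ideas above (an abstract root type with a single maxTimestamp method, and a singleton global window) can be sketched without Flink on the classpath. WindowSketch and its nested classes are hypothetical stand-ins for illustration, not Flink's actual classes:

```java
// Hypothetical stand-in for the window hierarchy described above; not Flink's real classes.
class WindowSketch {

    // Root window type: the only abstract method is maxTimestamp().
    abstract static class Window {
        abstract long maxTimestamp();
    }

    // A global window modeled as a singleton whose maxTimestamp is Long.MAX_VALUE.
    static final class GlobalWindow extends Window {
        private static final GlobalWindow INSTANCE = new GlobalWindow();

        private GlobalWindow() {}

        static GlobalWindow get() {
            return INSTANCE;
        }

        @Override
        long maxTimestamp() {
            return Long.MAX_VALUE;
        }

        // Any two objects of this class are treated as the same window.
        @Override
        public boolean equals(Object o) {
            return this == o || (o != null && getClass() == o.getClass());
        }

        @Override
        public int hashCode() {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(GlobalWindow.get() == GlobalWindow.get()); // true
        System.out.println(GlobalWindow.get().maxTimestamp() == Long.MAX_VALUE); // true
    }
}
```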
TimeWindow
TimeWindow represents a time-interval window, which is reflected in the two properties its constructor takes:
- start: the start of the time interval
- end: the end of the time interval
The interval a TimeWindow represents is [start, end), with the end bound exclusive. Its maxTimestamp implementation is:
public long maxTimestamp() { return end - 1; }
Its equals implementation, in addition to the usual checks (reference equality and class comparison), also compares the start and end properties.
TimeWindow also implements its serializer internally; it mainly serializes the start and end properties.
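As a rough illustration of why equals and hashCode matter for a time-interval window, here is a minimal sketch. TimeWindowSketch is a hypothetical class, not Flink's TimeWindow, and it assumes the end bound is exclusive, so the largest timestamp inside the window is end - 1:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a time-interval window; not Flink's TimeWindow.
final class TimeWindowSketch {
    private final long start;
    private final long end;

    TimeWindowSketch(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Assumption: end is exclusive, so the last timestamp in the window is end - 1.
    long maxTimestamp() {
        return end - 1;
    }

    // Two windows are equal when they have the same class, start and end.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        TimeWindowSketch other = (TimeWindowSketch) o;
        return start == other.start && end == other.end;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(start) + Long.hashCode(end);
    }

    public static void main(String[] args) {
        Set<TimeWindowSketch> windows = new HashSet<>();
        windows.add(new TimeWindowSketch(0L, 60_000L));
        windows.add(new TimeWindowSketch(0L, 60_000L)); // logically the same window
        System.out.println(windows.size()); // 1
    }
}
```

Without the two overrides, the set above would hold two entries, and per-window state keyed by window would be split across them.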
WindowAssigner
The window assigner for elements. It assigns an element to one or more windows. The abstract class defines three abstract methods:
- assignWindows: assigns an element (element) that carries a timestamp (timestamp) to one or more windows and returns a collection of windows
- getDefaultTrigger: returns the default trigger associated with this WindowAssigner
- getWindowSerializer: returns the serializer for the windows this WindowAssigner assigns
Built-in WindowAssigner
The entire type inheritance graph is as follows:
Many of these assigners are time-based, and two concepts are involved: the time type and the window type.
Time type:
- EventTime: a custom timestamp supplied by the user (the event timestamp)
- ProcessingTime: the local timestamp of the host running the current subtask (the system timestamp)
Window type:
- Sliding: sliding windows may overlap (an element may belong to several windows)
- Tumbling: non-overlapping windows (the assignWindows method typically returns Collections.singletonList())
GlobalWindows
This assigner corresponds to the GlobalWindow window: it assigns all elements to the same GlobalWindow (in essence, GlobalWindow has only one instance). GlobalWindow is implemented as a singleton, and so is GlobalWindows.
Method implementation:
- assignWindows: returns a collection holding the single GlobalWindow instance
- getDefaultTrigger: returns a NeverTrigger, which never fires
TumblingEventTimeWindows
Based on the given window size and the element's event time, it returns a collection holding a single TimeWindow instance. The getDefaultTrigger method returns an EventTimeTrigger instance.
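How a tumbling assigner can map a timestamp to exactly one window is sketched below. This is an illustration rather than Flink's code: the class name is hypothetical, windows are represented as plain {start, end} arrays, the alignment rule start = timestamp - (timestamp % size) is an assumption, and timestamps are assumed to be non-negative.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of tumbling-window assignment; not Flink's implementation.
class TumblingAssignSketch {

    // Returns a single-element list holding the {start, end} bounds of the
    // size-aligned window that contains the timestamp (end is exclusive).
    static List<long[]> assignWindows(long timestamp, long size) {
        long start = timestamp - (timestamp % size);
        return Collections.singletonList(new long[] {start, start + size});
    }

    public static void main(String[] args) {
        long[] w = assignWindows(125_000L, 60_000L).get(0);
        System.out.println(w[0] + " .. " + w[1]); // 120000 .. 180000
    }
}
```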
TumblingProcessingTimeWindows
Based on the given window size and the processing time, it returns a collection holding a single TimeWindow instance. Note that the processing time is the local timestamp of the host on which the current task is running. The getDefaultTrigger method returns a ProcessingTimeTrigger instance.
SlidingProcessingTimeWindows
A sliding window differs from a tumbling window in that, besides the window size, it also takes a slide value. A sliding window can be understood like this: a one-minute window evaluated every 10 seconds. Here one minute is the window size and 10 seconds is the slide.
For a sliding window, the assignWindows method no longer returns a single window but a collection of windows. First the number of windows is computed as size / slide; then a loop creates the window objects, each of length size, with consecutive windows offset from one another by slide.
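The loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Flink's source: the class name is hypothetical, windows are plain {start, end} arrays with an exclusive end, and the newest window is assumed to start at the slide-aligned timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sliding-window assignment; not Flink's implementation.
class SlidingAssignSketch {

    // Returns every {start, end} window of length `size`, spaced `slide` apart,
    // that contains the timestamp, newest window first.
    static List<long[]> assignWindows(long timestamp, long size, long slide) {
        List<long[]> windows = new ArrayList<>((int) (size / slide));
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }

    public static void main(String[] args) {
        // One-minute windows sliding every 10 seconds.
        List<long[]> windows = assignWindows(65_000L, 60_000L, 10_000L);
        System.out.println(windows.size()); // 6
    }
}
```

With a one-minute size and a 10-second slide, each element lands in size / slide = 6 windows.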
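SlidingEventTimeWindows
Similar to SlidingProcessingTimeWindows. The difference lies in how the window start is computed: SlidingProcessingTimeWindows derives it from the system timestamp, whereas this assigner derives it from the element's event timestamp.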
Evictor
Evictor: literally "one that expels"; it is used to remove certain elements from a window.
Eviction takes place after the trigger fires and before the window is processed (before the WindowFunction is applied).
The interface defines only one method:
int evict(Iterable<StreamRecord<T>> elements, int size, W window);
The return value indicates the number of elements to evict.
Built-in Evictor
Flink has three built-in Evictor implementations:
- TimeEvictor
- CountEvictor
- DeltaEvictor
TimeEvictor
This Evictor uses a given retention period (keep time) as its eviction rule. It is implemented roughly as follows:
public int evict(Iterable<StreamRecord<Object>> elements, int size, W window) {
    int toEvict = 0;
    long currentTime = Iterables.getLast(elements).getTimestamp();
    long evictCutoff = currentTime - windowSize;
    for (StreamRecord<Object> record : elements) {
        if (record.getTimestamp() > evictCutoff) {
            break;
        }
        toEvict++;
    }
    return toEvict;
}
The general logic is to take the timestamp of the last element as the "current" time and subtract the desired window size (the keep time) to obtain a cutoff timestamp; only elements whose timestamp is greater than the cutoff are kept.
Then, starting from the first element, the loop checks each element: if its timestamp is not greater than the cutoff, the eviction count is incremented; as soon as an element's timestamp exceeds the cutoff, the loop exits and counting stops. This works because the elements within a window are ordered by time, which the Flink runtime guarantees: once one element's timestamp exceeds the cutoff, all subsequent elements do too, so there is no need to keep looping.
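To make the cutoff logic concrete, the same loop can be run on a plain list of timestamps. This is a standalone sketch: the class name and the use of a List<Long> in place of stream records are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone version of the time-based eviction loop.
class TimeEvictSketch {

    // Counts the leading elements whose timestamp is not greater than
    // (last timestamp - keepTime). The input must be ordered by time.
    static int evict(List<Long> timestamps, long keepTime) {
        if (timestamps.isEmpty()) {
            return 0;
        }
        long currentTime = timestamps.get(timestamps.size() - 1);
        long evictCutoff = currentTime - keepTime;
        int toEvict = 0;
        for (long ts : timestamps) {
            if (ts > evictCutoff) {
                break; // every later element is newer than the cutoff as well
            }
            toEvict++;
        }
        return toEvict;
    }

    public static void main(String[] args) {
        List<Long> timestamps = Arrays.asList(1000L, 2000L, 5000L, 9000L, 10000L);
        // Cutoff is 10000 - 3000 = 7000, so the first three elements are evicted.
        System.out.println(evict(timestamps, 3000L)); // 3
    }
}
```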
CountEvictor
A capacity-based Evictor: the second parameter of the evict method, size, determines how many elements should be evicted. The implementation:
public int evict(Iterable<StreamRecord<Object>> elements, int size, W window) {
    if (size > maxCount) {
        return (int) (size - maxCount);
    } else {
        return 0;
    }
}
DeltaEvictor
Evicts based on a given threshold and a deltaFunction. As with TimeEvictor, each element is compared with the last element: the delta between the two is computed and checked against the threshold.
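One plausible reading of this rule is sketched below. The class name, the use of ToDoubleBiFunction as the delta function, and the exact comparison (evict leading elements until one is closer to the last element than the threshold) are all assumptions made for illustration; the actual DeltaEvictor may differ in detail.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of delta-based eviction; not Flink's DeltaEvictor.
class DeltaEvictSketch {

    // Counts leading elements whose delta to the last element is at least the
    // threshold, stopping at the first element that is within the threshold.
    static <T> int evict(List<T> elements, ToDoubleBiFunction<T, T> deltaFunction, double threshold) {
        if (elements.isEmpty()) {
            return 0;
        }
        T last = elements.get(elements.size() - 1);
        int toEvict = 0;
        for (T element : elements) {
            if (deltaFunction.applyAsDouble(element, last) < threshold) {
                break;
            }
            toEvict++;
        }
        return toEvict;
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(1.0, 2.0, 8.0, 9.0, 10.0);
        // 1.0 and 2.0 differ from the last value (10.0) by 3.0 or more.
        System.out.println(evict(values, (a, b) -> Math.abs(a - b), 3.0)); // 2
    }
}
```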
Time
Flink has a single class, Time, that defines a window's time interval. By default the time refers to the time in the execution environment. To create a Time object, you need two parameters:
- size: the length of the time interval (a number)
- unit: a TimeUnit instance representing the unit of the time interval
The class provides a number of static methods, one for each of the different units.
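A value class of this shape can be sketched as follows. TimeSketch and its method names are hypothetical and only illustrate the size-plus-unit design with static factory methods; they are not a copy of Flink's Time API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a size + unit time interval; not Flink's Time class.
final class TimeSketch {
    private final long size;
    private final TimeUnit unit;

    private TimeSketch(long size, TimeUnit unit) {
        this.size = size;
        this.unit = unit;
    }

    // General factory taking both parameters.
    static TimeSketch of(long size, TimeUnit unit) {
        return new TimeSketch(size, unit);
    }

    // Convenience factories for specific units.
    static TimeSketch seconds(long seconds) {
        return of(seconds, TimeUnit.SECONDS);
    }

    static TimeSketch minutes(long minutes) {
        return of(minutes, TimeUnit.MINUTES);
    }

    // Normalizes the interval to milliseconds.
    long toMilliseconds() {
        return unit.toMillis(size);
    }

    public static void main(String[] args) {
        System.out.println(TimeSketch.minutes(1).toMilliseconds()); // 60000
    }
}
```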
Trigger
A Trigger determines when the elements collected in a window are evaluated and when the result is emitted.
In coarse-grained terms, Flink provides three types of triggering methods:
- by element
- by system time
- by event time
This is reflected in the three main abstract methods of Trigger:
- onElement: called for each element; this is mainly for element-based triggers, such as the CountTrigger we will see later
- onProcessingTime: called when a processing-time (Flink system timestamp) timer fires
- onEventTime: called when an event-time (event timestamp) timer fires
All of these methods share a common parameter: TriggerContext.
TriggerContext
As the name implies, it provides context information while the trigger executes; it is an inner interface of Trigger:
- getCurrentWatermark: returns the current watermark
- registerProcessingTimeTimer: registers a system-time timer, which triggers onProcessingTime
- registerEventTimeTimer: registers an event-time timer, which triggers onEventTime
- deleteProcessingTimeTimer: deletes a system-time timer
- deleteEventTimeTimer: deletes an event-time timer
- getPartitionedState: obtains fault-tolerant state that can be restored after a failure
Of these, the registerXXX/deleteXXX methods mainly serve the two time-based trigger kinds above. The last method, getPartitionedState, is also very important, because it is used to obtain the window's state; for example, some triggers depend on contextual state, and that state is obtained through this method.
TriggerResult
After one of the three trigger methods defined in Trigger is called, it returns a result that determines what happens next (for example, calling the window function or discarding the window). This trigger behavior is expressed by TriggerResult. It is an enumeration type with the following values:
- FIRE: the window is evaluated with the window function and the result is emitted, but the elements are not cleared and remain in the window
- PURGE: clears the elements in the window
- FIRE_AND_PURGE: combines the behavior of both FIRE and PURGE
- CONTINUE: no action is taken
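The four values can be read as combinations of two flags, "fire" and "purge". The sketch below models that reading and shows how an operator might apply a result to a window buffer. The class, the flag fields and the apply helper are hypothetical illustrations, not Flink's operator code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of trigger results as fire/purge flag combinations.
class TriggerResultSketch {

    enum Result {
        CONTINUE(false, false),
        FIRE(true, false),
        PURGE(false, true),
        FIRE_AND_PURGE(true, true);

        final boolean fire;
        final boolean purge;

        Result(boolean fire, boolean purge) {
            this.fire = fire;
            this.purge = purge;
        }
    }

    // On fire, returns a copy of the buffered elements as the emitted result;
    // on purge, clears the buffer afterwards.
    static List<Integer> apply(Result result, List<Integer> buffer) {
        List<Integer> emitted = result.fire ? new ArrayList<>(buffer) : new ArrayList<>();
        if (result.purge) {
            buffer.clear();
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> buffer = new ArrayList<>(Arrays.asList(1, 2, 3));
        System.out.println(apply(Result.FIRE_AND_PURGE, buffer)); // [1, 2, 3]
        System.out.println(buffer); // []
    }
}
```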
Built-in Trigger
Flink has a number of built-in triggers; the complete class diagram is as follows:
These triggers have some things in common, as explained here:
- Because Flink's Trigger already encapsulates the callbacks for the different trigger kinds in separate methods (onXXX), the core logic of each concrete trigger lives mainly in the onXXX methods relevant to it, while the unrelated onXXX methods simply return TriggerResult.CONTINUE (personally, I think this design is not ideal, because it does not extend well)
- Because a number of triggers depend on certain state values from the context (typically the ContinuousXXXTrigger classes below), these state values are accessed through the getPartitionedState method of TriggerContext
EventTimeTrigger
Triggers based on event time; its logic lives in onEventTime.
ProcessingTimeTrigger
Triggers based on the current system time; its logic lives in onProcessingTime.
ContinuousEventTimeTrigger
This trigger fires continuously at the specified interval based on event time, and its first firing depends on the watermark. The check for the first firing is in onElement, which registers the next (and first) event-time timer and then sets the first state to false. The implementation is as follows:
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
    ValueState<Boolean> first = ctx.getPartitionedState(stateDesc);
    if (first.value()) {
        long start = timestamp - (timestamp % interval);
        long nextFireTimestamp = start + interval;
        ctx.registerEventTimeTimer(nextFireTimestamp);
        first.update(false);
        return TriggerResult.CONTINUE;
    }
    return TriggerResult.CONTINUE;
}
Continuous firing relies on onEventTime, which keeps registering the timer for the next firing:
public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
    ctx.registerEventTimeTimer(time + interval);
    return TriggerResult.FIRE;
}
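The timer arithmetic in these two callbacks can be checked in isolation. The helper class below is hypothetical and only reproduces the two calculations: aligning the first timer to the end of the current interval slot, and pushing each subsequent timer one interval further.

```java
// Hypothetical helper reproducing the timer arithmetic of a continuous trigger.
class ContinuousTimerSketch {

    // First timer: the end of the interval-aligned slot that contains the timestamp.
    static long firstFireTimestamp(long timestamp, long interval) {
        long start = timestamp - (timestamp % interval);
        return start + interval;
    }

    // Every firing schedules the next one a full interval later.
    static long nextFireTimestamp(long firedAt, long interval) {
        return firedAt + interval;
    }

    public static void main(String[] args) {
        long first = firstFireTimestamp(12_345L, 5_000L);
        System.out.println(first); // 15000
        System.out.println(nextFireTimestamp(first, 5_000L)); // 20000
    }
}
```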
ContinuousProcessingTimeTrigger
A trigger that fires continuously at a specified interval based on the system time. It uses the saved state value fire-timestamp to decide whether to fire, and its recurring timer registration takes place in onElement.
CountTrigger
Fires based on a given count. Because the count is element-based rather than time-based, the whole triggering mechanism is implemented in onElement. The logic is simple: increment the count first, then fire once it reaches the given threshold:
public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws IOException {
    ValueState<Long> count = ctx.getPartitionedState(stateDesc);
    long currentCount = count.value() + 1;
    count.update(currentCount);
    if (currentCount >= maxCount) {
        count.update(0L);
        return TriggerResult.FIRE;
    }
    return TriggerResult.CONTINUE;
}
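The counting behavior is easy to try out without Flink. In the sketch below, a plain field stands in for the partitioned state and a boolean stands in for the trigger result; both are simplifications for illustration, and the class name is hypothetical.

```java
// Hypothetical standalone model of a count-based trigger.
class CountTriggerSketch {
    private final long maxCount;
    private long count; // stands in for the partitioned state value

    CountTriggerSketch(long maxCount) {
        this.maxCount = maxCount;
    }

    // Returns true where the real trigger would return FIRE, false for CONTINUE.
    boolean onElement() {
        long currentCount = count + 1;
        count = currentCount;
        if (currentCount >= maxCount) {
            count = 0L; // reset so that counting starts again after a firing
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CountTriggerSketch trigger = new CountTriggerSketch(3);
        for (int i = 1; i <= 6; i++) {
            System.out.println("element " + i + " fires: " + trigger.onElement());
        }
    }
}
```

With maxCount set to 3, the third and sixth elements fire and all others continue.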
PurgingTrigger
This trigger acts as a wrapper that turns any given trigger into a purging trigger. It receives a trigger instance, invokes the corresponding onXXX method of that instance in each of its own onXXX callbacks, inspects the TriggerResult it gets back and, after making the corresponding check, returns the FIRE_AND_PURGE enumeration value when the wrapped trigger fires.
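The wrapper idea can be sketched with a single callback. The types below are hypothetical, and the mapping is an assumption: firing results are upgraded to FIRE_AND_PURGE while other results are passed through unchanged, which may not match every Flink version exactly.

```java
// Hypothetical sketch of a purging wrapper around another trigger.
class PurgingSketch {

    enum Result { CONTINUE, FIRE, PURGE, FIRE_AND_PURGE }

    // Reduced to one callback to keep the sketch short.
    interface Trigger {
        Result onElement();
    }

    // Wraps a trigger so that every firing also purges the window.
    static Trigger purging(Trigger nested) {
        return () -> {
            Result result = nested.onElement();
            boolean fires = result == Result.FIRE || result == Result.FIRE_AND_PURGE;
            return fires ? Result.FIRE_AND_PURGE : result; // assumption: pass others through
        };
    }

    public static void main(String[] args) {
        System.out.println(purging(() -> Result.FIRE).onElement()); // FIRE_AND_PURGE
        System.out.println(purging(() -> Result.CONTINUE).onElement()); // CONTINUE
    }
}
```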
DeltaTrigger
A trigger based on a DeltaFunction and a given threshold. It computes the delta between the last element and the current element and compares it with the threshold; if the delta is higher than the threshold, the trigger fires. Because it is element-based, the main logic is implemented in onElement.
Summary
This article has focused on the individual concepts around windows; for now they may not seem closely connected. Later we will analyze how they work together to implement the complete window mechanism.
Apache Flink Source Parsing Stream-window