Apache Flink Source Code Analysis: Stream Window

Source: Internet
Author: User
Tags emit apache flink

The window is a very important concept in Flink stream processing. This article analyzes the window-related concepts and their implementation, focusing mainly on the package org.apache.flink.streaming.api.windowing.

Window

A Window represents a finite collection of elements. A window has a maximum timestamp, which marks the point in time by which all elements that should enter the window have arrived.

The root window object of Flink is an abstract class that provides only an abstract method:

    public abstract long maxTimestamp();

This method returns the maximum timestamp. Flink provides two concrete window implementations. When implementing Window, subclasses should override both equals and hashCode so that two logically equal windows are considered the same.

GlobalWindow

GlobalWindow is a global window implemented as a singleton. Its maxTimestamp is set to Long.MAX_VALUE.

The class also defines its serializer as a static inner class: GlobalWindow.Serializer.

TimeWindow

TimeWindow represents a window over a time interval, which is reflected in the two properties that its constructor takes:

    • start: the start of the time interval
    • end: the end of the time interval

The time interval that a TimeWindow represents is [start, end). Its maxTimestamp implementation is:

    public long maxTimestamp() {
        return end - 1;
    }

Its equals implementation, in addition to the usual checks (comparing references and comparing the Class of the instances), also compares the two properties start and end.
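The equality contract described above can be illustrated with a minimal standalone sketch. SketchTimeWindow is a hypothetical stand-in, not Flink's TimeWindow class, and it assumes that the end bound is exclusive, so that the maximum timestamp is end - 1.

```java
import java.util.Objects;

// Hypothetical stand-in for a time window; not Flink's actual class.
final class SketchTimeWindow {
    private final long start; // inclusive start of the interval
    private final long end;   // exclusive end of the interval (assumption)

    SketchTimeWindow(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Largest timestamp that still belongs to this window.
    long maxTimestamp() {
        return end - 1;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true; // same reference
        }
        if (o == null || getClass() != o.getClass()) {
            return false; // different class
        }
        SketchTimeWindow other = (SketchTimeWindow) o;
        return start == other.start && end == other.end;
    }

    @Override
    public int hashCode() {
        // Must agree with equals: equal windows produce equal hash codes.
        return Objects.hash(start, end);
    }
}
```

With this in place, two separately created windows over the same interval compare as equal and can share one entry in a hash-based collection.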

TimeWindow also implements its serializer internally, which mainly serializes the two properties start and end.

WindowAssigner

The window assigner for elements. It is used to assign an element to one or more windows. The abstract class defines three abstract methods:

    • assignWindows: assigns an element with a given timestamp to one or more windows and returns the collection of windows
    • getDefaultTrigger: returns the default trigger associated with the WindowAssigner
    • getWindowSerializer: returns the serializer for the windows assigned by the WindowAssigner

Built-in WindowAssigner

(Figure: type inheritance graph of the built-in WindowAssigner implementations; the image is not included here.)

There are many time-based windows here, and they involve two concepts: the time type and the window type.

Time type:

    • EventTime: a custom timestamp supplied by the user (event timestamp)
    • ProcessingTime: the local timestamp of the host on which the subtask of the current task is running (system timestamp)

Window type:

    • Sliding: sliding windows may overlap (an element may belong to multiple windows)
    • Tumbling: non-overlapping windows (the assignWindows method typically returns Collections.singletonList())

GlobalWindows

This assigner corresponds to the GlobalWindow window: it assigns all elements to the same GlobalWindow (in essence, GlobalWindow has only one instance). Just as GlobalWindow is implemented as a singleton, so is GlobalWindows.

Method implementation:

    • assignWindows: returns a collection that holds the single GlobalWindow instance
    • getDefaultTrigger: returns a NeverTrigger, which never fires

TumblingEventTimeWindows

Based on the given window size and the event time, it returns a collection that holds a single TimeWindow instance. The getDefaultTrigger method returns an instance of EventTimeTrigger.
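As a rough illustration of how a tumbling assigner can map a timestamp to exactly one window, here is a minimal sketch. The class TumblingSketch, the long[]{start, end} representation, and the alignment formula (round the timestamp down to a multiple of the size) are assumptions made for this example, not code taken from Flink. It also assumes non-negative timestamps.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper; a window is modelled as long[]{start, end}.
final class TumblingSketch {
    static List<long[]> assignWindows(long timestamp, long size) {
        // Round the timestamp down to the nearest multiple of the window size.
        long start = timestamp - (timestamp % size);
        // A tumbling assigner yields exactly one window per element.
        return Collections.singletonList(new long[] {start, start + size});
    }
}
```

For example, with a size of 60000 ms, an element with timestamp 65000 lands in the single window that starts at 60000 and ends at 120000.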

TumblingProcessingTimeWindows

Based on the given window size and the processing time, it returns a collection that holds a single TimeWindow instance. Note that the processing time is the local timestamp of the host on which the current task is running. The getDefaultTrigger method returns an instance of ProcessingTimeTrigger.

SlidingProcessingTimeWindows

A sliding window differs from a tumbling window in that, in addition to the window size, it also takes a slide value. A sliding window can be understood like this: every 10 seconds, compute over the last minute. Here one minute is the window size, and 10 seconds is the slide.

In a sliding window, the assignWindows method no longer returns a single window but a collection of windows. It first computes the number of windows, size / slide, and then a loop creates that many window objects of length size, whose start times are slide apart.
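The loop described above can be sketched as follows. SlidingSketch and the long[]{start, end} window representation are hypothetical, and the way the most recent window start is aligned to the slide is an assumption made for this example. It assumes non-negative timestamps and a size that is a multiple of the slide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a window is modelled as long[]{start, end}.
final class SlidingSketch {
    static List<long[]> assignWindows(long timestamp, long size, long slide) {
        // Expected number of windows, used only to pre-size the list.
        List<long[]> windows = new ArrayList<>((int) (size / slide));
        // Start of the most recent window that contains the timestamp.
        long lastStart = timestamp - (timestamp % slide);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            windows.add(new long[] {start, start + size});
        }
        return windows;
    }
}
```

With a size of 60000 ms and a slide of 10000 ms, an element with timestamp 65000 is assigned to six windows, whose starts run from 60000 down to 10000.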

SlidingEventTimeWindows

Similar to SlidingProcessingTimeWindows, except that the start of each window is computed from the element's event timestamp instead of the system timestamp.

Evictor

Evictor: literally "one that expels"; it is used to remove certain elements from a window.

The eviction takes place after the trigger fires and before the window is processed (before the WindowFunction is applied).

The interface defines only one method:

    int evict(Iterable<StreamRecord<Object>> elements, int size, W window);

The return value indicates the number of elements to evict.

Built-in Evictor

Flink has three built-in Evictor implementations:

    • TimeEvictor
    • CountEvictor
    • DeltaEvictor

TimeEvictor

This Evictor uses a given retention period (keep time) as the eviction rule. Its implementation is roughly as follows:

    public int evict(Iterable<StreamRecord<Object>> elements, int size, W window) {
        int toEvict = 0;
        long currentTime = Iterables.getLast(elements).getTimestamp();
        long evictCutoff = currentTime - windowSize;
        for (StreamRecord<Object> record : elements) {
            if (record.getTimestamp() > evictCutoff) {
                break;
            }
            toEvict++;
        }
        return toEvict;
    }

The general logic is to take the timestamp of the last element as the "current" time and subtract the desired window size from it to obtain a cutoff timestamp; only elements whose timestamps are greater than the cutoff are kept.

Then, starting from the first element, the loop checks each element: if its timestamp is not greater than the cutoff, the eviction count is incremented; as soon as an element's timestamp is greater than the cutoff, the loop exits and counting stops. This works because the elements in the window are ordered by time, which is guaranteed by the Flink runtime: once one element's timestamp is greater than the cutoff, all subsequent elements satisfy the condition too, so there is no need to keep looping.
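To make the cutoff arithmetic concrete, here is a standalone sketch of the same loop over plain timestamps. TimeEvictorSketch is a hypothetical class written for this example; it works on a list of long values instead of StreamRecord objects and requires a non-empty list in ascending order.

```java
import java.util.List;

// Hypothetical standalone version of the eviction loop; not Flink's class.
final class TimeEvictorSketch {
    // timestamps must be non-empty and sorted in ascending order.
    static int evict(List<Long> timestamps, long windowSize) {
        int toEvict = 0;
        // The last element's timestamp plays the role of the "current" time.
        long currentTime = timestamps.get(timestamps.size() - 1);
        long evictCutoff = currentTime - windowSize;
        for (long ts : timestamps) {
            if (ts > evictCutoff) {
                break; // every later element is newer than the cutoff as well
            }
            toEvict++;
        }
        return toEvict;
    }
}
```

For the timestamps 1, 5, 8, 12, 15 and a window size of 5, the cutoff is 10, so the first three elements are evicted and the method returns 3.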

CountEvictor

A capacity-based Evictor. It uses size, the second parameter of the evict method, to decide how many elements should be evicted. The implementation:

    public int evict(Iterable<StreamRecord<Object>> elements, int size, W window) {
        if (size > maxCount) {
            return (int) (size - maxCount);
        } else {
            return 0;
        }
    }

DeltaEvictor

Eviction is decided by a given threshold and a deltaFunction. It computes the delta between the current element and the last element and compares it with the threshold.
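One way to read this description is sketched below. DeltaEvictorSketch is hypothetical, and the eviction rule it uses (evict leading elements while their delta to the last element is at least the threshold) is an assumption made for this example, not something the text above spells out.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of delta-based eviction; not Flink's class.
final class DeltaEvictorSketch {
    // elements must be non-empty; delta plays the role of the deltaFunction.
    static <T> int evict(List<T> elements, double threshold, ToDoubleBiFunction<T, T> delta) {
        T last = elements.get(elements.size() - 1);
        int toEvict = 0;
        for (T element : elements) {
            // Stop at the first element that is close enough to the last one.
            if (delta.applyAsDouble(element, last) < threshold) {
                break;
            }
            toEvict++;
        }
        return toEvict;
    }
}
```

With the values 1.0, 2.0, 9.0, 10.0, a threshold of 3.0, and the absolute difference as the delta, the first two values are evicted because they differ from 10.0 by 9.0 and 8.0.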

Time

Flink has a single class, Time, that defines a window's time interval. By default, the time refers to the time in the execution environment. To create a Time object, you need two parameters:

    • size: the length of the time interval (a number)
    • unit: a TimeUnit instance that represents the unit of the time interval

The class provides a number of static methods that create instances for different units.
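The shape of such a class can be sketched in a few lines. SketchTime is a hypothetical class; its factory method names (of, seconds, minutes) and the toMilliseconds accessor are chosen for this example and should not be read as a listing of Flink's Time API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical value class that pairs a size with a unit; not Flink's Time.
final class SketchTime {
    private final long size;
    private final TimeUnit unit;

    private SketchTime(long size, TimeUnit unit) {
        this.size = size;
        this.unit = unit;
    }

    // General factory: any size with any unit.
    static SketchTime of(long size, TimeUnit unit) {
        return new SketchTime(size, unit);
    }

    // Convenience factories for specific units.
    static SketchTime seconds(long seconds) {
        return of(seconds, TimeUnit.SECONDS);
    }

    static SketchTime minutes(long minutes) {
        return of(minutes, TimeUnit.MINUTES);
    }

    // Converts the interval to milliseconds.
    long toMilliseconds() {
        return unit.toMillis(size);
    }
}
```

A sliding window of one minute that slides every 10 seconds would then be described by SketchTime.minutes(1) and SketchTime.seconds(10).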

Trigger

A Trigger determines when the elements collected in a window trigger a computation and when the result is emitted.

In coarse-grained terms, Flink provides three kinds of triggering:

    • by element
    • by system time
    • by event time

This is reflected in the three main abstract methods of Trigger:

    • onElement: called for each element; this mainly serves element-based triggers, such as the CountTrigger that we will see later
    • onProcessingTime: called when a processing-time (Flink system timestamp) timer fires
    • onEventTime: called when an event-time (event timestamp) timer fires

All of these methods have a common parameter: TriggerContext .

TriggerContext

As the name implies, it provides context information while the trigger executes. It is simply an inner interface of Trigger:

    • getCurrentWatermark: returns the current watermark
    • registerProcessingTimeTimer: registers a system-time timer, which triggers onProcessingTime
    • registerEventTimeTimer: registers an event-time timer, which triggers onEventTime
    • deleteProcessingTimeTimer: deletes a system-time timer
    • deleteEventTimeTimer: deletes an event-time timer
    • getPartitionedState: retrieves state that can be restored after a failure

Of these, the registerXXX/deleteXXX methods mainly serve the two time-based kinds of triggers above. The last method, getPartitionedState, is also very important because it is used to obtain the state of the window: some triggers depend on contextual state, and they obtain that state through this method.

TriggerResult

After any of the three callback methods defined in Trigger is called, it returns a result that determines what happens next (for example, calling the window function or discarding the window). This behavior is expressed by TriggerResult, an enumeration type with the following values:

    • FIRE: the window is evaluated using the window function and the result is emitted, but the elements are not cleaned up and remain in the window
    • PURGE: clears the elements in the window
    • FIRE_AND_PURGE: combines the behavior of both FIRE and PURGE
    • CONTINUE: no action is taken
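
The four values can be thought of as combinations of two independent flags, "fire" and "purge". The sketch below models that idea; SketchTriggerResult and its isFire and isPurge accessors are hypothetical names chosen for this example.

```java
// Hypothetical enum that models each result as a (fire, purge) pair.
enum SketchTriggerResult {
    CONTINUE(false, false),      // do nothing
    FIRE(true, false),           // evaluate and emit, keep the elements
    PURGE(false, true),          // drop the elements without evaluating
    FIRE_AND_PURGE(true, true);  // evaluate and emit, then drop the elements

    private final boolean fire;
    private final boolean purge;

    SketchTriggerResult(boolean fire, boolean purge) {
        this.fire = fire;
        this.purge = purge;
    }

    boolean isFire() {
        return fire;
    }

    boolean isPurge() {
        return purge;
    }
}
```

Code that processes a result can then ask the two questions separately: whether to call the window function, and whether to clear the window contents afterwards.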

Built-in Trigger

Flink has a number of built-in triggers. (Figure: complete class diagram of the built-in triggers; the image is not included here.)

These triggers have some things in common, as explained here:

    • Because Flink's Trigger already splits the callbacks for the different trigger types into separate methods (onXXX), the core logic of each kind of trigger lives mainly in its relevant onXXX method, and the unrelated onXXX methods simply return TriggerResult.CONTINUE (personally, I think this design is not ideal because it does not lend itself to extension)
    • Because a number of trigger types depend on certain state values from the context (such as the typical ContinuousXXXTrigger below), these state values are accessed through the getPartitionedState method of TriggerContext

EventTimeTrigger

Triggers based on event time; corresponds to onEventTime.

ProcessingTimeTrigger

Triggers based on the current system time; corresponds to onProcessingTime.

ContinuousEventTimeTrigger

This trigger fires continuously at a specified time interval based on event time, and its first firing depends on the watermark. The check for the first element is in onElement, which registers the event-time timer for the next (and first) firing and then sets the "first" flag in state to false. The implementation is as follows:

    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
        ValueState<Boolean> first = ctx.getPartitionedState(stateDesc);
        if (first.value()) {
            // Align to the interval and register the first timer.
            long start = timestamp - (timestamp % interval);
            long nextFireTimestamp = start + interval;
            ctx.registerEventTimeTimer(nextFireTimestamp);
            first.update(false);
            return TriggerResult.CONTINUE;
        }
        return TriggerResult.CONTINUE;
    }

Continuous firing relies on onEventTime, which keeps registering the timer for the next firing:

    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        // Schedule the next firing before reporting this one.
        ctx.registerEventTimeTimer(time + interval);
        return TriggerResult.FIRE;
    }
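
The timer arithmetic in the two methods above can be traced with a small standalone sketch. ContinuousFireSketch and its method names are hypothetical, and the fireTimes helper simply assumes that every registered timer up to a given bound is eventually fired; it does not model watermarks or state. Timestamps are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the timer registrations; not Flink's trigger class.
final class ContinuousFireSketch {
    // First timer: the next multiple of the interval after the first element.
    static long firstFireTimestamp(long timestamp, long interval) {
        long start = timestamp - (timestamp % interval);
        return start + interval;
    }

    // Each firing registers the next timer at time + interval, so the firing
    // times form an arithmetic sequence; list those that are <= upTo.
    static List<Long> fireTimes(long firstElementTimestamp, long interval, long upTo) {
        List<Long> fired = new ArrayList<>();
        long next = firstFireTimestamp(firstElementTimestamp, interval);
        while (next <= upTo) {
            fired.add(next);
            next += interval;
        }
        return fired;
    }
}
```

For a first element at timestamp 12345 and an interval of 5000, the first timer is registered at 15000, and the following firings happen at 20000, 25000, and so on.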

ContinuousProcessingTimeTrigger

A trigger that fires continuously at a specified time interval based on system time. It uses the saved state value fire-timestamp to decide whether to fire, and its recurring registration takes place in onElement.

CountTrigger

Triggers based on a given count. Because the count is element-based rather than time-based, the whole triggering mechanism is implemented in onElement. The logic is simple: increment the count first, and fire if it has reached the given threshold:

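    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws IOException {
        ValueState<Long> count = ctx.getPartitionedState(stateDesc);
        long currentCount = count.value() + 1;
        count.update(currentCount);
        if (currentCount >= maxCount) {
            // Reset the counter so that the next batch starts from zero.
            count.update(0L);
            return TriggerResult.FIRE;
        }
        return TriggerResult.CONTINUE;
    }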
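
The counting logic can be exercised without Flink by replacing the state handle with a plain field. CountTriggerSketch is a hypothetical class for this example; its onElement returns true where the trigger would return FIRE and false where it would return CONTINUE.

```java
// Hypothetical standalone counter; a plain field stands in for the state.
final class CountTriggerSketch {
    private final long maxCount;
    private long count; // starts at 0

    CountTriggerSketch(long maxCount) {
        this.maxCount = maxCount;
    }

    // Returns true when the count reaches maxCount, then resets the count.
    boolean onElement() {
        count++;
        if (count >= maxCount) {
            count = 0L;
            return true;
        }
        return false;
    }
}
```

With maxCount set to 3, the third and the sixth element cause a firing, and every other element leaves the window untouched.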

PurgingTrigger

This trigger acts as a wrapper that turns any given trigger into a purging trigger. It takes a trigger instance, calls that instance's corresponding onXXX method in each of its own onXXX callbacks, inspects the TriggerResult that it gets back, and returns the FIRE_AND_PURGE enumeration value in place of a firing result.
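The wrapping idea can be sketched with a small helper that decorates any nested callback. PurgingSketch, its Result enum, and the purging method are hypothetical names; passing non-firing results through unchanged is an assumption made for this example.

```java
import java.util.function.Supplier;

// Hypothetical decorator; the nested callback is modelled as a Supplier.
final class PurgingSketch {
    enum Result { CONTINUE, FIRE, PURGE, FIRE_AND_PURGE }

    // Calls the nested callback and upgrades any firing result to FIRE_AND_PURGE.
    static Result purging(Supplier<Result> nested) {
        Result result = nested.get();
        if (result == Result.FIRE || result == Result.FIRE_AND_PURGE) {
            return Result.FIRE_AND_PURGE;
        }
        return result; // assumption: non-firing results pass through unchanged
    }
}
```

Wrapping a callback that returns FIRE therefore yields FIRE_AND_PURGE, while a callback that returns CONTINUE still yields CONTINUE.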

DeltaTrigger

Triggers based on a DeltaFunction and a given threshold. The trigger computes the delta between the last element that arrived and the current element and compares it with the given threshold; it fires if the delta is higher than the threshold. Because it is element-based, the main logic is implemented in onElement.
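A minimal sketch of this kind of logic follows. DeltaTriggerSketch is hypothetical, a plain field stands in for the stored state, and one detail is an assumption made for this example: the stored reference element is replaced only when the trigger fires, so the delta is measured against the element of the last firing (or the very first element).

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical standalone delta trigger; not Flink's class.
final class DeltaTriggerSketch<T> {
    private final double threshold;
    private final ToDoubleBiFunction<T, T> delta;
    private T reference; // element stored at the last firing, or the first element

    DeltaTriggerSketch(double threshold, ToDoubleBiFunction<T, T> delta) {
        this.threshold = threshold;
        this.delta = delta;
    }

    // Returns true where the trigger would fire, false where it would continue.
    boolean onElement(T element) {
        if (reference == null) {
            reference = element; // the first element only initialises the state
            return false;
        }
        if (delta.applyAsDouble(reference, element) > threshold) {
            reference = element; // assumption: the reference moves only on a firing
            return true;
        }
        return false;
    }
}
```

With a threshold of 5.0 and the absolute difference as the delta, the sequence 10.0, 12.0, 16.0, 18.0 fires once, at 16.0.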

Summary

This article has focused on analyzing the concepts related to windows. For now they may not appear closely related to one another; later we will analyze how they work together to implement the complete window mechanism.
